AI News Hub

Live updates

Show HN: BetterCallClaude – Open Source AI Legal Agents for Italy

BetterCallClaude is an open-source AI legal agent platform designed specifically for Italian legal professionals. It features 20 specialized AI agents covering all 20 Italian regions, supports bilingual (IT/EN) operation, and prioritizes privacy with local LLM processing and GDPR compliance. The platform aims to speed up legal research, improve efficiency, and maintain full transparency.

  • 20 specialized AI agents for Italian law
  • Bilingual support (Italian and English)

Jensen Huang Joins Tsinghua University's Advisory Board

NVIDIA CEO Jensen Huang has accepted an invitation to join the Advisory Board of Tsinghua University's School of Economics and Management (SEM). The board, chaired by Apple CEO Tim Cook, includes Elon Musk, Satya Nadella, Mark Zuckerberg, Jack Ma, and other global leaders. Huang also recently received an honorary doctorate from Carnegie Mellon University.

  • Jensen Huang joins Tsinghua SEM Advisory Board
  • Board chaired by Apple's Tim Cook, includes top tech and business leaders

Amdahl's law for AI agents

This article applies Amdahl's Law to AI agents, arguing that the speedup from parallel agents is bounded by the fraction of the workflow that requires human judgment (H). It introduces the concept of 'self-liquidating H', where each human intervention produces an artifact that eliminates similar interventions in the future. It also emphasizes 'configurancy' (explicit behavioral commitments and conformance suites) as a way to encode human knowledge so that agents can operate autonomously. Examples from ElectricSQL, Gas Town, and Ralph Loop illustrate the principles.

  • Speedup from AI agents is limited by the human judgment fraction H; reducing H is key.
  • Self-liquidating H: each human intervention should produce a reusable artifact (test, spec update) to prevent recurrence.
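
The bound described above is the classic Amdahl formula with H as the serial fraction: speedup(N) = 1 / (H + (1 - H) / N). The following is a minimal sketch, assuming N identical agents and that everything outside H parallelizes perfectly (neither assumption is stated in the summary). The function names are illustrative, not taken from the article.

```python
def amdahl_speedup(h: float, n_agents: int) -> float:
    """Upper bound on speedup when a fraction h of the work stays serial.

    h        -- fraction of the workflow that needs human judgment (0..1)
    n_agents -- number of agents working in parallel on the rest
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be in [0, 1]")
    if n_agents < 1:
        raise ValueError("n_agents must be >= 1")
    return 1.0 / (h + (1.0 - h) / n_agents)


def speedup_ceiling(h: float) -> float:
    """Limit of the speedup as the number of agents grows without bound."""
    return float("inf") if h == 0 else 1.0 / h


if __name__ == "__main__":
    # Compare ten agents against the theoretical ceiling for several H values.
    for h in (0.5, 0.2, 0.05):
        print(f"H={h}: 10 agents -> {amdahl_speedup(h, 10):.2f}x, "
              f"ceiling -> {speedup_ceiling(h):.1f}x")
```

With H = 0.2, ten agents give roughly a 3.6x speedup and no number of agents can exceed 5x, which is why shrinking H pays off more than adding agents.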

Are robots nearing their ChatGPT moment? – podcast

Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.

  • Robot 'Lightning' beats human world record in Beijing half marathon.
  • China commits over £100bn to robotics investment over two decades.

Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning

This paper introduces Simulation-Informed Diffusion (SID), a decentralized framework in which each robot uses constraint-aware diffusion models (CADM) to first simulate its neighbors' future trajectories and then plan its own trajectory under safety constraints. SID enables a minimal communication scheme that is triggered only in congested scenarios, and it outperforms baselines while scaling to 108 robots and 160 obstacles.

  • SID uses CADM to simulate neighbor trajectories for decentralized collision avoidance
  • Minimal communication scheme coordinates only when necessary

Design of a Real-time Asynchronous Monocular Odometry for Planetary Exploration

Researchers propose a real-time, asynchronous, event-based monocular odometry method for planetary rovers. It uses an Error-State Kalman Filter to process event camera data for robust ego-motion estimation under high-dynamic-range lighting and computational constraints.

  • Event cameras provide asynchronous pixel-wise brightness changes with microsecond resolution, ideal for high-speed sensing and HDR environments.
  • The approach uses an Error-State Kalman Filter to continuously estimate camera motion from event streams.

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

This paper presents a transformer-based architecture called Trinity that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation in a unified network. It segments terrain regions based purely on visual appearance without predefined labels or robot-dependent traversability scores, enabling robot-agnostic visual terrain priors for downstream tasks. The authors extend the OAISYS simulator to create the RUGDSynth synthetic dataset and provide the EXTerra real-world dataset. Experiments demonstrate the approach's effectiveness in complex outdoor environments.

  • Trinity architecture unifies class-agnostic terrain segmentation with semantic segmentation
  • Segments terrains based on visual appearance without predefined labels for better transferability

Agentic Language-to-Objective Synthesis for Optofluidic Assembly

Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.

  • Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
  • It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

Uni-LaViRA is a unified agentic architecture for embodied navigation that reduces navigation decisions to a single Language-Vision-Robot Actions Translation. It leverages pretrained MLLMs in a zero-shot manner across four task families and four real robots, using TODO List Memory and Second Chance Backtrack mechanisms to achieve self-correcting navigation without training.

  • Generality in navigation can be obtained structurally, not only through data scale.
  • Uni-LaViRA decomposes navigation into a language action (semantic direction) and a vision action (pixel target), both within the output manifold of MLLMs.

Synthetic Emotions vs. Gamification: Exploring Engagement Strategies for Small Social Robots in Different Age Groups

Many children face challenges in emotional regulation and social interaction, limiting their participation in therapeutic programs. This study explores engagement strategies for a tactile robot supporting children with anxiety disorders, comparing synthetic emotional feedback and point rewards. A preference study with 16 school children (ages 6-8) showed preference for emotional engagement, while a behavioral study with 14 university students (ages 20-27) found point-based systems yielded higher task accuracy (p<0.05) and sustained performance. These findings highlight age-related differences and the need to validate design assumptions through observed interaction.

  • Children aged 6-8 prefer emotional engagement over points
  • University students show higher task accuracy with point rewards

SCALE-COMM: Shared, Contrastively-Aligned Latent Embeddings for MARL Communication

SCALE-COMM is a self-supervised framework that decouples communication learning from policy optimization, learning compact, stable, and policy-relevant latent messages to improve coordination in multi-agent reinforcement learning. It outperforms existing methods on benchmarks and a realistic warehouse task, offering better stability, sample efficiency, and throughput.

  • Decouples communication learning from policy optimization to reduce interference.
  • Uses contrastive learning to enforce consistency across agents and time.

What-If World: A Causal Benchmark for General World Models in Embodied Scenarios

A new benchmark called What-If World tests video generation models' causal reasoning by presenting paired prompts that differ in one physical detail and checking if videos diverge correctly. Evaluating nine state-of-the-art models, none exceed 52% on paired scores, with open-source models around 28%, indicating significant room for improvement. Performance correlates with visual prominence rather than physics tractability.

  • What-If World benchmark uses 319 prompt pairs with single variable changes to test causal understanding in video generation models. It is built on real frames from nuScenes and DROID.
  • Scoring uses APEO rubric (Adherence, Physics, Environment, Outcome). All nine models struggle: best paired score is 52%, open-source models average 28%.

Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System

A prospective single-center clinical validation of the Melanoscope AI mobile dermoscopy CDSS demonstrated 88.6% agreement with expert assessment on 176 patients, with no false negatives and 88.3% specificity. The study developed a quantitative interpretability method for cascade deep learning models and a three-zone patient routing algorithm, supporting reproducible and interpretable decision-making for skin cancer screening in resource-limited settings.

  • The Melanoscope AI system achieved 88.6% agreement with experts on 176 patients, with zero false negatives among 5 malignant lesions.
  • Specificity reached 88.3%, with 3 melanomas and 2 basal cell carcinomas histologically confirmed.
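
The reported percentages can be cross-checked with simple arithmetic. The sketch below assumes one lesion per patient (so 171 of the 176 cases are benign) and that "agreement" means overall accuracy against the reference assessment. The true-negative and false-positive counts are inferred from the percentages, not quoted from the study.

```python
# Reported in the summary.
total_cases = 176
malignant = 5          # 3 melanomas + 2 basal cell carcinomas
false_negatives = 0

# Derived under the one-lesion-per-patient assumption.
benign = total_cases - malignant            # 171
true_positives = malignant - false_negatives

# INFERRED, not reported: 151 is the only integer count of true negatives
# out of 171 that rounds to the stated 88.3 % specificity.
true_negatives = 151
false_positives = benign - true_negatives   # 20

sensitivity = true_positives / malignant
specificity = true_negatives / benign
agreement = (true_positives + true_negatives) / total_cases

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity:.1%}")
    print(f"specificity = {specificity:.1%}")
    print(f"agreement   = {agreement:.1%}")
```

Under these assumptions the three figures are mutually consistent: 151 of 171 benign cases correctly cleared reproduces the specificity, and 156 of 176 correct calls reproduces the agreement rate.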

Representation-Conditioned Diffusion Models for Guided Training Data Generation

This work proposes representation-conditioned diffusion models that leverage learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, this approach outperforms class-conditioned generation by 10.76 p.p. in top-1 accuracy. Scaling up the synthetic data can even surpass real-data training by 2.0 p.p. The method also excels in data augmentation and sample filtering, offering a promising way to augment or replace real datasets in large-scale visual learning.

  • Representation-conditioned diffusion models outperform class-conditioned ones by 10.76 p.p. on ImageNet100.
  • Scaled synthetic datasets can beat real-data-trained classifiers by 2.0 p.p. top-1 accuracy.

Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

This paper presents a behavioral-level activity recognition method using head-mounted IMU, going beyond basic motion primitives. The authors define five behavioral categories, construct a 160K-sample dataset from Ego4D with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior models on action and scenario recognition. Observability analysis reveals locomotion is reliably observable, while object transfer and task operation benefit from temporal context; scenario-dependent signal overlap remains a challenge. Results show that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size.

  • Proposes HiT-HAR, a hierarchical model for behavioral activity recognition from head-mounted IMU, going beyond motion primitives
  • Constructs a 160K-sample Ego4D dataset with 8 scenarios and 5 behavioral categories, using a four-tier quality assurance framework

Generic Interpretation Approach for Transformer Models Incorporating Heterogenous Attention Structures

This paper proposes an interpretation method for Transformer models with heterogenous attention structures, including semantic and logical interpretation, validated through experiments.

  • Categorizes Transformer attention into homogenous and heterogenous types; heterogenous attention processes information from different sources.
  • Proposes a generic interpretation method for heterogenous attention structures.

Fine-Tuning Vision-Language Models for Understanding Current Damage and Scoring Priority with Quality Guard Agent

This paper proposes a method for automating bridge damage understanding and repair priority scoring using fine-tuned Vision-Language Models (VLMs). The authors fine-tune LLaVA-1.5-7B with QLoRA on up to 4,000 paired bridge damage images and inspection text records, evaluating on a fixed test set of 800 images. Results show that 2,000 training samples achieve near-optimal validation loss in 2.9 hours, with diminishing returns beyond that. A two-stage Quality Guard using a fine-tuned Swallow-8B SLM rejects low-quality VLM outputs before priority scoring.

  • Fine-tuned LLaVA-1.5-7B model for automated bridge damage identification and priority scoring
  • 2,000 training samples achieve near-optimal performance; more data yields diminishing returns

From Affect to Complex Behavior: Advancing Multimodal Human-Centered AI at the 10th ABAW Workshop & Competition

The 10th ABAW Workshop and Competition at CVPR 2026 advances multimodal human-centered AI by introducing new challenges including emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection, alongside traditional affect estimation and recognition tasks. The competition leverages large-scale in-the-wild datasets, and the paper track covers a broad range of topics from pose estimation to fairness and robustness.

  • ABAW 2026 introduces novel challenges: emotional mimicry intensity, ambivalence recognition, and violence detection.
  • Workshop continues dual structure with competition and paper tracks.

Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Large language models (LLMs) are increasingly used as proxies for computational social analysis, but their ability to faithfully represent human communities' 'thick descriptions' remains a critical challenge. This paper introduces CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against authentic community responses to real-world news. By characterizing a fine-grained spectrum of illocutionary tones, the diagnosis reveals a persistent 'realism gap': steering LLMs with explicit community prompts fails to inherently improve simulation fidelity. Analysis further identifies divergent behavioral signatures among frontier models, suggesting current alignment strategies are insufficient for capturing the sociolinguistic dynamics of online groups.

  • CARE framework evaluates LLM simulation fidelity by analyzing authentic community reaction tones
  • Current LLM alignment strategies fail to adequately capture online community sociolinguistic dynamics

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.

  • FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
  • Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.

  • Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
  • The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.

  • Proposes BioELX, a zero-shot cross-lingual BEL framework using alias-based retrieval and LLM ranking.
  • In Stage 1, enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.

RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.

  • RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
  • On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
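
Micro-F1 and macro-F1 can move by different amounts because they weight codes differently: micro-F1 pools counts across all codes, so frequent codes dominate, while macro-F1 averages per-code scores, so rare codes count equally. The sketch below illustrates the two definitions on made-up counts. The code labels and numbers are hypothetical and have no connection to MDACE.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one code
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_code: Dict[str, Counts]) -> float:
    """Pool the counts over all codes, then compute a single F1."""
    tp = sum(c[0] for c in per_code.values())
    fp = sum(c[1] for c in per_code.values())
    fn = sum(c[2] for c in per_code.values())
    return f1(tp, fp, fn)


def macro_f1(per_code: Dict[str, Counts]) -> float:
    """Compute F1 per code, then take the unweighted mean."""
    scores = [f1(*c) for c in per_code.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: two frequent codes and one rare, poorly predicted code.
toy_counts: Dict[str, Counts] = {
    "frequent_code_a": (90, 10, 10),
    "frequent_code_b": (40, 5, 5),
    "rare_code_c": (1, 1, 3),
}

if __name__ == "__main__":
    print(f"micro-F1 = {micro_f1(toy_counts):.3f}")
    print(f"macro-F1 = {macro_f1(toy_counts):.3f}")
```

In this toy case the rare code drags macro-F1 well below micro-F1, which is why gains on rare codes show up mainly in the macro figure.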

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.

  • Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
  • Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.
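
Direction-vector interpolation can be pictured as moving a style embedding along the difference between two contrastive prompts. The sketch below shows only that linear step. The summary does not say which embedding space the paper uses or how the step is scaled, so the plain-list vectors, the `alpha` parameter, and the function names are all assumptions.

```python
from typing import List

Vector = List[float]


def direction_vector(style_a: Vector, style_b: Vector) -> Vector:
    """Difference between the embeddings of two contrastive style prompts."""
    if len(style_a) != len(style_b):
        raise ValueError("embeddings must have the same dimension")
    return [b - a for a, b in zip(style_a, style_b)]


def interpolate(base: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a base embedding along the direction by a factor alpha.

    alpha = 0 keeps the base style, alpha = 1 reaches the contrasting style
    when base is style_a; values in between give intermediate styles.
    """
    return [x + alpha * d for x, d in zip(base, direction)]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for two contrastive style prompts.
    style_a = [0.0, 1.0]
    style_b = [2.0, 1.0]
    d = direction_vector(style_a, style_b)
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, interpolate(style_a, d, alpha))
```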

LCO: LLM-based Constraint Optimization for Safer Agentic LLMs in Real-world Tasks

Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.

  • ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
  • LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.

  • ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
  • Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.

  • SignGAD shifts from training a fixed detector to designing detection workflows
  • It selects suitable graph encodings and detector designs for task-specific anomaly evidence

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning (CL). ADS decouples logit shift into architecture and data dependencies and requires only a few data samples to capture shift trends. Experiments across more than 175 architectures show a strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error, enabling reliable CL model selection across three datasets and six scenarios.

  • Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
  • Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
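
Spearman's r_s, the statistic quoted above, measures how consistently one quantity rises with another, regardless of whether the relationship is linear. It is the Pearson correlation of the rank-transformed values. The sketch below computes it from scratch. The ADS and logit-shift numbers are invented for illustration and are not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-architecture scores, for illustration only.
    ads_scores = [0.10, 0.40, 0.35, 0.80, 0.90]
    logit_shift = [1.0, 2.5, 3.0, 4.0, 10.0]
    print(f"r_s = {spearman(ads_scores, logit_shift):.3f}")
```

Because only ranks matter, a cheap proxy can score well on r_s without matching the magnitude of the expensive quantity, which is the property needed when the goal is to order candidate models.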

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.

  • MAPCA parameterizes PCA with a positive-definite metric matrix, linking geometric deep learning symmetry and equivariance concepts.
  • A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.

  • MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
  • MoE integrates complementary expert knowledge for enriched alignment and interaction representations.
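
The decoupling of compute from parameter count is easiest to see with sparse top-k routing, the common MoE formulation in which each token is sent to only k of the available experts. The summary does not say which routing schemes the survey covers, so top-k gating is an assumption here. The sketch also ignores router parameters and uses illustrative names.

```python
import math
from typing import List, Tuple


def top_k_gate(logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def param_counts(num_experts: int, params_per_expert: int, k: int) -> Tuple[int, int]:
    """Return (total expert parameters, parameters active per token)."""
    return num_experts * params_per_expert, k * params_per_expert


if __name__ == "__main__":
    # Router scores for four experts; only two are evaluated for this token.
    print(top_k_gate([0.1, 2.0, -1.0, 2.0], k=2))
    # Growing from 8 to 64 experts multiplies capacity by 8, while the
    # per-token active parameter count stays the same.
    for experts in (8, 64):
        total, active = param_counts(experts, 1_000_000, k=2)
        print(f"{experts} experts: total={total:,} active per token={active:,}")
```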