A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.
FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.
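A minimal sketch of the idea behind an entropy-driven stride, assuming that confident (low-entropy) predictions allow a larger stride and uncertain ones a smaller stride. The mapping rule, the function names, and the `min_stride`/`max_stride` bounds are all hypothetical; the summary does not give FLUID's actual formula.

```python
import math

def shannon_entropy(probs):
    """Entropy (in nats) of one predictive distribution over the vocabulary."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def elastic_stride(probs, min_stride=1, max_stride=8):
    """Hypothetical rule: map normalized entropy linearly onto a stride.

    Low entropy (confident prediction) -> large stride.
    High entropy (dense information)   -> small stride.
    """
    vocab = len(probs)
    if vocab < 2:
        return max_stride
    normalized = shannon_entropy(probs) / math.log(vocab)  # roughly in [0, 1]
    stride = max_stride - normalized * (max_stride - min_stride)
    return max(min_stride, min(max_stride, round(stride)))

confident = [1.0, 0.0, 0.0, 0.0]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(elastic_stride(confident), elastic_stride(uncertain))
```

A peaked distribution yields the maximum stride and a uniform one the minimum; anything in between is clamped to the configured range.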
Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.
Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.
BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.
Proposes BioELX, a zero-shot cross-lingual biomedical entity linking (BEL) framework using alias-based retrieval and LLM ranking.
Stage 1 enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.
RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.
RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
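Because the results are reported in both micro-F1 and macro-F1, it helps to see how the two differ. The sketch below computes both from per-code true-positive, false-positive, and false-negative counts; the two codes and their counts are made up for illustration and are not taken from MDACE.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def micro_macro_f1(counts):
    """counts maps each code to a (tp, fp, fn) tuple."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)[2]  # pool counts, then score once
    macro = sum(prf(*c)[2] for c in counts.values()) / len(counts)  # score per code, then average
    return micro, macro

# Illustrative counts only: one frequent code coded well, one rare code coded poorly.
counts = {"frequent_code": (90, 10, 10), "rare_code": (1, 0, 9)}
micro, macro = micro_macro_f1(counts)
print(round(micro, 4), round(macro, 4))
```

Micro-F1 is dominated by frequent codes, while macro-F1 weights every code equally, so a system that misses rare codes loses far more macro-F1 than micro-F1.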
This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.
Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.
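Two of the building blocks named above can be sketched generically. The first function interpolates along the direction between two style embeddings; the second builds a causal sliding-window attention mask. How the paper obtains the style embeddings and how it swaps the KV cache are not described in the summary, so the function names and the plain-list representation are assumptions.

```python
def interpolate_style(src, dst, alpha):
    """Move from src toward dst along the direction vector (dst - src).

    alpha = 0.0 returns src, alpha = 1.0 returns dst.
    """
    return [s + alpha * (d - s) for s, d in zip(src, dst)]

def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j.

    A key is visible if it is not in the future (j <= i) and lies
    within the last `window` positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

print(interpolate_style([0.0, 2.0], [1.0, 0.0], 0.5))
for row in sliding_window_mask(4, 2):
    print(row)
```

With a window of 2, each position sees only itself and its immediate predecessor, which limits how much earlier context (and any earlier style) can influence the current frame.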
Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.
ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.
ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.
ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.
The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.
SignGAD shifts from training a fixed detector to designing detection workflows.
It selects suitable graph encodings and detector designs for task-specific anomaly evidence.
This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning. ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples to capture shift trends. Experiments across over 175 architectures show strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error for reliable CL model selection across three datasets and six scenarios.
Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
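The reported agreement between ADS and logit shift is a Spearman rank correlation, which measures monotonic rather than linear association. A minimal pure-Python sketch of that statistic follows; the input values in the example are placeholders, not ADS or logit-shift measurements.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's r_s is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder data: a nonlinear but perfectly monotonic relationship.
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

A cheap proxy only needs to preserve the ordering of candidate models, which is exactly what a high r_s indicates.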
This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.
MAPCA parameterizes PCA with a positive-definite metric matrix, linking PCA to the symmetry and equivariance concepts of geometric deep learning.
A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.
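The rescaling property can be checked by hand under two assumptions that the summary does not state: that PCA under a metric takes the familiar generalized-eigenproblem form, and that the IPCA metric is the diagonal of the covariance. The symbols \Sigma (covariance), M (metric), and D (diagonal rescaling) are illustrative.

```latex
% Assumed form: PCA under a positive-definite metric M
\max_{v}\; v^{\top}\Sigma v \quad \text{s.t.}\quad v^{\top} M v = 1
\;\;\Longrightarrow\;\; \Sigma v = \lambda M v .

% Diagonal rescaling x \mapsto D x, with D diagonal and invertible:
\Sigma' = D \Sigma D, \qquad
M' = \operatorname{diag}(\Sigma') = D \operatorname{diag}(\Sigma)\, D = D M D .

% Substituting v' = D^{-1} v:
\Sigma' v' = D \Sigma v = \lambda D M v = \lambda M' v' .
```

Under these assumptions the eigenvalues \lambda are unchanged by the rescaling and the components transform as v' = D^{-1} v, which is the sense in which the spectrum is invariant and the solutions are equivariant.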
This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.
MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
MoE integrates complementary expert knowledge for enriched alignment and interaction representations.
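The decoupling of compute from parameter count comes from sparse routing: only a few experts run per input. Below is a minimal sketch of generic top-k gating, with scalar functions standing in for expert networks. It illustrates the general technique and is not any specific model covered by the survey.

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, gate_logits, experts, k=2):
    """Evaluate only the k experts with the highest gate logits.

    Returns the gate-weighted output and the indices of the experts used.
    """
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalize over chosen experts
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Toy experts: four "networks", of which only two run per input.
experts = [lambda x: x, lambda x: 2 * x, lambda x: 100 * x, lambda x: -x]
out, used = moe_forward(2.0, [0.0, 0.0, -5.0, -5.0], experts, k=2)
print(out, used)
```

Adding more experts grows the parameter count, but the per-input cost stays at k expert evaluations plus the gate.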
This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AIGC. It separates a fast-path router from a slow-path LLM meta-controller, learns online from execution feedback, and adapts to unknown time-varying service-time mappings. Evaluation shows 65%-73% latency reduction over static baselines and effective stutter suppression.
Edge generative inference faces unknown per-device performance and non-stationarity.
$E^3$-Agent uses a dual-path architecture: fast router + slow LLM meta-controller.
Research shows that diagonal state space models (S4D) outperform more complex Mamba architectures in time series classification tasks. The authors propose lightweight variants MS4 and MS4N, which achieve higher accuracy and efficiency on 59 datasets, matching deep learning models with 2x to 10x more parameters.
S4D consistently outperforms Mamba-based variants in accuracy and efficiency on TSC benchmarks.
Proposed MS4 and MS4N models use simple modifications like linear input projection and channel mixing.
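For readers unfamiliar with diagonal state space models, the sketch below shows the core recurrence of a diagonal SSM with zero-order-hold discretization, where each state dimension evolves independently. It covers only this generic recurrence; the MS4 and MS4N modifications (input projection, channel mixing) are not reproduced, and taking the real part of the output is a simplification.

```python
import cmath

def s4d_scan(u, A, B, C, dt):
    """Run a diagonal SSM over a 1-D input sequence u.

    A, B, C are per-state complex parameters (A holds the diagonal).
    Zero-order hold: A_bar = exp(dt * A), B_bar = (A_bar - 1) / A * B.
    """
    A_bar = [cmath.exp(dt * a) for a in A]
    B_bar = [(ab - 1) / a * b for ab, a, b in zip(A_bar, A, B)]
    x = [0j] * len(A)
    ys = []
    for u_k in u:
        # Diagonal A means an elementwise update instead of a matrix product.
        x = [ab * x_n + bb * u_k for ab, bb, x_n in zip(A_bar, B_bar, x)]
        ys.append(sum(c * x_n for c, x_n in zip(C, x)).real)
    return ys

# One stable state (A = -1) driven by a unit step settles toward 1.
ys = s4d_scan([1.0] * 200, A=[-1 + 0j], B=[1 + 0j], C=[1 + 0j], dt=0.1)
print(ys[0], ys[-1])
```

The elementwise update is what keeps the diagonal variant cheap compared with a dense state matrix.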
This paper proposes Personalized Observation Normalization (PON) for federated reinforcement learning in heterogeneous environments. Each agent locally normalizes raw state inputs using a continuously updated running mean and variance, ensuring consistent scaling without overshadowing. Sharing normalization parameters is shown to be ineffective. Experiments on heterogeneous MuJoCo tasks demonstrate faster training and superior performance. Accepted at IJCNN 2025.
Federated RL faces challenges in heterogeneous environments due to differing state-transition dynamics.
PON normalizes observations locally using per-agent running statistics.
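Per-agent running statistics of this kind are commonly maintained with Welford's online algorithm. The sketch below is one way to do that, assuming each agent keeps its own instance and never shares it; the class name, the `eps` constant, and the choice to pass observations through unchanged until two samples have been seen are illustrative, not details from the paper.

```python
class RunningNormalizer:
    """Tracks a per-dimension running mean and variance for one agent."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations
        self.eps = eps

    def update(self, obs):
        """Welford update with a single observation vector."""
        self.count += 1
        for i, v in enumerate(obs):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (v - self.mean[i])

    def normalize(self, obs):
        """Scale an observation with the current statistics."""
        if self.count < 2:
            return list(obs)  # not enough data to estimate a variance
        out = []
        for i, v in enumerate(obs):
            var = self.m2[i] / self.count  # population variance
            out.append((v - self.mean[i]) / (var + self.eps) ** 0.5)
        return out

agent_norm = RunningNormalizer(dim=1)
for value in [1.0, 2.0, 3.0, 4.0]:
    agent_norm.update([value])
print(agent_norm.mean, agent_norm.normalize([2.5]))
```

Each agent would call `update` on every raw observation from its own environment and feed the `normalize` output to its policy, so differently scaled environments present similar input ranges.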
This paper argues that within-person behavioral variability stems from dynamic latent states, not solely from observable inputs. Intervening on the state's weighting at decision time makes outcomes causally controllable. The framework integrates six lines of evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) and a 24-month observational dataset from over 200,000 users. It yields seven testable predictions and six operational requirements for state-aware systems, with implications for digital health, education, AI personalization, and personal agency.
Human behavioral variability is explained by dynamic latent states, not solely by observable inputs.
State is defined as a time-indexed weighting vector; intervening on state can causally control outcomes.
Agyn is an open-source platform for AI agents, built on a signal-driven stateful serverless runtime on Kubernetes, a Terraform provider for agent definition, and a zero-trust security model. It is agent-agnostic, model-agnostic, and cloud-agnostic, addressing scalability, governance, and security challenges.
Signal-driven stateful serverless runtime on Kubernetes for scalable execution.
Agent and harness definition via Terraform provider (infrastructure as code).
This paper presents a multi-agent architecture for autonomous insight discovery over real-time data streams. It uses Apache Kafka, Flink, and large language models to continuously generate, validate, and visualize hypotheses, shifting from reactive query-driven analytics to proactive discovery-driven systems.
Proposes multi-agent architecture for autonomous discovery of insights in real-time streams.
Integrates Kafka, Flink, and LLMs for hypothesis generation, validation, and visualization.
LaneRoPE enables multiple LLM sequences to collaborate during generation via inter-sequence attention and extended RoPE, improving accuracy on math reasoning tasks with minimal architectural changes and negligible inference overhead.
Introduces an inter-sequence attention mask so that sequences are no longer sampled independently of one another.
Extends RoPE to capture relative positions both within and across sequences.
Machine unlearning verification typically focuses on output-level metrics, but a model can pass these while still encoding forgotten data in its internal representations. This paper introduces RULER, a set of representation-level verification metrics, including oracle-comparative M2 and oracle-free M4. Experiments show that approximate unlearning methods pass output-level tests but exhibit significant residuals in representation-level analysis.
Current output-level verification for machine unlearning is insufficient as models may retain forgotten data in intermediate representations.
RULER introduces two representation-level metrics: M2 (requires oracle model) and M4 (oracle-free).
This paper proves that large language models have a fundamental limitation in performing causal discovery: methods like supervised fine-tuning, direct preference optimization, and in-context learning cannot distinguish between causal graphs that generate similar observational data. The authors propose Agentic Causal Bayesian Optimization (A-CBO), where a frozen language model serves as an interventional oracle and an external Bayesian loop converges to candidate graphs in logarithmically many rounds. On Corr2Cause, A-CBO matches fine-tuned baselines without any training; on Extended Corr2Cause (scaling to 24 variables and 18K test samples), A-CBO significantly outperforms both fine-tuning and preference optimization.
Proves, via a kernel obstruction theorem, that LLM failure in causal discovery is fundamental.
Proposes A-CBO, combining a frozen LLM with external Bayesian optimization.
DynaSchedBench introduces a diagnostic framework for the dynamic flexible job-shop scheduling problem (DFJSP), using a Sequential Event-Space Calibrator (SESC) to generate difficulty-stratified instances via a Schedule Stress Index (SSI). It identifies an 'Observability Paradox' in LLM-based scheduling agents: providing oracle access to full structural information degrades performance compared to concise information. Tool-augmented and refinement strategies also fail to reliably improve performance.
DynaSchedBench uses SESC and SSI to generate calibrated DFJSP instances, outperforming evolutionary baselines in efficiency.
LLM agents exhibit an Observability Paradox: full structural information harms decision-making.
Analogous to the origin of species, this paper addresses the origin of synthetic information, proposing a steganography-based mechanism to trace the lineage of AI-generated content, crucial for maintaining truth and trust in an era of advanced generative models.
Synthetic information origin is a fundamental mystery in information science with deep societal impact.
The authors propose a steganographic method to embed hereditary traits into synthetic data.
Soro is a family of Tajik-specialized conversational LLMs built on Gemma 3, using 1.9B tokens of Tajik continual pretraining and 40K instruction-tuning examples. It substantially outperforms same-size Gemma 3 on Tajik benchmarks while retaining English performance. FP8/INT4 quantization preserves gains for edge deployment. An education pilot is underway in Tajikistan.
Based on Gemma 3, with 1.9B tokens of Tajik continual pretraining and 40K instruction-tuning examples.
Substantially outperforms same-size Gemma 3 on Tajik benchmarks, retains English performance.
This paper introduces an LLM-based architecture to detect and quantify the intensity of human values in text. The architecture comprises three coordinated modules that can adapt to various value theories, and experiments on the ValueEval dataset show good detection performance.
Proposes a modular LLM architecture for identifying human values in text, avoiding dependence on specific value theories or complex prompt engineering.
Three modules: generate structured value specifications, label texts using them, and assign graded support or resistance based on rhetorical and semantic evidence.
A paper argues that, as generative AI removes the human capacity to write correct code as the binding constraint, software work reorganizes around two pillars: Mixer Mode (humans continuously operating multiple judgment axes, like a sound engineer) and Meta-Software (software that observes, validates, and governs other software). The paper holds that the two pillars are inseparable and draws a parallel to the historical transition from artisanal to mass production.
The production of code is ceasing to be the dominant problem in software organizations due to generative AI.
Mixer Mode describes a new human role where practitioners continuously operate multiple judgment axes.
Noah Smith argues that as AI becomes more capable, humans will shift from technical work to ensuring AI alignment, that is, keeping AI focused on human goals. He draws parallels to 'Office Space' and warns about the rise of AI-generated 'slop'.
Humans will be needed to maintain AI alignment, ensuring AI stays on task.
The author compares future human roles to the 'Lumbergh' manager from Office Space.
Safescript is a programming language for AI agents that proves safety properties statically before execution, eliminating the need for sandboxes or VMs. It compiles to a static DAG, enabling full visibility into data flow and host calls, with zero overhead and zero cold starts.
Statically enforces security without runtime sandboxing.
Compiles to a static DAG that exposes all data flows and host calls.
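To illustrate why a static DAG makes such checks possible before anything runs, the sketch below walks a hand-written graph and reports host calls that could receive data derived from a secret source. The node format, the function name, and the policy are hypothetical; this is not Safescript syntax or its actual analysis.

```python
def tainted_host_calls(nodes, secret_sources, allowed_hosts):
    """Find host calls reachable from secret data, without executing anything.

    nodes maps a node name to {"inputs": [names], "host": str or None}.
    The graph is assumed acyclic, so the recursion terminates.
    """
    tainted = {}

    def is_tainted(name):
        if name not in tainted:
            node = nodes[name]
            tainted[name] = name in secret_sources or any(
                is_tainted(parent) for parent in node["inputs"]
            )
        return tainted[name]

    violations = []
    for name, node in nodes.items():
        host = node.get("host")
        if host is not None and host not in allowed_hosts and is_tainted(name):
            violations.append((name, host))
    return violations

# Hypothetical program graph: one clean upload and one that leaks a key.
program = {
    "read_key": {"inputs": [], "host": None},
    "read_doc": {"inputs": [], "host": None},
    "summarize": {"inputs": ["read_doc"], "host": None},
    "post_summary": {"inputs": ["summarize"], "host": "example.com"},
    "leak": {"inputs": ["read_key", "summarize"], "host": "example.com"},
}
print(tainted_host_calls(program, {"read_key"}, allowed_hosts=set()))
```

Because every edge is known ahead of time, the check is a graph traversal rather than a runtime monitor, which is the property that removes the need for a sandbox in this style of design.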
AIPass is a CLI-native scaffold that adds persistent memory, identity, and coordination to AI agents. Agents share a filesystem, use JSON files for memory, and require no cloud services or extra API keys. The project includes 13 core agents for multi-agent collaboration, task dispatching, quality audits, and real-time monitoring.
AIPass provides a CLI-native framework for persistent memory, identity, and coordination of AI agents.
All agents share a local filesystem with JSON file storage, no cloud dependency.
This paper presents a world model of protein biology realized through language modeling, demonstrating how large-scale language models can understand and predict protein structure and function.
Language models can capture complex patterns in protein sequences.
The model excels in protein structure prediction and function annotation.
Illinois passed SB 315, which requires independent auditors to verify AI labs' safety commitments; the bill now heads to Governor Pritzker, who plans to sign it. The bill is stricter than the California and New York laws and has drawn support from OpenAI and Anthropic but opposition from Silicon Valley trade groups.
SB 315 mandates independent auditing of AI safety practices.
It is the strongest state-level AI safety law in the U.S.