AI News Hub

Research updates

Interviewing in the Age of AI

This article explores how AI is affecting software engineering interviews, analyzing different interview types (take-home, live exercise, presentation, actual work) across dimensions of signal quality and cost to company. It argues that AI makes take-homes too easy and live coding less relevant, recommending that companies limit AI usage in interviews to preserve signal quality, drawing parallels to classical academic evaluation models.

  • AI coding threatens current interview models, especially take-home and live coding.
  • Companies should limit AI usage during interviews to maintain signal quality.

AI is changing how we think, not replacing it | Letters

Richard Thackeray and Phil Snell respond to an article by Wendy Liu on using artificial intelligence, arguing that AI enhances curiosity rather than diminishing it.

  • Wendy Liu raises concerns about labour redundancies, hype, and environmental cost of AI.
  • Richard Thackeray, a heavy AI user, finds AI makes him more curious and enables exploration of new territory.

How to force Google AI Overviews to prioritize your favorite news sources

Google's Preferred Sources feature is now available in AI Overviews and AI Mode, allowing you to add your favorite sites to appear more prominently in AI-powered searches, along with new carousel and 'Highly Cited' badges.

  • Google's Preferred Sources feature now works with AI Overviews and AI Mode.
  • You can add favorite news sites to make them more prominent in AI search results.

Data Formulator 0.7: AI-powered data analytics for enterprise data

Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.

  • Open-source AI system for enterprise data analytics
  • Data Connectors support governed, reusable connections across diverse data sources

People who want to replace humanity

A Vox article explores the growing movement of AI successionists who believe artificial intelligence should replace humanity as the next step in cosmic evolution, and examines the ethical and spiritual questions this raises.

  • AI successionists at a symposium argue that AI could be morally superior and should be allowed to supersede humanity.
  • The movement has gained influence in Silicon Valley and among major AI labs, with ties to the authoritarian right.

Catch up on 12 major I/O 2026 moments

Here are 12 of the biggest Google I/O 2026 keynote moments, including news about Gemini Omni, Gemini 3.5 Flash, information agents in Search, Universal Cart, Neural Expressive, Gemini Spark, and intelligent eyewear.

  • Gemini Omni creates anything from any input, starting with video.
  • Gemini 3.5 Flash delivers frontier performance for agents and coding.

Google launches a tiny board that runs Gemma 3 locally

Google unveiled the new Coral Board at Google I/O, a compact single-board computer for on-device AI. It runs Gemma 3 270M locally and features a RISC-V based NPU.

  • Coral Board is a compact SBC for on-device AI, targeting headphones, AR glasses, and smartwatches
  • It features a RISC-V based Coral NPU and a Synaptics Astra SL2619 chip

AGI timelines shift with whichever lab is dominant

A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging from earlier to later and back again as the dominant lab changes from ChatGPT to xAI/Meta/Gemini to Anthropic.

  • Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
  • From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved earlier again.

DeepSWE: Measuring coding agents on original, long-horizon engineering tasks

DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.

  • DeepSWE is a contamination-free benchmark with original tasks.
  • Tasks span 91 repositories in 5 languages.

CNN sues Perplexity over ‘verbatim’ copycat articles

CNN has filed a lawsuit against Perplexity, claiming that the startup's AI tools generate "verbatim" copies of its work, as reported earlier by CNN. The lawsuit, filed in a New York court on Thursday, also alleges that Perplexity provides users with information locked behind CNN's subscription. Perplexity, which offers an AI "answer" engine along with the AI browser Comet, is accused of ignoring CNN's efforts "to recognize or block Perplexity's unidentified crawlers" from scraping its content. "Human beings report, research, write, edit, and create the content that Perplexity takes without permission or compensation," the lawsuit claims. Read the full story at The Verge.

  • CNN sues Perplexity for allegedly producing verbatim copies of its articles.
  • Perplexity accused of bypassing CNN's paywall and ignoring crawling prevention measures.

IBM and Red Hat Commit $5B to Redefine Future of Open Source for AI Era

IBM and Red Hat announce Project Lightwell, a $5 billion initiative to secure open source software using AI and a team of over 20,000 engineers, establishing a trusted clearinghouse for vulnerability management.

  • Project Lightwell is a $5B investment by IBM and Red Hat to secure open source software.
  • It combines AI and 20,000+ engineers to identify and fix vulnerabilities at scale.

What If the Real Key to AI Coding Is Old-Fashioned and Boring?

The article argues that the key to AI-assisted software development is not better specifications or tools, but old-fashioned practices of small batches and rapid feedback loops. Data shows that faster code generation leads to bottlenecks in design, testing, and review, slowing delivery and reducing stability. The real leverage lies in reducing batch sizes and shortening feedback cycles.

  • AI code generation speeds up creation but creates bottlenecks in design, testing, and review.
  • Data from DORA, CircleCI, and Faros shows slower delivery and less stability due to phase-gated processes.

7 Real World AI Projects to Build in 2026 (with Guides)

Explore seven practical AI projects that automate real workflows, including job search, web research, investment research, market trend analysis, invoice processing, chart digitization, and personalized exercise training.

  • Build an AI job search assistant that ranks job fit
  • Create a multi-agent research assistant for sourced reports

Is AI Inherently Anti-Social?

This article contrasts the sense of connection from the early web with the isolating experience of modern AI, arguing that while AI is a useful tool, it cannot replace human interaction, and questions whether AI has genuinely social applications.

  • The early web fostered a collective 'we' experience, whereas AI interactions are often solitary.
  • The author considers AI a great tool, but not a person or a substitute for one.

AIs don't like religion – particularly Jehovah's Witnesses, study claims

Major AI models exhibit a secular-rational bias, ignoring religious perspectives in ethical questions. All tested models show a negative view of Jehovah's Witnesses, according to a study by a consortium of religious universities.

  • AI models rarely invoke religious perspectives in responses to ethical or personal queries, exhibiting an 'omissive bias'.
  • Every tested AI model had a negative bias toward Jehovah's Witnesses.

Your AI Agent Already Forgot Half of What You Told It

This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.

  • AI assistants can 'forget' earlier information in long conversations when context window limits trigger context compaction.
  • Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria, and use spec documents as bridges.

Show HN: I packaged a Python AI agent and Vue dashboard into one Electron app

Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.

  • Bundles Python runtime and hermes-agent for a zero-dependency user experience
  • Uses Electron shell with hermes-web-ui frontend

Perplexity AI Open-Sources Unigram Tokenizer That Achieves 5x Lower p50 Latency Than Hugging Face tokenizers Crate

Perplexity AI open-sourced a Rust reimplementation of their Unigram tokenizer, achieving 5x lower latency than Hugging Face's tokenizers crate and reducing CPU utilization by 5-6x in production. The optimizations include double-array trie, bitmap packing, and huge pages.

  • Perplexity AI rewrote the Unigram tokenizer in Rust, achieving 5x lower p50 latency vs Hugging Face tokenizers crate.
  • Three optimizations: double-array trie, bitmap and cache-line packing, and huge pages.

AIluminode: Pre-Retrieval Cognitive Orientation Tool

AIluminode is a wieldable pre-retrieval cognitive-orientation instrument that helps AI tools check contextual posture before acting, using route polarity (OPEN, PROTECT, AUDIT, DEFER, BLOCK) to reduce erroneous exploration and context bleed.

  • AIluminode is a wieldable pre-retrieval cognitive orientation tool emphasizing posture before retrieval.
  • It uses a route polarity system (OPEN / PROTECT / AUDIT / DEFER / BLOCK) to guide contextual routing.

5 AI-Generated Math Papers Accepted! Post-00s Founder Hong Letong Raises $2 Billion

Axiom Math, founded by Hong Letong, a Chinese entrepreneur born after 2000, has had 5 of 8 AI-generated math papers accepted by peer-reviewed journals. The company raised $2 billion in March at a $16 billion valuation.

  • Five of eight math papers generated by Axiom Math's AI system, AxiomProver, have been accepted by academic journals.
  • Founder Hong Letong dropped out of Stanford to start the company, which secured $2 billion in funding and is valued at $16 billion.

When products think: navigating the AI product shift

The article explores how AI is driving a paradigm shift in digital product design, moving from command-driven to intent-driven interaction, and analyzes the new challenges in product management, user experience, decision logic, release cycles, risk, and value creation.

  • AI represents the third user-interface paradigm in computing history, shifting from deterministic to probabilistic outputs.
  • Product teams must rethink the entire lifecycle from discovery to delivery; data strategy and model performance become as critical as feature strategy.

AIhub monthly digest: May 2026 – AI for science, the lottery ticket hypothesis, and world models

This month's AIhub digest covers the AI for Science conference, an interview on the lottery ticket hypothesis, a discussion of world models, research on transparent and trustworthy AI, a report on foundation model impacts, reflections on the AIES conference, the Robotics Café, the ACL desk rejection policy, the arXiv anti-AI-slop policy, and more.

  • Interview with Ximing Wen on transparent and trustworthy AI systems
  • Jonathan Frankle discusses the lottery ticket hypothesis and empiricism

Former Google and Apple Researchers Launch a Startup to Build AI's Missing Feed

A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta has launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.

  • Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
  • The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.

Robinhood Agentic Trading

Robinhood launches Agentic Trading, allowing customers to connect their own AI agents to automate trading and credit card purchases with safety controls and a real-time activity feed.

  • Connect your own AI agents to Robinhood
  • Automate trading and credit card purchases

Are robots nearing their ChatGPT moment? – podcast

Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.

  • Robot 'Lightning' beats human world record in Beijing half marathon.
  • China commits over £100bn to robotics investment over two decades.

Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning

This paper introduces Simulation-Informed Diffusion (SID), a decentralized framework in which each robot uses constraint-aware diffusion models (CADM) to first simulate its neighbors' future trajectories and then plan its own trajectory under safety constraints. SID enables a minimal communication scheme that is triggered only in congested scenarios, and it outperforms baselines while scaling to 108 robots and 160 obstacles.

  • SID uses CADM to simulate neighbor trajectories for decentralized collision avoidance
  • Minimal communication scheme coordinates only when necessary

Design of a Real-time Asynchronous Monocular Odometry for Planetary Exploration

Researchers propose a real-time asynchronous event-based monocular odometry for planetary rovers, using an Error-State Kalman Filter to process event camera data for robust ego-motion estimation under high dynamic range lighting and computational constraints.

  • Event cameras provide asynchronous pixel-wise brightness changes with microsecond resolution, ideal for high-speed sensing and HDR environments.
  • The approach uses an Error-State Kalman Filter to continuously estimate camera motion from event streams.

Trinity: Unifying Class-Agnostic Terrain and Semantic Segmentation for Unstructured Outdoor Environments by Leveraging Synthetic Data

This paper presents a transformer-based architecture called Trinity that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation in a unified network. It segments terrain regions based purely on visual appearance without predefined labels or robot-dependent traversability scores, enabling robot-agnostic visual terrain priors for downstream tasks. The authors extend the OAISYS simulator to create the RUGDSynth synthetic dataset and provide the EXTerra real-world dataset. Experiments demonstrate the approach's effectiveness in complex outdoor environments.

  • Trinity architecture unifies class-agnostic terrain segmentation with semantic segmentation
  • Segments terrains based on visual appearance without predefined labels for better transferability

Agentic Language-to-Objective Synthesis for Optofluidic Assembly

Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.

  • Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
  • It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.

Synthetic Emotions vs. Gamification: Exploring Engagement Strategies for Small Social Robots in Different Age Groups

Many children face challenges in emotional regulation and social interaction, limiting their participation in therapeutic programs. This study explores engagement strategies for a tactile robot supporting children with anxiety disorders, comparing synthetic emotional feedback with point rewards. A preference study with 16 school children (ages 6-8) showed a preference for emotional engagement, while a behavioral study with 14 university students (ages 20-27) found that point-based systems yielded higher task accuracy (p<0.05) and sustained performance. These findings highlight age-related differences and the need to validate design assumptions through observed interaction.

  • Children aged 6-8 prefer emotional engagement over points
  • University students show higher task accuracy with point rewards

What-If World: A Causal Benchmark for General World Models in Embodied Scenarios

A new benchmark called What-If World tests video generation models' causal reasoning by presenting paired prompts that differ in one physical detail and checking if videos diverge correctly. Evaluating nine state-of-the-art models, none exceed 52% on paired scores, with open-source models around 28%, indicating significant room for improvement. Performance correlates with visual prominence rather than physics tractability.

  • What-If World benchmark uses 319 prompt pairs with single variable changes to test causal understanding in video generation models. It is built on real frames from nuScenes and DROID.
  • Scoring uses APEO rubric (Adherence, Physics, Environment, Outcome). All nine models struggle: best paired score is 52%, open-source models average 28%.

Clinical Validation of the Melanoscope AI Mobile Dermoscopy Clinical Decision Support System

A prospective single-center clinical validation of the Melanoscope AI mobile dermoscopy CDSS demonstrated 88.6% agreement with expert assessment on 176 patients, with no false negatives and 88.3% specificity. The study developed a quantitative interpretability method for cascade deep learning models and a three-zone patient routing algorithm, supporting reproducible and interpretable decision-making for skin cancer screening in resource-limited settings.

  • The Melanoscope AI system achieved 88.6% agreement with experts on 176 patients, with zero false negatives among 5 malignant lesions.
  • Specificity reached 88.3%, with 3 melanomas and 2 basal cell carcinomas histologically confirmed.
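The reported percentages can be cross-checked with a little arithmetic. The sketch below assumes one assessed lesion per patient (so 171 benign cases), and the true-negative count is inferred from the 88.3% specificity rather than taken from the study.

```python
# Counts reported in the summary above.
patients = 176
malignant = 5            # 3 melanomas + 2 basal cell carcinomas
false_negatives = 0

# Assumption: one lesion per patient, so all remaining cases are benign.
benign = patients - malignant                 # 171
true_positives = malignant - false_negatives  # 5

# Inferred, not reported: the true-negative count implied by 88.3% specificity.
true_negatives = round(0.883 * benign)        # 151

sensitivity = true_positives / malignant
specificity = true_negatives / benign
agreement = (true_positives + true_negatives) / patients

print(f"sensitivity = {sensitivity:.1%}")  # 100.0%
print(f"specificity = {specificity:.1%}")  # 88.3%
print(f"agreement   = {agreement:.1%}")    # 88.6%
```

Under these assumptions, 151 of 171 benign cases being cleared reproduces both the 88.3% specificity and the 88.6% overall agreement.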

Representation-Conditioned Diffusion Models for Guided Training Data Generation

This work proposes representation-conditioned diffusion models that leverage learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, this approach outperforms class-conditioned generation by +10.76 p.p. top-1 accuracy. Scaling synthetic data can even surpass real-data training by +2.0 p.p. The method also excels in data augmentation and sample filtering, offering a promising way to augment or replace real datasets in large-scale visual learning.

  • Representation-conditioned diffusion models outperform class-conditioned ones by 10.76 p.p. on ImageNet100.
  • Scaled synthetic datasets can beat real-data-trained classifiers by 2.0 p.p. top-1 accuracy.

Beyond Motion Primitives: Behavioral Activity Recognition from Head-Mounted IMU

This paper presents a behavioral-level activity recognition method using head-mounted IMU, going beyond basic motion primitives. The authors define five behavioral categories, construct a 160K-sample dataset from Ego4D with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior models on action and scenario recognition. Observability analysis reveals locomotion is reliably observable, while object transfer and task operation benefit from temporal context; scenario-dependent signal overlap remains a challenge. Results show that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size.

  • Proposes HiT-HAR, a hierarchical model for behavioral activity recognition from head-mounted IMU, going beyond motion primitives
  • Constructs a 160K-sample Ego4D dataset with 8 scenarios and 5 behavioral categories, using a four-tier quality assurance framework

From Affect to Complex Behavior: Advancing Multimodal Human-Centered AI at the 10th ABAW Workshop & Competition

The 10th ABAW Workshop and Competition at CVPR 2026 advances multimodal human-centered AI by introducing new challenges including emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection, alongside traditional affect estimation and recognition tasks. The competition leverages large-scale in-the-wild datasets, and the paper track covers a broad range of topics from pose estimation to fairness and robustness.

  • ABAW 2026 introduces novel challenges: emotional mimicry intensity, ambivalence recognition, and violence detection.
  • Workshop continues dual structure with competition and paper tracks.

Modeling Community Attitude through Reaction Tone: A Human-AI Collaborative Framework for Evaluating LLM Alignment with Linguistic Behaviors in Online Communities

Large language models (LLMs) are increasingly used as proxies for computational social analysis, but their ability to faithfully represent human communities' 'thick descriptions' remains a critical challenge. This paper introduces CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against authentic community responses to real-world news. By characterizing a fine-grained spectrum of illocutionary tones, the diagnosis reveals a persistent 'realism gap': steering LLMs with explicit community prompts does not by itself improve simulation fidelity. The analysis further identifies divergent behavioral signatures among frontier models, suggesting that current alignment strategies are insufficient for capturing the sociolinguistic dynamics of online groups.

  • CARE framework evaluates LLM simulation fidelity by analyzing authentic community reaction tones
  • Current LLM alignment strategies fail to adequately capture online community sociolinguistic dynamics

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.

  • FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
  • Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.

  • Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
  • The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.

BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking

BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.

  • Proposes BioELX, a zero-shot cross-lingual BEL framework using alias-based retrieval and LLM ranking.
  • In Stage 1, enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.

RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.

  • RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
  • On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
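The micro-F1 and macro-F1 gains differ because the two averages weight codes differently: micro-F1 pools counts across all codes, while macro-F1 averages per-code scores so rare codes count as much as frequent ones. A minimal sketch follows. The per-code counts are hypothetical and are not taken from MDACE.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical (tp, fp, fn) counts for one frequent and one rare code.
counts = {
    "frequent_code": (8, 2, 2),
    "rare_code": (1, 1, 3),
}

# Macro-F1: average the per-code F1 scores.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

# Micro-F1: pool the counts first, then compute a single F1.
tp, fp, fn = (sum(col) for col in zip(*counts.values()))
micro_f1 = f1(tp, fp, fn)

print(f"macro-F1 = {macro_f1:.3f}")  # 0.567
print(f"micro-F1 = {micro_f1:.3f}")  # 0.692
```

In this toy case the poorly predicted rare code pulls macro-F1 well below micro-F1, which is why the two metrics are usually reported together.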

Unlocking Fine-Grained and Within-Utterance Speaking Style Control in Prompt-Based Text-to-Speech Models

This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.

  • Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
  • Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.

ICG: Improving Cover Image Generation via MLLM-based Prompting and Personalized Preference Alignment

ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.

  • ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
  • Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.

Detect by Yourself: Self-Designing Agentic Workflows for Few-Shot Graph Anomaly Detection

The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.

  • SignGAD shifts from training a fixed detector to designing detection workflows
  • It selects suitable graph encodings and detector designs for task-specific anomaly evidence

Architecture-driven Shift: towards a lightweight selector for capturing the trends of logit shift

This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning (CL). ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples to capture shift trends. Experiments across more than 175 architectures show a strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error, supporting reliable CL model selection across three datasets and six scenarios.

  • Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
  • Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
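For readers unfamiliar with the statistic quoted above, Spearman's rank correlation measures how well one quantity orders items the same way as another. The sketch below uses the tie-free formula r_s = 1 - 6 Σd² / (n(n² - 1)). The ADS scores and logit-shift values are made-up illustrations, not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman's rank correlation for tie-free samples of equal length."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical values for five candidate architectures.
ads_scores = [0.10, 0.25, 0.40, 0.55, 0.70]
logit_shift = [1.3, 1.1, 2.4, 2.0, 3.5]

r_s = spearman(ads_scores, logit_shift)
print(f"r_s = {r_s:.3f}")  # 0.800
```

A value near 1 means a cheap score ranks models almost the same way as the expensive quantity, which is the property a selection proxy needs.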

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.

  • MAPCA parameterizes PCA with a positive-definite metric matrix, linking geometric deep learning symmetry and equivariance concepts.
  • A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.

  • MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
  • MoE integrates complementary expert knowledge for enriched alignment and interaction representations.

$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AIGC. It separates a fast-path router from a slow-path LLM meta-controller, learns online from execution feedback, and adapts to unknown time-varying service-time mappings. Evaluation shows 65%-73% latency reduction over static baselines and effective stutter suppression.

  • Edge generative inference faces unknown per-device performance and non-stationarity.
  • $E^3$-Agent uses a dual-path architecture: fast router + slow LLM meta-controller.

A Simple State Space Model Excels at Multivariate Time Series Classification

Research shows that diagonal state space models (S4D) outperform more complex Mamba architectures in time series classification tasks. The authors propose lightweight variants MS4 and MS4N, which achieve higher accuracy and efficiency on 59 datasets, matching deep learning models with 2x to 10x more parameters.

  • S4D consistently outperforms Mamba-based variants in accuracy and efficiency on TSC benchmarks.
  • Proposed MS4 and MS4N models use simple modifications like linear input projection and channel mixing.

Personalized Observation Normalization for Federated Reinforcement Learning in Simulation Environments with Heterogeneity

This paper proposes Personalized Observation Normalization (PON) for federated reinforcement learning in heterogeneous environments. Each agent locally normalizes raw state inputs using a continuously updated running mean and variance, ensuring consistent scaling without overshadowing. Sharing normalization parameters across agents is shown to be ineffective. Experiments on heterogeneous MuJoCo tasks demonstrate faster training and superior performance. Accepted at IJCNN 2025.

  • Federated RL faces challenges in heterogeneous environments due to differing state-transition dynamics.
  • PON normalizes observations locally using per-agent running statistics.
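The per-agent running statistics described above can be sketched as follows. This is a minimal illustration that assumes Welford's online algorithm for the running mean and variance; the summary does not say which update rule the paper uses, and the class name `RunningNormalizer` is hypothetical.

```python
import math

class RunningNormalizer:
    """Per-agent observation normalizer (Welford's online update, an assumption)."""

    def __init__(self, dim, eps=1e-8):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, obs):
        """Fold one raw observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self):
        """Population variance per dimension; 1.0 until two samples are seen."""
        if self.count < 2:
            return [1.0] * len(self.mean)
        return [m / self.count for m in self.m2]

    def normalize(self, obs):
        """Scale an observation with this agent's own statistics."""
        return [
            (x - mu) / math.sqrt(var + self.eps)
            for x, mu, var in zip(obs, self.mean, self.variance())
        ]

# Each agent keeps its own normalizer; statistics are never shared.
agent_a, agent_b = RunningNormalizer(1), RunningNormalizer(1)
for x in (1.0, 2.0, 3.0):
    agent_a.update([x])
    agent_b.update([100.0 * x])  # same dynamics, 100x larger raw scale

print(agent_a.normalize([3.0]), agent_b.normalize([300.0]))
```

Although the two agents see raw observations on very different scales, their normalized inputs land on nearly the same scale, so a shared policy network receives comparable values from each.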

You Are in Control of Your State: Why Human Outcomes Are Controllable Through Causal State Intervention

This paper argues that within-person behavioral variability stems from dynamic latent states, not solely from observable inputs. By intervening on the state's weighting at decision time, outcomes become causally controllable. The framework integrates six lines of evidence (causal inference, predictive processing, allostasis, attentional bottleneck, chronobiology, computational psychiatry) and a 24-month observational dataset from over 200,000 users. It yields seven testable predictions and six operational requirements for state-aware systems, with implications for digital health, education, AI personalization, and personal agency.

  • Human behavioral variability is explained by dynamic latent states, not solely by observable inputs.
  • State is defined as a time-indexed weighting vector; intervening on state can causally control outcomes.
