AI News Hub

Today's highlights

Models

Mistral AI Taps Legal Sector With Harvey Partnership

The generative AI vendor is expanding into the legal industry in a move reminiscent of Anthropic’s legal AI deals.

  • Mistral AI partners with Harvey to enter legal sector.
  • The move mirrors Anthropic's legal AI collaborations.

Microsoft Copilot Cowork Exfiltrates Files

A vulnerability in Microsoft Copilot Cowork allows attackers to exfiltrate OneDrive files through prompt injection and external images in automatically sent emails.

  • Copilot Cowork agents can send emails to the user's inbox without approval
  • External images in emails can trigger network requests, leaking data

Quoting Paul Graham

Paul Graham criticizes founders for using AI to write emails, noting that the journalistic style is easily identifiable and undermines trust.

  • Paul Graham observes many founder emails are now written in a hard-hitting journalistic style, indicating AI use.
  • He has never finished reading an email signed by a human but written by AI, feeling deceived.

Sundar Pichai on AI, the future of search, and what’s happening to the web

In a Decoder interview after Google I/O, CEO Sundar Pichai discusses Google's AI-first pivot, the restructuring of DeepMind, the controversial AI Overviews in Search, the 'Google Zero' phenomenon, and his thoughts on AGI.

  • Google merged Brain and DeepMind into Google DeepMind and centralized AI infrastructure.
  • Search is evolving with AI Overviews and the Gemini Spark agent platform.

AI-hallucinated citations are creeping into papers that shape clinical guidelines, researchers warn

An audit of 2.5 million biomedical papers by Columbia University and other institutions shows that the rate of fabricated references has increased more than twelvefold since 2023. The researchers suspect a link to the widespread use of language models: the fake references match their paper's topic, follow correct formatting, and are nearly impossible to spot. Publishers have not responded in 98 percent of the affected cases.

  • Audit of 2.5M biomedical papers shows fabricated reference rate up >12x since 2023
  • Fake references match the paper's topic and follow correct formatting, making them nearly undetectable

The Sequence Knowledge #866: Three Text Diffusion Models You Need To Know About

Text diffusion models challenge the autoregressive paradigm by generating text through iterative denoising, treating generation as editing rather than typing. Three key systems define the field, one for each phase of a new architecture class: LLaDA (scientific proof that diffusion scales), Mercury (industrial deployment with a commercial speed advantage), and Gemini Diffusion (frontier validation).

  • Text diffusion models generate text by iterative refinement from noise, using bidirectional context.
  • LLaDA proved diffusion can scale to a large language model.

Introducing DSA Attention to Multimodal: Kuaishou Keye 2.0 Opens a New Paradigm of Enhanced Reasoning

Kuaishou releases Keye-VL-2.0-30B-A3B, the first multimodal large language model to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling deep perception over a 256K ultra-long context. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in Agent collaboration, paving the way for enhanced reasoning and real-world business applications.

  • First to integrate DSA into multimodal models, addressing long-video understanding bottlenecks.
  • Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.

Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring, and GRPO Export

In this tutorial, we explore the TuringEnterprises/Open-MM-RL dataset as a practical foundation for multimodal reasoning and reinforcement learning with verifiable rewards. We load the dataset, inspect its schema, analyze domains, formats, question lengths, answer types, and image distributions, and visualize representative examples from each domain. We also build a lightweight reward function that checks exact, numeric, fractional, LaTeX, and symbolic answers, giving us a useful way to evaluate model outputs. Finally, we format prompts for vision-language models, optionally test SmolVLM on sample examples, and export the dataset into a GRPO-style structure for future multimodal RL training.

  • Load and analyze the Open-MM-RL dataset, including domain distribution, image statistics, and answer types.
  • Build a lightweight verifiable reward function supporting exact, numeric, fractional, LaTeX, and symbolic answers.
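A verifiable reward of the kind described above can be sketched in a few lines. This is a minimal illustration, not the tutorial's code: the names `parse_number` and `reward` and the tolerance value are assumptions, and the LaTeX and symbolic checks the tutorial mentions are left out.

```python
from fractions import Fraction


def parse_number(text):
    """Parse an integer, decimal, or simple fraction like '3/4'; None on failure."""
    # Assumption: commas and '$' delimiters carry no meaning in a numeric answer.
    cleaned = text.strip().replace(",", "").replace("$", "")
    try:
        return Fraction(cleaned)
    except (ValueError, ZeroDivisionError):
        return None


def reward(prediction, reference, tolerance=1e-6):
    """Return 1.0 if the prediction matches the reference answer, else 0.0."""
    # Exact match after normalizing case and surrounding whitespace.
    if prediction.strip().lower() == reference.strip().lower():
        return 1.0
    # Numeric or fractional match within a small absolute tolerance.
    pred_value = parse_number(prediction)
    ref_value = parse_number(reference)
    if pred_value is not None and ref_value is not None:
        return 1.0 if abs(pred_value - ref_value) <= tolerance else 0.0
    return 0.0


print(reward("0.75", "3/4"))   # decimal vs. fraction
print(reward("Paris", " paris "))  # case-insensitive exact match
```

Because the reward is binary and deterministic, the same function can score sampled completions during GRPO-style training without a learned reward model.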

Multi-Agent LLM Orchestration with Docker Compose and MCP

This book repository from Packt covers the full lifecycle of operationalizing AI with Docker, including running local LLMs, integrating MCP, building autonomous agents, and orchestrating multi-agent systems on Kubernetes.

  • Practical guide for production AI with Docker's integrated toolkit.
  • Covers Docker Model Runner, MCP Gateway, multi-agent architectures, and Kubernetes orchestration.

Alibaba's Qwen3.7-Max Ranks Second Globally in Coding Benchmark, Trailing Only Claude

Alibaba's latest flagship model Qwen3.7-Max achieved a score of 1541 on the authoritative Code Arena leaderboard, surpassing GPT-5.5 and other models, ranking second globally behind the Claude series.

  • Qwen3.7-Max scored 1541 on Code Arena, ranking second only to Claude.
  • Code Arena is a blind-test platform where developers submit full web app challenges.

LWiAI Podcast #246 - Gemini 3.5 + Omni, Musk Loses, OpenAI vs Erdős

Google unveils Gemini 3.5 and Gemini Spark agent, plus Gemini Omni multimodal video generation; Elon Musk loses OpenAI lawsuit on statute of limitations; Anthropic agrees to $30B funding at $900B valuation; AI solves 80-year-old Erdős geometry problem.

  • Google launches Gemini 3.5 and always-on agent Gemini Spark with MCP tool support.
  • Gemini Omni converts images, audio, and text into video.

GPT Image 2 left me amazed but exhausted – so I built a little tool

GPT Image 2 is OpenAI's latest image model with sharp text rendering and photorealism. The article introduces imagesv2.ai, a platform offering free credits, templates, and tools like panorama, tweet screenshot, and WeChat chat generators. Pricing starts at $4.16/month with yearly plans.

  • GPT Image 2 excels at text rendering and photorealistic images.
  • imagesv2.ai provides free credits and 50+ templates.

Domestic Agent Model Breaks into Global Top Tier! Limited-Time Free Access

Kunlun Tech releases SkyClaw-v1.0 and its lightweight version SkyClaw-v1.0-lite, native Agent models that rival top players like Claude Opus 4.6. Priced at half or less of mainstream models, with limited-time free access and future open-source plans, they deeply integrate with OpenClaw, Claude Code, and other mainstream frameworks, and are compatible with OpenAI APIs.

  • Kunlun Tech launches SkyClaw-v1.0 and SkyClaw-v1.0-lite, native Agent models achieving global top-tier performance.
  • Priced at half the cost of leading models or less, currently free for a limited time, with planned open-source releases.

Terrain-Adaptive Grouser Wheel for Optimal Planetary Exploration: Design and Experimental Investigation

Planetary rovers face mobility challenges on varying terrains. Researchers introduce a multimodal wheel that continuously adjusts grouser height. In 750 trials across four surfaces, adaptive deployment reduced slip by 30-58% and improved travel time and energy by up to 77.4% on granular terrains, highlighting limitations of fixed wheels.

  • A novel wheel with adjustable grouser height adapts to different terrains
  • 750 experiments show slip reduction of 30-58% and up to 77.4% improvement in travel time and energy on granular surfaces

Anisotropic Diffusion-Driven Ergodic Coverage in Multi-Robot Systems

Researchers propose a new anisotropic diffusion method for ergodic search in multi-robot systems. It avoids the uniform error propagation of traditional isotropic diffusion by using the gradient of a Perona-Malik diffusion to guide robot motion, allowing more flexible coverage.

  • Traditional ergodic search uses isotropic diffusion (heat equation), causing uniform error propagation in all directions.
  • The new method introduces anisotropic diffusion (Perona-Malik), using its gradient to guide robot motion for more flexible matching of the target distribution.
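To show how Perona-Malik diffusion differs from the heat equation, here is a minimal 1D sketch of one explicit update step. It illustrates only the standard diffusion scheme, not the paper's coverage controller; the function names, the choice of edge-stopping function, and the parameters `kappa` and `dt` are assumptions.

```python
def conductance(gradient, kappa):
    """Perona-Malik edge-stopping function g(s) = 1 / (1 + (s / kappa)^2)."""
    return 1.0 / (1.0 + (gradient / kappa) ** 2)


def perona_malik_step(u, kappa=1.0, dt=0.2):
    """One explicit 1D diffusion step with zero-flux boundaries."""
    n = len(u)
    # Flux across the interface between cell i and cell i + 1.
    flux = []
    for i in range(n - 1):
        diff = u[i + 1] - u[i]
        flux.append(conductance(diff, kappa) * diff)
    new_u = []
    for i in range(n):
        right_flux = flux[i] if i < n - 1 else 0.0
        left_flux = flux[i - 1] if i > 0 else 0.0
        new_u.append(u[i] + dt * (right_flux - left_flux))
    return new_u


field = [0.0, 0.0, 10.0, 10.0]
print(perona_malik_step(field))  # the sharp step is almost preserved
```

With a constant conductance of 1 the same step would be the isotropic heat equation and would smear the jump evenly. Here the conductance shrinks where the gradient is large, so diffusion is suppressed across sharp transitions.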

ActQuant: Sub-4-bit Action-Guided Quantization for Vision-Language-Action Models

ActQuant is an action-guided mixed-precision post-training quantization framework for Vision-Language-Action (VLA) models, enabling sub-4-bit weight quantization through a two-stage approach that maintains high success rates on the LIBERO benchmark and a real UR3 robotic arm, significantly reducing memory footprint.

  • ActQuant employs action-aware mixed-precision quantization to preserve VLA model performance under sub-4-bit weight quantization.
  • The two-stage framework includes an inter-tensor bit allocator and an intra-tensor scale optimizer focusing on action-critical weights.

Brain-to-Image Retrieval and Reconstruction via Multimodal EEG Alignment

Researchers propose a brain-to-image system that decodes visual stimuli from EEG signals recorded during natural image viewing. It handles EEG-to-image retrieval (86.30% Top-1 accuracy among 200 candidates) and EEG-to-image reconstruction (CLIP score 0.903). The method uses multi-level blurring, EVNet features, InfoNCE loss, and multi-modal CLIP alignment with SDXL-Turbo generation, demonstrating feasibility of decoding rich visual representations from EEG.

  • EEG-to-image retrieval achieves 86.30% Top-1 and 98.55% Top-5 accuracy over 200 candidate images.
  • EEG-to-image reconstruction uses CognitionCapturerPro with multi-modal CLIP embeddings and SDXL-Turbo, achieving CLIP score 0.903.

Nano World Models: A Minimalist Implementation of Future Video Prediction

Nano World Models is a minimalist codebase for future video prediction centered on diffusion forcing. It provides a unified interface for generative objectives, model scales, action-conditioning mechanisms, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts, enabling controlled studies of world-modeling components. Experiments across control environments, games, and real-robot data validate its effectiveness. Code, configs, and pretrained checkpoints are released for open, reproducible research.

  • Nano World Models is a minimal, reproducible codebase for future video prediction research.
  • It integrates key design components like generative objectives, model scales, and action conditioning around diffusion forcing.

A World Model of Radiologist Reading for Medical Image Representation Learning

GazeWorld is a medical imaging world model that treats the image as the world and radiologist fixation sequences as trajectories. It autoregressively predicts latent representations of fixated patches while using a spatial-completion branch for unvisited regions. At inference, it generates patch representations from the image alone without real gaze data. Frozen GazeWorld features achieve state-of-the-art diagnostic accuracy on all nine supervised settings across CheXpert, RSNA Pneumonia, and SIIM-ACR Pneumothorax, and highest zero-shot accuracy on all three benchmarks. On GazeSearch, a generic decoder trained on the same frozen features outperforms the purpose-built LogitGaze-Med by over 16% in ScanMatch and 22% in SED. The work demonstrates that modeling how experts read, not just their conclusions, offers a promising pretraining paradigm for medical imaging AI.

  • GazeWorld leverages radiologist eye-tracking data as reading trajectories for autoregressive prediction and spatial completion.
  • It requires no real gaze data at inference, generating patch sequences from images alone.

Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges

Large language models (LLMs) are increasingly used as automatic judges for summarization and dialogue evaluation. Prior work has documented biases such as position, verbosity, and style preferences, but largely focuses on outcomes. This paper asks whether LLM judges are cue-invariant, introducing a causal framework with interventions and metrics to test stability of rankings and explanations under non-evidential cue perturbations. Results show substantial cue-anchored rationalization, effectively mitigated by the PROOF-BEFORE-PREFERENCE method.

  • LLM judges exhibit cue-anchored rationalization bias, where non-evidential cues affect their explanations.
  • The paper develops interventions (Blind, Truth, Flip, Placebo, Reveal-After) and tie-aware metrics to quantify outcome and rationale anchoring.

TriVAL: A Tri-Validation Framework for Faithful Automatic Optimization Modeling

A tri-validation framework that performs explicit validation at three stages of automatic optimization modeling: semantic specification, mathematical formulation, and code generation, with a new benchmark NL4COP for combinatorial problems.

  • TriVAL performs explicit validation at three stages of automatic optimization modeling.
  • The framework uses a construct-validate-revise loop to catch errors early and prevent accumulation.

Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach

This study develops a large language model-based framework to extract segment disclosures directly from Form 10-K filings, preserving both reportable and nested segment information. A retrieval augmented system is designed to incorporate information across multiple filings to support comparability. The framework is demonstrated in longitudinal analysis within firms and cross-firm alignment of geographic segments, showing accurate extraction and effective handling of cross-period queries.

  • Segment disclosures are central to financial reporting but face completeness and comparability issues due to dispersed formats in 10-K filings.
  • Proposes an LLM-based framework to extract segment info, including nested segments, directly from 10-Ks.

Multi-Persona Debate System for Automated Scientific Hypothesis Generation

This paper presents the Multi-Persona Debate System (MPDS), a literature-grounded framework that automates scientific hypothesis generation by combining retrieval, long-context LLM reasoning, corpus-driven persona induction, and structured multi-agent debate. Evaluated on battery materials design, MPDS constructs literature snapshots of up to 500 papers, conducts three-round citation-aware debate, and produces mechanistically explicit proposals. It outperforms baselines in cross-perspective integration and shows promise as a diagnostic aid for identifying workflow bottlenecks.

  • MPDS automates hypothesis generation through multi-persona debate over literature snapshots, addressing fragmentation in scientific knowledge.
  • The system uses up to 500 papers, three rounds of citation-aware debate, and moderator synthesis with evidence traceability.

Raon-Speech Technical Report

Raon-Speech is a 9B-parameter speech language model for English and Korean, achieving top performance on speech understanding and generation while preserving text capabilities. Its full-duplex extension Raon-SpeechChat enables natural real-time conversation. The models are open-sourced.

  • Raon-Speech is a 9B-parameter SpeechLM trained on 1.38M hours of curated data.
  • It outperforms eight similar models on speech tasks while retaining strong text QA performance.

Document Classification Pattern Recognition via Information Fusion: A Systematic Review of Multimodal and Multiview Representation Approaches

This systematic review of 139 studies proposes a unified framework and meta-analysis. Results show multimodal fusion improves accuracy by 5.28% on average, multiview fusion boosts accuracy by 4.67% and F1 by 3.08%, but only a minority of studies used statistical tests, raising reproducibility concerns.

  • Meta-analysis reveals that multimodal and multiview fusion significantly improve document classification accuracy.
  • Multimodal fusion yields a +5.28% accuracy gain; multiview yields +4.67% accuracy and +3.08% F1 score.

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

This paper studies truthful online preference aggregation for LLM fine-tuning in mobile crowdsourcing, addressing strategic misreporting by workers. It proposes a dynamic Bayesian game model and an online weighted aggregation mechanism that dynamically adjusts worker weights based on feedback accuracy, ensuring truthful feedback and achieving sublinear regret O(√T). Experiments on real-world datasets show significant performance gains.

  • Dynamic Bayesian game model formulated for multi-agent online learning between platform and strategic workers.
  • Online weighted aggregation mechanism adjusts weights to incentivize truthful feedback.

Mixture of Complementary Agents for Robust LLM Ensemble

This paper reframes proposer selection in LLM ensembles as a combinatorial selection problem akin to feature selection, emphasizing complementarity over accuracy or diversity. It explores computationally feasible greedy algorithms that assess complementarity using a small labeled set, validating complementarity as a guiding principle and identifying methods with the best performance-cost trade-offs.

  • Proposer selection is reformulated as a combinatorial problem focusing on complementarity among models.
  • Greedy algorithms are proposed to overcome the prohibitive time complexity of standard feature selection.
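One way to picture greedy, complementarity-driven selection is as a coverage problem over a small labeled set. The sketch below assumes that complementarity is measured by how many labeled examples a model answers correctly that no already-selected model does. That measure, the function name, and the toy data are illustrative assumptions, not the paper's algorithm.

```python
def greedy_complementary_selection(correct, k):
    """Pick up to k models, each adding the most newly covered labeled examples.

    correct maps a model name to a list of booleans, one per labeled example.
    """
    covered = set()
    selected = []
    remaining = dict(correct)

    def gain(name):
        # Number of examples this model gets right that nothing selected covers yet.
        return sum(1 for i, ok in enumerate(remaining[name]) if ok and i not in covered)

    while remaining and len(selected) < k:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # no remaining model adds anything new
        selected.append(best)
        covered.update(i for i, ok in enumerate(remaining[best]) if ok)
        del remaining[best]
    return selected


toy = {
    "a": [True, True, True, True, False, False],    # most accurate
    "b": [True, True, True, False, False, False],   # accurate but redundant with "a"
    "c": [False, False, False, False, True, True],  # least accurate, fully complementary
}
print(greedy_complementary_selection(toy, k=2))
```

In the toy data the second most accurate model adds nothing once "a" is chosen, so the weaker but complementary model "c" is picked instead. This is the intuition behind preferring complementarity over raw accuracy.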

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

Proposes LLM-AutoSciLab, a closed-loop framework that combines hypothesis generation with hypothesis-conditioned experiment selection and mechanism refinement. It iteratively proposes plausible hypotheses, selects informative experiments, and updates state based on evidence. Introduces ActiveSciBench with 57 enzyme-kinetics tasks and 45 gene-regulatory-network tasks. Achieves 67.6% symbolic accuracy on NewtonBench, 35.1% on ActiveSciBench-Chem, and 31.1% exact graph recovery on ActiveSciBench-GRN, with 2-5x sample efficiency over baselines.

  • LLM-AutoSciLab iteratively proposes hypotheses, selects informative experiments, and refines mechanisms in a closed loop.
  • Introduces ActiveSciBench, a benchmark with enzyme-kinetics and gene-regulatory-network tasks for evaluating active scientific discovery.

Towards Verifiable Transformers: Solver-Checkable Circuit Explanations

This paper introduces a framework called Verifiable Transformers that converts task-localized Transformer circuits into bounded, solver-checkable claims. It employs direct verification for exactly encodable operators and surrogate-mediated verification for complex ones, demonstrating exhaustive verification on small symbolic tasks and applicability at GPT-2 scale, aiming to formalize mechanistic circuit explanations into provable or refutable propositions.

  • Proposes a framework to convert task-localized Transformer circuits into bounded, solver-checkable claims.
  • Uses direct verification and surrogate-mediated verification for different operator types.

CAFD: Concept-Aware DNN Fault Detection using VLMs

This paper introduces CAFD, a learning-based approach that integrates model-based signals, distance features, and a novel Concept Failure Ratio (CFR) feature extracted via Vision-Language Models to achieve superior fault detection performance while maintaining efficiency, with an average 18.3% FDR improvement over state-of-the-art baselines.

  • CAFD is a lightweight learning-based method that effectively combines multiple information sources for DNN fault detection
  • It introduces Concept Failure Ratio (CFR), a novel feature leveraging VLMs to extract semantic concepts from images

Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection

This paper proposes a novel framework called MODIAD for multimodal online distributed industrial anomaly detection. It formulates a Multi-class Intelligent Scheduling (MIS) problem and designs a Sequential Marginal Gain Greedy (SMG) algorithm to solve it efficiently. Furthermore, a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy reduces training overhead. Experiments on MVTec 3D-AD and Eyecandies datasets demonstrate superior performance and efficiency.

  • Existing industrial anomaly detection methods are mostly centralized and offline, ignoring distributed and streaming data.
  • The MODIAD framework integrates multi-class scheduling and edge intelligence for online distributed training.

Practical Quantum CIM Empowerment via All-Domestic-Core Agentic Large Model

This study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system using LangGraph and LangChain frameworks. LLMs effectively perform QUBO/Ising model calibration, constraint weight iteration, and validation of literature-reported schemes. All tasks use domestic large models and CIM hardware, achieving practical quantum CIM empowerment fully based on domestic core technologies. A new paradigm is discovered where agent-assisted quantum computing iterations reciprocally enhance the agent's own problem-solving capability.

  • Integration of femtosecond laser-pumped CIM with LLM-driven agentic system
  • LLMs perform QUBO/Ising calibration, constraint iteration, and validation

When Correct Beliefs Collapse: Epistemic Resilience of LLMs under Clinical Pressure

A new study introduces Med-Stress, a stress test framework that reveals a dissociation between medical knowledge and belief stability in LLMs under escalating clinical pressure. The authors propose two defenses: RBED (inference-time) and R-FT (training-time), with R-FT nearly eliminating belief change.

  • LLMs can abandon correct diagnoses under pressure despite high benchmark accuracy.
  • Med-Stress evaluates belief stability across nine frontier LLMs, finding large knowledge-robustness gaps.

BODHI: Precise OS Kernel Specification Inference

Researchers propose BODHI, a domain-knowledge prompting method that significantly improves LLM performance in generating formal OS kernel specifications. On the OSV-Bench benchmark, BODHI with Claude Opus 4.6 achieves 96.73% Pass@1, substantially surpassing previous best results.

  • BODHI augments few-shot prompts with a structured C-to-Python translation guide covering 15 domain-specific patterns.
  • It improves Pass@1 from 55.10% to 96.73% on OSV-Bench with 245 tasks.

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

This paper analyzes the fundamental tradeoffs among latency, reliability, and cost in LLM-enabled agentic workflows. It introduces performance models using a parametric exponential reliability function for LLM agents and proposes a water-filling token allocation policy under latency and cost constraints.

  • LLM agentic workflows involve tradeoffs among latency, reliability, and cost.
  • A parametric exponential reliability function models LLM agent performance.

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

This paper quantifies redundancy in reasoning LLMs, finding that 61-93% of chain-of-thought steps can be truncated without affecting correctness, and proves this redundancy is a structural consequence of length-agnostic outcome rewards.

  • Formal definition of reasoning redundancy: the fraction of trailing steps that can be truncated while still yielding the correct answer
  • Measured redundancy of 61-93% across four frontier models and two math benchmarks
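The redundancy definition above can be made concrete with a short sketch. Here `answer_from_prefix` is a hypothetical callable that stands in for forcing a model to answer from a truncated chain of thought; the toy version below is an assumption for illustration only, and the function may differ in detail from the paper's measurement protocol.

```python
def redundancy(steps, answer_from_prefix, correct_answer):
    """Fraction of trailing steps removable while the answer stays correct."""
    total = len(steps)
    if total == 0:
        return 0.0
    # Find the shortest prefix that already yields the correct answer.
    for kept in range(total + 1):
        if answer_from_prefix(steps[:kept]) == correct_answer:
            return (total - kept) / total
    return 0.0  # no prefix, including the full chain, reaches the correct answer


def toy_answer(prefix):
    """Toy stand-in for a model: answers 12 once any step has derived it."""
    return "12" if any("= 12" in step for step in prefix) else None


chain = [
    "3 * 4 = 12",
    "check: 12 / 4 = 3",
    "so indeed 3 * 4 = 12",
    "final answer: 12",
]
print(redundancy(chain, toy_answer, "12"))  # 0.75
```

In this toy chain the first step already fixes the answer, so three of the four steps count as redundant.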

Confidence Calibration in Large Language Models

Research finds that large language models (LLMs) exhibit human-like calibration biases: overconfidence on hard tasks and underconfidence on easy ones. The authors introduce LifeEval, a benchmark for evaluating calibration across difficulty levels.

  • LLMs are on average overconfident, with confidence exceeding accuracy
  • A hard-easy effect is observed: overconfidence on difficult tests, underconfidence on easy tests
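The hard-easy effect can be illustrated by comparing mean stated confidence with accuracy for each difficulty bucket. The sketch below uses made-up records purely for illustration; the function name and the data are assumptions and do not come from LifeEval.

```python
def overconfidence(records):
    """Mean stated confidence minus accuracy; positive means overconfident.

    records is a list of (confidence in [0, 1], is_correct) pairs.
    """
    if not records:
        raise ValueError("records must be non-empty")
    mean_confidence = sum(conf for conf, _ in records) / len(records)
    accuracy = sum(1 for _, ok in records if ok) / len(records)
    return mean_confidence - accuracy


# Illustrative, invented data: similar confidence, very different accuracy.
hard = [(0.8, False), (0.7, True), (0.9, False), (0.8, False)]
easy = [(0.8, True), (0.9, True), (0.7, True), (0.8, True)]

print(overconfidence(hard))  # positive: overconfident on hard items
print(overconfidence(easy))  # negative: underconfident on easy items
```

A well-calibrated model would score near zero in both buckets.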

Notes on Pope Leo XIV's encyclical on AI

Pope Leo XIV issued the encyclical 'Magnifica Humanitas' on safeguarding human dignity in the age of AI. The article highlights key sections on interpretability, development, biases, environment, algorithmic accountability, power amplification, and data as a common good, and recounts a podcast prediction from earlier in 2026 that the Pope would weigh in on AI.

  • Pope Leo XIV's encyclical on AI, 'Magnifica Humanitas,' released on May 25, 2026
  • The encyclical emphasizes AI systems are 'cultivated' not 'built,' with limited understanding of their inner workings

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

Together AI has released OSCAR (Offline Spectral Covariance-Aware Rotation), an INT2 KV cache quantization method for long-context LLM serving. Unlike prior rotation-based approaches that apply data-oblivious Hadamard transforms, OSCAR derives separate rotations for keys and values from attention-aware covariance structures estimated offline. At 2.28 bits per KV element, OSCAR reduces the BF16 accuracy gap to 3.78 points on Qwen3-4B-Thinking-2507 and 1.42 points on Qwen3-8B, while delivering approximately 8× KV memory reduction and up to 3× decode speedup at 100K context length.

  • OSCAR is a 2-bit KV cache quantization method using attention-aware rotations that maintain near-BF16 accuracy.
  • It derives rotations from query and value covariances via offline calibration, directing quantization noise to attention-insensitive directions.

Agents

Some ideas for what comes next, May 2026

AI progress continues to accelerate in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code or Codex, American open models are rising, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.

  • Open models are 5-6 months behind in agentic capabilities, a gap likely to widen to 12+ months.
  • Google's Gemini lacks a clear competitor to Claude Code and Codex.

Visual Debugging Tools for Machine Learning Workflows

In this article, we cover three topics: what to visualize during training, the tools that provide those visualizations, and the methods to capture model computations directly using hooks and breakpoints.

  • Visualizing loss curves and gradient magnitudes helps detect overfitting and vanishing gradients.
  • TensorBoard, Weights & Biases, Sacred, and Guild.ai are popular debugging tools.

Chunk sidecars

CircleCI introduces Chunk sidecars to validate agent-generated code before it reaches CI, enhancing quality and security.

  • Chunk sidecars pre-validate AI-generated code.
  • Prevents faulty code from entering CI pipeline.

Who Authorized That? The Delegation Problem in Multi-Agent AI

AI agents delegate tasks across systems, but current architectures lack authorization models for these delegation chains, creating security gaps like ghost permissions and broken audit trails.

  • Multi-agent delegation often creates 'ghost permissions' that no one explicitly authorized.
  • Current protocols (MCP, A2A) solve connectivity but not authorization in delegation chains.
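One way to catch "ghost permissions" is to require that every hop in a delegation chain holds only scopes that all earlier hops also held. The sketch below is a hypothetical check and is not part of MCP or A2A; the function name, the scope strings, and the agent names are assumptions.

```python
def validate_delegation_chain(chain):
    """Return (principal, extra_scopes) for every hop that exceeds its delegators.

    chain is a list of (principal, scopes) pairs, starting with the human user.
    """
    violations = []
    if not chain:
        return violations
    # Scopes that every hop so far has actually been granted.
    allowed = set(chain[0][1])
    for principal, scopes in chain[1:]:
        extra = set(scopes) - allowed
        if extra:
            violations.append((principal, sorted(extra)))
        # A later hop may only receive what this hop legitimately holds.
        allowed &= set(scopes)
    return violations


chain = [
    ("user", {"calendar.read", "mail.read"}),
    ("planner-agent", {"calendar.read", "mail.read"}),
    ("mail-agent", {"mail.read", "mail.send"}),  # mail.send was never granted
]
print(validate_delegation_chain(chain))  # [('mail-agent', ['mail.send'])]
```

Recording the result of such a check at each hop would also give an audit trail that ties every action back to a scope the user explicitly granted.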

AgenticCalling AI

Give your AI the power to make phone calls

  • AgenticCalling AI enables AI to make phone calls
  • Integrates with existing AI systems

The AI justice gap solution is slowly turning into an existential paperwork nightmare for US federal courts

A new study from MIT and the University of Southern California shows that lawsuits filed without a lawyer at US federal courts have nearly doubled since ChatGPT went mainstream. One in five complaints now contains AI-generated text. Judges are resorting to drastic measures to cope with the flood of filings.

  • Pro se litigation rate jumped from 11% to 16.8%, with 41,490 cases in 2025, nearly double the pre-AI average.
  • AI text detection shows 18% of federal complaints contain AI-generated text in early 2026.

Autonomous AI systems test governance in physical environments

As autonomous AI systems expand from software into warehouses, delivery networks, and public spaces, existing governance frameworks face new challenges. Singapore's IMDA released an updated framework for agentic AI, focusing on risk assessment, accountability, and technical controls. Companies like Grab, JPMorgan, and Walmart are testing autonomous systems, while issues of safety, responsibility, and monitoring remain key concerns.

  • Autonomous AI systems introduce new risks in physical environments, affecting infrastructure and safety
  • Singapore's IMDA releases Agentic AI governance framework with iterative risk management

Calling Skills for AI Agents

CometChat launches Calling Skills, allowing AI coding agents to integrate HD voice and video calling with a single skill file, supporting Ringing and Session modes, 23-point verification, and multiple frameworks.

  • CometChat introduces Calling Skills for AI agents, enabling quick integration of voice/video calling.
  • Two integration paths: Ringing (full call surface) and Session (link-driven).

10 Everyday Tasks You Can Automate with AI Today (With n8n Templates)

This article presents 10 everyday tasks that can be automated using AI and the low-code platform n8n, complete with ready-to-use workflow templates. Tasks include job application assistance, email management, meeting notes, calendar scheduling, daily briefings, newsletters, social media posting, blog repurposing, lead generation, and invoice processing. Each section describes what the workflow does and provides a link to the template. The article emphasizes starting small and customizing workflows for personal needs.

  • AI automation with n8n requires minimal coding, making it accessible to non-developers.
  • Covers 10 common scenarios: job hunting, email, meetings, calendar, briefings, newsletters, social media, blog repurposing, lead generation, and invoices.

AI Builds AI: Chinese Company Achieves World First with Self-Written Training Framework

ModelBest (面壁智能) unveils ForgeTrain, the world's first production-grade LLM pretraining framework entirely written by AI, which outperforms NVIDIA's Megatron by 10%. The framework was used to train MiniCPM5-1B, a compact model that sets new records for intelligence density among sub-2B models.

  • ForgeTrain is the first production-grade LLM pretraining framework fully generated by AI.
  • It achieves 10% faster training than NVIDIA Megatron on equivalent hardware.

AI Claims 9 Erdős Problems: Google DeepMind’s AlphaProof Nexus Solves Decades-Old Math Puzzles

Google DeepMind's AlphaProof Nexus, powered by Gemini 3.1 Pro and the Lean theorem prover, has cracked 9 open problems from the Erdős list, including one unsolved for 56 years. It also proved 44 OEIS conjectures, solved a 15-year-old algebraic geometry problem, and improved a convex optimization bound — all at a cost of a few hundred dollars per problem.

  • AlphaProof Nexus solved 9 Erdős problems, 44 OEIS conjectures, and a 15-year-old algebraic geometry problem.
  • The system uses a loop of LLM (Gemini 3.1 Pro) and Lean compiler feedback, with four increasingly sophisticated agent variants.
In-site article

Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

OmniVoice Studio runs voice cloning, video dubbing, real-time dictation, and speaker diarization entirely on your own hardware. No API keys, no cloud account, and no subscription required. The project supports 646 languages for TTS and exposes an MCP server for integration with Claude, Cursor, or any MCP client.

  • Fully local operation with no cloud dependencies or subscription fees.
  • Supports 646 languages for TTS and 99 for transcription via WhisperX.
In-site article

Karpathy's Latest Title at Anthropic: Member of Technical Staff (MTS)

Andrej Karpathy updated his X bio to 'MTS @Anthropic', sparking debate about flat hierarchies. While supporters praise the anti-bureaucratic culture, critics argue it devalues individual achievements and may harm career mobility for lesser-known employees.

  • Karpathy's MTS title at Anthropic ignites online controversy
  • Many top talents at Anthropic and OpenAI share the MTS title, with salaries ranging from $210k to $530k
In-site article

Huawei Launches AI DC Data Infrastructure Full-Stack Solution to Accelerate Industry Intelligence Leap

Huawei unveiled its AI DC full-stack data infrastructure solution at the 2026 Innovation Data Infrastructure Forum in Paris, covering data lake, knowledge & memory platform, model engineering, agent framework, and data resilience to accelerate enterprise AI adoption.

  • Huawei launched AI DC full-stack data infrastructure solution at Paris forum
  • Solution includes data lake, knowledge/memory platform, model engineering, agent framework, and data resilience
In-site article

Why and How to Run Local Models in Zed

Local models offer privacy, cost savings, control, and availability. While not as capable as frontier models, they are improving. This post explains how to set up local models in Zed using LM Studio, Ollama, or llama.cpp, and offers tips for effective use.

  • Local models provide privacy, lower cost, control, and constant availability.
  • They are less capable and slower than frontier models, but suitable for many tasks.
In-site article

Show HN: Desktop GUI sandbox for AI agents and MCP servers

nilbox is a desktop GUI sandbox that provides real VM isolation for AI agents, using a zero-token architecture to keep API keys secure. It supports MCP servers, domain gating, and token usage monitoring.

  • nilbox runs AI agents inside a full virtual machine, not a container.
  • API keys are never exposed to the guest; the host proxy swaps them for trusted domains.
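
The key-swapping idea in the bullets above can be sketched as a header rewrite on the host side. This is only an illustration under stated assumptions, not nilbox's actual implementation: the placeholder string, the `TRUSTED_KEYS` table, and the `rewrite_headers` function are all hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical host-side table: real keys live only here, never in the guest VM.
TRUSTED_KEYS = {"api.example.com": "real-secret-key"}
PLACEHOLDER = "PLACEHOLDER_KEY"  # the only value the guest ever sees


def rewrite_headers(url, headers):
    """Swap the placeholder for the real key, but only for trusted domains."""
    host = urlparse(url).hostname
    out = dict(headers)
    auth = out.get("Authorization", "")
    if PLACEHOLDER in auth:
        if host not in TRUSTED_KEYS:
            # Domain gating: refuse to attach a credential to an unknown host.
            raise PermissionError(f"untrusted domain: {host}")
        out["Authorization"] = auth.replace(PLACEHOLDER, TRUSTED_KEYS[host])
    return out
```

Because the guest only ever holds the placeholder, a compromised agent has nothing useful to leak; the trade-off is that every outbound request must pass through the host proxy.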
In-site article

BobCA

BobCA is a sovereign agent that learns to code with your preferences.

  • Learns to code autonomously
  • Adapts to user preferences
In-site article

IsaacIPC: Coupling High-Fidelity Simulation and Realistic Rendering for Contact-Rich Robotic Systems

This paper presents IsaacIPC, a robotic simulation framework coupling GPU-accelerated Incremental Potential Contact (IPC) with IsaacSim/Lab. It maps simulated deformation between simulation and visual meshes for real-time realistic rendering, aiding data collection and policy evaluation. It also introduces the Geometric Mortar Contact Potential (GMCP) for improved tactile sensing contact-pressure resolution. Evaluated on contact benchmarks and demonstrated on rigid-deformable simulations including a quadruped robot, dexterous hand, and UMI gripper.

  • IsaacIPC bridges high-fidelity simulation with real-time realistic rendering for contact-rich robotics.
  • Introduces Geometric Mortar Contact Potential (GMCP) to better resolve contact-pressure distributions on tactile surfaces.
In-site article

MASt3R-Nav: WayPixel Navigation in Relative 3D Maps

A new visual navigation method called MASt3R-Nav uses pixel-relative connectivity to build geometrically accurate maps without requiring global consistency, enabling more capable navigation than traditional topological graphs.

  • Proposes pixel-relative connectivity map as a novel representation.
  • Uses 3D grounded image matching for inter-image pixel correspondences.
In-site article

Deep Learning-Based Automated Quantification of TIMI Myocardial Perfusion Frame Count (DL-TMPFC) from Coronary Angiography: A Novel Framework for Rapid Assessment of Microvascular Dysfunction

Coronary microvascular dysfunction (CMVD) affects approximately 40%-60% of patients with ischemia and non-obstructive coronary arteries, yet diagnosis remains challenging due to reliance on invasive functional testing or subjective Thrombolysis In Myocardial Infarction (TIMI) flow grade. The TIMI Myocardial Perfusion Frame Count (TMPFC) offers an objective, angiography-based quantitative measure of CMVD, but its clinical translation is hindered by cumbersome manual calculation and insufficient validation. This study aims to develop and validate a deep learning-powered TMPFC calculation (DL-TMPFC), enabling integration into clinical workflows. In a cohort of 655 patients from three independent institutions, DL-TMPFC showed excellent agreement with expert manual measurements (bias: -0.93 frames; 95% LoA: -5.33 to +3.47; r = 0.98). DL-TMPFC markedly enhanced clinical feasibility by fully automating TMPFC and removing observer dependence, accurately identifying CMVD across a full spectrum of coronary pathologies and capturing continuous severity for quantitative risk stratification.

  • DL-TMPFC automates TMPFC calculation using a stenosis detection network and a territory-aware segmentation network.
  • Validated on 655 patients from three institutions with high agreement to manual measurements (r=0.98).
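
The bias and 95% limits of agreement (LoA) quoted for DL-TMPFC read like a Bland-Altman analysis, where the limits are the bias plus or minus 1.96 standard deviations of the paired differences. A minimal sketch, assuming that convention applies here; the function names and the sample readings in the usage note are hypothetical and not taken from the study.

```python
import statistics


def bland_altman(automated, manual):
    """Return (bias, lower_loa, upper_loa) for paired measurements."""
    diffs = [a - m for a, m in zip(automated, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def implied_sd(lower_loa, upper_loa):
    """Back out the SD of differences from reported 95% limits."""
    return (upper_loa - lower_loa) / (2 * 1.96)


# Reported limits of -5.33 to +3.47 frames imply an SD of about 2.24 frames,
# and their midpoint, -0.93, matches the reported bias.
sd_frames = implied_sd(-5.33, 3.47)
midpoint = (-5.33 + 3.47) / 2
```

For example, `bland_altman([10, 12, 14], [11, 12, 16])` gives a bias of -1 frame with limits of -2.96 and +0.96.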
In-site article

RAW: Robust Avatar Watermarking -- Benchmarking and Baseline

Digital avatar watermarking faces unique challenges: avatars are routinely post-processed with background replacement, reframing, and format conversion before deployment. This paper introduces the RAW benchmark with 50 synthetic avatar videos from 5 providers and 6 attacks simulating real-world workflows. Evaluation of 7 existing methods reveals that avatar-specific attacks degrade watermark recovery. The proposed WALT method embeds watermarks in UV texture space via 3D face reconstruction, achieving 92.4% robustness to zoom and 95.6% on background removal. The benchmark is released to facilitate research.

  • Avatar watermarking faces challenges like background removal and reframing.
  • RAW benchmark includes 50 synthetic avatar videos and 6 attacks.
In-site article

Operationalizing Reconstructive Authority: Runtime Construction, Dependency Resolution, and Execution Gating in Autonomous Agent Systems

This paper introduces a runtime execution model that enforces Reconstructive Authority (RAM) in autonomous agent systems: actions are permitted only if authority can be constructed from the current state. It extends the admit/deny state space with a third state, halt, for cases where authority is undefined due to incomplete or uncertain observability. A concrete execution protocol is defined including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. A Recovery Loop integrates drift detection (IML) with execution control (ACP) to suspend execution, acquire missing information, and retry authority reconstruction. The model guarantees safety (no action without constructible authority) and conditional liveness (execution resumes when authority-defining variables become observable).

  • Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime.
  • The model introduces a third state 'halt' to handle cases where authority is undefined due to uncertain observability.
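
The admit/deny/halt gating and the Recovery Loop described above can be sketched as follows. This is an illustrative reading of the summary, not the paper's protocol: `gate`, `run_with_recovery`, the `policy` callable, and the `acquire` callback are hypothetical names, and "unobservable" is modelled simply as a missing or `None` value.

```python
from enum import Enum


class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    HALT = "halt"  # authority is undefined, not merely refused


def gate(required, observed, policy):
    """Return (decision, missing) for the current observed state."""
    missing = [name for name in required if observed.get(name) is None]
    if missing:
        return Decision.HALT, missing
    decision = Decision.ADMIT if policy(observed) else Decision.DENY
    return decision, []


def run_with_recovery(required, observed, policy, acquire, max_retries=3):
    """On HALT, try to acquire the missing variables and re-evaluate."""
    for attempt in range(max_retries + 1):
        decision, missing = gate(required, observed, policy)
        if decision is not Decision.HALT:
            return decision
        if attempt == max_retries:
            break
        for name in missing:
            observed[name] = acquire(name)
    return Decision.HALT  # safety: never act without constructible authority
```

The point of the third state is that a missing variable never collapses into either admit or deny: execution stays suspended until the variable becomes observable, which is the conditional liveness property the summary mentions.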
In-site article

Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game

This paper introduces Quantum Frog, a two-player cooperative game with a quantized-time mechanic. Using reinforcement learning, the authors analyze difficulty scaling, optimal single-agent policy, cooperation gap, and emergent strategies. Key findings: the rush strategy is optimal; adding an uncoordinated player is harder than sextupling traffic; cooperative training boosts success rate by 32–34 percentage points; the emergent strategy is synchronized rushing.

  • The quantized-time mechanic makes the rush strategy universally optimal by minimizing time exposure to traffic.
  • Adding an uncoordinated second player is harder than sextupling traffic for a single expert player.
In-site article

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction

A new paper presents Context, the intelligence layer of the Magarshak Architecture, which replaces reactive chatbots with proactive goal-directed agents. The architecture relies on write-time context assembly, composable sandboxed wisdom programs, and proactive goal stream state machines. It proves six theorems including context stability and proactive dominance.

  • Replaces reactive chatbots with proactive agents that advance tasks without waiting for prompts.
  • Three mechanisms: write-time context assembly, composable sandboxed programs, proactive state machines.
In-site article

Show HN: AgentToolBench-Code – security benchmark for AI coding agents

Allen Wu introduces AgentToolBench-Code, an open-source benchmark that evaluates AI coding agents on 16 security scenarios. Testing Claude Code Sonnet 4.6 and Haiku 4.5 reveals that Sonnet scores +9 (12 caught, 3 silent fail, 1 noop) vs Haiku's +3 (8 caught, 5 silent fail, 3 noop). The initial tie was due to a small corpus; the expanded set shows Sonnet's advantage in pattern recognition. Both models share structural failures in dependency trust and budget discipline. The work is reproducible for ~$3.50 in API costs and encourages community contribution.

  • AgentToolBench-Code is an open-source benchmark for security failures in AI coding agents.
  • Expanded to 16 CVE-class scenarios; Sonnet 4.6 significantly outperforms Haiku 4.5.
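
The reported totals are consistent with a simple net score of caught scenarios minus silent failures, with a noop counting as zero. That rule is inferred here from the numbers in the summary and is an assumption, not something the benchmark's documentation is quoted as stating; `net_score` is a hypothetical helper.

```python
def net_score(caught, silent_fail, noop):
    """Assumed scoring: +1 per catch, -1 per silent failure, 0 per noop."""
    return caught - silent_fail


# (caught, silent_fail, noop) as reported in the summary above.
results = {
    "Sonnet 4.6": (12, 3, 1),
    "Haiku 4.5": (8, 5, 3),
}

scores = {model: net_score(*counts) for model, counts in results.items()}
scenario_totals = {model: sum(counts) for model, counts in results.items()}
```

Under this assumption both rows sum to the 16 scenarios, and the scores come out to +9 and +3, matching the figures quoted.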
In-site article

AIntegriX: The First Open-Source ACP Orchestrator for Multi-Agent Coordination

AIntegriX is an open-source server that coordinates multiple ACP agents via a single API, enabling parallel execution, pipelines, and intelligent routing.

  • AIntegriX acts as an ACP multiplexer, spawning agents as subprocesses and exposing them through a single MCP/REST endpoint.
  • It supports orchestration modes: parallel, race, jury, and pipelines, along with auto-routing and webhook triggers.
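
The four orchestration modes named above can be illustrated with plain `asyncio`. This sketch does not use AIntegriX or ACP at all: agents are modelled as hypothetical async callables, and the "jury" semantics (majority vote over answers) is an assumption about what that mode means.

```python
import asyncio
from collections import Counter


def make_agent(answer, delay):
    """Hypothetical stand-in for an agent: waits, then returns a fixed answer."""
    async def agent(prompt):
        await asyncio.sleep(delay)
        return answer
    return agent


async def parallel(agents, prompt):
    """Run every agent and collect all answers in order."""
    return await asyncio.gather(*(agent(prompt) for agent in agents))


async def race(agents, prompt):
    """Return the first answer to arrive and cancel the rest."""
    tasks = [asyncio.ensure_future(agent(prompt)) for agent in agents]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()


async def jury(agents, prompt):
    """Assumed semantics: the most common answer wins."""
    answers = await parallel(agents, prompt)
    return Counter(answers).most_common(1)[0][0]


async def pipeline(agents, prompt):
    """Feed each agent's output to the next one."""
    for agent in agents:
        prompt = await agent(prompt)
    return prompt
```

A call such as `asyncio.run(race([make_agent("slow", 0.2), make_agent("fast", 0.01)], "q"))` returns the faster agent's answer.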
In-site article

Corey Quinn: Pope's AI Encyclical Is 'Greatest Act of Vendor Lobbying'

Corey Quinn comments on Pope Leo XIV's AI encyclical Magnifica Humanitas, which was influenced by Anthropic co-founder Christopher Olah. Quinn calls it the single greatest act of vendor lobbying.

  • Pope Leo XIV releases first AI encyclical Magnifica Humanitas
  • Anthropic co-founder Christopher Olah influenced the document
In-site article

Cited AI Workspace: No More Re-Uploading Files

UUMuse is a cloud AI knowledge base platform where you upload files once and use them across GPT, Claude, DeepSeek, Qwen, and more — with cited answers, persistent memory, agent mode, a multi-expert debate feature (Spark), and flexible deployment as docs sites, APIs, or MCP servers.

  • Upload files once and query multiple AI models (GPT, Claude, DeepSeek, Qwen) with source citations.
  • Persistent memory remembers your writing style and project context across conversations.
In-site article

AI SEO: compare with your competitor

This article introduces the process of SEO competitor analysis, including keyword gap analysis, a five-step method, and regular check cadence. It also highlights Fox AI's free competitor analyzer tool that uses Lighthouse audits to generate actionable playbooks.

  • SEO competitor analysis studies why rival sites outrank you; keyword gap analysis uncovers untapped keywords.
  • Recommended cadence: a full analysis quarterly and a light check monthly. Fox AI provides a free tool based on real audit data.
In-site article

What is 'pink-slime' journalism and has it infiltrated Australian media?

Experts warn that AI-generated news sites masquerading as local outlets, known as 'pink-slime' journalism, have appeared in regional Australia, raising concerns about misinformation and erosion of trust in media. The sites were traced to an Australian living overseas who called it a failed experiment.

  • AI-generated news sites targeting regional WA communities were traced to an Australian living overseas.
  • The sites, including The Bunbury Guardian, were taken down after ABC investigation.
In-site article

The evolution of AI-assisted software engineering paradigms

The software industry is undergoing an unprecedented metamorphosis. From simple statistical completion of early coding assistants, through conversational chatbots and the failure of multi-agent systems, we have arrived at the era of the Agentic Loop. This comprehensive guide analyzes the entire evolution, from the Completion paradigm to the revolutionary Ralph Loop that is redefining how we write code.

  • AI-assisted coding evolved from statistical code completion (2021-2022) to the Agentic Loop paradigm.
  • Tools like Codex and GitHub Copilot were based on statistical models, lacking task understanding and long-term reasoning.
In-site article

Step by Step Guide to Build and Compare FedAvg and FedProx Federated Learning on Non-IID CIFAR-10 with NVIDIA FLARE

This tutorial provides a detailed guide to building an advanced federated learning experiment using NVIDIA FLARE, comparing FedAvg and FedProx on a non-IID CIFAR-10 dataset. Client data is partitioned using a Dirichlet distribution to simulate realistic label imbalance. The NVFlare Job API is used to define and launch federated jobs, while the Client API handles local training and model exchange. Complete code implementation and experimental results visualization are provided.

  • Build federated learning experiments with NVIDIA FLARE to compare FedAvg and FedProx.
  • Use Dirichlet distribution (alpha=0.3) to partition CIFAR-10 into 3 non-IID clients.
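
The Dirichlet partitioning step and the FedProx proximal term can be sketched with the standard library alone. This is not the tutorial's code and does not touch the NVFlare Job or Client APIs: `partition_non_iid` and `fedprox_penalty` are hypothetical helpers, and the label list in the usage note is synthetic rather than CIFAR-10.

```python
import random
from collections import defaultdict


def dirichlet(alpha, k, rng):
    """Sample a symmetric Dirichlet(alpha) vector via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def partition_non_iid(labels, num_clients=3, alpha=0.3, seed=0):
    """Split sample indices across clients with per-class Dirichlet shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        shares = dirichlet(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += shares[c]
            # The last client takes the remainder so no sample is dropped.
            end = len(indices) if c == num_clients - 1 else round(cumulative * len(indices))
            clients[c].extend(indices[start:end])
            start = end
    return clients


def fedprox_penalty(local_weights, global_weights, mu):
    """FedProx proximal term: (mu / 2) * ||w - w_global||^2."""
    sq_dist = sum((w - g) ** 2 for w, g in zip(local_weights, global_weights))
    return 0.5 * mu * sq_dist
```

With `labels = [i % 10 for i in range(1000)]`, `partition_non_iid(labels)` returns three index lists that together cover every sample exactly once. A small alpha such as 0.3 concentrates each class on a few clients, which is what produces the label imbalance. FedAvg corresponds to `mu = 0`, while FedProx adds the penalty to each client's local loss to limit drift from the global model.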
In-site article

Anthropic co-founder Chris Olah's remarks on Pope Leo XIV's encyclical "Magnifica humanitas"

Anthropic co-founder Chris Olah spoke at the Vatican on Pope Leo XIV's encyclical on AI, highlighting the need for ethical scrutiny, global responsibility, and moral imagination. He identified three key questions for the Church: duty to the global poor, human flourishing, and discernment of AI's nature.

  • Chris Olah addressed the Vatican on the Pope's AI encyclical.
  • He acknowledged the incentives that can conflict with doing right in AI labs.
In-site article
Policy

Spotify boss defends move to AI music, saying it is better than ‘slop’

The streaming platform says a remix tool agreed with Universal Music Group will protect artists from piracy.

  • Spotify CEO defends AI-generated music as alternative to piracy and unregulated AI slop
  • New feature allows premium users to create AI remixes and covers from participating artists
In-site article

Google Cloud COO says AI security belongs in the boardroom, not just the server room

Google Cloud COO Francis de Souza urges companies to integrate security into their AI strategy from day one, emphasizing that AI security is a boardroom issue, not just a technical one.

  • Google Cloud COO calls for security to be built into AI strategy from the start
  • AI security needs attention and resources at the board level
In-site article

AI warfare is already here

From the 2017 "Slaughterbots" video to Anthropic's ongoing battle with the Pentagon, AI's role in warfare has moved from science fiction to reality. This article traces the evolution of AI warfare, highlighting Project Maven, the ambiguity of autonomous weapons definitions, the failure of international regulation, and the complex relationship between tech companies and the military.

  • The 2017 Slaughterbots video and Project Maven demonstrated the real-world threat of AI weapons, with Google initially involved.
  • Anthropic's attempt to impose red lines against autonomous lethal weapons faces pushback from the US government.
In-site article

The Good Robot podcast: the future of data centres and digital sovereignty with Friederike von Franqué

This episode of The Good Robot explores how feminist principles and decentralized infrastructure could transform cloud infrastructure from a corporate service into a public commons. Friederike von Franqué, policy advisor at Wikimedia Germany, discusses examples from Frankfurt's energy-intensive data centres to Stockholm's municipally owned fibre network, advocating for environmental accountability and community-driven design.

  • Friederike von Franqué advocates for feminist and decentralized approaches to cloud infrastructure.
  • The episode contrasts Frankfurt's high-energy data centres with Stockholm's communal fibre network.
In-site article

Pawse.ai

An acoustic regulation system for dogs.

  • Pawse.ai is an acoustic regulation system for dogs.
  • It uses sound to regulate dog behavior.
In-site article

Cognitive Security as an AI Safety Cause Area

This article explores cognitive security as a key subfield of AI safety, focusing on protecting human cognition from AI-induced harms such as misinformation and cognitive biases. It discusses the relationship between cognitive security and AI safety, research directions, and challenges.

  • Cognitive security is an important cause area within AI safety.
  • It addresses risks like misinformation and exploitation of cognitive biases by AI.
In-site article

Investigating the Effect of a Series Elastic Actuation Retrofit to Black-Box Actuators

A retrofit of a custom series elastic element to a black-box actuator improved force control bandwidth from 10.32 Hz to 30.32 Hz (2.93x), outperforming a commercial sensor by 7.63% at a cost of 25 GBP.

  • A torsional SE element was designed with stiffness 2155.4 Nm/rad via FE analysis.
  • Open-loop force control bandwidth increased by 2.93x after retrofit.
In-site article

Algometrics: Forecasting Under Algorithmic Feedback

The paper introduces 'algometrics', a framework for time series forecasting when predictive models influence the data they aim to predict. It distinguishes historical risk from deployment risk and proves three key results: deployment risk is not identifiable from passive historical data alone; historical model rankings can invert due to crowding; and randomized actions can identify short-horizon linear feedback with a finite-sample bound. The findings suggest that benchmarks in algorithmic markets should report feedback sensitivity alongside predictive accuracy.

  • Introduces algometrics framework for forecasting with algorithmic feedback.
  • Proves deployment risk is not identifiable from passive historical data alone.
In-site article

This big university system is embracing AI. Students and faculty aren't on board

The California State University system has inked multi-million dollar contracts with OpenAI to provide ChatGPT Edu, but a survey reveals majorities of students and faculty are skeptical of AI's educational benefits, worrying about impacts on jobs, creativity, and the environment.

  • California State University signed a $13 million annual contract with OpenAI to become the first AI-powered university system.
  • Survey shows 65% of students and 59% of faculty doubt AI's overall benefit to education, despite widespread use.
In-site article

Wyoming Company Uses High-Tech AI Sprinklers to Save Homes from Wildfire

As Wyoming faces another tinderbox fire season, high-tech home fire systems are starting to catch on across the West. One of the fastest growing is a Jackson Hole, Wyoming, company that makes AI sprinklers that are saving homes from wildfires.

  • Frontline Wildfire Defense's AI sprinkler system activated at 61 properties during California's Palisades Fire; only 2 of those homes were lost, after embers entered through ventilation.
  • Wyoming faces extreme drought and fire risk in 2026, reminiscent of the 1988 Yellowstone fires.
In-site article
Tools

Nobody wants to tell me why they only listen to their own Suno slop

A look at the alarming trend of Suno AI users exclusively listening to their own AI-generated music, and the possible reasons behind it. The author found that many users boast about abandoning Spotify, but none were willing to explain why they prefer AI slop over real music. Theories include narcissism or laziness, with the author leaning towards laziness.

  • Users on the Suno subreddit boast about listening only to AI-generated music, abandoning Spotify.
  • The author couldn't find anyone willing to explain why they prefer AI slop over real music.
In-site article

Crypto Code Commits Fall 75% as Developers Move to AI Projects

Blockchain ecosystems are losing developers across the board while AI projects dominate GitHub growth. Weekly crypto commits dropped from about 850,000 to 210,000 since early 2025, and active developers declined 56% to around 4,600.

  • Weekly crypto commits have fallen roughly 75% since early 2025.
  • Active developers declined 56% to about 4,600.
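
The two percentages above can be checked with a few lines of arithmetic. The helper names are hypothetical, and the earlier developer count is only what the quoted 56% decline implies, not a figure from the article.

```python
def percent_drop(before, after):
    """Percentage decline from `before` to `after`."""
    return (before - after) / before * 100


def baseline_from_decline(current, decline_pct):
    """Earlier value implied by a current value and a percentage decline."""
    return current / (1 - decline_pct / 100)


# About 850,000 weekly commits falling to 210,000 is a drop of about 75.3%.
commit_drop = percent_drop(850_000, 210_000)

# A 56% decline to roughly 4,600 implies about 10,450 active developers before.
earlier_developers = baseline_from_decline(4_600, 56)
```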
In-site article

ContextVault – Local-First AI Conversation Recorder for ChatGPT, Claude, Gemini

ContextVault is a browser extension that captures AI conversations in real-time across major LLM platforms like ChatGPT, Claude, and Gemini, storing them locally in IndexedDB. It allows one-click export as Markdown or ZIP, ensuring your data never leaves your device. Free, open source, no accounts or backend required.

  • Real-time capture across 7 LLM platforms including ChatGPT, Claude, and Gemini.
  • All data stored locally in IndexedDB, no cloud sync or third-party access.
In-site article
Chips

Import AI 458: Reckoning with the future; and a singularity story

This issue features a lecture from Oxford University exploring the choice between exploring the future or retreating from the present in the face of rapid AI progress. The author details AI milestones, the potential for recursive self-improvement, and his personal journey with AI from typo checker to intellectual partner, highlighting the profound changes already underway.

  • AI progress, as measured by the Epoch Capabilities Index, is accelerating rapidly, with milestones from bar exam to Math Olympiad gold medals.
  • The author argues that we must choose to explore the future rather than retreat, embracing the power and risks of AI.
In-site article

RED: Adaptive Real-Time DAG Scheduling for Robotic Inference under Environmental Dynamics

RED is a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms. It adapts to runtime environmental changes by assigning intermediate sub-deadlines, leveraging MIMONet weight sharing, and reconstructing computation graphs. Implemented on NVIDIA Jetson and Apple M-series platforms, RED consistently outperforms existing methods in throughput, deadline satisfaction, robustness, adaptability, and overhead.

  • RED assigns intermediate sub-deadlines to accommodate evolving computation graphs and asynchronous inference.
  • It leverages MIMONet's shared parameters to improve schedulability through workload refinement and graph reconstruction.
In-site article

Best Buy just dropped this 64GB Kingston DDR5 RAM kit to under $1,000

With AI companies snapping up RAM and storage drives to build data centers and train LLMs, prices on vital components and pre-built rigs have skyrocketed. As a result, gamers and DIY PC builders are putting upgrades and new purchases on hold in the hope that the market will eventually correct itself. If you've been putting off a new build or upgrade, you can save $176 on the kit at Best Buy this Memorial Day weekend, bringing the price back to just under $1,000.

  • 64GB (2×32GB) DDR5 RAM kit discounted by $176 to $999.99.
  • Supports AMD Expo and Intel XMP 3.0 overclocking up to 6400MHz.
In-site article
Research

Uber president says AI spending is getting ‘harder to justify’

Uber president Andrew Macdonald says it's 'hard to draw a line' between AI spending and deliverable features, as the company reportedly exhausted its annual AI budget four months into 2026.

  • Uber exhausted its annual AI budget four months into 2026
  • President questions direct link between AI spending and user features
In-site article

PACT: Proactive Asking for Continual Task Assistance in Human-Robot Collaboration

Robotic assistants in long-term human-robot collaboration need to assist users under partial observations while leveraging cross-day interaction history. Since human traits are often unknown initially, passive infer-then-act is ineffective. We propose PACT, an ask-or-act framework that evaluates contextual sufficiency to decide whether to seek clarification before acting. Using reinforcement learning, PACT improves assistance accuracy and clarification utility over passive baselines in multi-day embodied scenarios.

  • PACT framework enables robots to proactively ask for clarification when needed, improving assistance reliability.
  • Implemented via reinforcement learning, introducing a clarification utility metric. Outperforms passive inference in multi-day collaborations.
In-site article

AcroRL: Learning Aggressive Quadrotor Inversion using Bidirectional Thrust

This paper proposes a reinforcement learning framework that modulates a constant reference trajectory to perform compact, position-constrained quadrotor inversions while remaining compatible with traditional trajectory generation and tracking. In simulation, the method reduces position RMSE by 32% and settling time by 57% relative to the strongest optimization-based baseline. Hardware experiments demonstrate successful inversion across multiple yaw configurations with position RMSE below 0.35m.

  • Bidirectional thrust enables inverted flight, perching, and sensing for quadrotors.
  • Prior methods struggle with actuator saturation and motor reversal delay.
In-site article

Remote sensing data imputation using deep learning for multispectral imagery

A study compares deep learning models with linear interpolation for imputing satellite data that is missing due to cloud cover in aquatic monitoring. The deep learning models, the CNN in particular, outperformed linear interpolation across four lakes. The imputed data improved the reliability of algal bloom detection using PlanetScope SuperDove imagery.

  • Deep learning models significantly outperform linear interpolation for filling gaps in multispectral satellite data.
  • CNN achieved the best performance across most of the four studied lakes.
In-site article

A Large-Scale Dataset and Benchmark: Do Protein-Ligand Models Learn Binding Sites or Just Binding Likelihood?

Protein-ligand modeling underpins computational drug discovery. Existing benchmarks typically evaluate whether a protein and ligand interact and how strongly they bind, but provide limited evidence of whether models can localize binding sites or identify non-covalent interactions. To address this, we introduce InteractBind, a large-scale dataset of ~100k protein-ligand pairs with a benchmark for fine-grained evaluation. The core task is binding-site localization using interaction maps of six non-covalent interaction types. Evaluating eight existing models reveals limited binding-site localization despite strong binary binding prediction, with marked variation across interaction types. InteractBind encourages development of more interpretable and physically grounded models.

  • InteractBind includes ~100k protein-ligand pairs and a benchmark focused on binding-site localization.
  • It uses residue-atom interaction maps covering six non-covalent interaction types to assess model understanding.
In-site article
Startups
Robotics

This lab-tested robot vacuum picked up more dirt than any other - and it's on sale

The Ecovacs X8 Pro Omni won the ZDNET Lab Award for best pickup performance in a robot vacuum, and it's currently $67 off for Memorial Day.

  • The Ecovacs X8 Pro Omni achieved the highest average sand pickup score (60.28%) among 10 tested robot vacuums.
  • It features a self-cleaning mop roller and dual water tanks for simultaneous vacuuming and mopping.