A vulnerability in Microsoft Copilot Cowork allows attackers to exfiltrate OneDrive files through prompt injection and external images in automatically sent emails.
Copilot Cowork agents can send emails to the user's inbox without approval
External images in emails can trigger network requests, leaking data
In a Decoder interview after Google I/O, CEO Sundar Pichai discusses Google's AI-first pivot, the restructuring of DeepMind, the controversial AI Overviews in Search, the 'Google Zero' phenomenon, and his thoughts on AGI.
Google merged Brain and DeepMind into Google DeepMind and centralized AI infrastructure.
Search is evolving with AI Overviews and the Gemini Spark agent platform.
An audit of 2.5 million biomedical papers by Columbia University and other institutions shows that the rate of fabricated references has increased more than twelvefold since 2023. The researchers suspect a link to the widespread use of language models: the fake references match the citing paper's topic, follow correct formatting, and are nearly impossible to spot. 98 percent of the affected papers have received no response from their publishers.
Audit of 2.5M biomedical papers shows fabricated reference rate up >12x since 2023
Fake references match topics, formatting; nearly undetectable
Text diffusion models challenge the autoregressive paradigm by generating text through iterative denoising, treating generation as editing rather than typing. Three systems define the field, each marking one phase of this new architecture class: LLaDA (scientific proof of scaling), Mercury (industrial deployment with a commercial speed advantage), and Gemini Diffusion (frontier validation).
Text diffusion models generate text by iterative refinement from noise, using bidirectional context.
LLaDA proved diffusion can scale to a large language model.
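The following toy sketch illustrates "generation as editing". It mimics masked-diffusion-style sampling: every position starts masked, and at each step the most confident predictions are unmasked. The `toy_denoiser` function is a hypothetical stand-in that already knows the target sentence. Real systems such as LLaDA use a trained bidirectional Transformer, so this shows only the sampling loop, not the model.

```python
import random

MASK = "[MASK]"

def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for a bidirectional model: proposes a token and a
    confidence score for every masked position (it simply knows the target)."""
    return {i: (target[i], rng.random()) for i, t in enumerate(tokens) if t == MASK}

def diffusion_sample(target, steps=4, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * len(target)          # start from pure "noise": everything masked
    history = [list(tokens)]
    for step in range(steps):
        proposals = toy_denoiser(tokens, target, rng)
        if not proposals:
            break
        # Unmask an equal share of the remaining positions per step, keeping the
        # most confident predictions; low-confidence positions stay masked for now.
        k = -(-len(proposals) // (steps - step))   # ceiling division
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history

tokens, history = diffusion_sample("the cat sat on the warm mat".split())
for state in history:
    print(" ".join(state))
```

Unlike left-to-right decoding, positions are filled in whatever order the model is most confident about, and all of them are visible to the model at every step.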
Kuaishou releases Keye-VL-2.0-30B-A3B, a multimodal large language model that is the first to apply DeepSeek Sparse Attention (DSA) to multimodal scenarios, enabling deep perception over ultra-long 256K contexts. It achieves SOTA on long-video temporal understanding benchmarks and introduces built-in agent collaboration, paving the way for stronger reasoning and real-world business applications.
First to integrate DSA into a multimodal model, addressing bottlenecks in long-video understanding.
Achieves SOTA on TimeLens, LongVideoBench, MLVU; reverses long-context decay by boosting accuracy from 35.34% to 42.44% when scaling from 64 to 512 frames.
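Sparse attention of this kind lets each query attend to a selected subset of tokens instead of the whole 256K context. Below is a minimal NumPy sketch of generic single-query top-k sparse attention. The selection step here is a simplifying assumption: plain dot-product scores, optionally computed on cheap projections `select_q`/`select_K`. DSA's actual indexer is not reproduced.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dense_attention(q, K, V):
    """Standard scaled dot-product attention for a single query."""
    return softmax(K @ q / np.sqrt(q.shape[0])) @ V

def topk_sparse_attention(q, K, V, k, select_q=None, select_K=None):
    """Single-query top-k sparse attention.
    select_q / select_K: optional cheap (e.g. low-dimensional) projections used only
    to choose which keys to keep; a simplified stand-in for a learned indexer."""
    if select_q is None or select_K is None:
        select_q, select_K = q, K
    keep = np.argsort(select_K @ select_q)[-k:]           # indices of the k best-scoring keys
    weights = softmax(K[keep] @ q / np.sqrt(q.shape[0]))   # exact attention over kept keys only
    return weights @ V[keep]

rng = np.random.default_rng(0)
n, d = 512, 64
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = topk_sparse_attention(q, K, V, k=64)   # attends to 64 of 512 keys
```

The attention cost per query scales with `k` rather than with the context length. Setting `k = n` recovers dense attention exactly.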
In this tutorial, we explore the TuringEnterprises/Open-MM-RL dataset as a practical foundation for multimodal reasoning and reinforcement learning with verifiable rewards. We load the dataset, inspect its schema, analyze domains, formats, question lengths, answer types, and image distributions, and visualize representative examples from each domain. We also build a lightweight reward function that checks exact, numeric, fractional, LaTeX, and symbolic answers, giving us a useful way to evaluate model outputs. Finally, we format prompts for vision-language models, optionally test SmolVLM on sample examples, and export the dataset into a GRPO-style structure for future multimodal RL training.
Load and analyze the Open-MM-RL dataset, including domain distribution, image statistics, and answer types.
Build a lightweight verifiable reward function supporting exact, numeric, fractional, LaTeX, and symbolic answers.
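A verifiable reward of this kind can be sketched with the standard library alone. The sketch below is an assumption-laden stand-in, not the tutorial's code. It normalizes light LaTeX (`$...$`, `\boxed{}`, `\frac{}{}`), awards credit for an exact match after normalization, and otherwise compares integers, decimals, and fractions exactly with a small relative tolerance. Full symbolic equivalence, which the tutorial also covers, would need a CAS such as SymPy and is omitted here.

```python
import re
from fractions import Fraction

def normalize(ans: str) -> str:
    r"""Strip $...$, \boxed{...}, \left/\right, \, and spaces; rewrite \frac{a}{b}
    (or \dfrac) as (a)/(b); lowercase so that 'A' and 'a' compare equal."""
    s = ans.strip().strip("$")
    m = re.fullmatch(r"\\boxed\{(.*)\}", s)
    if m:
        s = m.group(1)
    s = re.sub(r"\\d?frac\{([^{}]*)\}\{([^{}]*)\}", r"(\1)/(\2)", s)
    for token in ("\\left", "\\right", "\\,", " "):
        s = s.replace(token, "")
    return s.lower()

_FRAC = re.compile(r"\(?(-?\d+(?:\.\d+)?)\)?/\(?(-?\d+(?:\.\d+)?)\)?")

def to_number(s: str):
    """Parse integers, decimals, and a/b or (a)/(b) fractions exactly; None otherwise."""
    try:
        m = _FRAC.fullmatch(s)
        if m:
            return Fraction(m.group(1)) / Fraction(m.group(2))
        return Fraction(s)
    except (ValueError, ZeroDivisionError):
        return None

def verifiable_reward(prediction: str, reference: str, rel_tol: float = 1e-6) -> float:
    p, r = normalize(prediction), normalize(reference)
    if p == r:                                   # exact match after normalization
        return 1.0
    pn, rn = to_number(p), to_number(r)
    if pn is None or rn is None:                 # non-numeric and not identical
        return 0.0
    # numeric / fractional equivalence with a relative tolerance
    return 1.0 if abs(pn - rn) <= rel_tol * max(1, abs(rn)) else 0.0
```

A binary 0/1 reward like this is the usual shape for RL with verifiable rewards (e.g. GRPO). Partial credit can be layered on later.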
This book repository from Packt covers the full lifecycle of operationalizing AI with Docker, including running local LLMs, integrating MCP, building autonomous agents, and orchestrating multi-agent systems on Kubernetes.
Practical guide for production AI with Docker's integrated toolkit.
Covers Docker Model Runner, MCP Gateway, multi-agent architectures, and Kubernetes orchestration.
Alibaba's latest flagship model Qwen3.7-Max achieved a score of 1541 on the authoritative Code Arena leaderboard, surpassing GPT-5.5 and other models, ranking second globally behind the Claude series.
Qwen3.7-Max scored 1541 on Code Arena, ranking second only to Claude.
Code Arena is a blind-test platform where developers submit full web app challenges.
Google unveils Gemini 3.5 and the Gemini Spark agent, plus Gemini Omni multimodal video generation; Elon Musk loses his OpenAI lawsuit on statute-of-limitations grounds; Anthropic agrees to $30B funding at a $900B valuation; AI solves an 80-year-old Erdős geometry problem.
Google launches Gemini 3.5 and always-on agent Gemini Spark with MCP tool support.
Gemini Omni converts images, audio, and text into video.
GPT Image 2 is OpenAI's latest image model with sharp text rendering and photorealism. The article introduces imagesv2.ai, a platform offering free credits, templates, and tools like panorama, tweet screenshot, and WeChat chat generators. Pricing starts at $4.16/month with yearly plans.
GPT Image 2 excels at text rendering and photorealistic images.
imagesv2.ai provides free credits and 50+ templates.
Kunlun Tech releases SkyClaw-v1.0 and its lightweight version SkyClaw-v1.0-lite, native agent models that rival top players such as Claude Opus 4.6. Priced at half or less of what mainstream models cost, with limited-time free access and plans to open-source them, they integrate deeply with OpenClaw, Claude Code, and other mainstream frameworks and are compatible with the OpenAI API.
Kunlun Tech launches SkyClaw-v1.0 and SkyClaw-v1.0-lite, native agent models achieving top-tier global performance.
Priced at half of leading models' prices or less, currently free for a limited time, with open-source releases planned.
Planetary rovers face mobility challenges on varying terrains. Researchers introduce a multimodal wheel that continuously adjusts grouser height. In 750 trials across four surfaces, adaptive deployment reduced slip by 30-58% and improved travel time and energy by up to 77.4% on granular terrains, highlighting limitations of fixed wheels.
A novel wheel with adjustable grouser height adapts to different terrains
750 experiments show slip reduction of 30-58% and up to 77.4% improvement in travel time and energy on granular surfaces
Researchers propose a new anisotropic diffusion method for ergodic search in multi-robot systems. It overcomes the uniform error propagation of traditional isotropic diffusion by using the gradient of a Perona-Malik diffusion field to guide robot motion, yielding more flexible coverage.
Traditional ergodic search uses isotropic diffusion (heat equation), causing uniform error propagation in all directions.
The new method introduces anisotropic diffusion (Perona-Malik), using gradient to guide robot motion for more flexible matching of target distribution.
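Here is a minimal NumPy sketch of the difference between the two diffusion models. It assumes the common heat-equation-driven coverage setup: the coverage error (target density minus coverage so far) is smoothed by diffusion, and a robot moves toward higher smoothed error. The conductance `g(s) = 1 / (1 + (s/kappa)^2)` is the classic Perona-Malik choice. The paper's exact conductance, boundary handling, and motion law may differ, and `greedy_move` is a hypothetical discrete stand-in for gradient following.

```python
import numpy as np

def conductance(d, kappa):
    """Perona-Malik edge-stopping function; kappa=None gives g = 1 (heat equation)."""
    if kappa is None:
        return np.ones_like(d)
    return 1.0 / (1.0 + (d / kappa) ** 2)

def diffusion_step(u, lam=0.2, kappa=None):
    """One explicit step on a 2D grid with zero-flux (Neumann) boundaries.
    lam <= 0.25 keeps the 4-neighbour explicit scheme stable."""
    p = np.pad(u, 1, mode="edge")
    diffs = (p[:-2, 1:-1] - u, p[2:, 1:-1] - u,      # north, south
             p[1:-1, :-2] - u, p[1:-1, 2:] - u)      # west, east
    return u + lam * sum(conductance(d, kappa) * d for d in diffs)

def greedy_move(field, pos):
    """Move one cell toward the neighbouring cell with the largest diffused error."""
    r, c = pos
    H, W = field.shape
    nbrs = [(r + dr, c + dc) for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= r + dr < H and 0 <= c + dc < W]
    return max(nbrs, key=lambda rc: field[rc])

error = np.zeros((32, 32))
error[8:12, 20:24] = 1.0                  # toy unexplored, high-probability region
field = error.copy()
for _ in range(50):
    field = diffusion_step(field, kappa=0.1)   # anisotropic: sharp jumps diffuse slowly
pos = greedy_move(field, (16, 16))
```

With `kappa=None`, every difference spreads at the same rate, which is the uniform propagation the paper criticizes. With a finite `kappa`, large jumps in the error field are damped, so information spreads unevenly. Both variants conserve the total error mass on this grid.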
ActQuant is an action-guided mixed-precision post-training quantization framework for Vision-Language-Action (VLA) models, enabling sub-4-bit weight quantization through a two-stage approach that maintains high success rates on the LIBERO benchmark and a real UR3 robotic arm, significantly reducing memory footprint.
ActQuant employs action-aware mixed-precision quantization to preserve VLA model performance under sub-4-bit weight quantization.
The two-stage framework includes an inter-tensor bit allocator and an intra-tensor scale optimizer focusing on action-critical weights.
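The core of an inter-tensor bit allocator can be sketched as a greedy loop under an average bit-width budget. The sketch below is a generic stand-in, not ActQuant's method. The `importance` scores are hypothetical placeholders for action-critical sensitivity, quantization is plain symmetric per-tensor rounding, and the intra-tensor scale optimizer is not modeled.

```python
import numpy as np

def quantize(w, bits):
    """Symmetric uniform per-tensor quantization, dequantized back to float."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return w.copy()
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def allocate_bits(tensors, importance, avg_bits, choices=(2, 3, 4)):
    """Greedy mixed-precision allocation: repeatedly upgrade the tensor whose
    importance-weighted error drops most per extra bit, within the budget."""
    bits = {n: choices[0] for n in tensors}
    sizes = {n: w.size for n, w in tensors.items()}
    budget = avg_bits * sum(sizes.values())

    def err(n, b):
        return importance[n] * np.mean((tensors[n] - quantize(tensors[n], b)) ** 2)

    while True:
        used = sum(bits[n] * sizes[n] for n in tensors)
        best, best_gain = None, 0.0
        for n in tensors:
            i = choices.index(bits[n])
            if i + 1 == len(choices):
                continue                      # already at the highest precision
            nb = choices[i + 1]
            extra = (nb - bits[n]) * sizes[n]
            if used + extra > budget:
                continue                      # upgrade would exceed the budget
            gain = (err(n, bits[n]) - err(n, nb)) / extra
            if gain > best_gain:
                best, best_gain = n, gain
        if best is None:
            return bits
        bits[best] = choices[choices.index(bits[best]) + 1]

rng = np.random.default_rng(0)
tensors = {f"layer{i}": rng.normal(size=(64, 64)) for i in range(4)}
importance = {"layer0": 10.0, "layer1": 1.0, "layer2": 1.0, "layer3": 0.1}  # hypothetical
bits = allocate_bits(tensors, importance, avg_bits=3.0)
```

Tensors whose errors matter most for the downstream action keep more bits, while insensitive ones drop toward 2 bits. The average stays within the sub-4-bit budget.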
Researchers propose a brain-to-image system that decodes visual stimuli from EEG signals recorded during natural image viewing. It handles EEG-to-image retrieval (86.30% Top-1 accuracy among 200 candidates) and EEG-to-image reconstruction (CLIP score 0.903). The method uses multi-level blurring, EVNet features, InfoNCE loss, and multi-modal CLIP alignment with SDXL-Turbo generation, demonstrating feasibility of decoding rich visual representations from EEG.
EEG-to-image retrieval achieves 86.30% Top-1 and 98.55% Top-5 accuracy over 200 candidate images.
EEG-to-image reconstruction uses CognitionCapturerPro with multi-modal CLIP embeddings and SDXL-Turbo, achieving CLIP score 0.903.
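The retrieval side rests on two standard pieces: a contrastive InfoNCE loss that pulls matched EEG and image embeddings together, and top-k accuracy over a candidate set. Below is a minimal NumPy sketch. It uses the symmetric CLIP-style variant of InfoNCE, and the embeddings are random toy data rather than EVNet features.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each matrix is a matched EEG/image pair."""
    logits = l2_normalize(eeg_emb) @ l2_normalize(img_emb).T / temperature

    def cross_entropy_diag(l):   # the diagonal entry is the correct "class"
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

def topk_accuracy(eeg_emb, img_emb, k=1):
    """Fraction of EEG trials whose true image is among the k most similar candidates."""
    sims = l2_normalize(eeg_emb) @ l2_normalize(img_emb).T
    topk = np.argsort(-sims, axis=1)[:, :k]
    return float(np.mean([i in row for i, row in enumerate(topk)]))

rng = np.random.default_rng(0)
img = rng.normal(size=(200, 64))                    # 200 candidate image embeddings
eeg_good = img + 0.1 * rng.normal(size=img.shape)   # toy well-aligned EEG embeddings
eeg_bad = rng.normal(size=img.shape)                # toy unaligned EEG embeddings
print(info_nce(eeg_good, img), info_nce(eeg_bad, img))
print(topk_accuracy(eeg_good, img, k=1), topk_accuracy(eeg_good, img, k=5))
```

The reported 86.30% Top-1 and 98.55% Top-5 figures correspond to `topk_accuracy` with `k=1` and `k=5` over 200 candidates. The toy data here says nothing about those results.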
Nano World Models is a minimalist codebase for future video prediction centered on diffusion forcing. It provides a unified interface for generative objectives, model scales, action-conditioning mechanisms, latent observation spaces, datasets, evaluation protocols, and long-horizon rollouts, enabling controlled studies of world-modeling components. Experiments across control environments, games, and real-robot data validate its effectiveness. Code, configs, and pretrained checkpoints are released for open, reproducible research.
Nano World Models is a minimal, reproducible codebase for future video prediction research.
It integrates key design components like generative objectives, model scales, and action conditioning around diffusion forcing.
GazeWorld is a medical imaging world model that treats the image as the world and radiologist fixation sequences as trajectories. It autoregressively predicts latent representations of fixated patches while using a spatial-completion branch for unvisited regions. At inference, it generates patch representations from the image alone without real gaze data. Frozen GazeWorld features achieve state-of-the-art diagnostic accuracy on all nine supervised settings across CheXpert, RSNA Pneumonia, and SIIM-ACR Pneumothorax, and highest zero-shot accuracy on all three benchmarks. On GazeSearch, a generic decoder trained on the same frozen features outperforms the purpose-built LogitGaze-Med by over 16% in ScanMatch and 22% in SED. The work demonstrates that modeling how experts read, not just their conclusions, offers a promising pretraining paradigm for medical imaging AI.
GazeWorld leverages radiologist eye-tracking data as reading trajectories for autoregressive prediction and spatial completion.
It requires no real gaze data at inference, generating patch sequences from images alone.
Large language models (LLMs) are increasingly used as automatic judges for summarization and dialogue evaluation. Prior work has documented biases such as position, verbosity, and style preferences, but largely focuses on outcomes. This paper asks whether LLM judges are cue-invariant, introducing a causal framework with interventions and metrics to test stability of rankings and explanations under non-evidential cue perturbations. Results show substantial cue-anchored rationalization, effectively mitigated by the PROOF-BEFORE-PREFERENCE method.
LLM judges exhibit cue-anchored rationalization bias, where non-evidential cues affect their explanations.
The paper develops interventions (Blind, Truth, Flip, Placebo, Reveal-After) and tie-aware metrics to quantify outcome and rationale anchoring.
This paper proposes TriVAL, a tri-validation framework that performs explicit validation at three stages of automatic optimization modeling (semantic specification, mathematical formulation, and code generation), along with NL4COP, a new benchmark for combinatorial problems.
TriVAL performs explicit validation at three stages of automatic optimization modeling.
The framework uses a construct-validate-revise loop to catch errors early and prevent accumulation.
This study develops a large language model-based framework to extract segment disclosures directly from Form 10-K filings, preserving both reportable and nested segment information. A retrieval augmented system is designed to incorporate information across multiple filings to support comparability. The framework is demonstrated in longitudinal analysis within firms and cross-firm alignment of geographic segments, showing accurate extraction and effective handling of cross-period queries.
Segment disclosures are central to financial reporting but face completeness and comparability issues due to dispersed formats in 10-K filings.
Proposes an LLM-based framework to extract segment info, including nested segments, directly from 10-Ks.
This paper presents the Multi-Persona Debate System (MPDS), a literature-grounded framework that automates scientific hypothesis generation by combining retrieval, long-context LLM reasoning, corpus-driven persona induction, and structured multi-agent debate. Evaluated on battery materials design, MPDS constructs literature snapshots of up to 500 papers, conducts three-round citation-aware debate, and produces mechanistically explicit proposals. It outperforms baselines in cross-perspective integration and shows promise as a diagnostic aid for identifying workflow bottlenecks.
MPDS automates hypothesis generation through multi-persona debate over literature snapshots, addressing fragmentation in scientific knowledge.
The system uses up to 500 papers, three rounds of citation-aware debate, and moderator synthesis with evidence traceability.
Raon-Speech is a 9B-parameter speech language model for English and Korean, achieving top performance on speech understanding and generation while preserving text capabilities. Its full-duplex extension Raon-SpeechChat enables natural real-time conversation. The models are open-sourced.
Raon-Speech is a 9B-parameter SpeechLM trained on 1.38M hours of curated data.
It outperforms eight similar models on speech tasks while retaining strong text QA performance.
This systematic review of 139 studies on document classification proposes a unified framework and conducts a meta-analysis. Results show that multimodal fusion improves accuracy by 5.28% on average and multiview fusion boosts accuracy by 4.67% and F1 by 3.08%, but only a minority of studies used statistical tests, raising reproducibility concerns.
Meta-analysis reveals that multimodal and multiview fusion significantly improve document classification accuracy.
Multimodal fusion yields a +5.28% accuracy gain; multiview yields +4.67% accuracy and +3.08% F1 score.
This paper studies truthful online preference aggregation for LLM fine-tuning in mobile crowdsourcing, addressing strategic misreporting by workers. It proposes a dynamic Bayesian game model and an online weighted aggregation mechanism that dynamically adjusts worker weights based on feedback accuracy, ensuring truthful feedback and achieving sublinear regret O(√T). Experiments on real-world datasets show significant performance gains.
Dynamic Bayesian game model formulated for multi-agent online learning between platform and strategic workers.
Online weighted aggregation mechanism adjusts weights to incentivize truthful feedback.
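Feedback-driven reweighting with O(√T) regret follows the classic multiplicative-weights (Hedge) pattern. Below is a minimal sketch of that textbook baseline, not the paper's mechanism, which also handles strategic incentives through its Bayesian game model. Labels are assumed binary and compared against a later-revealed reference label. Because labels are 0/1, the absolute loss of the soft aggregate equals Hedge's mixture loss. With `eta = sqrt(8 ln N / T)`, the standard regret bound is O(√(T log N)).

```python
import math
import random

def hedge_aggregate(worker_reports, ground_truth, eta):
    """worker_reports[t][i] in {0,1}: worker i's preference label at round t.
    ground_truth[t] in {0,1}: later-revealed reference label used as feedback.
    Returns per-round soft aggregates in [0,1] and final normalized weights."""
    n = len(worker_reports[0])
    w = [1.0] * n
    scores = []
    for reports, y in zip(worker_reports, ground_truth):
        total = sum(w)
        scores.append(sum(wi * r for wi, r in zip(w, reports)) / total)  # weighted vote for 1
        # multiplicative update: shrink the weight of every worker who was wrong
        w = [wi * math.exp(-eta * (r != y)) for wi, r in zip(w, reports)]
    total = sum(w)
    return scores, [wi / total for wi in w]

rng = random.Random(0)
T = 200
truth = [rng.randint(0, 1) for _ in range(T)]
reports = [[y, rng.randint(0, 1), 1 - y] for y in truth]   # honest, random, adversarial
eta = math.sqrt(8 * math.log(3) / T)
scores, weights = hedge_aggregate(reports, truth, eta)
```

The honest worker ends up with almost all the weight, because the other two lose weight every time their reports disagree with the revealed label.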
This paper reframes proposer selection in LLM ensembles as a combinatorial selection problem akin to feature selection, emphasizing complementarity over accuracy or diversity. It explores computationally feasible greedy algorithms that assess complementarity using a small labeled set, validating complementarity as a guiding principle and identifying methods with the best performance-cost trade-offs.
Proposer selection is reformulated as a combinatorial problem focusing on complementarity among models.
Greedy algorithms are proposed to overcome the prohibitive time complexity of standard feature selection.
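Greedy forward selection on a small labeled set is cheap and makes complementarity concrete. The sketch below uses one plausible complementarity proxy: the fraction of examples solved by at least one selected proposer. The paper's actual criteria may differ, and the correctness table is toy data.

```python
def coverage(selected, correct):
    """Fraction of labeled examples solved by at least one selected proposer.
    correct[m] is a 0/1 list giving model m's correctness on the labeled set."""
    if not selected:
        return 0.0
    n = len(next(iter(correct.values())))
    return sum(any(correct[m][j] for m in selected) for j in range(n)) / n

def greedy_select(correct, k):
    """Forward selection: add the proposer that most increases coverage."""
    selected = []
    for _ in range(k):
        candidates = [m for m in correct if m not in selected]
        if not candidates:
            break
        selected.append(max(candidates, key=lambda m: coverage(selected + [m], correct)))
    return selected

correct = {
    "A": [1, 1, 1, 1, 0, 0],   # most accurate
    "B": [1, 1, 1, 0, 0, 0],   # fairly accurate but redundant with A
    "C": [0, 0, 0, 0, 1, 1],   # least accurate but complementary to A
}
print(greedy_select(correct, 2))   # ['A', 'C']
```

Selecting by accuracy alone would pick A and B and cover only 4 of 6 examples. The complementarity-driven choice of A and C covers all 6. Each greedy step costs O(M) coverage evaluations, compared with exponentially many subsets for exhaustive search.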
Proposes LLM-AutoSciLab, a closed-loop framework that combines hypothesis generation with hypothesis-conditioned experiment selection and mechanism refinement. It iteratively proposes plausible hypotheses, selects informative experiments, and updates state based on evidence. Introduces ActiveSciBench with 57 enzyme-kinetics tasks and 45 gene-regulatory-network tasks. Achieves 67.6% symbolic accuracy on NewtonBench, 35.1% on ActiveSciBench-Chem, and 31.1% exact graph recovery on ActiveSciBench-GRN, with 2-5x sample efficiency over baselines.
LLM-AutoSciLab iteratively proposes hypotheses, selects informative experiments, and refines mechanisms in a closed loop.
Introduces ActiveSciBench, a benchmark with enzyme-kinetics and gene-regulatory-network tasks for evaluating active scientific discovery.
This paper introduces a framework called Verifiable Transformers that converts task-localized Transformer circuits into bounded, solver-checkable claims. It employs direct verification for exactly encodable operators and surrogate-mediated verification for complex ones, demonstrating exhaustive verification on small symbolic tasks and applicability at GPT-2 scale, aiming to formalize mechanistic circuit explanations into provable or refutable propositions.
Proposes a framework to convert task-localized Transformer circuits into bounded, solver-checkable claims.
Uses direct verification and surrogate-mediated verification for different operator types.
This paper introduces CAFD, a learning-based approach that integrates model-based signals, distance features, and a novel Concept Failure Ratio (CFR) feature extracted via Vision-Language Models to achieve superior fault detection performance while maintaining efficiency, with an average 18.3% FDR improvement over state-of-the-art baselines.
CAFD is a lightweight learning-based method that effectively combines multiple information sources for DNN fault detection
It introduces Concept Failure Ratio (CFR), a novel feature leveraging VLMs to extract semantic concepts from images
This paper proposes a novel framework called MODIAD for multimodal online distributed industrial anomaly detection. It formulates a Multi-class Intelligent Scheduling (MIS) problem and designs a Sequential Marginal Gain Greedy (SMG) algorithm to solve it efficiently. Furthermore, a Resource Efficient Class-Wise Low Rank Adaptation (REC-LoRA) strategy reduces training overhead. Experiments on MVTec 3D-AD and Eyecandies datasets demonstrate superior performance and efficiency.
Existing industrial anomaly detection methods are mostly centralized and offline, ignoring distributed and streaming data.
The MODIAD framework integrates multi-class scheduling and edge intelligence for online distributed training.
This study integrates a femtosecond laser-pumped Coherent Ising Machine (CIM) with an LLM-driven agentic system built on the LangGraph and LangChain frameworks. The LLMs effectively perform QUBO/Ising model calibration, constraint-weight iteration, and validation of schemes reported in the literature. All tasks run on domestic large models and CIM hardware, demonstrating practical quantum CIM empowerment built entirely on domestic core technologies. The authors also report a new paradigm in which agent-assisted quantum computing iterations in turn enhance the agent's own problem-solving capability.
Integration of femtosecond laser-pumped CIM with LLM-driven agentic system
LLMs perform QUBO/Ising calibration, constraint iteration, and validation
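Calibrating a problem for a CIM usually means converting a QUBO over binary variables into an Ising model over spins. The standard substitution x_i = (1 + s_i)/2 gives J_ij = (Q_ij + Q_ji)/4 for i < j, h_i = Q_ii/2 + Σ_{j≠i}(Q_ij + Q_ji)/4, and a constant offset. The NumPy sketch below implements that textbook mapping and checks it by brute force. It is not the study's pipeline.

```python
import itertools
import numpy as np

def qubo_to_ising(Q):
    """Map min x^T Q x over x in {0,1}^n to min sum_{i<j} J_ij s_i s_j + h.s + c
    over s in {-1,+1}^n, using x_i = (1 + s_i) / 2."""
    Q = np.asarray(Q, dtype=float)
    S = Q + Q.T                              # combine (i,j) and (j,i) terms
    J = np.triu(S, k=1) / 4.0                # couplings, upper triangle only (i < j)
    off = S - np.diag(np.diag(S))            # zero the diagonal
    h = np.diag(Q) / 2.0 + off.sum(axis=1) / 4.0
    c = np.trace(Q) / 2.0 + (Q.sum() - np.trace(Q)) / 4.0
    return J, h, c

def qubo_energy(Q, x):
    x = np.asarray(x, dtype=float)
    return x @ np.asarray(Q, dtype=float) @ x

def ising_energy(J, h, c, s):
    s = np.asarray(s, dtype=float)
    return s @ J @ s + h @ s + c

rng = np.random.default_rng(1)
Q = rng.normal(size=(4, 4))
J, h, c = qubo_to_ising(Q)
for bits in itertools.product((0, 1), repeat=4):   # exhaustive check on 16 assignments
    x = np.array(bits, dtype=float)
    assert np.isclose(qubo_energy(Q, x), ising_energy(J, h, c, 2 * x - 1))
```

Because the two energies agree on every assignment, the Ising ground state found by the hardware maps directly back to the QUBO optimum. Constraint weights (penalty terms folded into Q) are what an agent would then iterate on.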
A new study introduces Med-Stress, a stress test framework that reveals a dissociation between medical knowledge and belief stability in LLMs under escalating clinical pressure. The authors propose two defenses: RBED (inference-time) and R-FT (training-time), with R-FT nearly eliminating belief change.
LLMs can abandon correct diagnoses under pressure despite high benchmark accuracy.
Med-Stress evaluates belief stability across nine frontier LLMs, finding large knowledge-robustness gaps.
Researchers propose BODHI, a domain-knowledge prompting method that significantly improves LLM performance in generating formal OS kernel specifications. On the OSV-Bench benchmark, BODHI with Claude Opus 4.6 achieves 96.73% Pass@1, substantially surpassing previous best results.
BODHI augments few-shot prompts with a structured C-to-Python translation guide covering 15 domain-specific patterns.
It improves Pass@1 from 55.10% to 96.73% on OSV-Bench with 245 tasks.
This paper analyzes the fundamental tradeoffs among latency, reliability, and cost in LLM-enabled agentic workflows. It introduces performance models using a parametric exponential reliability function for LLM agents and proposes a water-filling token allocation policy under latency and cost constraints.
LLM agentic workflows involve tradeoffs among latency, reliability, and cost.
A parametric exponential reliability function models LLM agent performance.
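Here is one way a water-filling token policy can arise, under stated assumptions. Suppose agent i's reliability is R_i(x) = 1 − exp(−a_i·x) for x tokens, and the goal is to maximize the weighted total reliability under a single token budget. The KKT conditions then give x_i = max(0, ln(w_i·a_i/λ)/a_i), and the "water level" λ is found by bisection. The paper's exact reliability parameterization and its separate latency and cost constraints may differ; `weights`, `rates`, and `budget` are illustrative.

```python
import math

def water_fill(weights, rates, budget, iters=100):
    """Maximize sum_i w_i * (1 - exp(-a_i * x_i)) s.t. sum_i x_i = budget, x_i >= 0.
    KKT: x_i = max(0, ln(w_i * a_i / lam) / a_i); bisect on the water level lam."""
    def alloc(lam):
        return [max(0.0, math.log(w * a / lam) / a) for w, a in zip(weights, rates)]

    hi = max(w * a for w, a in zip(weights, rates))   # at this level every x_i = 0
    lo = hi
    while sum(alloc(lo)) < budget:                    # lower the level until budget is exceeded
        lo /= 2.0
    for _ in range(iters):
        mid = math.sqrt(lo * hi)                      # geometric bisection on lam
        if sum(alloc(mid)) > budget:
            lo = mid
        else:
            hi = mid
    return alloc(hi)

alloc = water_fill(weights=[1.0, 1.0, 2.0], rates=[0.01, 0.002, 0.0005], budget=4000)
```

Agents with a high marginal reliability per token (large w_i·a_i) receive tokens first. Every funded agent ends at the same marginal gain λ, which is the defining property of water-filling.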
This paper quantifies redundancy in reasoning LLMs, finding that 61-93% of chain-of-thought steps can be truncated without affecting correctness, and proves this redundancy is a structural consequence of length-agnostic outcome rewards.
Formal definition of reasoning redundancy: fraction of trailing steps that can be truncated while still yielding correct answer
Measured redundancy of 61-93% across four frontier models and two math benchmarks
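Following the definition above, redundancy is the share of trailing steps that can be cut while the truncated trace still yields the correct answer. In other words, it is one minus the length of the shortest correct prefix divided by the trace length. In the sketch below, `is_correct` is a hypothetical oracle; in practice it would force the model to answer from a given prefix and check the result.

```python
def redundancy(steps, is_correct):
    """Fraction of trailing chain-of-thought steps that can be truncated while the
    prefix still yields a correct answer; None if even the full trace is wrong."""
    n = len(steps)
    for k in range(n + 1):          # shortest prefix that already answers correctly
        if is_correct(steps[:k]):
            return (n - k) / n if n else 0.0
    return None

steps = ["x + 2 = 6", "subtract 2 from both sides", "x = 4",
         "check: 4 + 2 = 6", "so the answer is 4"]
r = redundancy(steps, lambda prefix: "x = 4" in prefix)   # toy oracle
print(r)  # 0.4
```

The trace reaches the answer at step 3 of 5, so the last two steps (40%) are redundant. The reported 61-93% means most of each trace came after the answer was already determined.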
Research finds that large language models (LLMs) exhibit human-like calibration biases: overconfidence on hard tasks and underconfidence on easy ones. The authors introduce LifeEval, a benchmark for evaluating calibration across difficulty levels.
LLMs are on average overconfident, with confidence exceeding accuracy
A hard-easy effect is observed: overconfidence on difficult tests, underconfidence on easy tests
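The hard-easy effect can be measured as mean confidence minus accuracy within each difficulty level. A positive gap means overconfidence and a negative gap means underconfidence. The sketch below uses toy records rather than LifeEval data, and LifeEval's own metrics may be defined differently.

```python
from collections import defaultdict

def calibration_gap_by_difficulty(records):
    """records: iterable of (difficulty, confidence in [0, 1], correct: bool).
    Returns difficulty -> mean confidence minus accuracy
    (positive = overconfident, negative = underconfident)."""
    conf, hits, count = defaultdict(float), defaultdict(int), defaultdict(int)
    for level, c, ok in records:
        conf[level] += c
        hits[level] += int(ok)
        count[level] += 1
    return {lvl: conf[lvl] / count[lvl] - hits[lvl] / count[lvl] for lvl in count}

# toy data: confident-but-wrong on hard items, hesitant-but-right on easy items
records = ([("easy", 0.8, True)] * 4
           + [("hard", 0.7, True)] + [("hard", 0.7, False)] * 3)
gaps = calibration_gap_by_difficulty(records)
```

Here the easy bucket gives about -0.20 (underconfident) and the hard bucket about +0.45 (overconfident), which is the pattern the study calls the hard-easy effect.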
Pope Leo XIV issued the encyclical 'Magnifica Humanitas' on safeguarding human dignity in the age of AI. The article highlights key sections on interpretability, development, biases, environment, algorithmic accountability, power amplification, and data as a common good, and recounts a podcast prediction from earlier in 2026 that the Pope would weigh in on AI.
Pope Leo XIV's encyclical on AI, 'Magnifica Humanitas,' released on May 25, 2026
The encyclical emphasizes AI systems are 'cultivated' not 'built,' with limited understanding of their inner workings
Together AI has released OSCAR (Offline Spectral Covariance-Aware Rotation), an INT2 KV cache quantization method for long-context LLM serving. Unlike prior rotation-based approaches that apply data-oblivious Hadamard transforms, OSCAR derives separate rotations for keys and values from attention-aware covariance structures estimated offline. At 2.28 bits per KV element, OSCAR reduces the BF16 accuracy gap to 3.78 points on Qwen3-4B-Thinking-2507 and 1.42 points on Qwen3-8B, while delivering approximately 8× KV memory reduction and up to 3× decode speedup at 100K context length.
OSCAR is a 2-bit KV cache quantization method using attention-aware rotations that maintain near-BF16 accuracy.
It derives rotations from query and value covariances via offline calibration, directing quantization noise to attention-insensitive directions.
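The mechanics behind rotation-based KV quantization can be shown in a few lines. Because the rotation R is orthogonal, rotating queries and keys by the same R leaves every attention logit unchanged, so quantization can happen in the rotated basis. The NumPy sketch below uses a plain PCA rotation estimated offline from toy query samples and a simple per-token 2-bit quantizer. It is an illustrative assumption, not OSCAR's attention-aware derivation or its noise-shaping.

```python
import numpy as np

def covariance_rotation(samples):
    """Orthogonal rotation whose columns are eigenvectors of the sample covariance,
    estimated offline from calibration samples (PCA-style)."""
    X = samples - samples.mean(axis=0)
    _, eigvecs = np.linalg.eigh(X.T @ X / len(X))   # columns are orthonormal
    return eigvecs

def quantize_2bit(x):
    """Per-token (per-row) asymmetric 2-bit quantization, i.e. 4 levels, dequantized."""
    lo, hi = x.min(axis=1, keepdims=True), x.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, (hi - lo) / 3.0, 1.0)
    return np.round((x - lo) / scale) * scale + lo

rng = np.random.default_rng(0)
d = 64
queries = rng.normal(size=(256, d)) * np.linspace(0.1, 3.0, d)   # anisotropic toy queries
keys = rng.normal(size=(1024, d))
R = covariance_rotation(queries)              # key rotation from query statistics
logits_exact = queries @ keys.T
logits_rot = (queries @ R) @ (keys @ R).T     # identical up to float error: R R^T = I
logits_q = (queries @ R) @ quantize_2bit(keys @ R).T   # 2-bit keys in the rotated basis
```

The only loss comes from quantizing `keys @ R`. Choosing R from attention-relevant statistics is what lets a method steer that error toward directions the queries barely use.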
AI progress keeps accelerating in 2026: open models lag in agentic capabilities, Google's Gemini is not yet competitive with Claude Code/Codex, American open models are on the rise, Anthropic and OpenAI are locked in fierce competition, and power structures are asserting control.
Open models trail by 5-6 months in agentic capabilities, a gap likely to widen to 12+ months.
Google's Gemini lacks a clear competitor to Claude Code and Codex.
In this article, we cover three topics: what to visualize during training, the tools that provide those visualizations, and the methods to capture model computations directly using hooks and breakpoints.
Visualizing loss curves and gradient magnitudes helps detect overfitting and vanishing gradients.
TensorBoard, Weights & Biases, Sacred, and Guild.ai are popular debugging tools.
AI agents delegate tasks across systems, but current architectures lack authorization models for these delegation chains, creating security gaps like ghost permissions and broken audit trails.
Multi-agent delegation often creates 'ghost permissions' that no one explicitly authorized.
Current protocols (MCP, A2A) solve connectivity but not authorization in delegation chains.
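One common way to rule out ghost permissions is capability attenuation. Each hop in a delegation chain can only narrow the authority it received, and every hop is recorded. The sketch below illustrates that general design under stated assumptions. `Grant`, `delegate`, and the scope strings are hypothetical and not part of MCP, A2A, or the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grant:
    """Hypothetical delegation token: who holds it, what it allows, and how it got there."""
    holder: str
    scopes: frozenset
    chain: tuple = ()      # audit trail of (delegator, delegatee, granted scopes) hops

def root_grant(user, scopes):
    return Grant(user, frozenset(scopes))

def delegate(parent: Grant, to: str, requested):
    """Delegation can only narrow authority: the child gets the intersection."""
    granted = parent.scopes & frozenset(requested)
    hop = (parent.holder, to, tuple(sorted(granted)))
    return Grant(to, granted, parent.chain + (hop,))

def authorize(grant: Grant, scope: str) -> bool:
    return scope in grant.scopes

user = root_grant("alice", {"read:calendar", "send:email"})
agent_a = delegate(user, "scheduler-agent", {"read:calendar", "read:files"})
agent_b = delegate(agent_a, "email-agent", {"send:email"})
print(authorize(agent_b, "send:email"), agent_b.chain)
```

`read:files` is never granted because Alice never held it. `send:email` is lost at the first hop, so the email agent cannot use it even though Alice could. The `chain` tuple keeps the full path for auditing.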
A new study from MIT and the University of Southern California shows that lawsuits filed without a lawyer in US federal courts have nearly doubled since ChatGPT went mainstream. One in five complaints now contains AI-generated text. Judges are resorting to drastic measures to cope with the flood of filings.
The pro se litigation rate jumped from 11% to 16.8%, with 41,490 cases in 2025, nearly double the pre-AI average.
AI text detection shows 18% of federal complaints contain AI-generated text in early 2026.
As autonomous AI systems expand from software into warehouses, delivery networks, and public spaces, existing governance frameworks face new challenges. Singapore's IMDA released an updated framework for agentic AI, focusing on risk assessment, accountability, and technical controls. Companies like Grab, JPMorgan, and Walmart are testing autonomous systems, while issues of safety, responsibility, and monitoring remain key concerns.
Autonomous AI systems introduce new risks in physical environments, affecting infrastructure and safety
Singapore's IMDA releases Agentic AI governance framework with iterative risk management
CometChat launches Calling Skills, allowing AI coding agents to integrate HD voice and video calling with a single skill file, supporting Ringing and Session modes, 23-point verification, and multiple frameworks.
CometChat introduces Calling Skills for AI agents, enabling quick integration of voice/video calling.
Two integration paths: Ringing (full call surface) and Session (link-driven).
This article presents 10 everyday tasks that can be automated using AI and the low-code platform n8n, complete with ready-to-use workflow templates. Tasks include job application assistance, email management, meeting notes, calendar scheduling, daily briefings, newsletters, social media posting, blog repurposing, lead generation, and invoice processing. Each section describes what the workflow does and provides a link to the template. The article emphasizes starting small and customizing workflows for personal needs.
AI automation with n8n requires minimal coding, making it accessible to non-developers.
Covers 10 common scenarios: job hunting, email, meetings, calendar, briefings, newsletters, social media, blog repurposing, lead generation, and invoices.
ModelBest (面壁智能) unveils ForgeTrain, the world's first production-grade LLM pretraining framework entirely written by AI, which outperforms NVIDIA's Megatron by 10%. The framework was used to train MiniCPM5-1B, a compact model that sets new records for intelligence density among sub-2B models.
ForgeTrain is the first production-grade LLM pretraining framework fully generated by AI.
It achieves 10% faster training than NVIDIA Megatron on equivalent hardware.
Google DeepMind's AlphaProof Nexus, powered by Gemini 3.1 Pro and the Lean theorem prover, has cracked 9 open problems from the Erdős list, including one unsolved for 56 years. It also proved 44 OEIS conjectures, solved a 15-year-old algebraic geometry problem, and improved a convex optimization bound — all at a cost of a few hundred dollars per problem.
AlphaProof Nexus solved 9 Erdős problems, 44 OEIS conjectures, and a 15-year-old algebraic geometry problem.
The system uses a loop of LLM (Gemini 3.1 Pro) and Lean compiler feedback, with four increasingly sophisticated agent variants.
OmniVoice Studio runs voice cloning, video dubbing, real-time dictation, and speaker diarization entirely on your own hardware. No API keys, no cloud account, and no subscription required. The project supports 646 languages for TTS and exposes an MCP server for integration with Claude, Cursor, or any MCP client.
Fully local operation with no cloud dependencies or subscription fees.
Supports 646 languages for TTS and 99 for transcription via WhisperX.
Andrej Karpathy updated his X bio to 'MTS @Anthropic' (Member of Technical Staff), sparking debate about flat hierarchies. While supporters praise the anti-bureaucratic culture, critics argue it devalues individual achievements and may harm career mobility for lesser-known employees.
Karpathy's MTS title at Anthropic ignites online controversy
Many top talents at Anthropic and OpenAI share the MTS title, with salaries ranging from $210k to $530k
Huawei unveiled its AI DC full-stack data infrastructure solution at the 2026 Innovation Data Infrastructure Forum in Paris, covering data lake, knowledge & memory platform, model engineering, agent framework, and data resilience to accelerate enterprise AI adoption.
Huawei launched AI DC full-stack data infrastructure solution at Paris forum
Solution includes data lake, knowledge/memory platform, model engineering, agent framework, and data resilience
Local models offer privacy, cost savings, control, and availability. While not as capable as frontier models, they are improving. This post explains how to set up local models in Zed using LM Studio, Ollama, or llama.cpp, and offers tips for effective use.
Local models provide privacy, lower cost, control, and always-availability.
They are less capable and slower than frontier models, but suitable for many tasks.
nilbox is a desktop GUI sandbox that provides real VM isolation for AI agents, using a zero-token architecture to keep API keys secure. It supports MCP servers, domain gating, and token usage monitoring.
nilbox runs AI agents inside a full virtual machine, not a container.
API keys are never exposed to the guest; the host proxy swaps them for trusted domains.
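The key-swapping idea can be sketched as a host-side proxy step. The guest only ever sees a placeholder. The host enforces a domain allowlist and substitutes the real key only for trusted hosts. The placeholder string, function names, and domains below are hypothetical; nilbox's actual implementation is not described here.

```python
from urllib.parse import urlsplit

PLACEHOLDER = "NILBOX_PLACEHOLDER_KEY"   # hypothetical dummy value visible inside the VM

def rewrite_request(url, headers, real_keys, trusted_domains):
    """Host-side proxy step: block non-allowlisted domains, and swap the placeholder
    for the real key only when the target host has a registered key.
    Exact host matching only; subdomain rules are out of scope for this sketch."""
    host = urlsplit(url).hostname or ""
    if host not in trusted_domains:
        raise PermissionError(f"domain not allowed: {host}")
    out = {}
    for name, value in headers.items():
        if PLACEHOLDER in value and host in real_keys:
            value = value.replace(PLACEHOLDER, real_keys[host])
        out[name] = value
    return out

headers = {"Authorization": f"Bearer {PLACEHOLDER}"}
real_keys = {"api.llm-provider.example": "sk-real"}      # held only on the host
trusted = {"api.llm-provider.example"}
print(rewrite_request("https://api.llm-provider.example/v1/chat", headers, real_keys, trusted))
```

Suppose a prompt-injected agent tries to send its "key" elsewhere. It can leak only the placeholder, and requests to unlisted domains never leave the host.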
This paper presents IsaacIPC, a robotic simulation framework coupling GPU-accelerated Incremental Potential Contact (IPC) with IsaacSim/Lab. It maps simulated deformation between simulation and visual meshes for real-time realistic rendering, aiding data collection and policy evaluation. It also introduces the Geometric Mortar Contact Potential (GMCP) for improved tactile sensing contact-pressure resolution. Evaluated on contact benchmarks and demonstrated on rigid-deformable simulations including a quadruped robot, dexterous hand, and UMI gripper.
IsaacIPC bridges high-fidelity simulation with real-time realistic rendering for contact-rich robotics.
Introduces Geometric Mortar Contact Potential (GMCP) to better resolve contact-pressure distributions on tactile surfaces.
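For context on the IPC side, here is the log-barrier used in the original Incremental Potential Contact formulation. It is zero beyond an activation distance d̂, smooth at d̂, and grows without bound as the distance d between two primitives approaches zero, which is what prevents interpenetration. The Python sketch below shows that standard barrier only. IsaacIPC's GPU implementation and the new GMCP potential are not reproduced, and `kappa` is an illustrative stiffness.

```python
import math

def ipc_barrier(d, dhat):
    """Standard IPC barrier b(d) = -(d - dhat)^2 * ln(d / dhat) for 0 < d < dhat,
    and 0 for d >= dhat; infinite at contact so penetration is never admitted."""
    if d >= dhat:
        return 0.0
    if d <= 0.0:
        return math.inf
    return -((d - dhat) ** 2) * math.log(d / dhat)

def contact_energy(distances, dhat, kappa):
    """Total barrier energy over candidate contact pairs, scaled by stiffness kappa."""
    return kappa * sum(ipc_barrier(d, dhat) for d in distances)

dhat = 1e-3
for d in (2e-3, 1e-3, 5e-4, 1e-4, 1e-6):
    print(d, ipc_barrier(d, dhat))
```

Pairs farther apart than d̂ contribute nothing, so only nearby primitives need to be evaluated each step.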
A new visual navigation method called MASt3R-Nav uses pixel-relative connectivity to build geometrically accurate maps without requiring global consistency, enabling more capable navigation than traditional topological graphs.
Proposes pixel-relative connectivity map as a novel representation.
Uses 3D grounded image matching for inter-image pixel correspondences.
Coronary microvascular dysfunction (CMVD) affects approximately 40%-60% of patients with ischemia and non-obstructive coronary arteries, yet diagnosis remains challenging because it relies on invasive functional testing or the subjective Thrombolysis In Myocardial Infarction (TIMI) flow grade. The TIMI Myocardial Perfusion Frame Count (TMPFC) offers an objective, angiography-based quantitative measure of CMVD, but its clinical translation is hindered by cumbersome manual calculation and insufficient validation. This study aims to develop and validate a deep learning-powered TMPFC calculation (DL-TMPFC), enabling integration into clinical workflows. In a cohort of 655 patients from three independent institutions, DL-TMPFC showed excellent agreement with expert manual measurements (bias: -0.93 frames; 95% limits of agreement: -5.33 to +3.47 frames; r = 0.98). DL-TMPFC markedly enhanced clinical feasibility by fully automating TMPFC and removing observer dependence, accurately identifying CMVD across a full spectrum of coronary pathologies and capturing CMVD severity on a continuous scale for quantitative risk stratification.
DL-TMPFC automates TMPFC calculation using a stenosis detection network and a territory-aware segmentation network.
Validated on 655 patients from three institutions with high agreement to manual measurements (r=0.98).
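Agreement statistics like these are usually Bland-Altman values: the bias is the mean difference between methods, and the 95% limits of agreement are bias ± 1.96 × SD of the differences. The sketch below computes them. It assumes the differences are taken as automated minus manual, which the summary does not state. It also checks the reported numbers: the limits are centered on the reported bias and imply an SD of about 2.24 frames.

```python
import statistics

def bland_altman(auto, manual):
    """Bias and 95% limits of agreement between two measurement methods
    (differences taken as auto - manual; limits = bias +/- 1.96 * SD)."""
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# consistency check on the reported values (frames)
reported_bias, reported_lo, reported_hi = -0.93, -5.33, 3.47
midpoint = (reported_hi + reported_lo) / 2            # equals the reported bias
implied_sd = (reported_hi - reported_lo) / 2 / 1.96   # about 2.24 frames

# toy frame counts, not study data
bias, lo, hi = bland_altman([10, 12, 9, 15], [11, 12, 10, 16])
```

The correlation r = 0.98 measures association, while the limits of agreement show how far an individual automated count may stray from the expert count. Method-comparison studies therefore usually report both.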
Digital avatar watermarking faces unique challenges: avatars are routinely post-processed with background replacement, reframing, and format conversion before deployment. This paper introduces the RAW benchmark with 50 synthetic avatar videos from 5 providers and 6 attacks simulating real-world workflows. Evaluation of 7 existing methods reveals that avatar-specific attacks degrade watermark recovery. The proposed WALT method embeds watermarks in UV texture space via 3D face reconstruction, achieving 92.4% robustness to zoom and 95.6% on background removal. The benchmark is released to facilitate research.
Avatar watermarking faces challenges like background removal and reframing.
RAW benchmark includes 50 synthetic avatar videos and 6 attacks.
This paper introduces a runtime execution model that enforces Reconstructive Authority (RAM) in autonomous agent systems: actions are permitted only if authority can be constructed from the current state. It extends the admit/deny state space with a third state, halt, for cases where authority is undefined due to incomplete or uncertain observability. A concrete execution protocol is defined including dynamic dependency resolution, authority reconstruction, and explicit decision semantics. A Recovery Loop integrates drift detection (IML) with execution control (ACP) to suspend execution, acquire missing information, and retry authority reconstruction. The model guarantees safety (no action without constructible authority) and conditional liveness (execution resumes when authority-defining variables become observable).
Autonomous agent systems fail not only due to incorrect decisions, but due to executing decisions whose authority no longer holds at runtime.
The model introduces a third state 'halt' to handle cases where authority is undefined due to uncertain observability.
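The three-state decision and the Recovery Loop can be sketched as follows. This is a hypothetical illustration, not the paper's protocol. It assumes that authority is "undefined" whenever a required variable is unobserved (represented as `None`) and that an `acquire` callback can fill in missing variables. The names `Decision`, `reconstruct_authority`, `execute_with_recovery`, and `acquire` are invented, and the IML drift detector and ACP controller are not modeled.

```python
from enum import Enum
from typing import Callable, Dict, Iterable, List, Optional

class Decision(Enum):
    ADMIT = "admit"
    DENY = "deny"
    HALT = "halt"

State = Dict[str, Optional[object]]

def reconstruct_authority(state: State, required: Iterable[str],
                          policy: Callable[[State], bool]) -> Decision:
    # Authority is undefined (HALT) if any authority-defining variable is unobserved.
    if any(state.get(var) is None for var in required):
        return Decision.HALT
    # Otherwise authority is constructible: admit or deny according to the policy.
    return Decision.ADMIT if policy(state) else Decision.DENY

def execute_with_recovery(action: Callable[[], object], state: State,
                          required: Iterable[str], policy: Callable[[State], bool],
                          acquire: Callable[[State, List[str]], State],
                          max_retries: int = 3):
    required = list(required)
    decision = reconstruct_authority(state, required, policy)
    retries = 0
    # Recovery loop: stay suspended, acquire missing information, retry reconstruction.
    while decision is Decision.HALT and retries < max_retries:
        missing = [v for v in required if state.get(v) is None]
        state = acquire(state, missing)
        decision = reconstruct_authority(state, required, policy)
        retries += 1
    # Safety: the action runs only when authority was actually constructed.
    if decision is Decision.ADMIT:
        return decision, action()
    return decision, None
```

In this sketch, liveness is only conditional. The action runs if `acquire` eventually makes every required variable observable within `max_retries`. Otherwise the call ends in `HALT` without acting.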
This paper introduces Quantum Frog, a two-player cooperative game with a quantized-time mechanic. Using reinforcement learning, the authors analyze difficulty scaling, optimal single-agent policy, cooperation gap, and emergent strategies. Key findings: the rush strategy is optimal; adding an uncoordinated player is harder than sextupling traffic; cooperative training boosts success rate by 32–34 percentage points; the emergent strategy is synchronized rushing.
The quantized-time mechanic makes the rush strategy universally optimal by minimizing time exposure to traffic.
Adding an uncoordinated second player is harder than sextupling traffic for a single expert player.
A new paper presents Context, the intelligence layer of the Magarshak Architecture, which replaces reactive chatbots with proactive goal-directed agents. The architecture relies on write-time context assembly, composable sandboxed wisdom programs, and proactive goal stream state machines. It proves six theorems including context stability and proactive dominance.
Replaces reactive chatbots with proactive agents that advance tasks without waiting for prompts.
Three mechanisms: write-time context assembly, composable sandboxed programs, proactive state machines.
Allen Wu introduces AgentToolBench-Code, an open-source benchmark that evaluates AI coding agents on 16 security scenarios. Testing Claude Code with Sonnet 4.6 and Haiku 4.5 shows Sonnet scoring +9 (12 caught, 3 silent fail, 1 noop) versus Haiku's +3 (8 caught, 5 silent fail, 3 noop). An initial tie was an artifact of a small corpus; the expanded set shows Sonnet's advantage in pattern recognition. Both models share structural failures in dependency trust and budget discipline. The work is reproducible for ~$3.50 in API costs and encourages community contribution.
AgentToolBench-Code is an open-source benchmark for security failures in AI coding agents.
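The reported totals fit a simple tally: +1 per caught scenario, -1 per silent failure, and 0 per no-op (12 - 3 = 9 for Sonnet; 8 - 5 = 3 for Haiku). This weighting is inferred from the arithmetic, not taken from the benchmark's documentation. The outcome labels and the `score` function are hypothetical.

```python
from collections import Counter

# Assumed weights, inferred from the reported totals (not confirmed by the benchmark)
WEIGHTS = {"caught": 1, "silent_fail": -1, "noop": 0}

def score(outcomes):
    """Sum the per-scenario weights for a list of outcome labels."""
    counts = Counter(outcomes)
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    return sum(WEIGHTS[label] * n for label, n in counts.items())

# Outcome counts as reported for the 16 scenarios
sonnet = ["caught"] * 12 + ["silent_fail"] * 3 + ["noop"] * 1
haiku = ["caught"] * 8 + ["silent_fail"] * 5 + ["noop"] * 3
```

Under this weighting a silent failure costs as much as a catch earns, while a no-op is neutral. That would explain why Haiku's three no-ops do not lower its score.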
AIntegriX is an open-source server that coordinates multiple ACP agents via a single API, enabling parallel execution, pipelines, and intelligent routing.
AIntegriX acts as an ACP multiplexer, spawning agents as subprocesses and exposing them through a single MCP/REST endpoint.
It supports orchestration modes: parallel, race, jury, and pipelines, along with auto-routing and webhook triggers.
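The orchestration modes can be illustrated with plain `asyncio`. This hypothetical sketch is not AIntegriX's API. Agents are stand-in coroutines rather than ACP subprocesses, and `jury` is assumed to mean a majority vote over the agents' answers.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List

Agent = Callable[[str], Awaitable[str]]

def make_agent(answer: str, delay: float) -> Agent:
    """Hypothetical stub agent: waits `delay` seconds, then returns a fixed answer."""
    async def agent(prompt: str) -> str:
        await asyncio.sleep(delay)
        return answer
    return agent

async def parallel(agents: List[Agent], prompt: str) -> List[str]:
    # Run every agent concurrently; answers come back in agent order.
    return list(await asyncio.gather(*(a(prompt) for a in agents)))

async def race(agents: List[Agent], prompt: str) -> str:
    # Return the first answer to finish and cancel the others.
    tasks = [asyncio.create_task(a(prompt)) for a in agents]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()

async def jury(agents: List[Agent], prompt: str) -> str:
    # Majority vote; ties go to the answer seen first.
    answers = await parallel(agents, prompt)
    return Counter(answers).most_common(1)[0][0]

async def pipeline(agents: List[Agent], prompt: str) -> str:
    # Each agent's output becomes the next agent's prompt.
    for a in agents:
        prompt = await a(prompt)
    return prompt
```

Race trades cost for latency. Jury pays for every agent in exchange for some robustness to a single agent's mistake.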
Corey Quinn comments on Pope Leo XIV's AI encyclical Magnifica Humanitas, which was influenced by Anthropic co-founder Christopher Olah. Quinn calls it the single greatest act of vendor lobbying.
Pope Leo XIV releases first AI encyclical Magnifica Humanitas
Anthropic co-founder Christopher Olah influenced the document
UUMuse is a cloud AI knowledge base platform where you upload files once and use them across GPT, Claude, DeepSeek, Qwen, and more, with cited answers, persistent memory, agent mode, a multi-expert debate feature (Spark), and flexible deployment as docs sites, APIs, or MCP servers.
Upload files once and query multiple AI models (GPT, Claude, DeepSeek, Qwen) with source citations.
Persistent memory remembers your writing style and project context across conversations.
This article introduces the process of SEO competitor analysis, including keyword gap analysis, a five-step method, and a regular check cadence. It also highlights Fox AI's free competitor analyzer tool, which uses Lighthouse audits to generate actionable playbooks.
SEO competitor analysis studies why rival sites outrank you. Keyword gap analysis uncovers untapped keywords. Run a full analysis quarterly and a light check monthly. Fox AI provides a free tool with real audit data.
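At its core, keyword gap analysis is a set difference: it finds keywords that competitors rank for and you do not. The sketch below is hypothetical and is not Fox AI's tool. It assumes you already have keyword lists, for example exported from a rank tracker. The prioritization rule, which lists keywords shared by more competitors first, is also an assumption.

```python
def keyword_gap(your_keywords, competitor_keywords):
    """Return (keyword, competitors) pairs for keywords rivals rank for and you do not.

    competitor_keywords: dict mapping competitor domain -> iterable of keywords.
    """
    yours = {k.lower().strip() for k in your_keywords}
    gap = {}
    for competitor, keywords in competitor_keywords.items():
        for kw in keywords:
            norm = kw.lower().strip()
            if norm not in yours:
                gap.setdefault(norm, set()).add(competitor)
    # Assumed heuristic: keywords covered by more competitors are listed first.
    return sorted(gap.items(), key=lambda item: (-len(item[1]), item[0]))

mine = ["seo audit", "site speed"]
rivals = {
    "a.example": ["SEO audit", "backlink checker", "schema markup"],
    "b.example": ["backlink checker"],
}
gaps = keyword_gap(mine, rivals)
```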
Experts warn that AI-generated news sites masquerading as local outlets, known as 'pink-slime' journalism, have appeared in regional Australia, raising concerns about misinformation and erosion of trust in media. The sites were traced to an Australian living overseas who called it a failed experiment.
AI-generated news sites targeting regional WA communities were traced to an Australian living overseas.
The sites, including The Bunbury Guardian, were taken down after an ABC investigation.
The software industry is undergoing an unprecedented metamorphosis. From the simple statistical completion of early coding assistants, through conversational chatbots and the failure of multi-agent systems, we have arrived at the era of the Agentic Loop. This comprehensive guide analyzes the entire evolution, from the Completion paradigm to the revolutionary Ralph Loop that is redefining how we write code.
AI-assisted coding evolved from statistical code completion (2021-2022) to the Agentic Loop paradigm.
Tools like Codex and GitHub Copilot were based on statistical models, lacking task understanding and long-term reasoning.
This tutorial provides a detailed guide to building an advanced federated learning experiment using NVIDIA FLARE, comparing FedAvg and FedProx on a non-IID CIFAR-10 dataset. Client data is partitioned using a Dirichlet distribution to simulate realistic label imbalance. The NVFlare Job API is used to define and launch federated jobs, while the Client API handles local training and model exchange. Complete code implementation and experimental results visualization are provided.
Build federated learning experiments with NVIDIA FLARE to compare FedAvg and FedProx.
Use a Dirichlet distribution (alpha=0.3) to partition CIFAR-10 across 3 non-IID clients.
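The data split and the two aggregation ideas can be shown in plain NumPy, independent of NVFlare. This sketch does not use the NVFlare Job API or Client API. The function names and the `mu` value are hypothetical. For each class, the partition draws client proportions from Dirichlet(alpha), so a smaller alpha gives more skewed label distributions. FedAvg averages client parameters weighted by local sample count. FedProx adds a proximal term (mu/2)·||w - w_global||² to each client's local loss.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=3, alpha=0.3, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Fraction of class c assigned to each client; small alpha -> more skew.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]

def fedavg(client_weights, client_sizes):
    """FedAvg: average flattened client parameters, weighted by local sample count."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def fedprox_penalty(local_w, global_w, mu=0.01):
    """FedProx proximal term (mu / 2) * ||w - w_global||^2, added to the local loss."""
    d = np.asarray(local_w, dtype=float) - np.asarray(global_w, dtype=float)
    return 0.5 * mu * float(d @ d)
```

With strong label skew, local models drift toward their own class mix. The proximal term pulls each local update back toward the current global model, which is the motivation for comparing FedProx against FedAvg on such a split.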
Anthropic co-founder Chris Olah spoke at the Vatican on Pope Leo XIV's encyclical on AI, highlighting the need for ethical scrutiny, global responsibility, and moral imagination. He identified three key questions for the Church: duty to the global poor, human flourishing, and discernment of AI's nature.
Chris Olah addressed the Vatican on the Pope's AI encyclical.
He acknowledged the incentives that can conflict with doing right in AI labs.
China is restricting overseas travel for top AI researchers at private firms like Alibaba and DeepSeek, requiring official approval to leave the country, due to fears of data leaks and talent poaching.
China requires top AI researchers to obtain permission before traveling abroad.
The policy applies to private companies like Alibaba and DeepSeek.
Google Cloud COO Francis de Souza urges companies to integrate security into their AI strategy from day one, emphasizing that AI security is a boardroom issue, not just a technical one.
Google Cloud COO calls for security to be built into AI strategy from the start
AI security needs attention and resources at the board level
From the 2017 "Slaughterbots" video to Anthropic's ongoing battle with the Pentagon, AI's role in warfare has moved from science fiction to reality. This article traces the evolution of AI warfare, highlighting Project Maven, the ambiguity of autonomous weapons definitions, the failure of international regulation, and the complex relationship between tech companies and the military.
The 2017 Slaughterbots video and Project Maven demonstrated the real-world threat of AI weapons, with Google initially involved in Project Maven.
Anthropic's attempt to impose red lines against autonomous lethal weapons faces pushback from the US government.
This episode of The Good Robot explores how feminist principles and decentralized infrastructure could transform cloud infrastructure from a corporate service into a public commons. Friederike von Franqué, policy advisor at Wikimedia Germany, discusses examples from Frankfurt's energy-intensive data centres to Stockholm's municipally owned fibre network, advocating for environmental accountability and community-driven design.
Friederike von Franqué advocates for feminist and decentralized approaches to cloud infrastructure.
The episode contrasts Frankfurt's high-energy data centres with Stockholm's communal fibre network.
This article explores cognitive security as a key subfield of AI safety, focusing on protecting human cognition from AI-induced harms such as misinformation and the exploitation of cognitive biases. It discusses the relationship between cognitive security and AI safety, along with research directions and open challenges.
Cognitive security is an important cause area within AI safety.
It addresses risks like misinformation and exploitation of cognitive biases by AI.
A retrofit of a custom series elastic element to a black-box actuator improved force control bandwidth from 10.32 Hz to 30.32 Hz (2.93x), outperforming a commercial sensor by 7.63% at a cost of 25 GBP.
A torsional series elastic (SE) element with a stiffness of 2155.4 Nm/rad was designed via finite-element (FE) analysis.
Open-loop force control bandwidth increased by 2.93x after retrofit.
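A series elastic element turns torque sensing into angle sensing. If the spring is linear, the transmitted torque is τ = k·(θ_motor - θ_load). The sketch below assumes such a linear spring and two angle measurements on either side of it; the summary does not say how deflection is measured. The function names are hypothetical, and the stiffness is the value reported above.

```python
STIFFNESS_NM_PER_RAD = 2155.4  # torsional stiffness reported for the SE element

def sea_torque(motor_angle_rad: float, load_angle_rad: float,
               stiffness: float = STIFFNESS_NM_PER_RAD) -> float:
    """Estimate transmitted torque from spring deflection (Hooke's law, tau = k * dtheta)."""
    return stiffness * (motor_angle_rad - load_angle_rad)

def deflection_for_torque(torque_nm: float,
                          stiffness: float = STIFFNESS_NM_PER_RAD) -> float:
    """Spring deflection (rad) needed to transmit a given torque."""
    return torque_nm / stiffness
```

The stiffness choice is a trade-off. A stiffer spring deflects less for the same torque, which favors bandwidth. However, it also needs finer angle resolution to resolve small torques.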
The paper introduces 'algometrics', a framework for time series forecasting when predictive models influence the data they aim to predict. It distinguishes historical risk from deployment risk and proves three key results: deployment risk is not identifiable from passive historical data alone; historical model rankings can invert due to crowding; and randomized actions can identify short-horizon linear feedback with a finite-sample bound. The findings suggest that benchmarks in algorithmic markets should report feedback sensitivity alongside predictive accuracy.
Introduces algometrics framework for forecasting with algorithmic feedback.
Proves deployment risk is not identifiable from passive historical data alone.
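The contrast between passive data and randomized actions can be seen in a toy linear simulation. This is not the paper's estimator or its finite-sample bound. It assumes that outcomes depend linearly on an underlying signal plus beta times the deployed action. All coefficients (0.8, 0.5, the noise scales) are arbitrary. In the passive regime, actions track the signal because models act on their forecasts. Regressing outcome on action then mixes the feedback effect with the signal, so the estimate is biased. When actions are randomized, the same regression recovers beta.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of a simple least-squares fit of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate(beta=0.5, n=5000, randomized=True, seed=0):
    """Toy feedback model: outcome = signal + beta * action + noise."""
    rng = np.random.default_rng(seed)
    signal = rng.normal(size=n)  # the quantity forecasters try to predict
    if randomized:
        action = rng.choice([-1.0, 1.0], size=n)  # actions assigned at random
    else:
        action = 0.8 * signal + 0.1 * rng.normal(size=n)  # actions follow forecasts
    outcome = signal + beta * action + rng.normal(scale=0.5, size=n)
    return ols_slope(action, outcome)
```

In the passive case, the slope converges to roughly beta + cov(signal, action)/var(action), about 1.73 here, rather than 0.5.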
The California State University system has signed multimillion-dollar contracts with OpenAI to provide ChatGPT Edu, but a survey reveals that majorities of students and faculty are skeptical of AI's educational benefits, worrying about its impact on jobs, creativity, and the environment.
California State University signed a $13 million annual contract with OpenAI to become the first AI-powered university system.
Survey shows 65% of students and 59% of faculty doubt AI's overall benefit to education, despite widespread use.
As Wyoming faces another tinderbox fire season, high-tech home fire systems are starting to catch on across the West. One of the fastest growing is a Jackson Hole, Wyoming, company that makes AI sprinklers that are saving homes from wildfires.
Frontline Wildfire Defense's AI sprinkler system activated at 61 properties during California's Palisades Fire; only 2 of those homes were lost, due to embers entering through ventilation.
Wyoming faces extreme drought and fire risk in 2026, reminiscent of the 1988 Yellowstone fires.
A look at the alarming trend of Suno AI users exclusively listening to their own AI-generated music, and the possible reasons behind it. The author found that many users boast about abandoning Spotify, but none were willing to explain why they prefer AI slop over real music. Theories include narcissism and laziness, with the author leaning towards laziness.
Users on the Suno subreddit boast about listening only to AI-generated music, abandoning Spotify.
The author couldn't find anyone willing to explain why they prefer AI slop over real music.
Blockchain ecosystems are losing developers across the board while AI projects dominate GitHub growth. Weekly crypto commits have dropped from about 850,000 to 210,000 since early 2025, and active developers have declined 56% to around 4,600.
Weekly crypto commits have fallen roughly 75% since early 2025.
ContextVault is a browser extension that captures AI conversations in real-time across major LLM platforms like ChatGPT, Claude, and Gemini, storing them locally in IndexedDB. It allows one-click export as Markdown or ZIP, ensuring your data never leaves your device. Free, open source, no accounts or backend required.
Real-time capture across 7 LLM platforms including ChatGPT, Claude, and Gemini.
All data stored locally in IndexedDB, no cloud sync or third-party access.
This issue features a lecture delivered at Oxford University on the choice between exploring the future and retreating from the present in the face of rapid AI progress. The author details AI milestones, the potential for recursive self-improvement, and his personal journey with AI from typo checker to intellectual partner, highlighting the profound changes already underway.
AI progress, as measured by the Epoch Capabilities Index, is accelerating rapidly, with milestones from bar exam to Math Olympiad gold medals.
The author argues that we must choose to explore the future rather than retreat, embracing the power and risks of AI.
RED is a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms. It adapts to runtime environmental changes by assigning intermediate sub-deadlines, leveraging MIMONet weight sharing, and reconstructing computation graphs. Implemented on NVIDIA Jetson and Apple M-series platforms, RED consistently outperforms existing methods in throughput, deadline satisfaction, robustness, adaptability, and overhead.
RED assigns intermediate sub-deadlines to accommodate evolving computation graphs and asynchronous inference.
It leverages MIMONet's shared parameters to improve schedulability through workload refinement and graph reconstruction.
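The summary does not say how RED computes its intermediate sub-deadlines. A common textbook heuristic, shown here as an illustrative stand-in rather than RED's method, splits an end-to-end deadline across pipeline stages in proportion to each stage's worst-case execution time (WCET). The function name and inputs are hypothetical. MIMONet weight sharing and graph reconstruction are not modeled.

```python
def assign_subdeadlines(release: float, deadline: float, stage_wcets):
    """Split an end-to-end deadline into absolute per-stage sub-deadlines,
    giving each stage a share of the window proportional to its WCET."""
    total = sum(stage_wcets)
    if total <= 0:
        raise ValueError("stage WCETs must sum to a positive value")
    window = deadline - release
    subdeadlines, t = [], release
    for wcet in stage_wcets:
        t += window * wcet / total
        subdeadlines.append(t)
    return subdeadlines

def stages_on_time(finish_times, subdeadlines):
    """True for each stage that finished by its sub-deadline."""
    return [f <= d for f, d in zip(finish_times, subdeadlines)]
```

Intermediate sub-deadlines let a scheduler detect lateness at an early stage and react, for example by reordering work, rather than discovering a miss only at the end of the graph.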
With AI companies snapping up RAM and storage drives to build data centers and train LLMs, prices of vital components and pre-built rigs have skyrocketed. As a result, gamers and DIY PC builders are putting upgrades and new purchases on hold in the hope that the market will eventually correct itself. If you've been putting off a new build or upgrade, this Memorial Day weekend you can save $176 on a 64GB DDR5 RAM kit at Best Buy, bringing the price back to just under $1,000.
64GB (2×32GB) DDR5 RAM kit discounted by $176 to $999.99.
Supports AMD Expo and Intel XMP 3.0 overclocking up to 6400MHz.
Y Combinator founder Paul Graham ignores emails that are clearly written by AI; they feel 'like being lied to,' he says. That's coming from one of OpenAI's earliest investors. Studies suggest his reaction is anything but unusual.
Uber president Andrew Macdonald says it's 'hard to draw a line' between AI spending and deliverable features, as the company reportedly exhausted its annual AI budget four months into 2026.
Uber exhausted its annual AI budget four months into 2026
President questions direct link between AI spending and user features
Robotic assistants in long-term human-robot collaboration need to assist users under partial observations while leveraging cross-day interaction history. Since human traits are often unknown at first, a passive infer-then-act approach is ineffective. We propose PACT, an ask-or-act framework that evaluates contextual sufficiency to decide whether to seek clarification before acting. Using reinforcement learning, PACT improves assistance accuracy and clarification utility over passive baselines in multi-day embodied scenarios.
PACT framework enables robots to proactively ask for clarification when needed, improving assistance reliability.
Implemented via reinforcement learning, introducing a clarification utility metric. Outperforms passive inference in multi-day collaborations.
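PACT learns its ask-or-act decision with reinforcement learning. The hand-coded rule below is only a simplified stand-in that shows the trade-off. It approximates "contextual sufficiency" by the probability of the most likely user-preference hypothesis. It also assumes that asking always reveals the true preference, at a fixed cost. The names, the belief format, and the cost model are all hypothetical.

```python
from typing import Dict, Tuple

def ask_or_act(belief: Dict[str, float], ask_cost: float) -> Tuple[str, str]:
    """Return ("ask" | "act", most likely hypothesis).

    Acting now succeeds with probability p_best. Asking is assumed to reveal the
    true preference (success probability 1) but costs ask_cost in the same units.
    """
    total = sum(belief.values())
    if total <= 0:
        raise ValueError("belief must have positive probability mass")
    best, mass = max(belief.items(), key=lambda kv: kv[1])
    value_act = mass / total   # expected success if we act immediately
    value_ask = 1.0 - ask_cost  # expected success after clarifying, minus its cost
    return ("ask" if value_ask > value_act else "act"), best
```

A passive infer-then-act policy corresponds to always choosing "act". This rule asks only when uncertainty is high enough that a clarification is worth its cost.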
This paper proposes a reinforcement learning framework that modulates a constant reference trajectory to perform compact, position-constrained quadrotor inversions while remaining compatible with traditional trajectory generation and tracking. In simulation, the method reduces position RMSE by 32% and settling time by 57% relative to the strongest optimization-based baseline. Hardware experiments demonstrate successful inversion across multiple yaw configurations with position RMSE below 0.35 m.
Bidirectional thrust enables inverted flight, perching, and sensing for quadrotors.
Prior methods struggle with actuator saturation and motor reversal delay.
A study compares deep learning models with linear interpolation for imputing satellite data lost to cloud cover in aquatic monitoring. CNN-based models, and the plain CNN in particular, outperformed linear interpolation across four lakes. The imputed data improved the reliability of algal bloom detection using PlanetScope SuperDove imagery.
Deep learning models significantly outperform linear interpolation for filling gaps in multispectral satellite data.
CNN achieved the best performance across most of the four studied lakes.
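For reference, the linear-interpolation baseline fills each pixel's cloud-masked observations from the nearest clear dates before and after. This sketch handles one pixel's time series and assumes masked values are stored as NaN and dates are sorted. The CNN models are not sketched, and the function name is hypothetical. `np.interp` holds the first and last clear values constant outside the observed range.

```python
import numpy as np

def fill_gaps_linear(t, values):
    """Linearly interpolate NaN (cloud-masked) observations in one pixel's time series.

    t: sorted acquisition times (e.g. day of year); values: reflectance or index values.
    """
    t = np.asarray(t, dtype=float)
    values = np.asarray(values, dtype=float)
    valid = ~np.isnan(values)
    if valid.sum() < 2:
        raise ValueError("need at least two cloud-free observations")
    filled = values.copy()
    filled[~valid] = np.interp(t[~valid], t[valid], values[valid])
    return filled
```

This baseline cannot reproduce short-lived events such as a bloom that peaks between two clear acquisitions, which is one plausible reason learned models that use spatial context can do better.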
Protein-ligand modeling underpins computational drug discovery. Existing benchmarks typically evaluate whether a protein and ligand interact and how strongly they bind, but provide limited evidence of whether models can localize binding sites or identify non-covalent interactions. To address this, we introduce InteractBind, a large-scale dataset of ~100k protein-ligand pairs with a benchmark for fine-grained evaluation. The core task is binding-site localization using interaction maps of six non-covalent interaction types. Evaluating eight existing models reveals limited binding-site localization despite strong binary binding prediction, with marked variation across interaction types. InteractBind encourages development of more interpretable and physically grounded models.
InteractBind includes ~100k protein-ligand pairs and a benchmark focused on binding-site localization.
It uses residue-atom interaction maps covering six non-covalent interaction types to assess model understanding.
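One way to turn residue-atom interaction maps into a binding-site localization score is sketched below. The layout is a hypothetical assumption: an array of shape (residues, ligand atoms, 6 interaction types). A residue is labeled as part of the binding site if it has any interaction with any ligand atom. The metric shown, precision over the top-k scored residues with k equal to the number of true binding residues, is an illustrative choice, not necessarily the benchmark's metric.

```python
import numpy as np

INTERACTION_TYPES = 6  # six non-covalent interaction channels (order assumed)

def binding_site_labels(interaction_map):
    """Residue-level labels: True if a residue has any interaction of any type.

    interaction_map: array of shape (n_residues, n_ligand_atoms, INTERACTION_TYPES).
    """
    m = np.asarray(interaction_map, dtype=bool)
    if m.ndim != 3 or m.shape[2] != INTERACTION_TYPES:
        raise ValueError("expected shape (residues, atoms, 6)")
    return m.any(axis=(1, 2))

def top_k_precision(residue_scores, labels, k=None):
    """Fraction of the k highest-scoring residues that are true binding-site residues.

    k defaults to the number of true binding-site residues.
    """
    labels = np.asarray(labels, dtype=bool)
    k = int(labels.sum()) if k is None else k
    if k <= 0:
        raise ValueError("k must be positive")
    top = np.argsort(-np.asarray(residue_scores, dtype=float))[:k]
    return float(labels[top].mean())

def per_type_counts(interaction_map):
    """Number of residue-atom contacts for each interaction type."""
    return np.asarray(interaction_map, dtype=bool).sum(axis=(0, 1))
```

Because a model can predict "binds" correctly while ranking the wrong residues highest, a residue-level score like this separates binding-site localization from binary binding prediction.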