Pope Leo XIV warns of AI risks in his first encyclical, Magnifica Humanitas, urging human dignity at the core of tech governance, with proposals on warfare, labor, and ethics.
Pope Leo XIV's encyclical Magnifica Humanitas focuses on safeguarding humanity in the AI era.
He warns of 'Babel syndrome' where profit idolatry and uniformity threaten human dignity.
Learn how to use the Mimesis library to generate a balanced, counterfactual dataset that helps analyze potential bias in your models. This hands-on guide uses a loan approval classifier trained on biased data to demonstrate creating synthetic clones with identical financial profiles but different genders, isolating and detecting model discrimination.
Mimesis enables rapid generation of statistically balanced counterfactual data for bias auditing.
By creating cloned test subjects with identical income but different genders, protected attributes are isolated.
Pitch Agent is a new AI feature from Pitch that generates on-brand presentations by learning from your team's templates, design language, and image style, allowing you to refine via chat. It lives inside the Pitch workspace for end-to-end collaboration.
Pitch Agent builds presentations from your template and design language, not just colors.
You can refine slides via chat without leaving the editor.
As MCP crosses 97 million monthly SDK downloads and AI agents move into production workflows, authentication has become the most critical infrastructure decision. This guide ranks eight leading platforms on spec compliance, enterprise identity depth, integration breadth, and real-world fit for 2026 deployments.
MCP has grown from an Anthropic experiment to an industry standard, with 97M+ monthly SDK downloads and donation to the Agentic AI Foundation under Linux Foundation.
Authentication is now an infrastructure-layer concern as AI agents autonomously interact with enterprise systems.
ServiceNow, an enterprise software company with 29,000+ employees and $3.57B quarterly revenue, has heavily invested in AI through acquisitions, partnerships, and a $1B venture fund. The article highlights two key AI use cases: reducing agent documentation time by 80% using embedded generative AI in ITSM/CSM workflows, and predicting customer escalations with machine learning, increasing proactive engagements from 11% to 68% with a 3% false-positive rate.
ServiceNow invests heavily in AI, including acquiring Passage AI, partnering with NVIDIA, and committing $1B to AI startups.
Now Assist reduces resolution note time by 80% and saves agents minutes per use.
AgentSlice is a free, open-source workflow kit that makes AI coding agents like Cursor, Claude Code, Codex, and Windsurf ask for approval before editing. It uses Markdown files to define phases and gates, preventing context drift, wandering edits, and unauthorized changes.
Open-source Markdown workflow kit for AI coding agents
A developer created a debugging challenge to distinguish true talent from AI-generated code in the age of AI. The challenge encourages using AI agents but is designed to be unsolvable by AI alone. It is live for only 24 hours and seeks honest feedback.
Challenge aims to highlight real talent against AI-generated slop.
AI agents allowed, but challenge designed to be unsolvable by AI alone.
ReplylessAI launches Sequences, enabling users to send outbound email sequences directly from its AI email app without expensive sales tools. The app connects with Gmail, Outlook, and more, offering AI-powered inbox organization, draft generation, and automation. Starting at $9/month.
ReplylessAI Sequences lets you send email sequences from your existing AI email app.
No costly outreach tools needed; built-in delivery and click tracking.
HTML Deployer is a Chrome extension that extracts AI-generated HTML from ChatGPT, Claude, and Gemini, allowing users to preview, download ZIP, or publish directly to Netlify, GitHub, FTP, or self-hosted servers. It's designed for developers, founders, marketers, agencies, and beginners.
Extract HTML from ChatGPT, Claude, and Gemini.
Preview, export ZIP, or publish directly to cloud, FTP, or self-hosted.
An engineer shares how using AI with a role-based, incremental file-feed approach helped quickly understand and fix a bug in an unfamiliar legacy Node.js microservice. The key was treating AI as a structured thinking partner, not a search engine. Root cause identified in 90 minutes, fix in 11 lines.
Don't ask AI to explain legacy code; give it a role and feed files incrementally
AI surfaced a silent undefined return in a field transformation function, the root cause of a four-year-old bug
An Alabama high school partners with Toyota to train students for skilled trades like industrial maintenance, addressing a critical shortage of workers as AI automates many white-collar jobs. These roles pay over $40 an hour and are in high demand.
The U.S. faces a severe shortage of skilled tradespeople, needing 1.9 million manufacturing workers by 2033.
Huntsville Center for Technology (HCT) launched an Inditech program with a $1M Toyota investment to train industrial maintenance workers.
Google didn’t just ship an update at I/O 2026. They redrew the map. Google Antigravity 2.0 is a full platform pivot from AI-assisted coding to multi-agent orchestration as the core development model.
Antigravity 2.0 is a completely rebuilt platform centered on multi-agent orchestration, not just an IDE refresh.
New features include a standalone desktop app, a Go-based CLI, an SDK, and managed agents via the Gemini API.
CoreWeave introduces a cloud platform purpose-built for AI, overcoming the bottlenecks of general-purpose clouds for GPU-intensive workloads. Integrated infrastructure, data, orchestration, and expert support enable the full AI lifecycle—training, inference, iteration—for pioneers like OpenAI and IBM, delivering faster iteration, maximum performance, and transformative partnership.
CoreWeave Cloud is built from the ground up for AI workloads, avoiding limitations of traditional clouds.
It supports the full AI lifecycle including training, inference, and continuous iteration with optimized GPU clusters.
WorkOS introduces auth.md, an open protocol for AI agent registration using a Markdown file published at a service's domain. It defines two registration flows: agent verified (ID-JAG based, no human interaction) and user claimed (OTP based, no provider dependency). The protocol leverages existing OAuth standards and is not tied to WorkOS infrastructure.
auth.md is a Markdown file at a service's domain that describes how agents register and obtain scoped credentials.
Two flows: agent verified (ID-JAG synchronous verification) and user claimed (OTP email verification).
Cordium is a free and open source, self-hosted, identity-based sandbox platform built on Kubernetes and Octelium. It provides isolated, reproducible general-purpose sandboxes for developers and AI agents, accessible via web terminal, SSH, CLI, and gRPC APIs. Its key innovation is secretless access to infrastructure: instead of injecting credentials, each sandbox uses a dedicated Octelium identity to access authorized resources through a proxy, eliminating credential sprawl.
Cordium is a FOSS sandbox platform built on Kubernetes and Octelium, offering isolated, reproducible environments.
Secretless infrastructure access: no credentials are placed inside sandboxes; access is mediated via Octelium identity-aware proxy.
MashuPack is a developer tool that compiles selected parts of a codebase into a single clean text file for use in browser-based AI tools like ChatGPT and Claude, overcoming file-count limits and messy context assembly.
Select specific parts of a repository and compile into one text file
Designed for browser-based AI workflows, bypassing file and upload limits
Curlo is a privacy-first macOS app for searching, previewing, and organizing large sound libraries. It lets you find SFX or music by describing what you want to hear, with fully local offline search, metadata management, AI auto-tagging, and integration with DAWs.
Local offline semantic search for audio files
Search by natural language description, filename, tags, and more
This article discusses how AI coding tools reduce the time cost of adding features, but also introduce scope creep. The author shares his experience, emphasizing that when each feature takes only hours instead of days, discipline and scope limits become crucial.
AI dramatically reduces feature development time, increasing the temptation to add 'just one more.'
Each individual feature seems like a good idea, but cumulatively they cause scope creep.
Alister Palmer realized his newsletter ForwardPass hit 100 subscribers in a week and identified two limitations of traditional newsletters: simultaneous global publication causing time zone issues, and subscribers lacking control over frequency. He developed the ForwardPass MCP, allowing users to customize delivery time and frequency via AI. The article provides setup instructions for Claude and ChatGPT.
ForwardPass reached 100 subscribers in a week, prompting reflection on newsletter limitations.
ForwardPass MCP addresses personalization of publish time and frequency.
The author shares how he simplified his AI coding flow by moving from complex, LLM-driven commands to deterministic building blocks, reducing token waste and increasing reliability.
Switched from opencode to Pi Agent for a minimal, extensible coding harness.
Replaced LLM-driven commands with deterministic extensions for tasks like code review and SonarQube checks.
This paper introduces PIMbot, a framework that manipulates outcomes in multi-robot RL via two complementary levers: incentive manipulation of the reward channel and policy manipulation of an agent's own actions. An adaptive multi-objective controller balances these levers online. Experiments in Gazebo simulation and on NVIDIA Jetson Orin Nano embedded device demonstrate effectiveness, positioning PIMbot as a stress-test tool for vulnerabilities in multi-robot cooperation.
PIMbot uses two manipulation levers: reward channel incentive manipulation and policy manipulation.
An adaptive multi-objective controller balances the levers online.
Event cameras are well suited for visual odometry under high-speed motion and challenging lighting conditions due to their low latency, high temporal resolution, and high dynamic range. Deep Event Visual Odometry (DEVO) demonstrated that monocular event-only odometry can achieve strong performance by combining sparse patch tracking, learned patch selection, recurrent correspondence refinement, and differentiable bundle adjustment. In this project, we extend DEVO with a sparse point-cloud export pipeline. Rather than modifying the core odometry formulation, our approach exposes the internal 3D structure already estimated by DEVO and converts it into an explicit point-cloud representation for visualization and further processing. In addition, we implement a practical workflow for data export, format conversion, and point-cloud cleanup. The resulting system preserves the original visual odometry pipeline while enabling sparse geometric scene output. Experiments on the BOARD SLOW sequence show that the exported sparse cloud is locally consistent with EMVS reconstructions, achieving high precision at a 5 cm threshold, while also highlighting the expected limitations in density, completeness, and sensitivity to accumulated odometry noise.
Event cameras excel in high-speed and low-light conditions for odometry.
DEVO achieves strong monocular event odometry via sparse tracking and bundle adjustment.
EVE-Agent introduces evidence verifiability to self-evolving search agents by modifying the proposer–solver framework with an evidence verifier that rewards spans based on marginal accuracy gain. This ensures each training example includes a source-grounded span that explains why it should be trusted, leading to improved evidence-grounded correctness without human annotations.
Self-evolving agents require verifiable evidence in training examples to avoid rewarding unsupported but fluent instances.
EVE-Agent extends the proposer–solver framework with an evidence verifier that rewards spans based on their contribution to answer accuracy.
SciAtlas integrates over 43 million papers from 26 disciplines into a knowledge graph with 157 million entities and 3 billion triplets, enabling AI agents to perform topologically-aware scientific reasoning and reduce logical hallucinations.
Integrates 43M+ papers across 26 disciplines into 157M entities and 3B triplets.
Introduces a neuro-symbolic retrieval algorithm with tri-path recall and graph reranking.
Pretzel is an experimental live AI music agent that lets everyone chat with the same AI and hear synchronized music in real time. Built during Google IO hackathon, it uses a Rust agent harness called Talon for easy self-hosting.
Pretzel is a web-synchronized music sequencer controlled by an AI agent.
All users interact with the same AI agent and hear the same music simultaneously.
Pi is a minimal, hackable terminal coding harness that lets you build the AI coding agent workflow you actually want. It keeps the core small and clean, while offering extensions, skills, and packages for deep customization. It has achieved notable usage share in the OpenAI/Codex ecosystem.
Minimal and hackable terminal coding harness
Customizable via extensions, skills, and packages shared through npm/git
Lynote Humanize Text is an open-source toolkit for humanizing AI-generated text, featuring a production-grade Standard Pipeline that uses multi-step LLM rewriting and cross-engine translation to bypass AI detectors like Turnitin and GPTZero. It offers three tiers of humanization with the Lynote.ai platform providing intelligent selection. The repository includes reference implementations, n8n workflow support, and achieved a 9.1/10 expert quality score with 100% key information retention.
Open-source toolkit to convert AI text into human-like writing, bypassing major AI detectors.
Production-ready Standard Pipeline uses a 5-step chain involving DeepSeek rewrites and multi-engine translation.
At the 2026 China AIGC Industry Summit, Zhang Lu, Founding Partner of Fusion Fund, highlighted that the focus of AI compute demand is shifting from training to inference, with inference expected to account for 70% of compute. Communication in data centers may consume 100 times more electricity than computation, making technologies like optical communication critical. The biggest bottleneck for physical AI is the scarcity of high-quality real-world data. Healthcare, space, and nanorobots are the three most promising application directions.
Inference compute share will rise from 50% to 70%, becoming the core optimization target for AI infrastructure.
Communication in data centers can consume over 100 times more electricity than computation, driving innovations like optical communication.
Over the weekend: Musk, Zuckerberg, and Sacks killed Trump's draft AI safety executive order in three Wednesday-night phone calls. Anthropic closed a $30B+ round the same Saturday — while Microsoft quietly cancelled its internal Claude Code pilot after token billing ate the entire annual AI budget, redirecting developers to Copilot. CISA logged 15,000 attacks on a same-week Drupal SQL flaw. The first cross-registry supply chain attack — TrapDoor — hit npm, PyPI, and Crates.io at once, using .cursorrules and CLAUDE.md config files as the carrier. And the White House personally overrode the Pentagon to keep Claude inside the NSA.
Musk, Zuckerberg, and Sacks killed Trump's AI safety executive order in three phone calls before it went public
Anthropic closed $30B+ round while Microsoft cancelled Claude Code pilot due to token costs consuming entire AI budget
This article clarifies often-confused AI agent terms like 'harness' (execution layer) and 'scaffold' (behavior-defining layer), explaining model, agent, tool use, sub-agents, and training concepts.
AI Agent = Model + Harness, where harness handles model calls and tool execution.
Scaffold is the behavior-defining layer around the model: prompts, tool descriptions, etc.
Megha Agrawal argues that current AI coding tools (Codex, Claude Code) are fundamentally incompatible with the designer's exploratory process. She identifies a gap between Figma-like low-stakes exploration and production-ready code tools, calling for a new tool that combines early-stage fluidity with direct deployment.
Design is inherently exploratory; AI coding tools assume a predefined goal.
Designing directly in code exposes all imperfections, distracting from creative flow.
This article critically examines attempts to quantify which jobs are most exposed to AI. By drawing historical parallels—such as the rise in accounting employment despite a century of automation—the author argues that simple exposure scores are misleading. Technology reshapes job content, business models, and creates unforeseen ripple effects. The key takeaway: any model must pass historical 'tests' like the newspaper, Uber, and CPA cases to be useful.
Historical automation of accounting increased employment due to new regulations, Jevons paradox, and job transformation.
Technology often disrupts jobs indirectly by changing the business model, e.g., the internet decoupling journalism from classified ads.
Terminal Guardian MCP is a production-grade Model Context Protocol server that provides secure, sandboxed terminal access for AI assistants like Claude. It includes a risk analysis engine that categorizes commands as safe, warning, dangerous, or blocked, and offers features like git commit generation, workspace templates, process management, environment variable inspection, network diagnostics, filesystem access, and Docker integration.
Terminal Guardian MCP provides safe terminal access for AI assistants through risk analysis and sandboxing.
Commands are classified into four risk levels: SAFE, WARNING, DANGEROUS, and BLOCKED.
Simon Willison recreates the 1983 game 'Mad House' from Usborne's 'Creepy Computer Games' using Claude AI. The interactive JavaScript version is now available online.
Usborne has released free PDFs of its 1980s computer books.
Simon Willison used Claude AI to build a playable web version of the game 'Mad House'.
Claude Cowork shifts AI from chat-based assistance to task delegation. Combined with Playwright MCP, Claude Desktop can perform structured browser automation. This article covers installation, architecture, capabilities, and security considerations.
Playwright MCP provides structured accessibility snapshots for reliable AI-driven web automation.
Claude Desktop with Playwright MCP offers free browser control capabilities.
Anthropic co-founder Christopher Olah was invited to speak at the launch of Pope Leo XIV's encyclical 'Magnifica Humanitas' and used the stage to claim AI models show evidence of introspection and emotion-like states. The Pope's own document struck a different tone: 'These systems merely imitate certain functions of human intelligence.'
Anthropic co-founder Christopher Olah claims AI models show signs of introspection at papal event
Pope Leo XIV's encyclical states AI systems merely imitate human intelligence
This webinar presents a workflow offering end-to-end solutions for designing, training, validating and verifying, compressing, and deploying AI-based virtual sensor models to embedded processors within a single environment.
Integrate AI models into Simulink for system-level simulation and verification
Apply formal verification techniques to assert neural network behavior
Programmer George Hotz warns that AI coding agents will become a costly mistake, as LLMs produce fast prototypes but introduce hard-to-detect bugs. His view highlights the deep divide in the AI community over LLM utility.
George Hotz warns AI coding agents could be a costly mistake.
After six months testing, LLMs fall apart on details and create difficult-to-spot bugs.
Leading AI models like GPT and Gemini routinely cite text passages in document analyses that don't actually support their answers. Even when the answer is right, the cited evidence is often wrong. Researchers at Peking University call this "attribution hallucination," a risk for regulated fields like law and medicine. Their new CiteVQA benchmark is the first to test for it systematically.
AI models often cite irrelevant text passages to support answers
Even accurate answers can be backed by wrong evidence ('attribution hallucination')
At the 2026 China AIGC Industry Summit, Shen Yujun, Chief Scientist of Ant Lingbo Technology, argued that large models have benefited from decades of internet data, but robotics still faces a data vacuum in the physical world. He believes that neither VLA nor world models alone will be the final solution for embodied intelligence; instead, they will converge into a model unique to the physical world. Ant Lingbo positions itself as the 'general brain' for robots, akin to an operating system, with a focus on spatial perception. Shen predicts that around 2028, when everyone can contribute data to robots, embodied intelligence will have its 'ChatGPT moment'.
Large models rely on internet data dividends, but physical world data for robots is largely missing.
Neither VLA nor world models are the endgame; they will merge into a physical-world-specific model.
The Claude Mythos AI model, developed by Anthropic, raises concerns about cybersecurity as it can automate vulnerability discovery. While intended for defense, its potential for misuse could accelerate cybercrime, forcing regulators and companies to reassess their strategies.
Claude Mythos is an advanced AI model with strong coding and cybersecurity capabilities that can identify software vulnerabilities.
It represents a dual-use technology that could assist both defenders and attackers in finding weaknesses faster.
One month after DeepSeek V4's release, the open-source community unveiled Reasonix, a tool specifically designed to minimize API costs by maximizing cache efficiency. It achieves a staggering 99.82% cache hit rate, reducing a $61 bill for 400M+ tokens to just $12.
Reasonix is a dedicated coding harness for DeepSeek, focusing on cost reduction.
Its cache-first loop, tool-call repair, and automatic context compression maintain over 90% cache hit rate in long sessions.
The 2026 Zhiyuan Conference, held June 12-13 at Beijing's Zhongguancun International Innovation Center, brings together Turing Award winners, leading Chinese AI model companies, and global experts. Focusing on agents and world models, the conference explores AI's transition from digital to physical worlds. It features 25 forums, an AI agent assistant for attendees, and new sessions on AI-native education and token economy.
2026 Zhiyuan Conference takes place June 12-13 in Beijing, featuring Turing Award laureates and China's top AI model developers.
Key themes: intelligent agents and world models, bridging AI from digital to physical reality.
We present SAGE, a system for open-vocabulary exploration in unknown 3D indoor environments that preserves coverage-oriented behavior while allowing semantic cues to reprioritize frontier selection. In simulations, SAGE outperforms baselines in object discovery and achieves 13.7x speedup over FTU. Real-world drone flights confirm its effectiveness.
SAGE builds on FALCON volumetric explorer integrating CLIP for semantic awareness
Outperforms FALCON and semantic-only ablation in object discovery on Matterport3D
Agentic-VLA introduces an agentic training framework that enables Vision-Language-Action (VLA) models to adapt efficiently online via three key innovations: Adaptive Reward Synthesis, Language-Guided Exploration, and Experience Memory. Evaluated on LIBERO benchmark, it achieves +12.3% on long-horizon tasks, +28.5% in 1-shot learning, and cross-task transfer from 0% to 31.2%. It also shows 2.4x faster convergence and retains advantages on the dual-arm RoboTwin 2.0 benchmark.
A novel approach using motion as the central modality for video representation, training a masked autoencoder on point-tracks in a self-supervised manner. The resulting TIME embedding, trained solely on synthetic motion data, achieves performance on par with state-of-the-art models using up to 4 orders of magnitude less data.
Uses point-tracks to represent motion and trains a masked autoencoder to reconstruct missing tracks.
Self-supervised learning bypasses language dependency and reduces need for large-scale training data.
CoMoGen is a controllable video generation framework that generates realistic interactive dynamics from a single binary mask sequence conditioned on an input image. It introduces a lightweight MaskAdapter to encode mask sequences into latent residual signals injected into a Multi Modal Diffusion Transformer (MMDiT) via a cosine-weighted schedule. By identifying 'Motion Layers' in the attention space of MMDiT and fine-tuning only those layers with Low-Rank Adaptation (LoRA), CoMoGen reduces computational cost without architectural changes. Experiments show state-of-the-art performance in motion fidelity and perceptual realism.
CoMoGen enables controllable video generation from a binary mask sequence and an input image.
Introduces MaskAdapter and Motion Layers for efficient motion injection.
Video recordings of child-caregiver interactions allow study of attentional dynamics, but manual annotation is time-consuming. GBAT is a deep-learning toolkit that automates synchronization, gaze annotation, and pose/hand action categorization, enhancing efficiency for large-scale developmental research.
GBAT automates three key preprocessing steps: post-hoc video synchronization, semi-automatic gaze target annotation, and pose/hand action categorization.
It reduces manual annotation time for child-caregiver interaction videos.
This paper presents a lightweight modification to the DETR-based fusion transformer baseline for the MaCVi 2026 Vision-to-Chart data association challenge. A dedicated MLP (QueryMLP) is trained to explicitly predict the buoy's waterline contact point in the image from chart measurements and IMU orientation data. The predicted pixel coordinates are appended to the baseline decoder query vector, providing a direct spatial prior per buoy and reducing the geometric reasoning burden on the transformer decoder. The approach achieves an Overall score of 0.7386, F1=0.8055, and mIoU=0.6718 on the held-out test set, placing second among all submissions.
QueryMLP explicitly predicts buoy pixel coordinates from chart and IMU data, providing a spatial prior.
Reduces geometric reasoning burden on the transformer decoder.
VideoOdyssey is a benchmark for ultra-long-context and omni-modal video understanding, featuring videos averaging 109 minutes across 11 domains and 54 subcategories. It measures cognitive load via continuous certificate length and offers five granular levels. Evaluations show current MLLMs struggle with continuous reasoning, fine-grained perception, and non-verbal omni-modal understanding.
Introduces continuous certificate length to measure reasoning ability over ultra-long videos.
Includes visual-only (VideoOdyssey-V) and audio-visual (VideoOdyssey-AV) subsets.
This study challenges the assumption that high benchmark scores reflect true visual understanding in vision-language models (VLMs). By removing a large fraction of image tokens with minimal performance drop, the authors reveal a mismatch between accuracy and visual grounding. Through multi-level analyses including global degradation, localized occlusion, question reformulation, answer-space expansion, decision-level analysis, and layer-wise vision-token geometry, they find that models are less sensitive to fine-grained visual evidence than expected, and that visual tokens become more similar in deeper layers. The results indicate that current benchmarks are insufficient for evaluating fine-grained visual grounding in VLMs.
Removing many image tokens only slightly degrades VLM performance, questioning benchmark reliance on vision.
Models incorporate visual input but are insensitive to loss of fine-grained visual evidence.
GEM-4D is a geometry-grounded video world model that improves robot manipulation by injecting dense 4D correspondence supervision distilled from a pretrained geometry foundation model. It jointly captures appearance and geometric structure without additional inference cost. An inverse dynamics module converts consistent video rollouts into executable robot trajectories. GEM-4D achieves state-of-the-art performance on video prediction and geometric consistency, boosting real-world manipulation success from 61% to 81%.
GEM-4D enhances video world models with dense 4D correspondence supervision for geometric consistency.
It maintains a single-stream architecture with no extra inference cost.
A study finds that large language models (LLMs) exhibit persistent asymmetries when answering questions about religious conversion. Models tend to support joining Catholicism, Bahá'í, and Sikhism while subtly discouraging leaving these faiths, and show the opposite for atheists, agnostics, and Jehovah's Witnesses. The study tested 20 models across 182 religious pairings, with reproducible results.
Large language models show systematic bias in advising on religious conversions, favoring some faiths over others.
Study tested 20 commercial and open-source models across 182 religion pairings with reproducible asymmetries.
This study evaluates seven LLMs (including Gemini, Claude, and GPT families) on inferring individual domain knowledge from long-term Slack logs. Using 27,188 messages from 43 users, zero-shot estimates were compared with self-reported skill ratings from 27 participants. Gemini 2.5 Flash achieved the lowest error (MAE 21.13%), while GPT models showed larger discrepancies. Accuracy depends weakly on message volume, highlighting limits and the need for privacy-aware deployments and richer knowledge representations.
Employees often struggle to identify expertise, causing productivity loss
Gemini 2.5 Flash achieved lowest MAE of 21.13% in zero-shot inference
Large Language Models (LLMs) are optimized to produce distributionally plausible continuations rather than to explicitly verify whether generated propositions are entailed by source documents. This inductive bias enables generalization, but it does not encode whether responses are grounded with respect to a reference. Existing hallucination detection approaches improve factuality through retrieval augmentation, self-consistency, or claim verification, but generally do not learn directly over alignment topology. To leverage alignment topology as an inductive bias, researchers construct aligned bipartite graphs between reference information and LLM outputs and train a graph neural network (GNN) to model alignment structure using message passing. The method achieves state-of-the-art results on four diverse hallucination and question-answering datasets, outperforming all compared methods, including foundational LLMs such as GPT-4o.
LLMs lack grounding verification, limiting their use in high-stakes domains like clinical decision support.
Existing methods do not directly learn alignment topology.
To improve reasoning in diffusion language models (DLMs), researchers propose LIFT, a fine-tuning algorithm that adapts to token learnability across diffusion steps, outperforming baselines on six benchmarks with up to 3x relative gains on AIME'24 and AIME'25.
Standard SFT overlooks learnability, potentially harming DLM performance.
LIFT learns easy tokens when masked and hard tokens when context is available.
This paper proposes a knowledge-aware Text-to-SQL framework that constructs a task-specific knowledge base including schema semantics, abbreviations, business logic, and query patterns, and injects them into both training and inference. Experiments on seven benchmarks demonstrate substantial performance improvements for both open-source and closed-source large language models, especially in low-resource domain-specific settings.
Addresses challenges in low-resource Text-to-SQL with opaque schema and implicit business logic.
Proposes a knowledge-aware framework that builds a knowledge base for data synthesis and inference.
This survey systematically catalogs publicly available text and speech resources for Hausa (80-100 million speakers) and Fongbe (2 million speakers in Benin). Findings show Hausa has broader text resource diversity across news, encyclopedic, and educational domains, while Fongbe has seen recent academic speech data collection initiatives. Both languages are represented in Masakhane benchmarks for NER and POS tagging. The paper provides task-specific recommendations and identifies priority gaps including domain-diverse Fongbe text and dedicated Hausa speech corpora.
Hausa benefits from more diverse text resources; Fongbe has limited text but recent speech data collection.
Both languages are included in Masakhane benchmarks for NER and POS tagging.
Research shows chain-of-thought reasoning is not always beneficial; early entropy dynamics can indicate when reasoning helps. The authors propose EDRM, an adaptive routing framework that uses entropy trajectories, achieving 41-55% token reduction across 15 benchmarks while improving accuracy.
CoT reasoning provides marginal or negative gains on factual and open-ended tasks
Reasoning is a dynamic decoding state signaled by early entropy reduction
FuRA is a novel full-rank parameter-efficient fine-tuning method that preserves pretrained robust features via spectral preconditioning, outperforming full fine-tuning and LoRA on LLM and VLM fine-tuning; its 4-bit quantized variant QFuRA also surpasses QLoRA.
Existing methods like full FT and LoRA ignore pretrained spectral structure, causing noisy gradients to perturb features
FuRA uses block tensor-train factorization, freezes pretrained SVD bases, and optimizes only compact core and singular values
FusionSense is a fusion-aware intelligent sensing framework for energy-constrained autonomous edge systems. It uses a three-step training procedure to create lightweight near-sensor classifiers that jointly reduce compute and communication while scaling linearly with sensor count. On a SynDrone dual-modality setup, it achieves up to 33x lower energy at 1% FoI prevalence, 92.3% reduction in quality loss at 30% data reduction, and 1.5x higher energy savings than prior baselines.
Proposes tri-stage near-sensor learning with server-side fusion model, filter-out-safe labels, and edge-side compaction via auxiliary signals.
Runtime decision layer jointly optimizes computation and transmission, scaling linearly with number of sensors.
A new method uses geometric features from layer-wise MLP updates to train a sparse linear probe for better uncertainty quantification in language models, outperforming standard MSP by up to 21 AURC points on selective abstention.
Maximum softmax probability (MSP) is cheap but often miscalibrated for language model uncertainty.
Proposed method extracts 11 scale-invariant geometric features from per-layer MLP trajectories.
Large reasoning language models (LRMs) produce long chain-of-thought trajectories containing reflection markers like 'wait', 'but', 'alternatively'. This paper reveals their distinct functional roles and timing of influence. PathCal is a training-free decoding controller that distinguishes marker types and intervenes only at locally uncertain states, achieving better efficiency-performance trade-off by reducing generation length while maintaining or improving accuracy.
Reflection markers such as 'wait', 'but', 'alternatively' have distinct functional roles and are most influential before the model settles into a stable reasoning path.
PathCal is a training-free decoding controller that calibrates reasoning paths by distinguishing marker types and softly rebalancing logits at uncertain states.
This paper turns fundamental limits from Turing, Arrow, and No Free Lunch theorems into design rules, introducing the Deterministic Horizon: an accuracy ceiling set by architecture alone, beyond which no training can improve. Measured between 19 and 31 across 12 transformer architectures, fine-tuning on optimal-length traces recovers under 4%. The work extends to preference learning, retrieval pipelines, truthful auctions, and zero-knowledge verification, forming a catalogue of 16 specifications pairing computable boundaries, quantified violation costs, and constructive design rules.
The Deterministic Horizon is a pre-deployment computable accuracy ceiling based on layer count and embedding width.
Across 12 transformer architectures, the horizon ranges from 19 to 31, with fine-tuning recovering less than 4 percentage points.
New research proposes A-LEMS, a framework that measures AI energy consumption per successful goal (EpG) rather than per inference, revealing that agentic workflows consume 4.33x more energy on average than linear baselines, with orchestration structure being the primary driver, but potentially more efficient for tool-augmented tasks.
Current AI energy benchmarks measure per-inference energy, which is inadequate for agentic systems involving multi-step orchestration, tool calls, and retries.
A-LEMS introduces Energy per Successful Goal (EpG) and Orchestration Overhead Index (OOI) to accurately measure energy costs of agentic workflows.
Research Math Agents (RMA) is an automated reasoning framework for research-level mathematical problems. It solves 8 out of 10 problems on the First Proof benchmark, outperforming GPT-5.2R and Aletheia through multi-agent collaboration and iterative refinement.
RMA decomposes proof solving into specialized modules: problem analysis, literature search, fair comparison, knowledge bank construction, and proof verification.
It uses initializer, proposer, and verifier agents operating in a multi-round workflow with shared structured memory.
This paper introduces BOHM, a method to extract hierarchical attribution trees from routing weights in compound AI systems, requiring zero marginal cost and no access to component internals, providing multi-resolution attribution that correlates highly with SHAP but at a fraction of the cost.
BOHM leverages existing routing weights to build attribution trees at zero marginal cost.
On benchmarks, BOHM achieves Kendall tau up to 0.928 vs SHAP's 0.980 but needs 9000x fewer evaluations.
UniPat AI releases SaaS-Bench, a benchmark evaluating mainstream large models on real office tasks. The highest full pass rate is only 3.8%, revealing that AI-powered fully automated offices are far from reality.
SaaS-Bench evaluation shows the best model, Claude Opus 4.7, achieves a full pass rate of only 3.8%.
93.4% of tasks span at least two applications, and 97.3% of text tasks involve over 100 steps.
Junaopanshi, founded by Zhu Senhua, former head of Huawei Cloud AI Algorithms Innovation Lab, is building a Cognitive World Model for embodied AI based on cognitive neuroscience. The company just completed a new funding round of hundreds of millions of yuan.
Junaopanshi proposes a Cognitive World Model that integrates cognitive neuroscience and active inference
Founder Zhu Senhua, known as 'Huawei's No.1 in Embodied Brain', led Huawei's AI brain science cloud platform and PanGu embodied model
This PDF report from Microsoft Research analyzes global AI diffusion trends for Q1 2026, offering key insights and data. The full content is available in the original PDF document.
Report from Microsoft Research on Q1 2026 AI diffusion
Shanghai-based AI lab StepFun released StepAudio 2.5 Realtime, an end-to-end real-time speech large language model with fully customizable persona capabilities. It connects via a WebSocket API, supports Chinese and English, and ranked first across all five benchmark dimensions tested in April 2026, including an 80.41 human evaluation score and 82.18 on paralinguistic comprehension.
StepAudio 2.5 Realtime is an end-to-end real-time speech LLM with customizable personas.
Million-scale persona data augmentation and roleplay-specific RLHF ensure character consistency.
Armin Ronacher criticizes users who submit issue reports rewritten by AI, leading to inaccurate conclusions and wasted maintainer time. He advocates for concise human observations.
Users submit issues that are rewritten by AI, losing the original voice and accuracy.
AI-generated conclusions are often confident yet incorrect, with fake minimal reproductions.
Pope Leo calls for 'disarming' artificial intelligence and apologizes for the church's delay in condemning slavery, warning of ethical constraints as AI permeates work and war.
Pope Leo denounces the 'culture of power' behind AI's rapid rise
Calls for 'disarming' AI and applying the most rigorous ethical constraints
Google's upcoming Android Auto update introduces a redesigned interface with Material 3 Expressive, custom widgets, immersive navigation, and deeper Gemini integration. The author's demo left him impressed and anticipating the update later this year.
New Android Auto interface features Material 3 Expressive design with three-panel layout and custom widgets.
Google Maps gets immersive navigation with detailed 3D buildings and terrain.
marpy.io is a browser-based IDE and AI coding assistant built exclusively for the Python stack (Flask, FastAPI, Django). It enables developers to go from idea to deployed app without wrestling with infrastructure, glue code, or half-baked JS-focused tools. Features include Python-native autocomplete, refactors, and AI-generated modules that understand real-world backends.
Browser-based IDE and AI assistant specifically for Python developers.
Supports Flask, FastAPI, and Django with Python-native features.
Google Deepmind's AlphaProof Nexus has autonomously solved nine open Erdős problems, including two that stumped mathematicians for 56 years, for just a few hundred dollars per problem in inference costs. Unlike OpenAI's natural-language approach, the system uses the Lean compiler to verify every proof step automatically. Still, the overall success rate sits at just 2.5 percent.
AlphaProof Nexus autonomously solved nine open Erdős problems, including two that had remained unsolved for 56 years.
Each problem cost only a few hundred dollars in inference costs.
Sam Kriss fiercely criticizes the proliferation of AI-generated text, which he finds empty and homogenized. Through his experience searching for a caterer online, he illustrates how AI writing produces generic, meaningless content that lacks real information. He argues that even if AI could write well, a world with only one literary voice would be a nightmare. Kriss emphasizes that AI writing is fundamentally gibberish, easily detectable, and warns that those who rely on it will be caught. He also mentions AI's mathematical achievements but notes its failure in expressing human emotions.
AI-generated text is hollow and lacks authenticity.
Even good AI writing would create a monotone cultural nightmare.
Linux kernel boss Linus Torvalds has signaled he'll push back when he receives irrelevant pull requests, after complaining that developers are making badly timed and trivial submissions, sometimes after using AI to review code. He warns large release candidates are not conducive to long-term stability.
Linus Torvalds criticizes rc5 as too large with many trivial fixes.
Some pull requests are triggered by AI code review, causing unnecessary churn.
Celebrity investor Kevin O'Leary plans to build a 7.5-gigawatt AI data centre in Box Elder County, Utah, similar to his proposed project in Alberta. Despite county commission approval, residents fear environmental impacts, especially on the fragile Great Salt Lake ecosystem. O'Leary promises transparency and economic benefits, but opponents demand a public vote.
Kevin O'Leary proposes a 7.5-gigawatt AI data centre on 10,000–13,000 acres in Box Elder County, Utah.
Residents strongly oppose due to environmental concerns, particularly the effects on the shrinking Great Salt Lake.
This paper proposes UfM*, an efficient uncertainty estimation algorithm that uses a compact Gaussian mixture to measure multiview disagreement from motion, requiring only a single DNN inference per image. It reduces calibration error by 24-28% compared to ensembles while consuming only 3% energy and 0.02% memory, enabling real-time operation on resource-constrained robots.
UfM* leverages motion to compute multiview disagreement via a Gaussian mixture, avoiding multiple inferences.
Gaussian representation is more efficient and effective than point cloud for modeling 3D space disagreement.
This paper systematically develops the core of Mediative Fuzzy Logic and extends it to interval type-2, granular type-3, and quantum versions, establishing a convex aggregation operator controlled by hesitation and contradiction, proving soundness, paraconsistency, and conservativity, and demonstrating its transparent, conservative, and safety-first decision-making capabilities through an autonomous braking sensor fusion example.
Mediative Fuzzy Logic was proposed for reconciling hesitant or conflicting assessments in fuzzy control and decision-making.
The paper unifies the type-1 core and extends to interval type-2, granular type-3, and quantum extensions.
The article argues that blaming AI for worsening software quality is misguided; developers have long accepted mediocrity, wastefulness, and lack of craftsmanship. AI merely accelerates existing bad practices.
The bar for software quality was already low before AI.
AI merely accelerates existing bad development habits.
This paper presents four progressively complex state estimators for legged robots that use foot-contact information to mitigate IMU drift, including a contact-aided invariant EKF, factor graph, fixed-lag smoother with contact-episode footholds, and a variant with evolving IMU bias. Implementations are available in GTSAM and ROS2.
Legged robots suffer from IMU drift; foot contacts can help correct it.
Four state estimators of increasing complexity are developed, from EKF to fixed-lag smoother.
Researchers propose a method to certify reachable Cartesian steps under joint limits, achieving zero violations and 100% goal reaching in adversarial scenarios.
Standard Bug2 planners violate joint limits in 6-11% of steps and fail up to 18% of the time.
New method uses S-procedure and semidefinite programming to compute certified step sizes.
Robots learning reward functions from demonstrations often suffer from underspecified features due to imperfect demonstrations. This paper proposes a framework that detects underspecified features by analyzing variation across demonstrations (low variation indicates well-specified, high variation indicates underspecified). The robot then explains its uncertainty in natural language and requests targeted corrective demonstrations. Evaluations in simulation and with a real Franka robot show that explanation-guided queries significantly improve reward recovery over random querying and passive data collection.
Imperfect demonstrations can lead to underspecified features and misaligned robot behavior at deployment.
A method detects underspecified features by measuring variability across demonstrations.
Existing neural solvers for multi-objective combinatorial optimization problems (MOCOPs) suffer from limited weight-conditioned context modeling and inefficient training due to random sampling in preference optimization. WeCon introduces Gated Residual Fusion (GRF) in the encoder and Residual Fusion (RF) in the decoder to enhance weight-instance interaction, along with Efficient Preference Optimization (EPO) for higher-quality training pairs. Experiments show WeCon achieves comparable HyperVolume (HV) to state-of-the-art POCCO-W while reducing inference time by 40%.
WeCon uses Gated Residual Fusion (GRF) and Residual Fusion (RF) to improve weight-conditioned context modeling.
The proposed Efficient Preference Optimization (EPO) constructs high-quality solution pairs for better training.
This paper introduces ManiF-SMC, a method for approximate machine unlearning that pushes erased samples away from their learned manifold centroids towards semantic neighbors in retained data, operating purely in representation space. A self-mode-connectivity module adaptively generates margins for triplet loss, achieving state-of-the-art unlearning effectiveness.
ManiF-SMC approximates retraining behavior by moving erased samples toward semantic neighbors in representation space.
It employs a margin-based triplet loss with adaptive margins generated by a self-mode-connectivity module.
This essay explores the limitations of open-source AI models' internal concept spaces, revealing that many crucial activist and philosophical concepts are absent. It introduces soft prompt distillation, a technique to implant missing concepts using just 128KB of data, highlighting its implications for AI control and deeper understanding of mind.
Open-source models like Qwen3-8B have only ~65,000 concepts in their dictionary, missing many key terms from social movements (e.g., intersectionality, prison abolition).
Soft prompt distillation can add new concepts to a model without modifying weights, using minimal data (128KB).
The TrapDoor crypto stealer supply chain attack has infected 36 malicious packages across npm, PyPI, and Crates.io, targeting developers in crypto, DeFi, AI, and security fields.
TrapDoor crypto stealer distributed via 36 packages on npm, PyPI, and Crates.io.
Targets developers working on cryptocurrency, DeFi, AI, and security projects.
Quinlight Audio is a tracker music player and remastering tool for MOD/S3M/XM/IT and related formats. It plays modules, can remaster their source samples with optional external AI backends (AudioSR, LavaSR, FLowHigh, AP-BWE), and lets you A/B the result live during playback.
Plays tracker formats through a double-precision mixer
Replaces samples live during playback for A/B comparison