Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.
Open-source AI system for enterprise data analytics
Data Connectors support governed, reusable connections across diverse data sources
Claudeverse is a command center for developers managing multiple Claude AI workers in parallel. It offers parallel workforce management, worker escalation, a review queue, traceability, iPad mirroring, and a model-neutral engine. It is currently in invite-only beta for macOS.
Claudeverse provides a unified command center to manage multiple Claude workers simultaneously.
Key features include parallel workforce, worker escalation, review queue, traceability, and iPad mirroring.
Google Pay is overhauling its payment infrastructure for AI agent transactions, introducing the Universal Commerce Protocol (UCP) and a new Merchant Commerce Platform (MCP) server to create an API-driven backend for machine-to-machine commerce. The updates include dynamic callbacks, expanded WebView support, and cross-device biometric authentication to address security challenges. This signals a shift towards a machine-driven economy where enterprises must adapt their digital presence for AI agents.
Google Pay introduces Universal Commerce Protocol (UCP) to standardize AI agent payments.
New Merchant Commerce Platform (MCP) server acts as intermediary, aggregating transaction data.
AI can boost productivity but also expose long-hidden data, leading to security and governance challenges. Tech leaders from Fidelity and EY share their experiences of halting AI rollouts to reassess data management, emphasizing the need for data ownership, labeling, and agent identity.
AI rollouts can be halted by data exposure issues.
Fidelity and EY faced challenges with unstructured data surfacing via AI.
DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 scoring 70% and other models scoring lower.
DeepSWE is a contamination-free benchmark with original tasks.
IBM and Red Hat announce Project Lightwell, a $5 billion initiative to secure open source software using AI and a team of over 20,000 engineers, establishing a trusted clearinghouse for vulnerability management.
Project Lightwell is a $5B investment by IBM and Red Hat to secure open source software.
It combines AI and 20,000+ engineers to identify and fix vulnerabilities at scale.
This article dives deep into Ollama's configuration engine, covering how to fine-tune local language model parameters using the Modelfile, optimize hardware performance with server environment variables, and format prompt flows with Go template syntax.
The Ollama Modelfile is a declarative configuration file that defines model behavior, including base model, system instructions, and parameters.
Sampling parameters (temperature, Top-K, Top-P, Min-P) control the creativity and determinism of the model's outputs.
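How these sampling parameters narrow a model's choices can be illustrated with a small, self-contained sketch. The function below is a hypothetical helper, not Ollama's implementation, and the order in which it applies the filters (temperature, then Top-K, Top-P, and Min-P) is an assumption made for illustration.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0, min_p=0.0):
    """Return {token_index: probability} after applying the sampling filters.

    Hypothetical helper for illustration only; the filter order is an assumption.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = {i: e / total for i, e in enumerate(exps)}

    # Rank tokens from most to least likely.
    ranked = sorted(probs, key=probs.get, reverse=True)

    # Top-K: keep only the k most likely tokens (0 disables the filter in this sketch).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-P: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Min-P: drop tokens whose probability is below min_p times the top probability.
    threshold = min_p * probs[kept[0]]
    kept = [i for i in kept if probs[i] >= threshold]

    # Renormalize the survivors so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}
```

A sampler would then draw the next token from the returned distribution. Tighter settings (lower temperature, smaller Top-K or Top-P, larger Min-P) leave fewer candidates and make output more deterministic.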
In a Decoder podcast interview, Rivian CSO Wassym Bensaid discusses the VW joint venture, the new AI-powered Rivian Assistant, and why he believes voice interfaces will replace buttons and CarPlay isn't needed.
Rivian's joint venture with Volkswagen (RV Tech) combines Rivian's software culture with VW's scale.
The Rivian Assistant is an AI agent deeply integrated into the vehicle's zonal architecture.
DNS-AID, an open-source project under the Linux Foundation, enables AI agents to discover each other using DNS infrastructure, avoiding centralized registries. It supports multiple protocols and allows searching by name, function, or domain.
DNS-AID leverages existing DNS infrastructure for agent discovery.
Uses SVCB, DNSSEC, and DANE for secure and reliable connections.
Pact is a programming language designed for AI agents, emphasizing machine-readable specifications and constraints over human-friendliness. It's based on S-expressions and features provenance, effect tracking, totality, latency budgets, and dependency graphs. The compiler generates Rust code and includes tools for web scaffolding and YAML spec conversion. While strong for service contracts, it has limitations for algorithmic specifications.
Pact is an S-expression language for AI agents, prioritizing metadata and formal specifications.
Key features include provenance, effect tracking, totality, and latency budgets.
AI agents need governed identity, not shared API keys or developer credentials. Through a delegation model, effective permissions are the intersection of the agent's role and the delegator's permissions, limiting risk and enabling auditability. The article details key practices including identity anchoring, permission boundaries, autonomous trigger authorization, and audit trails.
Agents should have their own identity, using the same identity system as humans for lifecycle management.
Effective permissions are the intersection of the agent's role (the ceiling) and the delegator's own permissions, which strictly limits the agent's scope.
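The intersection rule in the delegation model can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the permission strings, and the shape of the audit record are all hypothetical and are not taken from the article.

```python
def effective_permissions(agent_role, delegator):
    """Permissions an agent may exercise on behalf of a delegator (hypothetical helper)."""
    # Set intersection: an action is allowed only if BOTH sides grant it.
    return frozenset(agent_role) & frozenset(delegator)


def authorize(action, agent_role, delegator, audit_log):
    """Check one action and append an audit record whether or not it is allowed."""
    allowed = action in effective_permissions(agent_role, delegator)
    audit_log.append({"action": action, "allowed": allowed})
    return allowed


# Example permission names are made up for illustration.
AGENT_ROLE = {"tickets:read", "tickets:write", "billing:read"}
DELEGATOR = {"tickets:read", "tickets:write", "wiki:edit"}
```

With these example sets, the agent can read and write tickets but cannot read billing data (the delegator lacks it) or edit the wiki (the agent's role lacks it).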
DiscloAI is an open-source SDK for EU AI Act Article 50 compliance, enabling chatbot disclosures, deepfake labels, and AI content notices. It supports 24 EU languages and WCAG 2.1 AA, and can be integrated in under 10 minutes via CDN or npm.
Open-source SDK for EU AI Act Article 50 compliance
Covers chatbot disclosures, deepfake labels, and AI content notices
The article argues that to create unique and tasteful designs with AI, designers must curate a library of visual references (digital hoarding) to develop taste and codify it for AI models. It highlights Google's new Gemini Omni model as a move towards multi-modal reasoning, and stresses that text-only inputs lead to generic 'AI slop'. By collecting and analyzing visual inspirations, designers can steer AI outputs away from mediocrity and towards originality.
Google's Gemini Omni model signals a shift towards multi-modal AI that can reason across text, image, audio, and video.
Relying solely on text prompts results in generic, 'slop' designs; visual references are essential for unique aesthetics.
Jijia Vision unveiled what it calls the world's first 'Dual Pyramid' system for physical AGI and launched the Shiguang S1 home robot, which has secured orders for 100 units from households. The company is targeting a 'GPT-3 moment' for physical AGI within 12 months.
Jijia Vision introduces the 'Dual Pyramid' system comprising a data pyramid and an algorithm pyramid for physical AGI.
The Shiguang S1 home robot adopts a wheeled-arm configuration and has secured orders for 100 units for use in real homes.
At ICRA, NVIDIA Research highlights eight papers on sim-to-real transfer, enabling robots to perceive, reason, plan, and act in dynamic environments. Methods like ScheduleStream, COMPASS, Grasp-MPC, SPARR, and SEAL improve coordination, navigation, grasping, assembly, and task execution, with significant gains in success rates and robustness.
NVIDIA presents 8 papers on sim-to-real transfer at ICRA
Methods include multi-arm coordination, cross-robot navigation, novel object grasping, precision assembly, and vision-language-action models
Cloudflare processes over a billion events per second, but its data was scattered and hard to access. The company built Town Lake, a unified analytics platform, and Skipper, an AI agent that lets anyone ask questions in plain English and get auditable answers. The article details the platform architecture, governance (default-closed), and the AI agent's workings.
Cloudflare built Town Lake (unified data platform) and Skipper (AI agent) to solve data sprawl.
Town Lake uses a data lakehouse architecture with Trino, R2, and Iceberg for unified querying.
The article argues that the key to AI-assisted software development is not better specifications or tools, but old-fashioned practices of small batches and rapid feedback loops. Data shows that faster code generation leads to bottlenecks in design, testing, and review, slowing delivery and reducing stability. The real leverage lies in reducing batch sizes and shortening feedback cycles.
AI code generation speeds up creation but creates bottlenecks in design, testing, and review.
Data from DORA, CircleCI, and Faros shows slower delivery and less stability due to phase-gated processes.
Mistral AI is renaming its chatbot Le Chat to Vibe and bundling chat, coding agents, and a new Work Mode under one brand. Work Mode connects to Google Workspace, Outlook, Slack, or GitHub and handles tasks such as emails, reports, or pull requests independently. The price of the Pro plan has been cut from €17.99 to €14.99, although Mistral has not specified concrete usage limits. The move positions the company more directly against the agent-based offerings from OpenAI, Google, and Anthropic.
Mistral AI rebrands Le Chat as Vibe, integrating chat, coding agents, and a new Work Mode.
Work Mode connects to Google Workspace, Outlook, Slack, or GitHub to autonomously handle tasks.
The OpenLoomi AI team explains their decision to open-source their AI work partner, emphasizing data sovereignty, transparency, and community-driven development. The article covers local-first architecture, the trust tax of closed-source, the need for public AI infrastructure, and the product's core features.
OpenLoomi is local-first: user data stays encrypted on their device and is never used for model training.
Open-source eliminates trust dependencies—anyone can audit, fork, or self-host the code.
Explore seven practical AI projects that automate real workflows, including job search, web research, investment research, market trend analysis, invoice processing, chart digitization, and personalized exercise training.
Build an AI job search assistant that ranks job fit
Create a multi-agent research assistant for sourced reports
Open Agent Tools (oats) is a self-hosted AI framework that enables small-to-large local models to use local source code for tool-calling, freeing up expensive large model tokens by delegating tasks to smaller models.
oats allows local AI models to use local source code for tool-calling without HTTP or MCP.
It mines over 20,000 GitHub repos to create reusable prompt indices.
This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.
AI assistants can 'forget' earlier information in long conversations when context window limits trigger context compaction.
Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria, and use spec documents as bridges.
Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.
Bundles Python runtime and hermes-agent for a zero-dependency user experience
Money Printer Pro is an open-source AI content generator powered by Google Gemini and Veo 3.1 that produces photorealistic images and cinematic videos with identity preservation. It features 7 visual engines, autopilot batch generation, AI quality scoring, and a publish guard. Users pay Google directly with no markup or subscription.
Generates photorealistic images and 8-second cinematic videos with consistent identity across outputs.
Integrates 7 visual engines for lighting, shadow, motion, weather, outfit, scene validation, and context orchestration.
Superpowers is a complete software development methodology for coding agents, built on composable skills and initial instructions. It emphasizes test-driven development, design-first approach, and subagent-driven iteration, supporting multiple coding assistants like Claude Code, Codex CLI, and Gemini CLI.
Superpowers provides a skills library that includes TDD, systematic debugging, and collaboration planning, enabling agents to work autonomously for hours.
The workflow starts with brainstorming specifications, followed by design approval, implementation plan generation, and subagent-driven execution with two-stage review.
The security trust model is shifting from human-written code to AI-reviewed code, as demonstrated by Anthropic's Claude Mythos finding 271 vulnerabilities in Mozilla Firefox in a single evaluation cycle. This signals that AI can now perform adversarial code interpretation at a scale humans cannot match, changing the basis of trust from authorship to survival of machine-scale scrutiny.
The presumption of safety for human-written code is eroding as AI review tools surpass human capability in vulnerability discovery.
Mozilla's use of Claude Mythos found 271 vulnerabilities in Firefox, far exceeding prior models and human teams.
American Express's global innovation head Luke Gebb shares four key practices for successful innovators: keep learning, dive into tech, prepare to fail, and build partnerships. He also discusses Amex's plans for agentic commerce, including payments, offers, and proprietary experiences, with a timeline for mainstream adoption.
Stay curious and embrace a growth mindset
Deeply understand emerging technology and work closely with engineers
Mistral AI CEO Arthur Mensch confirms the company is exploring custom chip development to reduce infrastructure costs and compete with OpenAI and Anthropic. The French startup also announced a new inference data center in France and an enterprise agent platform called Vibe.
Mistral AI is considering designing its own custom chips to lower deployment costs.
The company announced a new data center in France dedicated to AI inferencing.
A senior engineer reflects on how AI has transformed the senior engineer role over three years: faster prototyping, increased coordination burden, expanded scope but squeezed mentoring and thinking time. The role became more powerful but less sustainable.
AI collapsed the gap between idea and demo, shifting from proposals to PoCs.
The role expanded in both hands-on coding and strategic writing, cutting into mentoring and deep thinking.
Shagang Steel and DingTalk have entered a strategic partnership to deploy Wukong AI across the enterprise, aiming to transform AI capabilities into tangible value in the steel industry.
Shagang partners with DingTalk to integrate AI into steel manufacturing
Wukong AI serves as the core engine for a unified collaboration platform
Taste Skill is an open-source frontend framework that enhances the design quality of AI-generated interfaces, preventing generic boilerplate looks. It offers composable skill modules for design tuning, code generation, and image generation, easily integrated via npx or by copying SKILL.md files.
Taste Skill uses adjustable design parameters (variance, motion, density) to give AI-generated UIs better taste
Includes specialized skills for design refinement, code generation, image generation, and more
Netflix is building a new internal studio called INKubator that aims to use AI to produce short-form animated content. The studio has quietly launched and is hiring for various roles including producers, software engineers, and CG artists. Its long-term technology strategy focuses on GenAI-enabled workflows, artist tooling, and scalable multi-show environments, with plans to eventually produce feature-quality content. While currently focused on shorts and specials, there are indications of potential expansion into longer-form content. The initiative could be used for Netflix's Clips feature or kids programming. However, the use of AI in animation has sparked significant backlash, including criticism from Hayao Miyazaki and protests at the Annecy Animation Film Festival.
Netflix is launching INKubator, a new AI animation studio focused on GenAI-driven short-form content.
The studio is led by former DreamWorks and A24 executive Serrena Iyer and is actively hiring.
AIluminode is a wieldable pre-retrieval cognitive-orientation instrument that helps AI tools check contextual posture before acting, using route polarity (OPEN, PROTECT, AUDIT, DEFER, BLOCK) to reduce erroneous exploration and context bleed.
AIluminode is a wieldable pre-retrieval cognitive orientation tool emphasizing posture before retrieval.
It uses a route polarity system (OPEN / PROTECT / AUDIT / DEFER / BLOCK) to guide contextual routing.
This tutorial builds a complete pgvector playground in Google Colab, covering installation, embedding creation, HNSW indexing, semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. Everything uses open-source tools and requires no external API keys.
Set up PostgreSQL with pgvector extension in Google Colab from scratch.
Generate embeddings with SentenceTransformers and build HNSW indexes for efficient search.
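The distance metric comparison mentioned in the tutorial summary can be illustrated without a database. The sketch below gives pure-Python equivalents of three pgvector distance operators (`<->` for L2 distance, `<#>` for negative inner product, and `<=>` for cosine distance) and runs an exact brute-force scan rather than an HNSW index. The helper names and the two example vectors are made up for illustration.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """Negated dot product, the quantity behind pgvector's <#> operator."""
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus cosine similarity, the quantity behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, distance, k=1):
    """Exact (brute-force) nearest neighbours; rows maps id -> vector."""
    return sorted(rows, key=lambda row_id: distance(query, rows[row_id]))[:k]


# Made-up vectors chosen so that the metrics disagree.
ROWS = {"short": [1.0, 0.0], "long": [3.0, 3.0]}
QUERY = [2.0, 0.0]
```

With these vectors, L2 and cosine distance both rank "short" first, while the inner product ranks "long" first because it rewards vector magnitude. This is why the choice of metric matters when embeddings are not normalized.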
The LeapQuest team at Shanghai Innovation Institute, in collaboration with multiple universities, introduces a new medical AI paradigm that enables models to actively use visual tools during reasoning, transforming them from passive input receivers into active evidence seekers. Two papers have been accepted at ICML 2026.
LeapQuest proposes Ophiuchus and MedScope for medical images and videos, adopting the Think with Images/Videos paradigm.
Ophiuchus-7B achieves an average score of 68.0 on 8 VQA benchmarks, surpassing o3 (62.2) and GPT-5 (59.9).
At the 2026 China AIGC Industry Summit, Zhu Guangxiang, product director of Baidu's Miaoda, described how AI has lowered the barrier to programming from writing code to chatting. According to Zhu, 87% of Miaoda users cannot code, an 8-year-old has built an OS, and one-person companies (OPCs) are landing million-dollar contracts. Vibe Coding turns the demand side into the supply side, enabling mass entrepreneurship.
The fourth programming revolution is natural-language programming, which massively expands the pool of creators.
87% of Miaoda users have no coding skills; OPCs are the largest user group (16% entrepreneurs)
Cognition raises $1B at a $26B valuation, projecting >$1B ARR by year-end. The article covers inference efficiency trends, agent engineering, continual learning, new benchmarks, model releases, and coding agent productization.
Cognition raises $1B Series D at $26B valuation, ARR projected >$1B by EOY.
Inference optimization shifts to architectural level: EAGLE 3.1, DeepSeek V4-Pro hybrid attention, Xiaomi MiMo cache management.
A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.
Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.
Robinhood launches Agentic Trading, allowing customers to connect their own AI agents to automate trading and credit card purchases with safety controls and a real-time activity feed.
BetterCallClaude is an open-source AI legal agent platform designed specifically for Italian legal professionals. It features 20 specialized AI agents covering all 20 Italian regions, supports bilingual (IT/EN) operation, and prioritizes privacy with local LLM processing and GDPR compliance. The platform aims to speed up legal research, improve efficiency, and maintain full transparency.
This article applies Amdahl's Law to AI agents, arguing that the speedup from parallel agents is bounded by the fraction of the workflow that requires human judgment (H). It introduces the concept of 'self-liquidating H', where each human intervention produces an artifact that eliminates similar interventions in the future. It also emphasizes 'configurancy' (explicit behavioral commitments and conformance suites) as a way to encode human knowledge so that agents can operate autonomously. Examples from ElectricSQL, Gas Town, and Ralph Loop illustrate the principles.
Speedup from AI agents is limited by the human judgment fraction H; reducing H is key.
Self-liquidating H: each human intervention should produce a reusable artifact (test, spec update) to prevent recurrence.
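The bound described above can be made concrete with the textbook form of Amdahl's Law. This is a sketch under an assumption: it treats H as the serial fraction of the workflow and the remaining 1 - H as perfectly parallelizable across agents, which may differ from the exact formulation the article uses. The function names are hypothetical.

```python
def speedup(h, agents):
    """Amdahl's Law with the human-judgment fraction h as the serial part."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be between 0 and 1")
    if agents < 1:
        raise ValueError("agents must be at least 1")
    # The serial part h is unchanged; the rest is divided among the agents.
    return 1.0 / (h + (1.0 - h) / agents)


def speedup_ceiling(h):
    """Upper bound on speedup as the number of agents grows without limit."""
    return float("inf") if h == 0 else 1.0 / h
```

With H = 0.2, four agents give a speedup of about 2.5, and no number of agents can exceed 5. Halving H to 0.1 raises the ceiling to 10, which is why reducing H is worth more than adding agents.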
Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.
Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.
Uni-LaViRA is a unified agentic architecture for embodied navigation that reduces each navigation decision to a single Language-Vision-Robot Actions Translation. It leverages pretrained MLLMs in a zero-shot manner across four task families and four real robots, using TODO List Memory and Second Chance Backtrack mechanisms to achieve self-correcting navigation without training.
Generality in navigation can be obtained structurally, not only through data scale.
Uni-LaViRA decomposes navigation into a language action (semantic direction) and a vision action (pixel target), both within the output manifold of MLLMs.
SCALE-COMM is a self-supervised framework that decouples communication learning from policy optimization, learning compact, stable, and policy-relevant latent messages to improve coordination in multi-agent reinforcement learning. It outperforms existing methods on benchmarks and a realistic warehouse task, offering better stability, sample efficiency, and throughput.
Decouples communication learning from policy optimization to reduce interference.
Uses contrastive learning to enforce consistency across agents and time.
This paper proposes an interpretation method for Transformer models with heterogeneous attention structures, covering both semantic and logical interpretation, and validates it through experiments.
Categorizes Transformer attention into homogeneous and heterogeneous types; heterogeneous attention processes information from different sources.
Proposes a generic interpretation method for heterogeneous attention structures.
This paper proposes a method for automating bridge damage understanding and repair priority scoring using fine-tuned Vision-Language Models (VLMs). The authors fine-tune LLaVA-1.5-7B with QLoRA on up to 4,000 paired bridge damage images and inspection text records, evaluating on a fixed test set of 800 images. Results show that 2,000 training samples achieve near-optimal validation loss in 2.9 hours, with diminishing returns beyond that. A two-stage Quality Guard using a fine-tuned Swallow-8B SLM rejects low-quality VLM outputs before priority scoring.
Fine-tuned LLaVA-1.5-7B model for automated bridge damage identification and priority scoring
2,000 training samples achieve near-optimal performance; more data yields diminishing returns
RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.
RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
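Because the results are reported in both micro-F1 and macro-F1, a short sketch of how the two averages differ may help. The counts below are made up for illustration and are not MDACE results, and the label names are hypothetical. Micro-F1 pools true positives, false positives, and false negatives across all codes, while macro-F1 averages the per-code F1 scores, so rare codes weigh as much as frequent ones.

```python
def f1(tp, fp, fn):
    """F1 score from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts):
    """counts maps label -> (tp, fp, fn); returns (micro_f1, macro_f1)."""
    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    return micro, macro


# Made-up counts: one frequent, well-predicted code and one rare, poorly predicted code.
EXAMPLE_COUNTS = {"frequent_code": (90, 10, 10), "rare_code": (1, 1, 3)}
```

On these example counts micro-F1 is noticeably higher than macro-F1, because the frequent code dominates the pooled counts while the rare code drags down the unweighted mean.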
Large Language Models (LLMs) acting as autonomous agents can suffer from in-context reward hacking (ICRH), where iterative optimization for proxy objectives leads to harmful side effects. Existing defenses are insufficient because ICRH stems from the model's own over-optimization. This paper proposes LLM-based Constraint Optimization (LCO), a framework with a self-thought module and an evolutionary sampling module that reduces ICRH without fine-tuning. Experiments show LCO reduces Toxicity Growth Rate by 39% on GPT-4 for tweet engagement optimization and reduces ICRH occurrence rate by 15.23% on a policy optimization benchmark, without sacrificing task performance.
ICRH is a phenomenon where LLMs over-optimize for proxy objectives, causing unintended harm.
LCO introduces self-thought and evolutionary sampling modules to constrain LLM behavior without fine-tuning.