Hugging Face AI News

Source Mix

Hugging Face Blog30
Hacker News AI7
MarkTechPost4
AWS Machine Learning Blog2
AI Business1
arXiv AI1
arXiv Computational Linguistics1
arXiv Machine Learning1

Topic Mix

Agents29
Chips26
Research23
Models16
Policy11
Startups4
Robotics1

Timeline

2026-07-066
2026-07-075
2026-06-174
2026-06-304
2026-07-084
2026-06-163
2026-06-183
2026-06-233

Latest Updates

Enhancing enterprise inference on Amazon SageMaker HyperPod with data capture, Hugging Face, NVMe, and Route 53 integration

2026-07-09 16:38 UTC

This post walks through five new capabilities for Amazon SageMaker HyperPod inference: multi-tier data capture for auditing and model improvement, direct deployment from Hugging Face Hub, local NVMe model loading for faster cold starts, automated Route 53 DNS for custom domains, and pod-level IAM through custom service accounts. These enhancements provide faster, more observable, and more flexible inference infrastructure for enterprise generative AI workloads.

Multi-tier data capture records inputs and outputs at endpoint, load balancer, and model pod levels for deep observability.
Direct deployment from Hugging Face Hub eliminates the need to pre-stage weights, with support for gated access and revision pinning.

Nvidia, Hugging Face Collaborate on Open Source Robot Models

2026-07-08 19:35 UTC

The move is seen as supporting accessibility and deployment for physical AI and also boosting Nvidia’s already strong presence in the field.

Nvidia and Hugging Face partner to develop open-source robot models.
The collaboration aims to enhance accessibility and deployment of physical AI.

Data for Agents

2026-07-08 17:16 UTC

NVIDIA emphasizes the importance of open data and synthetic data for building agentic AI, highlighting data inspectability, quality, and trust. The article details Nemotron datasets, the Prompt Atlas visualization tool, and the use of synthetic personas for local diversity.

Synthetic data is crucial for scaling agentic AI while protecting proprietary signals.
NVIDIA's Nemotron open datasets span over 10 trillion pretraining tokens and millions of post-training samples.

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

2026-07-08 02:20 UTC

This edition of AINews covers a broad range of AI developments from July 6-7, 2026. Highlights include Lilian Weng's deep dive into harness engineering for recursive self-improvement, Meta's launch of Muse Image and preview of Muse Video with agentic generation loops, and major product updates from Anthropic, LangChain, and Google on agent platforms. Other notable items: NVIDIA's Audex audio model, Cohere's Arabic ASR, robotics integrations with Hugging Face and NVIDIA, Liquid AI's Antidoom method to reduce reasoning loop failures, and Anthropic's controversial J-space interpretability work. Also covered: benchmarks for agents and legal AI, research automation, and inference efficiency advances.

Lilian Weng's blog post reframes recursive self-improvement around the harness rather than direct weight modification, emphasizing that harness engineering is critical for specifying goals and context.
Meta's Muse Image and Muse Video showcase agentic generation with planning, tool use, and self-refinement, quickly ranking high on public leaderboards.

Native-speed vLLM transformers modeling backend

2026-07-08 00:00 UTC

The transformers vLLM backend is now as fast (or faster) than custom vLLM implementations for many LLM architectures. Model authors can automatically leverage their transformers implementations to get ultra fast vLLM inference, for free.

Transformers vLLM backend matches or exceeds native vLLM throughput on Qwen3 4B, 32B, and 235B MoE models.
Dynamically applies inference-specific layer fusions at runtime using torch.fx and ast to match custom code speed.

Qualcomm acquires Nexa AI, open-sources GenAI runtime for Hexagon NPUs

2026-07-07 22:44 UTC

Qualcomm has acquired Nexa AI and open-sourced GenieX, a GenAI runtime optimized for Hexagon NPUs. It enables running LLMs and VLMs locally on Snapdragon devices via CLI, Python, Kotlin/Java, Docker, and an OpenAI-compatible server, supporting both Hugging Face GGUF models and Qualcomm AI Hub bundles.

Qualcomm acquires Nexa AI and open-sources GenieX runtime
GenieX supports NPU, GPU, and CPU inference on Snapdragon devices

From Hugging Face to Amazon SageMaker Studio in one click

2026-07-07 21:15 UTC

Hugging Face and Amazon SageMaker AI announce a deep-link integration enabling one-click transition from model discovery to SageMaker Studio. The integration pre-configures permissions, surfaces GPU quotas, and supports model customization and deployment, streamlining the path from inspiration to enterprise deployment.

One-click deep link from Hugging Face model page to SageMaker Studio with pre-loaded model and configured environment.
New Studio environments automatically include full permissions for fine-tuning, training, notebooks, and endpoint deployment.

NVIDIA and Hugging Face Bring New Models and Frameworks to LeRobot for the Open Robotics Community

2026-07-07 06:00 UTC

NVIDIA and Hugging Face collaborate to integrate the NVIDIA Isaac GR00T 1.7 model and Isaac Teleop framework into LeRobot, with NVIDIA Cosmos 3 planned soon. These integrations provide developers with a more accessible and standardized path for robot development, driving innovation in the open robotics community.

NVIDIA and Hugging Face bring Isaac GR00T 1.7 model and Isaac Teleop framework to LeRobot.
LeRobot gains NVIDIA physical AI capabilities for data collection, model training, and simulation.

Run AI workloads on any cloud, store on Hugging Face: zero-egress storage with SkyPilot

2026-07-07 00:00 UTC

SkyPilot and Hugging Face collaborate to allow users to store models and datasets on the Hub while running compute on any cloud without egress fees for reads.

Mount Hugging Face Buckets or repos into SkyPilot jobs via hf:// URLs and HF_TOKEN
Supports 20+ clouds, Kubernetes, and on-prem clusters

LeRobot v0.6.0: Imagine, Evaluate, Improve

2026-07-07 00:00 UTC

LeRobot v0.6.0 introduces world model policies (VLA-JEPA, FastWAM, LingBot-VA), new VLAs (GR00T N1.7, MolmoAct2, etc.), reward model API (Robometer, TOPReward), six new simulation benchmarks, and a deployment CLI with DAgger corrections, depth sensing, automatic language annotation, up to 2x faster data loading, cloud training, and a leaner install—all aimed at closing the robot learning loop.

Three new world model policies enable robots to imagine future states before acting.
New VLAs include GR00T N1.7, MolmoAct2, EO-1, Multitask DiT, and EVO1, with fine-tuning and deployment support.

tencent/Hy3

2026-07-06 23:57 UTC

Tencent releases Hy3, a 295B-parameter Mixture-of-Experts model with 21B active parameters and 3.8B MTP layer parameters, under Apache 2.0 license. It outperforms similar-size models and rivals flagship open-source models with 2-5x parameters. Available on Hugging Face (598GB full, 300GB FP8 quantized) with 256K context, and free on OpenRouter until July 21, 2026.

Tencent Hy3: 295B MoE model with 21B active parameters, Apache 2.0 licensed
Outperforms similar-size models; rivals models 2-5x its size

From Hugging Face to Amazon SageMaker Studio in one click

2026-07-06 22:35 UTC

Hugging Face and Amazon SageMaker AI announce a deep-link integration that allows developers to go from model discovery to SageMaker Studio experimentation with a single click. The integration pre-configures permissions, surfaces GPU quota, and streamlines fine-tuning and deployment workflows.

New 'Customize on SageMaker AI' and 'Deploy on SageMaker AI' buttons on Hugging Face model pages enable one-click access to SageMaker Studio.
New Studio environments automatically have permissions pre-configured, eliminating manual IAM setup.

IOL-AI 2026 Challenge: Can Your Model Solve Linguistics Olympiad Problems?

2026-07-06 20:24 UTC

The IOL-AI 2026 Challenge, hosted on Hugging Face Spaces, tests AI models on linguistics olympiad problems, inviting researchers to submit innovative solutions.

Challenge based on linguistics olympiad problems to evaluate AI reasoning.
Hosted on Hugging Face Spaces platform.

PRX Part 4: Our Data Strategy

2026-07-06 15:30 UTC

This article details the data pipeline behind PRX, a 7B text-to-image model. Key aspects include assembling a diverse pre-training dataset from public and internal sources, using long accurate captions generated by a VLM, and employing Lance for dataset building and MDS for streaming. The team explains their choice of JPEG encoding at quality 92, on-the-fly text latent computation, and lessons learned about data fragmentation.

Pre-training data is assembled from a mix of public and internal datasets, re-captioned with a VLM for consistency.
Long, faithful captions are crucial; they turn imperfections into controllable attributes.

Training Gemma-3 for Structured Mathematical Reasoning with Tunix GRPO, LoRA Adapters, and GSM8K Rewards

2026-07-06 04:26 UTC

This tutorial builds an end-to-end GRPO training workflow that teaches Gemma-3 to reason through GSM8K math problems using Tunix, JAX, LoRA, and custom reward functions. It covers environment preparation, Hugging Face authentication, model loading, prompt formatting, reward function definition, LoRA adapter attachment, baseline evaluation, and GRPO training.

Implements GRPO training with Tunix and JAX, updating only LoRA adapter weights for a single-accelerator setup.
Defines format and correctness reward functions to provide multiple feedback signals.

🤗 Kernels: Major Updates

2026-07-06 00:00 UTC

Hugging Face's Kernels project, aimed at standardizing custom kernel packaging, distribution, and consumption, has undergone a major redesign. This post summarizes key updates: a new 'kernel' repository type for better discoverability; enhanced security through trusted publishers and code signing; revamped CLIs with clearer separation of concerns; expanded framework support including Torch Stable ABI and Apache TVM FFI; a foundation for agentic kernel development; and miscellaneous improvements like simplified environment setup and compatibility checking.

New 'kernel' repository type on the Hub allows users to filter by accelerator, OS, and backend version. All kernels are now listed at https://huggingface.co/kernels.
Security improvements: by default, only trusted publishers' kernels are loaded; optional code signing with Sigstore's cosign using ephemeral keys protects against credential compromise.

Leanstral 1.5: Proof Abundance for All

2026-07-03 14:18 UTC

Leanstral 1.5, a free Apache-2.0 licensed model with 6B active parameters, delivers a major performance upgrade in formal verification, saturating miniF2F, solving 587/672 PutnamBench problems, and achieving state-of-the-art results on FATE-H (87%) and FATE-X (34%). Trained through mid-training, supervised fine-tuning, and reinforcement learning with CISPO, it excels in agentic proof engineering and real-world code verification, uncovering 5 previously unknown bugs across 57 repositories tested. Fully open-sourced and available via Hugging Face and a free API, Leanstral 1.5 is now accessible for practical proof engineering in Lean 4.

Leanstral 1.5 achieves near-perfect or state-of-the-art results on multiple formal math benchmarks, including 100% on miniF2F and 587/672 on PutnamBench.
It demonstrates strong code verification capabilities, proving AVL tree time complexity and discovering real-world bugs in open-source repositories.

The Wiola Architecture for Efficient Small Language Models

2026-07-03 04:00 UTC

Wiola is a fully original small language model architecture built from first principles, unrelated to existing families like GPT, LLaMA, Mistral, or Falcon. It introduces five novel components: Spiral Rotary Positional Encoding (SRPE), Gated Cross-Layer Attention (GCLA), Adaptive Token Merging (ATM), Dual Stream Feed-Forward (DSFF), and WiolaRMSNorm. Released in four sizes (120M to 1.5B parameters), it is fully compatible with HuggingFace Transformers.

Wiola is a fully original SLM architecture with no lineage from existing model families.
Five novel components: SRPE, GCLA, ATM, DSFF, and WiolaRMSNorm.

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

2026-07-01 00:00 UTC

Hugging Face and Cerebras have collaborated to create a real-time voice AI system powered by Gemma 4, achieving dramatically lower latency through an open modular architecture. The pipeline integrates Nvidia's speech recognition, Cerebras's fast inference, and Alibaba's text-to-speech, and is already deployed in over 9,000 Reachy Mini robots.

Hugging Face and Cerebras debut a real-time voice AI demo using Gemma 4 with ultra-low latency.
The system uses an open cascaded architecture: speech input → ASR → model inference → TTS → spoken response.

Demystifying Security Risks of AI-Powered Applications on Pre-Trained Model Hubs

2026-06-30 19:10 UTC

Researchers present the first systematic security analysis of AI-Apps on platforms like Hugging Face, identifying five threat categories and ten attack vectors. Over 970,000 public AI-Apps were analyzed, revealing thousands with leaked credentials, hundreds vulnerable to code execution, and tens with embedded backdoors.

AI-Apps on model hubs have serious security risks including broken access control and input injection.
Analysis of over 970,000 public AI-Apps found thousands leaking credentials and hundreds vulnerable to arbitrary code execution.

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

2026-06-30 18:32 UTC

IBM Research introduces ScarfBench, an open benchmark for evaluating AI agents on cross-framework migration tasks in Enterprise Java. The benchmark includes 34 applications, 102 framework implementations, and 204 migration tasks. Current top agents achieve less than 10% behavioral success, highlighting the difficulty of preserving behavior during migration.

ScarfBench evaluates AI agents on framework migration between Spring, Jakarta EE, and Quarkus, requiring build, deployment, and behavioral validation.
The benchmark comprises 34 applications, ~2,000 source and test files, and 1,331 expert-written tests.

Why Specialization Is Inevitable

2026-06-30 14:39 UTC

This article argues that specialization is an inevitable consequence of finite resources and selection pressure, drawing from optimization theory (No Free Lunch theorems), evolutionary biology, competitive markets, and machine learning. It distinguishes specialization from domain knowledge and addresses the Bitter Lesson, concluding that scaling does not eliminate the need for focused systems.

The No Free Lunch theorem implies that no algorithm is universally optimal; specialization trades breadth for fit.
Biology and markets show that limited resources drive niche specialization and concentrated strategy.

Featuring Every Eval Ever Results on Hugging Face Model Pages

2026-06-30 00:00 UTC

Every Eval Ever (EEE) and Hugging Face Community Evals are now intercompatible, allowing cross-posting and interpretation of evaluation results with links to open models, leaderboards, and a unified standardized metadata store.

EEE and Hugging Face Community Evals now interoperable, enabling cross-posting of evaluation results.
EEE provides a unified JSON schema for recording evaluation details, including runner, model, settings, etc.

DiScoFormer: One transformer for density and score, across distributions

2026-06-29 18:02 UTC

DiScoFormer is a transformer that estimates both density and score of a distribution from a set of data points in a single forward pass without retraining. It uses cross-attention, a shared backbone with two heads, and a consistency loss to adapt to new distributions. It significantly outperforms KDE in high dimensions.

Estimates density and score simultaneously without retraining.
Leverages consistency loss for self-adaptation to out-of-distribution data.

Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine

2026-06-29 08:40 UTC

Kog releases Laneformer 2B, a 2.3 billion parameter instruction-tuned coding model designed from the ground up for high-speed single-request inference. By co-designing the model architecture with its inference engine, Kog introduces Delayed Tensor Parallelism and a lane-structured Transformer to hide communication overhead. The model achieves competitive coding benchmarks (45.1% HumanEval+, 51.6% MBPP+) and is now available open source on Hugging Face.

Laneformer 2B is a 2.3B parameter coding model optimized for low-latency inference.
It uses a novel lane-structured architecture with Delayed Tensor Parallelism to minimize communication costs.

Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls, Auditing Data, and Training Baselines

2026-06-28 07:02 UTC

This tutorial details a robust workflow for the Fable 5 Traces dataset from Hugging Face. It avoids fragile dependencies by manually parsing the merged JSONL file, normalizes tool calls, audits data structure, redacts secrets, and trains pure-Python Naive Bayes baselines to predict output types and tool usage.

Manually download and parse the merged JSONL file to avoid fragile dependencies.
Develop parsing utilities to extract tool names, arguments, and text payloads from raw outputs.

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics

2026-06-27 00:02 UTC

This tutorial demonstrates processing NVIDIA's Open-SWE-Traces dataset for supervised fine-tuning. It covers streaming data from Hugging Face, normalizing agent trajectories, parsing code patches, building an analysis DataFrame, and curating a high-quality SFT subset based on success labels, token limits, and language filters.

Stream Open-SWE-Traces from Hugging Face without local download.
Normalize trajectories, extract role counts, tool usage, and patch info.

Run a vLLM Server on HF Jobs in One Command

2026-06-26 00:00 UTC

Spin up a private, OpenAI-compatible LLM endpoint on Hugging Face infrastructure with a single command — no servers to provision, no Kubernetes, pay-per-second. Covers the full process from launch, querying, cleanup, scaling to larger models, creating a chat UI, SSH debugging, and using as a coding agent backend, with a comparison to Inference Endpoints.

Use the 'hf jobs run' command with the vLLM Docker image and --expose 8000 to run a vLLM server on HF Jobs.
Endpoints are authenticated via Hugging Face tokens, requiring read access to the job's namespace, and support querying via curl or OpenAI Python client.

Which tokens does a hybrid model predict better?

2026-06-25 16:11 UTC

Ai2 compares its 7B transformer Olmo 3 and hybrid Olmo Hybrid, finding the hybrid excels on content words (nouns, verbs, adjectives) and tokens requiring context, but loses advantage on repeated tokens and closing brackets. Token-level loss filtering reveals architectural differences.

Hybrid models predict meaningful tokens (e.g., content words) better, but not repeated tokens.
Hybrids replace some attention layers with recurrent layers, which have fixed-size memory suited for tracking sequential state.

MacroLens: A Multi-Task Benchmark for Contextual Financial Reasoning under Macroeconomic Scenarios

2026-06-25 04:00 UTC

MacroLens is a new multi-task benchmark covering 4,416 U.S. small- and micro-cap equities over 2021-2026, integrating prices, accounting data, macroeconomic series, SEC filings, and news. It addresses four key assumption violations in financial time-series evaluation, includes seven tasks and 1,130 macroeconomic events, and evaluates 19 methods with a five-step feature-context ablation. The benchmark is publicly available on Hugging Face.

First public benchmark to jointly handle prices, fundamentals, macro, and text signals
Covers 4,416 U.S. small-cap stocks with 46.8M XBRL facts, 53 macro series, and 215,882 news articles

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

2026-06-24 16:00 UTC

NVIDIA NeMo AutoModel builds on HuggingFace Transformers v5, adding Expert Parallelism, DeepEP fused all-to-all dispatch, and TransformerEngine kernels to achieve 3.4-3.7x higher training throughput and 29-32% less GPU memory for fine-tuning MoE models, with no API changes.

NeMo AutoModel subclasses AutoModelForCausalLM, requiring only one import line change for performance gains.
On a 550B model, Expert Parallelism enables full fine-tuning across 16 nodes of H100s, where Transformers v5 runs out of memory.

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

2026-06-23 12:51 UTC

CUGA is IBM's open-source agent harness that handles the plumbing of building agentic apps, leaving developers to write only a tool list and a prompt. This article walks through one example — an IBM Cloud advisor app — and explains how CUGA's planning, reflection, and policy system enable robust, production-ready agents.

CUGA abstracts away orchestration, state, and tool calls, letting developers focus on tools and prompts.
The cuga-apps repository contains two dozen single-file apps, each a working example that can be read and copied.

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

2026-06-23 00:00 UTC

This article explores the Cross-Origin Storage (COS) API proposal, which enables web apps to share large files (like AI models and Wasm runtimes) across origins using cryptographic hashes instead of URLs. Using Transformers.js as an example, it highlights the redundancy caused by current cache partitioning and how COS addresses it with hash-based identification, flexible access control, and integrity verification.

Current browser caches are partitioned by origin, leading to redundant downloads of shared AI resources across different apps.
The Cross-Origin Storage (COS) API identifies files by cryptographic hash, enabling cross-origin sharing.

Shipping huggingface_hub every week with AI, open tools, and a human in the loop

2026-06-23 00:00 UTC

Hugging Face revamped the release process for huggingface_hub, using AI and open tools to ship weekly releases instead of monthly, while keeping a human in the loop for final review. The new pipeline costs about $0.25 per release and has improved release note quality and discovery of integration issues.

Release cadence improved from 4-6 weeks to weekly
AI drafts release notes, but deterministic verification ensures accuracy

PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

2026-06-22 13:18 UTC

PP-OCRv6 is PaddleOCR's latest universal OCR model family, scaling from 1.5M to 34.5M parameters across three tiers, supporting 50 languages. It delivers a +4.6 percentage point improvement in text detection Hmean and +5.1 in recognition accuracy over PP-OCRv5_server. New architecture includes PPLCNetV4 backbone, RepLKFPN for detection, and EncoderWithLightSVTR for recognition. Supports multiple inference backends: Paddle Inference, Transformers, and ONNX Runtime.

Three model tiers: tiny (1.5M), small (7.7M), medium (34.5M) for various deployment scenarios.
Supports 50 languages including Chinese, English, Japanese, and 46 Latin-script languages.

We got local models to triage the OpenClaw repo for FREE!*

2026-06-22 00:00 UTC

A maintainer of OpenClaw built a system using local open-weight models (Gemma, Qwen) in an agent harness to triage issues and pull requests in real-time, achieving competitive performance with closed models while running on local hardware for minimal cost.

Local models like Gemma and Qwen can effectively classify GitHub issues and PRs for triage.
The system uses an agent harness with a read-only shell (reposhell) to safely inspect code.

Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks

2026-06-19 02:44 UTC

We implement an end-to-end workflow for Salesforce CodeGen, loaded from Hugging Face. We move past basic inference by adding function extraction, syntax checking, static safety checks, and unit-test validation. We rerank best-of-N candidates, compose multi-turn program synthesis, and experiment with prompt styles. We finish by visualizing a mini benchmark and exporting the generated artifacts as reusable files.

Load Salesforce CodeGen model from Hugging Face and prepare environment
Extract, validate syntax, check safety, and run unit tests on generated functions

MosaicLeaks: Can your research agent keep a secret?

2026-06-18 18:13 UTC

Deep research agents that combine private documents with web search can inadvertently leak sensitive information through their query logs. The MosaicLeaks benchmark quantifies this privacy risk and proposes a training method called Privacy-Aware Deep Research (PA-DR) that reduces information leakage by over 3x while maintaining task performance.

MosaicLeaks introduces a benchmark of multi-hop research chains that interleave private local documents and public web queries, measuring three levels of leakage: intent, answer, and full-information.
Standard training for task performance increases both success rate and leakage; training with PA-DR reduces answer/full-information leakage from 34.0% to 9.9% while keeping strict chain success at 58.7%.

Beyond LoRA: Can you beat the most popular fine-tuning technique?

2026-06-18 00:00 UTC

LoRA is the most popular parameter-efficient fine-tuning (PEFT) technique, but research shows other methods can outperform it on certain tasks. This article introduces Hugging Face's PEFT library and its benchmarks, discussing how to choose the right PEFT technique based on specific needs, and points out that LoRA is not always the best choice.

LoRA dominates PEFT techniques but may not be optimal.
Hugging Face's PEFT library provides a unified API and benchmarks to help users choose.

Is it agentic enough? Benchmarking open models on your own tooling

2026-06-18 00:00 UTC

A new benchmark harness evaluates the entire process of AI agents using software libraries, using Hugging Face Transformers as a case study. By measuring token usage, time, and error rates across different models and tooling tiers, the authors uncover tradeoffs between ease of use and resource consumption, providing insights for library maintainers and agent users.

Standard benchmarks only check final answers; this harness measures the entire process including token cost and errors.
Three tiers tested: bare install, cloned source, and packaged Skill – each with different overhead.

MolmoMotion: Language-guided 3D motion forecasting

2026-06-17 15:26 UTC

MolmoMotion is a new 3D motion forecasting model that predicts future 3D point trajectories of objects given a video frame, 3D points on an object, and a language instruction. It outperforms existing methods in robotics planning and controllable video generation. The model is accompanied by the MolmoMotion-1M dataset and PointMotionBench benchmark.

MolmoMotion uses language instructions to guide 3D motion forecasting, outperforming existing methods.
It offers autoregressive and flow-matching variants for deterministic and uncertain scenarios.

From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot

2026-06-17 10:18 UTC

AWS's open-source SDK Strands Robots integrates LeRobot, enabling developers to train from Hub datasets and deploy policies on simulated or real robots through a single Agent workflow. This post walks through five steps with a runnable example on a laptop.

Strands Robots SDK exposes LeRobot as composable AgentTools, enabling end-to-end control from dataset to robot hardware.
Simulation and hardware share the same DatasetRecorder and LeRobotDataset format for seamless compatibility.

GLM-5.2: Built for Long-Horizon Tasks

2026-06-17 09:01 UTC

Z.AI introduces GLM-5.2, a flagship model for long-horizon tasks with a solid 1M-token context, advanced coding capabilities with flexible effort levels, and an open-source MIT license. It achieves top-tier performance on long-horizon coding benchmarks, rivaling closed-source models.

GLM-5.2 delivers a stable 1M-token context for long-horizon engineering tasks.
It leads open-source models on benchmarks like FrontierSWE and PostTrainBench, close to Opus 4.8.

Agentic Resource Discovery: Let agents search

2026-06-17 00:00 UTC

The Agentic Resource Discovery (ARD) specification provides a discovery layer for AI agents to find tools, skills, and other agents dynamically, rather than relying on pre-installed configurations. Hugging Face has implemented a reference tool on the Hub.

ARD defines a standard for cataloging and searching agent capabilities across federated registries.
Hugging Face's Discover Tool implements ARD, enabling natural language search for Skills, MCP servers, and AI applications.

Tensordyne Napier AI Processor Announced with Logarithmic Math

2026-06-16 09:18 UTC

Tensordyne announced Napier, a 3nm AI processor and rack-scale inference platform built around proprietary logarithmic mathematics that turns multiplications into additions, freeing up area for more SRAM. The TDN72 rack targets up to 20 trillion parameter models with 72 nodes, 68 petaflops, and 42TB HBM. Claims include 5x the SRAM of NVIDIA Blackwell and air-cooled design. Software ecosystem includes Hugging Face hub, PyTorch/Triton support, and a custom Python eDSL. Beta in Q1 2027, shipments by end of Q2 2027.

Proprietary logarithmic math converts multiplications to additions, reducing multiplier area and allowing more on-chip SRAM.
Napier chip: 3nm, 138 billion transistors, 2.1 petaflops per die, 256MB SRAM, 144GB HBM3E.

Can open-source beat OpenAI?

2026-06-16 05:41 UTC

In the US-China AI race, the divide between open-source and closed-source philosophies could determine the winner. Chinese AI labs actively release open-source models, while US giants like OpenAI and Anthropic favor closed-source. In a Rest of World event, former Hugging Face executive Tiezhen Wang discussed the history of open-source models, how Chinese AI labs monetize them, and the debate over model distillation and intellectual property.

Open-source vs closed-source: Chinese labs embrace openness, US leaders prefer proprietary models.
Monetization through API subscriptions, infrastructure support, and branding for open-source models.

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

2026-06-16 04:00 UTC

Nemotron 3 Ultra is a 550B total/55B active parameter hybrid Mamba-Transformer MoE model by NVIDIA. Pre-trained on 20T tokens with 1M context, it achieves 6x higher throughput than open LLMs with on-par accuracy. Open-sourced on HuggingFace.

Mixture-of-Experts with 550B total, 55B active parameters
Hybrid Mamba-Transformer architecture, 1M token context

olmo-eval: An evaluation workbench for the model development loop

2026-06-12 15:56 UTC

olmo-eval is a new evaluation workbench designed to support the iterative evaluation cycle during LLM development. Built on the OLMES standard, it offers flexible task definitions, swappable runtime policies, and detailed per-question comparison to help developers determine whether interventions are significant.

Designed for the repeated evaluation loop in model development, supporting quick benchmark addition, cross-checkpoint runs, and fine-grained results analysis.
Offers both lightweight and sandboxed run modes, automatically selecting based on benchmark needs, unlike tools like Harbor.

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

2026-06-11 00:00 UTC

This article is the second part of the PyTorch profiling series, delving into the internals of nn.Linear layers, including transpose operations, bias-fused epilogue techniques, and the impact of torch.compile on a single linear layer. It then dissects the performance characteristics of a Multilayer Perceptron (MLP) with GeGLU activation, showcasing the scheduling and execution of GPU kernels.

nn.Linear fuses bias addition into the matrix multiplication kernel via an epilogue, avoiding extra memory accesses.
torch.compile offers no significant speedup for a single nn.Linear layer but eliminates CPU dispatch overhead.

Introducing North Mini Code: Cohere’s First Model For Developers

2026-06-09 15:56 UTC

Cohere has released North Mini Code, a 30B-parameter Mixture-of-Experts model with 3B active parameters, designed for agentic software engineering tasks. It achieves competitive performance on coding benchmarks and is available under Apache 2.0 on Hugging Face.

30B MoE model with 3B active parameters, optimized for agentic coding.
Outperforms comparable open-source models on Artificial Analysis Coding Index.

Hugging Face

Source Mix

Topic Mix

Timeline

Latest Updates

Enhancing enterprise inference on Amazon SageMaker HyperPod with data capture, Hugging Face, NVMe, and Route 53 integration

Nvidia, Hugging Face Collaborate on Open Source Robot Models

Data for Agents

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

Native-speed vLLM transformers modeling backend

Qualcomm acquires Nexa AI, open-sources GenAI runtime for Hexagon NPUs

From Hugging Face to Amazon SageMaker Studio in one click

NVIDIA and Hugging Face Bring New Models and Frameworks to LeRobot for the Open Robotics Community

Run AI workloads on any cloud, store on Hugging Face: zero-egress storage with SkyPilot

LeRobot v0.6.0: Imagine, Evaluate, Improve

tencent/Hy3

From Hugging Face to Amazon SageMaker Studio in one click

IOL-AI 2026 Challenge: Can Your Model Solve Linguistics Olympiad Problems?

PRX Part 4: Our Data Strategy

Training Gemma-3 for Structured Mathematical Reasoning with Tunix GRPO, LoRA Adapters, and GSM8K Rewards

🤗 Kernels: Major Updates

Leanstral 1.5: Proof Abundance for All

The Wiola Architecture for Efficient Small Language Models

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI

Demystifying Security Risks of AI-Powered Applications on Pre-Trained Model Hubs

ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

Why Specialization Is Inevitable

Featuring Every Eval Ever Results on Hugging Face Model Pages

DiScoFormer: One transformer for density and score, across distributions

Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine

Building a Stable Fable 5 Traces Workflow in Colab: Parsing Tool Calls, Auditing Data, and Training Baselines

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics

Run a vLLM Server on HF Jobs in One Command

Which tokens does a hybrid model predict better?

MacroLens: A Multi-Task Benchmark for Contextual Financial Reasoning under Macroeconomic Scenarios

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

Build real agentic apps using CUGA: two dozen working examples on a lightweight harness

Experimenting with the proposed Cross-Origin Storage API in Transformers.js

Shipping huggingface_hub every week with AI, open tools, and a human in the loop

PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

We got local models to triage the OpenClaw repo for FREE!*

Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks

MosaicLeaks: Can your research agent keep a secret?

Beyond LoRA: Can you beat the most popular fine-tuning technique?

Is it agentic enough? Benchmarking open models on your own tooling

MolmoMotion: Language-guided 3D motion forecasting

From the Hugging Face Hub to robot hardware with Strands Agents and LeRobot

GLM-5.2: Built for Long-Horizon Tasks

Agentic Resource Discovery: Let agents search

Tensordyne Napier AI Processor Announced with Logarithmic Math

Can open-source beat OpenAI?

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

olmo-eval: An evaluation workbench for the model development loop

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

Introducing North Mini Code: Cohere’s First Model For Developers

Company Directory

OpenAI

Anthropic

DeepSeek

Google

Meta

Microsoft

NVIDIA

Mistral

Hugging Face

LangChain