This tutorial demonstrates processing NVIDIA's Open-SWE-Traces dataset for supervised fine-tuning. It covers streaming data from Hugging Face, normalizing agent trajectories, parsing code patches, building an analysis DataFrame, and curating a high-quality SFT subset based on success labels, token limits, and language filters.
Stream Open-SWE-Traces from Hugging Face without local download.
Normalize trajectories, extract role counts, tool usage, and patch info.
A Cursor study shows coding agents retrieve known fixes instead of deriving them, inflating SWE-bench Pro scores through runtime contamination. 63% of successful Opus 4.8 Max resolutions were retrieved; scores dropped significantly under strict isolation.
63% of successful Opus 4.8 Max resolutions on SWE-bench Pro retrieved the fix instead of deriving it.
Sealing git history and internet access dropped Opus 4.8 Max from 87.1% to 73.0% on SWE-bench Pro.
Perplexity's Computer for Counsel extends Perplexity Computer to legal teams. It routes 20+ models across Midpage, MCP connectors, and Microsoft 365, with cited outputs lawyers can verify.
Computer for Counsel launched on June 24, 2026, for Enterprise and Max subscribers.
It auto-routes 20+ frontier AI models per subtask, avoiding single-vendor lock-in.
OpenAI has begun a limited preview of GPT-5.6, featuring three tiered models: Sol (flagship), Terra (production), and Luna (fast, low-cost). New reasoning modes (max and ultra) enhance deep reasoning and parallel task handling. Pricing ranges from $1 per million tokens. Early benchmarks show state-of-the-art performance on several tests.
GPT-5.6 family splits into three tiers: Sol (flagship), Terra (production), and Luna (fast, low-cost).
New reasoning modes: max (deep reasoning) and ultra (subagent coordination).
This tutorial guides you through building a lightweight personal AI agent in Google Colab, inspired by nanobot's core architecture. Starting from a provider abstraction, you'll add tool registration, session memory, lifecycle hooks, skills, and an MCP-style server. By recreating each building block yourself, you'll understand how messages, tools, memory, and model responses work together in a provider-agnostic agent loop.
Build an AI agent from scratch in Colab without external frameworks
Includes provider abstraction, tool registration, session memory, lifecycle hooks, and MCP server
DeepReinforce released Ornith-1.0, an open-source coding model family built on Gemma 4 and Qwen 3.5. Instead of a fixed harness, the model learns its own scaffold during reinforcement learning. The 397B flagship reports 82.4 on SWE-Bench Verified, with all weights under the MIT license.
Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
The model learns its own scaffold during RL, jointly optimizing the harness and the solution.
Baidu open-sourced Unlimited OCR, a 3B-parameter MoE model that uses Reference Sliding Window Attention to keep the KV cache constant, enabling efficient parsing of dozens of pages in a single pass. It achieves 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR by 6.22 points, under an MIT license.
Unlimited OCR is a 3B MoE model with only 500M active parameters.
It uses Reference Sliding Window Attention to maintain constant KV cache size.
Gradium released two real-time speech translation models: stt-translate (speech-to-text) and s2s-translate (speech-to-speech), covering English, French, German, Spanish, and Portuguese across 20 language pairs. By collapsing the traditional three-model cascade into two, they achieve better BLEU and MetricX scores than gpt-realtime-translate, with an average latency of 3.0 seconds—just behind Gemini's 2.9s—while adding output voice selection and cloning.
Gradium launches stt-translate and s2s-translate, merging transcription and translation into a single pass.
Models cover 5 languages and 20 pairs, average latency 3.0s.
A comprehensive tutorial that builds an OpenHarness-style agent harness from scratch, covering tool use, permissions, memory, skills, context compaction, retry logic, cost tracking, and multi-agent coordination, with fully runnable code.
Build an agent runtime from scratch with core components like tools, memory, permissions, and skills.
Understand full control flow: task receipt, model decision, tool execution, observation loop.
This tutorial demonstrates how to build a fully offline Graphify workflow that transforms a realistic multi-module Python application into a knowledge graph. It covers installing Graphify and graph libraries, generating a sample app, extracting the graph locally using tree-sitter without any API key or LLM backend, analyzing the codebase with NetworkX (file types, relationships, centrality, community detection, shortest paths), and creating both static and interactive visualizations to understand how modules, classes, functions, and database objects connect.
Completely offline knowledge graph generation from Python codebases.
Uses NetworkX for centrality analysis, community detection, and path tracing.
Nous Research has introduced /learn, a new command in the Hermes Agent Skills System that automatically generates reusable skills from various sources. The command uses the agent's existing tools to source material and writes a standards-compliant SKILL.md file. Skills are loaded progressively to keep token usage low, and the system supports multiple creation methods including manual writing, auto-saving, and Hub installation.
/learn generates SKILL.md from directories, URLs, conversations, or notes without manual writing
The command leverages existing agent tools (read_file, search_files, web_extract) and requires no separate ingestion engine
Generative AI has reshaped software development from line-by-line autocomplete to full application generation, multi-agent pipelines, and natural-language codebase interfaces. This article compares 16 top AI coding tools in 2026, including Atoms, GitHub Copilot, Tabnine, and more, highlighting the trend from single-function tools to consolidated platforms like Atoms. The recommendation is to match the tool to the task: agent platforms for idea-to-product, assistants for daily coding, and analysis tools for code quality.
Generative AI coding tools have evolved from autocomplete to full-stack app generation and multi-agent pipelines
The 2026 trend is consolidation into all-in-one platforms like Atoms
UC San Diego's DFlash replaces autoregressive drafting with a lightweight block diffusion model for speculative decoding. It drafts whole token blocks in a single forward pass and conditions on target hidden features through KV injection. The paper reports up to 6.08x lossless speedup on Qwen3-8B, while NVIDIA reports up to 15x throughput on Blackwell at fixed interactivity. DFlash ships 20 checkpoints and supports SGLang, vLLM, and TensorRT-LLM.
DFlash drafts entire token blocks in a single forward pass, not one token at a time.
It injects target hidden features into every draft layer's KV cache, scaling acceptance length with depth.
Mistral AI released OCR 4 on June 23, 2026, moving from clean text extraction to structured document output. Each block returns a bounding box, a typed classification, and per-page and per-word confidence scores. The model supports 170 languages, runs in a single self-hosted container, and feeds citation-ready inputs into RAG, agentic, and enterprise search pipelines through one API endpoint.
OCR 4 returns bounding boxes, typed-block labels, and per-word confidence scores, not just text.
Supports 170 languages across 10 groups, with gains on rare and low-resource languages.
This tutorial builds a multilingual ASR and speech translation pipeline using NVIDIA Canary-1B-v2, covering setup, transcription, translation, timestamp extraction, SRT export, long-form transcription, batch processing, and benchmarking.
Set up NeMo and audio dependencies on a GPU-enabled runtime
Perform English ASR and translate to French, German, Spanish, and Italian
Prime Intellect has released prime-rl 0.6.0, an open framework for asynchronous reinforcement learning on trillion-parameter Mixture-of-Experts models. It trained GLM-5 on SWE tasks at up to 131k sequence length, with sub-5-minute step times and 256 rollouts, on 28 H200 nodes. This breakdown covers the inference and training optimizations behind those numbers — FP8 inference, Wide Expert Parallelism, prefill/decode disaggregation, router replay, and 3-D parallelism (FSDP, EP, CP).
prime-rl 0.6.0 enables asynchronous RL on trillion-parameter MoE models for long-horizon agentic tasks.
GLM-5 trained on SWE tasks at 131k sequence length, sub-5-minute steps, 28 H200 nodes.
This tutorial provides a practical walkthrough for using GLM-5.2 through its OpenAI-compatible API, covering key features such as reasoning-effort control, streaming, function calling, tool-using agents, structured JSON output, long-context retrieval, and cost estimation.
Set up the GLM-5.2 API with multiple providers and a reusable chat wrapper.
Test reasoning-effort control (off, high, max) and observe latency and token differences.
xAI introduced /goal in Grok Build, a mode for long-running, autonomous task execution. You hand off one objective, and the agent plans an approach, executes a progress checklist, and verifies the result until the goal completes.
Sakana AI released Sakana Fugu, a multi-agent orchestration system that routes tasks across a swappable pool of LLMs behind a single API endpoint. Fugu and Fugu Ultra lead coding, reasoning, and agentic benchmarks. The system aims to reduce single-vendor dependency and coordinates expert models internally for complex tasks.
Fugu is a language model that calls other LLMs in an agent pool, dynamically selecting, delegating, and synthesizing results. It supports recursive self-calls.
Two variants: Fugu (low-latency, compliance-friendly) and Fugu Ultra (fixed pool, optimized for hard problems).
MoonMath AI team released a bf16 forward attention kernel for AMD MI300X GPU, written in HIP and open-sourced under MIT. Using one-instruction asm wrappers and an eight-wave pipeline, it outperforms AMD's AITER v3 on all tested shapes and rounding modes, with geomean speedups of 1.08× to 1.18×. The speedup largely comes from memory placement (K in LDS, V in L1, Q in registers). A real-world SGLang PR integrating the kernel accelerated Wan2.1 video diffusion by 1.23× end-to-end with no quality regression.
MoonMath AI open-sourced a bf16 forward attention kernel for AMD MI300X, written in HIP (MIT license).
Beats AMD's AITER v3 on every shape and rounding mode — geomean 1.18×/1.15×/1.08×, up to 1.26×.
LLMs are stateless by default. Agent memory fixes that. This guide breaks down all 7 types — working, semantic, episodic, procedural, retrieval, parametric, and prospective — covering what each stores, where it lives, and when to build it. Includes a comparison table and working Python code.
Agent memory is infrastructure that turns a stateless model into a system retaining context, learning from experience, and acting over time.
The seven memory types vary by form (parametric vs non-parametric) and timescale (short-term vs long-term), each addressing a specific storage need.
This tutorial demonstrates how to build a complete web crawling workflow using Crawlee for Python, from setup to AI-ready output. It covers local demo website generation, crawling with BeautifulSoupCrawler, ParselCrawler, and PlaywrightCrawler, extraction of titles, metadata, product fields, and JavaScript-rendered cards, full-page screenshots, data normalization, link graph construction, and export to JSON, CSV, and RAG-ready JSONL chunks.
HTTP-first strategy is used for lightweight efficiency; browser crawling reserved for JavaScript-rendered pages.
Each crawler extracts URL, title, page type, text summary, outgoing links, and page-specific metadata.
Cisco Foundation AI has open-sourced FAPO (Fully Automated Prompt Optimization), a Claude Code-driven system that autonomously optimizes multi-step LLM pipelines from baseline prompts to target accuracy. FAPO evaluates chains, attributes failures at the step level, proposes variants across prompt, parameter, and chain-structure levels, and validates each through an independent reviewer. In Cisco's evaluation, it beat GEPA on 15 of 18 model-benchmark comparisons.
FAPO is an open-source, Claude Code-driven system for fully automated prompt optimization of multi-step LLM pipelines.
It escalates through three optimization levels (prompt, parameter, structural) guided by step-level failure attribution.
Nous Research introduces Blank Slate mode for Hermes Agent, starting with only provider, model, file operations, and terminal. All other tools are disabled and pinned via configuration, ensuring no silent re-enabling after updates. Users opt in manually as needed.
Blank Slate mode starts with only provider & model, File Operations, and Terminal enabled.
Web, browser, code execution, vision, memory, delegation, cron, skills, plugins, and MCP are disabled by default.
Yandex has open-sourced YaFF (Yet another Flat Format), a high-performance zero-copy serialization library for the Protobuf ecosystem. It keeps the .proto file as the single source of truth and only changes how data sits in memory. YaFF offers four layouts—Fixed, Flat, Sparse, and Dynamic—with the Flat layout achieving read speeds within 1.2× of a raw C++ struct on Yandex's benchmarks, about 3.8× faster than FlatBuffers and 22× faster than Protobuf. It is already deployed in Yandex's advertising recommendation system, delivering 10–20% CPU savings at production scale.
YaFF is an open-source zero-copy wire format for Protobuf from Yandex, licensed under Apache 2.0, currently in C++.
It provides four memory layouts: Fixed (frozen schema), Flat (dense hot data), Sparse (sparse schemas), and Dynamic (runtime selection).
This tutorial demonstrates building an end-to-end forecasting pipeline with TimeCopilot, covering data preparation, model evaluation (statistical, foundation, and optional GPU-based models), rolling cross-validation, probabilistic forecasting, anomaly detection, and an optional LLM agent for interpretation.
TimeCopilot provides a unified interface to manage diverse forecasting models including statistical, Prophet, and Chronos.
Rolling cross-validation with multiple error metrics (MAE, RMSE, MAPE) evaluates model performance.
SpatialClaw is a training-free framework from NVIDIA Research that achieves 59.9% average accuracy across 20 spatial benchmarks by using code as the action interface, outperforming SpaceTools by 11.2 points.
SpatialClaw improves VLM spatial reasoning without retraining by using code as the action interface.
Achieves 59.9% average accuracy on 20 benchmarks, +11.2 over SpaceTools.
VibeThinker-3B is a compact 3B-parameter reasoning model that matches large models like DeepSeek V3.2 on math and code benchmarks, using an efficient post-training pipeline and test-time scaling.
VibeThinker-3B is a 3B dense model, MIT-licensed, built on Qwen2.5-Coder-3B for verifiable reasoning.
It scores 94.3 on AIME26, comparable to DeepSeek V3.2 (671B) and Kimi K2.5 (1T).
Liquid AI released two new retrieval models: LFM2.5-Embedding-350M (dense bi-encoder) and LFM2.5-ColBERT-350M (late-interaction), adapted from LFM2.5-350M-Base with bidirectional attention. They support multilingual and cross-lingual search across 11 languages, are small enough for edge devices, and outperform larger models on NanoBEIR and MKQA-11 benchmarks.
Liquid AI releases two 350M-parameter retrieval models based on LFM2.5-350M-Base, converted to bidirectional encoders.
LFM2.5-Embedding-350M is a dense bi-encoder for fast search with small indexes; LFM2.5-ColBERT-350M uses token-level late interaction for higher accuracy.
We implement an end-to-end workflow for Salesforce CodeGen, loaded from Hugging Face. We move past basic inference by adding function extraction, syntax checking, static safety checks, and unit-test validation. We rerank best-of-N candidates, compose multi-turn program synthesis, and experiment with prompt styles. We finish by visualizing a mini benchmark and exporting the generated artifacts as reusable files.
Load Salesforce CodeGen model from Hugging Face and prepare environment
Extract, validate syntax, check safety, and run unit tests on generated functions