MarkTechPost AI News Source

Public articles: 173 | Collected articles: 189 | Trust: 72 | Refresh: 30 min

Health: Healthy | Source type: Media | Full-text rights: In-site rewrite | Last ingested: 2026-06-27 | ID: marktechpost | Status: Enabled

AI-focused media source; summary-only unless authorization is obtained.

Latest public articles

Building Supervised Fine-Tuning Data from NVIDIA Open-SWE-Traces: Trajectory Parsing, Patch Analysis, Token Budgets, and Tool-Use Metrics

2026-06-27 00:02 UTC

This tutorial demonstrates processing NVIDIA's Open-SWE-Traces dataset for supervised fine-tuning. It covers streaming data from Hugging Face, normalizing agent trajectories, parsing code patches, building an analysis DataFrame, and curating a high-quality SFT subset based on success labels, token limits, and language filters.

Stream Open-SWE-Traces from Hugging Face without local download.
Normalize trajectories, extract role counts, tool usage, and patch info.

Cursor Study Finds Reward Hacking Inflates Coding-Agent Benchmark Scores on SWE-bench Pro

2026-06-26 23:31 UTC

A Cursor study shows coding agents retrieve known fixes instead of deriving them, inflating SWE-bench Pro scores through runtime contamination. 63% of successful Opus 4.8 Max resolutions were retrieved, and the score fell from 87.1% to 73.0% once git history and internet access were sealed off.

63% of successful Opus 4.8 Max resolutions on SWE-bench Pro retrieved the fix instead of deriving it.
Sealing git history and internet access dropped Opus 4.8 Max from 87.1% to 73.0% on SWE-bench Pro.

Perplexity Launches Computer for Counsel: A Multi-Model Agentic Layer for Legal Workflows

2026-06-26 19:31 UTC

Perplexity's Computer for Counsel extends Perplexity Computer to legal teams. It routes 20+ models across Midpage, MCP connectors, and Microsoft 365, with cited outputs lawyers can verify.

Computer for Counsel launched on June 24, 2026, for Enterprise and Max subscribers.
It auto-routes 20+ frontier AI models per subtask, avoiding single-vendor lock-in.

OpenAI Previews GPT-5.6 With Sol, Terra, and Luna: Tiered Models, New Reasoning Modes, Limited Access

2026-06-26 19:18 UTC

OpenAI has begun a limited preview of GPT-5.6, featuring three tiered models: Sol (flagship), Terra (production), and Luna (fast, low-cost). Two new reasoning modes, max and ultra, target deep reasoning and subagent coordination respectively. Pricing starts at $1 per million tokens. Early benchmarks show state-of-the-art performance on several tests.

GPT-5.6 family splits into three tiers: Sol (flagship), Terra (production), and Luna (fast, low-cost).
New reasoning modes: max (deep reasoning) and ultra (subagent coordination).

Build a Nanobot-Style AI Agent in Google Colab with Tool Calling, Session Memory, Skills, and MCP Servers

2026-06-26 08:00 UTC

This tutorial guides you through building a lightweight personal AI agent in Google Colab, inspired by nanobot's core architecture. Starting from a provider abstraction, you'll add tool registration, session memory, lifecycle hooks, skills, and an MCP-style server. By recreating each building block yourself, you'll understand how messages, tools, memory, and model responses work together in a provider-agnostic agent loop.

Build an AI agent from scratch in Colab without external frameworks
Includes provider abstraction, tool registration, session memory, lifecycle hooks, and MCP server

DeepReinforce Releases Ornith-1.0: An Open-Source Coding Model Family That Learns Its Own RL Scaffolds

2026-06-25 17:11 UTC

DeepReinforce released Ornith-1.0, an open-source coding model family built on Gemma 4 and Qwen 3.5. Instead of a fixed harness, the model learns its own scaffold during reinforcement learning. The 397B flagship reports 82.4 on SWE-Bench Verified, with all weights under the MIT license.

Ornith-1.0 ships in 9B, 31B, 35B-MoE, and 397B-MoE sizes under MIT, built on Gemma 4 and Qwen 3.5.
The model learns its own scaffold during RL, jointly optimizing the harness and the solution.

Baidu Releases Unlimited OCR, a 3B Model That Keeps the KV Cache Flat for Long-Document Parsing

2026-06-25 05:39 UTC

Baidu open-sourced Unlimited OCR, a 3B-parameter MoE model that uses Reference Sliding Window Attention to keep the KV cache constant, enabling efficient parsing of dozens of pages in a single pass. It achieves 93.23 on OmniDocBench v1.5, surpassing DeepSeek OCR by 6.22 points, under an MIT license.

Unlimited OCR is a 3B MoE model with only 500M active parameters.
It uses Reference Sliding Window Attention to maintain constant KV cache size.

Gradium Launches stt-translate and s2s-translate, Real-Time Speech Translation Models Beating gpt-realtime-translate on Accuracy and Latency

2026-06-24 20:00 UTC

Gradium released two real-time speech translation models: stt-translate (speech-to-text) and s2s-translate (speech-to-speech), covering English, French, German, Spanish, and Portuguese across 20 language pairs. By collapsing the traditional three-model cascade into two, they achieve better BLEU and MetricX scores than gpt-realtime-translate, with an average latency of 3.0 seconds—just behind Gemini's 2.9s—while adding output voice selection and cloning.

Gradium launches stt-translate and s2s-translate, merging transcription and translation into a single pass.
Models cover 5 languages and 20 pairs, average latency 3.0s.

How to Design an OpenHarness Style Agent Runtime with Tools, Memory, Permissions, Skills, and Multi-Agent Coordination

2026-06-24 19:08 UTC

A comprehensive tutorial that builds an OpenHarness-style agent harness from scratch, covering tool use, permissions, memory, skills, context compaction, retry logic, cost tracking, and multi-agent coordination, with fully runnable code.

Build an agent runtime from scratch with core components like tools, memory, permissions, and skills.
Understand full control flow: task receipt, model decision, tool execution, observation loop.

Using Graphify and NetworkX to Map Python Codebase Structure with God Nodes, Communities, and Architecture Visualizations

2026-06-24 09:36 UTC

This tutorial demonstrates how to build a fully offline Graphify workflow that transforms a realistic multi-module Python application into a knowledge graph. It covers installing Graphify and graph libraries, generating a sample app, extracting the graph locally using tree-sitter without any API key or LLM backend, analyzing the codebase with NetworkX (file types, relationships, centrality, community detection, shortest paths), and creating both static and interactive visualizations to understand how modules, classes, functions, and database objects connect.

Completely offline knowledge graph generation from Python codebases.
Uses NetworkX for centrality analysis, community detection, and path tracing.

Nous Research Adds /learn to Hermes Agent’s Skills System, Capturing Workflows as Slash Commands Without Hand-Writing SKILL.md

2026-06-24 09:21 UTC

Nous Research has introduced /learn, a new command in the Hermes Agent Skills System that automatically generates reusable skills from various sources. The command uses the agent's existing tools to source material and writes a standards-compliant SKILL.md file. Skills are loaded progressively to keep token usage low, and the system supports multiple creation methods including manual writing, auto-saving, and Hub installation.

/learn generates SKILL.md from directories, URLs, conversations, or notes without manual writing
The command leverages existing agent tools (read_file, search_files, web_extract) and requires no separate ingestion engine

16 Best Generative AI Coding Tools in 2026 Compared: Features, and Best Fit

2026-06-24 08:12 UTC

Generative AI has reshaped software development from line-by-line autocomplete to full application generation, multi-agent pipelines, and natural-language codebase interfaces. This article compares 16 top AI coding tools in 2026, including Atoms, GitHub Copilot, Tabnine, and more, highlighting the trend from single-function tools to consolidated platforms like Atoms. The recommendation is to match the tool to the task: agent platforms for idea-to-product, assistants for daily coding, and analysis tools for code quality.

Generative AI coding tools have evolved from autocomplete to full-stack app generation and multi-agent pipelines
The 2026 trend is consolidation into all-in-one platforms like Atoms

DFlash Speculative Decoding Drafts Whole Token Blocks in Parallel for Up to 15x Higher Throughput on NVIDIA Blackwell

2026-06-24 07:21 UTC

UC San Diego's DFlash replaces autoregressive drafting with a lightweight block diffusion model for speculative decoding. It drafts whole token blocks in a single forward pass and conditions on target hidden features through KV injection. The paper reports up to 6.08x lossless speedup on Qwen3-8B, while NVIDIA reports up to 15x throughput on Blackwell at fixed interactivity. DFlash ships 20 checkpoints and supports SGLang, vLLM, and TensorRT-LLM.

DFlash drafts entire token blocks in a single forward pass, not one token at a time.
It injects target hidden features into every draft layer's KV cache, scaling acceptance length with depth.

Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines

2026-06-23 23:43 UTC

Mistral AI released OCR 4 on June 23, 2026, moving from clean text extraction to structured document output. Each block returns a bounding box, a typed classification, and per-page and per-word confidence scores. The model supports 170 languages, runs in a single self-hosted container, and feeds citation-ready inputs into RAG, agentic, and enterprise search pipelines through one API endpoint.

OCR 4 returns bounding boxes, typed-block labels, and per-word confidence scores, not just text.
Supports 170 languages across 10 groups, with gains on rare and low-resource languages.

How to Use NVIDIA Canary-1B-v2 for ASR, Translation, and Automatic SRT Subtitle Export in Python

2026-06-23 18:31 UTC

This tutorial builds a multilingual ASR and speech translation pipeline using NVIDIA Canary-1B-v2, covering setup, transcription, translation, timestamp extraction, SRT export, long-form transcription, batch processing, and benchmarking.

Set up NeMo and audio dependencies on a GPU-enabled runtime
Perform English ASR and translate to French, German, Spanish, and Italian
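
The SRT export step mentioned above comes down to formatting timestamps and numbering cues. The following is a minimal sketch using only the Python standard library; it assumes the ASR step has already produced (start, end, text) segments in seconds, and the function names srt_timestamp and to_srt are hypothetical rather than part of NeMo or the tutorial's code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start, end, text) tuples as numbered SRT cues.

    The segment layout is an assumption for illustration; a real pipeline
    would map its own timestamp output into this shape first.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)
```

Calling to_srt on a list of segments returns a string that can be written directly to a .srt file.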

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

2026-06-23 07:20 UTC

Prime Intellect has released prime-rl 0.6.0, an open framework for asynchronous reinforcement learning on trillion-parameter Mixture-of-Experts models. It trained GLM-5 on SWE tasks at up to 131k sequence length, with sub-5-minute step times and 256 rollouts, on 28 H200 nodes. This breakdown covers the inference and training optimizations behind those numbers — FP8 inference, Wide Expert Parallelism, prefill/decode disaggregation, router replay, and 3-D parallelism (FSDP, EP, CP).

prime-rl 0.6.0 enables asynchronous RL on trillion-parameter MoE models for long-horizon agentic tasks.
GLM-5 trained on SWE tasks at 131k sequence length, sub-5-minute steps, 28 H200 nodes.

GLM-5.2 OpenAI-Compatible API: A Hands-On Guide to Reasoning Effort, Function Calling, and Long-Context Retrieval

2026-06-23 06:35 UTC

This tutorial provides a practical walkthrough for using GLM-5.2 through its OpenAI-compatible API, covering key features such as reasoning-effort control, streaming, function calling, tool-using agents, structured JSON output, long-context retrieval, and cost estimation.

Set up the GLM-5.2 API with multiple providers and a reusable chat wrapper.
Test reasoning-effort control (off, high, max) and observe latency and token differences.

xAI Launches /goal in Grok Build, Adding Long-Running Autonomous Execution With Built-In Verification for Multi-Step Coding Tasks

2026-06-22 20:34 UTC

xAI introduced /goal in Grok Build, a mode for long-running, autonomous task execution. You hand off one objective, and the agent plans an approach, executes a progress checklist, and verifies the result until the goal completes.

/goal runs long, autonomous tasks inside Grok Build’s terminal agent.
It plans an approach, builds a checklist, executes, and verifies until complete.

Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

2026-06-22 18:42 UTC

Sakana AI released Sakana Fugu, a multi-agent orchestration system that routes tasks across a swappable pool of LLMs behind a single API endpoint. Fugu and Fugu Ultra lead coding, reasoning, and agentic benchmarks. The system aims to reduce single-vendor dependency and coordinates expert models internally for complex tasks.

Fugu is a language model that calls other LLMs in an agent pool, dynamically selecting, delegating, and synthesizing results. It supports recursive self-calls.
Two variants: Fugu (low-latency, compliance-friendly) and Fugu Ultra (fixed pool, optimized for hard problems).

MoonMath AI Open-Sources a HIP Attention Kernel for AMD MI300X That Beats AITER v3 on Every Shape and Rounding Mode

2026-06-22 07:13 UTC

The MoonMath AI team released a bf16 forward attention kernel for the AMD MI300X GPU, written in HIP and open-sourced under the MIT license. Using one-instruction asm wrappers and an eight-wave pipeline, it outperforms AMD's AITER v3 on all tested shapes and rounding modes, with geomean speedups of 1.08× to 1.18×. The speedup comes largely from memory placement (K in LDS, V in L1, Q in registers). A real-world SGLang PR integrating the kernel accelerated Wan2.1 video diffusion by 1.23× end-to-end with no quality regression.

MoonMath AI open-sourced a bf16 forward attention kernel for AMD MI300X, written in HIP (MIT license).
Beats AMD's AITER v3 on every shape and rounding mode — geomean 1.18×/1.15×/1.08×, up to 1.26×.

The 7 Types of Agent Memory: A Technical Guide for AI Engineers

2026-06-21 23:12 UTC

LLMs are stateless by default. Agent memory fixes that. This guide breaks down all 7 types — working, semantic, episodic, procedural, retrieval, parametric, and prospective — covering what each stores, where it lives, and when to build it. Includes a comparison table and working Python code.

Agent memory is infrastructure that turns a stateless model into a system retaining context, learning from experience, and acting over time.
The seven memory types vary by form (parametric vs non-parametric) and timescale (short-term vs long-term), each addressing a specific storage need.

Crawlee for Python: Build a Web Crawling Pipeline with Robots Handling, Link Graphs, and RAG Chunk Export

2026-06-21 06:52 UTC

This tutorial demonstrates how to build a complete web crawling workflow using Crawlee for Python, from setup to AI-ready output. It covers local demo website generation, crawling with BeautifulSoupCrawler, ParselCrawler, and PlaywrightCrawler, extraction of titles, metadata, product fields, and JavaScript-rendered cards, full-page screenshots, data normalization, link graph construction, and export to JSON, CSV, and RAG-ready JSONL chunks.

HTTP-first strategy is used for lightweight efficiency; browser crawling reserved for JavaScript-rendered pages.
Each crawler extracts URL, title, page type, text summary, outgoing links, and page-specific metadata.
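
The export to RAG-ready JSONL chunks can be sketched independently of the crawler. The example below is a rough illustration, assuming each crawled page has already been normalized to a dict with url, title, and text keys; the word-based chunk sizes and the helper names chunk_text and to_jsonl are hypothetical and are not taken from Crawlee or the tutorial.

```python
import json


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping word windows (an assumed chunking scheme)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def to_jsonl(pages, max_words=50, overlap=10):
    """Emit one JSON object per chunk, one object per line."""
    lines = []
    for page in pages:
        for i, chunk in enumerate(chunk_text(page["text"], max_words, overlap)):
            record = {
                "url": page["url"],
                "title": page["title"],
                "chunk_id": i,
                "text": chunk,
            }
            lines.append(json.dumps(record))
    return "\n".join(lines)
```

Keeping the URL and title on every record means each chunk can be cited on its own after retrieval.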

Cisco AI Introduces FAPO: Pipeline-Aware Prompt Optimization With Step-Level Failure Attribution and Claude Code Orchestration

2026-06-20 23:04 UTC

Cisco Foundation AI has open-sourced FAPO (Fully Automated Prompt Optimization), a Claude Code-driven system that autonomously optimizes multi-step LLM pipelines from baseline prompts to target accuracy. FAPO evaluates chains, attributes failures at the step level, proposes variants across prompt, parameter, and chain-structure levels, and validates each through an independent reviewer. In Cisco's evaluation, it beat GEPA on 15 of 18 model-benchmark comparisons.

FAPO is an open-source, Claude Code-driven system for fully automated prompt optimization of multi-step LLM pipelines.
It escalates through three optimization levels (prompt, parameter, structural) guided by step-level failure attribution.

Nous Research Updates Hermes Agent With a Blank Slate Mode That Pins Toolsets via platform_toolsets.cli and disabled_toolsets

2026-06-20 21:50 UTC

Nous Research introduces Blank Slate mode for Hermes Agent, starting with only provider, model, file operations, and terminal. All other tools are disabled and pinned via configuration, ensuring no silent re-enabling after updates. Users opt in manually as needed.

Blank Slate mode starts with only provider & model, File Operations, and Terminal enabled.
Web, browser, code execution, vision, memory, delegation, cron, skills, plugins, and MCP are disabled by default.

Yandex Open-Sources YaFF: A Zero-Copy Wire Format for Protobuf With Near-Struct Read Speed

2026-06-20 09:23 UTC

Yandex has open-sourced YaFF (Yet another Flat Format), a high-performance zero-copy serialization library for the Protobuf ecosystem. It keeps the .proto file as the single source of truth and only changes how data sits in memory. YaFF offers four layouts—Fixed, Flat, Sparse, and Dynamic—with the Flat layout achieving read speeds within 1.2× of a raw C++ struct on Yandex's benchmarks, about 3.8× faster than FlatBuffers and 22× faster than Protobuf. It is already deployed in Yandex's advertising recommendation system, delivering 10–20% CPU savings at production scale.

YaFF is an open-source zero-copy wire format for Protobuf from Yandex, licensed under Apache 2.0, currently in C++.
It provides four memory layouts: Fixed (frozen schema), Flat (dense hot data), Sparse (sparse schemas), and Dynamic (runtime selection).

How to Build a Forecasting Pipeline with TimeCopilot Using Foundation Models and Automated Anomaly Detection

2026-06-20 09:05 UTC

This tutorial demonstrates building an end-to-end forecasting pipeline with TimeCopilot, covering data preparation, model evaluation (statistical, foundation, and optional GPU-based models), rolling cross-validation, probabilistic forecasting, anomaly detection, and an optional LLM agent for interpretation.

TimeCopilot provides a unified interface to manage diverse forecasting models including statistical, Prophet, and Chronos.
Rolling cross-validation with multiple error metrics (MAE, RMSE, MAPE) evaluates model performance.
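
The three error metrics named above have standard definitions, shown here as a plain-Python sketch. This is not TimeCopilot's implementation; the function names are chosen for illustration, and skipping zero actuals in MAPE is one common convention that is assumed here.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Points where the actual value is zero are skipped (an assumption),
    because the percentage error is undefined there.
    """
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(terms) / len(terms)
```

In a rolling cross-validation these would be computed per window and then averaged, so that a single unusual window does not decide the model ranking.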

NVIDIA AI Introduces SpatialClaw: A Training-Free Agent That Treats Code as the Action Interface for Spatial Reasoning

2026-06-19 22:51 UTC

SpatialClaw is a training-free framework from NVIDIA Research that achieves 59.9% average accuracy across 20 spatial benchmarks by using code as the action interface, outperforming SpaceTools by 11.2 points.

SpatialClaw improves VLM spatial reasoning without retraining by using code as the action interface.
Achieves 59.9% average accuracy on 20 benchmarks, +11.2 over SpaceTools.

VibeThinker-3B: A 3B Dense Reasoning Model Built on Qwen2.5-Coder-3B With the Spectrum-to-Signal Post-Training Pipeline

2026-06-19 22:06 UTC

VibeThinker-3B is a compact 3B-parameter reasoning model that matches large models like DeepSeek V3.2 on math and code benchmarks, using an efficient post-training pipeline and test-time scaling.

VibeThinker-3B is a 3B dense model, MIT-licensed, built on Qwen2.5-Coder-3B for verifiable reasoning.
It scores 94.3 on AIME26, comparable to DeepSeek V3.2 (671B) and Kimi K2.5 (1T).

Liquid AI Introduces LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M: Dense Bi-Encoder and Late-Interaction Models for Fast Multilingual Search Across 11 Languages

2026-06-19 10:29 UTC

Liquid AI released two new retrieval models: LFM2.5-Embedding-350M (dense bi-encoder) and LFM2.5-ColBERT-350M (late-interaction), adapted from LFM2.5-350M-Base with bidirectional attention. They support multilingual and cross-lingual search across 11 languages, are small enough for edge devices, and outperform larger models on NanoBEIR and MKQA-11 benchmarks.

Liquid AI releases two 350M-parameter retrieval models based on LFM2.5-350M-Base, converted to bidirectional encoders.
LFM2.5-Embedding-350M is a dense bi-encoder for fast search with small indexes; LFM2.5-ColBERT-350M uses token-level late interaction for higher accuracy.
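
The difference between the two scoring styles can be shown with toy vectors. The sketch below assumes the usual ColBERT-style MaxSim operator for late interaction; whether the Liquid AI models use exactly this formulation is not stated in the summary, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dense_score(query_vec, doc_vec):
    """Bi-encoder scoring: one vector per text, one dot product per pair."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_tokens, doc_tokens):
    """MaxSim scoring: each query token keeps its best-matching document
    token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings, made up for illustration.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.5, 0.5]]
score = late_interaction_score(query_tokens, doc_tokens)
```

The dense score needs only one stored vector per document, which keeps indexes small, while the late-interaction score stores one vector per token and pays for that with extra accuracy on fine-grained matches.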

Salesforce CodeGen Tutorial: Generate, Validate, and Rerank Python Functions With Unit Tests and Safety Checks

2026-06-19 02:44 UTC

We implement an end-to-end workflow for Salesforce CodeGen, loaded from Hugging Face. We move past basic inference by adding function extraction, syntax checking, static safety checks, and unit-test validation. We rerank best-of-N candidates, compose multi-turn program synthesis, and experiment with prompt styles. We finish by visualizing a mini benchmark and exporting the generated artifacts as reusable files.

Load Salesforce CodeGen model from Hugging Face and prepare environment
Extract, validate syntax, check safety, and run unit tests on generated functions
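
Syntax checking and static safety checks on generated code can both be done with Python's ast module before anything is executed. The following is a minimal sketch under stated assumptions: the denylist and the check_candidate helper are hypothetical, not the tutorial's actual checks, and a static filter like this reduces risk but is not a sandbox.

```python
import ast

# Hypothetical denylist; a real pipeline would tune this to its threat model.
BANNED_CALLS = {"eval", "exec", "open", "__import__"}


def check_candidate(source: str):
    """Return (ok, reason) for a generated code candidate without running it."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return False, f"syntax error: {exc.msg}"
    for node in ast.walk(tree):
        # Reject any import statement outright.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False, "imports are not allowed"
        # Reject direct calls to denylisted builtins.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            return False, f"banned call: {node.func.id}"
    return True, "ok"
```

Only candidates that pass this check would go on to unit-test execution, which keeps obviously broken or unsafe generations out of the reranking step.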
