Chips AI News

Chips updates

Argocd-AI-Assistant

2026-07-12 23:00 UTC

An Argo CD UI extension that adds an AI-powered assistant tab, allowing users to query Kubernetes resources in natural language with context including manifest, events, and optional logs. Compatible with any OpenAI-compatible backend and requires Argo CD v2.13+.

Integrates as an Argo CD UI extension providing natural language querying of Kubernetes resources.
Enriches queries with live resource manifest, events, and optional container logs.

A SETI Home for AI-Assisted Research

2026-07-12 20:45 UTC

The article proposes crowdsourcing unused AI inference tokens for scientific research, drawing parallels to SETI@home. It highlights recent successes by small teams using AI to solve math problems and discusses the design challenges of such a platform.

SETI@home pooled idle home computer power for extraterrestrial signal analysis.
Today, AI users could donate unused token allowances to collective research.

AI customers are coming around to the idea that small is beautiful

2026-07-12 19:53 UTC

OpenAI and Anthropic build ever-larger models, but companies like Microsoft are turning to smaller, specialized models for cost and efficiency. Microsoft's MAI family is replacing OpenAI models in its products.

Microsoft has developed a family of small, specialized MAI models, gradually replacing OpenAI's general-purpose models.
Smaller models are more efficient and cost-effective for specific tasks, allowing multiple instances on a single accelerator.

W11 Copilot tells you what's slowing down your PC, while using 1GB RAM itself

2026-07-12 17:45 UTC

Microsoft is testing PC Insights, a new Copilot feature that analyzes system resource usage to help users identify performance bottlenecks. However, Copilot itself is a full web app with a private Edge instance, consuming up to 1GB RAM at idle, highlighting the irony. The feature is opt-in and requires user permission.

Copilot’s PC Insights can read CPU, RAM, storage, and other system info to answer questions.
The feature is opt-in and does not scan in the background without permission.

Apple’s failed self-driving car program left a legacy of powerful AI chips

2026-07-12 16:27 UTC

Apple's self-driving car program never really got off the ground, but it may have been what made the company's chips the powerful AI performers they are. Early in the development of the self-driving platform, Apple realized that it would need powerful on-device AI processing. While the car processor was never finished, as Mark Gurman details in his latest Power On newsletter, it did lead to the development of the Neural Engine, the backbone of Apple's on-device AI processing. The Neural Engine made its debut with the iPhone X and the A11 Bionic. In those early days, it was primarily used for computer vision, powering FaceID, Animoji, and a … Read the full story at The Verge.

Apple's car project spurred creation of Neural Engine, now core to on-device AI.
Neural Engine debuted in iPhone X's A11 Bionic for FaceID and Animoji.

Apple files lawsuit, accuses OpenAI of stealing trade secrets

2026-07-12 14:52 UTC

Apple accuses OpenAI and two former Apple employees of stealing trade secrets to build hardware for ChatGPT, alleging a coordinated pattern of misconduct. OpenAI denies the claims, stating it has no interest in other companies' secrets.

Apple sues OpenAI for trade secret theft involving former employees Tang Tan and Chang Liu.
OpenAI denies allegations, says it is reviewing the filing.

Memory makers are slaves to the boom-bust rollercoaster

2026-07-12 11:09 UTC

AI data center demand has tripled memory makers' revenues, but lagging fab construction keeps prices high until at least 2028, risking a severe bust if AI demand falters.

SK Hynix, Micron revenues tripled; Samsung roughly doubled
HBM, DDR5 shortages driving up prices across electronics

The Sequence Radar #893: Last Week in AI: GPT-5.6, Grok 4.5, Muse Spark 1.1 and the Post-Chatbot Stack

2026-07-12 11:02 UTC

Frontier AI labs are shifting from chatbots to integrated systems where models act as runtimes, with near-monthly releases of powerful models and agents. This week's highlights include OpenAI's GPT-5.6 with programmatic tool calling, GPT-Live's full-duplex audio, ChatGPT Work for artifact creation, Meta's Muse Spark 1.1 with active context management, and Grok 4.5 for coding and knowledge work. Research updates reveal issues with coding benchmarks, selective unlearning, agent self-evolution, speculative decoding, and traffic routing. Notable industry news includes major funding rounds for Lovable, Prime Intellect, SambaNova, Norm Ai, and Ollama.

OpenAI releases GPT-5.6 (Sol, Terra, Luna) with programmatic tool calling and parallel subagents.
GPT-Live introduces full-duplex audio interaction, shifting from turn-based to continuous dialogue.

MSK – an AI agent that thinks like a CTO

2026-07-12 06:27 UTC

MSK is an AI CTO agent app for iPhone, offering architecture reviews, scaling advice, and startup strategy via chat or voice. Modeled on the experience of Moeid Saleem Khan (15+ years, 300+ projects, 50+ startups), it provides sharp, opinionated answers. Free to start with no account required; premium subscription available.

AI CTO agent providing on-demand technical and strategic advice.
Simulates real CTO experience; supports chat and voice interaction.

Big Tech piles on $350B in debt to fuel AI data center race

2026-07-12 04:49 UTC

The five largest U.S. tech companies—Alphabet, Amazon, Meta, Microsoft, and Oracle—have doubled their debt to $350 billion over five years to fund AI data centers. While investors have been supportive, Amazon's recent $25 billion bond issuance received a cool reception, signaling limits to market appetite. Oracle was downgraded by S&P due to rising AI spending, and Intel's debt woes serve as a cautionary tale. Hyperscalers plan to spend up to $725 billion this year, primarily on data centers and Nvidia chips.

Big Tech debt has doubled in five years, adding $350 billion
Amazon's $25 billion bond sale met with investor caution

TalkFitly – Practice high-EQ conversations with AI

2026-07-12 03:06 UTC

TalkFitly is an iPhone app that trains social intelligence through real-life scenario simulations and AI scoring. It helps users improve clarity, emotional stability, assertiveness, and empathy in conversations, with daily micro-sessions, a Quote Wall, and robust privacy.

Not a chat AI or quiz, but a social intelligence trainer for adults based on real conversations.
AI coach scores responses on clarity, emotional stability, assertiveness, and empathy, with actionable feedback.

What happens between entering the prompt and seeing the first word appear

2026-07-12 00:28 UTC

An exploration of the inference process in large language models, covering autoregressive generation, prefill and decode phases, the KV cache, and decoding strategies, explaining the mechanics behind token-by-token output.

Inference in LLMs is autoregressive: tokens are generated one at a time, each step depending on previous outputs.
The process splits into a fast prefill phase (processing the entire prompt in parallel) and a slower decode phase (generating tokens sequentially).

A Coding Guide to NVIDIA’s Tile-Based GPU Programming: From cuTile and Triton Kernels to Flash Attention

2026-07-12 00:01 UTC

This tutorial explores NVIDIA's tile-based GPU programming with TileGym, building a Colab workflow that runs across different hardware. We probe the CUDA environment, try the real cuTile backend, and fall back to Triton when standard Colab GPUs lack the cuTile stack. We learn the core tile idea: operate on whole data tiles instead of single threads, then load, compute, and store them. We implement vector addition, fused GELU, row-wise softmax, tiled matrix multiplication, and flash attention, checking each against PyTorch.

Introduces NVIDIA's tile programming model, operating on data blocks rather than individual threads.
Provides a runnable Colab script that works with both cuTile and Triton backends.

Fixed three bugs that made Qwen3.5-122B a daily driver on Mac Studio

2026-07-11 22:54 UTC

After fixing three bugs related to prefix caching, the author achieved sub-second prefill times for long-context conversations with Qwen3.5-122B on a Mac Studio, turning a multi-minute wait into a seamless experience. The bugs included a timestamp in system prompt, missing reply saves on interrupt, and junk checkpoint writes.

Qwen3.5-122B on Mac Studio had severe prefill latency due to hybrid attention's cache behavior.
Three bugs: timestamp in system prompt caused cache miss; interrupted replies not saved; junk checkpoints evicted good ones.

Show HN: AgentTransfer – open-source file transfer for AI agents (one Go binary)

2026-07-11 22:52 UTC

AgentTransfer is an open-source file transfer tool designed for AI agents, allowing them to send files up to 5GB, discover peers, and coordinate in spaces. It uses email as a control plane and HTTPS for data transfer, with no human required for agent onboarding. The tool is a single Go binary that can be self-hosted or used via a hosted instance.

AgentTransfer enables AI agents to transfer files up to 5GB with just a name and API key.
Features include self-onboarding, content-addressed storage, hash verification, and signed receipts.

Mesh LLM: distributed AI computing on iroh

2026-07-11 22:38 UTC

Mesh LLM pools GPUs and memory across machines using iroh networking, exposing an OpenAI-compatible API. It allows running models locally, routing to peers, or splitting large models across multiple machines, offering control and cost savings without central servers.

Mesh LLM pools distributed GPU resources into a single OpenAI-compatible API
Supports local execution, peer routing, and pipeline splitting for large models

I built TradingSpy: local, privacy-first AI trading assistant(First Open Source)

2026-07-11 20:45 UTC

TradingSpy is an open-source local AI trading research workstation that integrates market heatmaps, news catalysts, strategy generation, Backtrader backtesting, and transparent agent runs in one Docker app. It is privacy-first, with all data stored locally, no external accounts, and no cloud dependency. Supports multiple LLM providers and a broad range of financial data sources, suitable for traders and developers for strategy research, backtesting, and signal analysis.

Local-first architecture with all data stored locally, zero data privacy concerns.
Supports AI strategy generation, automated backtesting, and benchmark comparison with loop engineering.

Show HN: Don't let your engineering brain rot in the age of AI

2026-07-11 19:57 UTC

30 Seconds of Knowledge is a browser extension that replaces your new tab with a real code snippet, helping developers stay sharp by reading one snippet per tab — in 30 seconds or less.

Replaces new tab with a random code snippet from 14 libraries.
Over 1500 snippets covering languages, frameworks, and interview questions.

Reverse centaurs are the answer to the AI paradox

2026-07-11 17:23 UTC

Cory Doctorow explores the paradox of AI: why some users love it while others hate it. He introduces the concepts of 'centaurs' (humans assisted by AI) and 'reverse centaurs' (humans used as AI's accountability sink). He argues AI is a bubble that will burst, but productive residue like open-source models will remain. The key is who controls the AI, not the technology itself.

AI can be empowering when humans choose how to use it (centaurs) or oppressive when bosses impose it (reverse centaurs).
The Hearst summer reading guide fiasco exemplifies a reverse centaur scenario where a freelance writer was blamed for AI mistakes.

Litert.js, Google's High Performance Web AI Inference

2026-07-11 14:32 UTC

Google announces LiteRT.js, a JavaScript binding of LiteRT that brings high-performance AI inference to web browsers with hardware acceleration via WebAssembly, outperforming existing solutions by up to 3x.

LiteRT.js enables running .tflite models directly in the browser with native performance through WebAssembly.
Supports CPU (XNNPACK), GPU (WebGPU), and NPU (WebNN) acceleration for maximum efficiency.

openpilot 0.11.1

2026-07-11 12:17 UTC

openpilot 0.11.1 improves driver monitoring with a VLM-based phone detection model, raises thermal thresholds to reduce blocks, adds lateral maneuver reports, and expands car support. The new DM model reduces false positives and better detects active phone use. Thermal changes cut blocked devices by ~90%. New lateral reports aid steering tuning. Bug fixes and new car ports for Acura MDX and Rivian are included.

New DM model uses VLM for phone detection, reducing false positives
Thermal threshold raised to 85°C, cutting blocked devices by ~90%

In 24 hours, OpenAI, SpaceXAI, and Meta turned AI into a race to the bottom on price

2026-07-11 10:30 UTC

Over a 24-hour period, OpenAI, SpaceXAI, and Meta each released new AI models with a common theme: price cuts. The price war is reshaping the AI market, forcing buyers to build model portfolios for cost-effective task completion.

OpenAI launched GPT-5.6, Meta debuted its first paid model, and SpaceXAI released Grok 4.5, all competing on price.
The race to the bottom lowers per-token costs but may increase total task costs due to higher token consumption.

Java local AI client and MCP orchestrator without the Python dependency hell

2026-07-11 06:30 UTC

Ypipe is a free, Java-based local AI client and MCP orchestrator that eliminates Python dependencies. It offers private agentic chat, local model management, one-click integrations, and seamless docking with legacy systems like SAP and Oracle while ensuring data sovereignty. Features zero-setup portability, cross-platform support, headless operation, and an intelligent model switchboard for efficient task handling.

Java-based, no Python or external inference engines required, out-of-the-box usability
Supports private LLM chat, system automation, and zero data leaks

Managing a small local AI budget (Mac M2 16gb)

2026-07-11 04:17 UTC

The article describes millfolio's hybrid tag system for efficient local AI inference: deterministic string and reference tags cover most transactions, while on-device AI tags handle the fuzzy tail. Tags are computed once at index time and stored, avoiding re-inference at query time. Backfilling uses batching, deduplication, and a priority scheduler to avoid overloading the laptop. Performance data shows ~650ms per distinct description, with 8.5 rows/s effective speed. The system includes a preview mechanism for users to verify tags before saving.

millfolio uses three tag types: string, reference, and AI tags, with AI only for uncertain cases.
Tags are computed once and stored, enabling fast queries without re-running AI.

GDP.pdf: Can Frontier Models Master the Documents That Run the World?

2026-07-11 02:26 UTC

The GDP.pdf benchmark evaluates AI models on real-world PDF tasks across ten domains. All frontier models scored below 30%, with GPT-5.5 leading at 25%. The article highlights the critical importance of PDF mastery for AI agents and the serious consequences of failure in high-stakes fields like finance, law, and healthcare.

GDP.pdf benchmark consists of 100 real-world prompts and PDFs across ten professional domains.
Every frontier model scored under 30%, with GPT-5.5 achieving the highest score of 25%.

AI Can't Recreate the Thrust Game (But It Can Help You Understand It)

2026-07-10 22:04 UTC

The author attempted to recreate the classic 1986 game Thrust using Claude AI, but the result was poor. However, using AI to analyze the original 6502 assembly code led to deep insights into the game's physics, sound, and graphics, enabling a faithful TypeScript recreation.

AI failed to capture Thrust's feel due to precise timing and physics nuances.
AI excelled at explaining original assembly code, revealing game mechanics.

Kyutai Releases MuScriptor: An Open-Weight Decoder-Only Transformer for Multi-Instrument Music Transcription to MIDI

2026-07-10 20:21 UTC

MuScriptor is an open-weight decoder-only Transformer from Kyutai and Mirelo that transcribes multi-instrument audio to MIDI. It uses a three-stage training pipeline: pre-training on 1.45M synthetic MIDIs, fine-tuning on 170k real recordings (11k+ hours), and reinforcement learning on 300 manually verified tracks. On the DTest benchmark, it achieves a Multi F1 of 48.2%, significantly outperforming the YourMT3+ baseline's 21.9%. Available in three sizes (103M, 307M, 1.4B parameters), with MIT-licensed inference code and CC BY-NC 4.0 weights.

MuScriptor is an open-weight decoder-only Transformer for multi-instrument music transcription to MIDI, developed by Kyutai and Mirelo.
Three-stage training: pre-training on synthetic data, fine-tuning on 170k real recordings, and RL post-training on 300 manually verified tracks.

How to Build a T4-Friendly Autonomous Data Science Agent with DeepAnalyze-8B, Sandboxed Code Execution, and Iterative Analysis

2026-07-10 19:24 UTC

We build an autonomous data science agent around DeepAnalyze-8B and run it end to end. We prepare a stable Colab runtime, install the machine-learning dependencies, and load the tokenizer and model in 4-bit mode to fit limited GPU memory. We add a sandboxed execution environment that lets the model generate Python, run it safely, observe results, and continue in an agentic loop. We then hand the agent a multi-file e-commerce workspace and let it clean, join, analyze, visualize, and summarize the data as an analyst-grade report.

Set up Colab runtime and install dependencies, load DeepAnalyze-8B in 4-bit mode for T4 compatibility.
Build a sandboxed code executor to run model-generated Python code safely and capture outputs.

AI Gets a Cerebellum

2026-07-10 19:16 UTC

Northwestern researchers developed a cerebellum-inspired memtransistor that consumes very little energy and detects novelties almost instantly. In tests, it identified abnormal heart rhythms within one-fifth of a heartbeat with over 98% accuracy, using 10,000 times fewer computer operations than conventional AI.

New memtransistor mimics cerebellum to ignore routine inputs and react only to unexpected events
Detected arrhythmias in milliseconds with 98% accuracy, using minimal energy

The 2025–2026 Evolution of Generative Spatial AI

2026-07-10 17:47 UTC

A technical retrospective covering the rapid maturation of generative spatial AI from May 2025 to June 2026, highlighting key milestones from text-to-mesh and cinematic video to interactive world models, camera-controllable generation, local production pipelines, and AI-native CAD.

Early 2025 saw production-quality 3D assets and video foundations with tools like Meta AssetGen 2.0 and Google Veo 2.
August 2025 marked a shift with Google DeepMind's Genie 3, enabling interactive world generation.

SK Hynix raises $26.5B in the biggest foreign IPO in US history, is urged to build new US fabs

2026-07-10 17:17 UTC

The AI chip boom just produced its biggest Wall Street moment yet. SK Hynix, a South Korean memory chip giant, said Friday it has raised $26.5 billion in its US market debut, the largest-ever US debut by a non-American company, topping Alibaba’s $25 billion IPO in 2014. Now SK Hynix and Samsung are being asked to build US factories.

SK Hynix raises $26.5 billion in largest foreign IPO in US history.
Offers 177.9 million ADRs at $149 each.

Quoting Nilay Patel: The Privacy Trade-off of AR Glasses

2026-07-10 17:05 UTC

Nilay Patel argues on The Vergecast that making augmented reality glasses requires a camera next to the eyes continuously recording, with no chip powerful and efficient enough to fit in the stem, forcing cloud data processing or a bulky device like Vision Pro, leading to inevitable privacy invasion that may be too costly for society.

AR glasses need a continuously recording camera near the eyes.
No chip exists that is both powerful and power-efficient enough for real-time processing in the glasses frame.

This Week in AI: Chips, Checks, and Changing Jobs

2026-07-10 16:04 UTC

This week, Christina Stathopoulos covers AI hardware breakthroughs (IBM sub-1nm chips, OpenAI/Broadcom Jalapeño, NVIDIA liquid cooling), expanding government oversight (Anthropic model access restored, OpenAI equity stake proposal), workforce evolution (forward-deployed engineers, SAP external hiring vs IKEA retraining), and a hopeful story about AI-powered earthquake alerts.

IBM unveils 0.7nm chip technology with 50% performance boost and 70% lower power consumption.
OpenAI and Broadcom launch Jalapeño, a chip designed specifically for LLM inference.

Zero-copy TLS ingress with kTLS and splice(2) for sandboxes

2026-07-10 15:46 UTC

Tensorlake rebuilt sandbox ingress, moving from L7 reverse proxy to L4 byte forwarding using kernel TLS (kTLS) and splice(2) for zero-copy data paths, achieving 2.2x throughput and halving CPU cost. The new architecture decouples the data plane from the control plane, uses kTLS for in-kernel crypto, and derives adaptive timeouts from byte flow. Performance tests show single-connection throughput increases from 1.12 GB/s to 2.50 GB/s, with proxy CPU per GB dropping from 0.90 to 0.49 CPU-seconds.

Tensorlake replaced L7 reverse proxy with L4 byte forwarding, eliminating HTTP parsing and userspace buffering.
Uses kernel TLS (kTLS) and splice(2) for zero-copy, with encryption/decryption done in the kernel.

Fine-tune NVIDIA Nemotron 3 models with Amazon SageMaker AI serverless model customization

2026-07-10 15:35 UTC

This post explores the unique Nemotron 3 architecture, available fine-tuning techniques (SFT, RLVR, RLAIF), and provides a step-by-step guide to getting started with serverless customization using SageMaker Studio.

NVIDIA Nemotron 3 models feature a hybrid Mamba-Transformer Mixture-of-Experts architecture supporting up to 1M-token contexts.
Amazon SageMaker AI now offers serverless model customization for Nemotron 3 Nano and Super, requiring no infrastructure management.

Real-time dental image verification with Amazon SageMaker AI at Henry Schein One

2026-07-10 15:33 UTC

Henry Schein One developed Image Verify, an AI-powered system on Amazon SageMaker AI that evaluates dental X-ray quality in real time, reducing insurance claim denials. The system scaled from concept to over 10,000 locations in months, processing millions of X-rays with sub-2-second latency.

Up to 20% of dental insurance claims are initially denied due to poor image quality.
Image Verify provides real-time quality scores (1-5) at the point of capture, enabling immediate retakes.

Deploying quantized models on Amazon SageMaker AI with Unsloth

2026-07-10 15:26 UTC

Learn four deployment patterns for deploying Unsloth-quantized models on AWS: using EC2 for direct access, SageMaker AI for managed serving, and EKS/ECS for containerized inference. Understand Unsloth's dynamic quantization, model formats (GGUF, safetensors), and operational best practices.

Unsloth dynamic quantization reduces model size by up to 86% with minimal accuracy loss by allocating higher precision to sensitive layers.
Four deployment patterns are covered: EC2 for testing, SageMaker AI for managed endpoints, and EKS/ECS for containerized environments.

Disaggregated prefill and decode for LLM inference on SageMaker HyperPod

2026-07-10 15:20 UTC

This post demonstrates how to implement disaggregated prefill and decode (DPD) with vLLM on Amazon SageMaker HyperPod using the HyperPod Inference Operator. DPD separates prefill and decode phases onto distinct GPU pools, eliminating interference from long prompts and improving latency. It covers architecture, use cases, and step-by-step deployment instructions.

DPD isolates prefill and decode on separate GPU pools connected via EFA RDMA.
It reduces tail latency and prevents long prompts from blocking ongoing decode requests.

Prompt: AI's Next Challenge Is Making Better Use of Compute

2026-07-10 14:07 UTC

After years spent racing to secure AI chips and computing power, enterprise leaders are discovering that getting access to infrastructure may be easier than using it effectively.

Enterprise leaders find access to compute easier than effective use
Optimizing compute utilization is AI's next frontier

The 'learn to code' era is over - and employers are on the hook for reskilling now

2026-07-10 12:58 UTC

AI's ushered in a new era of reskilling. Here's what the industry can learn from the last decade's drive to put people in tech jobs.

Code Louisville, a free tech training program, is closing due to a decline in entry-level job placements.
AI's unpredictable impact on jobs forces companies to invest in reskilling their workforces.

Local Video Summarization Pipeline: Processing Frames with SmolVLM2-2.2B

2026-07-10 12:00 UTC

SmolVLM2-2.2B sits at a genuinely useful point on the capability-size trade-off curve; small enough to run on a single consumer GPU, capable enough to produce video summaries that are actually useful for real workflows.

SmolVLM2-2.2B uses pixel shuffle to compress each image to 81 tokens, enabling multi-frame processing on consumer GPUs.
The pipeline supports uniform and keyframe sampling for meetings, lectures, and surveillance footage.

How to shrink the token budget without shrinking the team

2026-07-10 09:34 UTC

Jensen Huang proposes a test for engineers: their annual AI token consumption should be at least half their salary. Nvidia aims for $2 billion yearly token bill. Many firms cut headcount to fund AI, but Gartner finds 80% saw no ROI improvement. Optimization techniques like prompt caching, model routing, and RAG can reduce token costs significantly. Retaining and training junior engineers is crucial for long-term success.

Jensen Huang suggests engineers' AI token usage should be at least 50% of their salary.
Companies are cutting jobs to afford AI token costs, often with poor returns.

I built an app that solves math problems from a photo

2026-07-10 08:50 UTC

MathNut AI is an iPhone math solver that lets you snap a photo of any problem and get step-by-step AI explanations. Supports arithmetic, algebra, geometry, and more.

Snap a photo of printed or handwritten problems
Get step-by-step solutions and AI chat tutoring

Ramblings on technological pursuits of AI systems

2026-07-10 08:33 UTC

The author reflects on the rapid advancement of AI technology by comparing childhood computers with today's B300 GPU system. He discusses the controversies surrounding LLMs, the differences between symbolic and statistical AI, the nature of intelligence, and the contrast between dreams and reality. The article also includes a discussion with a friend about determinism and memory.

Technological leap from childhood PC to B300 GPU system
Reflections on LLMs and the AI industry: hype or real change?

Can AI Answer the $3T Question?

2026-07-10 06:22 UTC

Three years ago, Sequoia partner David Cahn was one of the first to quantify the financial implications of Silicon Valley's massive AI infrastructure spending. Starting from Nvidia's $50B GPU revenue, he calculated that $200B in revenue would be needed to pay back the upfront investment.

David Cahn first calculated the ROI requirements for AI infrastructure three years ago
He derived a $200B revenue threshold from Nvidia's $50B annual GPU revenue

OpenAI Launches GPT-5.6 Sol/Terra/Luna, Codex Becomes ChatGPT Superapp

2026-07-10 06:19 UTC

OpenAI released three new GPT-5.6 models—Sol, Terra, Luna—alongside major app updates, including ChatGPT Work and Codex integration. The models show strong performance on benchmarks at lower costs, with Sol being the most capable. Independent evals confirm near-frontier results, especially in coding and agentic tasks.

OpenAI launched GPT-5.6 in three sizes: Sol (flagship), Terra (mid-range), Luna (budget).
New ultra reasoning effort coordinates multiple agents for complex tasks.

South Korea chip maker SK hynix rides AI boom raising $26.5bn in huge US listing

2026-07-10 05:06 UTC

SK hynix, a supplier of advanced memory chips, has seen profits skyrocket thanks to the global race to build AI datacentres. The South Korean chip maker set pricing for its mega US listing on Friday, aiming to raise $26.5bn.

SK hynix set pricing for its US listing on Friday, targeting $26.5bn.
The company is a major beneficiary of the AI boom, with soaring profits.

Meet LingBot-World-Infinity: An Open Causal World Model With An Agentic Harness

2026-07-10 04:38 UTC

Robbyant, Ant Group's embodied-intelligence unit, has released LingBot-World-Infinity (LingBot-World 2.0), a 14B causal video generation model that acts as an interactive world simulator. Its core innovations—Mixture of Bidirectional and Autoregressive (MoBA) attention and distribution matching distillation—tackle long-horizon drift. A Director-Pilot agentic harness enables infinite video generation. The paper demonstrates a 60-minute session, but the open-source release includes only one checkpoint and a 480P script, lacking deployment code and quantitative benchmarks, under a non-commercial license.

LingBot-World-Infinity is a 14B-parameter causal video generation model by Robbyant (Ant Group) for interactive world simulation.
MoBA attention and distribution matching distillation address long-horizon drift in world models.

TensorSharp: Open-Source Local LLM Inference Engine

2026-07-10 02:42 UTC

TensorSharp is a native .NET LLM inference engine for GGUF models, offering a CLI, browser chat server, and Ollama/OpenAI-compatible APIs. It emphasizes privacy, zero per-token fees, and runs on various hardware backends. The article includes a quick start guide and benchmarks against llama.cpp.

Built with C# and .NET 10 for local LLM inference with GGUF models and GPU acceleration.
Provides CLI, Web UI chat server, and HTTP APIs compatible with Ollama and OpenAI.

UST is bringing Claude to physical AI

2026-07-10 00:45 UTC

Anthropic partners with UST to integrate Claude into engineering platforms for physical AI tasks across semiconductor, automotive, and other industries, with plans to train 20,000 employees.

Claude powers UST's iDEC platform, cutting chip validation cycle times by 50-70%.
Claude is also deployed in healthcare, telecom, and banking automation systems.

Chips

Related tags