Research AI News

Research updates

Show HN: Kote – Capture and reuse engineering context from AI chats and Git

2026-07-12 18:56 UTC

Kote is an open-source tool that automatically captures developer conversations with AI assistants, Git commits, and development context, building a searchable knowledge base to help developers recall past technical decisions and solutions. It offers a VS Code extension, GitHub integration, a CLI, a browser extension, WhatsApp/Telegram messaging, and self-hosted deployment.

Kote passively captures AI sessions, Git activity, and other context, organizing them into a knowledge base.
VS Code CodeLens shows file-related notes with AI summaries and timelines.

Against Usefulness

2026-07-12 17:47 UTC

This essay explores the critical role of 'useless' research in enabling future innovations. Using Folk Computer as a case study, the author traces a lineage from Xerox PARC to Dynamicland, and argues for funding paradigm-level work before it becomes useful.

Folk Computer is an open-source physical computing system that turns the room into a computer.
The system's lineage includes Alan Kay, Bret Victor, CDG, and Dynamicland.

Soulless – List of AI Artists Hiding on Spotify

2026-07-12 17:46 UTC

Soulless is a community-driven project that exposes AI-generated artists on Spotify. It lists 232 detected AI artists with monthly listeners and estimated earnings. It also provides an open-source AI music detector and a curated landscape of AI music resources.

Soulless identifies 232 AI-generated artists on Spotify, showing their monthly listeners and earnings.
The detection tool uses an ensemble of SONICS spectrogram models and a vocoder fakeprint scanner.

GPT-5.6, Fable 5, and Grok 4.5 rebuild Basecamp from the same spec

2026-07-12 17:02 UTC

The author evaluated GPT-5.6 Sol, Fable 5, Grok 4.5, and other AI models on a benchmark called Basecamp Bench, testing their ability to build a frontend and backend from the same specification. Fable 5 won both tracks, while Grok 4.5 offered the best speed-cost tradeoff. Results show significant differences in polish and completeness, especially in the final 10% of work.

Fable 5 scored highest on both frontend and backend, closely matching the real Basecamp implementation.
Grok 4.5 completed the build in 37 minutes at a cost of $9.30, offering the best speed and cost tradeoff.

OpenAI's AI Beating Every Human at AtCoder

2026-07-12 16:54 UTC

OpenAI's AI agent solved all five problems in the AtCoder Algorithm Division for 8,300 points; the top human scored 4,300. No human solved problems C or E. In the Heuristic Division, AI scored more than seven times the best human result. The 600,000-yen 'Humanity Prevails Award' went unclaimed. The system was described as comparable to GPT-5.6.

OpenAI's AI solved all five problems, scoring 8,300 points against the top human's 4,300.
No human solved the hardest problems, C and E.

AI and the Future of Writing – roundtable of authors discuss ramifications for art

2026-07-12 16:50 UTC

In a roundtable discussion, writers and cultural critics explore the profound implications of AI on language, creativity, and society. They note that AI both sharpens and dulls linguistic abilities, and may clarify the boundary between machine and soul. Despite anxieties, AI offers opportunities in research, accessibility, and diagnostics.

AI is seen as a decentering technology, with progress likened to moving from the Wright brothers to a fleet of 747s.
Writers find AI both enhancing and eroding their language skills, requiring a redoubled commitment to reading and writing.

Using AI to Let History Speak About Bank Runs

2026-07-12 16:40 UTC

Researchers have compiled a database of over 3,000 bank runs from 1863-1934, revealing that most runs did not lead to failure, and analyzing geographic and temporal patterns.

The majority of bank runs did not result in failure.
Bank runs spiked during major crises like 1873, 1893, 1907, and the Great Depression.

Show HN: Agent Legibility Analyzer – see if AI shopping agents can read your store

2026-07-12 14:30 UTC

AgentMint.net is a research publication that helps merchants understand and optimize for how AI shopping agents select products. Every claim is sourced, and it offers tools like the Agentic Shopping Readiness check and Agent Selection signals database.

AgentMint.net analyzes why AI shopping agents choose certain stores and products.
All factual claims are labeled with evidence sources.

India's TCS plans up to 8,900 AI deployment engineers, seeks AI acquisitions

2026-07-12 12:48 UTC

Tata Consultancy Services plans to build a team of up to 8,900 forward-deployed engineers and is hunting for AI acquisitions, betting artificial intelligence will create new business rather than undermine outsourcing. CEO K Krithivasan dismisses concerns that AI will disrupt the outsourcing model. AI revenue growth slowed to 13% in the first quarter from 28% in the previous quarter. TCS spends about $1 billion annually on talent development and making AI accessible.

TCS plans to have 1% to 1.5% of its workforce as forward-deployed engineers to accelerate AI adoption
The company is evaluating acquisitions in AI, data security, and cybersecurity

SlimeBallBench · AI models play slime soccer

2026-07-12 12:36 UTC

SlimeBallBench is a new benchmark that tests AI models in the game of slime soccer, evaluating their decision-making and strategic capabilities.

SlimeBallBench tests AI performance in slime soccer
The benchmark evaluates AI decision-making and strategy

The fight against AI data centers is just beginning

2026-07-12 12:00 UTC

From a small protest in Ireland to nationwide opposition in the US, the battle against AI data centers is escalating. This article traces the origins, current protests, political responses, and what lies ahead as communities push back against the environmental and economic impacts of massive data center buildouts.

Apple's 2015 Ireland data center plan was abandoned after years of local protests and legal battles.
In Q1 2026, at least 75 US projects were blocked or delayed, with 833 active opposition groups.

AI backlash hits university: laptops and phones banned for law students

2026-07-12 11:25 UTC

The University of Chicago is banning electronic devices in first-year law classes starting in the fall to combat AI reliance, while integrating responsible AI education into the curriculum.

University of Chicago bans laptops, tablets, and phones in first-year law classrooms effective fall.
The ban aims to foster independent critical thinking without AI assistance.

Scientists' Side Hustle? Using AI and Quantum Computing to Generate New Peptides

2026-07-12 11:00 UTC

Researchers from the Technical University of Denmark combined a generative AI model with a quantum computer to design novel peptides that bind to specific proteins, potentially accelerating vaccine development and personalized immunotherapies, especially for understudied populations.

A DTU team used a hybrid AI-quantum system to generate novel peptides for protein binding.
Quantum integration improved peptide generation, especially with limited data.

25% of long-form social media posts appear AI-generated

2026-07-12 10:58 UTC

A new study from AI detection platform Pangram reveals that 25% of long-form social media posts are fully AI-generated. LinkedIn leads with 41%, followed by X at 25%. The analysis covered over one million posts across platforms including Medium, Substack, and Reddit.

Pangram study finds 25% of long-form social posts are fully AI-written.
LinkedIn tops the list at 41% AI-generated long-form posts, X at 25%.

Chasing new skills, going back to basics and pushing for collective action: how software engineers are adapting to AI

2026-07-12 10:00 UTC

Software engineering, once a stable high-paying profession, is being disrupted by AI. Engineers are adapting by learning new skills, focusing on fundamentals, and organizing for protections. The industry faces layoffs, underemployment, and a shift from coding to reviewing AI-generated code.

AI is transforming software engineering, with 75% of code at Google now written by AI.
Engineers like Matt avoid AI to keep skills sharp, while others like George Dover upskill to stay relevant.

Political Neutrality Benchmark of Popular AI Models

2026-07-12 08:21 UTC

A new benchmark reveals that 97 out of 108 measured positions across 18 AI models from 12 labs land left of center. The findings show a consistent progressive lean, with exceptions on economics, foreign policy, and religion. xAI's Grok models are closest to center, while many models refuse to answer certain questions, affecting their scores.

97 of 108 positions left of center
Strongest progressive lean on environment (-0.82)

AI Found a Linux Root Bug That Was Missed for 15 Years

2026-07-12 05:56 UTC

Nebula Security, using its AI tool VEGA, discovered a 15-year-old privilege escalation vulnerability (CVE-2026-43499) in the Linux kernel that allows any logged-in user to gain root access. The bug has been present in essentially every major distribution since 2011 and was fixed in April, but patch rollout is uneven.

A use-after-free vulnerability in the Linux kernel, present in all major distributions since 2011, allows unprivileged users to gain root access.
Nebula Security discovered the flaw using its AI-driven tool VEGA and received a $92,337 payout from Google's kernelCTF program.

Dismissive Dan's Review of the Overplane AI Coding Harness

2026-07-12 01:02 UTC

Overplane is an open-source tool that converts Markdown specs into code using AI agents and SMT verification. Reviewer Dismissive Dan questions its necessity, noting many developers already have similar setups, but acknowledges its packaging and isolation design.

Overplane turns Markdown specs into code and uses the Z3 solver for consistency checks.
The review is constructive but skeptical, as many developers already have similar workflows.

Mira Murati’s Thinking Machines Lab Makes The Technical Case For Human-Centered AI Built On Customizable Model Weights

2026-07-12 00:46 UTC

Thinking Machines Lab published "The Future Worth Building Is Human." The essay frames human participation, model ownership, and decentralized alignment as technical challenges. It ties them to interaction models and Tinker's LoRA fine-tuning, where teams train and keep their own model weights.

Thinking Machines Lab argues for distributed, customizable AI shaped by users.
Tacit, local knowledge requires AI to be distributed, not centrally frozen.

sqlite-utils 4.1

2026-07-11 23:50 UTC

sqlite-utils 4.1 is the first dot-release since 4.0, introducing several minor new features including a --code option for insert/upsert to generate rows from inline Python code, a --type option to override column types for CSV/TSV imports, drop-index commands, and the ability to read SQL queries from standard input. It also adds support for toggling SQLite STRICT mode via table.transform().

Insert/upsert now accept --code for inline Python row generation
New --type option allows overriding column types on table creation

Inferring multicellular interactions in tumors from standard pathology slides

2026-07-11 23:04 UTC

Stanford Medicine researchers have developed an artificial intelligence platform that can predict cancer cell neighborhoods from microscopic slides containing slices of human tumor tissue.

Stanford researchers developed CANVAS AI to infer cellular neighborhoods from H&E slides.
Analysis of over 18 million cells from 457 lung cancer patients revealed 10 distinct neighborhoods.

Banning AI in Law School: We've Seen This Before

2026-07-11 20:18 UTC

The University of Chicago Law School announced a new policy banning phones and laptops for first-year students, sparking debate about AI in education. This article recalls the history of banning portable computers at Harvard Law School 45 years ago, highlighting the cycle of technological fear. The author shares personal experiences, emphasizing how tools change work processes, and questions the rationality of current policies.

University of Chicago Law School bans phones and laptops for first-year students, causing controversy.
45 years ago, Harvard Law School banned portable computers for similar reasons.

The AI Disagreement Index: 8 models agreed on the "best tool" 0 of 16 times

2026-07-11 20:12 UTC

The AI Disagreement Index is an open, rigorous, living measurement of how much AI engines disagree on which B2B tools to trust in each category. In the recorded sample, across 16 categories, all eight models named the same single best tool zero times, with a mean pairwise agreement of 44% and a Fleiss' kappa of 0.41. The index is updated monthly and provides raw data for reproducibility.

Across 16 B2B software categories, all eight AI models agreed on the single best tool zero times.
Mean pairwise agreement between engines is 44%, with Fleiss' kappa at 0.41 (moderate agreement).
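
The two statistics quoted above can be illustrated with a small sketch. This is not the index's own code: the function names and the toy data layout (one list of labels per category, one label per model) are assumptions made for illustration, and the index may compute its figures differently.

```python
from collections import Counter
from itertools import combinations


def mean_pairwise_agreement(ratings):
    """Average, over categories, of the share of model pairs naming the same tool.

    ratings: list of per-category lists, one label per model (assumed layout).
    """
    per_item = []
    for labels in ratings:
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed number of raters per item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(labels) for labels in ratings]

    # Observed agreement: mean of P_i = (sum_j n_ij^2 - n) / (n (n - 1)).
    p_bar = sum(
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items

    # Chance agreement: sum of squared overall label proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:  # every rater used the same single label everywhere
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


# Toy data: 4 categories, 3 models, labels are tool names.
toy = [["A", "A", "A"], ["A", "B", "B"], ["B", "B", "B"], ["A", "A", "B"]]
print(mean_pairwise_agreement(toy), fleiss_kappa(toy))
```

With complete ratings, the observed-agreement term in Fleiss' kappa equals the mean pairwise agreement; kappa then discounts it by the agreement expected from label frequencies alone.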

I built a free tool to evaluate AI agent outputs (human labels and LLM judges)

2026-07-11 19:55 UTC

Verdict is an open-source, browser-based tool for evaluating AI agent outputs. It enables human labeling, grounded theory error analysis, and validation of LLM judges against human labels, all locally without data leaving your machine.

Verdict runs entirely in the browser, no backend or accounts needed.
Supports multiple trace formats and provides a clean chat timeline for review.

RAG Evaluation Frameworks Compared: RAGAS vs TruLens vs DeepEval

2026-07-11 18:16 UTC

This article compares three popular RAG evaluation frameworks: RAGAS, TruLens, and DeepEval. It explains why RAG needs dedicated evaluation, covers the three layers of evaluation (retrieval, generation, end-to-end), and details key retrieval metrics (Precision@K, Recall@K, MRR, NDCG). It then dives into RAGAS (LLM judge, no ground truth, synthetic test set generation) and TruLens (observability, RAG triad, dashboard), with brief mention of DeepEval, and provides guidance on choosing the right framework.

RAG systems require specialized evaluation because BLEU/ROUGE cannot capture retrieval and generation failures.
RAGAS uses an LLM judge for reference-free scoring and can auto-generate test sets from documents.
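
The retrieval metrics named in the summary can be sketched in a few lines. This is a minimal illustration using common textbook definitions, not code from RAGAS, TruLens, or DeepEval; the function names and the document-ID data layout are assumptions.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Share of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k


def recall_at_k(retrieved, relevant, k):
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in retrieved[:k]) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """MRR over (retrieved, relevant) pairs, one pair per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def ndcg_at_k(retrieved, gains, k):
    """NDCG@k with graded gains (dict: doc id -> gain) and a log2 discount."""
    dcg = sum(
        gains.get(doc, 0) / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0


retrieved = ["d1", "d2", "d3", "d4"]
relevant = {"d2", "d4", "d9"}
print(precision_at_k(retrieved, relevant, 2))  # 0.5
print(reciprocal_rank(retrieved, relevant))    # 0.5
```

Precision@K here divides by K even when fewer than K documents are returned; some implementations divide by the number actually retrieved instead.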

An educational lab of AI agent architectures

2026-07-11 15:33 UTC

This project is an educational lab of AI agent architectures built on LangChain and local Ollama, offering multiple agent variants for chat, tool calling, RAG, hybrid, and agentic RAG modes.

Multiple AI agent architecture variants covering chat, tool calling, RAG, hybrid, and agentic RAG.
Built on LangChain and local Ollama server, with optional OpenRouter support.

Show HN: HoverSource – From pixel to source file in one keystroke

2026-07-11 15:24 UTC

HoverSource is a developer tool that lets you get the source file path and line number of UI elements by hovering and pressing Alt+C. It integrates with AI agents to reduce steps by 73.9% and token consumption by 94.5%. Works with React, Next.js, Vue, and more out of the box.

Hover and press Alt+C to instantly copy UI element source info
Integrates with AI agents, reducing steps by 73.9% and tokens by 94.5%

'Ghostcommit' hides prompt injection in images to fool AI agents, steal secrets

2026-07-11 14:06 UTC

Researchers have built a pull request that steals a repository's secrets by hiding the malicious instruction inside a PNG that AI code reviewers never open.

Attack hides prompt injection in PNG images to bypass AI code reviewers.
A coding agent reads the image and steals secrets from the repository's .env file.

Kairos Engine – a pipeline that kills trading strategies before they cost money

2026-07-11 13:24 UTC

Kairos Engine is an end-to-end quantitative research platform for scalping signals in FX and metals markets. It uses a Hidden Markov Model for regime classification, an ensemble of time series foundation models for forecasting, and a strict walk-forward backtest against a broker cost model built from real measured spreads. The engine's value lies in rejecting bad strategies before real capital is risked.

Kairos Engine processes raw tick data through a four-state HMM regime classifier and an ensemble of four time series foundation models.
Backtested over 365 days on XAUUSD, the only passing variant executed 221 trades with net expectancy of +222.91 pips.
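
As a rough illustration of what a walk-forward backtest means, the sketch below generates rolling train/test windows in which every test window lies strictly after its training window. It is a generic pattern, not Kairos Engine's implementation; the function name and the window sizes are assumptions.

```python
def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train, test) index ranges that roll forward through time.

    Each test range starts right after its train range, so a model is only
    ever evaluated on data that comes later than what it was fitted on.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # advance by one test window


for train, test in walk_forward_splits(n_samples=10, train_size=4, test_size=2):
    print(list(train), list(test))
```

A strategy would be refitted on each train range and scored only on the matching test range, with trading costs applied to the test-range trades.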

AI takes two-thirds of venture money, and your odds are still one in six

2026-07-11 12:26 UTC

In 2025, AI companies captured 65% of US venture capital, but most went to megadeals; small seed rounds shrank. The article analyzes seed round costs, success rates (about 1 in 6), and a decision framework for founders, along with fundraising strategies and alternatives.

AI companies absorbed most VC funding, but both the number and the dollar volume of small seed rounds fell 20%.
The median seed round sells ~20% of the company; by Series A, founders hold 36%.

Show HN: AI assistant for Google Chat to translate any file preserving layout

2026-07-11 12:00 UTC

AnyFile Translator is an AI-powered assistant for Google Chat that translates documents, web links, and messages while preserving original formatting. It supports over 100 languages, offers AI content writing, and ensures data privacy with encryption and deletion.

Translate files (PDF, Word, PPT, etc.) while preserving layout
Supports over 100 languages and works within Google Chat

AI Surveillance and Social Progress

2026-07-11 11:33 UTC

AI surveillance systems will soon track all public and private actions, enforce rules instantly, and create chilling effects that suppress dissent, creativity, and social progress. The article discusses examples from China and the US, the mechanisms of chilling effects, and calls for policy interventions.

AI surveillance combines facial recognition, digital tracking, and automated enforcement.
China and the US are deploying such systems for social control.

Nobel laureate Omar Yaghi joins Tsinghua to lead AI materials lab

2026-07-11 10:14 UTC

Omar Yaghi, 2025 Nobel Prize in Chemistry winner, has left the US to lead an AI-driven research center at Tsinghua University in China, aiming to accelerate materials design and synthesis to address environmental challenges like water scarcity and carbon neutrality.

Yaghi will head a team exploring how AI can transform materials design and synthesis, drastically reducing development cycles.
He won the 2025 Nobel Prize for pioneering metal-organic frameworks (MOFs), ultra-porous materials with record surface areas for carbon capture, water harvesting, and hydrogen storage.

Documentation is still in your Mum's filing cabinet

2026-07-11 09:41 UTC

The article argues that traditional folder-based documentation is outdated for modern knowledge work. It compares documentation to a filing cabinet inherited from 1970s office metaphors, which forces knowledge into single locations. AI retrieval systems highlight the limitations of folders, advocating for connected knowledge graphs that allow discovery from multiple paths.

Documentation's folder structure is based on 1970s office metaphors that don't match how knowledge works.
People forage for information rather than browsing hierarchies, often struggling to find what they need.

A font that humans can read but AI cannot

2026-07-11 09:36 UTC

Ghost Font is an experimental anti-AI font that uses motion, noise, and decoys to make messages readable to humans but not to current AI models. Even advanced models like Claude Fable and GPT Sol 5.6 Ultra struggle to decode it, making it a potential tool for CAPTCHA and AI visual perception benchmarks.

Ghost Font hides messages using moving dots; single screenshots reveal nothing.
Advanced AI models like GPT Sol 5.6 Ultra required lengthy analysis and often hallucinated.

Create high-converting AI UGC ads in minutes

2026-07-11 05:58 UTC

AIUGCAds.net provides an AI-powered platform to generate realistic UGC-style video ads in minutes, eliminating the need for creators, filming, or editing. It serves ecommerce stores, dropshippers, DTC brands, agencies, and marketplace sellers, enabling ad creation from product links or images with AI actors, voiceovers, and product demos.

Generate UGC video ads in under 2 minutes using AI, no creators or filming required.
Over 100 realistic AI actors and voiceovers in multiple languages and accents.

Show HN: Krbn, a pencil-style 3D renderer with SVG output

2026-07-11 05:51 UTC

Krbn is a web engine for non-photorealistic, pencil-style rendering of 3D scenes to SVG. It derives strokes from geometry rather than rasterizing pixels, supporting exact silhouettes, hidden lines, hatching, and more. Written in TypeScript and MIT-licensed, it was developed with AI assistance.

Krbn is a pencil-style 3D renderer that outputs SVG. It computes silhouettes and hidden lines analytically, not via pixel sampling.
Features include exact conic silhouettes, hidden-line ghosting, curvature-following hatching, and hand-drawn wobble.

The Conversation We're Not Having About AI in Peer Review

2026-07-11 05:36 UTC

This article discusses the important but often overlooked issues surrounding AI in academic peer review, citing Christian Bird's research on the topic.

AI's role in peer review is growing but underdiscussed
Christian Bird's work highlights fairness and accuracy concerns

Managing a small local AI budget (Mac M2 16gb)

2026-07-11 04:17 UTC

The article describes millfolio's hybrid tag system for efficient local AI inference: deterministic string and reference tags cover most transactions, while on-device AI tags handle the fuzzy tail. Tags are computed once at index time and stored, avoiding re-inference at query time. Backfilling uses batching, deduplication, and a priority scheduler to avoid overloading the laptop. Performance data shows ~650ms per distinct description, with 8.5 rows/s effective speed. The system includes a preview mechanism for users to verify tags before saving.

millfolio uses three tag types: string, reference, and AI tags, with AI only for uncertain cases.
Tags are computed once and stored, enabling fast queries without re-running AI.
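
The compute-once pattern described above can be sketched as follows. This is not millfolio's code: the rule table, the function names, and the `ai_tagger` callback are hypothetical, reference tags and the priority scheduler are left out, and only the "deterministic first, model for the tail, deduplicate, batch, store" flow is shown.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical deterministic string rules: substring -> tag.
STRING_RULES = {"uber": "transport", "netflix": "subscriptions"}


def deterministic_tag(description: str) -> Optional[str]:
    """Return a tag from the string rules, or None if nothing matches."""
    text = description.lower()
    for needle, tag in STRING_RULES.items():
        if needle in text:
            return tag
    return None


def build_tag_index(
    descriptions: Iterable[str],
    ai_tagger: Callable[[List[str]], List[str]],
    batch_size: int = 8,
) -> Dict[str, str]:
    """Tag each distinct description once; call the model only for the fuzzy tail."""
    index: Dict[str, str] = {}
    pending: List[str] = []
    for desc in dict.fromkeys(descriptions):  # deduplicate, keep first-seen order
        tag = deterministic_tag(desc)
        if tag is not None:
            index[desc] = tag
        else:
            pending.append(desc)
    # Send unmatched descriptions to the model in small batches.
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for desc, tag in zip(batch, ai_tagger(batch)):
            index[desc] = tag
    return index


def fake_tagger(batch: List[str]) -> List[str]:
    """Stand-in for an on-device model call (assumption for the example)."""
    return ["misc"] * len(batch)


rows = ["UBER TRIP", "Corner shop", "Corner shop", "Netflix.com"]
index = build_tag_index(rows, fake_tagger)
print(index["Corner shop"])  # looked up from the stored index, no re-inference
```

At query time, rows are tagged by a dictionary lookup on their description, so the model runs once per distinct description rather than once per row or per query.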

Engineer

2026-07-11 03:22 UTC

Aicon Solutions is a product studio building AI-augmented thinking tools for decision-making under uncertainty. Their products include nodx, LaoMOS, and Still Employed?.

Focus on thinking better, not just doing more.
Products: nodx (decision workspace), LaoMOS (multi-agent orchestration), Still Employed? (daily check-in).

Choosing the Right AI Agent Memory Strategy: A Decision-Tree Approach

2026-07-11 00:43 UTC

Learn how to select appropriate memory strategies for AI agents using a decision tree approach that classifies information into working, semantic, episodic, or procedural memory layers.

Memory strategies for AI agents should be designed deliberately, not as an afterthought.
A five-question decision tree helps classify information into the correct memory layer: working, semantic, episodic, or procedural.

Which 'AI scientist' suits your lab? A guide for the perplexed

2026-07-10 23:58 UTC

The article explores various AI tools designed for scientific research, such as Anthropic's Claude Science, Google DeepMind's Co-Scientist, and the open-source Biomni. These tools accelerate tasks like genome analysis, hypothesis generation, and experimental design. Scientists share their experiences and recommend trying multiple tools, starting with small tasks, and verifying outputs while maintaining caution.

Anthropic launched Claude Science platform focused on biology research.
Google DeepMind's Co-Scientist generates scientific hypotheses by mining literature.

Ethereum deploys AI agents to hunt bugs, discovers libp2p vulnerability

2026-07-10 23:09 UTC

The Ethereum Foundation's Protocol Security team used coordinated AI agents to find a remotely-triggerable panic in libp2p's gossipsub (CVE-2026-34219). The real challenge was not finding the bugs but triaging AI-generated candidates to separate genuine findings from confident-sounding noise, highlighting the importance of human judgment in security auditing.

Coordinated AI agents discovered a critical libp2p vulnerability
Most AI-generated candidates are false positives or duplicates

Migrating a production AI agent to GPT 5.6

2026-07-10 20:40 UTC

Ploy migrated its AI agent from Claude Opus 4.8 to OpenAI's newly released GPT-5.6 Sol, achieving 2.2× faster builds, 27% lower cost, and improved visual scores. The migration involved solving issues with tool call argument filling, prompt caching differences, and reasoning replay, all of which were addressed through engineering optimizations.

GPT-5.6 Sol outperformed Claude Opus 4.8 in speed, cost, and visual quality
Tool call parameter filling issue resolved by schema transformation

AI Gets a Cerebellum

2026-07-10 19:16 UTC

Northwestern researchers developed a cerebellum-inspired memtransistor that consumes very little energy and detects novelties almost instantly. In tests, it identified abnormal heart rhythms within one-fifth of a heartbeat with over 98% accuracy, using 10,000 times fewer computer operations than conventional AI.

New memtransistor mimics cerebellum to ignore routine inputs and react only to unexpected events
Detected arrhythmias in milliseconds with 98% accuracy, using minimal energy

OpenWiki Brains: Proactive Memory for AI Agents

2026-07-10 16:46 UTC

OpenWiki Brains turns sources like Gmail, Notion, Git, X, Hacker News, and web search into a local wiki that agents can use as fresh, proactive memory.

OpenWiki Brains turns external sources into a local wiki for agents to use as proactive memory.
Two modes: Personal Brain for general context and Code Brain for code documentation.

Vibe coded AI Neovim is useful

2026-07-10 16:36 UTC

aeovim is a Rust TUI that multiplexes LLM coding agents with a Neovim-like interface, currently wrapping Claude Code and offering features like multi-chat sessions, streaming, and persistence.

aeovim provides a keyboard-native TUI for managing multiple AI coding agents simultaneously.
It reuses Claude Code's infrastructure and supports live multi-turn sessions with streaming output.

Better tools made Copilot code review worse. Here’s how we actually improved it.

2026-07-10 15:57 UTC

This post describes how migrating Copilot code review to shared Unix-style code exploration tools reduced review cost by reshaping agent workflows around pull request evidence. It was originally published on The GitHub Blog.

Migrating to shared Unix tools initially increased review cost and reduced effectiveness.
The problem was not the tools but the instructions, which caused the agent to browse broadly instead of focusing on the diff.

AI Web Design (Opus vs. Sol)

2026-07-10 13:49 UTC

This article compares two leading AI models for web design—Opus 4.8 and GPT-5.6 Sol—based on the author's extensive experience. It emphasizes the importance of visual references over text prompts, details each model's strengths and weaknesses, and provides a practical workflow to achieve high-quality designs.

Visual references significantly improve AI web design output.
Opus 4.8 is reliable but conventional; GPT-5.6 Sol is creative but prone to over-structuring.

Older adults know AI is slop. They just like it

2026-07-10 13:41 UTC

Despite being aware that AI-generated content is fake, older adults find emotional comfort and companionship in it. A study found that Chinese users aged 50-75 watch AI family member videos because they offer direct affection and filial piety lacking in real life.

A 67-year-old retired businessman was moved to tears by AI-generated music videos that reminded him of his childhood.
AI virtual family members are popular on Chinese social media, providing daily blessings and companionship to the elderly.
