Google AI News

Source Mix

Hacker News AI29
arXiv Computational Linguistics3
MarkTechPost3
Google Research Blog2
The New Stack AI2
The Verge AI2
Analytics Vidhya1
arXiv AI1

Topic Mix

Agents32
Research20
Models14
Policy13
Chips8
Startups7
Robotics5
Tools2

Timeline

2026-07-0713
2026-07-0811
2026-07-0910
2026-07-118
2026-07-105
2026-07-122
2026-07-061

Latest Updates

Show HN: Inkfold – workspace across multiple AI providers with shared memory

2026-07-12 07:38 UTC

Inkfold is a platform that provides shared memory across multiple AI providers like ChatGPT, Claude, Gemini, Grok, and more. It captures conversations, builds structured context, and injects relevant memory into new chats, eliminating the need to re-explain yourself. It offers smart, private, or incognito retention modes and subscription or pay-as-you-go pricing. Suitable for individuals, teams, and organizations.

Shared memory across multiple AI providers
Capture, remember, and inject context

Cloudflare Threatens to Cut Google Off from Their Publishers in Searches Due to AI Scraping

2026-07-12 03:43 UTC

Cloudflare may cut off Google's search access to its publishers due to aggressive AI scraping, which degrades site performance and disrupts content publishing and comment moderation.

Cloudflare threatens to block Google search access over AI scraping
Heavy scraping causes performance issues, hindering content publishing and moderation

AI fiction is easy to detect because it's stupid and bad, research finds

2026-07-11 18:53 UTC

A study from University of Maryland and Google DeepMind found that AI-generated fiction is easily detectable due to narrative flaws like over-explaining themes, lack of subplots, and clunky moralizing. The researchers developed StoryScope, a detector that analyzes narrative features, and tested it on over 50,000 AI-generated stories. The study highlighted that different AI models have distinct quirks (e.g., GPT overuses dream sequences, Gemini uses character descriptions). The dataset used includes Books3, which is controversial due to copyright issues. The researchers used AI to assist in writing the paper itself.

AI fiction suffers from predictable narrative structures, such as over-explaining themes and avoiding subplots.
StoryScope detector analyzes narrative features to distinguish AI from human writing with high accuracy.

Free AI Visibility Audit Tool & Agent

2026-07-11 15:59 UTC

This free tool checks whether ChatGPT, Gemini, Claude, Perplexity, Grok, and Google AI can crawl, understand, verify, and cite your website. The report includes full-site crawl inventory, brand entity profile, claim-level evidence ledger, AI intent coverage matrix, technical crawlability audit, schema and structured data plan, trust signal gap analysis, competitor and off-site evidence map, and P0/P1/P2 execution roadmap, with sample cases from ecommerce, AI SaaS, and B2B services.

Free audit tool assesses AI visibility across major AI systems.
Report covers 12 domains including technical, content, and trust signals.

My AI Model Tier List for Mid-2026

2026-07-11 15:43 UTC

A personal, non-benchmark tier list of AI models for coding and auditing as of mid-2026, covering Anthropic Fable, OpenAI Sol, Mistral, Gemini, and DeepSeek, with commentary on US export controls and European perspectives.

Fable (Anthropic) gets a B: fluent but unreliable, prone to hiding bugs.
Sol (OpenAI) gets an S: trustworthy for low-level code and testing.

Litert.js, Google's High Performance Web AI Inference

2026-07-11 14:32 UTC

Google announces LiteRT.js, a JavaScript binding of LiteRT that brings high-performance AI inference to web browsers with hardware acceleration via WebAssembly, outperforming existing solutions by up to 3x.

LiteRT.js enables running .tflite models directly in the browser with native performance through WebAssembly.
Supports CPU (XNNPACK), GPU (WebGPU), and NPU (WebNN) acceleration for maximum efficiency.

Microsoft joins Google in backing Go for AI agents — OpenAI and Anthropic lag

2026-07-11 14:00 UTC

Go has become the lingua franca for cloud infrastructure. Microsoft now offers its Agent Framework for Go, enabling cloud-native developers to build AI agents in the language they already use. Google already supports Go, while OpenAI and Anthropic do not yet.

Microsoft releases Go SDK for Agent Framework in public preview.
Go is the language behind Kubernetes, Docker, and many cloud tools.

Show HN: AI assistant for Google Chat to translate any file preserving layout

2026-07-11 12:00 UTC

AnyFile Translator is an AI-powered assistant for Google Chat that translates documents, web links, and messages while preserving original formatting. It supports over 100 languages, offers AI content writing, and ensures data privacy with encryption and deletion.

Translate files (PDF, Word, PPT, etc.) while preserving layout
Supports over 100 languages and works within Google Chat

Show HN: Create realistic group photos in real time with AI

2026-07-11 09:48 UTC

Pixailer is an AI tool that lets you upload individual photos and describe a scene to generate realistic group photos of up to 8 people in seconds. It offers multiple AI engines (Google Gemini and OpenAI GPT-Image), supports prompts in several languages, and uses a credit-based payment system with no subscription. Privacy is prioritized: images are not used for training and are deleted after generation.

Upload clear photos, describe the scene, and AI generates a group photo in under 10 seconds
Supports up to 8 people with two AI engines: Express (fast) and Studio (high-fidelity)

Show HN: Schedule tasks for your AI agents from Google Calendar

2026-07-11 01:37 UTC

Agent Caly is a tool that schedules tasks for AI agents from Google Calendar.

Integrates with Google Calendar
Schedule tasks for AI agents

Which 'AI scientist' suits your lab? A guide for the perplexed

2026-07-10 23:58 UTC

The article explores various AI tools designed for scientific research, such as Anthropic's Claude Science, Google DeepMind's Co-Scientist, and the open-source Biomni. These tools accelerate tasks like genome analysis, hypothesis generation, and experimental design. Scientists share their experiences and recommend trying multiple tools, starting with small tasks, and verifying outputs while maintaining caution.

Anthropic launched Claude Science platform focused on biology research.
Google DeepMind's Co-Scientist generates scientific hypotheses by mining literature.

AI that helps you BE, not just DO

2026-07-10 15:55 UTC

This article critiques current AI tools that focus solely on task completion (DO) while ignoring the potential to help users understand their own work patterns and improve themselves (BE). The author shares insights from 16 days of self-tracking, revealing patterns like a predictable crash after two hours of deep work and a prime focus window from 11:00 to 12:30. He introduces a stack (Dayflow, Gemini Flash Lite, Clawdbot, self.md) that aims to provide behavioral insight and prediction rather than just task execution.

Current AI (e.g., ChatGPT, Claude) only remember facts users tell them, not how users actually work.
16 days of self-tracking revealed patterns: energy crash after 2 hours of deep work, best focus window 11:00-12:30, frequent Telegram distraction, etc.

Google Research Introduces SensorFM: A Wearable Health Foundation Model Pretrained on One Trillion Minutes of Sensor Data

2026-07-10 08:52 UTC

Google Research, Google DeepMind, and university collaborators have introduced SensorFM, a foundation model for wearable health pretrained on over 1 trillion minutes of sensor data from 5 million participants. The ViT-1D masked-autoencoder backbone, trained on a massive corpus, demonstrates strong scaling behavior. With frozen embeddings and a PCA-50 linear probe, it outperforms feature-engineered baselines on 34 of 35 tasks. The paper also details an agentic classroom that searched 30,516 prediction heads and a clinician evaluation that grounds a Personal Health Agent.

SensorFM is pretrained on 5 million participants with over 1 trillion minutes of sensor data from 100+ countries and 20+ wearable models.
Adaptive and Inherited Masking (AIM) handles missing data effectively, reducing reconstruction error by up to 83.7% over baselines.

A Reliability Assessment of LALM Audio Judges for Full-Duplex Voice Agents

2026-07-10 04:00 UTC

A new study evaluates the reliability of Gemini models as audio judges for full-duplex voice agent conversations. Using 209 stereo sessions scored on 8 dimensions, Gemini 2.5 Flash shows high agreement with human raters on most dimensions, with cost savings of roughly two orders of magnitude. The paper also cautions that model swaps require re-validation on calibration data.

Gemini 2.5 Flash's LALM-human Spearman rho differs from human-human rho by at most 0.07 on 5 of 8 dimensions
LALM agrees within 1 point of the three-rater human mean on 60-92% of sessions for 6 dimensions

VectorizationLLM: Smart Vectorization Based AI Assistant

2026-07-10 04:00 UTC

VectorizationLLM is a specialized large language model based on Google open-weight LLMs, designed to help students learn smart vectorization and related topics in MATLAB for the course CTEC 247 at New York Institute of Technology. It uses a RAG knowledge base and system prompts to provide explanations and examples without giving direct answers.

Built on Google open-weight LLMs
Targets CTEC 247: Applied Computational Analysis II

Academia and the "AI Brain Drain"

2026-07-09 22:25 UTC

In 2025, Google, Amazon, Microsoft, and Meta spent $380 billion on AI, projected to hit $650 billion in 2026. Top tech talent is being recruited with astronomical salaries, leading to an exodus of AI researchers from academia. Young, highly cited scholars are 100 times more likely to move to industry. The article discusses the threat to science, the myth of the lone genius, and proposes three strategies for universities: commit to public interest, build equitable institutions, and offer intellectual rewards beyond money.

Tech firms spent $380B on AI in 2025, expected to reach $650B in 2026, with huge sums on talent. Meta offered $250M to one researcher.
Young, highly cited AI researchers are 100 times more likely to leave academia than their older, average-cited peers.

Solve harder problems with AlphaEvolve now available to everyone on Google Cloud

2026-07-09 21:00 UTC

Google announces the general availability of AlphaEvolve, a code optimization and discovery agent built on Gemini, now available on the Gemini Enterprise Agent Platform. It helps businesses and researchers tackle complex algorithmic optimization problems in logistics, semiconductors, genomics, and more, with proven results from early adopters.

AlphaEvolve is now generally available on the Gemini Enterprise Agent Platform.
It uses a four-step process: define, measure, optimize, apply to generate highly optimized code.

Google will now tell you if an ad was made with AI

2026-07-09 20:11 UTC

Google adds a 'created or edited with AI' label in My Ad Center for Search, Discover, and YouTube ads. Auto-label for Google's tools, manual for others.

Google adds AI label in My Ad Center for Search, Discover, and YouTube.
Auto-label for ads made with Google's AI tools; manual for others.

Gemini 2.5 models accidentally deprecated early by Google

2026-07-09 19:53 UTC

Google deprecated Gemini 2.5 Flash models without warning, earlier than the announced shutdown date, causing confusion among developers.

Google accidentally deprecated Gemini 2.5 Flash models early
Deprecation occurred before the scheduled shutdown date

Cloud Run sandboxes: Lightweight isolation for AI agents

2026-07-09 17:41 UTC

Google Cloud announces the public preview of Cloud Run sandboxes, a native, secure, and ultra-fast runtime environment for executing untrusted code and agent workloads, starting in milliseconds. It supports use cases like LLM code interpreters, headless browsers, and user-submitted code execution, with zero-trust security through credential isolation, default-deny egress, and read-only filesystem overlay.

Cloud Run sandboxes are native, secure runtime environments that start in milliseconds.
Support LLM code interpreters, headless browsers, and user-submitted code execution.

Meta says its new AI model is ready to compete on coding

2026-07-09 14:00 UTC

Meta released Muse Spark 1.1, an AI model now accessible to developers via the new Meta Model API. It features improved coding capabilities, bug detection, multi-agent workflow support, and multimodal perception, aiming to catch up with rivals like OpenAI, Google, and Anthropic.

Muse Spark 1.1 is a major upgrade based on developer feedback, supporting advanced coding tasks.
The model is available in public preview for US developers through the Meta Model API with $20 free credits.

Show HN: WhisperShortcut – voice layer for AI on macOS (BYOK, offline Whisper)

2026-07-09 11:49 UTC

WhisperShortcut is an open‑source macOS app that lets you use voice for dictation, editing, reading aloud, screenshots, and AI chat via keyboard shortcuts. It supports Google Gemini, OpenAI GPT, xAI Grok, and offline Whisper models—no account or subscription required.

Use ⌘1–⌘4 and ⌥Space for dictation, prompt editing, read‑aloud, screenshot, and chat.
Bring your own API keys (Gemini, GPT, Grok) or run fully offline with local Whisper.

SensorFM: Towards a general intelligence and interface for wearable health data

2026-07-09 09:56 UTC

Google Research introduces SensorFM, a foundation model for wearable health pretrained on over one trillion minutes of sensor data from five million people. It learns a general-purpose representation of human physiology that transfers across 35 health tasks, supports label-efficient adaptation, and can ground a Personal Health Agent.

SensorFM is pretrained on over a trillion minutes of wearable sensor data from five million people.
It uses self-supervised learning with a missing-aware masking approach to handle real-world data gaps.

Google AI Studio Adds Import from GitHub to Build a Deployable App

2026-07-09 07:58 UTC

Google AI Studio is rolling out Import from GitHub in Build mode. It transforms an existing repo into a runtime-compatible format. You can then iterate on it, deploy it, and more. Announced by the AI Studio team and product lead Logan Kilpatrick on July 8, 2026, this feature adds the missing inbound path from GitHub, enabling developers to start from existing codebases.

Google AI Studio Build mode now supports importing GitHub repositories and converting them to a runtime-compatible format.
Users can iterate on imports and deploy apps; API keys are configured server-side automatically.

AI software that generates 'rage bait' developed by Germany's far-right AfD

2026-07-09 05:17 UTC

An undercover investigation by Correctiv reveals that Germany's far-right AfD party has developed Alternita, an AI software suite using Google Gemini, OpenAI, and Anthropic Claude to generate provocative social media posts known as 'rage bait', aiming to control messaging and maintain online dominance.

AfD created AI software to generate 'rage bait' content designed to provoke emotional reactions.
The software automatically pulls from far-right news sources and produces posts ready for all major social media platforms.

Show HN: Nully – FOSS AI chat without the bloat

2026-07-08 20:25 UTC

Nully is a lightweight, privacy-focused, self-hostable AI chat app powered by OpenRouter. It saves all chats locally, requires no account, and offers fast streaming with hundreds of models. Performance benchmarks show it loads faster and uses fewer resources than ChatGPT, OpenRouter, Gemini, and Claude. Features include attachments, web search, portable history, and a single small binary.

Nully is a lightweight, open-source, self-hostable AI chat app emphasizing privacy and performance.
All chat data is stored locally on your device; no account or tracking required.

Google AI Studio Adds ‘Import from GitHub’ to Build Mode, Turning an Existing Repo Into an Editable, Deployable App

2026-07-08 18:41 UTC

Google AI Studio is rolling out an 'import from GitHub' feature inside its Build mode. It takes a repo and transforms it into a runtime-compatible format, allowing users to iterate, deploy, and more. This adds the missing inbound GitHub path to Build mode, though details on private repo support and sync behavior are still emerging.

Build mode now supports importing a GitHub repo directly. The repo is transformed into a runtime-compatible format for iteration and deployment.
AI Studio configures GEMINI_API_KEY as a server-side secret, preventing exposure in client-side code.

JetBrains’ next move isn’t a better IDE — it’s a governance layer over Claude Code, Codex, and Gemini CLI

2026-07-08 17:44 UTC

JetBrains launches AI for Teams and Organizations, adding shared context, reusable agentic processes, organization-wide governance, and cost controls on top of existing AI tools, without requiring teams to standardize on one vendor.

JetBrains announces AI for Teams and Organizations, a governance layer above any AI tool.
Features include automations, JetBrains Context (cross-repo knowledge), JetBrains Central (management console), and Central CLI for tracking CLI agents.

I tried Claude Cowork on my Gmail inbox after Gemini choked - and it saved me hours of work

2026-07-08 15:09 UTC

Gmail's AI failed at a nuanced research task, but Claude Cowork found the right pitches, quotes, and permissions, proving connected AI assistants may finally help tackle some aspects of email overload.

Claude Cowork turned inbox chaos into usable article research.
Gmail search struggled with context and discernment.

Start with A – Open-source, self-hosted investment research platform (BYOK AI)

2026-07-08 14:50 UTC

Start with A is an open-source investment research platform integrating research, portfolio monitoring, and journaling into a disciplined workflow. It is self-hosted with BYOK AI support for Gemini, OpenAI, and Anthropic.

Open-source, self-hosted, full data control.
Three modules: Research, Portfolio, Journal with a closed-loop workflow.

Is AI making us dumber?

2026-07-08 12:37 UTC

This article examines the potential cognitive impacts of generative AI, citing research that shows overreliance may weaken critical thinking, creativity, and persistence. Experts compare AI to past innovations like Google and the calculator but warn that AI's nature as a thinking substitute poses unique risks. While long-term studies are lacking, early evidence suggests that AI use can lead to skill atrophy, especially when used without developing foundational abilities. The piece calls for mindful adoption to preserve essential human skills.

Studies show that using AI for writing or math can lead to worse performance when AI is removed, similar to GPS's effect on spatial memory.
AI can reduce persistence; participants who used AI for problem-solving were more likely to give up on later unaided problems.

The Sequence AI of the Week #891: Prompting a Spreadsheet : Inside Google’s TabFM for Tabular AI

2026-07-08 11:02 UTC

Google Research unveils TabFM, a foundation model for tabular data that performs in-context learning on tables, enabling predictions on unseen datasets with a single forward pass, no training or feature engineering required.

TabFM is a new foundation model for tabular classification and regression from Google Research.
It uses in-context learning to make predictions on entire tables in one pass without training or tuning.

AI Models Overthink Problems—and It’s a Security Risk

2026-07-08 11:00 UTC

Research shows that large language models with reasoning capabilities can be tricked into 'overthinking' using logically inconsistent prompts, leading to a denial-of-service attack. Researchers from Zhejiang University and Alibaba developed an evolutionary algorithm that generates malicious prompts, causing outputs up to 26 times longer in leading models like DeepSeek-R1, Qwen3-Thinking, GPT-o3, and Gemini 2.5 Flash.

Researchers demonstrate a new attack exploiting 'overthinking' in AI reasoning models, causing excessive computation.
An evolutionary algorithm corrupts prompts to produce outputs up to 26 times longer than normal.

ZML releases free product to speed inference across AI chips

2026-07-08 08:18 UTC

ZML, a French AI startup endorsed by Turing Award winner Yann LeCun, has released free inference software enabling various open-source LLMs to run on multiple chips including Nvidia, AMD, Google TPU, Apple Metal, and Intel Arc.

ZML, backed by Yann LeCun, launches free inference software
Supports diverse AI chips, challenging Nvidia's dominance

The yes-no bias of large language models reflects answer order and wording, not shifts in moral judgment

2026-07-08 04:00 UTC

A new study decomposes the yes-no bias in LLMs using crossed symmetrization, finding that frontier models' internal moral stance is nearly format-invariant, while Claude models show significant order and lexical biases, GPT-5.5 and Gemini near zero. Bias shrinks with extended reasoning and follows surface wording, not the verdict.

LLMs' yes-no bias in moral dilemmas can be decomposed into order bias (toward last option) and lexical pull (toward 'no'), but the underlying moral scale is format-invariant.
Claude models exhibit substantial bias (story-averaged -0.32 to -0.86), while GPT-5.5 and Gemini are near zero; extended reasoning reduces the artifact.

[AINews] Lilian Weng summarizes 35 papers on Harness Engineering for RSI

2026-07-08 02:20 UTC

This edition of AINews covers a broad range of AI developments from July 6-7, 2026. Highlights include Lilian Weng's deep dive into harness engineering for recursive self-improvement, Meta's launch of Muse Image and preview of Muse Video with agentic generation loops, and major product updates from Anthropic, LangChain, and Google on agent platforms. Other notable items: NVIDIA's Audex audio model, Cohere's Arabic ASR, robotics integrations with Hugging Face and NVIDIA, Liquid AI's Antidoom method to reduce reasoning loop failures, and Anthropic's controversial J-space interpretability work. Also covered: benchmarks for agents and legal AI, research automation, and inference efficiency advances.

Lilian Weng's blog post reframes recursive self-improvement around the harness rather than direct weight modification, emphasizing that harness engineering is critical for specifying goals and context.
Meta's Muse Image and Muse Video showcase agentic generation with planning, tool use, and self-refinement, quickly ranking high on public leaderboards.

Neuronpedia, an open source platform for AI interpretability

2026-07-07 19:42 UTC

Neuronpedia is an open source interpretability platform that enables users to explore, visualize, and steer the internal workings of AI models. It features tools like HeadVis, Natural Language Autoencoders, Circuit Tracer, and steerable activations, supporting over 50 million latent vectors across numerous models. Created by Johnny Lin, it is backed by organizations including Anthropic and Google DeepMind.

Neuronpedia is an open source platform for AI interpretability, offering tools to explore, visualize, and steer model internals.
Key features include HeadVis, Natural Language Autoencoders, Circuit Tracer, and activation steering. It hosts massive collections of sparse autoencoders and models.

Why AI Agents Forget by Design

2026-07-07 17:28 UTC

The article explains that major LLM APIs (OpenAI, Anthropic, Google) are stateless by default, meaning each API call is independent and the model has no inherent memory. This architectural choice forces developers to resend entire conversation histories per request, leading to high costs, latency, and performance degradation (lost-in-the-middle effect). The author identifies four production failure modes: re-explanation, knowledge loss on handoff, contradiction without resolution, and hallucination over abstention. Current mitigation patterns (prompt stuffing, fine-tuning, RAG, vector databases, etc.) are partial solutions. The temporal validity of stored facts remains an unsolved problem.

LLM APIs are stateless: each call resets memory, requiring clients to resend full context
Cost, latency, and model performance degrade with long histories (lost-in-the-middle)

The power of collaboration: How we can reduce traffic congestion

2026-07-07 16:42 UTC

Google Research conducted a large-scale real-world study in 10 US cities showing that slightly rerouting a small fraction of trips (under 2%) using navigation apps can measurably reduce traffic congestion and emissions. The study, published in Nature Cities, found median speed increases of 2% on targeted segments and potential CO2e savings of thousands of tons per city per year.

A six-month experiment in 10 US cities demonstrated that coordinating a small fraction of trips (under 2%) via navigation app interventions improved network-wide traffic efficiency.
Rerouting trips away from congested segments to similar alternative routes led to a median increase of ~2% in driving speeds on targeted segments and reduced fuel consumption.

Zero-Shot Local Document Parsing with Gemma 4: Treating PDFs as Images

2026-07-07 14:00 UTC

This article introduces a method to render PDF pages as images and use Google DeepMind's Gemma 4 vision-language model for local document parsing. The approach unifies scanned and digital PDFs, eliminating the need for OCR or layout parsers, with flexible visual token budgets and complete code examples.

Render PDF pages as high-resolution images and feed them to a vision-language model, dissolving the scanned-versus-digital distinction.
Gemma 4 supports 2D rotary position embeddings and per-layer embeddings, enhancing document understanding, and runs entirely locally.

Observability Design for the AI Era – App, Infra, CI, LLM (Part 1)

2026-07-07 13:24 UTC

The article discusses reshaping the observability stack for the AI era, splitting monitoring into four axes: application (standard OTel stack), infrastructure (GCP metrics into Mimir), CI (post-hoc log pulling to Loki), and LLM (Gemini with Prometheus for real-time cost, Claude Code with BigQuery for SQL aggregation). Emphasizes that data must be shaped before AI can consume it effectively.

Monitoring split into four axes: application, infrastructure, CI, LLM
CI logs pulled post-hoc instead of pushed, decoupling execution from observability

Big tech’s lofty climate goals wrecked by energy-hungry AI

2026-07-07 13:11 UTC

Tech giants' investments in AI are undermining their climate neutrality pledges. Google and Amazon's net-zero targets slip away, while Meta scrambles for new business. Other tech news includes US anger at data centers, Trump's crypto earnings, Tesla's sales, South Korea's AI chip boom, China's robotics push, and Britain's AI growth zones.

Tech giants' AI investments hinder climate goals.
Google and Amazon's net-zero pledges at risk.

OKF: Redefining Knowledge Bases for AI Agents

2026-07-07 11:45 UTC

In June 2026, Google introduced the Open Knowledge Format (OKF), an open specification for how AI agents organise and exchange knowledge. An OKF bundle is just Markdown files, lightweight YAML metadata, and links between concepts, yet it challenges the assumption that every AI application needs embeddings and vector databases.

OKF uses plain text Markdown files, enabling Git version control and explicit linking between concepts.
Traditional RAG loses document structure due to chunking; OKF preserves relationships inherently.

Show HN: Brianni – AI chat on GPT/Claude/Gemini that we can't read (provable)

2026-07-07 10:10 UTC

Brianni is an AI chat app integrating GPT, Claude, and Gemini with a provable claim of operator blindness. Conversations are encrypted with user-derived keys, and plaintext only exists inside AWS Nitro Enclaves with verifiable code measurements. Users can independently verify the system by reproducing the enclave build and comparing the PCR0 hash.

Chat encryption keys are derived from a user-generated recovery phrase never sent to servers.
Plaintext only exists server-side inside hardware-isolated AWS Nitro Enclaves with attested code.

Expanding Managed Agents in Gemini API: background tasks, remote MCP and more

2026-07-07 08:54 UTC

We’re announcing new capabilities in Managed Agents in Gemini API so developers can build reliable, production-ready agents.

New background execution for async interactions.
Direct integration with remote MCP servers.

AI STS Stack for underserved languages paper

2026-07-07 05:31 UTC

The article explores the challenges of building real-time voice AI for low-resource languages like Azerbaijani, comparing end-to-end speech-to-speech models (OpenAI Realtime, Gemini Live) and cascaded pipelines (LiveKit, Pipecat, Vapi). It details failure modes, component availability, and provides a checklist for evaluation. Key findings: Gemini Live spoke well but was too slow; OpenAI Realtime had accent issues. Available components include Azure TTS, ElevenLabs, and Scribe v2. The cascaded stack offers flexibility but requires engineering latency.

Speech-to-speech models often fail for low-resource languages due to language coverage, output quality, or latency.
Cascaded pipelines provide flexibility but require handling latency and finding viable STT/TTS components.

Seduced by the Narrative: Assessing Rule Adherence in Semi-Open Textual Sandboxes

2026-07-07 04:00 UTC

As LLMs are increasingly deployed as autonomous adjudicators in semi-open textual game environments, robust rule adherence becomes critical when user intent conflicts with system rules. However, these models are trained to be helpful and compliant, leaving them vulnerable to a class of attacks we term Rhetorical Injection, where adversarial users exploit narrative framing techniques such as pseudo-logical reasoning and authoritative coercion to bypass adjudication logic. We present CoC-Seduce, a multi-agent adversarial benchmark built on Tabletop Role-Playing Game (TRPG) mechanics, an ideal instantiation of semi-open environments where rules are explicit for adjudication, yet interaction remains entirely in natural language. Three frontier models, i.e., GPT-5.4, Claude Sonnet 4.6, Gemini 3.5 Flash, serve as adversarial generators producing 5,376 samples across 4 world settings and 16 skill categories. We then benchmark 20 target adjudicators against this corpus. Evaluation across 20 models reveals that neither model scale nor explicit reasoning mechanisms reliably confer adjudication robustness, with Pseudo-Logic emerging as the dominant attack vector and cross-cultural settings exposing systematic knowledge gaps across all evaluated families.

LLMs as adjudicators in text games are vulnerable to Rhetorical Injection attacks
Attackers use pseudo-logic and authoritative coercion to bypass rules

AutomationBench-AA

2026-07-07 02:22 UTC

Artificial Analysis announces AutomationBench-AA, an independent leaderboard for Zapier's AutomationBench, testing AI agents on real SaaS workflow automation across 657 tasks. Claude Fable 5 leads at 48.6%, followed by Opus 4.8 and Gemini 3.5 Flash. The benchmark reveals that all models violate some guardrails, with Finance tasks being the hardest. Gemini 3.5 Flash offers the best value, while GLM-5.2 is the top open weights model.

AutomationBench-AA evaluates 657 workflow automation tasks across simulated SaaS environments.
Claude Fable 5 (max) leads at 48.6% objective completion rate.

Meta Tests Pocket, an Experimental App for AI-Generated Mini-Games

2026-07-07 01:11 UTC

Meta is testing Pocket, a social app that lets users create, share, and discover AI-generated mini-games called 'gizmos' without coding. Currently in closed testing on Google Play, it may leverage cross-promotion with Facebook, Instagram, and WhatsApp.

Pocket is a new social app from Meta focused on AI-generated mini-games.
Users create games via natural language prompts, no coding required.

XGBoost beat LLMs at finding civilian-harm posts in Ukraine war Telegram data

2026-07-06 20:28 UTC

Bellingcat developed an XGBoost-based machine learning model that dramatically reduced the time needed to identify civilian-harm incidents from Telegram posts. Trained on a dataset of verified cases and engineered features, XGBoost outperformed logistic regression, random forest, LightGBM, and several large language models (including Gemma and Gemini) in precision and recall. The open-source methodology shifts researcher focus from searching to verifying, and is adaptable to other conflict zones.

Bellingcat's XGBoost model efficiently filters Telegram posts for civilian harm, reducing search time significantly
The model uses feature engineering (keywords, emoji reactions, semantic similarity) and BERT embeddings

Google