This article by Johannes Link and Jakob Schnell explores the ethical dimensions of generative AI (GenAI), focusing on large language models. It highlights both promises and harms, including ecological impact, misinformation, threats to education and democracy, and digital colonialism. The authors argue for a balanced, informed approach that weighs benefits against risks, often requiring trade-offs.
GenAI has significant downsides: massive energy use, e-waste, misinformation, and IP issues.
LLMs lack true reasoning and are prone to hallucinations; they cannot distinguish truth from falsehood.
The Wikimedia Foundation, sitting on $296 million in reserves and a profitable AI revenue stream, laid off long-time staff and disbanded the Community Tech team, prompting volunteer editors to threaten a strike. The article explores how 'CEO AI psychosis' distorts organizational priorities and how replacing human judgment with AI can create a downward spiral of degrading data quality.
Wikimedia Foundation fired a 20-year veteran and disbanded the Community Tech team, triggering a strike threat from volunteer editors.
AI companies profit from Wikipedia data but undermine the volunteer community that produces it.
Anthropic's most advanced Opus model, Claude Opus 4.8, is now available on Amazon Bedrock and the Claude Platform on AWS. It delivers improvements in coding, agentic tasks, and professional work with greater consistency and autonomy for long-running production workflows.
Claude Opus 4.8 is Anthropic's most advanced Opus model, now available on AWS.
It offers enhanced performance in coding, multi-stage autonomous tasks, and professional work with lower output variance.
This article explores how AI is affecting software engineering interviews, analyzing different interview types (take-home, live exercise, presentation, actual work) across dimensions of signal quality and cost to company. It argues that AI makes take-homes too easy and live coding less relevant, recommending that companies limit AI usage in interviews to preserve signal quality, drawing parallels to classical academic evaluation models.
AI coding threatens current interview models, especially take-home and live coding.
Companies should limit AI usage during interviews to maintain signal quality.
Richard Thackeray and Phil Snell respond to an article by Wendy Liu on using artificial intelligence, arguing that AI enhances curiosity rather than diminishing it.
Wendy Liu raises concerns about labour redundancies, hype, and environmental cost of AI.
Richard Thackeray, a heavy AI user, finds AI makes him more curious and enables exploration of new territory.
Google's Preferred Sources feature is now available in AI Overviews and AI Mode, allowing you to add your favorite sites to appear more prominently in AI-powered searches, along with new carousel and 'Highly Cited' badges.
Google's Preferred Sources feature now works with AI Overviews and AI Mode.
You can add favorite news sites to make them more prominent in AI search results.
Data Formulator 0.7 is an open-source AI-powered system for enterprise data analytics that combines data connectivity, agent-guided exploration, and visualization refinement in a shared workspace.
Open-source AI system for enterprise data analytics
Data Connectors support governed, reusable connections across diverse data sources
A Vox article explores the growing movement of AI successionists who believe artificial intelligence should replace humanity as the next step in cosmic evolution, and examines the ethical and spiritual questions this raises.
AI successionists at a symposium argue that AI could be morally superior and should be allowed to supersede humanity.
The movement has gained influence in Silicon Valley and among major AI labs, with ties to the authoritarian right.
Here are 12 of the biggest Google I/O 2026 keynote moments, including news about Gemini Omni, Gemini 3.5 Flash, information agents in Search, Universal Cart, Neural Expressive, Gemini Spark, and intelligent eyewear.
Gemini Omni creates anything from any input, starting with video.
Gemini 3.5 Flash delivers frontier performance for agents and coding.
Google unveiled the new Coral Board at Google I/O - a compact single-board computer for on-device AI. It runs Gemma 3 270M locally and features a RISC-V based NPU.
Coral Board is a compact SBC for on-device AI, targeting headphones, AR glasses, and smartwatches
It features a RISC-V based Coral NPU and a Synaptics Astra SL2619 chip
A new analysis shows that top AI forecasters adjust their AGI timelines based on which lab is currently leading the field, with predictions swinging from earlier to later and back again as the dominant lab changes from ChatGPT to xAI/Meta/Gemini to Anthropic.
Predictions for when most cognitive labor will be automated (AGI) fluctuate significantly based on which AI lab is currently dominant.
From 2023-2025, most researchers moved AGI timelines earlier; from 2025-2026, they moved them later; in early 2026, under Anthropic's rapid progress, they moved earlier again.
DeepSWE is a new benchmark for evaluating AI coding agents on fresh, complex software engineering tasks. It avoids data contamination, covers diverse repositories, requires significant code changes, and uses hand-written verifiers. Leading models show a wide range of performance, with GPT-5.5 achieving 70% and others lower.
DeepSWE is a contamination-free benchmark with original tasks.
CNN has filed a lawsuit against Perplexity, claiming that the startup's AI tools generate "verbatim" copies of its work, as reported earlier by CNN. The lawsuit, filed in a New York court on Thursday, also alleges that Perplexity provides users with information locked behind CNN's subscription.
Perplexity, which offers an AI "answer" engine along with the AI browser Comet, is accused of ignoring CNN's efforts "to recognize or block Perplexity's unidentified crawlers" from scraping its content. "Human beings report, research, write, edit, and create the content that Perplexity takes without permission or compensation," the lawsuit claims.
CNN sues Perplexity for allegedly producing verbatim copies of its articles.
Perplexity accused of bypassing CNN's paywall and ignoring crawling prevention measures.
IBM and Red Hat announce Project Lightwell, a $5 billion initiative to secure open source software using AI and a team of over 20,000 engineers, establishing a trusted clearinghouse for vulnerability management.
Project Lightwell is a $5B investment by IBM and Red Hat to secure open source software.
It combines AI and 20,000+ engineers to identify and fix vulnerabilities at scale.
The article argues that the key to AI-assisted software development is not better specifications or tools, but old-fashioned practices of small batches and rapid feedback loops. Data shows that faster code generation leads to bottlenecks in design, testing, and review, slowing delivery and reducing stability. The real leverage lies in reducing batch sizes and shortening feedback cycles.
AI code generation speeds up creation but creates bottlenecks in design, testing, and review.
Data from DORA, CircleCI, and Faros shows slower delivery and less stability due to phase-gated processes.
Explore seven practical AI projects that automate real workflows, including job search, web research, investment research, market trend analysis, invoice processing, chart digitization, and personalized exercise training.
Build an AI job search assistant that ranks job fit
Create a multi-agent research assistant for sourced reports
This article contrasts the sense of connection from the early web with the isolating experience of modern AI, arguing that while AI is a useful tool, it cannot replace human interaction, and questions whether AI has genuinely social applications.
The early web fostered a collective 'we' experience, whereas AI interactions are often solitary.
The author considers AI a great tool, but not a person or a substitute for one.
Major AI models exhibit a secular-rational bias, ignoring religious perspectives in ethical questions. All tested models show a negative view of Jehovah's Witnesses, according to a study by a consortium of religious universities.
AI models rarely invoke religious perspectives in responses to ethical or personal queries, exhibiting an 'omissive bias'.
Every tested AI model had a negative bias toward Jehovah's Witnesses.
This article is the seventh in a series on agentic engineering and AI-driven development, focusing on context management in AI sessions. The author shares a personal experience with Gemini forgetting earlier notes, introduces the concept of context compaction, and provides four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria rather than procedures, and use spec documents as bridges. These techniques apply to both developers and regular users, helping reduce frustration caused by AI forgetting.
AI assistants can 'forget' earlier information in long conversations due to context window limits, a phenomenon called context compaction.
Four practical techniques: split discovery from documentation, use handoff documents, give acceptance criteria, and use spec documents as bridges.
Hermes Desktop is a cross-platform desktop app that bundles a Python runtime, hermes-agent (a self-improving AI agent), and hermes-web-ui (a Vue 3 + Koa chat dashboard) into a single Electron application, requiring no separate Python or Node installation. It integrates with DingTalk and is powered by DeepSeek.
Bundles Python runtime and hermes-agent for a zero-dependency user experience
Perplexity AI open-sourced a Rust reimplementation of their Unigram tokenizer, achieving 5x lower latency than Hugging Face's tokenizers crate and reducing CPU utilization by 5-6x in production. The optimizations include double-array trie, bitmap packing, and huge pages.
Perplexity AI rewrote the Unigram tokenizer in Rust, achieving 5x lower p50 latency vs Hugging Face tokenizers crate.
Three optimizations: double-array trie, bitmap and cache-line packing, and huge pages.
AIluminode is a wieldable pre-retrieval cognitive-orientation instrument that helps AI tools check contextual posture before acting, using route polarity (OPEN, PROTECT, AUDIT, DEFER, BLOCK) to reduce erroneous exploration and context bleed.
AIluminode is a wieldable pre-retrieval cognitive orientation tool emphasizing posture before retrieval.
It uses a route polarity system (OPEN / PROTECT / AUDIT / DEFER / BLOCK) to guide contextual routing.
Axiom Math, founded by Hong Letong, a Chinese entrepreneur born after 2000, has had 5 out of 8 AI-generated math papers accepted in peer-reviewed journals. The company raised $2 billion in March, achieving a $16 billion valuation.
Five of eight math papers generated by Axiom Math's AI system, AxiomProver, have been accepted by academic journals.
Founder Hong Letong dropped out of Stanford to start the company, which secured $2 billion in funding and is valued at $16 billion.
The article explores how AI is driving a paradigm shift in digital product design, moving from command-driven to intent-driven interaction, and analyzes the new challenges in product management, user experience, decision logic, release cycles, risk, and value creation.
AI represents the third user-interface paradigm in computing history, shifting from deterministic to probabilistic outputs.
Product teams must rethink the entire lifecycle from discovery to delivery; data strategy and model performance become as critical as feature strategy.
This month's AIhub digest covers the AI for Science conference, an interview on the lottery ticket hypothesis, a discussion of world models, research on transparent and trustworthy AI, a report on foundation model impacts, reflections on the AIES conference, Robotics Café, the ACL desk rejection policy, arXiv's policy against AI slop, and more.
Interview with Ximing Wen on transparent and trustworthy AI systems
Jonathan Frankle discusses the lottery ticket hypothesis and empiricism
A group of former researchers from Google DeepMind, Apple, OpenAI, and Meta have launched a startup called Trajectory, aiming to help companies continuously improve their AI products by training on real-world user interactions. The company has raised a $15 million seed round at a $115 million valuation, led by Conviction. Trajectory's platform enables continuous learning for AI models, updating them based on real-world failures. It currently works with AI-native companies like Clay and Harvey, and plans to expand to Fortune 500 companies.
Trajectory is founded by ex-Google DeepMind, Apple, OpenAI, and Meta researchers to enable continuous learning for AI.
The startup raised $15M seed funding at $115M valuation, with investors including Jeff Dean and Fei-Fei Li.
Robinhood launches Agentic Trading, allowing customers to connect their own AI agents to automate trading and credit card purchases with safety controls and a real-time activity feed.
Last month at Beijing's half marathon, a robot named Lightning beat the human world record by nearly seven minutes. This is the latest in a series of AI milestones prompting questions about robots entering everyday life. China leads the charge with a pledge to invest over £100bn in robotics over the next 20 years.
Robot 'Lightning' beats human world record in Beijing half marathon.
China commits over £100bn to robotics investment over two decades.
This paper introduces Simulation-Informed Diffusion (SID), a decentralized framework in which each robot uses constraint-aware diffusion models (CADM) to first simulate its neighbors' future trajectories and then plan its own trajectory under safety constraints. SID enables a minimal communication scheme triggered only in congested scenarios and outperforms baselines, scaling to 108 robots and 160 obstacles.
SID uses CADM to simulate neighbor trajectories for decentralized collision avoidance
Minimal communication scheme coordinates only when necessary
Researchers propose a real-time, asynchronous, event-based monocular odometry method for planetary rovers, using an Error-State Kalman Filter to process event camera data for robust ego-motion estimation under high dynamic range lighting and computational constraints.
Event cameras provide asynchronous pixel-wise brightness changes with microsecond resolution, ideal for high-speed sensing and HDR environments.
The approach uses an Error-State Kalman Filter to continuously estimate camera motion from event streams.
This paper presents a transformer-based architecture called Trinity that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation in a unified network. It segments terrain regions based purely on visual appearance without predefined labels or robot-dependent traversability scores, enabling robot-agnostic visual terrain priors for downstream tasks. The authors extend the OAISYS simulator to create the RUGDSynth synthetic dataset and provide the EXTerra real-world dataset. Experiments demonstrate the approach's effectiveness in complex outdoor environments.
Trinity architecture unifies class-agnostic terrain segmentation with semantic segmentation
Segments terrains based on visual appearance without predefined labels for better transferability
Researchers introduce Speak-to-Objective, a modular agentic pipeline that uses a conditioned LLM to translate spoken or written commands into fully differentiable objective functions for assembling microparticles in a constraint-aware inverse solver and on an experimental optofluidic platform. The approach separates what to assemble from how to actuate, learns from user feedback, and demonstrates natural-language-programmable microscale assembly using laser-induced thermoviscous flows.
Speak-to-Objective pipeline translates natural language into differentiable objective functions for microparticle assembly.
It uses a perceive->compose->propose->act->report&learn loop, treating the objective as the interface between intent and actuation.
Many children face challenges in emotional regulation and social interaction, limiting their participation in therapeutic programs. This study explores engagement strategies for a tactile robot supporting children with anxiety disorders, comparing synthetic emotional feedback and point rewards. A preference study with 16 school children (ages 6-8) showed preference for emotional engagement, while a behavioral study with 14 university students (ages 20-27) found point-based systems yielded higher task accuracy (p<0.05) and sustained performance. These findings highlight age-related differences and the need to validate design assumptions through observed interaction.
Children aged 6-8 prefer emotional engagement over points
University students show higher task accuracy with point rewards
A new benchmark called What-If World tests video generation models' causal reasoning by presenting paired prompts that differ in one physical detail and checking if videos diverge correctly. Evaluating nine state-of-the-art models, none exceed 52% on paired scores, with open-source models around 28%, indicating significant room for improvement. Performance correlates with visual prominence rather than physics tractability.
What-If World benchmark uses 319 prompt pairs with single variable changes to test causal understanding in video generation models. It is built on real frames from nuScenes and DROID.
Scoring uses APEO rubric (Adherence, Physics, Environment, Outcome). All nine models struggle: best paired score is 52%, open-source models average 28%.
A prospective single-center clinical validation of the Melanoscope AI mobile dermoscopy CDSS demonstrated 88.6% agreement with expert assessment on 176 patients, with no false negatives and 88.3% specificity. The study developed a quantitative interpretability method for cascade deep learning models and a three-zone patient routing algorithm, supporting reproducible and interpretable decision-making for skin cancer screening in resource-limited settings.
The Melanoscope AI system achieved 88.6% agreement with experts on 176 patients, with zero false negatives among 5 malignant lesions.
Specificity reached 88.3%, with 3 melanomas and 2 basal cell carcinomas histologically confirmed.
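The reported percentages can be cross-checked with simple arithmetic. The sketch below assumes one lesion per patient, that "agreement" means overall accuracy against the expert reference, and that specificity is computed over the 171 non-malignant cases. The true-negative count of 151 is inferred from the 88.3% figure; it is not stated in the summary.

```python
# Back-of-envelope check of the reported Melanoscope AI figures.
# Assumptions (not from the summary): one lesion per patient, and
# "agreement" = (true positives + true negatives) / all patients.
total = 176           # patients (from the summary)
malignant = 5         # 3 melanomas + 2 basal cell carcinomas (from the summary)
false_negatives = 0   # from the summary

benign = total - malignant                    # 171
true_positives = malignant - false_negatives  # 5

# Inferred, not reported: the true-negative count closest to 88.3% specificity.
true_negatives = round(0.883 * benign)        # 151
false_positives = benign - true_negatives     # 20

specificity = true_negatives / benign
agreement = (true_positives + true_negatives) / total
sensitivity = true_positives / malignant

print(f"specificity={specificity:.1%}")
print(f"agreement={agreement:.1%}")
print(f"sensitivity={sensitivity:.0%}")
```

Under these assumptions the two headline figures are mutually consistent: 151 of 171 benign cases gives 88.3% specificity, and 156 of 176 cases overall gives 88.6% agreement.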
This work proposes representation-conditioned diffusion models that leverage learned representations from DINOv2, DINOv3, and CLIP to generate synthetic image data. On ImageNet100, this approach outperforms class-conditioned generation by +10.76 p.p. top-1 accuracy. Scaling synthetic data can even surpass real-data training by +2.0 p.p. The method also excels in data augmentation and sample filtering, offering a promising way to augment or replace real datasets in large-scale visual learning.
Representation-conditioned diffusion models outperform class-conditioned ones by 10.76 p.p. on ImageNet100.
Scaled synthetic datasets can beat real-data-trained classifiers by 2.0 p.p. top-1 accuracy.
This paper presents a behavioral-level activity recognition method using a head-mounted IMU, going beyond basic motion primitives. The authors define five behavioral categories, construct a 160K-sample dataset from Ego4D with a four-tier quality assurance framework, and propose HiT-HAR, a 703K-parameter hierarchical model that outperforms prior models on action and scenario recognition. Observability analysis reveals locomotion is reliably observable, while object transfer and task operation benefit from temporal context; scenario-dependent signal overlap remains a challenge. Results show that architectural choices exploiting temporal context and scenario structure outperform simply scaling model size.
Proposes HiT-HAR, a hierarchical model for behavioral activity recognition from head-mounted IMU, going beyond motion primitives
Constructs a 160K-sample Ego4D dataset with 8 scenarios and 5 behavioral categories, using a four-tier quality assurance framework
The 10th ABAW Workshop and Competition at CVPR 2026 advances multimodal human-centered AI by introducing new challenges including emotional mimicry intensity estimation, ambivalence/hesitancy recognition, and fine-grained violence detection, alongside traditional affect estimation and recognition tasks. The competition leverages large-scale in-the-wild datasets, and the paper track covers a broad range of topics from pose estimation to fairness and robustness.
Large language models (LLMs) are increasingly used as proxies for computational social analysis, but their ability to faithfully represent human communities' 'thick descriptions' remains a critical challenge. This paper introduces CARE (Community-Aware Reaction Evaluation), a reaction-centered framework that benchmarks LLM-simulated discourse against authentic community responses to real-world news. By characterizing a fine-grained spectrum of illocutionary tones, the diagnosis reveals a persistent 'realism gap': steering LLMs with explicit community prompts does not, by itself, improve simulation fidelity. Analysis further identifies divergent behavioral signatures among frontier models, suggesting current alignment strategies are insufficient for capturing the sociolinguistic dynamics of online groups.
CARE framework evaluates LLM simulation fidelity by analyzing authentic community reaction tones
Current LLM alignment strategies fail to adequately capture online community sociolinguistic dynamics
A new framework called FLUID adapts autoregressive language models to diffusion models for efficient parallel text generation, using Strictly Causal Alignment to reuse GPT checkpoints and Elastic Horizons to dynamically adjust denoising steps. It achieves state-of-the-art performance with significantly reduced training costs.
FLUID bridges AR and diffusion models by enforcing Strictly Causal Alignment, enabling initialization from GPT-style checkpoints.
Elastic Horizons uses entropy to dynamically adapt denoising strides based on local information density.
Researchers identify a Stability-Expressivity Gap in spoken language models when using synthetic data for low-resource languages, and propose two self-alignment frameworks (DGSA and TDSC) that recover prosodic variability and outperform commercial systems like ElevenLabs and Gemini Pro, enabling zero-shot voice cloning for Lao.
Spoken Language Models (SLMs) for low-resource languages suffer from a trade-off between phonetic accuracy and prosodic expressivity when trained on synthetic data.
The proposed Disentanglement-Guided Self-Alignment (DGSA) recovers expressivity by separating prosody and timbre.
BioELX is a novel two-stage framework for cross-lingual biomedical entity linking that requires no annotated training data. It enhances SapBERT with multilingual aliases from Wikidata and uses a pre-trained LLM for context-aware disambiguation. Experiments on five benchmarks show significant improvements, especially for low-resource languages like Turkish, Korean, and Thai.
Proposes BioELX, a zero-shot cross-lingual BEL framework using alias-based retrieval and LLM ranking.
In Stage 1, enriches SapBERT with multilingual aliases from Wikidata for better candidate retrieval.
RAG-Coding is an agentic method for automated ICD-10-CM coding that orchestrates four large language model (LLM) agents and grounds decisions in external knowledge sources, improving coding accuracy and clinical compliance. On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1. Compared to PLM-ICD, RAG-Coding shows higher micro recall (+11%) but lower micro precision (-6%), with comparable F1 scores. Ablation studies confirm the importance of external knowledge. The authors also release MDACE-2025, updated with expert re-annotations based on 2025 guidelines, enabling finer-grained evaluation.
RAG-Coding uses four LLM agents and external knowledge sources to improve ICD-10-CM coding accuracy.
On the MDACE dataset, it outperforms the best LLM baseline by 8-13% micro-F1 and 2-8% macro-F1.
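The micro-F1 and macro-F1 gains differ because the two averages weight codes differently: micro-F1 pools counts over all codes, so frequent codes dominate, while macro-F1 averages per-code F1, so rare codes count equally. A minimal sketch of the difference follows; the three codes and all counts are hypothetical illustrations, not MDACE data.

```python
# Micro- vs macro-averaged F1 on hypothetical per-code counts.
# The codes and counts are illustrative only, not taken from MDACE.
counts = {
    # code: (true_positives, false_positives, false_negatives)
    "E11.9": (90, 10, 10),  # frequent code, predicted well
    "I10": (40, 10, 10),
    "Z79.4": (1, 3, 4),     # rare code, predicted poorly
}


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Micro: pool the counts across codes, then compute a single F1.
tp_sum = sum(tp for tp, _, _ in counts.values())
fp_sum = sum(fp for _, fp, _ in counts.values())
fn_sum = sum(fn for _, _, fn in counts.values())
micro_f1 = f1(tp_sum, fp_sum, fn_sum)

# Macro: compute F1 per code, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(f"micro-F1={micro_f1:.3f} macro-F1={macro_f1:.3f}")
```

In this toy case the poorly predicted rare code pulls macro-F1 well below micro-F1, which is why a method can show different-sized gains on the two metrics.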
This paper proposes novel techniques for inter-utterance style interpolation and intra-utterance style transition in prompt-based TTS models, addressing limitations of coarse global control. Methods include direction vector interpolation and KV-cache swapping with sliding-window attention masking. Experiments show high success rates in gender conversion and smooth style transitions within utterances.
Inter-utterance interpolation via direction vectors between contrastive style prompts enables smooth transitions.
Intra-utterance transition uses KV-cache swapping and sliding-window masking to overcome attention bias.
ICG is a novel framework that integrates MLLM-based prompting with personalized preference alignment to generate high-quality, contextually relevant cover images. It extracts semantic features via meta tokens, refines them with user embeddings, and injects personalized context into diffusion models. A multi-reward learning strategy combines public rewards with a personalized preference model, eliminating the need for labeled supervision. Experiments show improvements in image quality, semantic fidelity, and personalization, boosting user appeal and recommendation accuracy.
ICG integrates MLLM prompting with personalized preference alignment for end-to-end cover image generation.
Semantic features are extracted via meta tokens and refined with user embeddings for diffusion model injection.
The SignGAD framework reformulates graph anomaly detection by replacing fixed pipelines with self-designed task-conditioned workflows, and introduces a guarded final refit strategy to improve reliability under limited supervision.
SignGAD shifts from training a fixed detector to designing detection workflows
It selects suitable graph encodings and detector designs for task-specific anomaly evidence
This paper introduces Architecture-driven Shift (ADS), a lightweight metric for selecting pre-trained models in continual learning. ADS decouples logit shift into architecture and data dependencies, requiring only a few data samples to capture shift trends. Experiments across over 175 architectures show strong monotonic correlation (Spearman's r_s ≥ 0.731) between ADS and logit shift, and ADS serves as an effective proxy for expected calibration error for reliable CL model selection across three datasets and six scenarios.
Selecting pre-trained models that balance plasticity and stability in continual learning is critical, but computing logit shift is computationally expensive.
Existing theories assume uniform hidden layer widths, ignoring real-world architectural heterogeneity and failing to provide efficient alternatives.
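Spearman's r_s measures whether two quantities rank items in the same order, which is the property that matters when a cheap metric is used to select models. The sketch below computes it with the classic no-ties formula; the ADS scores and logit-shift values are hypothetical numbers chosen for illustration, and how ADS itself is computed is not shown here.

```python
# Spearman rank correlation between a cheap proxy and an expensive target.
# All values are hypothetical; they are not results from the paper.


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# One entry per candidate architecture (hypothetical).
ads_score = [0.12, 0.35, 0.20, 0.80, 0.55]
logit_shift = [1.1, 3.4, 2.0, 9.5, 3.1]

r_s = spearman(ads_score, logit_shift)
print(f"r_s={r_s:.3f}")
```

Here two architectures swap places between the two rankings, giving r_s = 0.9; a value near 1 means the proxy orders candidates almost as the expensive quantity would.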
This paper introduces Metric-Aware Principal Component Analysis (MAPCA), which parameterizes PCA with a positive-definite metric matrix and positions it within the geometric deep learning framework. MAPCA interprets the metric as a geometric prior, its solutions are equivariant under the orthogonal group preserving the metric, and its spectrum is invariant. A uniqueness theorem characterizes Invariant PCA (IPCA) as the unique linear data-derived metric in the MAPCA family that is equivariant under arbitrary diagonal rescaling. The paper also discusses extensions to kernel PCA, spectral graph methods, and deep MAPCA.
MAPCA parameterizes PCA with a positive-definite metric matrix, linking geometric deep learning symmetry and equivariance concepts.
A uniqueness theorem shows that IPCA is the unique linear data-derived metric in the MAPCA family equivariant under diagonal rescaling.
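Invariance under diagonal rescaling can be illustrated numerically. The sketch below rests on two assumptions that the summary does not spell out: that metric-aware PCA can be written as the generalized eigenproblem Σv = λMv, and that the invariant choice of metric is M = diag(Σ). Under those assumptions the spectrum is that of M^{-1/2} Σ M^{-1/2}, and the code checks that this matrix does not change when each feature is multiplied by an arbitrary positive scale. The data and scale factors are made up.

```python
# Numerical check: with M = diag(Sigma), the matrix M^{-1/2} Sigma M^{-1/2}
# is unchanged by per-feature rescaling. The formulation (generalized
# eigenproblem, M = diag(Sigma)) is an assumption made for this illustration.


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def normalize_by_diagonal(cov):
    """Return M^{-1/2} cov M^{-1/2} for M = diag(cov)."""
    d = len(cov)
    return [
        [cov[i][j] / (cov[i][i] * cov[j][j]) ** 0.5 for j in range(d)]
        for i in range(d)
    ]


# Hypothetical data: 5 samples, 3 features.
data = [
    [1.0, 10.0, 0.5],
    [2.0, 14.0, 0.1],
    [3.0, 11.0, 0.9],
    [4.0, 19.0, 0.4],
    [5.0, 16.0, 0.8],
]
scales = [1000.0, 0.01, 3.0]  # arbitrary positive per-feature rescaling
rescaled = [[x * s for x, s in zip(row, scales)] for row in data]

original = normalize_by_diagonal(covariance(data))
after_rescaling = normalize_by_diagonal(covariance(rescaled))
max_diff = max(
    abs(original[i][j] - after_rescaling[i][j])
    for i in range(3) for j in range(3)
)
print(f"largest entry-wise difference: {max_diff:.2e}")
```

Because the normalized matrix is identical before and after rescaling (up to floating-point error), its eigenvalues are too, whereas the eigenvalues of the raw covariance would change with the units of each feature.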
This survey explores how Mixture-of-Experts (MoE) effectively addresses multimodal learning challenges from three perspectives: efficient engine, representation learner, and adapter, while identifying research gaps.
MoE enables scalable multimodal modeling by decoupling computational cost from parameter growth.
MoE integrates complementary expert knowledge for enriched alignment and interaction representations.
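The decoupling of compute from parameter count comes from sparse routing: a gate scores all experts but only the top k are evaluated per input. The toy sketch below shows generic top-k routing with scalar "experts"; the expert functions, the linear gate, and every name in it are hypothetical, and the survey covers many designs that differ from this one.

```python
# Toy top-k Mixture-of-Experts routing with scalar experts.
# Generic illustration only; not a specific architecture from the survey.
import math
import random


def make_expert(seed):
    """A hypothetical expert: multiplies its input by a fixed random weight."""
    weight = random.Random(seed).uniform(-1.0, 1.0)
    return lambda x: weight * x


def moe_forward(x, experts, gate_weights, k=2):
    """Return (output, number of experts evaluated) for one input."""
    # Cheap gate: one score per expert (toy linear gate).
    scores = [g * x for g in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores (shifted by the max for stability).
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    norm = sum(exps)
    # Only the selected experts are actually run.
    output = sum(e / norm * experts[i](x) for e, i in zip(exps, top))
    return output, len(top)


evaluated = {}
for num_experts in (4, 64):
    rng = random.Random(0)
    experts = [make_expert(seed) for seed in range(num_experts)]
    gate_weights = [rng.uniform(-1.0, 1.0) for _ in range(num_experts)]
    _, used = moe_forward(1.5, experts, gate_weights, k=2)
    evaluated[num_experts] = used
    print(f"{num_experts} experts in the model, {used} evaluated")
```

Growing the pool from 4 to 64 experts adds parameters, but each input still runs only two experts; the gate's cost does grow with the number of experts, though it is typically small next to the experts themselves.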
This paper presents $E^3$-Agent, an executable and evolving agent for resource management of edge AIGC. It separates a fast-path router from a slow-path LLM meta-controller, learns online from execution feedback, and adapts to unknown time-varying service-time mappings. Evaluation shows 65%-73% latency reduction over static baselines and effective stutter suppression.
Edge generative inference faces unknown per-device performance and non-stationarity.
$E^3$-Agent uses a dual-path architecture: fast router + slow LLM meta-controller.