This article argues that despite fears, AI has not led to mass layoffs in software engineering. It presents evidence that layoffs attributed to AI are often financial in nature, and that AI compresses execution but not decision-making and delivery. The 'decide-execute-deliver sandwich' model explains why coding agents haven't displaced workers: the bottlenecks are deciding, verifying, and deep understanding.
AI-driven mass layoff stories are often 'AI washing' — layoffs are typically due to financial pressures.
Writing code is not the bottleneck; bottlenecks are deciding what to build, verifying delivery, and deep understanding.
Google claimed its AI agents built an entire operating system with a single prompt and about $900 in API costs, but this analysis highlights multiple issues: the prompt was actually thousands of lines long, the scaffold may be overfitted, and critical details like code, logs, and methodology are missing. The article underscores the need for independent evaluation and proposes norms for 'open-world evaluations'.
Google claims AI agents built an OS for $916, but the single prompt was actually thousands of lines long.
Unresolved issues include potential overfitting, code copying, and lack of transparency.
Introducing CRUX, a collaborative project that conducts open-world evaluations—long, real-world tasks—to measure frontier AI capabilities. The first experiment shows an AI agent autonomously publishing an iOS app, highlighting both progress and risks like app store spam.
Open-world evaluations test AI on complex, real-world tasks beyond standard benchmarks.
CRUX is a collaboration of 17 researchers across sectors to regularly conduct such evaluations.
Researchers propose a framework to measure AI agent reliability, decomposing it into 12 metrics across four dimensions. Testing 14 models over 18 months reveals rapid capability improvements but only modest reliability gains, calling for reliability-specific optimization.
Decomposes reliability into four dimensions: consistency, robustness, predictability, and safety, with 12 metrics.
Evaluated 14 models from OpenAI, Google, and Anthropic over 18 months; accuracy improved significantly, but reliability improved only modestly.
Applying the AI as Normal Technology framework to legal services, this article argues that advanced AI will not by default help consumers achieve desired legal outcomes at lower costs due to three bottlenecks: regulatory barriers, adversarial dynamics, and human involvement. It also discusses potential institutional reforms.
Three bottlenecks prevent AI from automatically reducing legal costs: regulatory barriers, adversarial dynamics, and human involvement.
Unauthorized practice of law (UPL) rules and entity regulations limit AI adoption.
Moravec's paradox has never been empirically tested; it reflects a selection effect that comes from ignoring tasks that are either easy for both humans and AI or hard for both.
The evolutionary argument for the paradox is dubious; reasoning may not be a separate skill that can be easily automated in open domains.
This article explores the 'AI as Normal Technology' framework, contrasts it with AI 2027, addresses common confusions, and discusses the slow diffusion and adoption challenges of AI.
The normal technology framework emphasizes the causal chain from capability to impact, highlighting the importance of deployment over development.
Contrary to rapid adoption narratives, AI diffusion faces significant barriers including organizational change and user learning curves.
AI might slow scientific progress by exacerbating the production-progress paradox, introducing software errors, entrenching flawed theories, and undermining human understanding. The article calls for reforms in incentives, meta-science investment, and AI tool design.
Scientific paper output is soaring but actual progress is flat or slowing—the production-progress paradox.
AI could worsen this by encouraging low-quality output, amplifying software errors, and reinforcing prediction over understanding.
Artificial General Intelligence (AGI) is not a milestone because it does not represent a discontinuity in AI properties or impacts. AGI definitions are vague and unobservable, economic impacts take decades through diffusion, and risks stem from design choices rather than capabilities. Businesses and policymakers should focus on gradual diffusion rather than chasing AGI.
AGI has no clear definition and is not an actionable milestone.
Economic impacts of AI will take decades to materialize through diffusion.
A new paper argues that AI should be viewed as a normal technology, not as a superintelligent entity. It emphasizes slow adoption, gradual economic impact, and the importance of human control, contrasting with utopian/dystopian narratives.
AI is normal technology, not a superintelligent species.
Adoption and diffusion of AI happen over decades, not years.
The article examines the debate over whether AI capability progress is slowing. The authors argue that model scaling isn't dead, insider predictions are unreliable, inference scaling has promise but also limits, and capability gains translate only weakly into real-world impact because of product and adoption lags.
Model scaling may not be over; the sudden narrative shift is driven by vested interests.
Inference scaling (e.g., o1) works well for coding/math but not for writing/translation.
An analysis of AI use in 2024 global elections reveals that half of the deepfakes examined lack deceptive intent, and most deceptive content can be cheaply replicated without AI. Misinformation spreads due to demand, not supply.
39 out of 78 cases of AI use in elections were non-deceptive.
Deceptive AI content could be recreated without AI at low cost.