
Import AI 459: AI oversight is difficult; scaling laws for protein folding models; and pricing the extinction risk of AI systems

Do you feel as though you are living in a revolution?

Source: Import AI. Author: Jack Clark


Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv, cappuccinos, and feedback from readers. If you’d like to support this, please subscribe.

The AI economy in the US is growing at 2,000% a year:
…The more directly you measure the AI economy, the weirder and more unprecedented it seems to get…

Economists with the University of Virginia* and Anthropic, and the Bank of Canada have written a paper that outlines the tremendous growth of the emerging “AI economy” in the US and wrestles with why this growth is hard to see in aggregate GDP statistics. “The AI economy in the United States has been growing at an unprecedented rate, but this extraordinary growth is largely invisible in conventional GDP statistics,” they write. “Treating the AI sector as a coherent economic entity yields preliminary estimates of nominal AI GDP at approximately $250 billion in 2025, growing at roughly 2,600 percent per year in quality-adjusted real terms.”

Why it’s hard to see: There are a couple of factors here. One is that though the datacenter building boom is large, it still isn’t quite large enough to lift GDP significantly. By comparison, most of AI’s economic impact is taking place in AI inference (the usage of AI systems), but there are confounding factors when it comes to GDP measurement: “Nominal AI revenues grow only moderately because per-unit prices for any given level of AI capability fall almost as fast as quality-adjusted output rises,” they write.

If we can’t measure this, we might end up surprised in a way that’s hard to recover from: “AI is the latest in a series of fast-moving technologies that have raised measurement concerns; semiconductors and the internet generated similar debates in their time,” they write. But a key difference is that AI as a technology might have a far bigger impact on labor than these other technologies.
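To make the earlier point about prices falling almost as fast as quality-adjusted output rises a bit more concrete, here is a minimal Python sketch of the standard identity that links nominal growth, real growth, and price changes. The 2,600 percent real growth figure comes from the paper’s quote; the 95 percent price decline is a purely hypothetical number picked for illustration, and the function names are my own, not the authors’.

```python
def nominal_growth(real_growth: float, price_change: float) -> float:
    """Nominal growth implied by real (quality-adjusted) growth and a price change.

    All values are fractions: 26.0 means +2,600%, -0.95 means prices fell 95%.
    """
    return (1.0 + real_growth) * (1.0 + price_change) - 1.0


def implied_price_change(real_growth: float, nominal: float) -> float:
    """Price change per quality-adjusted unit implied by real and nominal growth."""
    return (1.0 + nominal) / (1.0 + real_growth) - 1.0


# Roughly 2,600% quality-adjusted real growth, as quoted from the paper.
REAL_GROWTH = 26.0
# ASSUMPTION: a hypothetical 95% fall in price per quality-adjusted unit.
HYPOTHETICAL_PRICE_CHANGE = -0.95

if __name__ == "__main__":
    growth = nominal_growth(REAL_GROWTH, HYPOTHETICAL_PRICE_CHANGE)
    print(f"Implied nominal growth: {growth:.0%}")
```

Under these assumed inputs, nominal growth works out to about 35 percent, which is the sense in which a sector can expand 27-fold in real terms while looking unremarkable in nominal statistics.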
“In the prior episodes, the rapidly improving technology was a complement to human labor at the aggregate level,” they write. “AI is the first plausible candidate for large-scale technological mismeasurement in which the rapidly improving sector may become a substitute for human labor”.

Three ways of measuring the AI economy:

Nominal compute spending: US compute spending rose from $37 billion in 2023 to $90 billion in 2024 to $219 billion in 2025.

Raw compute capacity: Due to efficiencies in newer chips, actual capacity grows even faster than spending: “US AI computing capacity grew at more than 200 percent per year”.

Quality-adjusted AI output: If you factor in algorithmic progress via inference prices at fixed benchmark performance as well as assumptions about how much cheaper it is getting to train models, then things become even more dramatic: “these efficiency gains imply that quality-adjusted AI output grew at roughly 2,290 percent in 2024 and 2,271 percent in 2025”.

The AI economy is much, much larger than normal measures suggest: “Conventional statistics show a sector growing slowly in nominal terms; our measures show one whose underlying capacity is more than doubling annually. A finance ministry running ten-year revenue projections off the conventional data will materially underweight the probability of a labor-tax-base shock—and will be correspondingly unprepared to design responses such as tax system reforms, sovereign wealth funds, or other benefit-sharing schemes that such a shock may call for. A windfall that cannot be seen cannot be shared.”

Three recommendations: The authors have three ideas for how we can solve this measurement challenge and better position ourselves to see the true shape of the AI economy.

AI satellite accounts: Statistical agencies should develop “AI satellite accounts” that track measures (e.g., nominal compute spending) which can help inform overall GDP calculations.
Generate better data: Statistical agencies, companies, and academia should partner to generate better primary data, like the allocation between training and inference compute.

Factor into projections: Policymakers should incorporate AI productive-capacity measurements into their medium-term economic projections.

Why this matters - shut up and play the Jaws theme tune: In the great film Jaws there’s this scene where the shark is in the water and some very tense music plays indicating that the shark is approaching. You, the audience member, find yourself practically jumping out of your seat wanting to yell THERE’S A GOD DAMN SHARK IN THE WATER WHAT ARE YOU DOING IN THERE? That’s what it feels like working on AI and staring at most economic data right now: the vast majority of economic data says there’s nothing especially unusual about today’s economy (in fact, things look rather good in the US - low unemployment, decent growth, etc). But the intuition of everyone working within AI, including me, is that it’s impossible to reconcile the capabilities of the technology and how it is being used with the economy staying normal. In this tortured metaphor, the shark is the “true shape of the AI economy”, and the rest of the people in the film are the consensus economist and policy community. Anton here might be the audience member, writing a paper that describes the possibility of a shark beneath the surface. Look out, everyone!

Read more: Where is AI in GDP statistics? (PIIE).

*Disclaimer: Though one of the authors, Anton Korinek, is affiliated with Anthropic, this research was done mostly prior to him joining and outside his work at the company.

***

Here’s why making AI safe with AI oversight is harder than you think:
…Automated alignment research is not a silver bullet…

Many researchers in AI safety think the best way to build smarter-than-human machines safely is to have AI systems supervise some of the training process.
Researchers with the UK AI Security Institute have written a paper outlining why, though this is a tempting idea, it is harder than people suspect.

Why is automated alignment research hard? “Errors in automated alignment research are likely to be harder to identify than the human baseline,” they write. There are a few reasons for this, including:
- Optimization pressure: AI research is optimized for human approval.
- Alien mistakes: When agents make mistakes, they’re unintuitive to humans.
- More correlated research: Many more things are shared than with human-generated research.
- Research volume: The kinds of safety determinations made by automated systems might use far more sets of evidence with far more interactions than human-generated research.
- Non-human-evaluable arguments: Alignment solutions may rely on arguments that humans are unable to follow.

What can we do? They suggest a few interventions that could improve the state of affairs:

Measurement:
- Recreate completed research projects: Take logs at arbitrary cutoff points from successful projects and see how well an agent can continue with the research project.
- Test agent prediction performance over datasets of correlated-events: See how well agents can correctly combine correlated subtasks.
- Empirical studies of optimal human-agent team structure: See how well teams of non-expert humans can solve completed projects with the assistance of agents.

Generalization:
- Simulated generalisation experiments: Test different training proxies using agent performance on completed research problems beyond the knowledge cutoff.
- Mechanistic understanding of generalisation: Use whitebox methods such as mechanistic interpretability.

Scalable oversight:
- Compactification of research paper corpus: Try to produce a small number of research outputs which are based on a much larger underlying research corpus.
- Develop and test new scalable oversight protocols: Research scalable oversight techniques that deal with correlated uncertainty.
- Test different human scaffolds for uplifting non-expert performance on fuzzy tasks.
- Red team automated alignment programs: “The red team prompts an agent to hide errors in a research paper corpus and the blue team attempts to catch these errors with agent assistance”.

Why this matters - who controls the future? Whether we are able to supervise smarter-than-human systems is fundamentally a question about who controls the future. If we don’t build techniques that work, then humans will take a backseat, either due to misalignment of these systems or gradual disempowerment as they proceed to out-think us. If we can build oversight techniques that work on smarter-than-human systems, then we have a better chance of being able to make choices about the future nature of existence.

Read more: Automated alignment is harder than you think (arXiv).

***

100 Million permissively licensed images:
…A nice resource for academics and startups…

Researchers with Stanford University, Radical Numerics, the University of Michigan, and Salesforce Research have released the Giant Permissive Image Corpus (GPIC), a dataset of 100M images with accompanying captions. The key thing about GPIC is that “all GPIC images are permissively licensed for both research and commercial use,” they write. “GPIC is safety-filtered, deduplicated, and centrally hosted on HuggingFace”.

More details on the dataset: GPIC consists of 100M training images, 200k validation, and 1M test examples. Each image was captioned with Qwen3-VL-4B. “GPIC is centrally hosted on Hugging Face as 8,000 shards, providing stable and accessible infrastructure for large-scale training,” they write. “We source images from Flickr and Wikimedia, restricting the source pool to CC BY, CC0, Public Domain, and No-Known-Restrictions categories.
This licensing criterion ensures that GPIC can be used by both academic and industrial researchers without restricting the release or downstream use of derived artifacts.”

Why this matters - fuel for research: Datasets like GPIC are very useful for academics and startups alike and are basically the equivalent of free, clean vegetables. If someone offers you a free, clean vegetable you should probably take it and say thank you.

Read the research paper: GPIC: A Giant Permissive Image Corpus for Visual Generation (arXiv).
Find out more at the website: GPIC: A Giant Permissive Image Corpus for Visual Generation (official project website).
Get the dataset here: GPIC (Hugging Face).

***

Improving cancer research with protein prediction models:
…Biohub is an example of positive-sum competition among AI developers…

Biohub, a research organization founded by Priscilla Chan and Mark Zuckerberg, has released a rival model to DeepMind’s AlphaFold, intensifying a positive-sum race between two technology groups to develop better AI systems for expanding the capabilities of biologists worldwide. The model, ESMFold2, is a “world model of protein biology: a scientific engine for prediction, design, and discovery that can map proteins across the tree of life, predict their structures, and design new protein binders that function in laboratory experiments.”

What it consists of: The release contains three parts:

ESMC: A “language model that represents proteins, trained on approximately 2.8 billion sequences drawn from across all of life.”

ESMFold2: A “design engine built to transform ESMC’s sequence representations into atomically-resolved 3D structure of biomolecular complexes.” According to benchmarks, ESMFold2 outperforms AlphaFold 3, though in some areas their performance is tied.
ESM Atlas: “Makes ESMC’s representations navigable across 6.8 billion protein sequences and 1.1 billion predicted structures — the largest application of AI to protein biology to date.”

Cancer test: In one experiment, Biohub researchers used the ESM tools “to design protein binders against five targets at the center of cancer and immunology research — EGFR and PDGFRβ (implicated in tumor growth), PD-L1 and CTLA-4 (immune checkpoints that cancer cells exploit to evade detection), and CD45 (a regulator of immune cell si [truncated for AI cost control]