
Why the Human Genome's Tangled Physicality May Confound AI

The human genome is not a simple blueprint or algorithm but a complex, dynamic three-dimensional structure that regulates gene expression through intricate mechanisms like transcription factors, enhancers, chromatin loops, and epigenetic modifications. This complexity challenges AI models that assume straightforward input-output relationships.

Source: Hacker News AI · Author: tzury




June 18, 2026

Our genetic heritage is not a blueprint or an algorithm, as many biologists have imagined, but something else entirely.


Samuel Velasco and Hannah Waters/Quanta Magazine

Introduction

Since its molecular structure was deduced in the 1950s, DNA has been hailed by many biologists as the secret of life. They’ve read and studied the information stored in the DNA found in the cells of living organisms, known as their genomes, and claimed that this genetic database must be some kind of blueprint, code script, or computer. But if DNA really does harbor some greater secret about how life works, biologists have yet to find it.

In fact, the human genome is less a script than a puzzle that gets harder the closer they look. Knowing the entire sequence — the order of all 3 billion or so of our DNA’s chemical building blocks, nearly fully deduced by the international Human Genome Project between 1990 and 2003 — hasn’t helped much. That investigation showed that barely 2% of the human genome consists of actual genes, the information-coding sequences of DNA.

It’s now clear that understanding the human genome is no longer a matter of figuring out what each gene does. The deeper and much harder question is how those genes are used, or regulated, a question that seems to involve some and perhaps much of the rest of the genome. By switching suites of genes on and off, the many different cell types in our bodies can all be created from the same material. Cells also regulate their genes from moment to moment in response to a constant inflow of signals from their neighbors and surroundings. But the processes that govern gene regulation are proving so complex that some biologists wonder whether a full understanding of it — of how the genome really works — will ever be within the grasp of our puny minds.

Some are counting on outsourcing the analysis to artificial intelligence. Genomic “foundation models” such as Evo 2, Genos, and Google DeepMind’s AlphaGenome are trained on vast quantities of genomic data. Biologists use these models to predict how differences in DNA sequence affect biological processes and ultimately the traits (including disease risk) of a whole organism. These algorithms don’t worry about the complicated regulatory stuff going on; all of that is supposedly subsumed by the algorithm’s “training,” through which it deduces correlations from cases we already know about.

This approach is likely to be useful, but for those who crave real understanding of how the genome, and ultimately life itself, works, a computational black box will never suffice. And perhaps more to the point, the genome might not submit to the kind of straightforward input-output approach that such AI models ultimately assume.

That’s because the genome is no blueprint or algorithm. It is something else.

The Old View

Given that it’s the product of around 4 billion years of evolution, perhaps it’s not surprising that our genome is complicated. The surprise has been what those complications are. “Our genome is not what we might make it if we sat down at the drawing board,” said the biologist Karen Adelman, who studies gene regulation at Harvard Medical School.

The traditional view posits that a small proportion of our DNA holds the code for making the protein molecules that orchestrate our cells’ chemistry. Each instruction for a protein is held in a corresponding gene — we have around 20,000 of these — and gene sequences can range in length from a couple of dozen to almost 3 million DNA “letters” (representing molecules called nucleotides). Making a protein from its gene is a two-stage affair. First the DNA is read, letter by letter, by an enzyme called a polymerase, which creates a copy of that code in a related molecule called messenger RNA (mRNA). This is called transcription. The mRNA is then read by a piece of molecular machinery called the ribosome, which constructs the protein — a process called translation. The proteins made by the ribosome then go off to do their jobs in making and sustaining the organism.
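The two-stage process described above can be sketched in a few lines of Python. This is a toy illustration, not a bioinformatics tool: the function names `transcribe` and `translate`, the 15-letter example sequence, and the deliberately partial codon table are all assumptions made for the sketch. It also assumes the input is the coding strand, so that transcription reduces to swapping T for U.

```python
# Toy model of the two-stage route from gene to protein.
# Assumption: the input is the coding (sense) strand, so the mRNA has the
# same sequence with thymine (T) replaced by uracil (U).

# Partial codon table: only the codons used in the example below.
CODON_TABLE = {
    "AUG": "Met",  # also the usual start codon
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Stage 1 (transcription): produce the mRNA copy of a DNA sequence."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Stage 2 (translation): read three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return protein


gene = "ATGGCTTGGAAATAA"  # hypothetical example sequence
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAUAA
print(protein)  # ['Met', 'Ala', 'Trp', 'Lys']
```

Nothing in this sketch says when, where, or how strongly the gene is read, which is exactly the part of the picture the rest of the article is concerned with.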

This picture is still more or less correct. But it turns out that “the genes are probably not the most interesting part of the genome,” Adelman said.

What matters more is how our genes, many of which we share with simpler organisms, are regulated: turned on and off. Which proteins a cell needs changes over time and according to cell type: muscle, brain, skin, and so on. How the genes that encode those proteins are regulated depends on some of the genome that doesn’t code for proteins.

Biologists have known about gene regulation, and the involvement of “noncoding” DNA, since the 1960s. But for many years, most of what they understood about this came from studies of simple organisms like bacteria, where the principles are generally straightforward. It has gradually become clear, though, that in complex eukaryotic organisms like us, gene regulation is far more complicated, involving overlapping systems of oversight and control, each with its own intricacies.

Transcription Factors

Transcription gets started by proteins called transcription factors, which are like the operations managers of gene regulation. These proteins stick to sections of DNA (typically close to the target gene) and recruit the polymerase enzyme to make an mRNA copy. In bacteria, transcription factors are rather like keys that fit the locks of unique binding sites on DNA. But that’s not how they work in complex organisms. In us, the logic of transcription factors is more difficult to parse.

For one thing, our transcription factors don’t show strong preferences for particular DNA binding sites. What’s more, they tend to work in pairs or groups. And a given transcription factor might have different effects in different contexts, such as activating gene transcription in one cell type but suppressing it in another, depending on which other transcription factors are around.

In bacteria, regulation tends to have an “OR” logic, Adelman said, whereby a particular signal turns a gene on or off: It’s either this or that. But in the human genome the logic is more like what computer scientists designate “AND.” Many signals are integrated to reach a regulatory decision: this and that and also that other thing. In this case, regulation can be more responsive to nuances of context, and the regulatory knobs are tunable rather than being just on/off. “This is part of the beauty” of our regulatory complexity, Adelman said.
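The contrast Adelman draws can be made concrete with a small Python sketch. It reads “OR” as a switch that any one signal can flip and “AND” as a decision that needs every signal at once. The signal names and the `graded_response` function are hypothetical, included only to illustrate a tunable knob; none of this models a real gene.

```python
# Toy contrast between "OR"-style and "AND"-style regulatory logic.
# All signal names are hypothetical placeholders.

def or_logic(signals: dict[str, bool]) -> bool:
    """Switch-like: any single signal is enough to turn the gene on."""
    return any(signals.values())


def and_logic(signals: dict[str, bool]) -> bool:
    """Integrative: every signal must be present to turn the gene on."""
    return all(signals.values())


def graded_response(signals: dict[str, bool]) -> float:
    """A tunable knob: output scales with the fraction of signals present.

    Illustrative assumption only, not a measured dose-response curve.
    """
    return sum(signals.values()) / len(signals)


signals = {"factor_a": True, "factor_b": True, "factor_c": False}

print(or_logic(signals))         # True: one signal would have been enough
print(and_logic(signals))        # False: factor_c is missing
print(graded_response(signals))  # two of three signals present
```

With the same three inputs, the OR switch is fully on and the AND decision is fully off, while the graded version sits in between. That middle ground is the kind of context-sensitive tuning the passage describes.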

When they interact with the genome, transcription factors bind to pieces of DNA called enhancers — which present a puzzle of their own.

Enhancers

Enhancers are gathering points for transcription factors, and they are thought to be the decisive influence on transcription: They deliver the “go” signal for a waiting polymerase to make an mRNA version of the DNA sequence. Seems simple enough, but mapping enhancers to their respective genes is far from straightforward. Our genome has hundreds of thousands, perhaps millions, of enhancers. That means we have many more of them than we have genes. Each gene might be influenced by many enhancers, and each enhancer might influence multiple genes.
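The many-to-many relationship described above is easy to state as a data structure, even though the real map is largely unknown. The following Python sketch uses entirely hypothetical enhancer and gene names and simply inverts an enhancer-to-gene table into a gene-to-enhancer table.

```python
from collections import defaultdict

# Hypothetical enhancer -> target-gene links (many-to-many).
enhancer_targets = {
    "enh1": ["geneA", "geneB"],
    "enh2": ["geneA"],
    "enh3": ["geneB", "geneC"],
}


def invert(links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the reverse view: for each gene, the enhancers that influence it."""
    by_gene = defaultdict(list)
    for enhancer, genes in links.items():
        for gene in genes:
            by_gene[gene].append(enhancer)
    return dict(by_gene)


gene_enhancers = invert(enhancer_targets)
print(gene_enhancers)
# {'geneA': ['enh1', 'enh2'], 'geneB': ['enh1', 'enh3'], 'geneC': ['enh3']}
```

Even in this three-enhancer toy, no gene can be understood by looking at a single enhancer, and no enhancer by looking at a single gene. Scaling the table to hundreds of thousands of entries, most of them unmapped, gives a sense of the problem.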

“It’s embarrassing that 25 years after the Human Genome Project, we don’t know where all the enhancers are in the genome, let alone what they do when they act and which genes they control,” said Wendy Bickmore, a genome biologist at the University of Edinburgh.

Biologists do know that most enhancers won’t respond to a single transcription factor. Their activation “requires a cocktail,” Bickmore said. “That’s what gives [an enhancer] that exquisite specificity — because it’s only in a particular cell at a particular time that you have the right combination of factors to bind and activate that enhancer.”

Some enhancers are, as you’d expect, close to the genes they regulate, or even sit on DNA inside a gene. But others sit far away from the gene — perhaps millions of nucleotides away, with more genes in between.

The existence of such so-called “distal” enhancers “seems bonkers,” Bickmore said. “How do you get that information from over there to over here, to the gene that needs to be activated? That’s a largely unanswered question.”

One of the answers comes in the form of a loop.

Loops and Hubs

Distal enhancers are brought to the gene they regulate on great loops of DNA or, more strictly, of chromatin: the combination of DNA and its packaging proteins, which is unraveled as if from a ball of wool. The loops are created by a protein motor called cohesin, which runs up and down the DNA strand and extrudes it as needed.

Once cohesin has formed a loop to bring elements together, what then? It was once thought that they then stick together or assemble into a molecular machine, but they don’t. Rather, the components appear to form a loose but dense blob in which they interact rather weakly, fleetingly, and indiscriminately — a sort of committee, sometimes called a condensate.

These transcription hubs are extremely fluid and differ from one cell to another. “There’ll be a bit of loop extrusion going on over here, in the next cell it might be over here, and the whole thing is turning over incredibly fast,” Bickmore said. Even if the cells are notionally identical — both skin cells, say — exactly what the gene-regulatory machinery is up to at any moment is never quite the same in any two of them.

Chromatin loops are just one reason why a gene’s transcription depends on the shape and structure of the chromatin around it.

Chromatin Shape

The textbook image of a chromosome — one of the 46 units into which our genomes are divided — is of a compact, X-shaped cluster of chromatin. But any time a cell is not actively dividing, its chromatin is unwound into what looks like a tangled mess. There is order to the chaos, however. Some parts of chromatin are densely packed into a form called heterochromatin. The compacted DNA there is relatively inaccessible to transcription factors; the genes it contains are typically silenced. Meanwhile, other parts are relatively loose, open, and accessible: This is called euchromatin.

There are special enzymes involved in packaging and repackaging chromatin, thereby controlling transcription. In other words, what matters is not just the encoded information in the DNA but also how it exists physically and dynamically in space. “We’ve stopped thinking about the genome as a linear piece of DNA code,” Bickmore said. “Thinking about this incredibly dynamic three-dimensional folding as absolutely inherent to regulation is a very exciting change.”

One aspect of this 3D organization is the clustering of segments of chromatin into compartments called topologically associating domains (TADs). Within a TAD, the genes seem to be coregulated: switched on or off in groups. Such grouping keeps suites of genes active or silent together, which helps to establish and maintain the functions of different cell types. Cohesin is also involved in shuffling chromatin to construct TADs, a dynamic process in which the chromatin is constantly rearranged in our cells.
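As a rough illustration of what grouping genes by domain means, the Python sketch below assigns genes to TADs by their position along a chromosome. The boundary coordinates, gene names, and positions are all hypothetical, and the boundaries are treated as fixed, which is a simplification: as the text notes, real TADs are constantly being rearranged.

```python
from bisect import bisect_right

# Hypothetical TAD boundaries along one chromosome, in base pairs.
# TAD i spans the half-open interval [boundaries[i], boundaries[i + 1]).
boundaries = [0, 1_000_000, 2_500_000, 4_000_000]

# Hypothetical gene start positions, all assumed to lie inside the range above.
gene_positions = {
    "geneA": 200_000,
    "geneB": 900_000,
    "geneC": 1_200_000,
    "geneD": 3_100_000,
}


def tad_index(position: int) -> int:
    """Return the index of the TAD that contains a genomic position."""
    return bisect_right(boundaries, position) - 1


def group_by_tad(positions: dict[str, int]) -> dict[int, list[str]]:
    """Group gene names by the TAD they fall in."""
    groups: dict[int, list[str]] = {}
    for gene, position in positions.items():
        groups.setdefault(tad_index(position), []).append(gene)
    return groups


tads = group_by_tad(gene_positions)
print(tads)  # {0: ['geneA', 'geneB'], 1: ['geneC'], 2: ['geneD']}
```

In this toy layout, `geneA` and `geneB` share a domain and would be candidates for being switched on or off together, while `geneC`, although close to `geneB` in linear distance, sits on the other side of a boundary.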

Chromatin shape can also be influenced by chemical modifications called epigenetic marks: small molecules attached to DNA packaging proteins called histones or stuck directly to DNA. Some of these epigenetic modifications can alter the electrical charges on histones, which changes how the proteins attract or repel one another and so rejigs the chromatin packing. Epigenetic modifications to chromatin are like annotations of the DNA script that change its meaning in a given context. When cells divide, the epigenetic annotations are copied, too.

How and when the marks get added and changed, and what each type of mark means for gene activity, are complex questions with no simple answers. Some researchers talk of a

[truncated for AI cost control]