2026-07-04 22:11 UTCIn-site rewrite6 min readUpdated: 2026-07-04 22:42 UTC

Can you train AI to invert AES?

Discusses the possibility of training AI to invert AES by treating it as a satisfiability problem, exploring both opaque and lowered circuit encodings, and considering complexity theory constraints like average-case hardness and phase transitions.

SourceHacker News AIAuthor: juansebastianl

04 — July 2026

Can you train AI to invert AES?

At the core of every https browser session, every wifi connection with WPA2/3, every hardware level device encryption module, sits the AES algorithm. This makes inverting the algorithm - the process of finding the private key or the clear text - one of the most simultaneously dangerous and valuable algorithmic problems today.

As AI eats software, I think it's possible that AI could solve the problem of AES Inversion. Of course, I dont know for sure, but this post lays out the argument for how it might happen. If you want to try and train a model to do what Im talking about, check out the RL env in this github repo.

Why might AES be vulnerable?

In order for a model to be able to solve AES you have to fight two problems - the sparsity of the reward, and complexity theory. Obviously one of these is more fundamental than the other, while a clever researcher can find intermediate rewards for almost any problem, superintelligence doesn't change the mathematical constraints imposed on us by complexity theory. I will address the sparse reward problem in the next section, but for now let us turn our attention to complexity theory.

I wont explain the basics of AES in the post, but if you want a good introduction to it, I recommend this one on the Braincoke blog. The important thing to understand about AES is that it's a substitution-permutation cipher, which means that it works by taking a block of bytes and permuting them according to a set of functions. The goal is to create a something that behaves like a one-way function, which are a hypothesized set of functions that are easy to compute but hard to invert - specifically any probabilistic polynomial time algorithm fails to invert them with probability $p$. They are ideally injections so that if you have a quick algorithm for inversion, you can recover the original text. This means that AES is built from many rounds of relatively simple permutations parameterized by the key(s).

Inverting AES involves inverting a large number of composed permutations and therefore is naturally a combinatorial problem. Now no matter how superintelligence pans out in the long run, no superintelligence can bypass mathematics or complexity theory. For us to believe that AES is efficiently invertible in the first place, I have to make an argument as to why I believe such an efficient inversion likely exists.

I will start by stating that AES can be stated as a satisfiability problem.

Choosing a particular mode of AES

Because AES can actually be run in different modes, we chose a concrete target of AES-XTS, which the mode used for full-disk encryption. It uses two independent AES-256 keys: $K_1$ encrypts the data and $K_2$ derives a per-block tweak. For the block at sector number $s$ and block index $j$. Here $\oplus$ is simply XOR. $\otimes$ is elementwise multiplication in a particular finite field. We first set up our per-block tweak:

$$T_0 = \mathrm{AES}_{K_2}\!\big(\mathrm{LE}_{128}(s)\big), \qquad T_j = T_0 \otimes \alpha^{\,j} \ \text{ in } \mathrm{GF}(2^{128}),$$

And using the per-block tweak we can encrypt our text:

$$C = \mathrm{AES}_{K_1}\!\big(P \oplus T_j\big) \oplus T_j,$$

where $\mathrm{LE}_{128}(s)$ is the sector number as a little-endian 128-bit block and $\alpha$ is a multiplicative generator in $\mathrm{GF}(2^{128})$. The data path is ordinary AES-256: an initial AddRoundKey, then $R-1$ rounds of SubBytes, ShiftRows, MixColumns and AddRoundKey, then a final round without MixColumns.

Inversion is the reverse. We observe the ciphertext $C$ and the public indices $s$ and $j$, and we want the plaintext $P$ (and, implicitly, the keys). To attack this with a solver we first have to write the cipher down as a concrete object the solver can pick apart, so before I state the constraints it's worth laying out the model itself.

AES as a circuit

We compile a window of XTS blocks into one flat circuit made of three kinds of record:

Values are the byte-sized quantities in play. Some are inputs we search over (the plaintext and the key bytes), some are constants (the observed ciphertext), and some are internal wires - the intermediate bytes that appear part-way through an AES computation.

Ops are the primitive operations that compute one value from others: an S-box lookup, a MixColumns byte, an XOR, the "multiply by $x$" step of the tweak chain, a round-key byte from the key schedule, and so on. The ops are just AES itself, chopped into individual steps.

Constraints are the predicates that ought to hold at a solution:"this wire equals the op that is supposed to produce it", "this predicted ciphertext byte equals the target". Each constraint is what later turns into a residual if you're doing Lagrangian optimization.

If you were writing a SAT solver to solve this problem, the state the solver actually mutates is just the buffers of plaintext, $K_1$, $K_2$, and (in one of the two encodings below) the wires.

The one real modelling decision is how finely we expose the cipher, and there are two choices that sit at opposite ends.

In the opaque encoding the whole block is a single monolithic op: hand it the plaintext and the keys, it runs the reference cipher end to end, and it returns the predicted ciphertext. There are no wires at all. The only constraints are that the predicted ciphertext equals the target (all 128 bits at once). This is compact and it is a faithful scorer, but as a search landscape it is a black box. The only things you can move are plaintext and key bytes, and because of AES's avalanche a single key-bit flip scrambles all sixteen output bytes unpredictably

there is no meaningful sense of being "partway" through the cipher, so local

search has nothing to hold onto.

In the lowered encoding we go towards intermediary rewards. Every AES micro-step becomes its own op writing its own internal wire, and every wire carries a consistency constraint asserting that the wire really does equal the op that defines it, and every ciphertext byte gets its own equality constraint against the target. These intermediate wires are the extra unknowns $W$.

The whole point of lowering is local structure. Instead of a black box whose only handles are the ~64 key and plaintext bytes, the search now has thousands of intermediate wires it can nudge one at a time, re-scoring only the handful of constraints each nudge touches. The price is that all those wires now have to be kept mutually consistent with one another - which is exactly what the consistency constraints measure, and what the solver has to drive to zero alongside everything else.

So we cast inversion as a constraint-satisfaction problem over the unknowns

$$x = \big(\,\underbrace{P}_{\text{plaintext}},\ \underbrace{K_1}_{\text{data key}},\ \underbrace{K_2}_{\text{tweak key}},\ \underbrace{W}_{\text{internal AES wires}}\,\big),$$

whose constraints fall into three classes:

Consistency (lowered encoding only) - every internal wire equals the op that produces it, i.e. the intermediate bytes really do form one valid AES computation.

Goal - the predicted ciphertext equals the observed $C$.

ASCII - Just to make the problem more tractable, we can constrain every plaintext byte to be text ASCII, which is not unreasonable if you have a good sense of what you're decrypting in the AES-XTS setting.

Each constraint $i$ becomes a non-negative residual $r_i(x) \ge 0$ that is zero exactly when the constraint is satisfied. Equality constraints are measured as a Hamming distance in bits rather than a flat right/wrong, so getting "closer" to the target (fewer wrong bits) lowers the residual and gives the solver an energy landscape to follow. The assignment is feasible when

$$H(x) = \sum_i r_i(x) = 0.$$

satisfiability is hard

Immediately if you're at all familiar with complexity theory, this should raise some red flags. The fact that inversion of AES is a satisfiability problem, and satisfiability problems are in general NP-hard, should make you wary of the idea of being able to find an efficient version. Ill make two observations:

The relevant complexity theoretic fact about about cyphers is not their worst-case complexity - It is their average-case complexity.

Any individual instance of a class of NP problems being worst-case hard is actually relatively rare

The security of AES is not based on the worst-case complexity class that it lives in. Instead it's based the fact that for any given key-plaintext pair we can be relatively certain that the inversion problem is quite hard. This is due to the fact that the average-case complexity of one-way functions is very high. However, Bogdanov and Trevisan (2005) and Akavia et. al. (2005) show that the average case complexity here is not drawn from the worst case complexity of the class of NP (assuming polynomial hierarchy). In general, it depends quite a bit on which complexity class you're working in, whether the worst-case complexity reduces down to the average-case complexity.

The second fact is more subtle but it's easiest to understand in the case of SAT problems. If you take a satisfiability problem with a given number of clauses $c$ for variables $n$, there is a ratio $q = c/n$ of variables to clauses. What's interesting is that as this ratio changes, two really important things happen: whenever the number of variables is high and the number of clauses is low the problem is relatively easy to satisfy because there are many degrees of freedom. Whenever the ratio is low, the problem is very hard to satisfy, often times impossible.

There's this intermediary point, which is typically described as a phase transition from satisfiable to unsatisfiable, where the problem becomes overdetermined in a way that makes it impossible to satisfy.

ratio q = c/n 2.00

P(satisfiable) 100%

median DPLL decisions 37

regime underconstrained — easy, satisfiable

±J Ising spin glass with bond frustration set by the hardness at q = 2.00. Near the transition the landscape is glassy and the spins never settle; far from it they order almost immediately.

The easy–hard–easy phase transition in random 3-SAT, measured with a DPLL solver on 100 random instances per ratio at n = 75 variables. Red curve: median branching decisions (normalized). Dashed curve: fraction satisfiable. Drag the slider (or the plot) to move the clause-to-variable ratio through the satisfiability crossover. How the spin-glass animation works // A 28x28 grid of "spins", each either +1 or -1. Every spin is // coupled to its four neighbors by a fixed random bond J that is // either +1 ("these two want to agree") or -1 ("these two want to // disagree"). The energy of the whole grid is // // E = -sum over neighbor pairs of J * spin_a * spin_b // // so a bond contributes low energy when it gets what it wants. const GRID = 28; const TEMPERATURE = 0.9;

const spin = new Int8Array(GRID * GRID); const couplingToRight = new Int8Array(GRID * GRID); const couplingToBelow = new Int8Array(GRID * GRID);

// The lattice wraps around at the edges (a torus), so every site // has exactly four neighbors. const wrap = (v) => (v + GRID) % GRID; const siteAt = (row, col) => wrap(row) * GRID + wrap(col);

function randomizeSpins() { for (let s = 0; s < spin.length; s++) { spin[s] = Math.random() < 0.5 ? -1 : 1; } }

// "Frustration" is the knob the slider controls. At 0 every bond is // +1 and the grid is a plain ferromagnet: flipping toward your // neighbors always helps, so it orders into big domains almost // immediately. As frustration grows, more bonds are -1 and loops // appear in which no assignment can satisfy every bond -- the // defining feature of a spin glass. The energy landscape shatters // into many competing local minima, which is the same picture as a // SAT instance near the phase transition. function rebuildCouplings(frustration) { const pAntiferro = 0.5 * frustration; for (let s = 0; s < spin.length; s++) { cou

[truncated for AI cost control]