2026-06-06 05:00 UTC. Updated: 2026-06-30 13:03 UTC

Jürgen Schmidhuber: World Models, RL and the Year That Changed AI

In this interview, AI pioneer Jürgen Schmidhuber reflects on the 1991 breakthroughs in his Munich lab, discussing world models, reinforcement learning, artificial curiosity, and the history of deep learning. He contrasts LLMs with RL for decision-making and shares insights on chess AI and the future of artificial intelligence.

Source: Hacker News AI. Author: __patchbit__

Ravid Shwartz-Ziv: Hi everyone, Ravid here. Quick note before we start. We had so much fun talking with Jürgen that we decided to do another one. So this is the first of two episodes with Jürgen Schmidhuber. The second one is coming soon. Enjoy! Hi everyone, and welcome back to The Information Bottleneck. Hey, Jürgen!

Juergen: Hello, Ravid, and hello, Allen. How are you doing?

Allen Roush: We're doing great, I can tell you that. How are you?

Juergen: That's fine. Thank you so much.

Ravid Shwartz-Ziv: Maybe tell us how you define yourself, or how you want to introduce yourself.

Juergen: My name is Jürgen and I'm interested in artificial intelligence.

Allen Roush: I think that's a bit of an understatement of the century.

Ravid Shwartz-Ziv: Okay. We can start with, I'm not sure if this is the beginning, but a long time ago, right? 1991. The field of machine learning and deep learning is so small, right? There is almost no money in it. And you have a small lab in Munich, Germany. And then what? Tell us why it was such an important moment in time.

Juergen: So 1991 is the only palindromic year of the 20th century. And it was important for AI because in this little lab that you mentioned in Munich, we were able to come up with all kinds of algorithms which are now essential to what the big companies, the most valuable companies in the world are doing. A thousand billion dollars are being invested in scaling up things that started back then. We didn't invent deep learning, no, that happened in Ukraine in 1965, but then in 1991 we were able to make major steps forward and we had a whole bunch of interesting things going on there.

Ravid Shwartz-Ziv: So how can it be that so many things were discovered, that you discovered or invented so many things, in exactly one place, at one point in time? What is the process that actually makes that happen?

Juergen: Yeah, so I'm personally not very smart, and I'm just smart enough, or back then I was smart enough, to realize that I'm not very smart, but that it might be possible to build something that is much smarter than myself, such that this thing learns to do all the things that I cannot do myself, and then I can retire. And back then there was no competition, or almost no competition, because nobody was really interested in artificial intelligence back then, except for, you know, very few people. And the reason is that back then compute was about 10 million times more expensive than today, because every five years computers are getting 10 times cheaper. So in 30 years we have a factor of a million, and now it's 35 years, so we have a factor of 10 million, roughly. And back then these algorithms that are, you know, behind the P and the T in ChatGPT and stuff like that, they could be applied only to tiny little networks with a few hundred parameters. And today we have billions and trillions of parameters, and we can do so much more for the same price. So back then you could do the same thing in principle, predict the next token and stuff like that, but you couldn't scale it up. That was only possible in the 2010s, something like that, and then more recently in the 2020s. And that's the reason why back then we had little competition. So it wasn't as easy as today to get scooped by somebody else doing a similar kind of research.
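The arithmetic behind the "factor of a million" and "factor of 10 million" figures can be checked with a few lines of Python. This is only a sketch of the rule of thumb stated above (compute getting 10 times cheaper every five years); the function name and its default parameters are illustrative, not something from the interview.

```python
def cost_reduction_factor(years: float,
                          factor_per_period: float = 10.0,
                          period_years: float = 5.0) -> float:
    """Cumulative price drop if compute gets `factor_per_period` times
    cheaper every `period_years` years (the rule of thumb from the interview)."""
    return factor_per_period ** (years / period_years)


# 30 years -> six 5-year periods -> 10**6 (a million)
print(f"{cost_reduction_factor(30):.0e}")
# 35 years -> seven 5-year periods -> 10**7 (ten million)
print(f"{cost_reduction_factor(35):.0e}")
```

Under the same assumption, 1991 to roughly 2026 spans seven such periods, which is where the "about 10 million times more expensive" figure comes from.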

Allen Roush: So I have a question about your escapades and work in another part of AI. I see that you're listed in the Chess Programming Wiki. And I'm fascinated by how the field looked at chess programming and chess development as a proxy for AI capabilities up until at least Deep Blue in 2001. So I guess what I'm curious about is what your relationship was to that era of AI development. And do you think that there were echoes of the current race in large language models back in the 90s and 2000s? For example, if you remember what Rybka was, I still remember the Rybka benchmark cheating scandal from the mid 2000s. And I'm curious if you see parallels there.

Juergen: Yeah, so chess of course is really one of the first domains that AI was applied to, before I was born. And as far as I know, the first guy who wrote down a chess program, and that was in 1948 or something, or maybe 1946, I'm not sure, was Konrad Zuse, the same guy who built the first general purpose computer in 1941 in Berlin. And he also had the first high level programming language, which was called Plankalkül. Crazy name. And then he wrote programs in Plankalkül, and one of them was a chess program, basically. So that was long before I was born. I was born in 1963, two years before deep learning was invented in the Ukraine. And then in the 80s I think I had a Sinclair computer and I tried to implement a little chess program myself. But my work on chess is completely irrelevant in the grand scheme of things. However, chess profited from the same trend that we have been profiting from, which is every five years computers getting ten times cheaper. So back then, when Zuse did his general purpose computer in 1941, he could do roughly one operation per second. One. And then 30 years later, you could do a million operations for the same price. And today we can do a million, I think almost a billion billion, not quite, instructions for the same price. So the chess programs were dominated by exhaustive search. So you just do this recursive look-ahead search, and then you assume the opponent is able to look ahead one step fewer than your look-ahead. And then you have this recursive search going on, and more or less this thing then in 1997 led to a chess program that was able to beat the chess champion, Kasparov, not using any neural networks. Maybe at the same time, more interesting, or actually three years earlier than that, there was Tesauro's neural network, which learned through reinforcement learning to play backgammon. So that was a learning program, and it reached human level competitiveness, I think around 1994.
So that was more a foreshadowing of what was going to come in the field of board games. And it had more to do with modern AI than what we saw in the 1997 defeat of
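The recursive look-ahead search described in this answer is commonly known as minimax. Below is a minimal sketch on a toy game rather than chess, as an assumption made to keep it short: players alternately take 1 to 3 stones and whoever takes the last stone wins. The game, the function names and the scoring convention are all illustrative; real chess engines of that era added evaluation functions, pruning and much more.

```python
from typing import List


def legal_moves(stones: int) -> List[int]:
    """Toy game (an assumption, not chess): take 1, 2 or 3 stones."""
    return [take for take in (1, 2, 3) if take <= stones]


def minimax(stones: int, depth: int, maximizing: bool) -> int:
    """Score from the maximizing player's view: +1 win, -1 loss, 0 unknown."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # look-ahead budget exhausted, no evaluation function here
    scores = [minimax(stones - take, depth - 1, not maximizing)
              for take in legal_moves(stones)]
    return max(scores) if maximizing else min(scores)


def best_move(stones: int, depth: int) -> int:
    """Pick the move whose subtree scores best; the opponent's reply is
    searched with one step less look-ahead (depth - 1)."""
    return max(legal_moves(stones),
               key=lambda take: minimax(stones - take, depth - 1, False))


print(best_move(5, depth=10))  # leaves 4 stones, a losing position for the opponent
```

The `depth - 1` in the recursive call is the "one step fewer" look-ahead mentioned above: each level of the search hands the opponent a slightly smaller budget.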

Ravid Shwartz-Ziv: Let's start to talk about a different direction, a different subfield in AI, and I would love to hear your opinion. You worked in the 90s on curiosity and intrinsic motivation, right? It looks like there is a lot in common between that and current RL exploration research. But what happened?

Juergen: and

Ravid Shwartz-Ziv: Why do you think RL has only become better now? Why do you think so much got lost in translation and needed to be rediscovered to get us our current algorithms?

Juergen: Yeah, reinforcement learning is more complicated than supervised learning. What you see today in the most popular AI models, large language models, how do they work? They basically apply supervised learning to imitate all the data on the web. So predict parts of the data from other parts of the data. Predict the next word in a sentence from the previous words. Predict the next pixel in an image from all the previous pixels that you have seen, stuff like that. And that can be done by just one single network that is using gradient descent to, you know, predict parts of the data from other parts of the data, and you train it on all the data you can find on the World Wide Web. And that means that you will insert an enormous, tremendous human bias, a human-oriented bias, in your network, because it is going to be trained on all the data that some humans at some point found interesting, which means it will be totally biased towards humans. Maybe not to all human groups equally, but at least it will be super biased towards what humans find interesting. And that's why these large language models today are so useful in so many applications. However, the large language models are not about decision making. For decision making, which is associated with reinforcement learning, which is about learning to make decisions, you need to do more. So you need to first have a prediction model, something like an LLM or foundation model, as it is called now, which just learns to predict the future given the actions of the actor. And the actor has to learn to use this model of the consequences of its actions to plan, to come up with mental simulations of the consequences of possible future action sequences, and then pick an action sequence that is rewarding, that leads to a lot of reward. What does that mean? Well, there are these special inputs that have a value, the reinforcement signals.
Our body is full of little pain sensors, and we also put pain sensors into our robots such that the robots, who in the beginning are totally stupid, learn to understand what's bad for them. And whenever they produce actions that make the robot hand bump against an obstacle, against a table or something, then they learn to predict over time that this is going to happen. And so they have a little model of the world, a world model. I called it a world model in 1990, which then can be used to do planning in the sense that you do several mental experiments and then you choose action sequences that lead to high predicted reward given the world model. Now you need at least two things. There is the prediction machine, and that is kind of standard in the sense that it's really working well. So we have really good prediction machines. They are so good that they pass the Turing test. The Turing test is just about typing text to another guy behind the screen, and then the goal is to figure out: is the other guy a human or is it a computer? And if you can't do that any longer, then the Turing test is passed, which means that the Turing test is actually a bad way of measuring intelligence, because it's just about a tiny little aspect of intelligence, you know? Are you able to predict whether the other guy on the other side is a machine or not? And the harder part is the decision-making part, where you need the other network, which uses the prediction machine, to come up with good decisions. But then the other guy, the controller, I usually call them the controller and the model, the controller has to use the model to plan ahead and to predict its future and to select action sequences that are promising. However, in the beginning the model is stupid, you know, and so the controller has to live with that, and it has to figure out ways of making the model of the world better.
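The controller-and-model split can be illustrated with a small planning loop: the controller runs "mental experiments" through a world model and keeps the action sequence with the highest predicted reward. The sketch below assumes a hand-written toy model of a one-dimensional world with an obstacle ("pain") and a goal; in the setup described here the model would be a learned neural network, so every name and number below is hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = int
Action = int  # -1 = step left, +1 = step right
Model = Callable[[State, Action], Tuple[State, float]]


def toy_world_model(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical hand-written model: predicts (next_state, reward)."""
    next_state = state + action
    if next_state == -2:       # predicted bump against an obstacle ("pain")
        return state, -1.0
    if next_state == 3:        # predicted arrival at the goal
        return next_state, 1.0
    return next_state, 0.0


def plan(model: Model, state: State, horizon: int,
         actions: Sequence[Action] = (-1, 1)) -> Tuple[Action, ...]:
    """Simulate every action sequence in the model (never in the real world)
    and return the one with the highest total predicted reward."""
    best_seq: Tuple[Action, ...] = ()
    best_return = float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s, r = model(s, a)   # one step of mental simulation
            total += r
        if total > best_return:
            best_seq, best_return = seq, total
    return best_seq


print(plan(toy_world_model, state=0, horizon=3))  # the only 3-step path to the goal
```

Exhaustive enumeration is used only because the toy problem is tiny. The sketch also makes the stated weakness visible: the plan is only as good as the model's predictions, which is why improving the model matters.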
So basically it has to invent action sequences, experiments, that lead to data that improve the world model. So it has to have this thing which I called artificial curiosity. In 1990 I had a tech report which was called "Making the World Differentiable," blah, blah, using recurrent neural networks as world models and as controllers, which used the world models to plan. But then also, the controller had an incentive to come up with experiments, with action sequences, that lead to a better world model. And the very first naive approach of 1990 was really: take the error of the prediction machine, of the world model, and use that as an incentive for the controller. So in other words, the controller is maximizing the same thing that the model of the world is minimizing. So the model sees the output actions of the controller and the other data which is coming from the environment. The controller also sees the stuff which is coming from the environment. And it is a generative model because it has little Gaussian units in there which, you know, compute the mean and the variance of probability distributions over actions. And the
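The "naive approach" of using the world model's prediction error as the controller's reward can be sketched in a few lines. This is a deliberately simplified stand-in, not the 1990 system: it swaps the recurrent neural networks for a lookup table, uses a deterministic two-action environment, and lets a greedy controller replace the stochastic one with Gaussian units. All class and function names are hypothetical.

```python
from typing import List, Tuple


class TabularWorldModel:
    """Stand-in for a learned world model: one predicted observation per action."""

    def __init__(self, n_actions: int, lr: float = 0.5) -> None:
        self.prediction = [0.0] * n_actions
        self.lr = lr

    def update(self, action: int, observation: float) -> float:
        """Move the prediction toward the observation; return the squared
        error measured before the update (what the model is minimizing)."""
        diff = observation - self.prediction[action]
        self.prediction[action] += self.lr * diff
        return diff ** 2


def environment(action: int) -> float:
    """Toy deterministic world: action 0 yields 0.0, action 1 yields 1.0."""
    return [0.0, 1.0][action]


def run(steps: int = 6) -> List[Tuple[int, float]]:
    model = TabularWorldModel(n_actions=2)
    curiosity_value = [1.0, 1.0]  # optimistic start so each action gets tried
    history = []
    for _ in range(steps):
        # Controller: greedily pick the action with the highest recent surprise.
        action = max(range(2), key=lambda a: curiosity_value[a])
        # Intrinsic reward = the model's prediction error (what the controller maximizes).
        reward = model.update(action, environment(action))
        curiosity_value[action] = reward
        history.append((action, reward))
    return history


for action, reward in run():
    print(action, reward)
```

In this toy run the controller drops action 0 once its outcome is predicted perfectly and keeps repeating action 1 while its prediction error is still non-zero; the reward shrinks as the model learns, so the controller is paid only for data that still surprises the model.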
