Railway: The Agent-Native Cloud — Jake Cooper

3M Users, 100K Signups/Week, Own-Metal Data Centers, $200K+ Coding Agent Spend, and the Death of PRs

Key points

  • Railway has grown from 100 hand-acquired users to 3M, adding 100K signups per week.
  • Owns bare metal data centers with a 3-month payback period and 70% margins.
  • Believes the pull request is dying and agents need a new deployment loop.
  • Spends more than $200K/month on coding agents to accelerate development.

This episode was recorded before Railway suffered a major GCP-related outage on May 19. Railway runs a multi-AZ, multi-zone mesh ring with HA fiber interconnects between its own metal, GCP, and AWS, but workload discoverability was unintentionally still tied to GCP. The issue has since been resolved and covered in a post-mortem.

Railway did not start as an AI infrastructure company.

It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Dockerfiles, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.

For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor.

Today, Railway has raised $124M and is growing fast. A 35-person team supports 3 million users and adds roughly 100,000 signups a week. Its bare metal data centers have a 3-month payback period versus renting in the cloud, with 70% margins that fund aggressive cloud bursting when needed. The servers Railway owns have actually appreciated in value as RAM prices have climbed, so the value of its hardware now exceeds the capital it has raised.
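For readers unfamiliar with the term, a payback period is the up-front hardware cost divided by the monthly savings from no longer renting equivalent capacity. The sketch below is a minimal illustration of that arithmetic. The function name, parameters, and dollar figures are hypothetical and chosen only so the result comes out to three months; they are not Railway's actual numbers.

```python
def payback_months(capex: float, monthly_cloud_cost: float, monthly_owned_opex: float) -> float:
    """Months until owned hardware has paid for itself versus renting.

    capex: one-time purchase cost of the hardware (hypothetical input).
    monthly_cloud_cost: what equivalent rented capacity would cost per month.
    monthly_owned_opex: recurring cost of running the owned hardware
        (power, colocation, staff) per month.
    """
    monthly_savings = monthly_cloud_cost - monthly_owned_opex
    if monthly_savings <= 0:
        raise ValueError("owned hardware never pays back if it saves nothing per month")
    return capex / monthly_savings


# Illustrative numbers only: a $30,000 server replacing $11,000/month of
# rented capacity, with $1,000/month in running costs.
print(payback_months(30_000, 11_000, 1_000))  # 3.0
```

The shorter the payback period, the sooner each purchased server starts contributing margin, which is the dynamic the paragraph above describes.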

From rebuilding Railway’s network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world. In this episode, Railway’s founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.

We go deep on Railway’s infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.

We discuss:

  • How Railway went from a slow six-year grind to adding 100,000 users a week
  • How Railway thinks about agents as the next dominant software species
  • Why agents need version control, observability, compute, storage, and orchestration at 1000x scale
  • The economics of Railway’s own-metal data centers and three-month payback
  • How Railway uses cloud bursting while scaling its own infrastructure
  • Why data center debt can be a better tool than venture debt for infra startups
  • Central Station, Railway’s internal system for clustering customer feedback and incidents
  • Why responsible disclosure and over-communication matter for platforms
  • Why feature flags, progressive rollouts, and shadow traffic are essential for agents
  • Temporal’s strengths, pain points, and why workflows matter for agents
  • Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems
  • Why “cattle, not pets” may change if you can clone the pets
  • Why Railway is building a new cloud from scratch instead of copying hyperscalers
  • The solo founder path, focus, writing, and how Jake thinks about company building

Railway:

Website: https://railway.com/

X: https://x.com/Railway

Jake Cooper:

LinkedIn: https://www.linkedin.com/in/thejakecooper/

X: https://x.com/JustJake

Timestamps

00:00:00 Introduction: What Is Railway?
00:02:07 Jake’s Path to Railway
00:06:13 Railway’s Six-Year Growth Story
00:08:52 Rebuilding the Business After the Free Tier
00:11:17 Agents as the Next Software Platform
00:13:29 Railway’s Infrastructure Philosophy
00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch
00:17:22 Cloud Bursting and Five-Cloud Networking
00:20:20 Data Center Debt and Infra Financing
00:23:31 Data Centers in Space
00:25:24 What Agents Need From Infrastructure
00:28:24 CLIs, Canvas, and Agent-Native UX
00:35:15 Central Station, Incidents, and Responsible Disclosure
00:40:30 Safe Rollouts, SRE Agents, and Production Forks
00:45:00 AI SRE, Specs, Code, and Tests
00:48:24 Self-Replicating Infrastructure and the New Serverless
00:53:18 Heroku, Temporal, and Workflow Engines
01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems
01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration
01:10:56 The Pull Request Is Dying
01:12:28 Feature Flags and the Agent-Era SDLC
01:16:15 Cattle, Pets, and Cloning Machines
01:19:29 Solo Founder Lessons
01:24:12 Focus, GPUs, and Building a New Cloud
01:28:20 Closing Thoughts

Transcript

Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I’m joined by Swyx, editor of Latent Space.

Swyx [00:00:10]: Hey, hey, hey. Today we’re in the studio with Jake Cooper of Railway.

Alessio [00:00:14]: Conductor of Railway.

Swyx [00:00:15]: Conductor at Railway. Yeah.

Alessio [00:00:16]: Choo-choo.

Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?

Jake [00:00:20]: We call some of our volunteer moderators conductors. I don’t have a business card. We’re not that big yet. At some point I will. I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”

Swyx [00:00:30]: Business cards are coming back.

Jake [00:00:32]: They’re cool. They’re hip. The conductor thing is good. We’re trying to figure out what we want to call each other internally. Some people think it’s super cringe and say, “You don’t need a name for people internally.” Some people want to call each other something. We still don’t have a really good one.

Jake [00:00:55]: We’ve got New Railcrews, Trainiacs. Nothing has stuck yet.

Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don’t know, what is Railway? Let’s give people a crisp definition up front.

Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you’re off to the races.

Swyx [00:01:22]: You’ve got a nice animation on the landing page.

Jake [00:01:24]: Thank you. None of my work, by the way. They don’t let me touch the design stuff anymore.

Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.

The Railway Origin Story: From Uber Systems to a New Cloud

Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?

Jake [00:02:21]: It was curiosity to keep going deeper. I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.

Swyx [00:02:44]: Which, by the way, I’m happy to talk about, pros and cons.

Jake [00:02:48]: Totally.

Swyx [00:02:51]: But let’s do the Railway story.

Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it’s walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don’t care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.

Jake [00:03:17]: I don’t have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That’s what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.

Swyx [00:03:49]: Adding patches to the Linux kernel this week?

Jake [00:03:51]: Yeah. Not upstream. Our fork.

Swyx [00:03:52]: That’s a flex. Railpack? No, this is different. This is the OS on top of Railpack?

Jake [00:03:57]: No, this is an actual kernel patch. It’s always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.

Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?

Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we’re building for some of the agentic stuff. Maybe it’ll be useful upstream, but it’s deeply useful for us internally.

Open Source, Forks, and Non-Deterministic Versioning

Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?

Jake [00:04:38]: GitHub’s original sin is that it’s almost a series of broken pointers. You have this thing, then you clone it, and now you’ve lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?

Jake [00:04:51]: We think of Git in a discrete sense: I’ve either made a change and merged upstream, or I haven’t. What would it look like if it were percentage-based, a little more non-deterministic, or a stream of changes that users traverse as a percentage rolled out in general and then rolled all the way up?

Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don’t take any new packages.” The ideal end state is that you roll out progressively to users with the minimum impact zone and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.

Jake [00:05:53]: It’s okay if Johnny Vibe Coder gets a broken patch because there’s so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.
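Editor's note: the percentage-based rollout Jake describes can be sketched with deterministic hash bucketing, a common technique for progressive delivery. This is a minimal illustration under stated assumptions, not Railway's implementation. The names `rollout_bucket`, `sees_change`, and `hold_until` are hypothetical; `hold_until` models the idea that a high-risk tenant stays at the back of the patch line until the rollout has reached a chosen percentage.

```python
import hashlib


def rollout_bucket(user_id: str, change_id: str) -> float:
    """Map a (change, user) pair to a stable bucket in [0, 100).

    Hashing the change ID together with the user ID gives each change its
    own ordering of users, so the same users are not always first in line.
    """
    digest = hashlib.sha256(f"{change_id}:{user_id}".encode()).digest()
    # Reduce the first 4 bytes to basis points (0..9999), then scale to percent.
    return (int.from_bytes(digest[:4], "big") % 10_000) / 100


def sees_change(user_id: str, change_id: str, rollout_percent: float,
                hold_until: float = 0.0) -> bool:
    """Return True if this user should receive the change right now.

    rollout_percent: how far the change has been rolled out (0 to 100).
    hold_until: hypothetical per-tenant floor; the tenant is skipped until
        the global rollout has reached at least this percentage.
    """
    if rollout_percent < hold_until:
        return False
    return rollout_bucket(user_id, change_id) < rollout_percent


# A low-risk user may pick up the patch early; a high-risk tenant waits
# until the rollout is complete.
print(sees_change("hobby-user", "patch-1", rollout_percent=25))
print(sees_change("big-bank", "patch-1", rollout_percent=99, hold_until=100))
```

Because each bucket is fixed per user and change, raising `rollout_percent` only ever adds users, which gives the "continue rolling up" behavior from the transcript without anyone flipping back and forth between versions.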

The Long Grind: First Users, Free Tier, and Making the Business Work

Swyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?

Jake [00:06:22]: Daily signups, I think.

Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you’re on a rocket ship. You say, “Don’t doubt your fight and don’t quit.” Maybe pick out certain points that were key inflections for the company.

Jake [00:06:40]: At the start, it’s about getting your first 100 users, hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how’s it going?” It was rare, so getting those first 100 users to come back was the start.

Jake [00:07:14]: Then you build a consult
