Show HN: Landmark AI and ML research explained, redrawn, animated
Rudrite Research offers interactive, animated explainers of landmark AI and ML papers that make complex systems and ideas legible. Free and open, with over 100 explainers and guided reading tracks.
Rudrite Research — the frontier, made legible
Interactive, animated, visual explainers of landmark AI & ML papers — the systems and ideas behind the models you use, redrawn and made legible. Free and open.
Browse all 100 explainers · Guided reading tracks
Attention Is All You Need
FlashAttention
PagedAttention (vLLM)
Megatron-LM
DeepSeek-R1
GPT-3: Language Models are Few-Shot Learners
ZeRO: Zero Redundancy Optimizer
Mixtral of Experts
Training Compute-Optimal Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
BERT: Pre-training of Deep Bidirectional Transformers
DeepSeek-V3
Qwen3
OLMo 2
MiniMax-01
Gemma 4
Scaling Laws for Neural Language Models
Adam: A Method for Stochastic Optimization
Deep Residual Learning for Image Recognition
Denoising Diffusion Probabilistic Models
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
LoRA: Low-Rank Adaptation of Large Language Models
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Pathways: Asynchronous Distributed Dataflow for ML
Ring Attention with Blockwise Transformers for Near-Infinite Context
Efficiently Scaling Transformer Inference
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Fast Inference from Transformers via Speculative Decoding
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Training language models to follow instructions with human feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Constitutional AI: Harmlessness from AI Feedback
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
RoFormer: Enhanced Transformer with Rotary Position Embedding
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Learning Transferable Visual Models From Natural Language Supervision
High-Resolution Image Synthesis with Latent Diffusion Models
Scalable Diffusion Models with Transformers
Robust Speech Recognition via Large-Scale Weak Supervision
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Group Sequence Policy Optimization
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
YaRN: Efficient Context Window Extension of Large Language Models
Efficient Streaming Language Models with Attention Sinks
Generative Adversarial Networks
Segment Anything
Visual Instruction Tuning
s1: Simple test-time scaling
Tülu 3: Pushing Frontiers in Open Language Model Post-Training
Let's Verify Step by Step
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
KAN: Kolmogorov–Arnold Networks
Differential Transformer
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
RWKV: Reinventing RNNs for the Transformer Era
Titans: Learning to Memorize at Test Time
Byte Latent Transformer: Patches Scale Better Than Tokens
The Llama 3 Herd of Models
Mistral 7B
Phi-4 Technical Report
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Flow Matching for Generative Modeling
Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
Rewarding Doubt: Calibrated Confidence Expression of LLMs
Why Language Models Hallucinate
τ-bench: Tool-Agent-User Interaction in Real-World Domains
ToolRL: Reward is All Tool Learning Needs
Group-in-Group Policy Optimization for LLM Agent Training
MiniMax-M1: Scaling Test-Time Compute with Lightning Attention
ProRL: Prolonged RL Expands Reasoning Boundaries
The Entropy Mechanism of RL for Reasoning Language Models
Spurious Rewards: Rethinking Training Signals in RLVR
GenPRM: Generative Process Reward Models
From Hard Refusals to Safe-Completions
Proximal Policy Optimization Algorithms
Efficiently Modeling Long Sequences with Structured State Spaces
Auto-Encoding Variational Bayes
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Toolformer: Language Models Can Teach Themselves to Use Tools
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Muon is Scalable for LLM Training
Consistency Models