Adaptive Parallel Reasoning (APR) empowers LLMs to dynamically decide when to parallelize reasoning, how many threads to spawn, and how to coordinate them. This article analyzes the motivation, methods, training strategies, and open questions in the field.
APR allows models to adaptively allocate compute between sequential and parallel reasoning, overcoming limitations of fixed parallelism methods.
Compared to Tree-of-Thoughts and Best-of-N, APR avoids redundant computation and requires no domain-specific heuristics.
GRASP is a new gradient-based planner for learned dynamics (a world model) that makes long-horizon planning practical by lifting the trajectory into virtual states for parallel optimization, adding stochasticity to state iterates for exploration, and reshaping gradients to avoid brittle state-input gradients through high-dimensional vision models.
GRASP uses virtual state lifting to parallelize optimization across time, enabling faster long-horizon planning.
It injects Gaussian noise into state updates to explore the optimization landscape and avoid local minima.
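The lifting and noise-injection ideas above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dynamics f(s, a) = s + a stand in for a learned world model, and the function names, step size, and linearly decaying noise schedule are hypothetical. GRASP's gradient reshaping is not reproduced; the sketch uses the plain gradient of a quadratic loss.

```python
import numpy as np

def plan_lifted(s0, goal, horizon=20, steps=3000, lr=0.1, noise=0.01, seed=0):
    """Toy lifted planner for the assumed dynamics f(s, a) = s + a."""
    rng = np.random.default_rng(seed)
    dim = s0.shape[0]
    states = np.zeros((horizon, dim))   # virtual states s_1..s_T, treated as free variables
    actions = np.zeros((horizon, dim))  # actions a_0..a_{T-1}
    for k in range(steps):
        # All time steps are evaluated at once; no sequential rollout is needed.
        prev = np.vstack([s0[None, :], states[:-1]])  # s_0..s_{T-1}
        resid = states - (prev + actions)             # dynamics violation at each step
        grad_a = -resid
        grad_s = resid.copy()
        grad_s[:-1] -= resid[1:]         # each state is also the input of the next step
        grad_s[-1] += states[-1] - goal  # terminal goal cost
        # Gaussian noise on the state iterates only, decayed to zero over time.
        sigma = noise * (1.0 - k / steps)
        states = states - lr * grad_s + sigma * rng.standard_normal(states.shape)
        actions = actions - lr * grad_a
    return states, actions

def rollout(s0, actions):
    """Apply the planned actions sequentially under the toy dynamics."""
    s = s0.copy()
    for a in actions:
        s = s + a
    return s
```

Calling plan_lifted(np.array([0.0, 0.0]), np.array([1.0, -2.0])) and then rollout on the returned actions should end close to the goal, since the dynamics residuals are driven toward zero while the last virtual state is pulled to the goal.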
This article presents SPEX and ProxySPEX, algorithms that efficiently identify critical interactions in large language models from three perspectives: feature attribution, data attribution, and mechanistic interpretability. Leveraging structural properties like sparsity, low-degreeness, and hierarchy, these methods discover influential interactions between features, training data, and internal components with fewer ablations, demonstrating strong performance across long contexts, datasets, and model components.
SPEX reframes interaction discovery as a sparse recovery problem using sparsity and low-degreeness, drastically reducing computational cost.
ProxySPEX exploits hierarchy to achieve similar performance with about 10x fewer ablations.
Researchers have developed a framework to evaluate and optimize imaging systems based on mutual information, predicting performance across four domains and enabling efficient design without task-specific decoders.
Mutual information quantifies the useful information in measurements, unifying traditional metrics like resolution and SNR.
The method estimates information directly from noisy measurements using known noise models and learned distributions.
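One way to write down the estimate described above is the standard entropy decomposition of mutual information. The notation is assumed for illustration: X is the object, Y the noisy measurement, D the number of pixels, \sigma^2 the per-pixel noise variance, and p_\theta a model fitted to measurements. The Gaussian case is shown only as an example of a known noise model.

```latex
\begin{align}
I(X;Y) &= H(Y) - H(Y \mid X), \\
H(Y \mid X) &= \tfrac{D}{2}\,\log\!\left(2\pi e\,\sigma^{2}\right)
  \quad \text{(i.i.d. additive Gaussian noise)}, \\
H(Y) &\le -\,\mathbb{E}_{y}\!\left[\log p_{\theta}(y)\right].
\end{align}
```

The conditional term follows from the noise model alone, while the cross-entropy of a fitted model upper-bounds the measurement entropy, so a better-fitting model gives a tighter estimate.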
This post introduces a reinforcement learning algorithm based on the divide-and-conquer paradigm, which does not rely on temporal difference (TD) learning. The proposed algorithm, Transitive RL (TRL), scales well to long-horizon tasks by recursively splitting trajectories and achieves state-of-the-art performance on challenging OGBench benchmarks without needing to tune the n-step TD hyperparameter.
Proposes a divide-and-conquer RL algorithm named Transitive RL (TRL) that avoids TD learning.
TRL reduces the number of Bellman recursions from linear to logarithmic in the horizon length, enabling efficient handling of long-horizon tasks.
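The logarithmic depth of divide-and-conquer can be seen in a small tabular analogy. This is not the TRL algorithm itself, which learns a goal-conditioned value function with function approximation. It is a sketch, assuming a known unweighted graph, in which each round combines two shorter segments through a midpoint w; the function name is hypothetical.

```python
import numpy as np

def transitive_distances(adj):
    """Shortest path lengths via repeated midpoint updates.

    Returns the distance matrix and the number of rounds that changed it.
    """
    d = np.where(adj > 0, 1.0, np.inf)  # one-step distances from the edge list
    np.fill_diagonal(d, 0.0)
    rounds = 0
    while True:
        # d[s, g] <- min over midpoints w of d[s, w] + d[w, g]
        new_d = np.min(d[:, :, None] + d[None, :, :], axis=1)
        if np.array_equal(new_d, d):
            return d, rounds
        d = new_d
        rounds += 1

# A directed chain 0 -> 1 -> ... -> 16, so the longest path has 16 steps.
n = 17
chain = np.zeros((n, n))
chain[np.arange(n - 1), np.arange(1, n)] = 1
dist, rounds = transitive_distances(chain)
```

Each round doubles the path length that the table covers, so the 16-step chain is resolved after 4 rounds, whereas a one-step backup would need 16 sweeps to propagate information from the goal to the start.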
A new theory from Berkeley AI Research proves that word2vec reduces to unweighted least-squares matrix factorization, with final representations given by PCA. The model learns discrete orthogonal subspaces sequentially from small initialization, each corresponding to interpretable concepts. The theory predicts features in closed form based on corpus statistics and hyperparameters, matching experiments closely.
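The link between unweighted least-squares factorization and PCA can be sketched numerically. The symmetric target matrix below is a random stand-in, not the corpus-statistics matrix from the theory, and the function name is hypothetical. The sketch relies only on the standard fact that the best rank-k least-squares approximation of a symmetric positive semidefinite matrix keeps its top k eigenpairs.

```python
import numpy as np

def best_rank_k_embeddings(target, k):
    """Embeddings W minimizing ||W W^T - target||_F for a symmetric PSD target."""
    evals, evecs = np.linalg.eigh(target)   # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]     # indices of the k largest eigenvalues
    top_vals = np.clip(evals[order], 0.0, None)
    return evecs[:, order] * np.sqrt(top_vals)

rng = np.random.default_rng(0)
basis = rng.standard_normal((6, 6))
target = basis @ basis.T                    # stand-in symmetric PSD matrix
W = best_rank_k_embeddings(target, k=2)
residual = np.linalg.norm(target - W @ W.T)
```

The residual equals the root of the summed squares of the discarded eigenvalues, which is the sense in which the least-squares solution is given by the principal components.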
BAIR introduces PEVA, a model that predicts egocentric video conditioned on whole-body actions. It uses an autoregressive conditional diffusion transformer trained on the Nymeria dataset and supports atomic-action simulation, long video generation, and visual planning.
PEVA takes whole-body kinematic poses as input to predict future egocentric video frames.
It uses a 48-dimensional action space encoding full-body joint movements.
New BAIR research proposes two fine-tuning defenses against prompt injection attacks, StruQ and SecAlign, which reduce success rates of optimization-free attacks to ~0% and optimization-based attacks to 8%, without additional computational cost or human labor.
Prompt injection is the top security threat for LLM-integrated applications per OWASP.
StruQ uses structured instruction tuning to reduce attack success rates to near zero for simple attacks.
PLAID is a multimodal generative model that simultaneously generates protein 1D sequence and 3D structure by learning the latent space of protein folding models. It trains on sequence-only data, accepts compositional function and organism prompts, and addresses limitations like all-atom generation, organism specificity, and control specification for practical drug design.
PLAID uses latent diffusion on protein folding models to co-generate sequence and structure.
Only sequence data is needed for training, leveraging databases 2-4 orders of magnitude larger than structure databases.
We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone. Through data-driven simulations, RL agents learned to maximize energy efficiency while maintaining throughput and safety. Field tests show that a small proportion of well-controlled autonomous vehicles (AVs) can significantly improve traffic flow and fuel efficiency, achieving 15-20% energy savings.
Deployed 100 RL-controlled vehicles on the I-24 highway for a large-scale field test to smooth stop-and-go waves.
RL controllers use only local sensor information (speed, lead vehicle speed, gap) for decentralized operation.
BAIR researchers introduce Anthology, a method for conditioning large language models with detailed personal backstories to create representative, consistent, and diverse virtual personas. This approach outperforms traditional demographic-based conditioning in approximating real human survey responses, offering a cost-effective alternative for social science research.
Anthology uses naturalistic backstories to condition LLMs, enabling more realistic virtual personas.
It outperforms demographic-only methods in matching response distributions and consistency.