Sub-Billion Reasoning Didn't Start with RL

MobileLLM-R1, a 950M-parameter model, matches Qwen3-0.6B on MATH500 and AIME while training on just 11.7% of Qwen3's pretraining data. That result didn't come from RL alone; it sits on a three-year stack of architecture, quantization, and data curation. This post traces that stack.

Read More

Does Reasoning Require Scale?

A 950M-parameter model solves more competition math problems than models nearly twice its size. The gap isn't parameter count; it's training methodology and inference strategy. But cheap reasoning shifts the bottleneck to reliability: small models can reason, but they don't know when they're wrong.

Read More

On-Device LLMs: State of the Union, 2026

Three years ago, running a language model on a phone meant a toy demo. Today, billion-parameter models run in real time on flagship devices. This shift came not from faster chips alone, but from rethinking how we build, compress, and deploy models.

Read More