Sub-Billion Reasoning Didn't Start with RL
MobileLLM-R1, a 950M-parameter model, matches Qwen3-0.6B on MATH500 and AIME while using only 11.7% of Qwen3's pretraining data. That result didn't come from RL alone: it sits on a three-year stack of work on architecture, quantization, and data curation. This post traces that stack.