Efficient Video Intelligence in 2026

Five years ago, video understanding mostly meant action recognition on Kinetics-400 or short-clip captioning. Today, vision-language models reason about hour-long footage, on-device tracking segments any object at 16 FPS on a phone, and a single 100M-parameter encoder can match domain-specialized expert models across image understanding, dense prediction, and VLM tasks.


The Audio-Visual Gap in Embodied AI

Vision models and audio models have both improved rapidly, but multimodal LLMs still struggle with first-person video that requires understanding both. I think this is a major bottleneck in egocentric AI, and it's a data problem more than an architecture problem.


Diffusion Models Learn to Think

Every major reasoning system is autoregressive. I think we've been mistaking a training limitation for an architectural one. dTRPO makes RL training for diffusion LLMs tractable, reopening the architecture question for reasoning.


Sub-Billion Reasoning Didn't Start with RL

MobileLLM-R1, a 950M-parameter model, matches Qwen3-0.6B on MATH500 and AIME while using only 11.7% of Qwen3's pretraining data. That didn't come from RL alone. It sits on a three-year stack: architecture, quantization, and data curation. This post traces that stack.

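As a rough sanity check on where that data-efficiency figure comes from: assuming the publicly reported (approximate) budgets of ~4.2T pretraining tokens for MobileLLM-R1 and ~36T for Qwen3, the ratio works out to roughly 11.7%.

```python
# Back-of-the-envelope check on the data-efficiency claim.
# Token counts are the publicly reported approximate budgets, not
# figures taken from this post; treat them as assumptions.
mobilellm_r1_tokens = 4.2e12   # ~4.2T pretraining tokens
qwen3_tokens = 36e12           # ~36T pretraining tokens

ratio = mobilellm_r1_tokens / qwen3_tokens
print(f"{ratio:.1%}")          # -> 11.7%
```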

Does Reasoning Require Scale?

A 950M-parameter model solves more competition math problems than models nearly twice its size. The gap isn't parameter count; it's training methodology and inference strategy. But cheap reasoning shifts the bottleneck to reliability: small models can reason; they just don't know when they're wrong.

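One concrete illustration of the reliability point (a generic technique, not necessarily the inference strategy the post covers): self-consistency voting, where the model samples several solutions and the vote margin doubles as a crude confidence signal. A minimal sketch, assuming a hypothetical `generate(prompt)` call that returns just a final answer string:

```python
from collections import Counter

def generate(prompt: str) -> str:
    """Placeholder for one sampled completion from a small model.

    Assumed to return only the final answer string; swap in your own
    model call (e.g. an on-device runtime) here.
    """
    raise NotImplementedError

def self_consistency(prompt: str, n_samples: int = 16):
    """Majority-vote over sampled answers.

    Returns (answer, agreement), where agreement is the fraction of
    samples that voted for the winning answer -- a crude proxy for
    "does the model know it's right", which is exactly where small
    reasoners tend to be weakest.
    """
    answers = [generate(prompt) for _ in range(n_samples)]
    best, votes = Counter(answers).most_common(1)[0]
    return best, votes / n_samples

# Usage: treat low agreement as a signal to escalate (bigger model,
# more samples, or a human) rather than trusting the answer.
# answer, agreement = self_consistency("What is 17 * 24?")
# if agreement < 0.5:
#     ...  # fall back / escalate
```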

On-Device LLMs: State of the Union, 2026

Three years ago, running a language model on a phone meant a toy demo. Today, billion-parameter models run in real time on flagship devices. This shift came not from faster chips alone, but from rethinking how we build, compress, and deploy models.

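To make the "compress" part concrete, here is a minimal sketch of the memory arithmetic behind group-wise low-bit weight quantization; this is the generic technique, not a claim about any specific runtime, and the parameter count and group size are illustrative.

```python
def weight_memory_gb(n_params: float, bits: int, group_size: int = 128,
                     scale_bytes: int = 2) -> float:
    """Approximate weight memory for group-wise quantization.

    Each weight is stored in `bits` bits; each group of `group_size`
    weights carries one fp16 scale (`scale_bytes` bytes). Activations,
    KV cache, and runtime overhead are ignored.
    """
    weight_bytes = n_params * bits / 8
    scale_bytes_total = (n_params / group_size) * scale_bytes
    return (weight_bytes + scale_bytes_total) / 1e9

one_b = 1e9  # a billion-parameter model
print(f"fp16: {weight_memory_gb(one_b, 16, scale_bytes=0):.2f} GB")  # ~2.00 GB
print(f"int8: {weight_memory_gb(one_b, 8):.2f} GB")                  # ~1.02 GB
print(f"int4: {weight_memory_gb(one_b, 4):.2f} GB")                  # ~0.52 GB
```

The roughly 4x reduction from fp16 to int4 weights is a big part of why billion-parameter models now fit comfortably in a phone's memory budget.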