Selected Recent Publications

Language Models

  1. MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes, ICLR (2026). [PDF]

  2. SpinQuant: LLM Quantization with Learned Rotations, ICLR (2025). [PDF]

  3. MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases, ICML (2024). [PDF]

  4. AutoMixer: Checkpoint Artifacts as Automatic Data Mixers, ACL (2025). [PDF]

  5. Target-Aware Language Modeling via Granular Data Sampling, EMNLP (2024). [PDF]

  6. Towards Zero-Shot Multilingual Transfer for Code-Switched Responses, ACL (2023). [PDF]

  7. Agent-as-a-Judge: Evaluate Agents with Agents, ICML (2025). [PDF]

  8. ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization, NeurIPS (2025). [PDF]

  9. Streamlining Language Models via Semantic Basis Analysis, TMLR (2025). [PDF]

  10. Self-Vocabularizing Training for Neural Machine Translation, NAACL SRW (2025). [PDF]

  11. Scaling Parameter-Constrained Language Models with Quality Data, EMNLP Industry (2024). [PDF]

  12. LLM-QAT: Data-Free Quantization Aware Training for Large Language Models, ACL Findings (2024). [PDF]

  13. Revisiting Sample Size Determination in Natural Language Understanding, ACL Findings (2023). [PDF]

  14. MobileLLM-Pro Technical Report, arXiv (2025). [PDF]

  15. An Introduction to Vision-Language Modeling, arXiv (2024). [PDF]

  16. MiniGPT-v2: Large Language Model As a Unified Interface for Vision-Language Multi-task Learning, arXiv (2023). [PDF]


Efficient AI & Model Compression

  1. CPT: Efficient Deep Neural Network Training via Cyclic Precision, ICLR (2021) (Spotlight). [PDF]

  2. AlphaNet: Improved Training of Supernet with Alpha-Divergence, ICML (2021) (Long Presentation). [PDF]

  3. APOLLO: SGD-like Memory, AdamW-level Performance, MLSys (2025). [PDF]

  4. Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications, TMLR (2025). [PDF]

  5. NASViT: Neural Architecture Search for Efficient Vision Transformers with Gradient Conflict aware Supernet Training, ICLR (2022). [PDF]

  6. DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks, ICML (2022). [PDF]

  7. Double-win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inference, ICML (2021). [PDF]

  8. AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling, CVPR (2021). [PDF]

  9. One weight bitwidth to rule them all, ECCV Embedded Vision Workshop (2020) (Best Paper Award). [PDF]

  10. Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts, ACL Findings (2024). [PDF]

  11. ScaleNAS: Multi-Path One-Shot NAS for Scale-Aware High-Resolution Representation, AutoML (2022). [PDF]

  12. Contrastive Quant: Quantization makes Stronger Contrastive Learning, DAC (2022). [PDF]

  13. NASGEM: Neural Architecture Search via Graph Embedding Method, AAAI (2021). [PDF]

  14. Energy-Aware Neural Architecture Optimization With Splitting Steepest Descent, NeurIPS Workshop (2019). [PDF]

  15. Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations, arXiv (2024). [PDF]

  16. Low-Rank + Sparse Tensor Compression for Neural Networks, arXiv (2021). [PDF]

  17. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs, arXiv (2018). [PDF]

  18. Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations, arXiv (2017). [PDF]


Computer Vision & 3D

  1. DepthLM: Metric Depth from Vision Language Models, ICLR (2026) (Oral Presentation). [PDF]

  2. LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding, ICML (2025). [PDF]

  3. EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything, CVPR (2024) (Highlight). [PDF]

  4. EdgeTAM: On-Device Track Anything Model, CVPR (2025). [PDF]

  5. MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction, ECCV (2024). [PDF]

  6. Efficient Track Anything, ICCV (2025). [PDF]

  7. CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians, ECCV (2024). [PDF]

  8. Taming Mode Collapse in Score Distillation for Text-to-3D Generation, CVPR (2024). [PDF]

  9. MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction, ECCV (2024). [PDF]

  10. Fast Point Cloud Generation with Straight Flows, CVPR (2023). [PDF]

  11. Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation, CVPR (2022). [PDF]

  12. KeepAugment: A Simple Information-Preserving Data Augmentation Approach, CVPR (2021). [PDF]

  13. Feature-Align Network with Knowledge Distillation for Efficient Denoising, WACV (2022). [PDF]

  14. PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion, 3DV (2024). [PDF]

  15. EVRNet: Efficient Video Restoration on Edge Devices, ACM MM (2021). [PDF]

  16. SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity, AISTATS (2025). [PDF]

  17. VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice, arXiv (2026). [PDF]

  18. SqueezeSAM: User Friendly Mobile Interactive Segmentation, arXiv (2023). [PDF]

  19. Vision Transformers with Patch Diversification, arXiv (2021). [PDF]

  20. Can Temporal Information Help with Contrastive Self-Supervised Learning?, arXiv (2020). [PDF]


Speech & Audio

  1. Breaking Down Power Barriers in On-Device Streaming ASR: Insights and Solutions, NAACL (2025). [PDF]

  2. TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-Device ASR Models, ICASSP (2024). [PDF]

  3. Stack-and-Delay: A New Codebook Pattern for Music Generation, ICASSP (2024). [PDF]

  4. In-Context Prompt Editing for Conditional Audio Generation, ICASSP (2024). [PDF]

  5. On the Open Prompt Challenge in Conditional Audio Generation, ICASSP (2024). [PDF]

  6. Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition, ICASSP (2024). [PDF]

  7. Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet, ICASSP (2022). [PDF]

  8. Memory-efficient Speech Recognition on Smart Devices, ICASSP (2021). [PDF]

  9. Streaming Parallel Transducer Beam Search with Fast-Slow Cascaded Encoders, INTERSPEECH (2022). [PDF]

  10. Collaborative Training of Acoustic Encoders for Speech Recognition, INTERSPEECH (2021). [PDF]

  11. Data Efficient Reflow for Few Step Audio Generation, SLT (2024). [PDF]

  12. Towards Temporally Synchronized Visually Indicated Sounds Through Scale-Adapted Positional Embeddings, NeurIPS Workshop (2024). [PDF]

  13. SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training, arXiv (2026). [PDF]

  14. SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text, arXiv (2024). [PDF]

  15. High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching, arXiv (2024). [PDF]

  16. Enhance Audio Generation Controllability Through Representation Similarity Regularization, arXiv (2023). [PDF]

  17. Exploring Speech Enhancement for Low-resource Speech Synthesis, arXiv (2023). [PDF]

  18. FoleyGen: Visually-Guided Audio Generation, arXiv (2023). [PDF]

  19. LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting, arXiv (2022). [PDF]

  20. Noisy Training Improves E2E ASR for the Edge, arXiv (2021). [PDF]

  21. Hello Edge: Keyword Spotting on Microcontrollers, arXiv (2017). [PDF]


Systems ML

  1. DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads, ASPLOS (2024). [PDF]

  2. XRBench: An Extended Reality (XR) Machine Learning Benchmark Suite for the Metaverse, MLSys (2023). [PDF]

  3. Heterogeneous Dataflow Accelerators for Multi-DNN Workloads, HPCA (2021). [PDF]

  4. Mind Mappings: Enabling Efficient Algorithm-Accelerator Mapping Space Search, ASPLOS (2021). [PDF]

  5. RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing, ISCA (2020). [PDF]

  6. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks, ISCA (2018). [PDF]

  7. Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks, DAC (2020). [PDF]

  8. Improving Efficiency in Neural Network Accelerator using Operands Hamming Distance Optimization, NeurIPS Workshop (2019). [PDF]

  9. Not All Ops are Created Equal!, SysML (2018). [PDF]

  10. Throughput-optimized OpenCL-based FPGA Accelerator for Large-scale Convolutional Neural Networks, FPGA (2016). [PDF]

  11. DNA: Differentiable Network-Accelerator Co-Search, arXiv (2020). [PDF]

  12. Federated Learning with Non-IID Data, arXiv (2018). [PDF]

  13. PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training, arXiv (2018). [PDF]
