Tianchen Zhao is a Ph.D. student in the NICS-EFC Lab (EffAlg) at the Dept. of EE, Tsinghua University, supervised by Prof. Yu Wang and Dr. Xuefei Ning. He received his bachelor's and master's degrees from the Dept. of EE, Beihang University, in 2020 and 2023, respectively. His primary research focus is EfficientML algorithms and AI infrastructure for building foundation models.
I expect to graduate in June 2027 and am currently seeking postdoc positions and industry opportunities. Please contact me if you are interested 👋✨ You can find my CV and résumé (简历).

Line 1: EfficientML & Sampling for VisualGen Foundation Model Design
- [ECCV'20] DSA: Differentiable Structure Pruning for CNNs
- [CVPR'24] FlashEval: AutoML-based Efficient Data Selection for Evaluation
- [ICCV'23] Ada3D: Efficient Adaptive Dynamic Architecture for 3D Point Cloud Understanding
- [CVPR'22] CodedVTR: Novel Codebook-based 3D Attention for 3D Transformer Backbone Design
(Algorithm-System Co-optimization for Sparse/Quant)
- [ECCV'24] MixDQ: Mixed-precision Quantization for GEMMs in VisualGen
- [ICLR'25] ViDiT-Q: Quantization for Diffusion Transformers in VisualGen
- [DAC'25] PARO: Accelerator for Mixed-precision Quantization for Attention in VisualGen
- [NeurIPS'25] PAROAttention: Sparse & Quant for Attention in VisualGen
- [MLSys'25] dp-SP: Multi-GPU Load Balancing for Sparse Attention in VisualGen
- [Ongoing] TideQuant: Algorithm-level Improvements for FP4 Quantization
(Improved Flow Matching for Better Sampling Efficiency)
- [ECCV'26 Sub.] StreamingVLA: Streaming Flow Matching for Asynchronous Execution in VLAs
- [Ongoing] Streaming Forcing: Streaming Flow Matching for Frame-wise AR Video Generation Models
Line 2: Infra for Agentic RL: Long-tail Rollout
(Multi-Agent GRPO)
(Low-latency Tile-based Runtime)
- [Ongoing] Tile-based Runtime: Developing a tile-based runtime for latency-sensitive, long-tailed agentic rollout

Line 1: EfficientML & Sampling for VisualGen
- Sparsity and quantization, from algorithm to kernel design (mixed-precision quantization and sparse attention for image and video DiTs)
- [MLSys'25] dp-SP: Multi-GPU sparse attention load balancing
- Novel flow matching formulation for streaming VLA and video generation
Line 2: Infra for Agentic RL: Long-tail Rollout
- Agentic RL post-training algorithms (GRPO) and infrastructure practice (VeRL)
- Tile-based, megakernel-like runtime targeting low-latency inference for small-batch, long-sequence rollouts
- Agent Loop Orchestration
- Multi-Request Scheduling
- How Sampling Produces Tokens (MTP)
- Per-Model Engine for Single-Token Decoding
- Per-Operator Path (e.g. Attn/MoE)