Tianchen Zhao 赵天辰 (Ziu Tinsan)

Ph.D. Student at Tsinghua University

Tsinghua University

Biography

Tianchen Zhao is a Ph.D. student in the NICS-EFC Lab (EffAlg) at the Dept. of EE, Tsinghua University, supervised by Prof. Yu Wang and Dr. Xuefei Ning. He received his bachelor's and master's degrees from the Dept. of EE, Beihang University, in 2020 and 2023, respectively. His primary research focus is EfficientML algorithms and AI infrastructure for building foundation models.

I’m expected to graduate in June 2027 and am currently seeking postdoc positions and industry opportunities. Please contact me if you are interested 👋✨ You can find my CV and Chinese résumé (简历).

News

[🎉 2025-10] One co-authored paper StreamingVLA is public on arXiv, covered by MachineIntelligence-机器之心
[🎉 2026-01] One co-authored paper db-SP is accepted at MLSys'25
[🎉 2025-10] Our paper PAROAttn is accepted at NeurIPS'25, covered by MachineIntelligence-机器之心 and VitalBridge-绿洲资本
[🚀 2025-07] Join MiroMind AI as a research intern, working on RL post-training of DeepResearch Agent.
[🎉 2025-01] One co-authored paper PARO is accepted at DAC'25
[🚀 2025-02] Join ByteDance as a research intern, working on efficient visual generation
[🎉 2025-01] Our paper ViDiT-Q is accepted at ICLR'25
[🎙️ 2024-12] Give a talk at TigerLab at the University of Waterloo about recent diffusion quantization research.
[🤝 2024-12] Participate in a talk given by the NICS-EffAlg group at AI Time.
[🎉 2024-07] Our paper MixDQ is accepted at ECCV'24
[🎉 2024-03] Our paper FlashEval is accepted at CVPR'24
[🎉 2023-12] One co-authored paper is accepted at DATE'24
[🎙️ 2023-09] Give a talk at TechBeat on our work on efficient 3D perception.
[🎓 2023-09] Officially become a member of NICS-EFC Lab, starting my Ph.D. program.
[🎉 2023-07] Our paper Ada3D is accepted at ICCV'23; check out the Project Page.
[🚀 2023-05] Join Infinigence as a research intern.

Research Timeline

Q: How to build efficient foundation models in an efficient way?

Line 2: Infra for Agentic RL: Long-tail Rollout

NAS/AutoML

- [ECCV'20] DSA: Differentiable Structure Pruning for CNNs

- [CVPR'24] FlashEval: AutoML-based Efficient Data Selection for Evaluation

3D

- [ICCV'23] Ada3D: Efficient adaptive dynamic architecture for 3D point cloud understanding

- [CVPR'22] CodedVTR: Novel Codebook-based 3D Attention for 3D Transformer Backbone Design

EfficientML for Model Arch.

(Algorithm-System Co-optimization for Sparse/Quant)

Intern@Infinigence

- [ECCV'24] MixDQ: Mixed-precision Quantization for GEMMs in VisualGen

- [ICLR'25] ViDiT-Q: Quantization for Diffusion Transformers in VisualGen

- [DAC'25] PARO: Accelerator for Mixed-precision Quantization for Attention in VisualGen

Intern@ByteDance

- [NeurIPS'25] PAROAttention: Sparse & Quant for Attention for VisualGen

- [MLSys'25] db-SP: Multi-GPU Load Balancing for Sparse Attention in VisualGen

- [Ongoing] TideQuant: Algo-level improvement for FP4 Quantization.

Agentic RL Post-train

(Multi-Agent GRPO)

Intern@MiroMind

- RL with VeRL: multi-agent GRPO and context-management RL post-training.

RL Infra: Tile-based Engine & Backend Schedule

(Low-latency Tile-based Runtime)

- [Ongoing] Tile-based Runtime: developing a tile-based runtime for latency-sensitive agentic long-tail rollout.

Efficient Sampling for Diffusion

(Improved Flow Matching for Better Sampling Efficiency)

- [ECCV'26 Sub.] StreamingVLA: Streaming flow matching for async execution for VLAs.

- [Ongoing] Streaming Forcing: Streaming flow matching for frame-wise AR video gen models.

Line 1: EfficientML & Sampling for VisualGen Foundation Model Design

Timeline axis: 2020, 2023, 2026


Research Framework

- Orchestration: Agent Loop Orchestration

- Scheduling: Multi-Request Scheduling

- Sampling: How Sampling Produces Tokens (MTP)

- Engine: Per-Model Engine for Single-Token Decoding

- Operator: Per-Operator Path (e.g. Attn/MoE)

Multi-Agent RL Training

Context Management

Agentic RL post-training algorithms (GRPO) and infrastructure practice (VeRL)

Line 2: Infra for Agentic RL: Long-tail Rollout

[ECCV'26 Sub.] StreamingVLA

[Ongoing] Streaming Forcing

Novel flow matching formulation for streaming VLA and videogen

Line 1: EfficientML & Sampling for VisualGen

Sparsity and quantization: from algorithm to kernel design (mixed-precision quantization and sparse attention for image and video DiTs).

[MLSys’25] db-SP

Multi-GPU sparse attention load balancing

[ECCV’24] MixDQ

[ICLR’25] ViDiT-Q

[DAC’25] PARO

[NeurIPS’25] PAROAttn

[Ongoing] TideQuant

Tile-based Megakernel-Style Runtime Design

A tile-based, megakernel-like runtime targeting low-latency inference for small-batch, long-sequence rollouts.


Publications

[ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

We introduce ViDiT-Q, a quantization method specialized for diffusion transformers. For popular large-scale video and image generation models (e.g., Open-Sora, Latte, PixArt-α, PixArt-Σ), ViDiT-Q achieves W8A8 quantization without metric degradation and W4A8 quantization without notable visual quality degradation.

Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang