Paper-Conference | Tianchen's Profile

[ICLR 25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation

We introduce ViDiT-Q, a quantization method specialized for diffusion transformers. For popular large-scale video and image generation models (e.g., Open-Sora, Latte, PixArt-α, PixArt-Σ), ViDiT-Q achieves W8A8 quantization without metric degradation, and W4A8 quantization without notable visual quality degradation.

Tianchen Zhao, Tongcheng Fang, Haofeng Huang, Enshu Liu, Rui Wan, Widyadewi Soedarmadji, Shiyao Li, Zinan Lin, Guohao Dai, Shengen Yan, Huazhong Yang, Xuefei Ning, Yu Wang

[ECCV 24] MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

We design MixDQ, a mixed-precision quantization framework that tackles the challenging problem of quantizing few-step text-to-image diffusion models. With negligible visual quality degradation and content change, MixDQ achieves W4A8 quantization, with an equivalent 3.4x memory compression and 1.5x latency speedup.

Tianchen Zhao, Xuefei Ning, Tongcheng Fang, Enshu Liu, Guyue Huang, Zinan Lin, Shengen Yan, Guohao Dai, Yu Wang

[CVPR 24] FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models

We propose an efficient diffusion model evaluation method that condenses the textual evaluation dataset, achieving evaluation quality comparable to that obtained with a 5x larger dataset.

Lin Zhao, Tianchen Zhao, Zinan Lin, Xuefei Ning, Guohao Dai, Huazhong Yang, Yu Wang

[ICCV 23] Ada3D: Exploiting the Spatial Redundancy with Adaptive Inference for Efficient 3D Object Detection

We propose an adaptive inference method for 3D perception that reduces 3D voxels by 60% and 2D pixels by 80%, achieving a 5x reduction in FLOPs and memory and a 1.4x latency speedup.

Tianchen Zhao, Xuefei Ning, Ke Hong, Zhongyuan Qiu, Pu Lu, Linfeng Zhang, Yali Zhao, Lipu Zhou, Guohao Dai, Huazhong Yang, Yu Wang

[CVPR 22] CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance

We propose CodedVTR (Codebook-based Voxel TRansformer), which improves data efficiency and generalization ability for 3D sparse voxel transformers.

Tianchen Zhao, Niansong Zhang, Xuefei Ning, He Wang, Li Yi, Yu Wang

[ECCV 20] DSA: More Efficient Budgeted Pruning via Differentiable Sparsity Allocation

We propose DSA (Differentiable Sparsity Allocation), which enables differentiable allocation for budgeted pruning, accelerating the pruning process by 1.5x.

Xuefei Ning, Tianchen Zhao, Wenshuo Li, Peng Lei, Yu Wang, Huazhong Yang
