PRLab @ Nanjing University

About

Chenyang Si is a Tenure-Track Associate Professor with PRLab at the School of Intelligence Science and Technology, Nanjing University (Suzhou Campus). Prior to this, he was a Research Fellow at Nanyang Technological University (NTU), Singapore, working with Prof. Ziwei Liu. Before that, he worked as a Research Scientist at the Sea AI Lab of Sea Group. He received his Ph.D. degree in 2021 from CASIA, where he was supervised by Prof. Tieniu Tan and co-supervised by Prof. Liang Wang and Prof. Wei Wang.

His research interests span visual understanding and generation, including fundamental architectures for computer vision, video understanding, generative models, video and image generation, as well as acceleration and optimization of generative models.

Research Interests

Video Generation · Diffusion Models · Visual Understanding · World Models · Embodied AI · Agent · Efficient Generative Models · Evaluation Benchmarks

Academic Services

Area Chair: BMVC 2024, BMVC 2025, CVPR 2026

Conference Reviewer: CVPR, ICCV, NeurIPS, ECCV, AAAI, ICLR

Journal Reviewer: TPAMI, TIP, TMM, TCSVT, IJCV

Research Directions

Video Generation

Our flagship research direction. We study generative models for high-quality and controllable video synthesis, including diffusion-based video models, consistency models, and efficient video generation architectures. Our work spans the full pipeline from foundational architecture design to practical deployment optimization.

RepVideo · Vchitect-2.0 · LaVie

Agent

We study the core mechanisms that enable autonomous agents to operate in open-ended environments. Our research focuses on agent memory architectures for long-horizon reasoning, agentic reinforcement learning that combines LLM-based planning with RL-driven optimization, agent sandboxes for safe and reproducible evaluation, and agent-driven social and world simulation to model complex multi-agent dynamics at scale.

Embodied AI

We explore embodied intelligence paradigms where agents learn through physical interaction with dynamic environments. Our work focuses on integrating world knowledge from large-scale generative models into embodied systems for manipulation, navigation, and planning in real-world settings.

World Models

We build world models that capture the underlying physical dynamics and causal structure of real-world environments through video prediction and simulation. Beyond passive world simulation, we are actively exploring World Action Models (WAMs) that jointly model perception, dynamics, and action generation for zero-shot policy learning.

FreeU · FreeInit · DaS · VideoBooth

Unified Visual Generation & Understanding

We pursue unified architectures that bridge the fundamental divide between visual generation and understanding within a single framework. Our research investigates how diffusion-based and autoregressive models can serve as a shared backbone for both discriminative and generative tasks, while also exploring diffusion language models to unify vision and language at a deeper representational level.

MetaFormer · iFormer · Semantic Prompt

Efficient Generative Models

We investigate training-free and training-based acceleration methods for large generative models, reducing inference cost while maintaining generation quality.

FasterCache · DCM

Evaluation & Benchmarks

We design comprehensive evaluation benchmarks for video generative models (e.g., VBench) to enable systematic and fair assessment of generation quality.

VBench · VBench++

Recent News

2025.07

DCM (Dual-Expert Consistency Model for Efficient Video Generation) and TACA (Cross-Modal Interaction in Multimodal Diffusion Transformers) are accepted by ICCV 2025. Congratulations!

2025.07

FreeMorph (Tuning-Free Generalized Image Morphing with Diffusion Model) is accepted by ICCV 2025. Congratulations!

2025.05

DaS (Diffusion as Shader: 3D-aware Video Diffusion) is accepted by the SIGGRAPH 2025 Conference Track and selected for the SIGGRAPH Video Trailer!

PDF Project

2025.01

FasterCache (Training-Free Video Diffusion Model Acceleration) is accepted by ICLR 2025.

PDF Project

2025.01

RepVideo (Rethinking Cross-Layer Representation for Video Generation) is released on arXiv.

PDF Project Code

2024.09

MAN (Momentum Auxiliary Network for Supervised Local Learning) is accepted by ECCV 2024 as an Oral presentation. Congratulations!

2024.07

FreeInit and HPFF are accepted by ECCV 2024. Congratulations!

2024.02

FreeU is accepted by CVPR 2024 as an Oral presentation, and VBench is accepted as a Highlight. Congratulations!

FreeU Project VBench Project

2024.01

PRLab is established at the School of Intelligence Science and Technology, Nanjing University. We welcome you to join us!

Join Us

We are actively seeking highly motivated students and researchers to join PRLab at Nanjing University. Our lab focuses on cutting-edge research in visual understanding and generation, with a particular emphasis on video generation and diffusion models. If you are interested in applying, please refer to this Zhihu Post and fill out the Google Form.

Ph.D. Students

Recruiting for Fall 2027
Strong CS / AI background
Solid Python and PyTorch skills
Passion for computer vision research
Good English communication

Master's Students

Recruiting for Fall 2027
Undergraduate in CS or related fields
Strong math foundation
Deep learning framework experience
Collaborative and self-driven

Research Interns

Undergraduate / visiting students
Minimum commitment of 3-6 months
Prior research experience preferred
Remote and on-site both welcome
Future Ph.D. applicants encouraged

Interested in joining PRLab?

Please send your CV and a brief research statement to the email address below. We look forward to hearing from you.

chenyang.si@nju.edu.cn