Pattern & Recognition Lab

Chenyang Si

Associate Professor
School of Intelligence Science and Technology
Nanjing University

Nanjing University Suzhou Campus, No. 1520 Taihu Avenue, Huqiu District, Suzhou, Jiangsu

About

Chenyang Si is a Tenure-Track Associate Professor at PRLab, School of Intelligence Science and Technology, Nanjing University (Suzhou Campus). Prior to this, he was a Research Fellow at Nanyang Technological University (NTU), Singapore, working with Prof. Ziwei Liu. Before that, he was a Research Scientist at the Sea AI Lab of Sea Group. He received his Ph.D. degree in 2021 from CASIA, supervised by Prof. Tieniu Tan and co-supervised by Prof. Liang Wang and Prof. Wei Wang.

His research interests span visual understanding and generation, including fundamental architectures for computer vision, video understanding, generative models for video and image synthesis, and the acceleration and optimization of generative models.

We are actively recruiting! Openings for 2027 Ph.D. / Master's students and undergraduate / graduate Research Assistants.
Join Us

Research Interests

Video Generation · Diffusion Models · Visual Understanding · World Models · Embodied AI · Agent · Efficient Generative Models · Evaluation Benchmarks

Academic Services

Area Chair: BMVC 2024, BMVC 2025, CVPR 2026
Conference Reviewer: CVPR, ICCV, NeurIPS, ECCV, AAAI, ICLR
Journal Reviewer: TPAMI, TIP, TMM, TCSVT, IJCV

Research Directions

Video Generation

Our flagship research direction. We study generative models for high-quality and controllable video synthesis, including diffusion-based video models, consistency models, and efficient video generation architectures. Our work spans the full pipeline from foundational architecture design to practical deployment optimization.

Agent

We study the core mechanisms that enable autonomous agents to operate in open-ended environments. Our research focuses on agent memory architectures for long-horizon reasoning, agentic reinforcement learning that combines LLM-based planning with RL-driven optimization, agent sandboxes for safe and reproducible evaluation, and agent-driven social and world simulation to model complex multi-agent dynamics at scale.

Embodied AI

We explore embodied intelligence paradigms where agents learn through physical interaction with dynamic environments. Our work focuses on integrating world knowledge from large-scale generative models into embodied systems for manipulation, navigation, and planning in real-world settings.

World Models

We build world models that capture the underlying physical dynamics and causal structure of real-world environments through video prediction and simulation. Beyond passive world simulation, we are actively exploring World Action Models (WAMs) that jointly model perception, dynamics, and action generation for zero-shot policy learning.

Unified Visual Generation & Understanding

We pursue unified architectures that bridge the fundamental divide between visual generation and understanding within a single framework. Our research investigates how diffusion-based and autoregressive models can serve as a shared backbone for both discriminative and generative tasks, while also exploring diffusion language models to unify vision and language at a deeper representational level.

Efficient Generative Models

We investigate training-free and training-based acceleration methods for large generative models, reducing inference cost while maintaining generation quality.

Evaluation & Benchmarks

We design comprehensive evaluation benchmarks for video generative models (e.g., VBench) to enable systematic and fair assessment of generation quality.

Recent News

2025.07
DCM (Dual-Expert Consistency Model for Efficient Video Generation) and TACA (Cross-Modal Interaction in Multimodal Diffusion Transformers) are accepted by ICCV 2025. Congratulations!
2025.07
FreeMorph (Tuning-Free Generalized Image Morphing with Diffusion Model) is accepted by ICCV 2025. Congratulations!
2025.05
DaS (Diffusion as Shader: 3D-aware Video Diffusion) is accepted by SIGGRAPH 2025 Conference Track and selected in the SIGGRAPH Video Trailer!
2025.01
FasterCache (Training-Free Video Diffusion Model Acceleration) is accepted by ICLR 2025.
2025.01
RepVideo (Rethinking Cross-Layer Representation for Video Generation) is released on arXiv.
2024.09
MAN (Momentum Auxiliary Network for Supervised Local Learning) is accepted by ECCV 2024 as an Oral presentation. Congratulations!
2024.07
FreeInit and HPFF are accepted by ECCV 2024. Congratulations!
2024.02
FreeU is accepted by CVPR 2024 as an Oral presentation, and VBench is accepted as a Highlight. Congratulations!
2024.01
PRLab is established at the School of Intelligence Science and Technology, Nanjing University. Welcome to join us!

Join Us

We are actively seeking highly motivated students and researchers to join PRLab at Nanjing University. Our lab focuses on cutting-edge research in visual understanding and generation, with a particular emphasis on video generation and diffusion models. If you are interested in applying, please refer to this Zhihu Post and fill out the Google Form.

Ph.D. Students
  • Recruiting for 2027 Fall
  • Strong CS / AI background
  • Solid Python and PyTorch skills
  • Passion for computer vision research
  • Good English communication
Master's Students
  • Recruiting for 2027 Fall
  • Undergraduate in CS or related fields
  • Strong math foundation
  • Deep learning framework experience
  • Collaborative and self-driven
Research Interns
  • Undergraduate / visiting students
  • Commitment of at least 3–6 months
  • Prior research experience preferred
  • Both remote and on-site welcome
  • Future Ph.D. applicants encouraged

Interested in joining PRLab?

Please send your CV and a brief research statement. We look forward to hearing from you.

chenyang.si@nju.edu.cn