ExpPortrait: Expressive Portrait Generation via Personalized Representation

CVPR 2026

Junyi Wang, Yudong Guo, Boyang Guo, Shengming Yang, Juyong Zhang
University of Science and Technology of China

ExpPortrait introduces a fine-grained and disentangled head representation as the control signal, and leverages the generalization ability of DiT to achieve portrait animation with high controllability, strong identity consistency, and rich expressiveness.

Abstract

While diffusion models have shown great potential in portrait generation, generating expressive, coherent, and controllable cinematic portrait videos remains a significant challenge. Existing intermediate signals for portrait generation, such as 2D landmarks and parametric models, have limited disentanglement capabilities and cannot express personalized details due to their sparse or low-rank representation. Therefore, existing methods based on these models struggle to accurately preserve subject identity and expressions, hindering the generation of highly expressive portrait videos. To overcome these limitations, we propose a high-fidelity personalized head representation that more effectively disentangles expression and identity. This representation captures both static, subject-specific global geometry and dynamic, expression-related details. Furthermore, we introduce an expression transfer module to achieve personalized transfer of head pose and expression details between different identities. We use this sophisticated and highly expressive head model as a conditional signal to train a diffusion transformer (DiT)-based generator to synthesize richly detailed portrait videos. Extensive experiments on self- and cross-reenactment tasks demonstrate that our method outperforms previous models in terms of identity preservation, expression accuracy, and temporal stability, particularly in capturing fine-grained details of complex motion.

Motivation

Method

To establish a faithful mapping from the driving frames to the reference identity, we first build a personalized head representation that captures the subject’s identity and expression space. We then introduce an identity-dependent expression transfer module to robustly transfer poses and expressions across identities. Finally, we fine-tune a video diffusion model, conditioning it on our personalized, detail-rich head representation to synthesize the final high-fidelity video.
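The three-stage pipeline above can be sketched schematically as follows. This is a hypothetical outline, not the authors' actual implementation: all class names, function names, and shapes (`HeadState`, `fit_personalized_representation`, `transfer_expression`, `render_condition`) are illustrative stand-ins, and the real stages are learned models rather than the toy computations shown here.

```python
# Hypothetical sketch of the three-stage pipeline; names and shapes are
# illustrative assumptions, not the authors' API.
from dataclasses import dataclass

import numpy as np


@dataclass
class HeadState:
    identity: np.ndarray    # static, subject-specific geometry code
    expression: np.ndarray  # dynamic, expression-related detail code
    pose: np.ndarray        # head pose parameters (e.g. rotation/translation)


def fit_personalized_representation(frames: list) -> np.ndarray:
    """Stage 1 (stand-in): estimate a static identity code from reference frames."""
    return np.mean([f.mean(axis=(0, 1)) for f in frames], axis=0)


def transfer_expression(driving: HeadState, target_identity: np.ndarray) -> HeadState:
    """Stage 2 (stand-in): re-target the driving pose and expression onto the
    reference identity, keeping identity and expression disentangled."""
    return HeadState(identity=target_identity,
                     expression=driving.expression,
                     pose=driving.pose)


def render_condition(state: HeadState) -> np.ndarray:
    """Stage 3 input (stand-in): pack the head state into the conditioning
    signal that a DiT-based video generator would consume."""
    return np.concatenate([state.identity, state.expression, state.pose])
```

The key design point mirrored here is that only `identity` is replaced during cross-identity reenactment; the driving frame contributes pose and expression alone.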

Personalized Head Representation

Here are some visual results of our Personalized Head Representation. Our representation achieves strong disentanglement of identity and expression while preserving 3D consistency across views and time.

Comparison

We compare our work with Follow-Your-Emoji, LivePortrait, AniPortrait, X-NeMo, and HunyuanPortrait. Our method clearly outperforms these baselines in expressiveness and consistency.

Novel View Synthesis

ExpPortrait preserves consistent expression and identity across varying viewpoints.


BibTeX


      @inproceedings{wang2026expportrait,
        title={ExpPortrait: Expressive Portrait Generation via Personalized Representation},
        author={Wang, Junyi and Guo, Yudong and Guo, Boyang and Yang, Shengming and Zhang, Juyong},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        year={2026}
      }