Xinchen Zhang

I am a first-year master's student at IIGroup, Tsinghua University, supervised by Prof. Yujiu Yang. I received my Bachelor's degree from the School of Artificial Intelligence, Xidian University.

I am currently a research intern at ByteDance Seed, focusing on reinforcement learning in multimodal large language models. I work closely with Dr. Ling Yang and Prof. Mengdi Wang from the AI Lab at Princeton University. Previously, I was honoured to work with Prof. Hao Zhu and Prof. Licheng Jiao at IPIU, Xidian University.

Email  /  WeChat  /  GitHub  /  Google Scholar

I welcome all kinds of research collaborations and discussions! I am actively looking for research internship opportunities in either academia or industry. Feel free to contact me via email or WeChat.

Research

My current research focuses on Multimodal Large Language Models (MLLMs), specifically on advancing reinforcement learning for aligning unified models and on pushing the limits of their reasoning and generation abilities in complex scenarios.

Previously, my research centered on text-to-image generation, with a particular emphasis on investigating the capabilities of diffusion models under complex, compositional prompts, including IterComp (ICLR'25), RealCompo (NeurIPS'24), HermesFlow, and Diffusion-Sharpening.

Research Overview
News
  • [Aug. 2025] RPF-Net is accepted by Pattern Recognition.
  • [Jan. 2025] IterComp is accepted by ICLR 2025.
  • [Nov. 2024] I gave a talk at TechBeat about compositional text-to-image generation.
  • [Oct. 2024] I proposed IterComp, which leverages iterative RLHF to achieve fast and realistic text-to-image generation.
  • [Sep. 2024] RealCompo is accepted by NeurIPS 2024.
  • [Feb. 2024] I proposed RealCompo, which balances compositionality and realism in controllable text-to-image generation.
  • [Sep. 2023] Admitted to Tsinghua University for postgraduate studies with exemption from the national entrance examination.
  • [May 2023] Check out our recent work, RPF-Net.
Publications

(* denotes equal contribution.)

HEAR: High-frequency Enhanced Autoregressive Modeling for Identity-Preserving Image Generation
Shiyi Zhang*, Xinchen Zhang*, Youliang Zhang, Yongxin Xiao, Xiu Li, Jian Song, Yujiu Yang
Under Review

MMaDA: Multimodal Large Diffusion Language Models
Ling Yang, Ye Tian, Bowen Li, Xinchen Zhang, Ke Shen, Yunhai Tong, Mengdi Wang
arXiv, 2025
Preprint / Code / Checkpoints

Seed1.5-VL Technical Report
Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, ...Xinchen Zhang, ...
Technical Report
Preprint / GitHub

PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang, Yang Ding, Shuoshuo Zhang, Xinchen Zhang, Haoling Li, Zhong-zhi Li, Peijie Wang, Jie Wu, Lei Ji, Yelong Shen, Yujiu Yang, Yeyun Gong
arXiv, 2025
Preprint / Code

SparseAR: Not All Visual Tokens Are Crucial in Autoregressive Image Model Training
Ling Yang*, Zhaochen Yu*, Xinchen Zhang*, Peng Cao, Yujiu Yang, Bin Cui, Shuicheng Yan
Under Review

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Xinchen Zhang, Ling Yang, Guohao Li, Yaqi Cai, Jiake Xie, Yong Tang, Yujiu Yang, Mengdi Wang, Bin Cui
ICLR 2025
Preprint / Code / Checkpoints (over 26,000 downloads)

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kaini Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui
NeurIPS 2024
Project page / Preprint / Code

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
Ling Yang*, Xinchen Zhang*, Ye Tian, Chenming Shang, Minghao Xu, Wentao Zhang, Bin Cui
arXiv, 2025
Preprint / Code / Checkpoints

Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
Ye Tian, Ling Yang, Xinchen Zhang, Yunhai Tong, Mengdi Wang, Bin Cui
arXiv, 2025
Preprint / Code

Compositional Generalization through Brain-inspired Geometric Constraints on Representation Structure
Chenming Shang, Shiji Zhou, Hengyuan Zhang, Xinchen Zhang, Lei Ke, Yuwang Wang, Yujiu Yang
Under Review

Recurrent Progressive Fusion-based Learning for Multi-source Remote Sensing Image Classification
Xinchen Zhang, Hao Zhu, Xiaotong Li, Biao Hou, Wenhao Zhao, Xiaoyu Yi, Wenping Ma, Licheng Jiao
Pattern Recognition
Paper / Code
Education
Tsinghua University
M.Eng. in Big Data Technology and Engineering (2024 - Present)
Advisor: Prof. Yujiu Yang
Xidian University
B.Eng. in Artificial Intelligence (2020 - 2024)
Advisors: Prof. Hao Zhu, Prof. Licheng Jiao
Experience
ByteDance Seed
Research Intern (Feb. 2025 - Present)
Topic: Multimodal Large Language Models
Advisors: Xiaoying Zhang, Youbin Wu, Guang Shi
Services
  • Conference Reviewer:
    • International Conference on Computer Vision (ICCV) 2025
    • International Conference on Machine Learning (ICML) 2025
    • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025
    • International Conference on Learning Representations (ICLR) 2025
    • Conference on Neural Information Processing Systems (NeurIPS) 2025
  • Journal Reviewer:
    • International Journal of Computer Vision (IJCV)
Talks
  • IterComp, RealCompo: Towards Compositional Text-to-Image Generation, TechBeat, 2024
Honors & Awards
  • Special Prize Scholarship, 2022
  • First Prize Scholarship, 2021
  • First Prize, Chinese Mathematics Competitions (CMC), 2021
  • First Prize, China Undergraduate Mathematical Contest in Modeling (CUMCM), 2021
  • First Prize (Meritorious Winner), International Mathematical Contest in Modeling (MCM/ICM), 2022