😊 About Me

I’m a PhD student at The Chinese University of Hong Kong, supervised by Professor JIA Jiaya and Professor YU Bei. Before that, I obtained my master’s degree at AIM3 Lab, Renmin University of China, under the supervision of Professor JIN Qin. I received my Bachelor’s degree in 2021 from South China University of Technology.

My research interests include Computer Vision and Multi-modal Large Language Models. Here is my Google Scholar page.

🔥 News

  • 2025.10: 🎉🎉 We are excited to release ViSurf!
  • 2025.06: 🎉🎉 Lyra is accepted by ICCV 2025!
  • 2025.05: 🎉🎉 We are excited to release VisionReasoner!
  • 2025.03: 🎉🎉 We are excited to release Seg-Zero!
  • 2024.07: 🎉🎉 One paper is accepted by ACM MM 2024!
  • 2022.11: 🎉🎉 One paper is accepted by AAAI 2023!
  • 2022.10: 🎉🎉 Our team ranked 1st in the TRECVID 2022 VTT task!
  • 2022.05: 🎉🎉 One paper is accepted by ECCV 2022!

📝 Publications

arXiv preprint

ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models

Yuqi Liu, Liangyu Chen, Jiazhen Liu, Mingkang Zhu, Zhisheng Zhong, Bei Yu, Jiaya Jia

Project Page

  • ViSurf (Visual Supervised-and-Reinforcement Fine-Tuning) is a unified post-training paradigm that integrates the strengths of both SFT and RLVR within a single stage.
arXiv preprint

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning

Yuqi Liu*, Tianyuan Qu*, Zhisheng Zhong, Bohao Peng, Shu Liu, Bei Yu, Jiaya Jia

Project Page [code]

  • VisionReasoner is a unified framework for visual perception tasks.
  • Through carefully crafted rewards and a tailored training strategy, VisionReasoner gains strong multi-task capability, addressing diverse visual perception tasks within a shared model.
arXiv preprint

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Yuqi Liu*, Bohao Peng*, Zhisheng Zhong, Zihao Yue, Fanbin Lu, Bei Yu, Jiaya Jia

Project Page [code]

  • Seg-Zero exhibits emergent test-time reasoning ability. It generates a reasoning chain before producing the final segmentation mask.
  • Seg-Zero is trained exclusively using reinforcement learning, without any explicit supervised reasoning data.
  • Compared to supervised fine-tuning, Seg-Zero achieves superior performance on both in-domain and out-of-domain data.
ICCV 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Zhisheng Zhong*, Chengyao Wang*, Yuqi Liu*, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia

Project Page [code]

  • Stronger performance: Achieves SOTA results across a variety of speech-centric tasks.
  • More versatile: Supports image, video, speech/long-speech, and sound understanding, as well as speech generation.
  • More efficient: Uses less training data and supports faster training and inference.
ACM MM 2024

Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval

Yang Du*, Yuqi Liu*, Qin Jin

  • A benchmark that evaluates the temporal understanding of video retrieval models.
AAAI 2023

Token Mixing: Parameter-Efficient Transfer Learning from Image-Language to Video-Language

Yuqi Liu, Luhui Xu, Pengfei Xiong, Qin Jin

Project Page

  • We study how to transfer knowledge from image-language models to video-language tasks.
  • We have implemented several components proposed by recent works.
ECCV 2022

TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

Yuqi Liu, Pengfei Xiong, Luhui Xu, Shengming Cao, Qin Jin

Project Page

  • TS2-Net is a text-video retrieval model based on CLIP.
  • We propose a token shift transformer and a token selection transformer.

📖 Education

  • 2024.08 - 2028.06 (expected), Ph.D., Department of Computer Science and Engineering, The Chinese University of Hong Kong.
  • 2021.09 - 2024.06, M.Phil., School of Information, Renmin University of China.
  • 2017.09 - 2021.06, B.E., School of Software Engineering, South China University of Technology.

📕 Teaching

  • 2025 Fall, CSCI1580
  • 2025 Spring, ENGG2020
  • 2024 Fall, CSCI3170