I am Peiwen Sun, a Ph.D. student at the Multimedia Laboratory (MMLab), The Chinese University of Hong Kong (CUHK), advised by Prof. Xiangyu Yue. I received my B.Eng. and M.Eng. degrees from Beijing University of Posts and Telecommunications (BUPT).

My research focuses on multimodal learning. I am particularly interested in audio-visual understanding and generation. My goal is to build systems that perceive, reason about, and generate content across audio, vision, and language in the physical world.

🔥 News

  • 2026.06:  🎉 X-Stream was accepted to ECCV 2026. See you in Malmö.
  • 2026.04:  🎉 SpaceVista was accepted to ICML 2026. See you in Seoul.

🧭 Research Journey

My research has moved through three connected chapters.

📝 Selected Publications

ECCV 2026
X-Stream

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Peiwen Sun*, Xudong Lu*, Huadai Liu*, Yang Bo, Dongming Wu, Huankang Guan, Minghong Cai, Jinpeng Chen, Xintong Guo, Shuhan Li, Fang Liu, Rui Liu, Xiangyu Yue

📄 Paper | 🌐 Page | 💻 Code

  • The first exploration of multi-stream streaming understanding, framing MLLMs as multiplexers over concurrent video streams.

ICML 2026
SpaceVista

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Peiwen Sun*, Shiqiang Lang*, Dongming Wu, Yi Ding, Kaituo Feng, Huadai Liu, Zhen Ye, Rui Liu, Yun-Hui Liu, Jianan Wang, Xiangyu Yue

📄 Paper | 🌐 Page | 💻 Code

  • SpaceVista-1M and SpaceVista-7B for all-scale spatial reasoning across five spatial scales with scale-aware experts.

ICLR 2025 Spotlight
Both Ears Wide Open

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation

Peiwen Sun, Sitong Cheng, Xiangtai Li, Zhen Ye, Huadai Liu, Honggang Zhang, Wei Xue, Yike Guo

📄 Paper | 🌐 Page | 💻 Code

  • BEWO-1M dataset and the SpatialSonic model for language-driven, controllable stereo (spatial) audio generation.

ACM MM 2024 Oral
Unveiling and Mitigating Bias in AVS

Unveiling and Mitigating Bias in Audio Visual Segmentation

Peiwen Sun, Honggang Zhang, Di Hu

📄 Paper | 🌐 Page | 💻 Code

  • Identifies and mitigates “audio priming bias” and “visual prior” in audio-visual segmentation via active queries and contrastive debiasing.

ECCV 2024
Ref-AVS

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

Yaoting Wang*, Peiwen Sun*, Dongzhan Zhou*, Guangyao Li, Honggang Zhang, Di Hu

📄 Paper | 🌐 Page | 💻 Code

  • A new task and benchmark that segments objects in videos from natural-language expressions enriched with audio-visual cues.

AAAI 2025
Codec Does Matter

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

📄 Paper | 🌐 Page | 💻 Code

  • X-Codec injects semantic features into the codec to improve audio language models across speech, music, and sound.

ECCV 2024
Stepping Stones

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation

Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

📄 Paper | 🌐 Page | 💻 Code

  • A two-stage progressive training strategy that decouples localization from semantics for audio-visual semantic segmentation.

ECCV 2024
Sounding Object Segmentation Preference

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?

Yaoting Wang*, Peiwen Sun*, Yuanchao Li, Honggang Zhang, Di Hu

📄 Paper | 💻 Code

  • Leverages textual semantics to strengthen audio guidance and mitigate the sounding-object segmentation preference in AVS.

Interspeech 2023
Audio-Visual Person Verification

A Method of Audio-Visual Person Verification by Mining Connections between Time Series

Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu

📄 Paper

  • Mines temporal connections between audio and visual streams for robust audio-visual person verification.

🎖 Honors and Awards

  • Outstanding Graduate 2025, BUPT.
  • Outstanding Graduate 2022, BUPT.
  • China National Scholarship.
  • MCM/ICM Finalist Award (Mathematical Contest in Modeling).

🧑‍🔬 Academic Service

  • ICLR (2023 – Now), ICML (2024 – Now), NeurIPS (2024 – Now)
  • CVPR (2023 – Now), ECCV (2024 – Now), ICCV (2025 – Now)
  • ACM MM (2024 – Now), COLING (2024 – Now), ICASSP (2024 – Now), Interspeech (2024 – Now)

📖 Education

  • 2025 - now, Ph.D. Student, Multimedia Laboratory (MMLab), The Chinese University of Hong Kong.
  • 2022 - 2025, M.Eng., Beijing University of Posts and Telecommunications (BUPT).
  • 2018 - 2022, B.Eng., Beijing University of Posts and Telecommunications (BUPT).

💻 Internships

  • 2026, Research Intern, Huawei Hong Kong Research Institute.
  • 2025, Research Intern, Astribot Inc.
  • 2024, Research Intern, HKUST.
  • 2021, Research Intern, Tencent.
  • 2020, Research Intern, Megvii.

🎨 Hobbies

Life outside research keeps me curious.

📷 Photography

Capturing everyday moments and the places I travel to.

Gear I've shot with

  • Camera: Sony α7 V (ILCE-7M5) · Sony FE 24-70mm F2.8 GM II · Tamron 70-300mm F/4.5-6.3 Di III RXD
  • Action camera: Insta360 X5 · DJI Osmo Action 5 Pro · GoPro HERO11
  • Gimbal camera: DJI Osmo Pocket 3 · DJI Pocket 2

💻 Coding for daily life: small side projects & tools

Building small, handy tools that solve everyday problems.

  • WhoGoesConf: find which of a researcher’s Google Scholar co-authors also have a paper at a target conference (e.g., ECCV 2026).

🏔️ Outdoor sports: from 5000 km rides to 100+ dives
  • Cycling — 600 km in 6 days
  • Motorcycling — Beijing to Tibet, 5000 km
  • Hiking — Annapurna Base Camp (ABC) Trek, Nepal
  • Scuba diving — 100+ dives across Southeast Asia
  • Freediving — AIDA 3 diver & spearfishing
  • Surfing — I can do this all day
  • Sailing — RYA (Royal Yachting Association) certified