Peiwen Sun

I am a researcher at BUPT, focused on machine learning for emotion/sentiment understanding and multimodal understanding. After graduating from BUPT, I was recommended for admission to the master's program at the College of Artificial Intelligence at BUPT. I have recently been a visiting student at Renmin University of China, supervised by Di Hu. I plan to pursue a PhD in the future.

I previously interned at Megvii and Tencent, working on research and projects in audio-visual emotion/sentiment understanding and multimodal understanding.

CV  /  GitHub  /  Page

Recent News

🎉7/16/2024: 2 papers were accepted to ACM MM 2024. See you in Melbourne.

🎉7/1/2024: 3 papers were accepted to ECCV 2024. See you in Milan.

Selected Papers

My research focuses on audio-visual learning and multimodal machine learning, with downstream tasks such as person recognition and emotion/sentiment recognition in video, including the use of sound and images to learn better representations.

Unveiling and Mitigating Bias in Audio Visual Segmentation
Peiwen Sun, Honggang Zhang, Di Hu

ACM MM (ORAL), 2024
Arxiv

Unveiling and mitigating bias caused by real-world inherent preferences and distributions in Audio-Visual Segmentation.

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes
Yaoting Wang*, Peiwen Sun*, Dongzhan Zhou*, Guangyao Li, Honggang Zhang, Di Hu
*: equal contribution

ECCV, 2024
Arxiv

A novel referring segmentation task that uses audio, visual, and temporal information.

Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang*, Peiwen Sun*, Yuanchao Li, Honggang Zhang, Di Hu
*: equal contribution

ECCV, 2024
Arxiv

The capabilities of LLMs can be used to mitigate segmentation preference in Audio-Visual Segmentation.

A Method of Audio-Visual Person Verification by Mining Connections between Time Series
Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu

INTERSPEECH, 2023
ISCA

A novel audio-visual strategy in person verification that considers connections between time series from a generative perspective.

More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal fusion Based on Information Theory
Peiwen Sun, Yifan Zhang, Zishan Liu, Donghao Chen, Honggang Zhang

arxiv, 2024
Arxiv

This paper reconsiders the information fused in the multimodal case from a bionics perspective and proposes a simple, plug-and-play attention module for vanilla fusion, based on fundamental information theory and uncertainty theory.

Other papers

I have also worked in a variety of other research directions, such as few-shot learning, medical image processing, and mathematical modelling.

FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Bianweizhen, Shulin He, Wei Xue, Qifeng Liu, Yike Guo

ACM MM, 2024
Arxiv

A large-scale zero-shot speech synthesis system that requires approximately 5% of the inference time of previous work.

Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation
Juncheng Ma, Peiwen Sun, Yaoting Wang, Di Hu

ECCV, 2024
Arxiv

A two-stage training strategy for AVS that decomposes the AVSS task into two simpler subtasks, from localization to semantic understanding, to achieve step-by-step global optimization.

A Cascading Patch-based Learning Framework for Few-shot Classification
Wangding Zeng*, Peiwen Sun*, Xinying Zhao, Guanlin Wu, Yang Tang, Honggang Zhang
*: equal contribution

IJCNN 2024  
Not yet released

In this paper, we propose a series of strategies to train a better backbone for few-shot learning.

Predicting Central Cervical Lymph Node Metastasis of Papillary Thyroid Carcinomas Using Multi-view Ultrasound Images
Zishan Liu, Peiwen Sun, Donghao Chen, Honggang Zhang, Yingying Li

MICAD 2023  
Springer

In this paper, a popular semantic segmentation network is applied for the first time to thyroid nodule classification in ultrasound images.

Race Against Fire
Peiwen Sun, Wenjing Ye, Wenqing Yu
Mathematical Contest In Modeling, 2021

Finalist Award  
github

A modeling scheme for the layout and number of drones to extinguish and monitor wildfires in southern Australia.