About Me

Hi! I’m Haibin Wu, a senior applied scientist at Microsoft. I received my Ph.D. from National Taiwan University, where I worked with Prof. Hung-yi Lee and Prof. Lin-shan Lee on machine learning and speech processing. My expertise lies in speech language models, speech generation, neural audio codecs, speech enhancement, and deepfake detection. I was fortunate to be funded by a Google PhD Fellowship. I’m a main contributor to S3PRL, which has 2400+ GitHub stars. I also have a keen interest in photography, and you can find my portfolio on my homepage.

Selected Publications

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
Haibin Wu†, Yuxuan Hu†, Ruchao Fan, Xiaofei Wang, Kenichi Kumatani, Bo Ren, Jianwei Yu, Heng Lu, Lijuan Wang, Yao Qian, Jinyu Li
[ pdf]
Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation
Haibin Wu†, Yuxuan Hu†, Ruchao Fan, Xiaofei Wang, Heng Lu, Yao Qian, Jinyu Li
[ pdf]
On The Landscape of Spoken Language Models: A Comprehensive Survey
Haibin Wu†, Siddhant Arora†, Kai-Wei Chang†, Chung-Ming Chien†, Yifan Peng†, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
[ pdf]
TS3-Codec: Transformer-Based Simple Streaming Single Codec
Haibin Wu, Naoyuki Kanda, Sefik Emre Eskimez, Jinyu Li
Interspeech 2025
[ pdf]
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda
SLT 2024
[ pdf | Webpage | Github]
Ultra-Low Latency Speech Enhancement - A Comprehensive Study
Haibin Wu, Sebastian Braun
ICASSP 2025
[ pdf]
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee
ACL 2024 Findings
[ pdf | Github | Leaderboard | Huggingface]
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Haibin Wu, Yuan Tseng, Hung-yi Lee
Interspeech 2024
[ pdf | Webpage]
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
Haibin Wu, Kai-Wei Chang, Yuan-Kuei Wu, Hung-yi Lee
[ pdf | Webpage | Github]
Rethinking complex-valued deep neural networks for monaural speech enhancement
Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong
Interspeech 2023
[ pdf]

For the complete list, please visit Google Scholar.

Research Experience

Senior applied scientist at Microsoft Sep 2024 - Present
Research scientist intern at Microsoft May 2024 - Aug 2024
Research scientist intern at Microsoft Feb 2024 - May 2024
Research scientist intern at Meta May 2023 - Sep 2023
Applied scientist intern at Amazon Sep 2022 - Dec 2022
Research scientist intern at Meta May 2022 - Aug 2022
Visiting Student at the Chinese University of Hong Kong May 2021 - Apr 2022
Visiting Student at SIGS of Tsinghua University Aug 2020 - May 2021
Intern at Tencent Jan 2021 - May 2021

Challenges

2022 ICASSP ADD challenge track 2 Rank: 2/33
2022 ICASSP M2MeT challenge track 1 Rank: 2/14

Honors

Google student travel grant Google 2024
ICASSP travel grant ICASSP 2024
Interspeech travel grant Interspeech 2022
Appier Scholarship Appier 2022
Google PhD Fellowship Google 2021
Advanced Speech Technologies Scholarship NTU EECS 2020
Academic Achievement Award NCTU EECS 2019
Academic Achievement Award NCTU EECS 2018
National Scholarship Chinese Ministry of Education 2014