About Me

Hi! I’m Haibin Wu, a senior applied scientist at Microsoft. I received my Ph.D. from National Taiwan University, where I worked with Prof. Hung-yi Lee and Prof. Lin-shan Lee on machine learning and speech processing. My expertise lies in speech language models, speech generation, neural audio codecs, speech enhancement, and deepfake detection. I was fortunate to be funded by a Google PhD Fellowship. I’m a main contributor to S3PRL, which has 2400+ GitHub stars. I have a keen interest in photography, and you can find my portfolio on my homepage.

Selected Publications

  • Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model
    Haibin Wu†, Yuxuan Hu†, Ruchao Fan, Xiaofei Wang, Kenichi Kumatani, Bo Ren, Jianwei Yu, Heng Lu, Lijuan Wang, Yao Qian, Jinyu Li
    [ pdf]

  • Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation
    Haibin Wu†, Yuxuan Hu†, Ruchao Fan, Xiaofei Wang, Heng Lu, Yao Qian, Jinyu Li
    [ pdf]

  • On The Landscape of Spoken Language Models: A Comprehensive Survey
    Haibin Wu†, Siddhant Arora†, Kai-Wei Chang†, Chung-Ming Chien†, Yifan Peng†, Yossi Adi, Emmanuel Dupoux, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
    [ pdf]

  • TS3-Codec: Transformer-Based Simple Streaming Single Codec
    Haibin Wu, Naoyuki Kanda, Sefik Emre Eskimez, Jinyu Li
    Interspeech 2025
    [ pdf]

  • Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
    Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda
    SLT 2024
    [ pdf | Webpage | Github]

  • Ultra-Low Latency Speech Enhancement - A Comprehensive Study
    Haibin Wu, Sebastian Braun
    ICASSP 2025
    [ pdf]

  • Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
    Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-yi Lee
    ACL 2024 Findings
    [ pdf | Github | Leaderboard | Huggingface]

  • CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
    Haibin Wu, Yuan Tseng, Hung-yi Lee
    Interspeech 2024
    [ pdf | Webpage]

  • SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
    Haibin Wu, Kai-Wei Chang, Yuan-Kuei Wu, Hung-yi Lee
    [ pdf | Webpage | Github]

  • Rethinking complex-valued deep neural networks for monaural speech enhancement
    Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong
    Interspeech 2023
    [ pdf]

For the complete list, please visit Google Scholar.

Research Experience

  • Senior applied scientist at Microsoft Sep 2024 - Now

  • Research scientist intern at Microsoft May 2024 - Aug 2024

  • Research scientist intern at Microsoft Feb 2024 - May 2024

  • Research scientist intern at Meta May 2023 - Sep 2023

  • Applied scientist intern at Amazon Sep 2022 - Dec 2022

  • Research scientist intern at Meta May 2022 - Aug 2022

  • Visiting student at the Chinese University of Hong Kong May 2021 - Apr 2022

  • Visiting student at SIGS of Tsinghua University Aug 2020 - May 2021

  • Intern at Tencent Jan 2021 - May 2021

Challenge

Honors

  • Google student travel grant Google 2024

  • ICASSP travel grant ICASSP 2024

  • Interspeech travel grant Interspeech 2022

  • Appier Scholarship Appier 2022

  • Google PhD Fellowship Google 2021

  • Advanced Speech Technologies Scholarship NTU EECS 2020

  • Academic Achievement Award NCTU EECS 2019

  • Academic Achievement Award NCTU EECS 2018

  • National Scholarship Chinese Ministry of Education 2014