I am a PhD student at the Centre for Speech Technology Research (CSTR), affiliated with the Institute for Language, Cognition and Computation (ILCC), University of Edinburgh. I am fortunate to be advised by Dr. Catherine Lai and Prof. Peter Bell, and I am fully funded by the School of Informatics. I was an Enrichment student at the Alan Turing Institute and a research intern at the Microsoft Research Audio and Acoustics Research Group.

My research aims to advance spoken language technologies in real-world applications by bridging distinct but related domains, such as speech and language, emotion and health, and humans and machines. In particular, my work focuses on problems that hinder the broader use of spoken language technologies in the wild.

Before my PhD, I researched affective computing and human-robot interaction at Honda Innovation Lab, Hiroshi Ishiguro Lab (ATR), and the Speech and Audio Processing Lab. I was fortunate to be advised by Prof. Tatsuya Kawahara, Prof. Nigel Ward, and Dr. Carlos Ishi.

🔥 News

05.2025, Two papers accepted to Interspeech 2025.
03.2025, One paper accepted to ICME 2025: Addressing Emotion Bias in Music Emotion Recognition and Generation with Frechet Audio Distance
02.2025, Received the ICASSP Travel Grant from the IEEE Signal Processing Society. 🎉
12.2024, Four papers accepted to ICASSP 2025 (two as the first author and two as the project leader)!
11.2024, Our special session Responsible Speech Foundation Models II has been accepted to Interspeech 2025. Submit your papers and compete for the Best Paper Award!
09.2024, Our SpandLDeteriorate workshop has been accepted to ACM MM Asia 2024. Looking forward to your papers!
08.2024, Three papers accepted to SLT 2024 (two as the first author and one as the co-first author)!
04.2024, Our GenSEC challenge has been accepted to SLT 2024. Looking forward to your papers!
03.2024, We won 1st place (and $1,000) out of 31 teams in Task 1 (Categorical Emotion Recognition) of the Odyssey 2024 Emotion Recognition Challenge. 🎉
02.2024, One paper accepted to ICASSP 2024 SASB workshop: Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
01.2024, Our special session Responsible Speech Foundation Models has been accepted to Interspeech 2024. Looking forward to your papers!
11.2023, Microsoft FADTK, a Fréchet audio distance toolkit, has been released; I contributed its speech models.
09.2023, Received the IEEE SPS Scholarship from the IEEE Signal Processing Society. 🎉
09.2023, Received the Outstanding Paper Award at the SAI workshop, ACII 2023, MIT Media Lab. 🎉
06.2023, Our grant proposal "Development of A Human-Centric Elderly Driving Education System" (as Co-Investigator) has been accepted by the Inter-University Research Institute Corporation, Research Organization of Information and Systems.
03.2023, Received the Gary Marsden Travel Award from ACM SIGCHI. 🎉

🎖 Honors and Awards

2025, ICASSP Travel Grant, IEEE Signal Processing Society
2024, 3rd Place Award, 3-Min Thesis Competition, Students of Society for Affective Science
2024, 1st Place Award, Odyssey 2024 Emotion Recognition Challenge
2023, IEEE SPS Scholarship, IEEE Signal Processing Society
2023, Outstanding Paper Award, SAI workshop, ACII 2023
2023, Gary Marsden Travel Award, ACM SIGCHI
2022, Enrichment Student Award, Alan Turing Institute
2021, Fully-Funded PhD Scholarship, University of Edinburgh
2016, Seiwa International Scholarship, Kyoto University
2013, 3rd Class Academic Excellence Scholarship, NUPT

💻 Research Activities

- Organizing Committee -

Workshop at ICMI 2025: Holistic and Responsible Affective Intelligence
Special Session at ASRU 2025: Responsible Speech and Audio Generative AI
Workshop at ACM MM Asia 2024: Multi-Biological Sensing Data for Speech and Language Deterioration Prediction
Challenge at SLT 2024: GenSEC (formerly GenASR)
Special Session at Interspeech 2024: Responsible Speech Foundation Models
UK Special Interest Group in Speech-Based Multimodal Information Processing
UK Speech 2022

- Program Committee -

Interspeech Young Female Researchers in Speech Workshop 2024, 2022
ICMI 2021 Doctoral Consortium

- Journal Review -

IEEE Transactions on Affective Computing (2)
Computer Speech and Language (4)
IEEE Transactions on Audio, Speech and Language Processing (1)
Speech Communication (1)
Computers in Human Behavior (1)
Journal of Rehabilitation and Assistive Technologies Engineering (1)
Pattern Analysis and Applications (1)
Robotics and Autonomous Systems (1)

- Conference Review -

ICASSP’23-25, Interspeech’23-25, ICME’25, IJCNN’25, ASRU’23-25, SLT’22-24, UK Speech’22
Interspeech Young Female Researchers in Speech Workshop’24 & ’22
CHI’23 Late-Breaking Work
IJCLR’23 CogAI Workshop
ICMI’21 Late-Breaking Report & Doctoral Consortium
HRI’20 Late-Breaking Report

- Organizations & Communities -

ACM, AAAC, ISCA, IEEE, IEEE Signal Processing Society, SIGCHI, UK Speech, UK-SIGMM, Alan Turing Institute

🎙 Talks

12.2024, “Opportunities and Challenges of Language Emotion in Real Applications: LLMs, Multimodal Incongruity, and Human-Robot Interaction”. Tsinghua Laboratory of Brain and Intelligence, Tsinghua University (host: Prof. Dan Zhang)
12.2024, “Multi-view Cognitive State Detection Based on Pre-trained Speech and Language Models”. Speech and Audio Technology Lab, Tsinghua University (host: Prof. Wei-Qiang Zhang)
03.2024, “Opportunities and Challenges of Speech Emotion Recognition in the Era of Foundation Models”. Center for Interdisciplinary Research in Language Sciences, University of Science and Technology of China (host: Prof. Jiahong Yuan)
11.2020, “Affective Human-Robot Interaction”. Cognitive Developmental Robotics Lab, University of Tokyo (host: Prof. Yukie Nagai)

💰 Grants

06.2023, “Development of A Human-Centric Elderly Driving Education System”, Co-Investigator, ¥800,000. Strategic Research Project “2023-SRP-06”, Research Organization of Information and Systems

👔 Experiences

- Teaching -

TA (Coursework marker), Automatic Speech Recognition, University of Edinburgh, 2023 & 2024
TA (Tutor, demonstrator, and project marker), System Design Project, University of Edinburgh, 2023
TA (Coursework and exam marker), Machine Learning, University of Edinburgh, 2022 & 2024

- Supervision -

Cross-lingual Speech Emotion Recognition and Speech Emotion Diarisation: A Comparative Study between Humans and Machines

Zhichen Han, MSc dissertation 2024/25 (Distinction), University of Edinburgh
Revisiting the Shared Suprasegmental Acoustics Between Emotional Speech and Song through Self-Supervised Learning Models

Yujia Sun, MSc dissertation 2024/25 (Distinction), University of Edinburgh
Layerwise Analysis of HuBERT Acoustic Word Embeddings in the Context of Speech Emotion Recognition

Alexandra Saliba, MSc dissertation 2023/24 (Distinction), University of Edinburgh
Hierarchical Cross-Modal Transformer and A Study of Cross-Modal Attention for Affective Computing

Yaoting Wang, MSc dissertation 2022/23 (Distinction), University of Edinburgh
A Cross-Domain Study of Crossmodal Attention Based Multimodal Emotion Recognition

Junling Liu, MSc dissertation 2021/22, University of Edinburgh

- Working -

Research Intern, Microsoft Research Audio and Acoustics Group
Researcher, Honda R&D Innovation Lab
R&D Engineer, NTT Data R&D headquarters
Student Researcher, ERATO ISHIGURO Symbiotic HRI Project, ATR

📖 Education

Ph.D. Candidate, Informatics, University of Edinburgh
M.Sc., Intelligence Science and Technology, Kyoto University
B.Eng., Electronic and Information Engineering, Nanjing University of Posts and Telecommunications

📝 Publications

- Papers -

See my Google Scholar profile.

- Patents -

Feeling estimation device, feeling estimation method, and storage medium. US11107464B2, JP2020091302A, CN111341349A
Information processing apparatus, information processing method, and storage medium. US11443759B2, JP2021026130A, CN112349301A
Information-processing device, vehicle, computer-readable storage medium, and information-processing method. US11710499B2, JP2021124642A, CN113221933A

- Technical Reports -

Crossmodal ASR Error Correction with Discrete Speech Units

Yuanchao Li, Pinzhen Chen, Peter Bell, Catherine Lai. UK Speech. 2024
Multimodal Dyadic Impression Recognition via Listener Adaptive Cross-Domain Fusion

Yuanchao Li, Peter Bell, Catherine Lai. UK Speech. 2023
Exploration of A Self-Supervised Speech Model: A Study on Emotional Corpora

Yuanchao Li, Yumnah Mohamied, Peter Bell, Catherine Lai. UK Speech. 2022
An Extensible End-to-End Multitask Learning Model for Recognizing Driver States

Yuanchao Li. The 12th Honda R&D Technical Forum. 2019
Processing User States in Spoken Dialog Systems for Human-Robot Interaction

Yuanchao Li. International Design Symposium in Kyoto. 2017
Assessment Selection for Human-Robot Interaction based on Emotion Recognition Combining Prosody and Text Information

Yuanchao Li, Tatsuya Kawahara. The 44th Kansai Joint Speech Seminar. 2016

- Book Translation -

The Easiest Handbook for Machine Learning Project: How to Implement AI (Japanese to Chinese)

いちばんやさしい機械学習プロジェクトの教本 – 人気講師が教えるAIを導入する方法

超简单的机器学习 – 人气讲师为你讲解AI在工作中的应用
The Easiest Handbook for Artificial Intelligence Business: Commercializing AI and Machine Learning (Japanese to Chinese)

いちばんやさしい人工知能ビジネスの教本 – 人気講師が教えるAI・機械学習の事業化

超简单的人工智能 – 人气讲师为你讲解AI商业应用

- Media Articles -

Amazon is Building its Grocery Empire. Synced Review
Apple is in a Dilemma on iPhone’s 10-year-old Birthday. Synced Review
Conversational Systems: A General Review. Synced Review
Does Fitness Data Make the Average Person Healthier? Synced Review
25 Tweets to Know You: A New Model to Predict Personality with Social Media. Synced Review
Why AlphaGo is not AI. Synced Review
Artificial Intelligence is the New Electricity – Andrew Ng. Synced Review
The Time to Marry AI May Come Soon. Synced Review
ERICA: The ERATO Intelligent Conversational Android. Synced Review
Statistical Spoken Dialogue Systems and the Challenges for Machine Learning. Synced Review
Emotional Intelligence is the Future of Artificial Intelligence. Synced Review

Yuanchao Li