Empathy isn’t just a nice-to-have; it’s often what turns a frustrating experience into a satisfying one. Yet as companies automate more frontline interactions, most chatbots still struggle to understand how users feel, resulting in emotionally tone-deaf responses.

Perhaps the root of the problem is that most chatbots are built on text-based large language models (LLMs). Spoken language carries emotional weight through pitch, tone, volume and pacing, elements that do not translate well into written words. When voice interactions are transcribed into plain text, much of that nuance is lost, making it harder for chatbots to detect how a user truly feels, shares Dr Aw Ai Ti, department head of Aural and Language Intelligence at A*STAR Institute for Infocomm Research (I²R).

She adds that text-based LLMs depend on an initial transcription step, and any error at this stage can ripple through the system, causing the chatbot to misinterpret the user’s intent and respond inappropriately.

