What is it about?

It's hard to quickly find negative comments (like complaints, toxic language, or sarcasm) in large amounts of spoken Thai audio, such as online videos. Our research introduces an AI-powered toolkit designed to do this much faster and more accurately for the Thai language. Think of it as a smart assistant that works in four steps:
- First, it quickly scans the audio for specific keywords of interest, using a method called Query-by-Example Spoken Term Detection (QbE-STD). This focuses the analysis and reduces processing time.
- Then, it works out who is speaking and when, using Speaker Diarization (SD).
- Next, it converts their speech into text using Automatic Speech Recognition (ASR).
- Finally, it analyzes this text to detect negative sentiment, toxic language, and sarcasm.
We've used techniques such as transfer learning to make these tools work well for Thai, a language that often lacks the large datasets AI typically needs for optimal performance.
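
The four-step flow can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_pipeline`, the stage signatures, and the toy stand-ins are all hypothetical, and the toy keyword spotter matches transcript text rather than an audio example as real QbE-STD would. What it illustrates is the ordering: diarization, ASR, and classification run only on the regions the keyword stage returns, which is how keyword spotting can cut processing time.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Region = Tuple[float, float]  # (start, end) in seconds


@dataclass
class Finding:
    speaker: str
    start: float
    end: float
    text: str
    label: str


def run_pipeline(
    audio: Any,
    spot_keywords: Callable[[Any], List[Region]],                # QbE-STD stage (hypothetical)
    diarize: Callable[[Any, Region], List[Tuple[str, Region]]],  # SD stage (hypothetical)
    transcribe: Callable[[Any, Region], str],                    # ASR stage (hypothetical)
    classify: Callable[[str], str],                              # text classifier (hypothetical)
) -> List[Finding]:
    findings: List[Finding] = []
    # Step 1: keep only regions around keyword hits; the rest of the audio is skipped.
    for region in spot_keywords(audio):
        # Step 2: split each region into speaker turns.
        for speaker, (start, end) in diarize(audio, region):
            # Step 3: speech-to-text for this turn only.
            text = transcribe(audio, (start, end))
            # Step 4: label the transcript (e.g. negative, toxic, sarcastic, neutral).
            findings.append(Finding(speaker, start, end, text, classify(text)))
    return findings


# --- Toy stand-ins: "audio" is a list of (start, end, speaker, transcript) tuples ---
toy_audio = [
    (0.0, 4.0, "A", "hello and welcome"),
    (4.0, 9.0, "B", "the delivery was so slow"),
    (9.0, 12.0, "A", "thanks for the feedback"),
]
asr_calls: List[Region] = []  # records which turns reached the ASR stage


def toy_spot(audio):
    # Stand-in for QbE-STD: a real system would match a spoken query, not text.
    return [(s, e) for s, e, _, t in audio if "slow" in t]


def toy_diarize(audio, region):
    # Return every turn that lies fully inside the region, with its speaker.
    return [(spk, (s, e)) for s, e, spk, _ in audio if s >= region[0] and e <= region[1]]


def toy_transcribe(audio, turn):
    asr_calls.append(turn)
    return next(t for s, e, _, t in audio if (s, e) == turn)


def toy_classify(text):
    return "negative" if "slow" in text else "neutral"


findings = run_pipeline(toy_audio, toy_spot, toy_diarize, toy_transcribe, toy_classify)
print(findings)
print(asr_calls)  # only the single flagged turn was sent to ASR
```

Swapping in real models would mean replacing the four toy functions; `run_pipeline` itself stays the same.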


Why is it important?

Manually sifting through hours of Thai audio for negative feedback is slow and difficult, and automating the task is hard because Thai has characteristics that challenge AI, such as ambiguous word boundaries (written Thai does not put spaces between words) and a general lack of AI training data. Our work offers a significant improvement because:
- Speed and efficiency: Our integrated system, especially when using keyword spotting (QbE-STD), dramatically cuts the time needed to find and analyze negative feedback. For instance, processing audio files averaging about 5.5 minutes (331.1 seconds) took under 50 seconds (48.4 seconds) with keyword spotting, compared with over 50 minutes (3014.2 seconds) without it. This makes large-scale analysis practical.
- Improved Thai language understanding: By applying transfer learning, we've improved the accuracy of speech-to-text conversion and negative feedback detection (covering toxicity, sarcasm, and sentiment) specifically for Thai. This approach helps overcome the limited data available for under-resourced languages like Thai.
This research paves the way for better content moderation, more responsive customer service, and more informed decision-making based on audio feedback in Thai and potentially other similar languages.
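
The reported timings can be turned into a speedup and a real-time factor (RTF: processing time divided by audio duration, where values below 1 mean faster than real time). These are derived here from the averages quoted in the summary; they are not figures stated in the paper, and `real_time_factor` is a hypothetical helper name.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration (< 1 means faster than real time)."""
    return processing_s / audio_s


AUDIO_S = 331.1         # average file length reported in the summary (seconds)
WITH_QBE_S = 48.4       # average processing time with keyword spotting (seconds)
WITHOUT_QBE_S = 3014.2  # average processing time without keyword spotting (seconds)

speedup = WITHOUT_QBE_S / WITH_QBE_S
print(f"speedup: {speedup:.1f}x")                                               # 62.3x
print(f"RTF with QbE-STD: {real_time_factor(WITH_QBE_S, AUDIO_S):.3f}")         # 0.146
print(f"RTF without QbE-STD: {real_time_factor(WITHOUT_QBE_S, AUDIO_S):.2f}")   # 9.10
```

In other words, with keyword spotting the processing took about 15% of the audio's own length, while without it processing took roughly nine times as long as the audio.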

Perspectives

I was excited to work on this framework with the team because it tackles a very real-world problem: the sheer volume of online audio content and the pressing need for organizations to quickly understand user feedback, especially negative comments. Developing this for Thai was particularly rewarding. Thai is a language with rich nuances that AI often misses, and there's a significant gap in tools that can handle its specific complexities, such as unclear word boundaries in text and limited speech data. The ability to combine different AI tools (pinpointing keywords, identifying different speakers, transcribing speech accurately, and then analyzing the resulting text for subtle negative cues like sarcasm or toxicity) into one efficient pipeline is a key step forward. I believe this work not only provides a practical solution for businesses and content platforms dealing with Thai audio but also offers valuable insights and a methodological basis for developing similar advanced audio analytics tools for other under-resourced languages. Our hope is to make audio analytics more accessible and powerful, enabling a better understanding of diverse voices.

Sattaya Singkul
True Digital Group

Read the Original

This page is a summary of: An Enhanced Multimodal Negative Feedback Detection Framework with Target Retrieval in Thai Spoken Audio, July 2024, Institute of Electrical & Electronics Engineers (IEEE).
DOI: 10.1109/icmew63481.2024.10645364.