What is it about?
The way people write on social media is very different from formal writing: it's often casual, uses slang, and has complex or unclear sentence structures. This makes it hard for computers to work out the grammar and the relationships between words, a task called Dependency Parsing (DP). Our research focuses on finding the best way for AI to do this for the Thai language as it is used on social networks. Instead of trying to analyze long, messy sentences, we break the text down into smaller, more manageable chunks called Elementary Discourse Units (EDUs), each of which usually conveys a single idea clearly. We then tested several AI models to see which one could most accurately figure out the grammatical structure of these EDUs in Thai social data.
Why is it important?
Understanding the grammar of social media text is crucial for many AI applications, like figuring out customer opinions from Facebook posts or answering questions based on online discussions. However, Thai social media language, with its unique slang and flexible grammar, poses a big challenge. Our work is important because:
- Better Parsing for Informal Thai: We identified that a specific AI model (an improved version of the Elkaref Dependency Parser, which is transition-based) works best for Thai social data, achieving an Unlabeled Attachment Score (UAS) of 81.42%. This means it's better at correctly identifying how words in a sentence relate to each other.
- EDUs as a Smart Input: We show that using shorter Elementary Discourse Units (EDUs) instead of full sentences makes parsing more effective for this type of informal text. This is because EDUs simplify the complex structures often found in social media posts.
- Enables Better Thai NLP Tools: By improving dependency parsing for Thai social data, we pave the way for more accurate high-level Thai NLP tools, like sentiment analyzers or systems that can automatically extract information from customer feedback on social channels.
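To make the Unlabeled Attachment Score mentioned above a little more concrete, here is a minimal sketch in Python. UAS is conventionally the share of words whose predicted head (the word they attach to) matches the reference annotation, ignoring the relation label. The function name `unlabeled_attachment_score` and the four-word example are hypothetical illustrations, not the evaluation script used in the paper, and real evaluations may differ in details such as how punctuation is counted.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[int], predicted_heads: Sequence[int]
) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head.

    Each sequence holds one head index per token; index 0 stands for the root.
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    # Count tokens attached to the correct head, ignoring relation labels.
    correct = sum(1 for g, p in zip(gold_heads, predicted_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical four-token EDU: the parser gets three of the four heads right.
gold = [2, 0, 2, 3]
pred = [2, 0, 2, 2]
score = unlabeled_attachment_score(gold, pred)
print(f"UAS = {score:.2%}")  # prints "UAS = 75.00%"
```

Read this way, a UAS of 81.42% means that roughly four out of every five words were attached to the correct head.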
Perspectives
Working on parsing Thai social media data was a fascinating challenge. We all know how different online chat language can be from textbook language – it’s full of slang, shortcuts, and often grammatically creative sentences! This 'messiness' is a big hurdle for traditional NLP tools. My goal was to find a practical way to make sense of these complex structures in Thai social posts, which are a huge source of information for businesses trying to understand their customers. The idea of using Elementary Discourse Units (EDUs) as the basic unit of analysis, instead of trying to wrangle entire convoluted sentences, seemed promising. It’s like breaking down a complex problem into smaller, more digestible pieces. It was really interesting to see how different parsing models handled this type of data and to find that a transition-based approach, specifically the improved Elkaref parser, came out on top for our social data. This research, I hope, provides a good foundation for building more robust tools to understand the true voice of Thai users on social media.
Sattaya Singkul
True Digital Group
Read the Original
This page is a summary of: Parsing Thai Social Data: A New Challenge for Thai NLP, October 2019, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/isai-nlp48611.2019.9045639.