What is it about?

Large language models (LLMs) are becoming more capable, including the ability to understand and reply in many different languages. Although they are trained to refuse harmful or illegal requests, they can be more vulnerable to such attacks in languages with less training data or fewer human reviewers. This study introduces a framework to automatically check how exposed popular LLMs are across multiple languages. The authors tested six models in eight languages with different levels of data support and validated the tool's results against human evaluations in two of those languages. They found that LLMs tend to be weaker in low-resource languages, but the attacks that succeed there often produce unclear or nonsensical answers, which may limit their real-world impact.
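
To make the idea concrete, the sketch below shows one way such an automated multilingual check could work: the same harmful-intent test prompts are posed in several languages, the model's replies are screened for refusals, and the non-refusal rate is compared across languages. This is a minimal illustration under our own assumptions, not the authors' actual framework; names such as `PROMPTS_BY_LANGUAGE`, `query_model`, and the keyword-based refusal check are placeholders.

```python
from typing import Callable, Dict, List

# Hypothetical prompt set: harmful-intent test prompts translated into each
# target language ahead of time. The paper's actual prompts and languages differ.
PROMPTS_BY_LANGUAGE: Dict[str, List[str]] = {
    "en": ["<harmful test prompt 1>", "<harmful test prompt 2>"],
    "vi": ["<Vietnamese translation 1>", "<Vietnamese translation 2>"],
}

# Crude refusal heuristic; the framework in the paper scores responses more carefully.
REFUSAL_MARKERS = ["i can't", "i cannot", "i'm sorry", "cannot help"]


def looks_like_refusal(response: str) -> bool:
    """Flag a reply as a refusal if it contains a known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def assess_model(query_model: Callable[[str], str]) -> Dict[str, float]:
    """Return, per language, the fraction of prompts the model did NOT refuse."""
    rates: Dict[str, float] = {}
    for lang, prompts in PROMPTS_BY_LANGUAGE.items():
        complied = sum(not looks_like_refusal(query_model(p)) for p in prompts)
        rates[lang] = complied / len(prompts)
    return rates


if __name__ == "__main__":
    # Stand-in "model" that refuses everything; swap in a real API call to test an LLM.
    rates = assess_model(lambda prompt: "I'm sorry, I can't help with that.")
    for lang, rate in rates.items():
        print(f"{lang}: non-refusal rate = {rate:.0%}")
```

A higher non-refusal rate in one language than another would point to a language-specific weak spot of the kind the study measures.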

Why is it important?

Global safety: As LLMs are used worldwide, they must behave responsibly in all languages, not just English. If they respond dangerously or inappropriately in low-resource languages, that could lead to real harm.

Fairness and equity: Users of less-common languages deserve the same level of safety and performance. Ignoring this could reinforce linguistic inequality in AI systems.

Security concerns: Attackers could exploit these weak spots in underrepresented languages to bypass safety filters and extract harmful or prohibited content.

Read the Original

This page is a summary of: A Framework to Assess Multilingual Vulnerabilities of LLMs, May 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3701716.3715581.

Contributors

The following have contributed to this page