What is it about?

This article proposes a method for identifying the nationality of Twitter users with the random forest algorithm. The method builds classification models from numerical features based on the frequency of interactions between users. The results show that it can significantly improve accuracy in identifying users' nationality compared with the initial data.


Why is it important?

This study demonstrates that a small labeled sample is enough to train a valid random forest classification model. Retweets and quotes are the variables that most strongly influence the classification. The model uses only numerical features and is based on user metrics, not on the content of messages.
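The idea can be illustrated with a minimal sketch. It is not the article's code or data: the feature names, the synthetic interaction counts, the binary national/other label, and the sample sizes are all assumptions made for illustration, and scikit-learn's RandomForestClassifier stands in for whatever implementation the study used. It trains on a small labeled sample of numerical interaction counts and then reports which features the forest relied on.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature names: how often a user interacts with accounts
# already known to belong to the target country.
FEATURES = ["retweets", "quotes", "replies", "mentions"]


def make_synthetic_users(n_users=400, seed=0):
    """Generate made-up interaction counts; label 1 = national, 0 = other."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n_users)
    # Toy assumption: nationals retweet and quote local accounts more often.
    retweets = rng.poisson(np.where(labels == 1, 12.0, 3.0))
    quotes = rng.poisson(np.where(labels == 1, 6.0, 1.0))
    # Replies and mentions carry no signal in this toy data.
    replies = rng.poisson(4.0, size=n_users)
    mentions = rng.poisson(5.0, size=n_users)
    features = np.column_stack([retweets, quotes, replies, mentions])
    return features, labels


X, y = make_synthetic_users()

# Keep only a small labeled sample (60 users) for training;
# the remaining users are used to check the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=60, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
importances = dict(zip(FEATURES, model.feature_importances_))

print(f"held-out accuracy: {accuracy:.2f}")
for name, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.2f}")
```

Because the toy data is built so that only retweets and quotes differ between the two groups, the printed importances should concentrate on those two features. With real data, the ranking would have to be read from the fitted model rather than assumed.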


The proposed methodology does not require large amounts of manually labeled data, and it saves time by processing mainly numerical values instead of large volumes of text. I hope this article stimulates further research in computational social science.

Damian Quijano
Universidad Especializada de las Americas

Read the Original

This page is a summary of: Methodological proposal to identify the nationality of Twitter users through random-forests, PLoS ONE, January 2023, PLOS, DOI: 10.1371/journal.pone.0277858.