Methodological proposal to identify the nationality of Twitter users through random-forests

Damián Quijano; Richard Gil-Herrera

doi:10.1371/journal.pone.0277858

What is it about?

This article proposes a method to identify the nationality of Twitter users with the random forests algorithm. The method builds classification models from numerical features based on how frequently users interact with one another. The results show that the method can significantly improve accuracy in identifying users' nationality compared with the initial data.
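The summary does not list the exact features, so the following Python sketch only illustrates the general idea of turning interaction frequencies into numbers. It assumes a hypothetical input format: a list of (source, target, interaction type) tuples plus a mapping from already-labeled accounts to country codes. The function name interaction_features, the four interaction types and the country codes are illustrative assumptions, not the authors' definitions.

```python
from collections import Counter

# Hypothetical interaction types; the paper's actual feature set may differ.
INTERACTION_TYPES = ("retweet", "quote", "reply", "mention")


def interaction_features(interactions, user, countries):
    """Return {"<kind>_<country>": share} for one user.

    interactions: iterable of (source_user, target_user, kind) tuples.
    countries: dict mapping already-labeled accounts to a country code.
    Each value is the fraction of the user's interactions of that kind,
    counted only over labeled targets, that point to accounts from that country.
    """
    totals, counts = Counter(), Counter()
    for source, target, kind in interactions:
        if source != user or target not in countries:
            continue  # skip other users' activity and unlabeled targets
        totals[kind] += 1
        counts[(kind, countries[target])] += 1
    features = {}
    for kind in INTERACTION_TYPES:
        for country in sorted(set(countries.values())):
            total = totals[kind]
            features[f"{kind}_{country}"] = counts[(kind, country)] / total if total else 0.0
    return features


# Example with made-up accounts and country codes.
countries = {"a": "PA", "b": "PA", "c": "CO"}
interactions = [("u1", "a", "retweet"), ("u1", "b", "retweet"), ("u1", "a", "quote"),
                ("u1", "c", "mention"), ("u1", "z", "retweet"), ("u2", "c", "retweet")]
print(interaction_features(interactions, "u1", countries))
```

Normalizing counts into shares keeps very active and rarely active accounts on a comparable scale; whether the authors normalize this way is not stated in the summary.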


Why is it important?

This study demonstrates that small labeled samples are sufficient when random forests are used to generate the classification model. Retweets and quotes are the variables with the strongest influence on the classification. The model uses only numerical features derived from user metrics, not from the content of messages.
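To make the classification step concrete, here is a minimal, dependency-free random forest in Python: each tree is grown on a bootstrap resample of a small labeled sample, each split considers a random subset of features (about the square root of their number), and the forest predicts by majority vote. The tree count, depth limit, the toy data and the split_counts helper are assumptions for illustration; the summary does not state the authors' settings or importance measure, and in practice a library implementation such as scikit-learn's RandomForestClassifier (which exposes feature_importances_) would be the usual choice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, n_features, depth, max_depth, rng):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # A random feature subset at each split is what makes the forest "random".
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_features))
    if split is None:
        return majority
    f, t = split
    go_left = [row[f] <= t for row in rows]
    left = ([r for r, g in zip(rows, go_left) if g], [y for y, g in zip(labels, go_left) if g])
    right = ([r for r, g in zip(rows, go_left) if not g], [y for y, g in zip(labels, go_left) if not g])
    return (f, t,
            build_tree(*left, n_features, depth + 1, max_depth, rng),
            build_tree(*right, n_features, depth + 1, max_depth, rng))


def predict_tree(node, row):
    """Follow thresholds down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(rows, labels, n_trees=25, max_depth=3, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the labeled rows."""
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 n_features, 0, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def split_counts(forest):
    """How often each feature index is used for a split (a crude importance proxy)."""
    counts, stack = Counter(), list(forest)
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            counts[node[0]] += 1
            stack.extend(node[2:])
    return counts


# Hypothetical small labeled sample: [retweet share, quote share, mention share]
# of interactions with accounts from one country; labels are country codes.
rows = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.7, 0.8], [0.95, 0.85, 0.9],
        [0.1, 0.2, 0.3], [0.2, 0.1, 0.2], [0.3, 0.2, 0.1], [0.05, 0.15, 0.25]]
labels = ["PA"] * 4 + ["CO"] * 4
forest = train_forest(rows, labels)
print(predict_forest(forest, [1.0, 1.0, 1.0]), split_counts(forest))
```

Counting how often each feature is chosen for a split is only a rough stand-in for a proper importance measure, but it shows how one could check which variables (for example, retweet- and quote-based frequencies) a trained forest relies on.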

Perspectives

The proposed methodology does not require large amounts of manually labeled data, and it saves time by processing mainly numerical values rather than large volumes of text. I hope this article stimulates research in the computational social sciences.
Damian Quijano
Universidad Especializada de las Americas

This page is a summary of: Methodological proposal to identify the nationality of Twitter users through random-forests, PLoS ONE, January 2023, PLOS, DOI: 10.1371/journal.pone.0277858.

Resources

Open Access version
Methodological proposal to identify the nationality of Twitter users through random-forests
This research article proposes a methodology to identify the nationality of Twitter users through random forests. The publication is open access, and the data are freely accessible.

