What is it about?

In corporate and academic environments, knowing the areas of expertise of different professionals is highly valuable. This information supports tasks such as finding experts in a particular field, identifying which researchers are potentially eligible for grants, and link prediction. We explored machine learning techniques for recognizing researchers' main areas of expertise, using several representations of the titles of their scientific publications as input to classification algorithms. Using a TF-IDF character n-gram representation of the title text, we surpassed the previous state-of-the-art results for this problem, achieving an accuracy of 95.91%.

Why is it important?

We proposed and compared several machine learning techniques for recognizing researchers' areas of expertise, using the titles of their scientific publications as the data source, with the aim of improving on related approaches. The titles were represented using different strategies, such as TF-IDF character n-grams (which have shown good results in text classification tasks) and word embeddings (namely, the Word2Vec approach). We found that character-level n-grams with TF-IDF outperform the word-level representation, the word embedding approach (Word2Vec), and previous approaches evaluated on the same dataset.
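
To make the idea of a TF-IDF character n-gram representation more concrete, here is a minimal sketch in plain Python. It is only an illustration, not the pipeline used in the paper: the example titles, the area labels, the n-gram range (2 to 4 characters), the smoothed IDF formula, and the nearest-centroid classifier are all assumptions chosen to keep the example short and self-contained.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max from a lowercased title."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def fit_idf(titles):
    """Smoothed inverse document frequency for every n-gram seen in the training titles."""
    doc_freq = Counter()
    for title in titles:
        doc_freq.update(set(char_ngrams(title)))
    n_docs = len(titles)
    return {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}


def tfidf_vector(title, idf):
    """L2-normalised sparse TF-IDF vector; n-grams unseen in training are dropped."""
    counts = Counter(g for g in char_ngrams(title) if g in idf)
    vec = {g: c * idf[g] for g, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {g: v / norm for g, v in vec.items()}


def fit_centroids(titles, labels, idf):
    """Sum the TF-IDF vectors of each area into one centroid per label."""
    centroids = defaultdict(Counter)
    for title, label in zip(titles, labels):
        centroids[label].update(tfidf_vector(title, idf))
    return centroids


def predict(title, idf, centroids):
    """Pick the area whose centroid has the highest cosine similarity to the title."""
    vec = tfidf_vector(title, idf)

    def score(label):
        centroid = centroids[label]
        norm = math.sqrt(sum(v * v for v in centroid.values())) or 1.0
        return sum(v * centroid.get(g, 0.0) for g, v in vec.items()) / norm

    return max(centroids, key=score)


# Hypothetical toy data, invented for this sketch (not taken from the paper's dataset).
titles = [
    "Deep learning for image classification",
    "Neural networks for text classification",
    "Protein folding in bacterial cells",
    "Gene expression in plant cells",
]
labels = ["computer science", "computer science", "biology", "biology"]

idf = fit_idf(titles)
centroids = fit_centroids(titles, labels, idf)

print(predict("Convolutional neural network classifiers", idf, centroids))
print(predict("Bacterial gene expression", idf, centroids))
```

Because the features are short character sequences rather than whole words, a title containing "classifiers" still shares many n-grams with training titles containing "classification". This tolerance to inflection and spelling variation is one plausible reason such representations can do well on short texts like titles.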

Perspectives

The proposed approach proved promising for identifying a researcher's area of expertise using only the text of the titles of their publications. In future work, our group intends to apply this approach to other text classification domains.

Dr. Luciano Digiampietri
Universidade de Sao Paulo Campus da Capital

Read the Original

This page is a summary of: Improving researcher’s area of expertise identification using TF-IDF Characters N-grams, June 2021, ACM (Association for Computing Machinery),
DOI: 10.1145/3466933.3466984.