What is it about?

The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy.

Featured Image

Why is it important?

Document classification is an important problem to address, given the growing size of scientific literature and other document sets. When documents are organized hierarchically, multi-class approaches are difficult to apply using traditional supervised learning methods. This paper introduces a new approach to hierarchical document classification, HDLTex, that combines multiple deep learning approaches to produce hierarchical classifications. Testing on a data set of documents obtained from the Web of Science shows that combinations of RNN at the higher level and DNN or CNN at the lower level produced accuracies consistently higher than those obtainable by conventional approaches using naıve Bayes or SVM. These results show that deep learning methods can provide improvements for document classification and that they provide flexibility to classify documents within a hierarchy. Hence, they provide extensions over current methods for document classification that only consider the multi-class problem.

Read the Original

This page is a summary of: HDLTex: Hierarchical Deep Learning for Text Classification, December 2017, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/icmla.2017.0-134.
You can read the full text:

Read

Resources

Contributors

The following have contributed to this page