What is it about?
Current research on text classification largely overlooks two things: part-of-speech features, and multi-channel models, which can learn richer text information than a single model. In addition, the usual way neural network models produce the final classification, a fully connected layer followed by a Softmax layer, leaves room for improvement. This paper proposes PAGNN-Stacking, a hybrid model for text classification that uses part-of-speech features. In the text representation stage, part-of-speech features are introduced so that text information is represented more accurately. In the feature extraction stage, a multi-channel attention gated neural network is used to learn the text information more fully. In the final classification stage, the Stacking algorithm is used to improve on the fully connected layer and Softmax layer: five machine learning algorithms serve as base classifiers, and the fully connected layer and Softmax layer serve as the meta classifier. Experiments on the IMDB, SST-2, and AG News datasets show that PAGNN-Stacking achieves significantly higher accuracy than the benchmark models.
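As a rough illustration of the text representation stage, the sketch below appends a part-of-speech indicator to each word vector. The summary does not say how the paper combines the two, so the concatenation, the toy word vectors, the three-tag set, and the names `WORD_VECS`, `POS_TAGS`, `one_hot`, and `represent` are all assumptions made for this example. The tags are hand-assigned rather than produced by a tagger.

```python
# Hypothetical toy word vectors; a real model would use learned or
# pretrained embeddings with many more dimensions.
WORD_VECS = {
    "the": [0.1, 0.3],
    "movie": [0.7, 0.2],
    "shines": [0.9, 0.8],
}

# Hypothetical, deliberately tiny tag set.
POS_TAGS = ["DET", "NOUN", "VERB"]


def one_hot(tag):
    """Return a one-hot vector marking the position of `tag` in POS_TAGS."""
    return [1.0 if tag == t else 0.0 for t in POS_TAGS]


def represent(tagged_tokens):
    """Concatenate each word vector with the one-hot vector of its POS tag.

    Concatenation is an assumption of this sketch, not the paper's
    confirmed method.
    """
    return [WORD_VECS[word] + one_hot(tag) for word, tag in tagged_tokens]


# Hand-tagged example sentence (word, part-of-speech tag).
sentence = [("the", "DET"), ("movie", "NOUN"), ("shines", "VERB")]
matrix = represent(sentence)

for (word, tag), row in zip(sentence, matrix):
    print(word, tag, row)
```

Each row now carries both lexical and grammatical information, which is the general idea behind adding part-of-speech features to the representation.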
Why is it important?
(1) In the text representation stage, the model introduces part-of-speech features to represent text information more accurately. (2) In the feature extraction stage, the model uses a multi-channel attention gated neural network to extract rich text features. (3) In the final classification stage, the paper proposes using the Stacking algorithm to improve on the fully connected layer and Softmax layer: five methods, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Gradient Boosting Decision Tree (GBDT), and Adaptive Boosting (AdaBoost), are used as the base classifiers, and the fully connected layer and Softmax layer are used as the meta classifier.
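The Stacking step in point (3) can be sketched as follows: the class probabilities predicted by the five base classifiers are joined into one meta-feature vector, which a fully connected layer followed by Softmax turns into the final prediction. This is a minimal sketch under stated assumptions, not the paper's implementation. The probability values are invented, the weights are set by hand rather than learned, and the function names `meta_features` and `meta_classify` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable Softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_features(base_probs):
    """Flatten the per-classifier probability vectors into one vector."""
    return [p for probs in base_probs for p in probs]


def meta_classify(features, weights, biases):
    """Fully connected layer followed by Softmax (the meta classifier)."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Invented base-classifier outputs for one sample: (negative, positive).
base_probs = [
    [0.2, 0.8],  # RF
    [0.4, 0.6],  # SVM
    [0.3, 0.7],  # KNN
    [0.1, 0.9],  # GBDT
    [0.6, 0.4],  # AdaBoost
]

features = meta_features(base_probs)

# Hand-set weights (an assumption): each class score sums that class's
# probability across the five base classifiers. A trained meta classifier
# would learn these values instead.
weights = [[1.0, 0.0] * 5, [0.0, 1.0] * 5]
biases = [0.0, 0.0]

probs = meta_classify(features, weights, biases)
predicted = probs.index(max(probs))
print(probs, predicted)
```

In a full Stacking setup the meta classifier is usually trained on base-classifier predictions for samples those classifiers did not see during their own training, typically via cross-validation, so that it does not simply learn to trust overfitted outputs.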
Read the Original
This page is a summary of: A hybrid model for text classification using part-of-speech features, Journal of Intelligent & Fuzzy Systems, July 2023, IOS Press, DOI: 10.3233/jifs-231699.
Resources
Stanford syntactic analyzer
The Stanford Syntactic Analyzer provides word segmentation, part-of-speech tagging, and analysis of the syntactic relationships between words in a sentence.
IMDB datasets
Dataset for Sentiment Analysis of Movie Reviews
AG_News dataset
The AG News dataset is a news topic classification dataset. It is built from the four most frequent categories in the AG News corpus, and each sample consists of a category label, a title, and a description. The categories are World, Sports, Business, and Sci/Tech.