Enhancement to Low-Resource Text Classification via Sequential Transfer Learning
- Author(s): Neil Christian R. Riego, Danny Bell Villarba, Ariel Antwaun Rolando C. Sison, Fernandez C. Pineda, and Herminiño C. Lagunzad
PAPER DETAILS
- Computer Science and Engineering
- Paper ID: UIJRTV4I80009
- Volume: 04
- Issue: 08
- Pages: 72-82
- June 2023
- ISSN: 2582-6832
Abstract
The volume of textual data on many platforms has increased dramatically in recent years. Data at this scale makes text classification tasks such as sentiment analysis and hate speech recognition widely feasible. However, the lack of NLP tools for low-resource languages, such as those spoken in Asia and Africa, limits how well this data can be leveraged. We make three contributions. First, we provide a Tagalog product review dataset as a baseline for sentiment analysis tasks. Second, we pretrain a Tagalog variant of XLNet and finetune it on two datasets, reaching 78.05% accuracy on the hate speech dataset and 95.02% on the Shopee dataset, which is 0.33% and 3.87% higher than the benchmark RoBERTa-Tagalog model, respectively. Third, we apply bootstrap aggregation (bagging) in the finetuning step, which improves accuracy by 0.16% when three XLNet-Tagalog models are each finetuned on 70% of the data. Furthermore, combining RoBERTa-Tagalog and XLNet-Tagalog, each finetuned on 100% of the data, yields an accuracy of 79.47%, a 1.26% improvement over the best-performing setup that uses XLNet-Tagalog alone. Finally, XLNet-Tagalog degrades slower than the benchmark model by 4.53. We make all our models and datasets available to the research community.
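The bagging step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 70% subsets were drawn or how the three models' outputs were combined, so the sketch assumes sampling with replacement and hard majority voting. The function names `bootstrap_sample` and `majority_vote` are hypothetical, and the finetuning of each model is left out entirely.

```python
import random
from collections import Counter


def bootstrap_sample(examples, fraction, rng):
    """Draw round(fraction * len(examples)) items with replacement.

    Assumption: sampling with replacement; the abstract only states
    that 70% of the data is used for finetuning.
    """
    k = round(fraction * len(examples))
    return [rng.choice(examples) for _ in range(k)]


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per example.

    Assumption: hard voting. With three models and two classes
    there can be no ties.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)] for the top label
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    data = list(range(10))  # stand-in for labeled training examples
    subsets = [bootstrap_sample(data, 0.7, rng) for _ in range(3)]
    print([len(s) for s in subsets])

    # Stand-in predictions from three finetuned models on three test items
    preds = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
    print(majority_vote(preds))
```

In a setup like this, each subset would be used to finetune one model, and the voted labels are what accuracy is measured on.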