Enhancement to Low-Resource Text Classification via Sequential Transfer Learning

PAPER DETAILS

CITE THIS

Neil Christian R. Riego, Danny Bell Villarba, Ariel Antwaun Rolando C. Sison, Fernandez C. Pineda, and Herminiño C. Lagunzad, 2023. Enhancement to Low-Resource Text Classification via Sequential Transfer Learning. United International Journal for Research & Technology (UIJRT), 4(8), pp. 72-82.

Abstract

Textual data on many platforms has increased dramatically in recent years. This volume of data makes text classification tasks, such as sentiment analysis and hate speech recognition, widely accessible. However, the lack of NLP tools in low-resource regions such as Asia and Africa limits how much of this data can be leveraged. We make three contributions. First, we provide a Tagalog product review dataset as a baseline for sentiment analysis tasks. Second, we pretrained and finetuned a Tagalog variant of XLNet on two datasets, reaching 78.05% accuracy on the hate speech dataset and 95.02% on the Shopee dataset, which is 0.33% and 3.87% higher than the benchmark RoBERTa-Tagalog model, respectively. Third, in the finetuning step, we apply bootstrap aggregation (bagging), which raises accuracy by 0.16% when 70% of the data is used to finetune three XLNet-Tagalog models. Furthermore, combining RoBERTa-Tagalog and XLNet-Tagalog models finetuned on 100% of the data yields an accuracy of 79.47%, a 1.26% improvement over the best-performing setup using XLNet-Tagalog. Finally, XLNet-Tagalog degrades more slowly than the benchmark model by 4.53. We make all our models and datasets available to the research community.

Keywords: Bagging-based Approach, Low-resource Language, NLP Tools, Natural Language Processing, RoBERTa, Sentiment Analysis, Sequential Transfer Learning, Tagalog language, Textual data, Text classification, XLNet.
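
The bagging step mentioned in the abstract could be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: it assumes each model is finetuned on a sample drawn with replacement from the training set and that predictions are combined by majority vote. The function names `bootstrap_sample` and `majority_vote` are hypothetical, and the per-model predictions are hard-coded stand-ins for the outputs of finetuned classifiers.

```python
import random
from collections import Counter


def bootstrap_sample(data, fraction, rng):
    """Draw int(len(data) * fraction) items with replacement (assumed scheme)."""
    k = int(len(data) * fraction)
    return [rng.choice(data) for _ in range(k)]


def majority_vote(predictions):
    """Combine per-model label lists into one label per example.

    `predictions` holds one list of labels per model, all the same length.
    Majority voting is an assumption; averaging class probabilities is
    another common way to aggregate bagged models.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    rng = random.Random(0)
    train_ids = list(range(10))  # toy stand-in for training example indices

    # One 70% bootstrap sample per model; three models, as in the abstract.
    samples = [bootstrap_sample(train_ids, 0.7, rng) for _ in range(3)]
    print([len(s) for s in samples])

    # Hard-coded binary predictions standing in for three finetuned models.
    model_predictions = [
        [1, 0, 1],
        [1, 1, 0],
        [0, 0, 1],
    ]
    print(majority_vote(model_predictions))
```

With an odd number of models and binary labels, the vote never ties, which is one practical reason to bag three models rather than two.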
