Abstract
This study investigates the use of Support Vector Machine (SVM) models to improve text classification for tele-triage in psychiatry. The problem addressed is the tendency of SVM models to overlook important textual features, which leads to low precision and recall, particularly in multi-class classification with imbalanced classes. To address this, the authors propose generating embeddings with the Large Language Model (LLM) RoBERTa and reducing their dimensionality with Principal Component Analysis (PCA) before training the SVM. The dataset comprises 500 Reddit posts in five suicide-risk categories: Attempt, Behavior, Ideation, Indicator, and Supportive. Experts labeled these posts using the Columbia Suicide Severity Rating Scale (C-SSRS). The results show significant improvement over the baseline SVM model, which struggled with precision and recall, especially for the Attempt class, where precision was zero. With the RoBERTa-based approach, gains were observed for the Supportive class (precision: 0.55 to 0.59, recall: 0.43 to 0.57) and the Behavior class (precision: 0.25 to 0.31, recall: 0.13 to 0.27). Although the Attempt class also improved (precision: 0.00 to 0.33), further optimization is required. These results suggest that combining RoBERTa embeddings with PCA for dimensionality reduction can improve SVM performance by preserving important features. The model still struggles with minority classes, so further research is needed to improve recall for underrepresented categories and to handle class imbalance.
Keywords: Large Language Model, Multiclass classification, Support Vector Machine, Tele-triage, Text Classification.
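The per-class precision and recall figures quoted in the abstract can be illustrated with a minimal sketch. The function name `per_class_precision_recall` and the toy labels below are hypothetical and are not taken from the study's data or code; the sketch also assumes that precision is reported as 0.0 when a class is never predicted, which is one common convention and may explain a zero precision such as the baseline's Attempt score.

```python
from collections import Counter

# The five C-SSRS-based categories named in the abstract.
LABELS = ["Attempt", "Behavior", "Ideation", "Indicator", "Supportive"]


def per_class_precision_recall(y_true, y_pred, labels=LABELS):
    """Return {label: (precision, recall)} for a multi-class prediction.

    Assumption: a class with no predicted (or no actual) instances gets
    0.0 instead of raising a division-by-zero error.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]  # times the label was predicted
        actual = tp[label] + fn[label]     # times the label was the truth
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Hypothetical toy labels (NOT the study's data): the minority class
# "Attempt" is never predicted, so its precision and recall are both 0.0.
y_true = ["Attempt", "Attempt", "Behavior", "Ideation",
          "Ideation", "Supportive", "Supportive", "Indicator"]
y_pred = ["Ideation", "Ideation", "Behavior", "Ideation",
          "Ideation", "Supportive", "Indicator", "Indicator"]

scores = per_class_precision_recall(y_true, y_pred)
for label, (precision, recall) in scores.items():
    print(f"{label:<10} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the classifier absorbs every Attempt post into Ideation, which leaves Attempt at zero while inflating Ideation's recall, the kind of minority-class failure the abstract describes.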