Enhancing disease prediction from limited datasets
DOI:
https://doi.org/10.70530/kuset.v19i1.591
Keywords:
Text representation models, Classification algorithms, Compatibility, Text embeddings
Abstract
This paper explores how to improve disease prediction accuracy on a limited dataset. Text representation models such as ClinicalBERT and the TF-IDF vectorizer are used to generate text embeddings, which are then paired with robust classification algorithms (estimators), including Random Forest, XGBoost, and linear models such as the Passive Aggressive Classifier. Although embeddings from advanced text representation models combined with robust classification algorithms are expected to yield satisfactory results, this research focuses on comparing two text representation models and on how the embeddings they generate perform when combined with estimators to predict diseases. Additionally, the compatibility of text representation models with classification algorithms and its impact on disease prediction accuracy for the limited dataset are examined.
Published
2025-03-31
How to Cite
Singh, A. (2025). Enhancing disease prediction from limited datasets. Kathmandu University Journal of Science Engineering and Technology, 19(1). https://doi.org/10.70530/kuset.v19i1.591
Issue
Section
Original Research Articles

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.