Enhancing disease prediction from limited datasets

Authors

  • Ayush Singh, Department of Mathematics, School of Science, Kathmandu University, Dhulikhel, Kavre, Nepal.

DOI:

https://doi.org/10.70530/kuset.v19i1.591

Keywords:

Text representation models, Classification algorithms, Compatibility, Text embeddings

Abstract

This paper explores how to improve disease prediction accuracy on a limited dataset. Two text representation models, ClinicalBERT and the TF-IDF vectorizer, are used to generate text embeddings, which are then paired with robust classification algorithms (estimators), including Random Forest, XGBoost, and linear models such as the Passive Aggressive Classifier. Although embeddings from advanced text representation models combined with robust classification algorithms are expected to yield satisfactory results, this research focuses on comparing the two text representation models and on how the embeddings they generate perform when combined with each estimator for disease prediction. It also examines the compatibility of text representation models with classification algorithms, and the effect of that compatibility on prediction accuracy for the limited dataset.
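As a rough illustration of pairing one text representation with one estimator, the sketch below chains a TF-IDF vectorizer with a Random Forest using scikit-learn. It is a minimal sketch under stated assumptions, not the paper's implementation: the choice of scikit-learn, the toy symptom descriptions, the labels, and the hyperparameters are all hypothetical and do not come from the paper's dataset or experiments.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data for illustration only; not taken from the paper.
texts = [
    "fever cough sore throat body aches",
    "high fever chills cough fatigue",
    "throbbing headache nausea sensitivity to light",
    "severe headache visual aura nausea",
]
labels = ["influenza", "influenza", "migraine", "migraine"]

# TF-IDF maps each description to a sparse vector of weighted term counts;
# the random forest is then fit on those vectors.
# n_estimators and random_state are arbitrary illustrative choices.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, labels)

# Predict labels for unseen descriptions (also hypothetical).
new_texts = ["cough and chills with fever", "headache with nausea"]
predictions = model.predict(new_texts)
print(list(predictions))
```

Swapping the second pipeline step for another estimator, or the first step for precomputed ClinicalBERT embeddings, would be one way to set up the kind of comparison the abstract describes. On a dataset this small the predictions carry no evidential weight.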

Published

2025-03-31

How to Cite

Singh, A. (2025). Enhancing disease prediction from limited datasets. Kathmandu University Journal of Science Engineering and Technology, 19(1). https://doi.org/10.70530/kuset.v19i1.591