Abstract
Humans face various diseases that are mainly caused by environmental conditions and living habits. These diseases present with multiple symptoms and can be related to one another through the symptoms they share. Identifying and interpreting groups of diseases with shared symptoms can aid in developing treatment plans for new disease outbreaks. This research explores the intersection of machine learning and healthcare, focusing specifically on improving disease classification through symptom-based cluster analysis. Unsupervised machine learning algorithms were used to identify patterns and relationships within diverse symptom datasets, revealing novel associations and subtypes in disease manifestation. A Large Language Model (LLM), specifically OpenAI's Generative Pre-trained Transformer (GPT), played a pivotal role in interpreting and communicating the complex outputs of the machine learning process. The results indicated a significant improvement in defining distinct clusters based on disease-symptom relationships, with GPT-4o providing simplified explanations that bridge the gap between machine-generated insights and the understanding of healthcare professionals. The findings offer a deeper understanding of the distinctive features that characterize the disease clusters generated by the machine learning models.

Introduction
The healthcare field produces extensive and varied data, which machine learning algorithms can leverage to detect new illnesses and optimize treatment plans [1]. Deep learning (DL), when trained on high-quality data, has significantly advanced clinical diagnostics and facilitated disease clustering [2]. Symptom-based clustering is one such approach; it can enhance diagnostic accuracy and support personalized patient care [3].
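The specific unsupervised algorithms are not named at this point, so the following is only a minimal sketch of symptom-based clustering under stated assumptions: each disease is represented as a set of symptoms, similarity is measured with Jaccard distance, and diseases are grouped by average-linkage agglomerative clustering. The disease table and the names `DISEASES`, `jaccard_distance` and `average_linkage` are hypothetical illustrations, not the study's data or method.

```python
from itertools import combinations

# Hypothetical toy data: each disease maps to the set of symptoms it presents with.
# These entries are illustrative only and are not taken from the study's datasets.
DISEASES: dict[str, set[str]] = {
    "influenza": {"fever", "cough", "fatigue", "headache"},
    "common_cold": {"cough", "sneezing", "sore_throat", "fatigue"},
    "covid19": {"fever", "cough", "fatigue", "loss_of_smell"},
    "migraine": {"headache", "nausea", "light_sensitivity"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhea", "fever"},
    "food_poisoning": {"nausea", "vomiting", "diarrhea"},
}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - (shared symptoms / all symptoms): 0 for identical sets, 1 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(diseases: dict[str, set[str]], k: int) -> list[list[str]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters until k remain.

    Cluster distance is the mean pairwise Jaccard distance between members
    (average linkage).
    """
    clusters = [[name] for name in diseases]

    def cluster_distance(c1: list[str], c2: list[str]) -> float:
        total = sum(jaccard_distance(diseases[x], diseases[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # combinations yields index pairs with i < j, so deleting j leaves i valid.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for cluster in average_linkage(DISEASES, k=3):
        print(sorted(cluster))
```

With this toy table and `k=3`, the sketch groups influenza, COVID-19 and the common cold together, places the two gastrointestinal conditions together, and leaves migraine on its own. Jaccard distance is chosen here because symptom presence or absence is binary; other algorithms (for example k-means on one-hot symptom vectors) could be substituted.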
Diseases with overlapping symptoms pose significant challenges for accurate clinical diagnosis, a problem that can be mitigated through coordinated care and collaboration between multidisciplinary teams [4]. Traditionally, diseases are identified through physical examinations or laboratory tests. This process can be complicated and sometimes inaccurate, because many diseases share similar symptoms [5]. Machine learning (ML) techniques help discover new disease subtypes and characterize the diversity of patient populations by uncovering hidden patterns in complex datasets [6]. Symptom-based cluster analysis is an effective technique for providing precise, targeted medical information [7]. However, interpreting the output of these complex models poses a distinct challenge: Watson [8] argued that although clustering algorithms efficiently reveal connections, converting the resulting clusters and patterns into meaningful medical insights is difficult.
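How cluster outputs were presented to GPT-4o is not described in this section. As a hypothetical sketch of one way to turn raw clusters into something an LLM or a clinician can interpret, the code below ranks each cluster's distinguishing symptoms (prevalence inside the cluster minus prevalence outside it) and assembles a plain-text prompt. The toy data, the cluster assignment, and the names `symptom_prevalence`, `distinguishing_symptoms` and `build_prompt` are all assumptions for illustration; actually sending the prompt to a model is deliberately omitted.

```python
from collections import Counter

# Hypothetical toy data (not from the study): disease -> symptom set, plus a
# cluster assignment such as an unsupervised algorithm might produce.
DISEASES: dict[str, set[str]] = {
    "influenza": {"fever", "cough", "fatigue", "headache"},
    "common_cold": {"cough", "sneezing", "sore_throat", "fatigue"},
    "covid19": {"fever", "cough", "fatigue", "loss_of_smell"},
    "migraine": {"headache", "nausea", "light_sensitivity"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhea", "fever"},
    "food_poisoning": {"nausea", "vomiting", "diarrhea"},
}
CLUSTERS: dict[str, list[str]] = {
    "cluster_0": ["influenza", "covid19", "common_cold"],
    "cluster_1": ["gastroenteritis", "food_poisoning"],
    "cluster_2": ["migraine"],
}


def symptom_prevalence(members: list[str], diseases: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of the given diseases that present with each symptom."""
    counts = Counter(symptom for d in members for symptom in diseases[d])
    return {symptom: n / len(members) for symptom, n in counts.items()}


def distinguishing_symptoms(cluster: str, clusters: dict[str, list[str]],
                            diseases: dict[str, set[str]], top_n: int = 3) -> list[str]:
    members = clusters[cluster]
    others = [d for name, ds in clusters.items() if name != cluster for d in ds]
    inside = symptom_prevalence(members, diseases)
    outside = symptom_prevalence(others, diseases) if others else {}
    # Score each symptom by how much more common it is inside the cluster than outside.
    scores = {s: p - outside.get(s, 0.0) for s, p in inside.items()}
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [s for s, _ in ranked[:top_n]]


def build_prompt(clusters: dict[str, list[str]], diseases: dict[str, set[str]]) -> str:
    """Assemble a plain-text request asking for a lay explanation of each cluster."""
    lines = ["Explain in plain language what the diseases in each cluster have in common."]
    for name, members in clusters.items():
        top = ", ".join(distinguishing_symptoms(name, clusters, diseases))
        lines.append(f"- {name}: {', '.join(members)} (distinguishing symptoms: {top})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(CLUSTERS, DISEASES))
```

Ranking by the difference in prevalence, rather than raw frequency, keeps symptoms that are common everywhere (such as fever in this toy table) from dominating every cluster's description.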