A Large Language Model Based on Real-world Data Predicts Chronic Disease Risks
Using multiple open-source general-purpose large language models (LLMs) as base models, we trained risk prediction LLMs for 20 health outcomes, including myocardial infarction, stroke, and diabetes, via supervised fine-tuning. The models were developed with the LLaMA-Factory package, with censored data and competing risk events expressed in natural language so that training meets the needs of survival prediction. The trained LLMs predict individual health outcomes directly from health examination reports, sidestepping the sensitivity of traditional predictive models to missing variables. They can also be served through the vLLM inference framework, whose PagedAttention mechanism boosts inference efficiency. Our LLMs outperformed traditional approaches and GPT-4 across validation sets for multiple health outcomes, reaching an AUC of 0.92 for 5-year diabetes risk prediction and demonstrating strong clinical applicability.
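To make the fine-tuning setup concrete, the sketch below builds one training record in the Alpaca-style instruction/input/output JSON that LLaMA-Factory accepts, with censoring and competing risks expressed in natural language in the target text. The field contents, prompt wording, and helper name are illustrative assumptions, not the project's actual templates.

```python
import json

def make_sft_example(report_text, outcome, years_observed, event_occurred,
                     competing_event=None):
    """Build one supervised fine-tuning record in the Alpaca-style format
    used by LLaMA-Factory (instruction/input/output fields).

    Censoring and competing risk events are described in natural language
    in the target text, so the model can learn survival-style labels from
    plain text. All wording here is illustrative, not the actual prompts.
    """
    instruction = (f"Based on the following health examination report, "
                   f"assess the 5-year risk of {outcome}.")
    if event_occurred:
        output = (f"The patient developed {outcome} within "
                  f"{years_observed} years of follow-up.")
    elif competing_event:
        output = (f"Follow-up ended after {years_observed} years due to a "
                  f"competing event ({competing_event}); "
                  f"{outcome} was not observed.")
    else:
        output = (f"The patient was censored at {years_observed} years; "
                  f"no {outcome} was observed during follow-up.")
    return {"instruction": instruction, "input": report_text, "output": output}

# A censored record: follow-up ended without the event occurring.
example = make_sft_example(
    report_text="Fasting glucose 6.9 mmol/L; BMI 31.2; blood pressure 145/92.",
    outcome="type 2 diabetes",
    years_observed=3.5,
    event_occurred=False,
)
print(json.dumps(example, indent=2))
```

Encoding censoring in the target text, rather than in a custom loss, is what lets an off-the-shelf instruction-tuning pipeline handle survival-style outcomes.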
LLM-based Personalized Health Guidance, Disease Intervention, and Hallucination Metrics
The uneven distribution of medical resources leads to significant disparities in health interventions, and disease prevention and intervention still lack benchmark datasets for large language models (LLMs). Furthermore, hallucination hinders the practical application of LLMs in medicine. To address these challenges, this research focuses on personalized health guidance and intervention based on dynamic multi-expert intelligent agents, and explores hallucination metrics for LLMs driven by anomaly detection. Through this work, we aim to improve the quality and accessibility of healthcare services, reduce healthcare costs, promote personalized health management, and enhance the safety and trustworthiness of large medical models.
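One simple way anomaly detection can be turned into a hallucination signal is to treat a response's average token surprisal (negative log-probability) as a feature and flag responses that are statistical outliers relative to a set of trusted reference responses. The sketch below is a minimal illustration of that idea only; the function name, z-score threshold, and toy log-probabilities are assumptions, not the metric studied in this project.

```python
from statistics import mean, stdev

def surprisal_anomaly_scores(candidate_logprobs, reference_logprobs,
                             z_threshold=2.0):
    """Score responses with a simple anomaly-detection heuristic:
    a response whose mean token surprisal (negative log-probability)
    sits far above the distribution seen on trusted reference responses
    is flagged as anomalous, i.e. a possible hallucination.

    Illustrative sketch only: real metrics would use richer features
    than mean surprisal and a calibrated threshold.
    """
    ref_surprisals = [-mean(lp) for lp in reference_logprobs]
    mu, sigma = mean(ref_surprisals), stdev(ref_surprisals)
    results = []
    for lp in candidate_logprobs:
        s = -mean(lp)  # mean negative log-prob of the response's tokens
        z = (s - mu) / sigma if sigma > 0 else 0.0
        results.append({"surprisal": s, "z": z, "anomalous": z > z_threshold})
    return results

# Reference responses: token log-probs from well-grounded answers (toy values).
reference = [[-0.4, -0.6, -0.5], [-0.5, -0.7, -0.4], [-0.6, -0.5, -0.5]]
# Candidates: the second contains unusually low-probability tokens.
candidates = [[-0.5, -0.6, -0.4], [-3.2, -4.1, -2.8]]
for r in surprisal_anomaly_scores(candidates, reference):
    print(r["anomalous"])  # prints False, then True
```

The design choice here is that no labeled hallucination data is needed: only a reference distribution of trusted outputs, which fits the benchmark-scarce setting the paragraph describes.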