Real-world Big Data and Public Health Large Models
We leverage real-world multimodal data, including electronic medical records, electronic health records, health examination reports, and medical imaging, to train large models for disease risk prediction and health intervention guidance. The disease risk prediction models take textual health information as input and estimate an individual's risk of major chronic diseases, such as myocardial infarction and stroke, over different time horizons. The health intervention guidance model provides personalized, expert-level health advice and recommendations to high-risk individuals, with the aim of preventing the onset and progression of chronic diseases.

A Large Language Model Based on Real-world Data Predicts Chronic Disease Risks


Starting from several open-source general-purpose large language models (LLMs), we used supervised fine-tuning to train risk prediction LLMs for 20 health outcomes, including myocardial infarction, stroke, and diabetes. The models were fine-tuned with the LLaMA-Factory package, and censored observations and competing risk events were expressed in natural language in the training data, so that the models meet the requirements of survival prediction. The trained LLMs predict individual health outcomes directly from health examination reports, which reduces the sensitivity to missing variables that limits traditional prediction models. The models can also be served with the vLLM inference framework, whose PagedAttention mechanism improves inference efficiency. On validation sets for multiple health outcomes, our LLMs outperformed traditional approaches and GPT-4, reaching an AUC of 0.92 for 5-year diabetes risk prediction and demonstrating strong clinical applicability.
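The paragraph above states that censoring and competing risk events are expressed in natural language for supervised fine-tuning, but does not give the data format. The sketch below shows one way such a training record could look, assuming an Alpaca-style layout (`instruction` / `input` / `output` fields, one of the dataset formats LLaMA-Factory accepts). The `FollowUp` class, its field names, the answer wording, and the choice of "death from other causes" as the competing event are all hypothetical illustrations, not the project's actual schema. Missing measurements are simply absent from the report text rather than imputed.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical follow-up summary for one participant and one outcome."""
    exam_report: str                 # free-text health examination report
    outcome: str                     # e.g. "diabetes"
    event_time: Optional[float]      # years to outcome diagnosis, None if not observed
    competing_time: Optional[float]  # years to a competing event, None if none
    followup_time: float             # years from examination to last contact


def describe_status(r: FollowUp, horizon: float) -> str:
    """Turn survival information into a natural-language target for one horizon."""
    # Outcome observed within the horizon and not preceded by a competing event.
    if r.event_time is not None and r.event_time <= horizon and (
        r.competing_time is None or r.event_time <= r.competing_time
    ):
        return (f"Yes. {r.outcome.capitalize()} was diagnosed "
                f"{r.event_time:g} years after the examination.")
    # Competing event first (assumed here to be death from other causes):
    # the outcome can no longer occur.
    if r.competing_time is not None and r.competing_time <= horizon:
        return (f"No. A competing event (death from other causes) occurred "
                f"{r.competing_time:g} years after the examination, "
                f"before any {r.outcome} diagnosis.")
    # Followed for the whole horizon without the outcome.
    if r.followup_time >= horizon:
        return f"No. No {r.outcome} diagnosis during {horizon:g} years of follow-up."
    # Lost to follow-up before the horizon: right-censored, status unknown.
    return (f"Unknown. Follow-up ended (censored) {r.followup_time:g} years after "
            f"the examination without a {r.outcome} diagnosis.")


def to_sft_record(r: FollowUp, horizon: float) -> dict:
    """Build one Alpaca-style record (instruction / input / output)."""
    return {
        "instruction": (f"Based on the health examination report, will this person "
                        f"develop {r.outcome} within {horizon:g} years?"),
        "input": r.exam_report,
        "output": describe_status(r, horizon),
    }


# Illustrative, made-up participants (not real data).
records = [
    FollowUp("Age 52, male. BMI 29.1. Fasting glucose 6.4 mmol/L.", "diabetes", 3.0, None, 6.0),
    FollowUp("Age 67, female. Blood pressure 150/95 mmHg.", "diabetes", None, 2.5, 2.5),
    FollowUp("Age 45, male. HbA1c not measured.", "diabetes", None, None, 1.2),
]
dataset = [to_sft_record(r, 5) for r in records]
print(json.dumps(dataset, ensure_ascii=False, indent=2))
```

Under this labeling, records marked "Unknown" (censored before the horizon) would typically be left out when computing a binary metric such as the 5-year AUC, since their true status at 5 years is not observed.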

LLM-based Personalized Health Guidance, Disease Intervention, and Hallucination Metrics


The uneven distribution of medical resources leads to significant disparities in health interventions, and benchmark datasets for evaluating large language models (LLMs) on disease prevention and intervention are lacking. Furthermore, hallucinations hinder the practical application of LLMs in medicine. To address these challenges, this research develops personalized health guidance and interventions based on dynamic multi-expert intelligent agents, and explores anomaly-detection-driven hallucination metrics for LLMs. Through this work, we aim to improve the quality and accessibility of healthcare services, reduce healthcare costs, promote personalized health management, and enhance the safety and trustworthiness of medical large models.
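The section does not define its anomaly-detection-driven hallucination metrics. As one illustrative possibility, and explicitly an assumption rather than the project's method, a generated answer can be scored by treating each token's surprisal (negative log-probability) as a data point and flagging tokens whose modified z-score (median and median absolute deviation, threshold 3.5) marks them as outliers. The function names, the threshold, and the example tokens and log-probabilities are hypothetical; any inference backend that returns per-token log-probabilities could supply the input.

```python
import statistics
from typing import List, Tuple


def robust_z_scores(values: List[float]) -> List[float]:
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: no spread around the median, so this simple rule
        # cannot score deviations; report no anomalies.
        return [0.0 for _ in values]
    # 1.4826 rescales the MAD to the standard deviation under normality.
    return [(v - med) / (1.4826 * mad) for v in values]


def hallucination_score(token_logprobs: List[float],
                        threshold: float = 3.5) -> Tuple[float, List[int]]:
    """Fraction of tokens with anomalously high surprisal, plus their indices."""
    surprisal = [-lp for lp in token_logprobs]
    z = robust_z_scores(surprisal)
    # Only unusually *surprising* tokens count, so use a one-sided test.
    flagged = [i for i, s in enumerate(z) if s > threshold]
    return len(flagged) / len(token_logprobs), flagged


# Hypothetical tokens and log-probabilities for one generated recommendation.
tokens = ["Take", " metformin", " 500", " mg", " 12", " times", " daily"]
logprobs = [-0.1, -0.2, -0.15, -0.3, -6.0, -0.25, -0.1]

score, flagged = hallucination_score(logprobs)
print(f"score={score:.3f}", [tokens[i] for i in flagged])
```

In this toy input the dosage frequency " 12" is far less probable than its neighbors and is the only token flagged. A token-level signal like this only indicates model uncertainty; it would need to be combined with factual checks before being used as a hallucination metric for medical advice.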