Background: Artificial intelligence (AI) has the potential to transform medical diagnostics by enhancing the accuracy and efficiency of diagnostic processes. In clinical practice, it can support medical professionals by offering improved tools for faster and more precise diagnoses. Understanding AI's capabilities is essential for its successful integration into medical diagnostics, which makes evaluating the performance of different AI models in the diagnostic process particularly important. This study qualitatively evaluates the diagnostic performance of three AI models (ChatGPT-4o, CodyMD, and Dr. Gupta) based on patient-reported symptoms.

Objectives: To compare the three AI models in terms of diagnostic accuracy, the level of detail in the information provided, the interaction between the models and patients, and the number of differential diagnoses offered.

Results: ChatGPT-4o achieved the highest accuracy, correctly diagnosing 90% of the cases. It provided basic information and focused on a single most likely diagnosis. CodyMD and Dr. Gupta each achieved 50% accuracy. CodyMD used an interactive approach and offered differential diagnoses with a probability percentage for each; Dr. Gupta provided educational medical information and differential diagnoses without probability estimates.

Conclusions: The AI models assessed can assist medical professionals in the diagnostic process, but they require further refinement and optimization. ChatGPT-4o stands out for its high accuracy, though it needs more patient interaction. CodyMD excels in offering an interactive approach and more detailed responses, but its accuracy needs to improve. Dr. Gupta provided differential diagnoses, but the information it provided is best suited to educational purposes. While these models show potential for clinical use, further research is needed to optimize and validate them in real-world settings.