Objective: Appropriate medical imaging is important for value-based care. We aimed to evaluate the performance of generative pretrained transformer 4 (GPT-4), a natural language processing model, in automatically recommending appropriate medical imaging across different clinical scenarios.
Methods: Institutional review board (IRB) approval was not required because only nonidentifiable data were used. We used 112 questions from the American College of Radiology (ACR) Radiology-TEACHES Program as prompts; this is an open-source question-and-answer program designed to guide appropriate medical imaging. We included 69 free-text case vignettes and 43 simplified cases. To evaluate GPT-4 and GPT-3.5, we treated the recommendations of the ACR guidelines as the gold standard, and three radiologists then analyzed the consistency of the GPT models' responses with those of the ACR. Consistency was rated on a five-point scoring criterion. A paired t-test was applied to assess the statistical significance of the findings.
Results: For the free-text case vignettes, the accuracy of GPT-4 was 92.9%, whereas the accuracy of GPT-3.5 was 78.3%. GPT-4 provided more appropriate suggestions for reducing the overutilization of medical imaging than GPT-3.5 (t = 3.429, P = 0.001). For the simplified scenarios, the accuracy of GPT-4 and GPT-3.5 was 66.5% and 60.0%, respectively; the difference was not statistically significant (t = 1.858, P = 0.070). Compared with GPT-3.5, GPT-4 was characterized by longer response times (27.1 s on average) and more extensive responses (137.1 words on average).
Conclusion: As an advanced tool for improving value-based healthcare in clinics, GPT-4 may guide appropriate medical imaging accurately and efficiently.
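The paired t-test named in the Methods can be illustrated with a minimal Python sketch. The function name `paired_t_statistic` and the six example scores below are hypothetical and are not data from the study; they only show how a t statistic is obtained from per-question score differences between two models rated on the same items.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for two paired samples.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-item differences
    and sd is the sample standard deviation (n - 1 in the denominator).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical five-point consistency scores for the same six questions.
# These values are illustrative only and do not come from the study.
model_a_scores = [5, 4, 5, 3, 5, 4]
model_b_scores = [4, 4, 3, 3, 4, 2]

t_stat, dof = paired_t_statistic(model_a_scores, model_b_scores)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The P value would then be read from the t distribution with `dof` degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.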
Funding: National Natural Science Foundation of China (Grant Nos. 62171297 and 61931013).