Abstract
Objective To assess inter-observer variation in pulmonary nodule marking during routine clinical softcopy reading of chest digital radiographs (DR) using a lung nodule computer toolkit.
Methods A total of 601 posteroanterior chest DR images were randomly selected from routine outpatient screening at Peking Union Medical College Hospital. Two chest radiologists, each with more than ten years of experience, first read the images independently, marked all suspicious nodules using the computer toolkit IQQA-Chest, and indicated the likelihood for each detected nodule. They also manually drew the boundary of each identified nodule on an enlarged region of interest, which IQQA-Chest analyzed instantly. Two sets of diagnostic reports, including the marked nodules, likelihood, manually drawn boundaries, quantitative measurements, and radiologists’ names, were automatically generated and stored by the computer system. One week later, the two radiologists read the same images together using the same toolkit, without referring to their previous results. The marking procedure was the same, except that consensus was reached for each suspicious region. The statistical analysis tools provided in IQQA-Chest were used to compare the three sets of reading results.
Results In the independent readings, Reader 1 detected 409 nodules with a mean diameter of 12.4 mm in 241 patients, and Reader 2 detected 401 nodules with a mean diameter of 12.6 mm in 253 patients. In the consensus reading, a total of 352 nodules with a mean diameter of 12.4 mm were detected in 220 patients. In total, 42.3% of Reader 1’s and 45.1% of Reader 2’s marks were confirmed by the consensus reading. About 40% of each reader’s marks agreed with the other reader’s. Only 130 (14.4%) of the 904 unique nodules were confirmed by both readers and the consensus reading. Moreover, only 5.6% (51/904) of the marked regions were rated with identical likelihood in all three readings. Statistical analysis showed significant differences in the likelihood assigned to the marks between Readers 1 and 2 and between the consensus reading and Reader 2 (P<0.01), but not between the consensus reading and Reader 1. No significant difference in segmented nodule size was observed between any two of the three readings.
Conclusion Large variations in nodule marking and nodule-likelihood determination, but not in nodule size, were observed between experts as well as between single-reader and consensus reading.