Interreader Reliability of LI-RADS Version 2014 Algorithm and Imaging Features for Diagnosis of Hepatocellular Carcinoma: A Large International Multireader Study.
Radiology. 2018 Jan;286(1):173-185. doi: 10.1148/radiol.2017170376. Epub 2017 Nov 1. Fowler KJ, Tang A, Santillan C, Bhargavan-Chatfield M, Heiken J, Jha RC, Weinreb J, Hussain H, Mitchell DG, Bashir MR, Costa EAC, Cunha GM, Coombs L, Wolfson T, Gamst AC, Brancatelli G, Yeh B, Sirlin CB.
Purpose: To determine in a large multicenter multireader setting the interreader reliability of Liver Imaging Reporting and Data System (LI-RADS) version 2014 categories, the major imaging features seen with computed tomography (CT) and magnetic resonance (MR) imaging, and the potential effect of reader demographics on agreement with a preselected nonconsecutive image set.
Materials and Methods: Institutional review board approval was obtained, and patient consent was waived for this retrospective study. Ten image sets, comprising 38-40 unique studies (equal number of CT and MR imaging studies, uniformly distributed LI-RADS categories), were randomly allocated to readers. Images were acquired in unenhanced and standard contrast material-enhanced phases, with observation diameter and growth data provided. Readers completed a demographic survey, assigned LI-RADS version 2014 categories, and assessed major features. Intraclass correlation coefficient (ICC) assessed with mixed-model regression analyses was the metric for interreader reliability of assigning categories and major features.
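As an illustration of the reliability metric only, the sketch below computes a one-way random-effects ICC from a small table of ratings. This is a simplification and an assumption on our part: the study estimated ICC with mixed-model regression, which is not reproduced here. The function name `icc_oneway` and the example ratings are hypothetical and are not study data.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for a single rating, often written ICC(1,1).

    ratings: list of rows, one row per observation, each row holding the
    same number k of reader scores (for example, ordinal category codes).
    NOTE: illustrative simplification, not the mixed-model analysis
    described in the abstract.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 observations")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("each observation needs the same number (>= 2) of ratings")

    grand_mean = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]

    # Between-observation mean square: spread of the per-observation means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-observation mean square: reader disagreement on the same observation.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical ratings: 3 observations, each scored by 2 readers.
example = [[1, 2], [3, 3], [5, 4]]
print(round(icc_oneway(example), 3))
```

With these made-up ratings the mean squares are 4.5 (between) and 1/3 (within), giving an ICC of 25/29, or about 0.86. Values near 1 mean that most of the variance comes from true differences between observations and little from reader disagreement.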
Results: A total of 113 readers evaluated 380 image sets. ICC of final LI-RADS category assignment was 0.67 (95% confidence interval [CI]: 0.61, 0.71) for CT and 0.73 (95% CI: 0.68, 0.77) for MR imaging. ICC was 0.87 (95% CI: 0.84, 0.90) for arterial phase hyperenhancement, 0.85 (95% CI: 0.81, 0.88) for washout appearance, and 0.84 (95% CI: 0.80, 0.87) for capsule appearance. ICC was not significantly affected by liver expertise, LI-RADS familiarity, or years of postresidency practice (ICC range, 0.69-0.70; ICC difference, 0.003-0.01 [95% CI: -0.003 to -0.01, 0.004-0.02]). ICC was borderline higher for private practice readers than for academic readers (ICC difference, 0.009; 95% CI: 0.000, 0.021).
Conclusion: ICC is good for final LI-RADS categorization and high for major feature characterization, with minimal reader demographic effect. Of note, our results using selected image sets from nonconsecutive examinations are not necessarily comparable with those of prior studies that used consecutive examination series. © RSNA, 2017.