Automatic Prediction of Conductive Hearing Loss Using Video Pneumatic Otoscopy and Deep Learning Algorithm
Hayoung Byun, Chae Jung Park, Seong Je Oh, Myung Jin Chung, Baek Hwan Cho, Yang-Sun Cho
DOI: https://doi.org/10.1097/AUD.0000000000001217
Abstract:
Objectives: Diseases of the middle ear can interfere with normal sound transmission, resulting in conductive hearing loss. Because video pneumatic otoscopy (VPO) reveals not only the presence of middle ear effusion but also the dynamic movements of the tympanic membrane and part of the ossicles, analysis of VPO images was expected to be useful in predicting middle ear transmission problems. Using a convolutional neural network (CNN), a deep neural network designed for computer vision, this preliminary study aimed to create a deep learning model that detects the presence of an air-bone gap, the conductive component of hearing loss, by analyzing VPO findings.
Design: The medical records of adult patients who underwent VPO and pure-tone audiometry (PTA) on the same day were reviewed for enrollment. Conductive hearing loss was defined as an average air-bone gap of more than 10 dB at 0.5, 1, 2, and 4 kHz on PTA. Two significant images from each original VPO video, one at the most medial position under positive pressure and one at the most laterally displaced position under negative pressure, were used for the analysis. Multi-column CNN architectures were built with individual backbones of pretrained CNNs, and the performance of each model was evaluated and compared across Inception-v3, VGG-16, and ResNet-50. The diagnostic accuracy of the selected deep learning algorithm in predicting the conductive component of hearing loss was compared with that of experienced otologists.
Results: The conductive hearing loss group consisted of 57 cases (mean air-bone gap = 25 ± 8 dB): 21 ears with effusion, 14 ears with malleus-incus fixation, 15 ears with stapes fixation including otosclerosis, 1 ear with a loose incus-stapes joint, 3 ears with adhesive otitis media, and 3 ears with middle ear masses including congenital cholesteatoma. The control group consisted of 76 cases with normal hearing thresholds and no air-bone gaps.
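The Design criterion for conductive hearing loss (an average air-bone gap of more than 10 dB at 0.5, 1, 2, and 4 kHz) can be illustrated with a minimal Python sketch. The function names, the dictionary layout, and the example thresholds are assumptions made for illustration; they are not taken from the study and are not patient data.

```python
from statistics import mean

# Frequencies (kHz) over which the air-bone gap is averaged, per the Design section.
PTA_FREQUENCIES_KHZ = (0.5, 1, 2, 4)
# "More than 10 dB" is treated here as a strict inequality (assumption).
ABG_CUTOFF_DB = 10


def mean_air_bone_gap(air_db, bone_db):
    """Return the average air-bone gap in dB over the four PTA frequencies.

    air_db and bone_db are hypothetical dicts mapping frequency (kHz)
    to air- and bone-conduction thresholds in dB HL.
    """
    return mean(air_db[f] - bone_db[f] for f in PTA_FREQUENCIES_KHZ)


def has_conductive_component(air_db, bone_db):
    """True when the average air-bone gap exceeds the 10 dB cutoff."""
    return mean_air_bone_gap(air_db, bone_db) > ABG_CUTOFF_DB


# Made-up thresholds for one ear, for illustration only.
air = {0.5: 45, 1: 40, 2: 35, 4: 40}
bone = {0.5: 15, 1: 15, 2: 20, 4: 20}

print(mean_air_bone_gap(air, bone))
print(has_conductive_component(air, bone))
```

With these illustrative values the per-frequency gaps are 30, 25, 15, and 20 dB, so the ear would be labeled as having a conductive component under the stated criterion.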
A total of 1130 original images, including repeated measurements, were obtained for the analysis. Of the network architectures tested, the best fed each image into its own Inception-v3 backbone (three-column architecture) and concatenated the feature maps taken after the last convolutional layer of each column. In 10-fold cross-validation, the selected model predicted conductive hearing loss with, on average, a mean area under the curve (mAUC) of 0.972, 91.6% sensitivity, 96.0% specificity, 94.4% positive predictive value, 93.9% negative predictive value, and 94.1% accuracy. This was superior to the performance of experienced otologists, who averaged 0.773 mAUC and 79.0% accuracy. The algorithm detected over 85% of cases with stapes fixation or ossicular chain problems other than malleus-incus fixation. Visualization of the regions of interest in the deep learning model revealed that the algorithm generally based its decisions on findings in the malleus and the nearby tympanic membrane.
Conclusions: In this preliminary study, the deep learning algorithm created to analyze VPO images successfully detected conductive hearing loss caused by middle ear effusion, ossicular fixation, otosclerosis, and adhesive otitis media. Interpretation of VPO using the deep learning algorithm showed promise as a diagnostic tool for differentiating conductive hearing loss from sensorineural hearing loss, which would be especially useful for patients who cooperate poorly with testing.
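The sensitivity, specificity, positive and negative predictive values, and accuracy reported in the Results all derive from a 2x2 confusion matrix, averaged over cross-validation folds. The sketch below shows one way such figures could be computed in Python. The function names and the per-fold counts are hypothetical and chosen only for illustration; they are not the study's data, and the abstract does not describe the authors' actual evaluation code.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each fold contains at least one
    positive case, one negative case, and one prediction of each class).
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def average_over_folds(fold_counts):
    """Average each metric across folds given (tp, fp, tn, fn) tuples."""
    per_fold = [diagnostic_metrics(*counts) for counts in fold_counts]
    return {
        name: sum(m[name] for m in per_fold) / len(per_fold)
        for name in per_fold[0]
    }


# Hypothetical counts for two folds, written as (tp, fp, tn, fn).
folds = [(9, 1, 19, 1), (8, 0, 20, 2)]

for name, value in average_over_folds(folds).items():
    print(f"{name}: {value:.3f}")
```

Averaging per-fold metrics, as done here, is only one convention; pooling the counts across folds before computing the metrics can give slightly different values.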