For several decades, computer scientists have attempted to build software that helps physicians analyze medical images. Until 2012, when deep neural networks first proved their effectiveness, most attempts relied on extensive feature engineering tailored to specific types of medical images, and were usually too unreliable to help doctors in practice. In recent years, deep learning models dedicated specifically to medical image analysis have improved significantly, bringing us closer to the widespread use of machine learning in this field.
In a new Nature paper, NYU researchers analyze 3D MRI images of the proximal femur (thigh bone) among postmenopausal women, with the goal of recognizing signs of osteoporosis, a disease which causes bone weakness. They provide a detailed comparison between 2D and 3D neural networks for medical image recognition and show that 3D convolutional neural networks (CNNs) are more effective and less likely to miss regions of interest in medical images.
While medical image analysis with deep learning is not yet comparable to human analysis, recent models may already be able to assist physicians when analyzing medical images, as exemplified by the recent results of Google’s LYNA algorithm for breast cancer detection.
A major challenge with training machine learning models on medical images is the high cost of obtaining a dataset, due to the combined requirement of patient approval and the need for expert analysis of the image. This is in contrast to non-medical machine learning, which usually relies on freely available Internet images and a Mechanical Turk standard of “human level” image recognition. To build medical machine learning systems with limited data, researchers apply extensive data augmentation, including stretching, gray-scaling, applying elastic deformations, and more, generating a large amount of synthetic training data. Recent studies, detailed in this summary, have shown that it’s possible to build an effective machine learning model with just dozens of expert-marked images.
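The kind of augmentation described above can be sketched as follows — a minimal, illustrative implementation of elastic deformation (in the style popularized for biomedical segmentation training), not the exact pipeline from any of the papers; the function name and parameter defaults are made up for the example:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, seed=None):
    """Apply a random elastic deformation to a 2D image.

    A random per-pixel displacement field is smoothed with a Gaussian
    filter (scale `sigma`) and scaled by `alpha`, then used to resample
    the image -- one of the augmentations commonly used to generate
    synthetic training data from a handful of annotated scans.
    """
    rng = np.random.default_rng(seed)
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(image.shape[0]),
                       np.arange(image.shape[1]), indexing="ij")
    # Bilinear resampling at the displaced coordinates
    return map_coordinates(image, np.array([y + dy, x + dx]),
                           order=1, mode="reflect")
```

Applied with fresh random seeds on every training step, a single annotated image yields an effectively unlimited stream of plausible variants.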
One advantage of medical images compared to common Internet images is the high density of actionable information in a single image. One image, for instance of a bone, will include many different bone areas which are separately marked as ‘healthy’, ‘unhealthy’, or ‘partially healthy’, and each one provides valuable insight for training and prediction.
In 2015, Ronneberger et al. presented U-net, a deep convolutional neural network (CNN) architecture that achieved state-of-the-art results on several biomedical segmentation tasks, in which the model is required to identify structures such as cells in electron microscope images. U-net takes its name from its U-shaped structure: an image passes through several convolution layers in an encoder phase, and then through several deconvolution layers in a decoder phase.
In the encoder phase, each layer applies two 3×3 convolutions, each followed by a ReLU unit, and then a 2×2 max pooling operation, which performs the downsampling. In the decoder phase, each layer upsamples through a 2×2 “up-convolution”, concatenates the result with the corresponding feature map from the encoder phase, and passes the combined data through two 3×3 convolutions (see image). With this architecture, Ronneberger et al. achieved state-of-the-art results on two electron microscope analysis challenges – the ISBI cell tracking challenge 2015 and the EM segmentation challenge.
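The encoder arithmetic can be made concrete with a minimal single-channel NumPy sketch (illustrative only, not the paper’s implementation): each unpadded 3×3 convolution trims one pixel from every border, and the 2×2 max pool halves the resolution.

```python
import numpy as np

def conv3x3_valid(x, k):
    """Single-channel 'valid' 3x3 convolution: the output is 2 pixels
    smaller in each dimension, as in the original unpadded U-net."""
    h, w = x.shape[0] - 2, x.shape[1] - 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * k)
    return out

def maxpool2x2(x):
    """2x2 max pooling: halves each spatial dimension."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

def encoder_step(x, k1, k2):
    """One U-net encoder layer: two 3x3 convolutions (each followed
    by a ReLU) and a 2x2 max pool for downsampling."""
    x = np.maximum(conv3x3_valid(x, k1), 0)   # conv + ReLU
    x = np.maximum(conv3x3_valid(x, k2), 0)   # conv + ReLU
    return maxpool2x2(x)                      # downsample
```

For example, a 64×64 input shrinks to 60×60 after the two convolutions and to 30×30 after pooling; the decoder phase reverses this path while reusing the 60×60 encoder feature map via concatenation.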
In 2016, Çiçek et al. expanded the U-net model into 3D U-net, demonstrating that U-net can be significantly more effective in analyzing 3D images (such as CT and MRI scans) by using a 3D CNN, as opposed to dividing the 3D data into 2D slices and analyzing them with a 2D CNN. The updated model is very similar in style to the original U-net but uses 3×3×3 convolutions and 2×2×2 max pooling on the full 3D volume instead of 3×3 convolutions and 2×2 max pooling on a stack of 2D slices.
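At its core, the jump from 2D to 3D is a change of kernel shape, which also multiplies the number of weights per layer. A small illustrative calculation (the channel counts here are made up, not taken from either paper):

```python
def conv_params(in_ch, out_ch, kernel):
    """Number of weights in a convolution layer (biases excluded):
    one kernel of the given shape per (input, output) channel pair."""
    n = in_ch * out_ch
    for k in kernel:
        n *= k
    return n

# A 3x3 2D convolution applied slice-by-slice vs. a 3x3x3 3D
# convolution over the full volume, for a 32 -> 64 channel layer:
params_2d = conv_params(32, 64, (3, 3))      # 18,432 weights
params_3d = conv_params(32, 64, (3, 3, 3))   # 55,296 weights
```

The 3D kernel costs 3× the weights of its 2D counterpart, but in exchange it can mix information across neighboring slices — context that a slice-wise 2D model never sees.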
The researchers compared the 3D U-net structure to 2D U-net on 3D microscopy scans of kidneys, and found that average precision improved from 0.796 with the 2D model to 0.863 with the 3D model. Interestingly, the study found that with massive data augmentation, as few as two annotated 3D scans are enough to gain valuable insight into the structure of the Xenopus (frog) kidney in its embryonic stages.
In a recent paper published in Nature, Deniz et al. expand on the 2016 3D U-net result and show that the 3D model is effective for analysis of Magnetic Resonance images, generated by the ubiquitous MRI scan. The researchers analyzed 3D images of the proximal femur (thigh bone) among postmenopausal women, with the goal of recognizing signs of osteoporosis, a disease which causes bone weakness and degradation.
The study makes a dedicated effort to compare 2D CNNs with 3D CNNs and finds that a vanilla 3D U-net model shows demonstrably better results than vanilla 2D U-net models in a wide range of MRI analysis metrics:
DSC – Dice similarity coefficient, an overlap measure computed from true positives, false positives, and false negatives
ASD & MSD – average and maximum surface distance between the ground-truth segmentation and the model’s prediction
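The DSC has a simple closed form, DSC = 2·TP / (2·TP + FP + FN), which a short sketch makes concrete (the function name is illustrative):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks:
    DSC = 2*TP / (2*TP + FP + FN), i.e. twice the overlap divided by
    the combined size of both segmentations. 1.0 = perfect overlap."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()  # = 2*TP + FP + FN
    return 2.0 * tp / denom if denom else 1.0
```

For instance, masks `[1, 1, 0, 0]` and `[1, 0, 1, 0]` overlap in one pixel out of a combined four, giving a DSC of 0.5. Unlike plain pixel accuracy, the DSC ignores the (typically vast) true-negative background, which is why it is the standard metric for segmentation.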
After testing vanilla CNNs, the researchers further look into post-processing techniques, which can improve the results of 2D analyses, for instance by removing marked weak-bone regions which are small and therefore statistically unlikely to be problematic. They find that post-processing indeed improves 2D CNN results and puts them on par with 3D CNN results in terms of precision and DSC, but the post-processed 2D models still fail to catch up with unprocessed 3D results on the more significant measure of recall (i.e. not missing possible bone weaknesses), as well as on surface-based distance measurements such as ASD and MSD.
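One common form of such post-processing is discarding small connected components from the predicted mask. A minimal sketch using `scipy.ndimage` — the size threshold is illustrative, not a value from the paper:

```python
import numpy as np
from scipy import ndimage

def drop_small_regions(mask, min_size=64):
    """Remove connected components smaller than `min_size` pixels from
    a binary segmentation mask -- the kind of post-processing step that
    discards tiny, statistically unlikely detections."""
    labels, n = ndimage.label(mask)          # label connected regions
    keep = np.zeros_like(mask, dtype=bool)
    for region in range(1, n + 1):
        component = labels == region
        if component.sum() >= min_size:      # keep only large regions
            keep |= component
    return keep
```

This raises precision by pruning spurious detections, but it can only ever remove predictions — it cannot recover a weak-bone region the 2D model missed, which is consistent with the recall gap the study reports.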
Recent studies demonstrate that progress in computer vision through deep neural networks is translating into applications in medical image analysis. With much medical data coming in a 3D format, a 3D CNN is often the superior approach and should generally be considered the default, especially in cases where post-processing best practices haven’t yet been established.
Our thanks to Cem Deniz and Kyunghyun Cho for insights and clarifications to the Deniz et al. paper.