Following the rise of deep learning, AI is making its way into the medical sector. A recent study showed that AI can detect breast cancer in screening mammograms with accuracy comparable to that of expert radiologists. The study achieved these results not only because of the model developed by Google DeepMind, but also because of a dataset of almost 29,000 mammograms. As with every machine learning project, both the quantity and the quality of the underlying data are key to success.
Mammograms are only one type of medical image where AI could add value; Whole Slide Imaging (WSI) is another promising one. Whole slide scanners capture images of tissue sections. These images can be several gigabytes in size and are too large to feed directly into a machine learning algorithm. A common practice has been to overlay the slide with a grid and extract smaller patches that are then fed to an algorithm.
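As a rough illustration of the grid-based practice, the sketch below tiles a slide into non-overlapping square patches. It assumes the slide is already in memory as a 2D list of pixel values; a real whole slide image would be read region by region from disk. The name `grid_patches` and its parameters are hypothetical.

```python
from typing import Iterator, List, Tuple

Patch = List[List[int]]


def grid_patches(slide: Patch, size: int) -> Iterator[Tuple[int, int, Patch]]:
    """Yield (x, y, patch) for every full, non-overlapping tile of the grid.

    Assumption: `slide` is a rectangular 2D list; tiles that would run past
    the right or bottom edge are skipped rather than padded.
    """
    height, width = len(slide), len(slide[0])
    for y in range(0, height - size + 1, size):
        for x in range(0, width - size + 1, size):
            # Crop the size x size window whose top-left corner is (x, y).
            yield x, y, [row[x:x + size] for row in slide[y:y + size]]
```

With a fixed grid, the number of patches and their positions are determined entirely by the slide dimensions and the patch size.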
We enhanced this method by developing a new data generator that creates an infinite stream of patches on the fly and automatically balances the different classes. Instead of using a fixed grid, the generator uses stochastic sampling to draw a steady stream of random patches. Patches are never written to disk; each one is generated when it is requested. Random generation may seem time-consuming, but because the bottleneck is model training, it adds no delay to the process flow. As a result, the patch generator lets us quickly set up a dataset for any machine learning algorithm that uses WSI.
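The idea can be sketched in a few lines of Python. This is a minimal illustration, not our actual implementation, and it makes several assumptions: the slide and a per-pixel class mask are held in memory as 2D lists, a patch takes its label from the mask value at its centre pixel, and classes are balanced by first picking a class uniformly at random and then picking a random location of that class. The names `balanced_patch_stream` and `crop` are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Patch = List[List[int]]


def crop(slide: Patch, x: int, y: int, size: int) -> Patch:
    """Return the size x size window whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in slide[y:y + size]]


def balanced_patch_stream(
    slide: Patch,
    mask: Patch,
    size: int,
    classes: Sequence[int],
    seed: Optional[int] = None,
) -> Iterator[Tuple[Patch, int]]:
    """Yield (patch, label) pairs forever, each class being equally likely.

    Assumption: the label of a patch is the mask value at its centre pixel.
    """
    rng = random.Random(seed)
    height, width = len(mask), len(mask[0])
    half = size // 2

    # Index the valid centre coordinates per class once, up front, so that
    # drawing a patch later is a cheap lookup. A centre is valid only if the
    # whole patch around it fits inside the slide.
    centres: Dict[int, List[Tuple[int, int]]] = {c: [] for c in classes}
    for y in range(half, height - size + half + 1):
        for x in range(half, width - size + half + 1):
            label = mask[y][x]
            if label in centres:
                centres[label].append((x, y))

    usable = [c for c in classes if centres[c]]
    if not usable:
        raise ValueError("no valid patch location for any requested class")

    while True:
        label = rng.choice(usable)         # class first: this balances the stream
        x, y = rng.choice(centres[label])  # then a random location of that class
        yield crop(slide, x - half, y - half, size), label
```

Because the function is a generator, nothing is stored: a training loop simply calls `next(stream)` (or iterates over the stream) whenever it needs another patch. Choosing the class before the location means a rare class appears as often as a common one, regardless of how much of the slide each class covers.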
For a more in-depth explanation, take a look at our Medium blog.