Artificial intelligence for microscopy: what you should know

Artificial Intelligence based on Deep Learning (DL) is opening new horizons in biomedical research and promises to revolutionize the microscopy field. It is now transitioning from the hands of experts in computer sciences to biomedical researchers. Here, we introduce recent developments in DL applied to microscopy, in a manner accessible to non-experts. We give an overview of its concepts, capabilities and limitations, presenting applications in image segmentation, classification and restoration. We discuss how DL shows an outstanding potential to push the limits of microscopy, enhancing resolution, signal and information content in acquired data. Its pitfalls are discussed, along with the future directions expected in this field.


Introduction
A hallmark of human intelligence is the ability to adapt previous knowledge to new situations and to recognize meaning in patterns. Replicating these abilities in non-human agents is the main goal of Artificial Intelligence (AI). Machine learning refers to a subset of AI methods based on extracting useful features from large sets of well-understood data and applying this information to make predictions or decisions on unseen data [1,2]. In the early 2010s, one type of machine learning, Deep Learning (DL), based on so-called neural networks (NNs), became increasingly prominent as a tool for image classification with super-human capabilities [3].
In contrast with classical algorithms which use a set of specifically designed rules to transform an input to a novel output (e.g. a median filter for image denoising) (Figure 1a), an NN is initially presented with a large set of paired input and desired output (respectively noisy and high-quality image, for instance) called the training dataset, from which it learns how to map each input into its corresponding desired output (Figure 1b-i). Therefore, a central difference with conventional algorithms is that the function performed by an NN is essentially determined by the training dataset itself. Once trained, the network can then be used to treat unseen input data to obtain the desired output in a process called inference (Figure 1b-ii).
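To make the contrast concrete, the 'known routine' side of Figure 1a can be written out in a few lines. The following is a minimal sketch in Python with NumPy, assuming a 2D greyscale image. The function name, the 3x3 window and the synthetic salt-and-pepper test image are illustrative choices and are not taken from any of the cited works.

```python
import numpy as np

def median_filter_3x3(image):
    """Rule-based denoising: replace each pixel by the median of its 3x3 neighbourhood."""
    height, width = image.shape
    # Repeat the border pixels so that every pixel has a full neighbourhood.
    padded = np.pad(image, 1, mode="edge")
    # Stack the nine shifted copies of the image, one per neighbourhood position.
    windows = np.stack(
        [padded[dy:dy + height, dx:dx + width] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0)

# Synthetic example (hypothetical data): a bright square on a dark background.
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[8:24, 8:24] = 1.0

# Corrupt about 5% of the pixels by inverting them (salt-and-pepper noise).
noisy = clean.copy()
corrupted = rng.random(clean.shape) < 0.05
noisy[corrupted] = 1.0 - noisy[corrupted]

denoised = median_filter_3x3(noisy)
error_noisy = float(np.mean((noisy - clean) ** 2))
error_denoised = float(np.mean((denoised - clean) ** 2))
print(f"mean squared error before: {error_noisy:.4f}, after: {error_denoised:.4f}")
```

Every step of this routine is fixed in advance by the programmer. An NN trained for denoising would instead derive its transformation from the paired examples it is shown.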
Although NNs were first envisioned in the 1950s [4], it took decades and the introduction of backpropagation [5,6] until the first NNs reached significant performance in pattern recognition tasks in the late 1980s [7][8][9]. While inference with trained networks is generally fast, the training process can be computationally intensive, taking hours to days, especially for complex networks. The success of NNs in image recognition is thus closely linked to the exponential increase in the computational power of processing units, notably Graphics Processing Units (GPUs), and to the rapid growth in the availability of large datasets since the 2000s [1,10]. In 2012, the first GPU-enabled NN called AlexNet [3] vastly outperformed the competition at the ImageNet image classification challenge, a seminal breakthrough for the AI field.
Since then, NNs have expanded into many domains, outperforming humans in board games such as Go [11], enabling self-driving cars [12,13] and significantly improving biomedical image analysis [2,14-17]. In the latter field, their applications include automated, accurate classification and segmentation of cell images [18,19], extraction of structures from label-free microscopy imaging (artificial labelling) [20,21] and most recently image restoration (e.g. denoising and resolution enhancement) [22][23][24]. Furthermore, as quantitative imaging is proving increasingly powerful for research, methods able to analyse big data with (super-)human accuracy have become highly desirable. So, although DL in microscopy is yet to become widely available, the current growth in research efforts hints at a technology with the potential to fundamentally change how imaging data is analysed and how microscopy is carried out.
Here, we give non-specialist readers an overview of the potential of NNs in the context of some of the major challenges of microscopy. We also discuss some of their current limitations and give an outlook on possible future applications in microscopy. While we briefly cover the basic mathematical principles used in NNs, we refer the reader to the review by LeCun et al. [1], which gives an extended perspective on machine learning and NNs and their historical development. Additionally, we recommend reviews which comprehensively discuss the application of AI in biomedical sciences and computational biology [16,25-27].

How does a neural network learn?
NNs are complex networks of connected 'neurons' arranged in 'layers', a nomenclature inspired by the animal visual cortex [4]. A neuron can be interpreted as a mathematical function with adjustable parameters. A layer is commonly made of a group of neurons that take the same input data and transfer their output to the next layer in the network. Each layer thus provides a new representation of the data to the next layer, with growing levels of abstraction. The transformation performed by a layer may contain non-linear operations, such as a rectified linear unit (ReLU), which has been particularly successful for feature extraction tasks [28] since it allows much more complex data representations to be modelled than with linear operations alone. The output of the last layer constitutes the output of the network. The more layers the network has, the deeper it is and the more complex the information it can extract [1,3].
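The description of neurons and layers above can be sketched numerically. The following minimal Python/NumPy example assumes fully connected layers in which each neuron computes a weighted sum of its inputs plus a bias, followed by a ReLU. The function names and the hand-picked weights are purely illustrative; in a real network the weights would be learned.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: keeps positive values and sets negative values to zero."""
    return np.maximum(x, 0.0)

def dense_layer(inputs, weights, biases):
    """One layer: every neuron (one column of `weights`) sees the same inputs."""
    return relu(inputs @ weights + biases)

# One input sample with three features (hypothetical values).
x = np.array([[1.0, -2.0, 0.5]])

# First layer: two neurons, each with three weights and one bias.
w1 = np.array([[0.5, -1.0],
               [-0.25, 0.5],
               [1.0, 2.0]])
b1 = np.array([0.0, 0.5])

# Second layer: a single neuron reading the two outputs of the first layer.
w2 = np.array([[2.0],
               [3.0]])
b2 = np.array([0.1])

hidden = dense_layer(x, w1, b1)       # new representation of the input
output = dense_layer(hidden, w2, b2)  # output of the last layer = network output
print(hidden, output)
```

Here the second neuron of the first layer receives a negative weighted sum and is silenced by the ReLU, which is the kind of non-linearity that lets stacked layers represent more than a single linear transformation.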
Figure 1. (a) Classical computer programs convert an input (e.g. noisy image of a cell) into a desired output (e.g. sharp image) via an algorithm with known rules and parameters ('known routine'). On the other hand, NNs are trained with pairs of corrupted and ground-truth images, e.g. a noisy image of a cell and its equivalent high-quality image. During training (b-i), the untrained network (dark grey) learns to transform the inputs (left) into the output (right) by observing a large number of paired examples from the training dataset. After training (b-ii), the trained network (light grey) can be used to perform the task similarly to a conventional algorithm on novel data, therefore providing the output from new input data. The large black arrowheads represent dataflow.
An important form of NN, especially for tasks involving feature recognition in image data, is the convolutional neural network (CNN) [1,9]. Here, neurons extract image features by performing convolutions on the input image. These convolutional layers are often followed by so-called pooling layers, which reduce the number of pixels in the image and therefore simplify the feature representations from the convolutional layers. This combination of successive feature extractions and data shrinkage leads to a simplified version of the input image, similar to a barcode, which the network learns to associate with the desired output [9,27].
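As an illustration of one convolution step followed by one pooling step, the sketch below applies a hand-written vertical-edge kernel to a small synthetic image and then shrinks the result with 2x2 max pooling. It is a minimal example in Python with NumPy. The kernel values, image and function names are assumptions made for illustration, and in a CNN the kernel values would be learned during training rather than fixed by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    As in most DL frameworks, the kernel is not flipped, so this is
    strictly speaking a cross-correlation.
    """
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + k_h, j:j + k_w] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block, halving both dimensions."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Kernel that responds to dark-to-bright transitions from left to right.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

features = conv2d_valid(image, vertical_edge)  # 4x4 feature map, strong at the edge
pooled = max_pool_2x2(features)                # 2x2 summary of the feature map
print(features)
print(pooled)
```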
Here, the NN learns to map from input to output by iteratively adjusting its neurons' parameters such that it minimizes the difference between its own output and the desired output using the training dataset. This is a non-trivial task, especially for deep networks, as this can require iteratively estimating the effect of thousands or millions of parameters. This problem was efficiently addressed by the backpropagation method which allows the network errors to be projected back to every neuron's individual contribution [5,6]. Adjusting the neuron's parameters is then achieved by a method called gradient descent, i.e. changing the parameters such that the error decreases the fastest. Stochastic gradient descent, iteratively using a random example from the training dataset to estimate such parameter change as opposed to evaluating over the entire training dataset, is now the most common method [5,29,30]. After this iterative learning stage, the trained network can be applied to new data for which outputs are not available (Figure 1b-ii).
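The iterative adjustment described above can be illustrated on the smallest possible 'network': a single neuron with one weight and one bias. The sketch below, in Python with NumPy, uses stochastic gradient descent on synthetic, noise-free data. The target relationship (y = 2x + 1), the learning rate and the number of steps are arbitrary assumptions chosen for illustration. For a deep network, backpropagation would supply the corresponding gradient for every parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training dataset: inputs and desired outputs (y = 2x + 1).
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = 2.0 * x_train + 1.0

# Adjustable parameters of the single neuron, starting from zero.
w, b = 0.0, 0.0
learning_rate = 0.1

for step in range(2000):
    # Stochastic: pick one random training example instead of the whole dataset.
    i = rng.integers(len(x_train))
    prediction = w * x_train[i] + b
    error = prediction - y_train[i]
    # Gradient of the loss 0.5 * error**2 with respect to w and b.
    grad_w = error * x_train[i]
    grad_b = error
    # Move each parameter a small step in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w = {w:.3f}, b = {b:.3f}")
```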
One issue with deep networks is their potential capacity to imprint the entire training dataset, a process called over-fitting, as opposed to learning generalizable features about the data. This may happen if the training dataset is too small or if the network is too deep (e.g. too many layers). In this case, the network will perform extremely well on the training dataset but will generalize very poorly with new unseen data. Therefore, during training, the network performance is monitored using an unseen validation dataset. Comparing validation and training performance is essential for model selection, i.e. choosing a network architecture that suits the dataset and does not overfit. In a final step, the network is tested on an unseen dataset which was neither contained in training nor validation sets to establish its performance.
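The roles of the training, validation and test sets can be sketched with a deliberately simple stand-in for a network. In the Python/NumPy example below, polynomials of different degree play the role of models of different capacity. This substitution, the synthetic data, the split sizes and the candidate degrees are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic dataset: a smooth signal observed with noise.
n_total, n_val, n_test = 60, 12, 12
x = rng.uniform(-1.0, 1.0, size=n_total)
y = np.sin(3.0 * x) + rng.normal(0.0, 0.2, size=n_total)

# Shuffle once, then cut into three non-overlapping subsets.
order = rng.permutation(n_total)
test_idx = order[:n_test]
val_idx = order[n_test:n_test + n_val]
train_idx = order[n_test + n_val:]

def mse(coeffs, idx):
    """Mean squared error of a fitted polynomial on the chosen subset."""
    return float(np.mean((np.polyval(coeffs, x[idx]) - y[idx]) ** 2))

# Model selection: fit each candidate on the training set only and
# compare training and validation errors.
candidate_degrees = [1, 3, 9]
train_err, val_err = {}, {}
for degree in candidate_degrees:
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    train_err[degree] = mse(coeffs, train_idx)
    val_err[degree] = mse(coeffs, val_idx)
    print(f"degree {degree}: train {train_err[degree]:.3f}, validation {val_err[degree]:.3f}")

# Keep the model that does best on the validation set, then report its
# performance once on the untouched test set.
best_degree = min(candidate_degrees, key=lambda d: val_err[d])
best_coeffs = np.polyfit(x[train_idx], y[train_idx], best_degree)
test_err = mse(best_coeffs, test_idx)
print(f"selected degree {best_degree}, test error {test_err:.3f}")
```

A training error that keeps falling as capacity grows while the validation error stalls or rises is the signature of over-fitting described above.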
Generally, the training dataset should contain many different examples of the desired outputs. For example, a network designed to categorize an animal should be trained with images showing the animal in different positions or environments. While data augmentation can be a powerful way to supplement training datasets [15], generating and curating the training dataset is often the major hurdle for the application of DL. Classification networks such as AlexNet [3] were trained on millions of annotated training instances [10], and while this is not universal [20,22,31], networks used for microscopy applications are often trained with thousands of examples to reach high prediction accuracy [24,32].
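A simple form of data augmentation is to present each training image in several orientations. The sketch below, in Python with NumPy, generates the eight combinations of 90-degree rotations and mirror flips of a square image. It assumes that orientation carries no meaning for the task, which is not true for every specimen, and the function name is illustrative.

```python
import numpy as np

def augment_with_flips_and_rotations(image):
    """Return the 8 rotated and mirrored variants of a square 2D image."""
    variants = []
    for quarter_turns in range(4):
        rotated = np.rot90(image, quarter_turns)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants

# Small asymmetric test image so that every variant is different.
image = np.arange(9).reshape(3, 3)
augmented = augment_with_flips_and_rotations(image)
print(len(augmented))
```

For image-to-image tasks, the same transformation would have to be applied to the input and to its paired target so that the two stay aligned.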
In recent years, important technical developments have improved or sped up the learning stage. These include the use of pre-trained networks (transfer learning [33][34][35], which allows much smaller training datasets to be used), putting two NNs in competition with one another (such as Generative Adversarial Networks, GANs, where one network learns to generate fake datasets and another learns to discriminate fake from real) [24,36] and the direct use of very large non-curated datasets (self-learning or unsupervised learning) [37][38][39][40][41][42].

Neural networks and microscopy
Researchers in life sciences face several challenges when imaging biological specimens: How can phototoxicity and bleaching of fluorescent labels be balanced against good signal or resolution? How many fluorescent markers can be reliably imaged? And how can relevant and complex information be extracted from large image datasets, without tedious manual annotation and human bias? Aided by the increasing availability of high-throughput imaging, the new generation of DL methods in microscopy has the potential to address some of these problems.
In the following sections, we will present an overview of several exciting recent developments in AI and how they might address some of the current microscopy limitations mentioned above. Despite some conceptual overlaps between the methods presented here, we have separated them into four categories: object detection and classification (facilitating information extraction), image segmentation (allowing large and potentially unbiased high-throughput analysis), artificial labelling (tackling the limitations of the maximum number of fluorescent labels and that of phototoxicity) and image restoration (reducing phototoxicity, improving denoising or resolution).

Object detection and classification
An important goal for microscopy image analysis is to recognize and assign identities to relevant features in an image (Figure 2). Here, objects in an image are identified and classified by the NN. For example, identifying mitotic cells in a tissue sample can be essential for cancer diagnosis. However, manual annotation is tedious, limited in throughput, and experts can introduce bias into such annotations by deciding which image features are important while ignoring others. Although several computational methods have been introduced to accelerate detection or classification tasks [43][44][45], these still often rely on handcrafted parameters chosen by researchers. The advantage of NNs is their capacity to learn the relevant image features autonomously. Classification NNs have therefore been extensively used in the biomedical imaging field, especially for cancer detection as large training sets have become more available [15,46-50], and in high-throughput and high-content screens, where they have shown expert-level recognition of subcellular features [32,51-54]. A new approach in this area is to use unsupervised learning to identify subcellular protein localizations [38]. Lu et al. showed that unsupervised clustering of fluorescent proteins allows explorative studies on protein localization data (as there is no user bias in the input data) and also removes the requirement for manual labelling of a training dataset [38].
NNs have also shown their capacity to accurately identify cellular states from transmitted-light data, for example, differentiating cells based on cell-cycle stage [39], cells affected by phototoxicity [55] or stem cell-derived endothelial cells [56]. Determining such cellular identities previously required the introduction of an intracellular label, with the associated risk of affecting the physiology of the cell. These examples show how using NNs could be a less invasive method to identify cell fate or identity.

Image segmentation
Segmentation is the identification of image regions that are part of specific cellular or subcellular structures and is often an essential step in image analysis (Figure 3). In this case, unlike the classification approach, the NN identifies whether each pixel belongs to a category of structure, typically defined as background vs. signal. A drawback of some existing segmentation platforms [43,45,57] is their need for user-based fine-tuning and manual error-removal, requiring time and expertise or adding human bias [58]. In multiple studies, CNNs have outperformed classical approaches in terms of accuracy and generalization [18,58-61], especially when performing cell segmentation in co-cultures of multiple cell types [58]. In the context of histopathology, CNNs have been successfully used to segment colon glands [62][63][64][65][66], breast tissues [67,68] and nuclei [69], outperforming non-DL approaches. Naturally, there is overlap between the challenges of classification and segmentation. Hence, segmentation is often used with subsequent classification and can even improve the accuracy of classification [49,70].
Figure 2. Schematic of a trained NN which detects and classifies cells of different types or stages, e.g. to identify mitotic cells. A categorization map (right) can be obtained from a brightfield image (left). Here, the cell cycle stage of each cell is predicted. During training, the network was presented with a set of representative images of cells at different stages of the cell cycle that were manually annotated.
Figure 3. Schematic of a trained NN that produces segmentation masks from brightfield images. Given a brightfield input image of cells (left), the network assigns pixel values to the segmentation mask corresponding to single cells against the background (right). During training, the network was presented with a set of brightfield images that were manually segmented.
The segmentation field has also pioneered a network architecture called U-net [18] with wider importance in microscopy, especially when both input and output of the NN are an image (image-to-image algorithm). This U-net architecture uses many convolution/pooling layers (the encoder), followed by many layers of de-convolution/upsampling (the decoder) [18,59]. The encoder learns the main features of the image and the decoder reassigns them to different pixels of the image. While initially used for segmentation tasks, these architectures can be adapted to other image-to-image transformations (as opposed to simple classification of the image), making them some of the most important networks for microscopy applications today [17,20,22,69,71].
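The encoder/decoder layout can be sketched by following how the data shape changes through such a network. The Python/NumPy example below is a bookkeeping sketch only: the learned convolutions are left out entirely, pooling stands in for the encoder steps and nearest-neighbour upsampling for the decoder steps. It also includes the skip connections that give the U-net its name, in which encoder outputs are concatenated onto decoder inputs at matching resolution. The function names and the depth of two are illustrative assumptions.

```python
import numpy as np

def downsample(x):
    """Encoder step (pooling only): 2x2 max pooling on an array shaped (channels, H, W)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample(x):
    """Decoder step (upsampling only): repeat every pixel 2x along height and width."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_shape_sketch(image, depth=2):
    """Trace an image through a U-shaped encoder/decoder without any learned layers."""
    x = image[np.newaxis]  # add a channel axis: (1, H, W)
    skips = []
    for _ in range(depth):
        skips.append(x)    # remember the full-detail representation
        x = downsample(x)  # halve the resolution
    for skip in reversed(skips):
        x = upsample(x)    # restore the resolution
        x = np.concatenate([skip, x], axis=0)  # skip connection: stack as channels
    return x

image = np.arange(256, dtype=float).reshape(16, 16)
result = unet_shape_sketch(image)
print(result.shape)  # output has the same height and width as the input
```

Because the output keeps the height and width of the input, the same layout can serve segmentation, artificial labelling or restoration, depending on what the network is trained to produce.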

Artificial labelling
The direct observation of specific structures in cells using light microscopy typically requires the introduction of labels, either by genetic labelling or chemical staining, which can disturb the biological system. Additionally, fluorescence microscopy, especially when using laser illumination, is inherently more phototoxic to cells than transmitted-light imaging [72,73]. Addressing these limitations, two studies using CNNs have shown that specific cellular structures, such as nuclear membrane, nucleoli, plasma membranes and mitochondria, can be extracted by NNs from label-free images [20,21]. While the task of artificial labelling is similar to segmentation, the main difference in this approach lies in the creation of the training dataset, which does not need to be hand-labelled. Instead, the training set contains paired images obtained from brightfield and fluorescence modalities of the same cells. The networks then learn to predict a fluorescent label from transmitted light or EM images, alleviating the need to acquire the corresponding fluorescence images (Figure 4). This capability is especially useful when performing long-term, live-cell imaging where low phototoxicity acquisitions are highly advantageous. Interestingly, the networks achieved high accuracy using a training dataset of only 30-40 images [20] and were able to identify dying cells or distinguish different cell types and subcellular structures [21]. Christiansen et al. [21] also demonstrated their network's ability for transfer learning, allowing a pre-trained network to be applied between different microscopes and labels, highlighting the versatility of these networks' performance. However, the lack of a good understanding of the origin of the features that the networks are able to produce from the label-free modalities has generated some scepticism and fuelled debate around artificial labelling.

Image restoration: resolution and signal
The amount and quality of features which can be extracted from a microscopic image are limited by fundamental constraints inherent to all optical set-ups: signal-to-noise ratio (SNR) and resolution. Overcoming these limitations constitutes a central goal in microscopy. In particular, super-resolution microscopy (SRM) [74][75][76][77][78] now allows imaging of cellular structures at the nanoscale using light microscopy. However, phototoxicity, bleaching and low temporal resolution still limit the capacity to achieve high-resolution long-term imaging in living specimens. Recently, several research groups have proposed CNN methods addressing some of these issues [22,23].
For such networks, training datasets consist, for instance, of paired images acquired at low and high SNR, respectively, and the network learns to predict a denoised (high SNR) image from a noisy input (low SNR). This approach was demonstrated by Weigert et al. [22] with their content-aware image restoration (CARE) methodology on the highly photosensitive organism Schmidtea mediterranea, where it allowed a 60-fold decrease in illumination dose, thus enabling longer and more detailed observation of this organism in vivo. CARE also demonstrated the successful restoration of axial resolution in deep microscopy sections, performing better than conventional reconstruction approaches, such as deconvolution. Furthermore, CARE was able to reconstruct SRM images from diffraction-limited images, using the Super-Resolution Radial Fluctuation (SRRF) method as a reference [78][79][80]. Similarly, SRM images can be obtained from conventional confocal microscopy images using STimulated Emission Depletion (STED) microscopy to acquire the high-resolution training dataset [24].
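What a low/high SNR training pair looks like can be sketched with simulated data. The Python/NumPy example below assumes that photon shot noise (Poisson statistics) is the only noise source and images the same synthetic scene at two illumination doses. The scene, the photon numbers and the function names are hypothetical and do not reproduce the acquisition settings of any cited study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scene: a blurred spot on a dim background (arbitrary intensity units).
yy, xx = np.mgrid[0:64, 0:64]
scene = 0.1 + np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8.0 ** 2))

def acquire(scene, photons_per_unit):
    """Simulate one acquisition: Poisson photon counts, rescaled to scene units."""
    counts = rng.poisson(scene * photons_per_unit)
    return counts / photons_per_unit

def rmse(image, reference):
    """Root mean squared deviation of an acquired image from the underlying scene."""
    return float(np.sqrt(np.mean((image - reference) ** 2)))

# One training pair: the same field of view at high and at 60-fold lower dose.
high_snr_target = acquire(scene, photons_per_unit=600.0)
low_snr_input = acquire(scene, photons_per_unit=10.0)

print(f"error at high dose: {rmse(high_snr_target, scene):.3f}")
print(f"error at low dose:  {rmse(low_snr_input, scene):.3f}")
```

A restoration network would be trained on many such pairs, with the low-dose image as input and the high-dose image as the desired output.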
Given the difficulties of creating large annotated training sets, different unsupervised learning methods for image restoration requiring no labelled training data have recently been explored [41,42]. Here, a network learns image denoising on a dataset of noisy images alone. While these methods may not always reach the performance of networks trained with ground-truth data [41], this represents an interesting avenue for tasks where large training sets are difficult or impossible to assemble.

Using neural networks in single-molecule localization microscopy
CNNs have also recently generated interest in the single-molecule localization microscopy (SMLM) field. All available studies were published within the last year, by independent groups, suggesting that the potential of AI for SRM is increasingly recognized in the community [23,81-84]. Applying sophisticated network architectures, with combinations of widefield and SMLM data as inputs [23,81], the networks are able to map sparse SMLM data of either microtubules, mitochondria or nuclear pores directly into their SRM output images. This demonstrates the strength of CNNs for pattern recognition in redundant data, like SMLM data where only a few frames may suffice to reconstruct an SRM image. Interestingly, some of these algorithms require no parameter tuning or specific knowledge about the imaged structures [81]. This is especially advantageous at high emitter density, where conventional SMLM reconstruction algorithms can be time-consuming. However, the means by which an NN can learn to produce SRM images from sparse or widefield data remain a heavily debated topic in the fields of both microscopy and AI.
Other studies have used a different approach to SMLM reconstruction by making use of the intrinsic properties of SMLM data [82,83]. Here, networks are trained to detect the spatial positions of fluorophores from SMLM input images, similar to a typical SMLM algorithm. This approach partially circumvents the controversy because the reconstructed images are more similar to standard SMLM reconstructions, making the resolution improvement easier to interpret. While the accuracy achieved is similar to that of state-of-the-art SMLM algorithms [85], a main achievement of DL for SMLM is the speed with which super-resolved images can be reconstructed. In several studies, this was improved by several orders of magnitude compared with conventional reconstruction algorithms [23,81,82,84].

Discussion
AI is transforming microscopy both by allowing human or super-human performances for many image analysis tasks and as an automated high-performance tool for big-data analysis [32,53,58] (Table 1). However, while performance, versatility and speed of DL are likely to continue increasing, there are significant challenges which will not be solved by improved processing units.
Fundamentally, the task carried out by the NN as well as its performance is determined by the quality of the training dataset. So, any bias present in the training dataset (commonly introduced by the user at the selection level) will be subsequently incorporated in the network. This highlights the need for detailed data curation which depends heavily on the task at hand. For instance, in the case of a classification task, under-represented populations might be less accurately classified, or a model could overfit to the training examples. For the training dataset to cover a representative set, it is often important for it to contain thousands to millions of examples. In the absence of a robust training dataset, a user should either consider selecting a different model architecture or even alternatives to DL which exist in the form of other machine learning approaches or classical computer programs [27].
Another frequently raised concern in the microscopy community over DL is how much network outputs can be trusted to represent the underlying data. This is a real concern since CNNs have been observed to cause image hallucinations [86] or to fail catastrophically simply as a result of minute changes in the image [87]. To address this issue, several groups have assessed the presence of artefacts in their network output images, notably using the SQUIRREL (Super-resolution QUantitative Image Rating and Reporting of Error Locations) approach [22-24,80,84,88]. While this may identify the presence of artefacts, it does not address the underlying problem that it is difficult to interpret how CNNs produce their output from the image input, especially due to the abstraction of data representation in deep networks. This lack of interpretability of network outputs is particularly concerning in the case of resolution enhancement, where it is not clear what information a CNN can extract from a diffraction-limited image to achieve a non-diffraction-limited image and how DL algorithms achieve this without producing significantly more artefacts than standard algorithms [22,24]. Similar concerns exist for artificial labelling, as it may prove challenging to interpret the difference between signal and hallucinations of the network. Besides issues of interpretability, there are other anecdotal examples where networks have 'cheated' their way to high performance, e.g. by using undesirable features such as empty space to identify dead cells [55] or by identifying patterns in the ordering of the training set, but not in the images themselves [89]. This shows how much of the performance of DL methods relies on the choice and curation of training datasets. Furthermore, the design of CNN architectures has been referred to as 'notorious as an empirical endeavour' [21]. Choosing network hyperparameters such as network depth, number of neural connections, learning rate and other hand-coded features of NNs [32,83,90,91], as well as the necessary hardware, often requires in-depth technical know-how, which limits accessibility for many potential users in the life sciences.
Table 1. This table covers the four main themes where AI has provided solutions to some of the major limitations of microscopy in recent years.
Nevertheless, AI has great enabling potential for microscopy, given super-human performance in classification tasks and image reconstruction. Hence, the issues discussed above should not discourage the use of NNs as a research tool but be a reason for caution when interpreting the performance of NNs, as for any computational analysis tool.

Outlook
A rapidly increasing number of publications using DL in microscopy suggests that this technology can be a versatile and powerful tool to address some significant problems in biomedical imaging. However, the delay between developments and their applications means that some areas of AI research have not yet been widely translated to microscopy. For example, transfer learning is an area which will likely become more widely investigated, allowing the use of pre-trained networks to carry out a new task, forms of which are only starting to become available [19]. Finding methods to reuse NNs robustly on multiple different tasks, different image sizes or images taken on different microscopes would make DL a more flexible and usable approach for image analysis than is currently possible. Importantly, it would reduce the need for large training datasets and shorten the training time needed for new tasks. It could therefore lower the accessibility barrier of the approach and minimize the need for users to be fully familiar with NN specifics. In turn, this would allow DL to become a more widely used tool within life sciences, rather than a method that needs expert knowledge. Additionally, we expect that the AI field will develop tools to inspect and detect network failures which would build trust and establish the role that AI can and cannot play in modern research. We can also envisage new AI-enabled technologies that allow integrated microscopy platforms [92] to be controlled by an artificial agent, therefore optimizing microscopy at the image acquisition level.

Perspectives
• Importance of the field. The field of Deep Learning (DL) applied to microscopy shows great promise to transform the way we acquire and analyse our microscopy data. Historically developed to automate tedious image segmentation and classification in biomedical images, it is beginning to be used in many imaging tasks, identifying subcellular features and allowing the recovery of high-quality images from noisy data or of specific cellular labels from unlabelled specimens.
• Current state of the field. Applications of DL are currently being developed by expert computer scientists who can deploy the large computing resources required for training these networks. However, these resources are extremely versatile since typical network architectures (such as U-net architectures) can be used for numerous tasks. Therefore, a new limitation has emerged and lies in the generation and curation of the datasets necessary to train the networks.
• Future directions. The design, implementation and use of DL in microscopy are bound to be democratized, largely thanks to the availability of hardware and software packages making these accessible. However, concerns remain about the biases built into networks (e.g. through the curation of training data) and about catastrophic network failures, both of which remain to be studied in detail. The field is still undergoing an exponential development and many approaches developed for robotics or computer vision will likely permeate biomedical research, creating new opportunities for researchers in the life sciences.