Sci Rep. 2024 Aug 6;14(1):18184.
doi: 10.1038/s41598-024-68918-2.

Convolutional neural network transformer (CNNT) for fluorescence microscopy image denoising with improved generalization and fast adaptation


Azaan Rehman et al. Sci Rep.

Abstract

Deep neural networks can improve the quality of fluorescence microscopy images. Previous methods, based on Convolutional Neural Networks (CNNs), require time-consuming training of individual models for each experiment, impairing their applicability and generalization. In this study, we propose a novel imaging-transformer-based model, the Convolutional Neural Network Transformer (CNNT), that outperforms CNN-based networks for image denoising. We train a general CNNT backbone model from pairwise high-low Signal-to-Noise Ratio (SNR) image volumes gathered from a single type of fluorescence microscope, an instant Structured Illumination Microscope. Fast adaptation to new microscopes is achieved by fine-tuning the backbone on only 5-10 image volume pairs per new experiment. Results show that the CNNT backbone and fine-tuning scheme significantly reduce training time and improve image quality, outperforming models trained using only CNNs, such as 3D-RCAN and Noise2Fast. We show three examples of the efficacy of this approach in wide-field, two-photon, and confocal fluorescence microscopy.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Backbone and fine-tuning to train the light microscopy image enhancement model. (A) Previous methods generally train a separate model for every sample or microscopy type. Such train-from-scratch methods are effective but need many samples and extended training time to reach optimal performance. Furthermore, since every training set is independent, the model cannot draw on other samples or microscopy types to help the current imaging experiment. (B) Here we first train a backbone model from large, diverse, previously curated data. The trained backbone model is then fine-tuned for every new experiment, using a much smaller amount of new data. Given an effective backbone architecture, this approach trains much faster and allows reusing information acquired in previous experiments. Inspired by the success of transformer models in language pre-training, we propose a novel imaging transformer architecture, CNNT, to serve as this effective backbone.
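The pre-train-then-fine-tune workflow above can be sketched with a toy linear denoiser. This is purely illustrative: the model, shapes, noise statistics, and training loop are assumptions for the sketch, not the paper's CNNT implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(W, pairs, lr, epochs):
    """Gradient descent on mean-squared error for a linear denoiser y ~= x @ W."""
    for _ in range(epochs):
        for x, y in pairs:
            grad = x.T @ (x @ W - y) / len(x)
            W -= lr * grad
    return W

def mse(W, pairs):
    return float(np.mean([np.mean((x @ W - y) ** 2) for x, y in pairs]))

d = 8
W_true = np.eye(d) * 0.9

def make_pairs(n_pairs, noise_scale):
    """Simulated low/high-SNR volume pairs: noisy input, clean target."""
    pairs = []
    for _ in range(n_pairs):
        clean = rng.normal(size=(32, d))
        noisy = clean @ np.linalg.inv(W_true) + rng.normal(scale=noise_scale, size=(32, d))
        pairs.append((noisy, clean))
    return pairs

# Stage 1: pre-train a backbone on a large, diverse set of pairs.
backbone = train(np.zeros((d, d)), make_pairs(200, 0.1), lr=0.05, epochs=3)

# Stage 2: fine-tune on only ~5 pairs from a new "microscope" with
# shifted noise statistics -- far less data than training from scratch.
small_set = make_pairs(5, 0.2)
finetuned = train(backbone.copy(), small_set, lr=0.05, epochs=10)

print(mse(finetuned, small_set))
```

The point of the sketch is the data economy: the fine-tuning stage reuses what the backbone learned, so a handful of new pairs suffices.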
Figure 2
The CNNT U-net architecture. (A) The whole model consists of pre- and post-convolution layers and the backbone. The input tensor has the size [B, Z, C, H, W] for batch, depth, channel, height, and width. The C input channels are first expanded to 32 channels entering the backbone; the post-conv layer converts the backbone output back to C channels. A long-term skip connection runs over the backbone. (B) The backbone has a U-net structure, consisting of two downsample blocks and two upsample blocks. Every downsample CNNT block doubles the number of channels and reduces the spatial size by a factor of two; every upsample block reduces the number of channels and expands the spatial size by a factor of two. (C) The CNNT block includes only CNNT cells. Every cell contains CNN attention, instance norm, and a CNN mixer. This design mimics the standard transformer cell but replaces the linear attention and mixers with CNN attention and CNN mixers, reducing computational cost for high-resolution images. (D) The CNN attention is the key part of the imaging transformer cell. Unlike the linear layers in the standard transformer, the key, value, and query tensors are computed with convolution layers, which reduces the computational cost of processing high-resolution images while maintaining a good inductive bias. The attention coefficients are computed between query and key and applied to the value tensor to compute the attention outputs.
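The CNN attention in panel (D) replaces the transformer's linear Q/K/V projections with convolutions. A minimal numpy sketch, with the simplifying assumptions that the convolutions are 1x1 kernels (pure channel mixing), attention runs over the Z (frame) axis, and multi-head structure and normalization are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def conv1x1(x, w):
    # x: [Z, C, H, W] image stack; w: [C_out, C_in] 1x1 convolution kernel.
    # A 1x1 conv is a per-pixel channel mixing, i.e. an einsum over channels.
    return np.einsum("oc,zchw->zohw", w, x)

def cnn_attention(x, wq, wk, wv):
    """Attention over the Z axis with conv-computed query, key, and value.

    Each frame attends to every frame; the attention weight between two
    frames is the scaled inner product of their conv features.
    """
    Z, C, H, W = x.shape
    q = conv1x1(x, wq).reshape(Z, -1)        # [Z, C*H*W]
    k = conv1x1(x, wk).reshape(Z, -1)
    v = conv1x1(x, wv).reshape(Z, -1)
    scores = q @ k.T / np.sqrt(q.shape[1])   # [Z, Z] frame-to-frame affinity
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)  # softmax over frames
    out = attn @ v                           # weighted sum of value frames
    return out.reshape(Z, C, H, W)

x = rng.normal(size=(4, 8, 16, 16))          # [Z, C, H, W]
wq, wk, wv = (rng.normal(scale=0.1, size=(8, 8)) for _ in range(3))
y = cnn_attention(x, wq, wk, wv)
print(y.shape)
```

Swapping the linear projections for convolutions keeps the attention mechanism but scales to high-resolution frames, which is the design choice the caption describes.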
Figure 3
Widefield microscopy experiment, imaging MEF cells. The pre-trained CNNT backbone was separately fine-tuned on 5 and on 10 widefield image samples. The resulting models were compared to 3D-RCAN and Noise2Fast for image quality and computing time. (A) The low-quality noisy image used as input to the models. (B, C) The CNNT results after fine-tuning for 30 epochs on 5 and 10 samples; the quality improvement is noticeable. (D) The 3D-RCAN model trained from scratch for 300 epochs gave good improvement. (E) The Noise2Fast result is subpar. (F) The high-quality ground truth, used for SSIM3D and PSNR computation and for reference. (G-L) Zoomed-in versions of the highlighted parts in (A-F), respectively. The CNNT fine-tuning is much faster than 3D-RCAN and Noise2Fast training and offers better quality measurements.
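The PSNR used for these comparisons measures how closely a restored image matches the high-SNR ground truth. A minimal numpy version, assuming the images share a known peak value:

```python
import numpy as np

def psnr(pred, target, peak=None):
    """Peak signal-to-noise ratio in dB; higher means closer to the target."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if peak is None:
        peak = target.max()  # assume the target spans the full dynamic range
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.linspace(0.0, 1.0, 64).reshape(8, 8)
noisy = clean + 0.05           # constant 0.05 offset -> MSE of 0.0025
print(round(psnr(noisy, clean), 2))  # 26.02
```

SSIM3D, the other metric quoted, additionally compares local structure across the volume; library implementations (e.g. in image-processing toolkits) are the practical choice there.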
Figure 4
Two-photon microscopy experiment, imaging the pancreas of a zebrafish. (A) The low-quality image does not provide enough SNR and contrast to delineate the structural features of the pancreas. (B, C, D) The CNNT greatly improved the image quality when using 5, 10, and 20 training samples. The model is robust even with 5 samples, leading to a very fast ~3.5 min fine-tuning time. (E, F) The 3D-RCAN and Noise2Fast training times are much longer, with suboptimal quality recovery. (G) The ground truth in this experiment still has relatively low SNR. (H-N) Zoomed-in versions of the highlighted parts in (A-G), respectively. The CNNT models achieved better quality than the ground-truth images, which could be a result of pre-training.
Figure 5
Multi-average tests for the zebrafish imaging with repeated acquisition. (A) The imaging was repeated for N = 64 repetitions to image the zebrafish liver and pancreas. Averaging the first n images creates the image Avg n, giving a series of images with gradually increasing SNR from Avg 1 to Avg 64. CNNT models were tested for robustness against different levels of input quality with an increasing number of averages. Results for the zebrafish liver are shown here. The predicted result for Avg 1 (the lowest-quality input) shows residual noise, indicating the model "breaks" at this input SNR. Starting from Avg 2, the model gives consistently good-quality outputs. (B) The pancreas results are shown for Avg 1, 4, 8, 16, 32, and 64. In this case, the model was robust against the lower input SNR and recovered finer features.
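The Avg n series is a running average of the first n repetitions; with independent zero-mean noise, residual noise power falls roughly as 1/n, so SNR grows about as sqrt(n). A small sketch with simulated acquisitions (the scene and noise model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate N = 64 repeated acquisitions of the same scene: a fixed signal
# plus independent zero-mean noise in each repetition.
N, H, W = 64, 32, 32
signal = rng.normal(size=(H, W))
stack = signal + rng.normal(scale=1.0, size=(N, H, W))

# Avg n = mean of the first n frames, computed for all n at once via cumsum.
cum = np.cumsum(stack, axis=0)
avg = cum / np.arange(1, N + 1)[:, None, None]   # avg[n-1] is "Avg n"

# Residual noise power shrinks roughly as 1/n.
err = ((avg - signal) ** 2).mean(axis=(1, 2))
print(err[0], err[63])  # Avg 64 is far cleaner than Avg 1
```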
Figure 6
Confocal microscopy, imaging of mouse lung tissue. (A) The low-quality image was acquired with very low photon counts. (B, C, D) CNNT fine-tuning with 5, 10, and 20 samples shows recovered tissue structures and removal of random background noise. (E) The 3D-RCAN model also gave good improvement in quality. (F) The Noise2Fast result had more signal fluctuation compared to the supervised models. (G) The high-quality ground-truth acquisition reveals the tissue's anatomical structure. (H-N) Zoomed-in versions of the highlighted parts in (A-G), respectively. Again, the time saving of CNNT fine-tuning is prominent, with superior or similar image quality.
