Sage Journals: Discover world-class research

Abstract

In recent years, the transformer model (the neural network architecture behind ChatGPT) has garnered attention for its advantages over convolutional neural networks (CNNs) in natural language processing (NLP) tasks. Its success is attributed to its ability to learn long-range dependencies and spatial correlations that retain contextual information. As an extension, transformers were applied to images (vision transformers, or ViTs for short) and were shown to achieve impressive results over CNNs in various computer vision tasks. Deep learning (DL) networks used for medical image analysis have so far been based mostly on CNNs. Researchers have studied ViTs for medical image analysis and found that their results are comparable to, and sometimes exceed, those of CNNs. However, in a few areas ViTs have performed worse than CNNs. This review article analyzes whether ViTs have the potential to replace the current state-of-the-art CNNs in various medical imaging tasks in oncology. We discuss cancer studies that use CNNs and ViTs, both individually and in hybrid forms, along with their performance, merits, and demerits; we propose potential solutions to the drawbacks found in ViTs; and we outline future research directions. For the purposes of this analysis, the review considers four cancer types: skin cancer, lung cancer, breast cancer, and prostate cancer.



Can Vision Transformers Be the Next State-of-the-Art Model for Oncology Medical Image Analysis?
