
Artificial intelligence in breast cancer diagnosis: A systematic literature review

Published online by Cambridge University Press:  21 November 2025

Arslaan Javaeed*
Affiliation: Oncology, University of Oxford, UK

Anna Schuh
Affiliation: Oncology, University of Oxford, UK

*Corresponding author: Arslaan Javaeed; Email: arslaanjavaeed@yahoo.com

Abstract

Breast cancer is the most prevalent cancer in women and the second leading cause of cancer-related deaths among women globally. Artificial intelligence (AI)-based frameworks have shown great promise in correctly classifying breast carcinomas, particularly those that may be difficult to discern through routine microscopy. Additionally, mitotic count quantification using AI is more accurate than manual counting. With its many advantages, such as improved accuracy, efficiency and consistency, as shown in this literature review, AI holds promise for significantly enhancing breast cancer diagnosis in clinical practice, although significant obstacles must first be addressed. Ongoing research and innovation are essential for overcoming these challenges and effectively harnessing AI's transformative potential in breast cancer detection and assessment.

Information

Type: Review

Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact statement

This article explores the impact of artificial intelligence (AI) on breast cancer (BC) diagnosis within the field of pathology. It examines several applications of AI in BC pathology and provides a succinct summary of the principal findings from multiple investigations. Incorporating AI into conventional pathology methods may enhance diagnostic accuracy and reduce preventable errors. Studies have shown the efficacy of AI in detecting invasive breast tumors by rapidly analyzing extensive whole-slide images, with advanced convolutional neural networks underpinning these findings. AI-driven quantitative analysis has facilitated the assessment of an individual's hormonal status, which is crucial for determining the appropriate BC treatment, partly because it promotes consensus among observers regarding their findings. AI has the potential to become essential for assessing BC and quantifying mitotic cells, as it can accurately classify moderate-grade breast carcinomas. Furthermore, the use of AI for measuring mitotic counts has proven more precise and sensitive than manual methods, resulting in enhanced predictive outcomes. To maximize the benefits of AI in BC pathology, issues such as the need for comprehensive annotations and the challenge of differentiating difficult subtypes, including triple-negative BC, must be addressed. Despite existing challenges, AI's numerous contributions to BC pathology indicate a promising future characterized by enhanced accuracy, efficiency and uniformity. Continued research and the development of novel approaches are imperative to address these challenges and fully harness AI's promise to revolutionize BC detection and assessment.

Introduction

History of artificial intelligence

Artificial intelligence (AI) refers to the use of technology and computers to imitate human-like cognitive processes and intelligent actions (Försch et al., 2021; Briganti, 2023). The history of computers can be traced back over 200 years, marked by several significant advancements. The exact year of the first computer's invention is uncertain, but it is commonly attributed to 1822, when Charles Babbage introduced a design for a functional computer on paper (Grzybowski et al., 2024). The history of AI spans several decades (Muthukrishnan et al., 2020), beginning in the 1950s with Alan Turing's research on the feasibility of intelligent machines, culminating in his landmark 1950 article (Turing, 1950; Kaul et al., 2020). In 1956, the term "AI" was coined by John McCarthy (Anyoha, 2017) during a conference at Dartmouth College, where the first AI program, "Logic Theorist," was introduced (Moor, 2006).

Concepts in AI

AI applications in medicine have evolved significantly due to advancements in machine learning (ML) and deep learning (DL) (Lanzagorta-Ortega et al., 2022). AI-based models can assist in diagnosing diseases, forecasting therapy responses and promoting preventive medicine (Kaul et al., 2020; Pettit et al., 2021).

Machine learning

The term "ML" was first introduced by Arthur Samuel in 1959, with applications in medicine emerging in the 1980s and 1990s (Brown, 2021). As a subset of AI, ML uses algorithms to build models that learn from data and support segmentation, classification and prediction tasks (Jiang et al., 2022). It is classified into three categories: supervised, unsupervised and reinforcement learning (Hosny et al., 2018; Ono and Goto, 2022). Supervised learning trains models with paired input and output data to predict outcomes; unsupervised learning analyzes unannotated data to discover patterns without predefined results; and reinforcement learning involves learning through interactions with an environment, receiving rewards or penalties based on actions taken (Jovel and Greiner, 2021; Lee et al., 2022; Jiang et al., 2022; Al-Hamadani et al., 2024).
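
To make the distinction concrete, the following minimal sketch (illustrative only, not drawn from any reviewed study) contrasts supervised and unsupervised learning with scikit-learn, using the toy Wisconsin breast cancer dataset of tabular features:

    # Supervised vs. unsupervised learning on a toy tabular dataset.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.cluster import KMeans

    X, y = load_breast_cancer(return_X_y=True)  # features plus benign/malignant labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Supervised: learn a mapping from inputs to known output labels.
    clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    print("supervised accuracy:", clf.score(X_test, y_test))

    # Unsupervised: discover structure without using the labels at all.
    clusters = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(X)
    print("cluster sizes:", [int((clusters == k).sum()) for k in (0, 1)])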

Deep learning

DL is a subset of ML that uses artificial neural networks (ANNs) with multiple layers (Sarker, 2021) and is effective for complex tasks and large datasets (Sidey-Gibbons and Sidey-Gibbons, 2019; Birhane et al., 2023). Neural networks are designed like biological neurological systems, built on the perceptron (or neuron) as the fundamental unit, and usually comprise input, hidden and output layers (Kriegeskorte and Golan, 2019). Deep neural networks (DNNs) are advanced models with multiple hidden layers used in healthcare for medical imaging and diagnostics (Bajić et al., 2022; Egger et al., 2022). Some ANNs have no hidden layer, while DNNs have multiple, enabling them to model complex behaviors (Kufel et al., 2023). Convolutional neural networks (CNNs) are specifically designed for image recognition and classification tasks (Alajanbi et al., 2021). Recently, these models have shown great potential for accurate diagnoses, such as detecting diabetic retinopathy from retinal images (Ragab et al., 2022).
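
A minimal CNN sketch in PyTorch illustrates this layering, with an input, two hidden convolutional layers and an output layer; the architecture and sizes here are arbitrary assumptions, not taken from any study in this review:

    import torch
    import torch.nn as nn

    class TinyCNN(nn.Module):
        """Toy CNN: input -> hidden convolutional layers -> output layer."""
        def __init__(self, n_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(  # hidden layers
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 56 * 56, n_classes)  # output layer

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x).flatten(1))

    logits = TinyCNN()(torch.randn(1, 3, 224, 224))  # one fake 224x224 RGB image
    print(logits.shape)  # torch.Size([1, 2])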

AI in medicine

ML, a key technology in AI, is used across various medical specialties, including oncology, cardiology and neurology (Bitkina et al., 2023). AI applications include screening, diagnosis, treatment, drug development (Xu et al., 2023), genomic analysis, patient monitoring and wearable health technology (Shajari et al., 2023). Additionally, AI enhances doctor-patient interactions, enables remote therapy and manages large datasets (Shajari et al., 2023; Chen and Decary, 2020; Basu et al., 2020). Integrating AI into healthcare can significantly improve the effectiveness, accuracy and personalization of medical diagnoses and treatments (Alowais et al., 2023).

AI in cancer diagnosis

AI has the potential to significantly advance cancer diagnosis by using annotated medical data, advanced ML techniques and enhanced processing power (Sufyan et al., 2023; Alshuhri et al., 2024). These developments are expected to transform patient care by improving efficiency, accuracy and customization in diagnoses and treatments (Chen and Decary, 2020). Over the past decade, DL architectures have outperformed traditional ML methods in cancer diagnosis, effectively utilizing genomic and phenotype data for cancer classification and treatment (Miotto et al., 2018). Computer-aided detection and diagnosis (CADx) systems are playing important roles in clinical imaging and are expected to improve further (He et al., 2020; Jairam and Ha, 2022). While medical imaging remains essential for the early identification and monitoring of cancer, AI technology has the potential to improve the accuracy of clinical image analysis for identifying cancer progression, aiding early detection and diagnosis (Suberi et al., 2017; Liu et al., 2020; Lathwal et al., 2020).

AI in breast cancer pathology

Breast cancer (BC) is the most diagnosed cancer among women and the second leading cause of cancer-related deaths worldwide (Watkins, 2019), with approximately 2.3 million new cases and 685,000 fatalities reported in 2020 (Hanna and Pantanowitz, 2017; Nardin et al., 2020). It represents 25% of all newly diagnosed cancer cases, and projections suggest an increase to nearly 2.96 million cases by 2040 (Sedeta et al., 2023). Accurate histopathological diagnosis is crucial, as it confirms the presence of tumor cells and helps classify the type and grade of cancer (Cardoso et al., 2019). Discrepancies in diagnoses can significantly affect treatment decisions, highlighting the need for precise pathologic assessments (Soliman et al., 2024).

Despite advancements in imaging-based diagnostics and therapies, the field of histopathology has been slow to digitize, beginning this process only about two decades ago (Hanna et al., 2019; Försch et al., 2021). Histopathological diagnosis has remained largely unchanged, still relying on microscopic evaluations by pathologists. This reliance can lead to errors, such as false positives or negatives, especially under stress (Morelli et al., 2013; Cohen et al., 2022). Studies have shown significant variability in pathologists' assessments, with a concordance rate of only 75.3% overall and a particularly low rate of 48% for ductal carcinoma in situ (DCIS) and atypical hyperplasia, indicating ambiguity in pathology interpretations (Elmore et al., 2015).

The use of AI in cancer diagnosis is essential for improving diagnostic processes and addressing the shortage of pathologists alongside the increasing number of cancer cases (Robboy et al., 2020). AI in pathology relies on whole-slide imaging (WSI) technology (Niazi et al., 2019; Ahn et al., 2023), which converts physical pathological slides into high-resolution digital images. These images are an abundant source of information, with sizes up to 100,000 × 100,000 pixels, and are the first step in creating AI-assisted models (Mukhopadhyay et al., 2018; Niazi et al., 2019). WSI aids in the easy sharing and consultation of images, reducing interpretation errors and enhancing the analysis of complex cases (Hanna et al., 2019; Jones et al., 2015). Overall, WSI presents a promising alternative to conventional microscopic examination, which is limited by its ephemeral nature (Mukhopadhyay et al., 2018; Tizhoosh and Pantanowitz, 2018; Hanna et al., 2019).
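
Because a single WSI is far too large to feed to a network at once, AI pipelines typically tile it into small patches first. A minimal sketch of that step, assuming the openslide library and a hypothetical file name:

    import openslide

    slide = openslide.OpenSlide("slide.svs")  # hypothetical WSI file
    width, height = slide.dimensions          # can approach 100,000 x 100,000 pixels
    patch = 512                               # patch edge length in pixels

    for x in range(0, width - patch + 1, patch):
        for y in range(0, height - patch + 1, patch):
            region = slide.read_region((x, y), 0, (patch, patch)).convert("RGB")
            # ...filter out background tiles, then pass `region` to the model...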

Digitalization in pathology has been slower than in other medical specialties, largely due to pathologists' reluctance to abandon traditional methods and to barriers such as regulatory and cost issues, as well as "pathologist technophobia" (Hanna et al., 2019; Hanna and Pantanowitz, 2019; Moxley-Wyles et al., 2020; Försch et al., 2021). Despite these challenges, AI has shown promise in enhancing diagnostic capabilities. Studies indicate that digital slide reviewing is as effective as manual methods (Loughrey et al., 2015; Elmore et al., 2017; Tabata et al., 2017). AI algorithms have been developed for detecting and classifying BC, achieving high accuracy in differentiating between benign and malignant tumors (Cruz-Roa et al., 2017). A DL model differentiated benign and malignant tumors when tested on eight categories of images, four benign and four malignant, with an accuracy of 93.2% (Han et al., 2017). The breast cancer histology (BACH) challenge demonstrated that AI could achieve accuracy levels comparable to pathologists, improving overall diagnostic performance and interobserver concordance (Polónia et al., 2021). Additionally, DL models have successfully identified markers in BC and utilized nuclear characteristics to predict risk categories for patients (Romo-Bucheli et al., 2016; Lu et al., 2018; Whitney et al., 2018).

For example, the visual assessment of mitotic figures in BC histological sections stained with hematoxylin and eosin (H&E), referred to as the mitotic score, serves as the gold-standard method for evaluating the proliferative activity of BC (Aleskandarany et al., 2012; van Dooijeweert et al., 2021). However, pathologists face challenges in manually counting mitoses in histopathology slides, a process that is time-consuming (Cree et al., 2021). To address this, various contests, such as the MITOSIS detection contest, have facilitated advancements in automated counting methods (Aubreville et al., 2024). Recent studies have demonstrated that DL models can accurately count mitotic figures from H&E-stained slides of early stage ER-positive breast tumors, significantly reducing the time required for pathologists to read slides (Roux et al., 2013; ICPR, 2014; Romo-Bucheli et al., 2017; Veta et al., 2019). Notably, algorithms utilizing advanced architectures such as ResNet-101 and Faster R-CNN have shown high accuracy, with one approach reducing reading time by 27.8% (Pantanowitz et al., 2020). Furthermore, a comprehensive automated system developed by Nateghi et al. (2021) can identify regions of interest with high mitotic activity, count mitoses from WSIs and predict tumor proliferation scores, outperforming previous methods. Although these AI models have yet to be deployed in formal clinical practice, the integration of digitalization and AI in pathology has the potential to enhance accuracy, reduce human errors and optimize the time needed for pathologists to review slides, ultimately benefiting both pathologists and patients (Aeffner et al., 2019; Kim et al., 2022).
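
As a rough illustration of such a detection pipeline (a sketch of the general approach, not the implementation used in the cited studies), the snippet below adapts torchvision's off-the-shelf Faster R-CNN to a two-class task, background versus mitotic figure; note that this torchvision model uses a ResNet-50 backbone rather than the ResNet-101 mentioned above:

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")  # pretrained detector
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # New detection head: class 0 = background, class 1 = mitotic figure.
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

    model.eval()
    with torch.no_grad():
        # One fake H&E patch; real use would first fine-tune on annotated mitoses.
        detections = model([torch.rand(3, 512, 512)])[0]
    print(detections["boxes"].shape, detections["scores"].shape)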

This systematic review aims to examine AI models and their effectiveness in diagnosing BC, taking into account existing problems in the field of pathological diagnosis as well as the potential advantages of incorporating AI. Additionally, it investigates the potential of AI models to offer second opinions and their integration into the pathology workflow.

Methodology

Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement

This review process follows the guidelines set out in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, which was first developed in 2009 and updated in 2020. PRISMA functions as a framework specifically created to standardize the process of conducting systematic reviews and improve the thoroughness of their reporting (Page et al., 2021).

Search strategy

A comprehensive literature search was carried out across three electronic databases, PubMed, EMBASE and the Cochrane Library, to identify original articles that met the specified inclusion and exclusion criteria and were published up to April 2024. The search involved the use of keywords, their synonyms and Boolean operators, as illustrated in Figure 1. Additionally, the bibliographies of all relevant articles were reviewed to identify further studies that could potentially be included in the analysis. Titles were screened against the predetermined inclusion criteria, focusing on studies that assessed the application of AI in BC diagnosis. Notably, no restrictions were placed on publication year, country of origin or age of the studies. Subsequently, full-text screening was carried out to select the most pertinent research papers for data extraction and analysis.

Figure 1. Boolean search with keywords and their synonyms.
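
Purely as an illustration of the form such a query takes (the terms below are assumptions, not the review's actual strings, which are given in Figure 1), a PubMed-style Boolean search might read:

    ("artificial intelligence" OR "machine learning" OR "deep learning" OR "neural network")
    AND ("breast cancer" OR "breast carcinoma" OR "breast neoplasm")
    AND (histopathology OR "whole-slide imaging" OR diagnosis)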

Selection process

The literature search results were screened in a two-step process. Initially, the titles and abstracts of all articles were assessed for eligibility. After identifying relevant articles, full-text screening was conducted for the studies that met the eligibility criteria. The screening was done by applying the predefined inclusion criteria.

Eligibility criteria

All research papers using AI in BC diagnosis or staging, compared against pathologists' reports or known datasets as a reference test, were included. All types of original studies (either prospective or retrospective) containing their own data on AI validation, or on development and validation, were included, provided they were published in English. We excluded other types of publications, such as reviews, single case reports, editorial material, books, comments and papers in languages other than English, as well as articles dealing with other malignancies, articles that used AI to analyze data other than histological or cytological images (e.g., MRI, mammography) and articles with non-AI approaches for diagnosis (i.e., slide flow).

Inclusion criteria

The included studies addressed the following:

  • BC diagnosis based on histopathology.

  • AI models and their diagnostic accuracy, sensitivity, specificity and area under the curve (AUC) in BC diagnosis.

  • Potential of AI models to be integrated into regular practice and to provide second opinions in BC diagnosis.

Exclusion criteria

The following studies were excluded:

  • Articles that used AI to analyze data other than histological or cytological images (e.g., MRI, mammography).

  • Articles with non-AI approaches for diagnosis (i.e., slide flow).

  • Review papers, case studies, editorials, book chapters and commentary.

  • Papers in languages other than English.

  • Articles dealing with other malignancies.

Data from the included studies were extracted and recorded in a standardized data extraction sheet. The extracted data encompassed two main categories: (1) characteristics of the included studies and (2) outcome measures.

Results

Literature search and screening

A thorough search of the three main databases, namely PubMed, Cochrane CENTRAL and EMBASE, yielded a total of 3113 records. After removal of duplicates and irrelevant records, 1849 unique studies were assessed for eligibility based on their titles and abstracts, and 1517 studies that did not meet the inclusion criteria were excluded. The abstracts of the remaining 332 articles were obtained for further evaluation, and application of the predefined criteria led to the exclusion of a further 258 studies. The full texts of the remaining 74 articles were reviewed, of which 47 were excluded due to lack of relevant outcomes, inappropriate study design or poor quality. Finally, 27 studies were included in the systematic review. The flow diagram of the literature search and screening process is shown in Figure 2.

Figure 2. Flowchart describing the literature inclusion process.

Characteristics of included studies

Twenty-seven research papers were analyzed in a thorough literature analysis. These studies specifically investigated the development of AI algorithms for diagnosing BC using histopathology images. The histopathology training data used in these studies showed substantial variety, with data obtained from 21 to more than 400 patients. The literature evaluation includes papers published from 2017 to 2024, originating from various geographical regions including Korea, India, Egypt, the Netherlands, Brazil, Spain, Australia, Turkey, Poland, Japan, China and Malaysia. The research employed a range of AI techniques, with CNNs being particularly prominent; these were typically used in combination with other methods, such as support vector machines (SVMs) or ensemble learning approaches. The datasets used for training and validation purposes included the well-known BreakHis (Spanhol et al., 2016) and BACH (Aresta et al., 2019; ICIAR 2018 Grand Challenge) datasets. The pixel resolutions of the images varied across the investigations, which affected the level of detail and the potential effectiveness of the models. In addition, the performance metrics in the included studies showed variation, with accuracy, F1-score and AUC being the most reported. Table 1 provides a concise summary of the fundamental features of the research.
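
For reference, the sketch below shows how the three most commonly reported metrics can be computed with scikit-learn; the labels and scores are made up for illustration:

    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # ground-truth labels (toy data)
    y_prob = [0.1, 0.4, 0.8, 0.7, 0.9, 0.3, 0.6, 0.2]   # model scores
    y_pred = [int(p >= 0.5) for p in y_prob]            # thresholded predictions

    print("accuracy:", accuracy_score(y_true, y_pred))
    print("F1-score:", f1_score(y_true, y_pred))
    print("AUC:", roc_auc_score(y_true, y_prob))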

Table 1. Summary of studies categorizing breast lesions (i.e., benign vs. malignant)

AUC, area under the curve; BACH, breast cancer histology challenge; BreakHis, breast cancer histopathological database; CCM, correctly classified malignant; CCR, correct classification rate; CKD, covariance-kernel descriptor; CNNs, convolutional neural networks; DCNNs, deep CNNs; DenseNet, dense neural network; FABCD, fully annotated breast cancer database; LERM, log-Euclidean Riemannian metric; MFSCNet, classification of mammary cancer fusing spatial and channel features network; MLP, multilayer perceptron; QDA, quadratic discriminant analysis; ResNet-152, deep residual network with 152 layers; RF, random forest; SVM, support vector machine; WAID, weakly annotated image descriptor; WSIs, whole-slide images.

Summary of findings

Breast lesion categorization

Researchers have developed innovative AI models for classifying breast lesions as benign or malignant, with the potential to identify histological subtypes (Singh et al., 2024). The SegEIR-Net model combines segmentation and classification techniques using EfficientNet, InceptionNet and ResNet, achieving high accuracies on the BreakHis dataset (up to 98.66%) and strong results on the BACH and UCSB datasets (93.33% and 96.55%, respectively) (Singh et al., 2024). Additionally, the Multilevel Context and Uncertainty aware (MCUa) model categorizes breast histology images into four types: normal tissue, benign lesion, in situ carcinoma and invasive carcinoma. The MCUa model demonstrated impressive performance, with static ensemble accuracy reaching 95.75% and dynamic accuracy reaching 98.11% on the BACH dataset, and outstanding results on the BreakHis dataset (up to 100% accuracy) (Senousy et al., 2022).
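
Both models build on ensembling, that is, fusing the predictions of several networks. The sketch below shows a generic soft-voting version of this idea with two torchvision backbones; the actual architectures and fusion rules of SegEIR-Net and MCUa differ and are described in the cited papers:

    import torch
    import torchvision.models as models

    backbones = [
        models.resnet18(weights="DEFAULT"),
        models.efficientnet_b0(weights="DEFAULT"),
    ]
    x = torch.randn(1, 3, 224, 224)  # one fake histology patch
    with torch.no_grad():
        # Average class probabilities across backbones (soft voting).
        probs = torch.stack([m.eval()(x).softmax(dim=1) for m in backbones])
    ensemble_prob = probs.mean(dim=0)
    print(ensemble_prob.argmax(dim=1))  # ensemble class prediction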

The context-aware stacked CNN (CAS-CNN) has demonstrated strong performance in classifying breast WSIs, achieving an AUC of 0.962 in distinguishing normal or benign slides from malignant ones (DCIS and IDC). The system exhibited a precision of 89.1% for categorizing WSIs and an overall accuracy of 81.3% in a three-class classification involving normal/benign, DCIS and IDC categories. While it effectively differentiated between normal/benign and IDC slides, it faced challenges in distinguishing between normal/benign and DCIS slides, as well as between DCIS and IDC slides (Bejnordi et al., 2017). Additional details on studies classifying breast lesions are provided in Tables 1–3.

Table 2. Summary of studies categorizing breast lesions (i.e., normal/benign/in situ/invasive)

AUC, area under the curve; BACH, breast cancer histology challenge; BCBH, bioimaging challenge 2015 breast histology; BreakHis, breast cancer histopathological database; CCM, correctly classified malignant; CCR, correct classification rate; CKD, covariance-kernel descriptor; CNNs, convolutional neural networks; DCNNs, deep CNNs; DenseNet, dense neural network; FABCD, fully annotated breast cancer database; FCNs, fully convolutional networks; LERM, log-Euclidean Riemannian Metric; MCUa, multilevel context and uncertainty aware; MFSCNet, classification of mammary cancer fusing spatial and channel features network; MLP, multilayer perceptron; PDIs, phylogenetic diversity indices; QDA, quadratic discriminant analysis; ResNet-152, deep residual network with 152 layers; RF, random forest; SimCLR, simple framework for contrastive learning of visual representations; SOA, state-of-the-art; SSL, self-supervised learning; SVM, support vector machine; UCSB, University of California Santa Barbara; WAID, weakly annotated image descriptor; WSIs, whole-slide images; Xgboost, eXtreme Gradient Boost.

Table 3. Summary of studies assessing different histopathological subtypes of both benign and malignant breast lesions

AUC, area under the curve; BACH, breast cancer histology challenge; BCBH, bioimaging challenge 2015 breast histology; BreakHis, breast cancer histopathological database; CCM, correctly classified malignant; CCR, correct classification rate; CKD, covariance-kernel descriptor; CNNs, convolutional neural networks; DCNNs, deep CNNs; DenseNet, dense neural network; FABCD, fully annotated breast cancer database; FCNs, fully convolutional networks; LERM, log-Euclidean Riemannian Metric; MCUa, multilevel context and uncertainty aware; MFSCNet, classification of mammary cancer fusing spatial and channel features network; MLP, multilayer perceptron; PDIs, phylogenetic diversity indices; QDA, quadratic discriminant analysis; ResNet-152, deep residual network with 152 layers; RF, random forest; SimCLR, simple framework for contrastive learning of visual representations; SOA, state-of-the-art; SSL, self-supervised learning; SVM, support vector machine; UCSB, University of California Santa Barbara; WAID, weakly annotated image descriptor; WSIs, whole-slide images; Xgboost, eXtreme Gradient Boost.

Molecular subtyping

After the identification of a malignant breast lesion, immunohistochemistry is commonly performed to ascertain the molecular subtype. This procedure involves analyzing the levels of ER, PR, Her2 and the Ki67 mitotic index to determine the subtype, which can include luminal A, luminal B, Her2-enriched or triple negative, among other possibilities. The results of the studies regarding the use of AI models for molecular subtyping, with or without Ki67 measurement, are summarized in Table 4. The AI models have exhibited an impressive accuracy rate when compared with conventional multiple instance learning models (AUC of 0.75–0.91 vs. 0.67–0.8) (Bae et al., 2023), and approximately 90% accuracy for an automated BC classification system utilizing SVM (Aswathy and Jagannath, 2021).

Table 4. Summary of studies assessing breast cancer molecular subtyping (i.e., according to estrogen receptors (ER), progesterone receptors (PR) and Her2 – with or without ki67 mitotic index analysis)

AUC, area under the curve; BCBH, bioimaging challenge 2015 breast histology; BreakHis, breast cancer histopathological database; MLP, multilayer perceptron; SVM, support vector machine; UCSB, University of California Santa Barbara; WSIs, whole-slide images; Xgboost, eXtreme Gradient Boost.

Mitotic index assessment and quantification

Several studies have aimed to create models for identifying the mitotic proliferation index (Ki67) in BC. A notable approach is the FMDet method, designed to detect mitotic rates in breast histopathology images while addressing the domain shift problem caused by variability across different scanners (Wang et al., 2023). To enhance model applicability, two key strategies were implemented:

1. A novel data augmentation technique using Fourier analysis was introduced, which alters the frequency characteristics of training images to generate diverse samples that reflect real-world datasets. This involves replacing the low-frequency components of a source image with those from a reference image from a different domain (a minimal sketch of this idea follows this list).

2. Pixel-level annotations, specifically "instance masks," were utilized for identifying mitotic figures. These masks, derived from the Mitosis Domain Generalization (MIDOG) 2021 challenge bounding box data and a pretrained nucleus segmentation model (HoVer-Net), improved detection accuracy by allowing the network to capture subtle morphological differences in mitotic cells, surpassing traditional bounding box annotations (Wang et al., 2023). The model outperformed all other submissions in the challenge, with an F1 score of 0.77 (Wang et al., 2023).
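
The Fourier-swap idea in point 1 can be made concrete in a few lines of numpy. The sketch below follows the general Fourier domain adaptation recipe (swap a centered low-frequency amplitude block between images while keeping the source phase); it is an illustrative assumption, not the authors' exact implementation, and the window fraction beta is arbitrary:

    import numpy as np

    def fourier_mix(source: np.ndarray, reference: np.ndarray, beta: float = 0.05) -> np.ndarray:
        """source, reference: (H, W) float arrays; beta sets the swapped low-frequency window."""
        fs = np.fft.fftshift(np.fft.fft2(source))
        fr = np.fft.fftshift(np.fft.fft2(reference))
        h, w = source.shape
        ch, cw, bh, bw = h // 2, w // 2, int(beta * h), int(beta * w)
        amp_s, amp_r = np.abs(fs), np.abs(fr)
        # Replace the centered (low-frequency) amplitude block of the source.
        amp_s[ch - bh:ch + bh, cw - bw:cw + bw] = amp_r[ch - bh:ch + bh, cw - bw:cw + bw]
        mixed = amp_s * np.exp(1j * np.angle(fs))  # keep the source phase
        return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

    augmented = fourier_mix(np.random.rand(256, 256), np.random.rand(256, 256))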

A DL model specifically targeting the identification of mitotic proliferation in breast histopathology images has shown considerable potential (Saha et al., 2018). The precision, recall and F-score measures of this model were 92%, 88% and 90%, respectively. Significantly, the model's performance improved when handcrafted features were integrated into its DL architecture. The model was trained and tested using datasets from MITOS-ATYPIA-14, ICPR-2012 and AMIDA-13. The study's findings highlight the model's high true positive rate, which indicates its precise ability to identify mitotic cells. Details of the included studies focusing on training models to detect and quantify the mitotic proliferation index are tabulated in Table 5.
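
As a quick arithmetic check, the F-score is the harmonic mean of precision and recall, which reproduces the reported 90% from the other two figures:

    # F-score from the precision and recall reported by Saha et al. (2018).
    precision, recall = 0.92, 0.88
    f1 = 2 * precision * recall / (precision + recall)
    print(round(f1, 2))  # 0.9, i.e., the reported 90%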

Table 5. Summary of studies assessing the Ki67 mitotic index

AUC, area under the curve; SimCLR, simple framework for contrastive learning of visual representations; SSL, self-supervised learning; WSIs, whole-slide images.

Obstacles to widespread adoption of AI

Despite the remarkable outcomes demonstrated by the examined AI models in research contexts, substantial challenges and constraints persist that hinder the use of AI on a broader scale (Soliman et al., 2024). A primary procedural drawback of supervised AI models is their need for extensive, annotated datasets for training. Manual annotation is labor-intensive and exhibits variability both within and across pathologists, undermining the fundamental objective of AI models. Similarly, the intraclass heterogeneity and dependence on binary categorization during training, as previously mentioned, constitute additional barriers that may limit the efficacy of AI models. Furthermore, the lack of a defined area size for assessing Ki67 may lead to either an overestimation or underestimation of Ki67 activity. Moreover, certain preanalytic variables, including suboptimal sample quality, air bubbles, staining artifacts, unexpected staining patterns and discrepancies in interlaboratory sample preparation and staining techniques, remain unavoidable to this day. Consequently, each AI tool necessitates validation and verification under these specific conditions; these variables may compromise the accuracy of specimen analysis conducted by an AI model, resulting in incorrect outcomes.

Additionally, significant economic and regulatory challenges persist regarding the implementation of AI technology in pathology laboratories on a national or global scale (van Diest et al., 2024). The concern regarding AI's potential to replace pathologists raises ethical issues, as noted by Moxley-Wyles et al. (2020) and van Diest et al. (2024). Critics, comprising both individuals and governmental bodies, articulate concerns regarding the notion that a patient's treatment may be dictated by an AI analysis. This is particularly concerning as most existing algorithms remain in the early stages of development, having been tested on limited populations and lacking adequate safety data. The broad adoption of this technology also faces obstacles stemming from the diverse staining techniques employed across laboratories, the necessity for full digitalization prior to implementation and the only partial integration of Picture Archiving and Communication Systems with AI algorithms (van Diest et al., 2024). Addressing these concerns is crucial before AI models are deployed in clinical practice.

Addressing limited sample size

A study aimed at improving BC histopathology image classification addressed the challenge of limited slide image datasets through data augmentation and transfer learning (Zhu et al., 2019). The researchers developed a hybrid CNN that combines local and global visual inputs to capture detailed features and structure. They introduced a Squeeze-Excitation-Pruning block to reduce the model size without sacrificing accuracy. To enhance generalization, they used a multimodel assembly technique, training various models on different data subsets and merging their predictions. This approach proved more effective than single-model systems and outperformed existing methods on the BACH dataset (87.5% patient-level and 84.4% image-level accuracy), indicating its potential for real-world clinical use (Zhu et al., 2019).

Recent studies have explored the development of advanced models for classifying BC subtypes using histopathology images. A hybrid CNN-LSTM model achieved high accuracy in binary and multiclass classifications, with binary accuracy ranging from 98.07% to 99.75% and multiclass accuracy from 88.04% to 96.5% (Srikantamurthy et al., 2023). Another study utilized a pretrained DenseNet-169 model on the BreakHis dataset, achieving 98.73% accuracy on the validation set and 94.55% on the test set, highlighting the importance of compatible-domain transfer learning, a method in which histological images are used to pretrain the model, which is then fine-tuned on a finite cytological target dataset (Shamshiri et al., 2023). Additionally, various studies emphasize the importance of feature selection and fusion to improve classification accuracy, noting that integrating DL features with handcrafted attributes can lead to better outcomes. Multiscale analysis, incorporating different image patch scales, also contributes to enhanced accuracy in classification tasks (Attallah et al., 2021; Liu et al., 2022).

Training datasets

Another important consideration in the included studies is the choice of dataset and the use of transfer learning. Multiple studies have utilized the BreakHis dataset, which contains images at varying magnification levels, to advance BC diagnosis, despite challenges such as small sample sizes and data variability (Gandomkar et al., 2018; Nahid et al., 2018; Carvalho et al., 2020). To address these issues, researchers have applied transfer learning techniques, initially training models on larger datasets such as ImageNet before fine-tuning them on BreakHis (Kanavati et al., 2022; Laxmisagar and Hanumantharaju, 2022; Liu et al., 2022; Xu et al., 2022). Additionally, compatible-domain transfer learning has been shown to boost model performance (Shamshiri et al., 2023). Various approaches have been explored to compensate for insufficient training datasets, including the combination of multiple classifiers and a self-trained AI algorithm utilizing a three-stage analysis technique with a WSI stacking system (Bae et al., 2023). Overall, these efforts highlight the potential of DL and image analysis in enhancing BC diagnosis and prognosis.
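
A minimal sketch of this transfer-learning recipe, assuming an ImageNet-pretrained DenseNet-169 and a binary benign/malignant target task (the hyperparameters and fine-tuning schedules in the cited studies differ):

    import torch.nn as nn
    import torchvision.models as models

    model = models.densenet169(weights="DEFAULT")  # ImageNet-pretrained weights
    for p in model.parameters():
        p.requires_grad = False                    # freeze the pretrained backbone
    # Replace the 1000-class ImageNet head with a new 2-class head.
    model.classifier = nn.Linear(model.classifier.in_features, 2)
    # ...then train only the new head (or gradually unfreeze) on BreakHis patches.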

Discussion

Commercially available AI models specifically designed for BC detection exist (Soliman et al., 2024). These include Mindpeak, Owkin, Visiopharm, Paige Breast Suite and IBEX Galen Breast. Mindpeak is located in Hamburg, Germany (Abele et al., 2023); Owkin in Paris, France; Visiopharm in Hovedstaden, Denmark (Shafi et al., 2022); Paige Breast Suite in New York, United States; and IBEX Galen Breast in Tel Aviv, Israel. These algorithms improve pathologists' consistency, precision and sensitivity while decreasing time demands (Soliman et al., 2024). Nonetheless, various limitations continue to obstruct extensive implementation on a larger scale, and AI-driven specimen analysis still exhibits specific procedural drawbacks, as observed by Soliman et al. (2024), who evaluated the contribution of AI to enhancing histological analysis of BC, collectively assessed the efficacy of each AI model, and emphasized the potential limitations and downsides associated with each.

Tissue classification

Most studies in this review evaluated the efficacy of AI in accurately classifying breast tissue specimens, demonstrating significant precision in distinguishing normal (or benign) tissues from malignant ones (Amin and Ahn, 2023; Attallah et al., 2021; Gandomkar et al., 2018; Singh et al., 2024). In the research conducted by Singh et al. (2024), the SegEIR-Net model achieved an accuracy surpassing 98% on the BreakHis dataset. Likewise, the Histo-CADx model introduced by Attallah et al. (2021) attained an accuracy of 99.54%. The initial study presenting the BreakHis dataset (Spanhol et al., 2016), aimed at evaluating AI model efficacy in tissue classification, attained an accuracy of approximately 97%. Han et al. (2017) conducted a study that attained an average accuracy of approximately 93% for eight-class classification (comprising four benign and four malignant categories) of the BreakHis dataset, in contrast to the traditional binary or ternary classification systems evaluated in most studies. The findings from these studies demonstrate that AI can consistently serve as a dependable instrument for breast specimen classification.

Nonetheless, despite these encouraging results, many issues persist regarding tissue classification. For instance, there remains notable heterogeneity in the datasets employed and the outcomes evaluated. Should research evaluate the efficacy of AI in distinguishing normal from malignant tissue? Benign from malignant? Or perhaps DCIS from invasive carcinoma (Amin and Ahn, 2023)? In-class heterogeneity exists, and binary classification tasks for AI models (e.g., normal vs. abnormal) are frequently insufficient due to the presence of gray zones and unusual findings in clinical practice (Amin and Ahn, 2023; Bejnordi et al., 2017). Furthermore, certain unusual subtypes exist but may not be included in training and/or testing datasets (Hatta et al., 2023). The presence of in-class heterogeneity and imbalanced datasets complicates the assessment of AI's accuracy in classifying each category, particularly the rare ones (Amin and Ahn, 2023; Hatta et al., 2023). Similarly, Bejnordi et al. (2017) reported that although their model excelled in distinguishing benign from malignant tumors, it encountered difficulties with borderline categories such as DCIS, and Amin and Ahn (2023) documented similar findings.

Molecular subtyping

Current issues associated with manual molecular subtyping include frequent interpathologist discrepancies, particularly with Her2 status (Robbins et al., 2023). In contrast to specimen categorization, which relies mostly on visible microscopic features, subtyping is contingent upon supplementary technical factors, such as the extent of staining and/or manual enumeration in the computation of Ki67. Therefore, several studies are being conducted using different AI models to address the challenge of inconsistency (Abele et al., 2023).

Similar to the reported findings on AI's performance in tissue classification, the studies assessing molecular subtyping in this review have likewise yielded promising outcomes. The data from the studies reported in this review indicate that AI excelled in both molecular subtyping and Ki67 computation. The research conducted by Bae et al. (2023) attained a molecular subtyping accuracy of 91%, and Aswathy and Jagannath (2021) found a balanced accuracy of 91.2% for their SVM model in predicting ER, PR and Her2 status. Consistent with the findings in this review, several other studies have documented significantly enhanced interpathologist agreement rates following the implementation of AI assistance. AI improved interpathologist agreement rates from approximately 88% to 96% for the Ki67 score and from roughly 89% to 93% for ER/PR status (Abele et al., 2023). Similarly, concerning Her2 status, a study by Jung et al. (2024) indicated a significant rise in interpathologist agreement rates, from approximately 49.3% to 74.1% (p < 0.001), with the use of AI-driven ER/PR and Her2 analyzers. The concordance rates for ER and PR status also improved, albeit to a lesser degree (93.0–96.5%, p = 0.096 for ER; 84.6–91.5%, p = 0.006 for PR) (Jung et al., 2024).

Conclusion and future directions

Various AI tools may rapidly become an effective aid to histopathologists facing increasing demands for precise and speedy BC diagnosis. The identified drawbacks are significant and need to be effectively addressed before we can reap the true benefits of this technology. It is recommended that future research endeavors focus on the following key areas to improve the performance and validity of existing AI models in the context of histopathological evaluation:

  1. Establishment of Standardized Datasets: The creation of standardized, multi-institutional datasets that adhere to consistent preanalytic methodologies, such as sample preparation and tissue staining, should be prioritized to improve the generalizability of AI models across diverse clinical settings.

  2. Integration of Multimodal Data: To enhance the predictive performance of AI systems, it is imperative to incorporate additional diagnostic modalities, including but not limited to imaging techniques and molecular profiling, into the analysis of histopathological specimens. This multimodal approach can offer a more comprehensive understanding of disease characteristics.

  3. Development of Ethical Protocols: The formulation of robust ethical guidelines is essential to ensure the responsible application of AI technologies. This includes strategies for mitigating biases inherent in data and algorithms and enhancing transparency in the decision-making processes of AI systems.

  4. Improvement in Economic Viability: It is crucial to explore cost-effective strategies for the implementation of AI solutions within clinical practice. An analysis of economic sustainability will ultimately support the broader adoption and integration of AI technologies in the healthcare sector.

By addressing these areas, future research can significantly contribute to the advancement of AI methodologies, ensuring they are both practical and ethically sound in clinical applications.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/pcm.2025.10006.

Data availability statement

Data sharing not applicable – no new data generated.

Author contribution

AJ and AS made substantial contributions to the conception and design of the work. A.J. drafted the work and revised it critically for important intellectual content. A.S. was responsible for final revision and approval of the version to be published.

Financial support

This research received no specific grant from any funding agency, commercial or not-for-profit sectors.

Competing interests

The authors declare none.

References

Abele, N, Tiemann, K, Krech, T, Wellmann, A, Schaaf, C, Länger, F, Peters, A, Donner, A, Keil, F, Daifalla, K, Mackens, M, Mamilos, A, Minin, E, Krümmelbein, M, Krause, L, Stark, M, Zapf, A, Päpper, M, Hartmann, A and Lang, T (2023) Noninferiority of artificial intelligence-assiste analysis of Ki-67 and estrogen/progesterone receptor in breast cancer routine diagnostics. Modern Pathology: An Official Journal of the United States and Canadian Academy of Pathology Inc 36(3), 100033. https://doi.org/10.1016/J.MODPAT.2022.100033.Google Scholar
Aeffner, F, Zarella, MD, Buchbinder, N, Bui, MM, Goodman, MR, Hartman, DJ, Lujan, GM, Molani, MA, Parwani, AV, Lillard, K, Turner, OC, Vemuri, VNP, Yuil-Valdes, AG and Bowman, D (2019) Introduction to digital image analysis in whole-slide imaging: A white paper from the digital pathology association. Journal of Pathology Informatics 10(1), 119. https://doi.org/10.4103/JPI.JPI_82_18.Google Scholar
Ahn, JS, Shin, S, Yang, SA, Park, EK, Kim, KH, Cho, SI, Ock, CY and Kim, S (2023) Artificial intelligence in breast cancer diagnosis and personalized medicine. Journal of Breast Cancer 26(5), 405435. https://doi.org/10.4048/jbc.2023.26.e45.Google Scholar
Al-Hamadani, MN, Fadhel, MA, Alzubaidi, L and Harangi, B (2024) Reinforcement learning algorithms and applications in healthcare and robotics: A comprehensive and systematic review. Sensors 24(8), 2461. https://doi.org/10.3390/s24082461.Google Scholar
Alajanbi, M, Malerba, D and Liu, H (2021) Distributed reduced convolution neural networks. Mesopotamian. Journal of Big Data 20, 2528. https://doi.org/10.58496/MJBD/2021/005.Google Scholar
Aleskandarany, MA, Green, AR, Benhasouna, AA, Barros, FF, Neal, K, Reis-Filho, JS, Ellis, IO and Rakha, EA (2012) Prognostic value of proliferation assay in the luminal, HER2-positive, and triple-negative biologic classes of breast cancer. Breast Cancer Research 14(1), R3.Google Scholar
Alowais, SA, Alghamdi, SS, Alsuhebany, N, Alqahtani, T, Alshaya, AI, Almohareb, SN, Aldairem, A, Alrashed, M, Bin Saleh, K, Badreldin, HA and Al Yami, MS (2023) Revolutionizing healthcare: The role of artificial intelligence in clinical practice. BMC Medical Education 23(1), 689. https://doi.org/10.1186/s12909-023-04698-z.Google Scholar
Alshuhri, MS, Al-Musawi, SG, Al-Alwany, AA, Uinarni, H, Rasulova, I, Rodrigues, P, Alkhafaji, AT, Alshanberi, AM, Alawadi, AH and Abbas, AH (2024) Artificial intelligence in cancer diagnosis: Opportunities and challenges. Pathology- Research and Practice 253, 154996. https://doi.org/10.1016/j.prp.2023.154996.Google Scholar
Amin, MS and Ahn, H (2023) FabNet: A features agglomeration-based convolutional neural network for multiscale breast cancer histopathology images classification. Cancers 15(4), 1013. https://doi.org/10.3390/cancers15041013.Google Scholar
Anyoha, R (2017) The history of artificial intelligence. Science in the News- Harvard Graduate School of Arts and Sciences. Available at https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/.Google Scholar
Aswathy, MA and Jagannath, M (2021) An SVM approach towards breast cancer classification from H&E-stained histopathology images based on integrated features. Medical & Biological Engineering & Computing 59(9), 17731783. https://doi.org/10.1007/s11517-021-02403-0.Google Scholar
Aresta, G, Araújo, T, Kwok, S, Chennamsetty, SS, Safwan, M, Alex, V, Marami, B, Prastawa, M, Chan, M, Donovan, M, Fernandez, G, Zeineh, J, Kohl, M, Walz, C, Ludwig, F, Braunewell, S, Baust, M, Vu, QD, To, MNN, Kim, E, Kwak, JT, Galal, S, Sanchez-Freire, V, Brancati, N, Frucci, M, Riccio, D, Wang, Y, Sun, L, Ma, K, Fang, J, Kone, I, Boulmane, L, Campilho, A, Eloy, C, Polónia, A and Aguiar, P (2019) Bach: Grand challenge on breast cancer histology images. Medical Image Analysis 56, 122139. https://doi.org/10.1016/j.media.2019.05.010.Google Scholar
Attallah, O, Anwar, F, Ghanem, NM and Ismail, MA (2021) Histo-CADx: Duo cascaded fusion stages for breast cancer diagnosis from histopathological images. PeerJ Computer Science 7, e493. https://doi.org/10.7717/PEERJ-CS.493.Google Scholar
Aubreville, M, Stathonikos, N, Donovan, TA, Klopfleisch, R, Ammeling, J, Ganz, J, Wilm, F, Veta, M, Jabari, S, Eckstein, M, Annuscheit, J, Krumnow, C, Bozaba, E, Çayır, S, Gu, H, Chen, X‘A’, Jahanifar, M, Shephard, A, Kondo, S, Kasai, S, Kotte, S, Saipradeep, VG, Lafarge, MW, Koelzer, VH, Wang, Z, Zhang, Y, Yang, S, Wang, X, Breininger, K and Bertram, CA (2024) Domain generalization across tumor types, laboratories, and species – Insights from the 2022 edition of the mitosis domain generalization challenge. Medical Image Analysis 94, 103155. https://doi.org/10.1016/j.media.2024.103155.Google Scholar
Bae, K, Jeon, YS, Hwangbo, Y, Yoo, CW, Han, N and Feng, M (2023) Data-efficient computational pathology platform for faster and cheaper breast cancer subtype identifications: Development of a deep learning model. JMIR Cancer 9(1), e45547. https://doi.org/10.2196/45547.Google Scholar
Bajić, F, Orel, O and Habijan, M (2022) A multi-purpose shallow convolutional neural network for chart images. Sensors 22(20), 7695. https://doi.org/10.3390/s22207695.Google Scholar
Basu, K, Sinha, R, Ong, A and Basu, T (2020) Artificial intelligence: How is it changing medical sciences and its future? Indian Journal of Dermatology 65(5), 365–370. https://doi.org/10.4103/ijd.IJD_421_20.Google Scholar
Bejnordi, BE, Veta, M, Van Diest, PJ, Van Ginneken, B, Karssemeijer, N, Litjens, G, Van Der Laak, JA, Hermsen, M, Manson, QF, Balkenhol, M and Geessink, O (2017) Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–2210. https://doi.org/10.1001/JAMA.2017.14585.Google Scholar
Bejnordi, BE, Zuidhof, G, Balkenhol, M, Hermsen, M, Bult, P, van Ginneken, B, Karssemeijer, N, Litjens, G and van der Laak, J (2017) Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images. Journal of Medical Imaging 4(4), 044504. https://doi.org/10.1117/1.JMI.4.4.044504.Google Scholar
Birhane, A, Kasirzadeh, A, Leslie, D and Wachter, S (2023) Science in the age of large language models. Nature Reviews Physics 5(5), 277–280. https://doi.org/10.1038/s42254-023-00581-4.Google Scholar
Bitkina, OV, Park, J and Kim, HK (2023) Application of artificial intelligence in medical technologies: A systematic review of main trends. DIGITAL HEALTH 9, 20552076231189331. https://doi.org/10.1177/20552076231189331.Google Scholar
Briganti, G (2023) Intelligence artificielle: Une introduction pour les cliniciens. Revue des Maladies Respiratoires 40(4), 308–313. https://doi.org/10.1016/j.rmr.2023.02.005.Google Scholar
Brown, S (2021) Machine learning, explained. MIT Sloan. Ideas Made to Matter. Available at https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained (accessed 4 May 2023).Google Scholar
Cardoso, F, Kyriakides, S, Ohno, S, Penault-Llorca, F, Poortmans, P, Rubio, IT, Zackrisson, S and Senkus, E, on behalf of the ESMO Guidelines Committee (2019) Early breast cancer: ESMO clinical practice guidelines for diagnosis, treatment and follow-up. Annals of Oncology 30(8), 1194–1220. https://doi.org/10.1093/annonc/mdz173.Google Scholar
Carvalho, ED, Antonio Filho, OC, Silva, RR, Araujo, FH, Diniz, JO, Silva, AC, Paiva, AC and Gattass, M (2020) Breast cancer diagnosis from histopathological images using textural features and CBIR. Artificial Intelligence in Medicine 105, 101845. https://doi.org/10.1016/J.ARTMED.2020.101845.Google Scholar
Chen, M and Decary, M (2020) Artificial intelligence in healthcare: An essential guide for health leaders. Healthcare Management Forum 33(1), 10–18. https://doi.org/10.1177/0840470419873123.Google Scholar
Cohen, MB, Martin, M, Gross, DJ, Johnson, K, Robboy, SJ, Wheeler, TM, Johnson, RL and Black-Schaffer, WS (2022) Features of burnout amongst pathologists: A reassessment. Academic Pathology 9(1), 100052. https://doi.org/10.1016/J.ACPATH.2022.100052.Google Scholar
Cree, IA, Tan, PH, Travis, WD, Wesseling, P, Yagi, Y, White, VA, Lokuhetty, D and Scolyer, RA (2021) Counting mitoses: SI(ze) matters! Modern Pathology 34(9), 1651–1657. https://doi.org/10.1038/s41379-021-00825-7.Google Scholar
Cruz-Roa, A, Gilmore, H, Basavanhally, A, Feldman, M, Ganesan, S, Shih, NN, Tomaszewski, J, González, FA and Madabhushi, A (2017) Accurate and reproducible invasive breast cancer detection in whole-slide images: A deep learning approach for quantifying tumor extent. Scientific Reports 7(1), 46450. https://doi.org/10.1038/srep46450.Google Scholar
Egger, J, Gsaxner, C, Pepe, A, Pomykala, KL, Jonske, F, Kurz, M, Li, J and Kleesiek, J (2022) Medical deep learning – A systematic meta-review. Computer Methods and Programs in Biomedicine 221, 106874. https://doi.org/10.1016/j.cmpb.2022.106874.Google Scholar
Elmore, JG, Barnhill, RL, Elder, DE, Longton, GM, Pepe, MS, Reisch, LM, Carney, PA, Titus, LJ, Nelson, HD, Onega, T, Tosteson, AN, Weinstock, MA, Knezevich, SR and Piepkorn, MW (2017) Pathologists’ diagnosis of invasive melanoma and melanocytic proliferations: Observer accuracy and reproducibility study. BMJ 357, j2813. https://doi.org/10.1136/bmj.j2813.Google Scholar
Elmore, JG, Longton, GM, Carney, PA, Geller, BM, Onega, T, Tosteson, AN, Nelson, HD, Pepe, MS, Allison, KH, Schnitt, SJ, O’Malley, FP and Weaver, DL (2015) Diagnostic concordance among pathologists interpreting breast biopsy specimens. JAMA 313(11), 1122–1132. https://doi.org/10.1001/JAMA.2015.1405.Google Scholar
Försch, S, Klauschen, F, Hufnagl, P and Roth, W (2021) Artificial intelligence in pathology. Deutsches Ärzteblatt International 118(12), 199–204. https://doi.org/10.3238/ARZTEBL.M2021.0011.Google Scholar
Gandomkar, Z, Brennan, PC and Mello-Thoms, C (2018) MuDeRN: Multi-category classification of breast histopathological image using deep residual networks. Artificial Intelligence in Medicine 88, 14–24. https://doi.org/10.1016/J.ARTMED.2018.04.005.Google Scholar
Grzybowski, A, Pawlikowska-Łagód, K and Lambert, WC (2024) A history of artificial intelligence. Clinics in Dermatology 42(3), 221–229. https://doi.org/10.1016/j.clindermatol.2023.12.016.Google Scholar
Han, Z, Wei, B, Zheng, Y, Yin, Y, Li, K and Li, S (2017) Breast cancer multi-classification from histopathological images with structured deep learning model. Scientific Reports 7(1), 4172. https://doi.org/10.1038/s41598-017-04075-z.Google Scholar
Hanna, MG and Pantanowitz, L (2017) Why is digital pathology in cytopathology lagging behind surgical pathology? Cancer Cytopathology 125(7), 519–520. https://doi.org/10.1002/cncy.21855.Google Scholar
Hanna, MG, Reuter, VE, Hameed, MR, Tan, LK, Chiang, S, Sigel, C, Hollmann, T, Giri, D, Samboy, J, Moradel, C, Rosado, A, Otilano, JR III, England, C, Corsale, L, Stamelos, E, Yagi, Y, Schüffler, PJ, Fuchs, T, Klimstra, DS and Sirintrapun, SJ (2019) Whole slide imaging equivalency and efficiency study: Experience at a large academic center. Modern Pathology 32(7), 916–928. https://doi.org/10.1038/s41379-019-0205-0.Google Scholar
Hatta, S, Ichiuji, Y, Mabu, S, Kugler, M, Hontani, H, Okoshi, T, Fuse, H, Kawada, T, Kido, S, Imamura, Y, Naiki, H and Inai, K (2023) Improved artificial intelligence discrimination of minor histological populations by supplementing with color-adjusted images. Scientific Reports 13(1), 19068. https://doi.org/10.1038/s41598-023-46472-7.Google Scholar
Hosny, A, Parmar, C, Quackenbush, J, Schwartz, LH and Aerts, HJ (2018) Artificial intelligence in radiology. Nature Reviews Cancer 18(8), 500–510. https://doi.org/10.1038/s41568-018-0016-5.Google Scholar
He, Z, Chen, Z, Tan, M, Elingarami, S, Liu, Y, Li, T, Deng, Y, He, N, Li, S, Fu, J and Li, W (2020) A review on methods for diagnosis of breast cancer cells and tissues. Cell Proliferation 53(7), e12822. https://doi.org/10.1111/cpr.12822.Google Scholar
ICIAR 2018 - Grand Challenge. Available at https://iciar2018-challenge.grand-challenge.org/ (accessed 09 July 2025).Google Scholar
Jiang, Y, Luo, J, Huang, D, Liu, Y and Li, DD (2022) Machine learning advances in microbiology: A review of methods and applications. Frontiers in Microbiology 13, 925454. https://doi.org/10.3389/fmicb.2022.925454.Google Scholar
Jairam, MP and Ha, R (2022) A review of artificial intelligence in mammography. Clinical Imaging 88, 3644. https://doi.org/10.1016/j.clinimag.2022.05.005.Google Scholar
Jones, NC, Nazarian, RM, Duncan, LM, Kamionek, M, Lauwers, GY, Tambouret, RH, Wu, CL, Nielsen, GP, Brachtel, EF, Mark, EJ, Sadow, PM, Grabbe, JP and Wilbur, DC (2015) Interinstitutional whole slide imaging teleconsultation service development: Assessment using internal training and clinical consultation cases. Archives of Pathology and Laboratory Medicine 139(5), 627–635. https://doi.org/10.5858/ARPA.2014-0133-OA.Google Scholar
Jovel, J and Greiner, R (2021) An introduction to machine learning approaches for biomedical research. Frontiers in Medicine 8, 771607. https://doi.org/10.3389/fmed.2021.771607.Google Scholar
Jung, M, Song, SG, Cho, SI, Shin, S, Lee, T, Jung, W, Lee, H, Park, J, Song, S, Park, G, Song, H, Park, S, Lee, J, Kang, M, Park, J, Pereira, S, Yoo, D, Chung, K, Ali, SM and Kim, SW (2024) Augmented interpretation of HER2, ER, and PR in breast cancer by artificial intelligence analyzer: Enhancing interobserver agreement through a reader study of 201 cases. Breast Cancer Research 26(1), 31. https://doi.org/10.1186/s13058-024-01784-y.Google Scholar
Kanavati, F, Ichihara, S and Tsuneki, M (2022) A deep learning model for breast ductal carcinoma in situ classification in whole slide images. Virchows Archiv 480(5), 1009–1022. https://doi.org/10.1007/S00428-021-03241-Z.Google Scholar
Kaul, V, Enslin, S and Gross, SA (2020) History of artificial intelligence in medicine. Gastrointestinal Endoscopy 92(4), 807–812. https://doi.org/10.1016/j.gie.2020.06.040.Google Scholar
Kim, I, Kang, K, Song, Y and Kim, TJ (2022) Application of artificial intelligence in pathology: Trends and challenges. Diagnostics 12(11), 2794. https://doi.org/10.3390/diagnostics12112794.Google Scholar
Kolla, B (2024) An integrated approach for magnification independent breast cancer classification. Biomedical Signal Processing and Control 88, 105594. https://doi.org/10.1016/J.BSPC.2023.105594.Google Scholar
Kriegeskorte, N and Golan, T (2019) Neural network models and deep learning. Current Biology 29(7), R231–R236. https://doi.org/10.1016/j.cub.2019.02.034.Google Scholar
Kufel, J, Bargieł-Łączek, K, Kocot, S, Koźlik, M, Bartnikowska, W, Janik, M, Czogalik, Ł, Dudek, P, Magiera, M, Lis, A, Paszkiewicz, I, Nawrat, Z, Cebula, M and Gruszczyńska, K (2023) What is machine learning, artificial neural networks and deep learning?-examples of practical applications in medicine. Diagnostics 13(15), 2582. https://doi.org/10.3390/diagnostics13152582.Google Scholar
Lanzagorta-Ortega, D, Carrillo-Pérez, DL and Carrillo-Esper, R (2022) Inteligencia artificial en medicina: Presente y futuro. Gaceta Médica de México 158(Suppl 1), 17–21. https://doi.org/10.24875/GMM.M22000688.Google Scholar
Lathwal, A, Kumar, R, Arora, C and Raghava, GPS (2020) Identification of prognostic biomarkers for major subtypes of non-small-cell lung cancer using genomic and clinical data. Journal of Cancer Research and Clinical Oncology 146(11), 2743–2752. https://doi.org/10.1007/s00432-020-03318-3.Google Scholar
Laxmisagar, HS and Hanumantharaju, MC (2022) Detection of breast cancer with lightweight deep neural networks for histology image classification. Critical Reviews in Biomedical Engineering 50(2), 119. https://doi.org/10.1615/CRITREVBIOMEDENG.2022043417.Google Scholar
Lee, J, Warner, E, Shaikhouni, S, Bitzer, M, Kretzler, M, Gipson, D, Pennathur, S, Bellovich, K, Bhat, Z, Gadegbeku, C, Massengill, S, Perumal, K, Saha, J, Yang, Y, Luo, J, Zhang, X, Mariani, L, Hodgin, JB, Rao, A and C-PROBE Study (2022) Unsupervised machine learning for identifying important visual features through bag-of-words using histopathology data from chronic kidney disease. Scientific Reports 12(1), 4832. https://doi.org/10.1038/s41598-022-08974-8.Google Scholar
Liu, B, Chi, W, Li, X, Li, P, Liang, W, Liu, H, Wang, W and He, J (2020) Evolving the pulmonary nodules diagnosis from classical approaches to deep learning-aided decision support: Three decades’ development course and future prospect. Journal of Cancer Research and Clinical Oncology 146(1), 153–185. https://doi.org/10.1007/s00432-019-03098-5.Google Scholar
Liu, L, Feng, W, Chen, C, Liu, M, Qu, Y and Yang, J (2022) Classification of breast cancer histology images using MSMV-PFENet. Scientific Reports 12(1), 110. https://doi.org/10.1038/s41598-022-22358-y.Google Scholar
Liu, M, Hu, L, Tang, Y, Wang, C, He, Y, Zeng, C, Lin, K, He, Z and Huo, W (2022) A deep learning method for breast cancer classification in the pathology images. IEEE Journal of Biomedical and Health Informatics 26(10), 5025–5032. https://doi.org/10.1109/JBHI.2022.3187765.Google Scholar
Loughrey, MB, Kelly, PJ, Houghton, OP, Coleman, HG, Carson, A, Salto-Tellez, M and Hamilton, PW (2015) Digital slide viewing for primary reporting in gastrointestinal pathology: A validation study. Virchows Archiv 467(2), 137–144. https://doi.org/10.1007/s00428-015-1780-1.Google Scholar
Lu, C, Romo-Bucheli, D, Wang, X, Janowczyk, A, Ganesan, S, Gilmore, H, Rimm, D and Madabhushi, A (2018) Nuclear shape and orientation features from H&E images predict survival in early-stage estrogen receptor-positive breast cancers. Laboratory Investigation 98(11), 1438–1448. https://doi.org/10.1038/s41374-018-0095-7.Google Scholar
Miotto, R, Wang, F, Wang, S, Jiang, X and Dudley, JT (2018) Deep learning for healthcare: Review, opportunities and challenges. Briefings in Bioinformatics 19(6), 1236–1246. https://doi.org/10.1093/bib/bbx044.Google Scholar
Moor, J (2006) The Dartmouth College artificial intelligence conference: The next fifty years. AI Magazine 27(4), 87–91. https://www.aaai.org/ojs/index.php/aimagazine/article/view/1904/1802.Google Scholar
Morelli, P, Porazzi, E, Ruspini, M, Restelli, U and Banfi, G (2013) Analysis of errors in histology by root cause analysis: A pilot study. Journal of Preventive Medicine and Hygiene 54(2), 90.Google Scholar
Moxley-Wyles, B, Colling, R and Verrill, C (2020) Artificial intelligence in pathology: An overview. Diagnostic Histopathology 26(11), 513–520. https://doi.org/10.1016/j.mpdhp.2020.08.004.Google Scholar
Mukhopadhyay, S, Feldman, MD, Abels, E, Ashfaq, R, Beltaifa, S, Cacciabeve, NG, Cathro, HP, Cheng, L, Cooper, K, Dickey, GE, Gill, RM, Heaton, RP Jr, Kerstens, R, Lindberg, GM, Malhotra, RK, Mandell, JW, Manlucu, ED, Mills, AM, Mills, SE, Moskaluk, CA and Taylor, CR (2018) Whole slide imaging versus microscopy for primary diagnosis in surgical pathology: A multicenter blinded randomized noninferiority study of 1992 cases (pivotal study). The American Journal of Surgical Pathology 42(1), 39–52. https://doi.org/10.1097/PAS.0000000000000948.Google Scholar
Muthukrishnan, N, Maleki, F, Ovens, K, Reinhold, C, Forghani, B and Forghani, R (2020) Brief history of artificial intelligence. Neuroimaging Clinics of North America 30(4), 393–399. https://doi.org/10.1016/j.nic.2020.07.004.Google Scholar
Nahid, AA, Mehrabi, MA and Kong, Y (2018) Histopathological breast cancer image classification by deep neural network techniques guided by local clustering. BioMed Research International 2018, 2362108. https://doi.org/10.1155/2018/2362108.Google Scholar
Nardin, S, Mora, E, Varughese, FM, D’Avanzo, F, Vachanaram, AR, Rossi, V, Saggia, C, Rubinelli, S and Gennari, A (2020) Breast cancer survivorship, quality of life, and late toxicities. Frontiers in Oncology 10, 864. https://doi.org/10.3389/fonc.2020.00864.Google Scholar
Nateghi, R, Danyali, H and Helfroush, MS (2021) A deep learning approach for mitosis detection: Application in tumor proliferation prediction from whole slide images. Artificial Intelligence in Medicine 114, 102048. https://doi.org/10.1016/j.artmed.2021.102048.Google Scholar
Niazi, MKK, Parwani, AV and Gurcan, MN (2019) Digital pathology and artificial intelligence. The Lancet Oncology 20(5), e253e261. https://doi.org/10.1016/S1470-2045(19)30154-8.Google Scholar
Ono, S and Goto, T (2022) Introduction to supervised machine learning in clinical epidemiology. Annals of Clinical Epidemiology 4(3), 63–71. https://doi.org/10.37737/ace.22009.Google Scholar
Page, MJ, McKenzie, JE, Bossuyt, PM, Boutron, I, Hoffmann, TC, Mulrow, CD, Shamseer, L, Tetzlaff, JM, Akl, EA, Brennan, SE, Chou, R, Glanville, J, Grimshaw, JM, Hróbjartsson, A, Lalu, MM, Li, T, Loder, EW, Mayo-Wilson, E, McDonald, S, McGuinness, L, Stewart, LA, Thomas, J, Tricco, AC, Welch, VA, Whiting, P and Moher, D (2021) The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 372, n71. https://doi.org/10.1136/bmj.n71.Google Scholar
Pantanowitz, L, Hartman, D, Qi, Y, Cho, EY, Suh, B, Paeng, K, Dhir, R, Michelow, P, Hazelhurst, S, Song, SY and Cho, SY (2020) Accuracy and efficiency of an artificial intelligence tool when counting breast mitoses. Diagnostic Pathology 15(1), 80. https://doi.org/10.1186/s13000-020-00995-z.Google Scholar
Pettit, RW, Fullem, R, Cheng, C and Amos, CI (2021) Artificial intelligence, machine learning, and deep learning for clinical outcome prediction. Emerging Topics in Life Sciences 5(6), 729–745. https://doi.org/10.1042/ETLS20210246.Google Scholar
Polónia, A, Campelos, S, Ribeiro, A, Aymore, I, Pinto, D, Biskup-Fruzynska, M, Veiga, RS, Canas-Marques, R, Aresta, G, Araújo, T, Campilho, A, Kwok, S, Aguiar, P and Eloy, C (2021) Artificial intelligence improves the accuracy in histologic classification of breast lesions. American Journal of Clinical Pathology 155(4), 527–536. https://doi.org/10.1093/ajcp/aqaa151.Google Scholar
Ragab, M, Al-Ghamdi, ASA, Fakieh, B, Choudhry, H, Mansour, RF and Koundal, D (2022) Prediction of diabetes through retinal images using deep neural network. Computational Intelligence and Neuroscience 2022, 7887908. https://doi.org/10.1155/2022/7887908.Google Scholar
Robbins, CJ, Fernandez, AI, Han, G, Wong, S, Harigopal, M, Podoll, M, Singh, K, Ly, A, Kuba, MG, Wen, H, Sanders, MA, Brock, J, Wei, S, Fadare, O, Hanley, K, Jorns, J, Snir, OL, Yoon, E, Rabe, K, Soong, TR, Reisenbichler, ES and Rimm, DL (2023) Multi-institutional assessment of pathologist scoring HER2 immunohistochemistry. Modern Pathology 36(1), 100032.Google Scholar
Robboy, SJ, Gross, D, Park, JY, Kittrie, E, Crawford, JM, Johnson, RL, Cohen, MB, Karcher, DS, Hoffman, RD, Smith, AT and Black-Schaffer, WS (2020) Reevaluation of the US pathologist workforce size. JAMA Network Open 3(7), e2010648. https://doi.org/10.1001/JAMANETWORKOPEN.2020.10648.Google Scholar
Romo-Bucheli, D, Janowczyk, A, Gilmore, H, Romero, E and Madabhushi, A (2016) Automated tubule nuclei quantification and correlation with oncotype DX risk categories in ER+ breast cancer whole slide images. Scientific Reports 6, 32706. https://doi.org/10.1038/srep32706.Google Scholar
Romo-Bucheli, D, Janowczyk, A, Gilmore, H, Romero, E and Madabhushi, A (2017) A deep learning based strategy for identifying and associating mitotic activity with gene expression derived risk categories in estrogen receptor positive breast cancers. Cytometry Part A 91(6), 566–573. https://doi.org/10.1002/cyto.a.23065.Google Scholar
Roux, L, Racoceanu, D, Loménie, N, Kulikova, M, Irshad, H, Klossa, J, Capron, F, Genestie, C, Le Naour, G and Gurcan, MN (2013) Mitosis detection in breast cancer histological images; An ICPR 2012 contest. Journal of Pathology Informatics 4, 8. https://doi.org/10.4103/2153-3539.112693.Google Scholar
Saha, M, Chakraborty, C and Racoceanu, D (2018) Efficient deep learning model for mitosis detection using breast histopathology images. Computerized Medical Imaging and Graphics 64, 29–40. https://doi.org/10.1016/J.COMPMEDIMAG.2017.12.001.Google Scholar
Sarker, IH (2021) Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science 2(6), 420. https://doi.org/10.1007/s42979-021-00815-1.Google Scholar
Sedeta, ET, Jobre, B and Avezbakiyev, B (2023) Breast cancer: Global patterns of incidence, mortality, and trends. Journal of Clinical Oncology 41(16_suppl), 10528–10528. https://doi.org/10.1200/jco.2023.41.16_suppl.10528.Google Scholar
Senousy, Z, Abdelsamea, MM, Gaber, MM, Abdar, M, Acharya, UR, Khosravi, A and Nahavandi, S (2022) MCUa: Multi-level context and uncertainty aware dynamic deep ensemble for breast cancer histology image classification. IEEE Transactions on Biomedical Engineering 69(2), 818–829. https://doi.org/10.1109/TBME.2021.3107446.Google Scholar
Shafi, S, Kellough, DA, Lujan, G, Satturwar, S, Parwani, AV and Li, Z (2022) Integrating and validating automated digital imaging analysis of estrogen receptor immunohistochemistry in a fully digital workflow for clinical use. Journal of Pathology Informatics 13, 100122. https://doi.org/10.1016/J.JPI.2022.100122.Google Scholar
Shajari, S, Kuruvinashetti, K, Komeili, A and Sundararaj, U (2023) The emergence of AI-based wearable sensors for digital health technology: A review. Sensors 23(23), 9498. https://doi.org/10.3390/s23239498.Google Scholar
Shamshiri, MA, Krzyżak, A, Kowal, M and Korbicz, J (2023) Compatible-domain transfer learning for breast cancer classification with limited annotated data. Computers in Biology and Medicine 154, 106575. https://doi.org/10.1016/J.COMPBIOMED.2023.106575.Google Scholar
Sidey-Gibbons, JAM and Sidey-Gibbons, CJ (2019) Machine learning in medicine: A practical introduction. BMC Medical Research Methodology 19(1), 64. https://doi.org/10.1186/s12874-019-0681-4.Google Scholar
Singh, P, Kumar, R, Gupta, M and Al-Turjman, F (2024) SegEIR-net: A robust histopathology image analysis framework for accurate breast cancer classification. Current Medical Imaging 20, e15734056278974. https://doi.org/10.2174/0115734056278974231211102917.Google Scholar
Soliman, A, Li, Z and Parwani, AV (2024) Artificial intelligence’s impact on breast cancer pathology: A literature review. Diagnostic Pathology 19(1), 118. https://doi.org/10.1186/s13000-024-01453-w.Google Scholar
Spanhol, FA, Oliveira, LS, Petitjean, C and Heutte, L (2016) A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering 63(7), 1455–1462. https://doi.org/10.1109/TBME.2015.2496264.Google Scholar
Srikantamurthy, MM, Rallabandi, VPS, Dudekula, DB, Natarajan, S and Park, J (2023) Classification of benign and malignant subtypes of breast cancer histopathology imaging using hybrid CNN-LSTM based transfer learning. BMC Medical Imaging 23(1), 19. https://doi.org/10.1186/S12880-023-00964-0.Google Scholar
Suberi, AA, Zakaria, WN and Tomari, R (2017) Dendritic cell recognition in computer aided system for cancer immunotherapy. Procedia Computer Science 105, 177–182. https://doi.org/10.1016/j.procs.2017.01.201.Google Scholar
Sufyan, M, Shokat, Z and Ashfaq, UA (2023) Artificial intelligence in cancer diagnosis and therapy: Current status and future perspective. Computers in Biology and Medicine 165, 107356. https://doi.org/10.1016/j.compbiomed.2023.107356.Google Scholar
Tabata, K, Mori, I, Sasaki, T, Itoh, T, Shiraishi, T, Yoshimi, N, Maeda, I, Harada, O, Taniyama, K, Taniyama, D, Watanabe, M, Mikami, Y, Sato, S, Kashima, Y, Fujimura, S and Fukuoka, J (2017) Whole-slide imaging at primary pathological diagnosis: Validation of whole-slide imaging-based primary pathological diagnosis at twelve Japanese academic institutes. Pathology International 67(11), 547–554. https://doi.org/10.1111/PIN.12590.Google Scholar
Tizhoosh, HR and Pantanowitz, L (2018) Artificial intelligence and digital pathology: Challenges and opportunities. Journal of Pathology Informatics 9, 38. https://doi.org/10.4103/jpi.jpi_53_18.Google Scholar
Turing, AM (1950) I.—Computing machinery and intelligence. Mind LIX(236), 433–460. https://doi.org/10.1093/mind/lix.236.433.Google Scholar
Umer, MJ, Sharif, M, Kadry, S and Alharbi, A (2022) Multi-class classification of breast cancer using 6B-net with deep feature fusion and selection method. Journal of Personalized Medicine 12(5), 683. https://doi.org/10.3390/jpm12050683.Google Scholar
van Diest, PJ, Flach, RN, van Dooijeweert, C, Makineli, S, Breimer, GE, Stathonikos, N, Pham, P, Nguyen, TQ and Veta, M (2024) Pros and cons of artificial intelligence implementation in diagnostic pathology. Histopathology 84(6), 924–934. https://doi.org/10.1111/HIS.15153.Google Scholar
Veta, M, Heng, YJ, Stathonikos, N, Bejnordi, BE, Beca, F, Wollmann, T, Rohr, K, Shah, MA, Wang, D, Rousson, M, Hedlund, M, Tellez, D, Ciompi, F, Zerhouni, E, Lanyi, D, Viana, M, Kovalev, V, Liauchuk, V, Phoulady, HA, Qaiser, T and Pluim, JPW (2019) Predicting breast tumor proliferation from whole-slide images: The TUPAC16 challenge. Medical Image Analysis 54, 111–121. https://doi.org/10.1016/j.media.2019.02.012.Google Scholar
van Dooijeweert, C, van Diest, PJ and Ellis, IO (2021) Grading of invasive breast carcinoma: The way forward. Virchows Archiv 480(1), 33–43. https://doi.org/10.1007/s00428-021-03141-2.Google Scholar
Wang, X, Zhang, J, Yang, S, Xiang, J, Luo, F, Wang, M, Zhang, J, Yang, W, Huang, J and Han, X (2023) A generalizable and robust deep learning algorithm for mitosis detection in multicenter breast histopathological images. Medical Image Analysis 84, 102703. https://doi.org/10.1016/J.MEDIA.2022.102703.Google Scholar
Watkins, EJ (2019) Overview of breast cancer. Journal of the American Academy of Physician Assistants 32(10), 13–17. https://doi.org/10.1097/01.JAA.0000580524.95733.3d.Google Scholar
Whitney, J, Corredor, G, Janowczyk, A, Ganesan, S, Doyle, S, Tomaszewski, J, Feldman, M, Gilmore, H and Madabhushi, A (2018) Quantitative nuclear histomorphometry predicts oncotype DX risk categories for early stage ER+ breast cancer. BMC Cancer 18(1), 610. https://doi.org/10.1186/s12885-018-4448-9.Google Scholar
Xu, N, Yang, D, Arikawa, K and Bai, C (2023) Application of artificial intelligence in modern medicine. Clinical eHealth 6, 130–137. https://doi.org/10.1016/j.ceh.2023.09.001.Google Scholar
Xu, X, An, M, Zhang, J, Liu, W and Lu, L (2022) A high-precision classification method of mammary cancer based on improved DenseNet driven by an attention mechanism. Computational and Mathematical Methods in Medicine 2022, 8585036. https://doi.org/10.1155/2022/8585036.Google Scholar
Zhu, C, Song, F, Wang, Y, Dong, H, Guo, Y and Liu, J (2019) Breast cancer histopathology image classification through assembling multiple compact CNNs. BMC Medical Informatics and Decision Making 19(1), 117. https://doi.org/10.1186/s12911-019-0913-x.Google Scholar
Figure 1. Boolean search with keywords and their synonyms.

Figure 2. Flowchart describing the literature inclusion process.

Table 1. Summary of studies categorizing breast lesions (i.e., benign vs. malignant)

Table 2. Summary of studies categorizing breast lesions (i.e., normal/benign/in situ/invasive)

Table 3. Summary of studies assessing different histopathological subtypes of both benign and malignant breast lesions

Table 4. Summary of studies assessing breast cancer molecular subtyping (i.e., according to estrogen receptors (ER), progesterone receptors (PR) and HER2 – with or without Ki67 mitotic index analysis)

Table 5. Summary of studies assessing the Ki67 mitotic index

Author comment: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R0/PR1

Comments

No accompanying comment.

Review: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R0/PR2

Conflict of interest statement

I declare that I have no competing interest in relation to this manuscript.

Comments

Overall Impression:

I read this paper with great interest, as there is a lot of work being done aiming to improve histopathologists’ and cytopathologists’ performance when diagnosing cancer. Unfortunately, the way this paper is written gives the strong impression that the authors want to push the message that AI-based models are here to save breast cancer diagnosis from pathologists’ errors and inter-observer variability, even though most of the work they cite only talks about the “potential” of these models to help pathologists in the clinical practice. Now, the keyword here is “potential”, as for the most part they have not been implemented in pathology laboratories yet, and massive challenges still prevent wide adoption, such as the different staining processes used by different labs, the lack of representation of rare diseases in the datasets used to train AI-based models, the reduced representation of minority populations in the same databases, integration of the AI models with the LIS, etc. None of that seems to reduce the authors’ enthusiasm about the use of AI models in histopathology. While I admire their optimism, the way that they present their views for the most part suggests great naivete about how things work in the real world, but in a few instances it seems to misrepresent what is actually being said in the papers that they cite so that it conforms better to the authors’ message. I think that this is very unfortunate, and I would highly recommend that the authors (1) tone down their enthusiasm and stick to the facts; (2) not confuse “potential” with actual use of AI-based models; and (3) not dismiss the immense challenges that still prevent the wide adoption of these models in clinical labs.

Abstract:

Pp 1, lines 18-29. There is no need to go into such great detail about breast cancer incidence, mortality rates and the projected number of new cases by 2040, as this is not a review of the disease itself; it’s a review of the use of AI in the diagnosis of the disease. In my opinion, the entire Abstract needs to be rewritten and focused on what the paper is actually discussing.

Pp 1, lines 29-32. Please clarify what you mean by “traditional methods”. Are these pathologists? If so, please state that.

Pp 1, lines 35-37. I believe that it is too early to state that AI has “improved accuracy, efficiency, and consistency”, as it is not currently in use in the clinical practice, so it hasn’t really been tested with real data, with its high degree of variability, including different staining methods used by different labs, presence of artifacts in the images, possible presence of rare disease in the slide, etc. I understand that the authors are enthusiastic about the topic, but given the current situation of AI in histopathology, I do believe that there is cause for some moderation in one’s claims.

Introduction:

Pp 2, lines 17-20. The authors state that “AI refers to the utilization of technology and computers to imitate human-like cognitive processes and intelligent actions”. Well, AI is implemented by machine learning (ML) algorithms, and many of those have no resemblance whatsoever to “human cognitive processes”. As such, please rewrite this statement.

Pp 2, line 49. It should read “AI applications in medicine have evolved…”.

Pp 2, line 54. What are “traditional algorithm-based methods”??? Please clarify.

Pp 3, lines 3-5. As the authors were previously talking about AI, I am assuming that the sentence that lists the benefits of “predictive models” is also referring to those that use AI. However, in general, predictive models do not have to be based on deep learning (DL) algorithms; they can be based on traditional machine learning (ML) algorithms, such as Support Vector Machines, and some of those traditional algorithms perform quite well. Please clarify what the authors are referring to in this sentence.
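To make the distinction concrete, here is a minimal sketch of a non-DL predictive model; it uses scikit-learn’s bundled tabular breast cancer dataset purely as illustrative stand-in data, not anything from the manuscript under review:

    # A predictive model built on a classical ML algorithm (an SVM), no DL.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)

    # Standardize the features, then fit an RBF-kernel SVM classifier.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(X_train, y_train)
    print(f"Held-out accuracy: {model.score(X_test, y_test):.3f}")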

Pp 3, line 14. Not all machine learning algorithms “make predictions”. Many of them are used in segmentation or classification tasks. Please reword.

Pp 3, line 19. What is “The term” referring to??? Please clarify.

Pp 3, line 26. It should read “…. predefined results; similarly….”

Pp 3, line 41. Not all neural networks “mimic brain function”. Most of them are inspired by biological neurological systems, but the actual resemblance is small.

Pp 3, line 44. The authors first need to define what they mean by “neuron”. Secondly, I do not understand when the authors say that “each neuron processes inputs”. Do they mean to suggest that all neurons are connected to the input layer? Or that they process inputs from the previous layer? Please clarify.

Pp 3, line 46. The authors state that “ANNs have one hidden layer”, but this is not true for all types of ANNs. For example, Adaptive Resonance Theory (ART) ANNs have no hidden layers. Please reword.
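As an illustration of how freely the hidden-layer count varies, the following minimal sketch (using scikit-learn’s MLPClassifier on synthetic data, assumed here as a stand-in for a generic feed-forward ANN) builds one network with a single hidden layer and another with three:

    # The number of hidden layers is a hyperparameter, not a fixed property.
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    nets = {
        "1 hidden layer":  MLPClassifier(hidden_layer_sizes=(16,),
                                         max_iter=2000, random_state=0),
        "3 hidden layers": MLPClassifier(hidden_layer_sizes=(64, 32, 16),
                                         max_iter=2000, random_state=0),
    }
    for name, net in nets.items():
        net.fit(X, y)
        print(name, "-> training accuracy:", round(net.score(X, y), 3))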

Pp 3, lines 53-55. The authors state that “Recently, there has been a rise in using these models for accurate diagnoses”. This is factually not correct. While there may have been a rise in the development of Deep Learning (DL) algorithms for disease diagnoses, for the most part they are not used in the clinical practice, which is what is suggested by what the authors wrote. Please rewrite the sentence to reflect the true state of current application of DL in the clinical practice.

Pp 4, line 6. CNNs are also designed for “classification” tasks. Please include those.

Pp 4, line 17. Please provide a reference for use of AI in “genomic analysis” and in “patient monitoring”.

Pp 4, line 20. Please provide a reference for use of AI in “wearable health technology” and in enhancing “doctor-patient interactions”.

Pp 4, line 22. Please provide a reference for use of AI in enabling “remote therapy”.

Pp 4, line 25. The paper by Alowais et al, 2023 discuss the “potential” role of AI in health care, but they also report at length on the challenges that need to be addressed before AI can be implemented in the clinical practice. I do not see how, from that paper, the authors can conclude that “Integrating AI into healthcare can significantly improve the effectiveness, accuracy, and personalization of medical diagnosis”. Please clarify.

Pp 4, line 35. How can the authors state that AI “is significantly advancing cancer diagnosis” and cite as support a paper (Sufyan et al 2023) that only talks about the “potential” of AI to advance cancer diagnosis??? There is a huge difference between potential and actual implementation, and the authors seem to not recognize that. Please modify your statement.

Pp 4, line 35. The paper by Alshuhri et al 2024 is not included in the References.

Pp 4, lines 40-42. The authors seem enthusiastic about how AI is going to “transform patient care”, however they seem to forget that this is not the first time that machine learning has been used to aid physicians in their practices. In the late 1990s and in the 2000s, Computer-Aided Detection (CADe) was also a big promise, but it completely failed to deliver when incorporated in the clinical practice. In light of that, I would strongly recommend that the authors moderate their enthusiasm for AI until we have actual evidence that it is improving patient care.

Pp 4, lines 49-51. The authors state that CADe and CADx “play vital roles in medical imaging”. First of all, that is not true. The clinical implementation of CADe and CADx has been marked by undelivered promises and a significant number of False Positives per case, which erodes radiologists’ trust in the system and leads less-experienced observers, like radiology residents, astray. Please do a more thorough search of the literature, instead of just presenting one paper (He Z, 2020, which by the way is not listed in the References) that probably supports your view that CADe and CADx are actually wonderful.

Pp 4, line 56. Please provide a reference for the statement that AI is improving accuracy in “identifying cancer progression”. Also, provide a reference for the statement that AI is “aiding in early detection and diagnosis”.

Pp 5, line 35. The authors state that histopathological diagnosis is “still relying on microscopic evaluations by human pathologists”. This does not take into account the fact that a lot of Institutions have moved on to Digital Pathology, which should be stated here. Furthermore, false positive and false negative errors are not going to be erased by digitalizing the process of assessing slides. Similarly, inter- and intra-pathologist variability is also not going to be erased by moving to Digital Pathology, as the authors seem to suggest in this paragraph.

Pp 6, lines 3-5. The process of cancer diagnosis starts with the preparation of the slides and ends with the pathologists’ interpretation, determining whether disease is present and, if so, which type. Please clarify in which part of this process AI “is essential for improving diagnostic processes”.

Pp 6, line 15. Please clarify what the authors mean by saying that computer monitors have “much greater clarity than traditional microscopy”.

Pp 6, line 17. Which unit is the “100,000 x 100,000” representing? Are they pixels? Please clarify.
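Assuming the unit is indeed pixels (the usual convention for whole-slide image dimensions), a quick back-of-the-envelope sketch shows why the unit matters and why such slides are processed in tiles; the 256 x 256 tile size below is a hypothetical choice:

    # Rough size of an uncompressed 100,000 x 100,000 pixel 8-bit RGB slide.
    side_px = 100_000
    bytes_per_px = 3                                  # 8-bit R, G and B channels
    raw_bytes = side_px * side_px * bytes_per_px
    print(f"Uncompressed size: {raw_bytes / 1e9:.0f} GB")      # ~30 GB

    tile = 256                                        # hypothetical tile size
    tiles = (side_px // tile) ** 2
    print(f"Non-overlapping {tile}x{tile} tiles: {tiles:,}")   # ~152,100 tiles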

Pp 6, line 20. The authors state that Digital Pathology reduces “interpretation errors”, but because in the previous page they only vaguely alluded to false positives and false negatives that resulted as pathologists used the traditional microscopes to interpret slides, without citing any actual error rates, it is difficult to visualize the magnitude of the effect that Whole Slide Imaging (WSI) would have on the reduction of interpretation errors. Could the authors please be more specific about that?

Pp 6, line 54. The authors say that in the BACH challenge “AI could achieve accuracy levels comparable to pathologists”. Can they please describe a bit more about who these “pathologists” were? Were they domain experts? Were they general pathologists? Please clarify.

Pp 7, line 3. Please provide a reference for the statement that AI improved “interobserver concordance”.

Pp 7, line 22. Please provide a reference for the MITOSIS detection contest.

Pp 7, line 40. In the system developed by Nateghi et al, which can “identify regions of interest”, what are these regions of interest for? Areas to count mitoses? Please clarify.

Pp 7, lines 45-47. I do not understand what the authors mean when they say that AI “optimizes the time needed for pathologists”. Please clarify.

Pp 8, line 6. Instead of “… pathology team”, it should read “… pathology workflow”.

Methodology:

Fine as written.

Results:

Pp 12, line 39. Please provide references for the BreakHis, BACH and ICIAR datasets.

Pp 12, lines 39-41. How are the authors assessing that these datasets are “significant”? Please clarify.

Pp 22, line 14. Instead of “study”, it should read “studies”.

Pp 22, line 19. The authors report that the AI models in the Bae et al (2023) study exhibited “an impressive accuracy rate of approximately 91%”, but in reality Bae et al’s paper reported AUC values ranging from 0.75 to 0.91. Hence, it is not correct for the authors to simply extract the highest number in that range and claim that this was the generalized performance.
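The reporting problem is easy to see with invented numbers; the per-subtype values below are hypothetical, chosen only to mirror a 0.75-0.91 spread, and are not taken from Bae et al:

    # Quoting only the maximum of a performance range misstates the result.
    import statistics

    per_subtype_auc = {"luminal A": 0.91, "luminal B": 0.84,
                       "HER2-enriched": 0.80, "triple-negative": 0.75}

    print("max AUC :", max(per_subtype_auc.values()))                  # 0.91
    print("mean AUC:", round(statistics.mean(per_subtype_auc.values()), 3))
    print("min AUC :", min(per_subtype_auc.values()))                  # 0.75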

Pp 22, line 30. Please include a reference for the issue of variability across different scanners.

Pp 22, lines 32-51. The authors cite two steps that were implemented to “enhance model applicability”. However, they fail to report what the model’s performance was after those two steps were implemented.

Pp 23, line 3. Were any statistical tests carried out to determine that the model’s performance was “significantly” improved when handcrafted features were integrated into the DL architecture? Please clarify.
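One suitable test, sketched below with hypothetical disagreement counts, is McNemar’s test on paired predictions from the two models over the same cases:

    # McNemar's test compares two classifiers evaluated on the same cases.
    from statsmodels.stats.contingency_tables import mcnemar

    # Rows: baseline DL model correct / incorrect;
    # columns: DL + handcrafted features correct / incorrect (invented counts).
    table = [[820, 30],
             [55, 95]]
    result = mcnemar(table, exact=False, correction=True)
    print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")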

Pp 23, line 8. Can the authors please show the change in the model’s performance from pre- to post-integration of handcrafted features in the DL architecture?

Pp 23, line 26. Please include a reference for the first statement in the paragraph.

Pp 23, lines 38-40. Can the authors please provide some additional information about the performance of the new approach they are discussing vs. that of single-model systems in the BACH dataset? It is difficult to take these statements at face value without seeing some numbers.

Pp 24, line 38. Please define what the authors mean by “compatible-domain transfer learning”.

Pp 24, lines 38-40. Please provide an actual numerical example that shows how model performance is improved by using this “compatible-domain transfer learning”.

Pp 24, lines 40-45. I do not understand what the authors are referring to in the sentence where they are describing that “Various approaches have been explored…”. These approaches have been explored to do what? Please clarify.

Discussion:

Pp 25, lines 26-28. The authors cite a number of commercially available AI-based software packages specifically designed for breast cancer diagnosis. They follow that by saying that “These algorithms improve pathologists’ consistency, precision, and sensitivity while decreasing time demands”, and cite as reference for this statement a paper by van Diest et al (2024). I read that paper, and nowhere in it do the authors evaluate any commercially available AI-based algorithms for breast cancer diagnosis. Thus, the connection made by the authors between commercially available AI software and their results when used in the clinical practice is very misleading, as it is not supported by the reference cited by the authors.

Pp 25, lines 38-40. Where in this paper did the authors “assess the efficacy of each AI model while also emphasizing potential limitations and downsides associated with each model”??? On the contrary, the authors presented the AI-based models as if they were perfect and were already working in the clinical practice, all while citing papers that only talked about the “potential” of AI one day being used in the clinical practice. Please clarify where in the paper those assessments were carried out, and also where the limitations of each model were described.

Pp 26, lines 5-8. I do not understand what the authors mean when they say “These findings correspond with literature research that has also documented exceptional performance”. What are they talking about? Can they please clarify? Also, please include references for the “literature research” cited.

Pp 26, lines 20-24. Please clarify how the authors know that the findings excluded from their analysis “demonstrate that AI has continuously proven to be a dependable instrument for breast specimen classification”.

Pp 27, line 14. Please include a reference after the “… with Her2 status”.

Pp 27, line 43. Please clarify what you mean by “AI pathologist help”.

Pp 27, line 54. The authors say that “the concordance rates for ER and PR status improved, albeit to a lower degree”. Please provide actual numbers to make this a bit more concrete for the readers.

Pp 28, lines 25-28. What do the authors mean when they say that AI models have a “dependence on binary categorization during training”? Please clarify.

Pp 29, line 3. Instead of “ought”, please use “may”.

Pp 29, lines 17-19. Instead of “… in different laboratories”, please use “… in the clinical practice”.

Conclusions and Future Directions:

Pp 29, line 31. The authors state that AI tools are “rapidly becoming an effective aid to histopathologists”. However, how can this be true, considering that most AI-models are not currently used in the clinical practice for the variety of reasons explained in the papers cited by the authors (of which some were cited in the paragraph above this)? Please clarify.

Pp 29, lines 33-36. The authors state that “The identified drawbacks can be effectively addressed”. Do they have any understanding of the magnitude of the challenges involved in implementing AI-based models in WSI? For example, (1) How does one standardize the staining protocols used by the different labs around the world? (2) How does one increase the number of samples of rare diseases, in order to train the models appropriately? (3) How does one increase representation from minority populations in the training sets used to train the AI models? And so forth. These are not trivial challenges. Please don’t treat them as if they are!

Pp 29, line 38. Instead of “augment”, please use “improve”.

Pp 29, line 44. The “Establishment of Standardized Datasets” is not that useful if the datasets are not freely available to all researchers to train, validate and test their models. Please include “freely available” as a condition sine-qua-non on the “Establishment of Standardized Datasets”.

Pp 30, lines 6-8. When the authors say that AI models should incorporate “additional diagnostic modalities”, and then they cite as an example “imaging techniques”, what do they mean by that? Do they mean the patients scans, like computed tomography or magnetic resonance imaging? Please clarify.

Pp 30, lines 16-18. What are the biases that are “inherent in data and algorithms”??? Algorithms by themselves will only become biased depending on the distribution of the data that is used to train them, so they don’t have any “inherent biases”. Please clarify.
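A minimal sketch with synthetic data makes this point; the class proportions are invented, and only the training distribution changes between the two runs:

    # The same algorithm, trained on differently distributed data, ends up
    # treating the minority class very differently: the "bias" lives in the data.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import recall_score
    from sklearn.model_selection import train_test_split

    for majority in (0.5, 0.95):       # balanced vs. 95:5 imbalanced data
        X, y = make_classification(n_samples=4000, weights=[majority],
                                   flip_y=0.05, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0,
                                                  stratify=y)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print(f"majority fraction {majority:.2f} -> minority-class recall:",
              round(recall_score(y_te, clf.predict(X_te)), 3))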

Pp 30, line 24. Instead of “Improvement of Economic…”, it should read “Improvement in Economic…”

Pp 30, lines 31-33. This last sentence should not be attached to point 4, it should be presented by itself below point 4, as it relates to all 4 points presented above it.

Review: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The application of AI in the diagnosis of breast cancer is a highly relevant and widely discussed topic in medical clinics and medical research.

The authors have compiled a comprehensive systematic literature review on this subject, providing a wealth of information. However, in its current form, the manuscript is overly lengthy and includes several basic concepts that are already well-established in the field. I recommend that the authors significantly condense the manuscript by streamlining the content—particularly in the Introduction and Discussion sections—by removing foundational information that does not add novel value.

Additionally, the Results and Findings sections should be summarized and made more concise to improve readability and focus. The manuscript would also benefit from a thorough review for grammatical errors, inconsistencies in sentence structure, and incorrect or undefined abbreviations. A more structured and succinct presentation will greatly enhance the impact and clarity of the paper.

Recommendation: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R0/PR4

Comments

The manuscript is timely and relevant. However, as noted by the reviewers, there is some degree of over-optimism and a lack of distinction between the potential and the actual deployment of AI pathology in clinical practice.

We would welcome review of the feedback provided and revisions to enhance the paper.

Decision: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R0/PR5

Comments

No accompanying comment.

Author comment: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R1/PR6

Comments

No accompanying comment.

Review: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

Overall Impression:

I would like to commend the authors for taking into account so many of the comments/suggestions made in the previous review. I believe that the paper is significantly clearer now, and many of the grandiose (or over-optimistic) statements have been removed – although a few have stayed, as I point out in my review below. I think that the authors have done a good job presenting the different AI systems that they discuss, but there were some major points that left me a bit confused. First, the authors say that “3113 studies were potentially found, and after applying the inclusion criteria and filtering out the duplicates, 1194 unique studies were selected. From these, 1516 studies were excluded” due to a variety of factors. How can 1516 studies be excluded from 1194 unique studies? Second, and in my view a very important point, there is no presentation in the Results section of the Possible Obstacles for Implementation of AI in the Clinical Practice that may have been discussed in the studies reviewed. Instead, we get a sub-section in the Discussion that is titled “Other obstacles to widespread adoption of AI”. This is confusing, as there has not been any discussion on obstacles to widespread adoption of AI thus far. More detailed comments about each section follow below.

Abstract:

Pp 2, lines 19-22. It should read “… particularly those that have been difficult to discern through routine microscopy”.

Impact Statement:

Pp 3, line 22. Instead of “breast cancer medication”, it should read “breast cancer treatment”.

Pp 3, lines 22-24. I do not understand what the authors mean when they say “This is due to its facilitation of consensus and consistency among many observers regarding their findings”. How can AI facilitate “consensus and consistency” among a group of pathologists, for example? First, they would really have to trust the AI system to align their diagnoses with the AI’s diagnoses, and as we all know, trust in AI has been an issue in several medical disciplines. Secondly, how can AI affect the pathologists’ intra-observer agreement, so as to improve their “consistency”? Please clarify.

Pp 3, lines 24-26. Furthermore, the authors claim that “Artificial intelligence is essential for assessing breast cancer and quantifying mitotic cells”. Who has determined that AI is “essential” for these tasks? Pathologists have been doing these tasks for a long time, with no AI, and achieving good results, so I do not see how one can claim AI to be “essential” in this task. Please tone down your enthusiasm.

Pp 3, line 29. Which are the “previous methods” that the authors are referring to? Please be specific regarding what you are talking about.

Introduction

History of Artificial Intelligence

Pp 4, lines 14-40. I still don’t see what the purpose of this sub-section is, as this paper is not about AI, but about AI’s application in breast cancer diagnoses.

Concepts in Artificial Intelligence

Pp 4, line 46. Typo: It should read “AI applications in Medicine have evolved…”.

Machine Learning (ML)

Pp 5, lines 5-30. Fine.

Deep Learning (DL)

Pp 5, line 57. It should read “Deep Learning is a subset of Machine Learning…”.

Artificial Intelligence in Medicine

Pp 6, lines 27-43. Fine.

Artificial Intelligence in Cancer Diagnosis

Pp 6, line 51. I believe that it is too premature to say that “Artificial intelligence is significantly advancing cancer diagnosis” when in fact most AI models are not deployed in the clinic yet. In this way, I believe that saying that “AI has the potential to significantly advance cancer diagnosis” is a better representation of the current state of events.

Pp 7, lines 17-20. Again, it seems premature to say “AI technology is improving the accuracy of clinical image analysis for identifying cancer progression, aiding in early detection and diagnosis…”. I think that because a technology has the potential to make improvements to a certain task, it does not mean that it is making improvements to that task. I would suggest that the authors be just a bit more careful when making statements like these.

Artificial Intelligence in Breast Cancer Pathology

Pp 8, lines 38-40. When the authors say that the size of the histopathology images is “100k x 100k”, which unit of measure are they referring to? Pixels? Please clarify.

Pp 9, lines 17-38. The authors present a number of AI models that have been developed to perform at different tasks, and their performances. What is missing from this presentation is a statement indicating that all of these algorithms were developed and evaluated in a laboratory setting, and that none of them has actually been deployed in the clinical practice. I think that this important point is not explicitly stated, and readers may get the impression that all of these algorithms are being used in the clinic.

Pp 10, lines 6-8. Can the authors please explain what they mean when they say “significantly reducing the time required for pathologists to read slides tumors”. What are “slides tumors”?

Methodology

Fine as written.

Results

Literature Search and Screening

Pp 13, lines 27-32. The authors say that, from the 3113 studies potentially found, after applying the inclusion criteria and filtering out the duplicates, 1194 unique studies were selected. From these, 1516 studies were excluded. I am confused. How can one exclude 1516 studies out of a group of 1194 studies? Please clarify.

Characteristics of the Included Studies

Pp 14, line 41. Instead of “histopathology pictures”, please use “histopathology images”.

Pp 17-24. It’s a bit overwhelming to have all 5 Tables presented to the reader one after the other, without any context. I would suggest presenting each table where it is called out in the text, so as to give the reader a reference for what is displayed in the Table.

Summary of Findings

Mitotic Index Assessment and Quantification

Pp 26, lines 10 and 24. I would not call a model developed in 2018 “a recently developed deep learning model”. Please correct.

Discussion

Pp 28, line 35. When the authors say that “Commercially available AI models specifically designed for breast cancer exist”, do they mean for breast cancer “diagnosis” or for breast cancer “detection”? Please clarify.

Pp 28, lines 49-51. The authors state that “Nonetheless, various limitations persist in obstructing its extensive implementation on a larger scale”. However, this was never discussed in the Results section, even though it is an important piece of information to understand why, despite all of AI’s prowess (as highlighted by the authors) it has not reached large distribution in the clinical practice. Can the authors please include a section on the “obstacles that currently prevent AI’s extensive implementation on a larger scale” in their Results section?

Pp 30, line 53. I don’t believe that the authors should state that “Similar to our findings about AI’s performance in tissue classification…”. The issue here is that there was no experiment conducted in this study, and as such, there were no “findings” per se. It would be best if instead the authors used “Similar to the reported findings about AI’s performance in tissue classification…”

Pp 30-31, lines 55 and 3. Similarly, it is not appropriate for the authors to say “Our data indicate that AI excelled in both molecular subtyping and Ki67 computation”. The authors didn’t run any experiments that produced any “data” to indicate that. Instead, it would be best to say “The data from the studies reported in this review indicate that AI excelled…”

Pp 31, lines 17-22. It is a circular argument to state that “Artificial Intelligence enhanced inter-pathologist agreement rates […] when aided by a pre-trained AI assistant tool”. Please remove the end of the sentence, as it is already clear from the start of the sentence what the authors mean to say.

Pp 31, line 43. The authors name a sub-section “Other obstacles to widespread adoption of AI”. Given that they have discussed no obstacles thus far, why use the word “Other” in this sub-section heading?

Pp 31-32, lines 52-6. I do not agree with the statement made by the authors that “A primary procedural drawback of most contemporary models is their need on extensive, annotated datasets for training. Manual annotation is labor-intensive and exhibits variability both among and across pathologists, undermining the fundamental objective of the AI models”. Only supervised (or semi-supervised) learning algorithms require “extensive, annotated datasets for training”. Unsupervised learning algorithms do not require any labelled data for training. As for stating that supervised algorithms comprise “most contemporary models”, I am not sure about that. I think the authors should rethink this statement, or at the very least provide references to support what they are saying.
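The distinction is easy to demonstrate; the sketch below (synthetic blob data, assumed purely for illustration) fits a supervised classifier that needs labels and an unsupervised clustering algorithm that trains on the same features with no annotations at all:

    # Supervised learning consumes labels y; unsupervised learning does not.
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_blobs(n_samples=300, centers=3, random_state=0)

    supervised = RandomForestClassifier(random_state=0).fit(X, y)  # needs y
    unsupervised = KMeans(n_clusters=3, n_init=10,
                          random_state=0).fit(X)                   # ignores y

    print("supervised predictions :", supervised.predict(X[:3]))
    print("unsupervised clusters  :", unsupervised.labels_[:3])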

Pp 32, line 10. The authors say that “… impact the efficacy of AI models in rare tumour categories”. But unless one is very naïve or unfamiliar with AI models, in reality nobody would expect AI to do well in rare tumor categories, exactly for the reason that the authors mention, namely, lack of training samples. So while this is certainly a barrier, I do not believe it to be one of the primary barriers for AI’s clinical adoption.

Pp 32, lines 19-21. The authors say that “Consequently, each AI tool necessitates validation and verification under these specific conditions”. However, are they aware that, at least in the US, once an AI model receives FDA certification, the model is locked, that is, it cannot learn new things? In this way, once an FDA-certified model is acquired by a healthcare system, there is no way to adapt the model to the characteristics of the local population; if that population is significantly different from the population on which the model was trained, the model is not going to perform well, and if the vendor wants to incorporate data from this new population into the training of the algorithm, they will have to re-certify the AI model. This is also a significant barrier for adoption of AI in the clinical practice.

Conclusions and Future Directions

Pp 33, line 17. The authors start by saying that “Various AI tools can…” I think that it is too early to say that they “can”, given all of the obstacles described by the authors. I believe that a better word to use here would be “may”.

Recommendation: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R1/PR8

Comments

I thank the authors for their effort in revising the work. I agree with the suggestions of the reviewer and they will greatly improve the manuscript. Hence I recommend a simple minor revision.

Decision: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R1/PR9

Comments

No accompanying comment.

Author comment: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R2/PR10

Comments

No accompanying comment.

Recommendation: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R2/PR11

Comments

Dear Authors,

Thank you for revising the manuscript. The paper has improved, and I recommend that it be accepted.

Decision: Artificial intelligence in breast cancer diagnosis: A systematic literature review — R2/PR12

Comments

No accompanying comment.