
Close-up on AI-driven assistive tools in pathology

Last year the group published its weakly supervised deep-learning approach for cardiac allograft rejection screening in H&E-stained WSIs, called the cardiac rejection assessment neural estimator, or CRANE (Lipkova J, et al. Nat Med. 2022;28[3]:575–582). The model architecture is similar to that of the earlier model and was adapted to the inter- and intraobserver variability problem in endomyocardial biopsy interpretation.
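Weakly supervised models of this kind typically learn from slide-level labels alone, using attention-based multiple-instance learning to decide which tissue patches matter. The following is a minimal sketch of that pooling step, assuming patch embeddings have already been extracted; the function and weight vectors are hypothetical stand-ins for illustration, not the published CRANE implementation.

```python
import numpy as np

def attention_mil_pool(patch_features, w_attn, w_clf):
    """Aggregate patch embeddings into one slide-level probability.

    patch_features: (n_patches, d) array, one embedding per tissue patch.
    w_attn: (d,) attention weights; w_clf: (d,) classifier weights.
    In a real model both are learned from slide-level labels only --
    that is what makes the supervision "weak."
    """
    scores = patch_features @ w_attn           # one raw relevance score per patch
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()                         # softmax over patches
    slide_embedding = attn @ patch_features    # attention-weighted average
    logit = slide_embedding @ w_clf
    prob = 1.0 / (1.0 + np.exp(-logit))        # slide-level probability
    return prob, attn
```

A side benefit of this design is interpretability: the attention vector can be rendered as a heat map highlighting which regions drove the slide-level call.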

“We used about 1,300 cases from the Brigham and Women’s Hospital to develop the model,” Dr. Mahmood said, and adapted it to internal cases and to cases sent to them from hospitals in Turkey and Switzerland.

“We used different data collection mechanisms by deliberately using different slide scanners,” he said, and the staining protocols across the international cohorts differed as well. CRANE performed well on internal cohorts, but there was an expected drop in performance when adapting the model to the Turkish and Swiss data. He and coauthors write, “The model detects allograft rejection with an AUC of 0.962, assesses the cellular and antibody-mediated rejection type with AUCs of 0.958 and 0.874, respectively, detects Quilty-B lesions, benign mimics of rejection, with an AUC of 0.939, and differentiates between low- and high-grade rejections with an AUC of 0.833.”

This model, they write, “demonstrates the promise of AI integration into the diagnostic workflow,” though “optimal use of weakly-supervised models in clinical practice remains to be determined.”

In another study published in 2022, Dr. Mahmood and colleagues used weakly supervised multimodal deep learning to examine pathology WSIs and molecular profile data from 14 cancer types (Chen RJ, et al. Cancer Cell. 2022;40[8]:865–878). Their algorithm was able to “fuse these heterogeneous modalities to predict outcomes and discover prognostic features that correlate with poor and favorable outcomes,” the authors write. WSIs can be used to solve patient ranking problems, Dr. Mahmood said. “In this particular case we use overall survival.” The computational models reported on in their earlier studies used histology WSIs alone to go directly to survival and other outcome predictions. “In this case, we’re integrating molecular information as well” in a limited setting using WSIs from The Cancer Genome Atlas.

“So this could result in better prognostic models, but perhaps the more interesting aspect here is that we can go in and look at what was important in the morphologic profile, what was important in the molecular profile, and how these things shift when additional modalities are included,” he said.
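One simple way to picture this kind of multimodal integration is late fusion: embed each modality separately, join the embeddings, and map the joint vector to a risk score used to rank patients, for example by overall survival. The sketch below, with hypothetical names and a plain linear head, illustrates the pattern only; the published model is far richer.

```python
import numpy as np

def fused_risk(wsi_embedding, molecular_embedding, w):
    """Late fusion: concatenate the histology and molecular embeddings,
    then map the joint vector to a scalar risk score with a linear head.
    Patients are ranked by this score (higher = predicted higher risk)."""
    fused = np.concatenate([wsi_embedding, molecular_embedding])
    return float(fused @ w)
```

With only one modality available, the same kind of head can be applied to that embedding alone, which is how histology-only, molecular-only, and fused models can be compared.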

He and his coauthors found they could separate high-risk from low-risk patients in 10 of the 14 cancer types they studied. The more interesting result, he said, is the analysis showing which cancer types would benefit from algorithms built using only WSIs, which would benefit from using molecular information alone, and for which it would be beneficial to include both histology and molecular information in making prognostic determinations.

WSIs on average accounted for 16.8 percent of input attributions in multimodal fusion for all cancer types, which the authors say “suggests that molecular features drive most of the risk prediction” in multimodal fusion. However, for multimodal fusion models evaluated on uterine corpus endometrial carcinoma, “WSIs contributed to 55.1% of all input attributions,” the authors report. They also observed relatively larger average WSI contributions in head and neck squamous cell carcinoma, liver hepatocellular carcinoma, and stomach adenocarcinoma.
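The modality percentages quoted above come from input-attribution analyses. As a rough illustration of the idea only, one can compute each modality's share of the total absolute contribution to a linear risk score; this hypothetical function is a crude proxy, not the gradient-based attribution machinery the authors use.

```python
import numpy as np

def wsi_attribution_share(wsi_embedding, molecular_embedding, w):
    """Fraction of total absolute contribution to a linear risk score that
    comes from the WSI half of the fused input -- a toy stand-in for the
    per-modality attribution percentages discussed in the text."""
    fused = np.concatenate([wsi_embedding, molecular_embedding])
    contrib = np.abs(fused * w)                # per-feature contribution
    return float(contrib[:len(wsi_embedding)].sum() / contrib.sum())
```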

“But also on the disease level, we can quantify our architecture looking at the whole slide level heat maps and associate the two,” Dr. Mahmood said. Quantitative determinations are made for the features used in high-risk versus low-risk patients. For example, higher lymphocyte numbers are seen in low-risk patients, and lower lymphocyte numbers in high-risk patients, he said. “It’s a similar analysis in all 14 cancer types.”

More recently, they have incorporated radiology information into building their prognostic models and risk profiles. “In general,” he said, “we’re able to show that for a variety of disease models, we’re able to separate the patients very well into distinct groups and do better risk stratification if we use multiple modalities and data types.”

The group most recently reported on its self-supervised image search for histology algorithm, or SISH (Chen C, et al. Nat Biomed Eng. 2022;6[12]:1420–1434). The retrieval speeds of algorithms that search for similar WSIs often scale with repository size, which limits their clinical and research potential, Dr. Mahmood and coauthors write, and in this study they show that self-supervised deep learning “can be leveraged to search for and retrieve WSIs at speeds that are independent of repository size.” Image retrieval, whether retrieval of similar cases or simply an image search, is particularly important for rare diseases, for which the number of available WSIs is often too low to train supervised deep-learning models, Dr. Mahmood noted. In developing the SISH model, “we tried to address a number of different issues with whole-slide image retrieval.”

From each patch in the mosaic from a WSI, “we extract features using an encoder trained on self-supervised learning and another encoder trained on conventional images,” he said. “And the self-supervised encoder in this case was trained with a discrete latent code,” though other self-supervised encoders could be used.
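The appeal of discrete codes is that they can serve as keys in a hash-based index, so lookup cost depends on the query, not on how many slides are stored. The toy inverted index below is a hedged sketch of that principle only, under assumed names; the real SISH pipeline (mosaics, guided search, ranking) is considerably more involved.

```python
from collections import defaultdict

class CodeIndex:
    """Toy inverted index over discrete patch codes. Query cost depends on
    the query's codes, not on repository size -- the sense in which
    hash-based retrieval speed stays constant as the archive grows."""

    def __init__(self):
        self.index = defaultdict(set)   # code -> ids of slides containing it

    def add(self, slide_id, codes):
        for c in codes:
            self.index[c].add(slide_id)

    def query(self, codes, top_k=3):
        votes = defaultdict(int)        # slide id -> number of shared codes
        for c in codes:
            for sid in self.index.get(c, ()):
                votes[sid] += 1
        return sorted(votes, key=votes.get, reverse=True)[:top_k]
```

Adding more slides grows the index, but a query still touches only the buckets for its own codes, which is why search time stays nearly flat.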

WSI retrievals are “notoriously memory-hungry, but we found that for this particular problem we were able to use a consumer-grade workstation,” Dr. Mahmood said.

The algorithm was tested on cases from Brigham and Women’s and Massachusetts General and the TCGA, and “in general, over very extensive testing across 22,000 cases, it worked quite well,” he said. Their analysis showed how the model scales with increasing amounts of data. The speed “stays almost constant in searching for similar cases, which could be quite important if this were to be used in a diagnostic setting.” His group is working to deploy SISH at MGH.

The authors write: “Our experiments demonstrate that SISH is an interpretable histology image search pipeline that achieves constant search speed after training with only slide-level labels. We also demonstrate that SISH has strong performance on large and diverse datasets, can generalize to independent cohorts as well as rare diseases and, finally, that it can be used as a search engine not just for WSIs, but also for image patch retrieval.”

While the computational pathology “story so far” seems to have solved a lot of problems, Dr. Mahmood said, challenges remain. The chief limitation is reduced communication between the smaller patches once a WSI has been divided into them. “The models are often not very context-aware,” he said, which is “a major problem in computational pathology.” In addition, the number of available samples is often small.

In natural language processing, the context matters a lot. In whole slide images, each resolution level can be seen to convey a different story: “Cellular features lead to slide-level organization, phenotypes to slide-level diagnosis, essentially. And we wanted to see if we could use methods commonly used in natural language processing and build a newer architecture using self-supervised learning to cater to some of the context-aware issues,” Dr. Mahmood said.

“So that’s exactly what we did.” Dr. Mahmood’s group’s submission to the 2022 IEEE/CVF Computer Vision and Pattern Recognition Conference described taking whole slide images, dividing them into patches, and building patch-level and cellular-level representations. The result is a hierarchical image pyramid transformer architecture with patch-level, region-level, and slide-level representations.
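The hierarchical idea can be pictured as aggregation in stages: patch embeddings are pooled into region embeddings, and region embeddings into a slide embedding. In the published architecture each stage is a vision transformer; the sketch below substitutes mean pooling for the transformers, with hypothetical names, purely to show the data flow.

```python
import numpy as np

def aggregate(embeddings):
    """Stand-in for a transformer encoder at one level of the hierarchy;
    mean pooling keeps the sketch short."""
    return embeddings.mean(axis=0)

def slide_embedding(patch_embeddings, patches_per_region):
    """Pool patches into regions, then regions into one slide vector,
    mirroring the patch -> region -> slide hierarchy."""
    regions = [aggregate(patch_embeddings[i:i + patches_per_region])
               for i in range(0, len(patch_embeddings), patches_per_region)]
    return aggregate(np.stack(regions))
```

Because each level sees a summary of the level below, the slide-level stage can relate distant regions to one another, which is the context-awareness the patch-only models lack.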

“We’ve shown this essentially leads to much better results,” Dr. Mahmood said. The model was trained on TCGA cases, “and now we’re trying to scale it to a much more generic, much larger data set, hoping to convert it to a model that can be used for a variety of downstream tasks.”

Amy Carpenter Aquino is CAP TODAY senior editor.

CAP TODAY