Newsbytes

NLP program takes free-text searches from zero to 60

June 2023—The time it takes to read through numerous pathology reports to find nuggets of critical information buried within narrative sections of text is tantamount to the time it takes for carbon atoms to turn into diamonds—or so it may seem to those tasked with digging for medical information.

But at Mount Sinai Health System, New York, a natural language processing program can search for information in the free-text portions of multiple pathology reports instantaneously, according to Aryeh Stock, MD, instructor in the Department of Pathology, Molecular, and Cell-based Medicine, Icahn School of Medicine at Mount Sinai.

Dr. Stock estimates that reading a batch of 1,000 pathology reports in search of specific diagnostic information would take him approximately 22.9 workweeks of 40 hours each. Using the natural language processing program he wrote, he says, he can complete the same search in four seconds.
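Taking Dr. Stock's estimate at face value, the arithmetic works out roughly as follows:

```latex
\begin{aligned}
22.9 \times 40\ \text{h} &= 916\ \text{h} = 3{,}297{,}600\ \text{s} \\
\frac{916\ \text{h}}{1{,}000\ \text{reports}} &\approx 0.92\ \text{h} \approx 55\ \text{min per report} \\
\frac{3{,}297{,}600\ \text{s}}{4\ \text{s}} &\approx 8.2 \times 10^{5}\ \text{times faster}
\end{aligned}
```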

While artificial intelligence-based chatbots like ChatGPT have brought increased attention to natural language processing, creating a pathology-based NLP program presents unique challenges, Dr. Stock says. “In pathology, the vocabulary is much more limited,” he explains. “The order of words and the syntax become much more important, and this presents a significant challenge when you are trying to train an NLP program to focus on these nuances.”

Dr. Stock began to consider the lengthy, labor-intensive process of searching for information within the text of pathology reports when he was in his last year of medical school. While assisting a pathology group at Mount Sinai that was researching Crohn’s disease and inflammatory bowel disease, he was tasked with categorizing foci of inflammation by their locations within the bowel. The work involved interpreting prior biopsy reports and entering information into a spreadsheet. The repetitive nature of the work and amount of time it consumed inspired Dr. Stock to build an automated solution. A spreadsheet program he created in 2018 for the research project became the first prototype for Mount Sinai’s pathology natural language processing program.

Dr. Stock later rewrote the spreadsheet program in Python, which made it more flexible, functional, and scalable and enabled it to process an entire database of pathology cases at once rather than one file at a time, he says. The updated program takes as input comma-separated value (CSV) files exported from the laboratory information system (LIS) and searches their plain text. Because CSV files are widely used by LISs, the program can easily be used with a variety of vendors’ systems. A separate version of the program, also written in Python, accepts XML (Extensible Markup Language) files instead, taking advantage of the fact that Mount Sinai’s LIS can output data in XML format. (The XML version was written by Hansen Lam, MD, who was in residency with Dr. Stock and is now a cytopathology fellow at Johns Hopkins University School of Medicine.) The XML version has a more user-friendly interface because XML carries more context about how data should be formatted, but it cannot function with an LIS that does not export XML files.
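As a rough illustration of why the input format matters less than the parsing logic, the sketch below reads the same report from a CSV export and from an XML export into identical (accession, report text) pairs using only Python's standard library. The column names accession and report_text and the XML elements case and report are hypothetical; real LIS exports use vendor-specific layouts, and this is not code from either Mount Sinai program.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical column and element names; a real LIS export will differ.


def read_csv_export(text):
    """Return (accession, report_text) pairs from a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["accession"], row["report_text"]) for row in reader]


def read_xml_export(text):
    """Return (accession, report_text) pairs from an XML export."""
    root = ET.fromstring(text)
    return [(case.get("accession"), case.findtext("report", default=""))
            for case in root.iter("case")]


csv_text = 'accession,report_text\nS23-001,"A. Colon, biopsy: mild active colitis."\n'
xml_text = ('<cases><case accession="S23-001">'
            '<report>A. Colon, biopsy: mild active colitis.</report>'
            '</case></cases>')

print(read_csv_export(csv_text))
print(read_xml_export(xml_text))
```

Because both readers return the same shape, everything downstream (specimen splitting, term matching, output) can be shared, which mirrors the article's point that the two versions rest on the same underlying principles.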

Both versions of the program have saved time in pathology research projects by automating searches for information that would be nearly impossible to find in a regular LIS query, according to Alexandros D. Polydorides, MD, PhD, professor and vice chair for clinical research and trial design in the Department of Pathology, Molecular, and Cell-based Medicine, Icahn School of Medicine at Mount Sinai. A key advantage is the clear, well-organized data output that they deliver, he says. “The output of this is an Excel document with one row per specimen, or one row per patient, that can be very easily sorted, mined, and applied to any project.”

Often, even with more basic LIS queries, the data output is a spreadsheet in which data pertaining to one specimen may be spread over multiple columns or rows. It requires a lot of effort to delete and move information so that it is readable and easy to organize for research purposes, Dr. Polydorides notes.
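The reshaping Dr. Polydorides describes, turning query output in which one specimen's data are scattered across several rows into one row per specimen, can be sketched with Python's standard library. The long-format input rows and field names below are hypothetical, and writing CSV (which Excel opens directly) stands in for producing a native Excel file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical "long" query output: one row per (case, specimen, field).
long_rows = [
    ("S23-001", "A", "site", "terminal ileum"),
    ("S23-001", "A", "diagnosis", "active ileitis"),
    ("S23-001", "B", "site", "sigmoid colon"),
    ("S23-001", "B", "diagnosis", "no diagnostic abnormality"),
]


def one_row_per_specimen(rows):
    """Pivot (case, specimen, field, value) rows into one dict per specimen."""
    merged = defaultdict(dict)
    for case, specimen, field, value in rows:
        merged[(case, specimen)][field] = value
    return [{"case": case, "specimen": specimen, **fields}
            for (case, specimen), fields in merged.items()]


def to_csv(records, columns):
    """Serialize records to CSV text; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


table = one_row_per_specimen(long_rows)
print(to_csv(table, ["case", "specimen", "site", "diagnosis"]))
```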

While the two programs are similar, Dr. Lam’s XML version is open source, with the code published on GitHub (github.com/hansenlam/public_PathReporter) and details about that version published in the Journal of Pathology Informatics (Lam H, et al. Published online Nov. 8, 2022. http://dx.doi.org/10.1016/j.jpi.2022.100154). Dr. Stock’s CSV-file version of the program is proprietary intellectual property of Mount Sinai Health System.

The underlying principles of both versions of the NLP program are the same, Dr. Stock says. And both take into account that the text portion of a pathology report is a compilation of observations by multiple people. For example, if a biopsy yields five specimens, there are typically multiple comments about each of those specimens interspersed through the report. Therefore, the NLP application relies on regular expressions (supported in Python by the standard-library re module), a pattern-matching notation that lets users specify rules for searching strings of text. Using regular expressions, the program identifies information pertaining to specific specimens in the text of the report and reorganizes it so that the data on each specimen occupy a separate row of the spreadsheet. The algorithm also scores the words it finds in the report according to how well they match items in a library of pathology words built into the program, he says. Through this process, the program can, for example, identify which words pertain to diagnosis and which address location.
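A minimal sketch of the two steps described above, assuming a hypothetical report layout in which the diagnosis section labels specimens "A.", "B.", and so on, and comments elsewhere refer to "Specimen A". The eight-word lexicon is a toy stand-in for the program's built-in pathology library, and difflib's similarity ratio is a swapped-in scoring method; the article does not say how the Mount Sinai programs compute their scores.

```python
import re
from difflib import SequenceMatcher

# Toy lexicon (hypothetical): term -> category.
LEXICON = {
    "ileum": "location", "colon": "location", "rectum": "location",
    "sigmoid": "location", "colitis": "diagnosis", "ileitis": "diagnosis",
    "dysplasia": "diagnosis", "granulomas": "diagnosis",
}

# A specimen label such as "A. " at the start of the text or after whitespace.
LABEL = re.compile(r"(?:^|\s)([A-Z])\.\s")
# A whole comment sentence that names a specimen, e.g. "Specimen A shows ...".
COMMENT_REF = re.compile(r"([Ss]pecimen ([A-Z])\b[^.]*\.)")


def split_specimens(diagnosis_text):
    """Map each specimen label to the diagnosis text that follows it."""
    parts = LABEL.split(diagnosis_text)
    # With one capture group, re.split yields [prefix, label, body, label, body, ...].
    return {label: body.strip() for label, body in zip(parts[1::2], parts[2::2])}


def attach_comments(specimens, comment_text):
    """Append each interspersed comment sentence to the specimen it names."""
    for sentence, label in COMMENT_REF.findall(comment_text):
        if label in specimens:
            specimens[label] += " " + sentence
    return specimens


def best_match(word):
    """Score a word against every lexicon term; return the best (term, score)."""
    scores = {term: SequenceMatcher(None, word, term).ratio() for term in LEXICON}
    term = max(scores, key=scores.get)
    return term, scores[term]


def tag_terms(text, threshold=0.85):
    """Group words whose best lexicon score clears the threshold by category."""
    tags = {"location": [], "diagnosis": []}
    for word in re.findall(r"[a-z]+", text.lower()):
        term, score = best_match(word)
        category = LEXICON[term]
        if score >= threshold and term not in tags[category]:
            tags[category].append(term)
    return tags


diagnosis = ("A. Terminal ileum, biopsy: Chronic active ileitis. "
             "B. Sigmoid colon, biopsy: No diagnostic abnormality.")
comment = "Specimen A shows focal granulomas. Specimen B is unremarkable."

specimens = attach_comments(split_specimens(diagnosis), comment)
for label, text in specimens.items():
    print(label, tag_terms(text))
```

Each printed line corresponds to one spreadsheet row. Because the score is a similarity ratio rather than an exact lookup, a misspelling such as "ileitus" still scores about 0.86 against "ileitis" and is tagged as a diagnosis; the threshold trades missed variants against false matches.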

This pathology-centric approach makes the Mount Sinai NLP programs unique, according to Dr. Lam. Medicine-specific NLP programs typically have “limited pathology-related dictionaries,” and pathology NLP programs are “tailored for narrow sets of cases,” he wrote in the Journal of Pathology Informatics article.

The program created at Mount Sinai has so far been used only for research. When Dr. Lam decided to publish a paper on the XML version, however, Dr. Polydorides helped him test it on projects that could demonstrate its clinical usefulness in pathology. “The three things we tested it on were things that pathologists might be interested in, like Gleason scoring in prostatic adenocarcinoma, grading of anal intraepithelial neoplasia, and grade or location of dysplasia in IBD,” Dr. Polydorides says. “These are some of the many pathology-centered projects for which the program might be useful.”
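To make the Gleason scoring project concrete, the sketch below shows what one extraction rule might look like. The pattern assumes reports phrase the result as "Gleason score 3 + 4 = 7"; it is a hypothetical example, not the rule used in the published XML version, and it discards any match whose primary and secondary patterns do not add up to the stated total.

```python
import re

# Hypothetical phrasing; real reports word Gleason scores in many ways.
GLEASON = re.compile(
    r"Gleason\s+score\s*:?\s*(\d)\s*\+\s*(\d)\s*=\s*(\d{1,2})", re.IGNORECASE)


def extract_gleason(text):
    """Return (primary, secondary, total) tuples, skipping sums that don't add up."""
    results = []
    for primary, secondary, total in GLEASON.findall(text):
        p, s, t = int(primary), int(secondary), int(total)
        if p + s == t:  # internal consistency check on the reported total
            results.append((p, s, t))
    return results


text = ("A. Prostate, left base, needle biopsy: Prostatic adenocarcinoma, "
        "Gleason score 3 + 4 = 7. B. Prostate, right apex: Benign.")
print(extract_gleason(text))  # [(3, 4, 7)]
```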

The results of those pathology projects show that the program's findings were 90 to 100 percent concordant with those obtained by manually reading pathology reports, while saving significant amounts of time, according to the journal article. In most of the discordant cases, the automated program proved more accurate than the manual method. In one project, which identified dysplasia among 72 anal biopsy specimens and then determined its grade, there were just seven discordant results, and in six of those seven cases the automated system made the correct determination.
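If the seven discordant results in the anal dysplasia project are counted against all 72 specimens (the exact concordance measure used in the paper may differ), the figures work out as follows:

```latex
\begin{aligned}
\text{concordance} &= \frac{72 - 7}{72} = \frac{65}{72} \approx 90.3\% \\
\text{automated system correct among discordant cases} &= \frac{6}{7} \approx 85.7\%
\end{aligned}
```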

“The upshot is that research studies that previously required months of chart review and data assembly can be done in a fraction of the time and with significantly less manpower,” Dr. Stock emphasizes.

Dr. Polydorides suggests that the NLP program could be adapted to automate some quality control and quality assurance processes. A standard quality control process in cytology, for example, involves checking whether there is a surgical specimen for each cytology specimen and, if so, comparing whether the cytology and surgical specimen diagnoses agree. “Rather than have someone go through all cases and see which ones have biopsies, this program might help automate that process,” he says.
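The cytology-surgical correlation check could be automated along these lines once the parser has produced one row per specimen. This is a hypothetical sketch: it matches cases by patient identifier alone and assumes diagnoses have already been normalized to broad categories such as "benign" and "malignant"; a real correlation workflow would also need to consider dates, anatomic sites, and diagnostic nuance.

```python
from dataclasses import dataclass


@dataclass
class Case:
    patient_id: str
    kind: str       # "cytology" or "surgical"
    diagnosis: str  # assumed already normalized, e.g. "benign" or "malignant"


def correlate(cases):
    """Return (patient_id, status) for each cytology case.

    Status is "no surgical specimen", "concordant" (every surgical diagnosis
    for that patient matches the cytology diagnosis), or "discordant".
    """
    surgical = {}
    for case in cases:
        if case.kind == "surgical":
            surgical.setdefault(case.patient_id, []).append(case.diagnosis)

    results = []
    for case in cases:
        if case.kind != "cytology":
            continue
        diagnoses = surgical.get(case.patient_id)
        if not diagnoses:
            status = "no surgical specimen"
        elif all(d == case.diagnosis for d in diagnoses):
            status = "concordant"
        else:
            status = "discordant"
        results.append((case.patient_id, status))
    return results


cases = [
    Case("P1", "cytology", "malignant"),
    Case("P1", "surgical", "malignant"),
    Case("P2", "cytology", "benign"),
    Case("P2", "surgical", "malignant"),
    Case("P3", "cytology", "benign"),
]
print(correlate(cases))
```

Only the "discordant" rows would then need a pathologist's review, rather than every case.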

The program is flexible enough that it should be relatively easy to adapt to clinical use, Dr. Stock adds. “The way the code is structured, you can essentially plug it into other Python code. So if you wanted to incorporate parsing into some other project, it could be brought in to do that sorting, and then you could move on with that data for whatever specific purpose you are trying to address,” he explains.

Dr. Stock plans to expand the scope of the CSV version of the program by writing code that would enable the program to identify other information, such as measurements of pathology specimens, he says. “You could imagine very easily that we could add another column that tells us the dimensions of specimen A and the dimensions of specimen B, which makes that information more accessible than reading reports to find it.”
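One way such a dimensions column might be populated is a regular expression run over each specimen's gross description. This is a hypothetical sketch, not Dr. Stock's planned code: it handles only "L x W x D cm" or "L x W mm" phrasing, takes the first measurement found for each specimen, and converts millimeters to centimeters.

```python
import re

# Two or three dimensions separated by "x", followed by cm or mm.
DIMENSIONS = re.compile(
    r"(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)(?:\s*x\s*(\d+(?:\.\d+)?))?\s*(cm|mm)\b",
    re.IGNORECASE)


def dimensions_by_specimen(specimens):
    """Return {label: [dimensions in cm]} using the first measurement per specimen."""
    out = {}
    for label, text in specimens.items():
        m = DIMENSIONS.search(text)
        if m is None:
            out[label] = None  # no recognizable measurement
            continue
        values = [float(v) for v in m.groups()[:3] if v is not None]
        if m.group(4).lower() == "mm":
            values = [v / 10 for v in values]
        out[label] = values
    return out


gross = {
    "A": "Received in formalin is a tan polypoid fragment, 1.2 x 0.8 x 0.5 cm.",
    "B": "Received in formalin are multiple tan fragments, 4 x 3 mm in aggregate.",
    "C": "Received in formalin, no measurement recorded.",
}
print(dimensions_by_specimen(gross))
```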

In the meantime, Dr. Stock is addressing the common problem of understanding “incorrect” inputs, such as unusual terms, alternative spellings, or semicolons instead of commas, that may prevent the program from returning results for that file.

“The hope is to create something with more flexibility to handle those sorts of deviations from the expected pattern,” Dr. Stock says. “This results in a more robust system that lets you process reports that may have some human error baked into them.”
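The kind of tolerance Dr. Stock describes can be approximated by splitting on either commas or semicolons and mapping each entry to a canonical term. The sketch below is an illustration under stated assumptions: the site vocabulary and variant table are hypothetical, and difflib.get_close_matches from Python's standard library stands in for whatever matching the program ultimately uses.

```python
import re
from difflib import get_close_matches

# Hypothetical canonical site vocabulary.
CANONICAL = ["esophagus", "stomach", "duodenum", "terminal ileum", "colon"]
# Hypothetical table of known variants, including abbreviations that a
# similarity score alone would rate too low.
VARIANTS = {"oesophagus": "esophagus", "t. ileum": "terminal ileum"}


def normalize_term(raw, cutoff=0.8):
    """Map a raw site phrase to a canonical term, or None if nothing is close enough."""
    term = raw.strip().lower()
    term = VARIANTS.get(term, term)
    match = get_close_matches(term, CANONICAL, n=1, cutoff=cutoff)
    return match[0] if match else None


def parse_sites(field):
    """Split a site field on commas or semicolons and normalize each entry."""
    pieces = [p for p in re.split(r"\s*[;,]\s*", field) if p.strip()]
    return [normalize_term(p) for p in pieces]


print(parse_sites("Oesophagus; stomach, duodenm"))
print(parse_sites("T. ileum, colon"))
```

The explicit variant table catches abbreviations such as "T. ileum", which score well below the cutoff against "terminal ileum", while fuzzy matching absorbs small typos such as "duodenm". Returning None rather than guessing lets unmatched entries be flagged for review instead of silently dropping the file.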

—Renee Caruthers

U.S. labs go live with Labgnostic laboratory exchange network

Labgnostic, a U.S. subsidiary of United Kingdom-based X-Lab, has announced that TriCore clinical laboratory, Albuquerque, NM, and renal pathology-focused Arkana Laboratories, Little Rock, Ark., are the first two U.S. laboratories to use its Labgnostic systems-agnostic laboratory exchange hub.

Laboratories linked to the Labgnostic hub can send diagnostic test requests and results to others participating in the network via a single interface and track specimens exchanged with participating laboratories.

“Labgnostic allowed us to interface with one of our referral labs, Arkana,” said TriCore’s chief operating officer, Eric Carbonneau, in a company press release. “We’ll leverage the single interface to the Labgnostic network to connect to more of our vendors. This saves us the hidden overhead costs of supporting and licensing multiple interfaces.”

TriCore plans to interface to additional partner laboratories via the Labgnostic hub this year, according to Labgnostic.

Survey spotlights medical courier delays and errors affecting pathology labs

Medical courier delays and errors frequently affect laboratorians’ ability to provide timely and accurate test results to patients and underscore the need for reliable health care logistics solutions, according to a survey sponsored by MedSpeed and conducted by CAP TODAY magazine.

Eighty-six percent of the 269 laboratory professionals who responded to the survey, representing numerous job titles within the laboratory, reported that medical courier delays or errors affect their ability to provide patients with results at least once a month. All participants reported that medical couriers influence their work on a weekly basis. In addition, 61 percent of laboratory supervisors, managers, and directors who responded to the survey reported that couriers lost irreplaceable specimens within the 12 months prior to the survey, which was conducted from late February to late March. Eighteen percent of that respondent subset indicated that loss of irreplaceable specimens occurred five or more times in the year-long timeframe.

When lab professionals were asked how often they had needed to collect another specimen or sample because of courier error, 72 percent said they had done so at least once in the year before answering the survey. Among all respondents, 18 percent said they had to collect another specimen once within that timeframe, 26 percent said two to four times, 28 percent said five or more times, and 28 percent said never.
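The frequency categories account for every respondent, and the 72 percent figure is the sum of the three nonzero categories:

```latex
\begin{aligned}
18\% + 26\% + 28\% &= 72\% \quad \text{(recollected at least once)} \\
72\% + 28\% &= 100\% \quad \text{(adding those who answered never)}
\end{aligned}
```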

“The survey findings underscore the financial impact that logistics has on care delivery, particularly when considering the cost in delays to procedures,” says Jake Crampton, CEO of the health care logistics service provider MedSpeed.

The survey also found that 42 percent of lab professionals have requested stat pickups or deliveries to supplement unreliable scheduled service; 32 percent have transported or shipped specimens themselves; and 48 percent have stayed past their work shift to wait for a courier.

“MedSpeed overcomes courier delivery issues and errors by using scanners equipped with its custom tracking application,” says Crampton. “The application has workflows that guide our team through their day, prompting them to drop the specimens at the correct destination. Laboratorians can use our online portal, MyMedSpeed, to track specimens, ensuring a proper chain of custody.”

MedSpeed, 866-901-4201

Epic and Microsoft expand strategic collaboration

Microsoft and Epic have expanded their long-standing relationship with a deal that combines Microsoft’s Azure OpenAI Service and Epic’s EHR to develop generative artificial intelligence and integrate it into health care.

“This co-innovation is focused on delivering a comprehensive array of generative AI-powered solutions integrated with Epic’s EHR to increase productivity, enhance patient care, and improve the financial integrity of health systems globally,” according to a joint press release from the companies.

The collaboration will also extend natural language queries and interactive data analysis to Epic’s SlicerDicer self-service reporting tool for data exploration.

“Our exploration of OpenAI’s GPT-4 [large multimodal language model] has shown the potential to increase the power and accessibility of self-service reporting through SlicerDicer,” said Seth Hain, senior vice president of research and development at Epic, in the press statement.

Epic, 608-271-9000

Dr. Aller practices clinical informatics in Southern California. He can be reached at raller@usc.edu. Dennis Winsten is founder of Dennis Winsten & Associates, Healthcare Systems Consultants. He can be reached at dennis.winsten@gmail.com.