R programming language gains steam in pathology labs
April 2020—Among laboratories focused on expanding data analytics, the statistical programming language R has a loyal user base that is steadily growing. “There is a crew of us that are really trying to show the utility of R for laboratories,” says Stephen Master, MD, PhD, chief of the Division of Laboratory Medicine and director of the Michael Palmieri Laboratory for Metabolic and Advanced Diagnostics at Children’s Hospital of Philadelphia.
Surveys suggest that R, a free, open-source statistical programming language, is less widely used than Python, an open-source general-purpose language. In the 2019 developer survey posted by Stack Overflow, an open online community for coders, 39.4 percent of professional developers reported using Python, compared with 5.6 percent for R. However, broad surveys might not tell the whole story because Python is used more widely outside of statistics, according to R users interviewed by CAP TODAY.
“Python really started as more of a general-purpose programming language and had packages bolted onto it that allow it to do statistical analysis,” explains Dr. Master. “R has that kind of statistical data analysis and data manipulation built in from the ground level, but it has taken perhaps a little bit longer to acquire some of the general programming capabilities that Python has, although now they are meeting in the middle.”
In laboratories, R is “directly relevant to questions we are trying to answer,” says Patrick Mathias, MD, PhD, associate medical director of laboratory medicine informatics and assistant professor of laboratory medicine at the University of Washington School of Medicine, Seattle. “[R] became this software that was used more widely in the biostatistics community, and that’s where it established its foothold.”
CAP TODAY spoke with Drs. Master and Mathias and with Janet Simons, MD, about some of the ways they are using R in their medical institutions. All three gave presentations on the R programming language at the 2019 AACC annual meeting. Here is what they told CAP TODAY.
About two years ago, Dr. Simons, a medical biochemist at St. Paul’s Hospital, Vancouver, British Columbia, began studying repeat daily blood work orders in her institution. “I was able to write a script in R to pull data on patients that had the same blood work orders every day for many, many days to see whether or not those values were changing significantly or whether patients were pretty stable and did not need that monitoring,” Dr. Simons says.

The data showed daily blood work was repeated even when values were not changing, resulting in unnecessary charges and blood draws that cause patients discomfort. Consequently, the hospital implemented a rule limiting blood work orders to three days, after which they would have to be reordered. The new policy also reduced the number of morning blood collections, allowing the lab to finish its morning blood work earlier.
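The kind of check Dr. Simons describes can be sketched in a few lines. The example below is only an illustration, not her script: it is written in Python rather than R, and the record layout, the five-day run length, and the 5 percent coefficient-of-variation cutoff are all hypothetical assumptions. It also assumes one positive result per patient, test, and day.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean, pstdev

# Hypothetical thresholds; a real study would set these on clinical grounds.
MIN_CONSECUTIVE_DAYS = 5
MAX_CV = 0.05  # coefficient of variation at or below which a run counts as stable


def stable_daily_runs(results, min_days=MIN_CONSECUTIVE_DAYS, max_cv=MAX_CV):
    """Find runs of consecutive daily results whose values barely change.

    `results` is an iterable of (patient_id, test, day, value) tuples.
    Returns a list of (patient_id, test, first_day, last_day) tuples.
    """
    by_key = defaultdict(list)
    for patient_id, test, day, value in results:
        by_key[(patient_id, test)].append((day, value))

    flagged = []
    for (patient_id, test), rows in by_key.items():
        rows.sort()
        # Split the sorted results into runs of back-to-back calendar days.
        runs, current = [], [rows[0]]
        for row in rows[1:]:
            if row[0] - current[-1][0] == timedelta(days=1):
                current.append(row)
            else:
                runs.append(current)
                current = [row]
        runs.append(current)

        for run in runs:
            if len(run) < min_days:
                continue
            values = [value for _, value in run]
            cv = pstdev(values) / mean(values)
            if cv <= max_cv:
                flagged.append((patient_id, test, run[0][0], run[-1][0]))
    return flagged


# Made-up example data: patient A is stable, patient B is not.
start = date(2020, 3, 2)
sodium = [140, 141, 140, 139, 140]
potassium = [3.1, 3.8, 4.5, 5.2, 4.0]
results = [("A", "sodium", start + timedelta(days=i), v) for i, v in enumerate(sodium)]
results += [("B", "potassium", start + timedelta(days=i), v) for i, v in enumerate(potassium)]

flagged = stable_daily_runs(results)
print(flagged)
```

A run is flagged only when it is both long enough and flat enough, which mirrors the two questions in the quote: was the test ordered day after day, and were the values actually changing?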
Dr. Simons began using R when she joined the hospital two-and-a-half years ago, in part because colleague Dan Holmes, MD, division head of clinical chemistry at St. Paul’s Hospital, was already using it extensively. Educating herself through online tutorials, she was able to write R scripts to address lab-related questions within two months.
Since then, Dr. Simons has also used R to rebuild the laboratory’s monthly key performance indicator report, which Dr. Holmes had transitioned to R from Microsoft Excel. Dr. Simons expanded the report, which included charts of turnaround times for three types of tests performed by the laboratory, to include graphs on broader lab issues, such as stat order turnaround times, times of day when morning collection rounds occur, and order volumes.
The new reports are also far easier to produce, Dr. Simons notes. When using Excel, data had to be manually sorted and organized in each graph within the report. Using R, a “knit” function runs the code to make the required calculations and generate the appropriate graphs. “I download the next month of data and I am able to generate the report pretty much automatically, rather than having technologists or administrative assistants spend hours compiling it,” she says.
Next, Dr. Simons plans to apply R to physician report cards, which will allow physicians to compare their test-ordering patterns to those of their peers.
At the University of Washington School of Medicine, R capabilities are often used to answer operational questions related to the lab. “Some of the operational questions that we answer,” says Dr. Mathias, “are, What is the turnaround time for samples flowing through our blood draw area by hour of day and day of week? What impact does staffing have on that turnaround time? What are the patient volumes at different times of day to adjust staffing to match workload?”
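As a rough illustration of the first question, turnaround time by hour of day and day of week, the sketch below groups samples by collection slot and takes the median. It is written in Python rather than R, and the function name, the timestamp pairs, and the choice of the median as the summary statistic are assumptions made for this example, not details from UW Medicine.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def median_tat_by_slot(samples):
    """Median turnaround time in minutes, keyed by (weekday, hour collected).

    `samples` is an iterable of (collected_at, resulted_at) datetime pairs.
    """
    slots = defaultdict(list)
    for collected_at, resulted_at in samples:
        minutes = (resulted_at - collected_at).total_seconds() / 60
        slot = (WEEKDAYS[collected_at.weekday()], collected_at.hour)
        slots[slot].append(minutes)
    return {slot: median(values) for slot, values in slots.items()}


# Made-up timestamps: three Monday-morning draws and one Tuesday-afternoon draw.
samples = [
    (datetime(2020, 3, 2, 7, 5), datetime(2020, 3, 2, 7, 50)),
    (datetime(2020, 3, 2, 7, 30), datetime(2020, 3, 2, 8, 5)),
    (datetime(2020, 3, 2, 7, 40), datetime(2020, 3, 2, 8, 40)),
    (datetime(2020, 3, 3, 14, 10), datetime(2020, 3, 3, 14, 30)),
]

tat = median_tat_by_slot(samples)
for slot, minutes in sorted(tat.items()):
    print(slot, minutes)
```

Counting the entries in each slot instead of taking their median would give the patient volumes by time of day that the staffing question calls for.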

About two years ago, UW Medicine began teaching the R language to its pathology residents during their training in the Department of Laboratory Medicine. While the institution’s data scientists work with both R and Python, the department decided to teach R because it is easy to learn and compatible with the type of work conducted in the lab, Dr. Mathias explains. “All of our residents are doing a method validation project where they have to validate a laboratory method before it goes into service, and part of that project includes performing statistical analysis. We are giving them the tools in R to be able to do that type of work more efficiently.” The residents had previously used Excel for method validation and often spent hours working with the lab’s medical technologists to build the method validation models, Dr. Mathias says.
Because R is user friendly, Dr. Mathias and his team are also using the language to teach residents basic data science concepts. In using R, the residents are also becoming more efficient at asking analysts for quality improvement data, he says. “The residents know what data to ask for up front because they understand how they are going to analyze it downstream. There is less back and forth.”
R instruction was recently rolled out to faculty members, and the lab plans to expand training to frontline laboratory staff, including managers, lab technicians, and anyone else in the lab interested in data analysis. “The person best positioned to make dashboards is the person at the front line who sees the issues on a day-to-day basis and wants to improve things,” says Dr. Mathias. “By pushing out that R education, we are trying to establish a self-service model for retrieving and analyzing data.”
At Children’s Hospital of Philadelphia, an R user group in which doctors and other medical personnel can share ideas has 250 members across multiple disciplines of medicine, Dr. Master says. “It shows the extent to which R has grown and the breadth of tasks it is being used for.”

Dr. Master, who has worked with R for more than 15 years, has used the language for predictive analytics in several medical research studies involving machine learning. One earlier machine learning project, written in R and published in the American Journal of Hematology (Raess PW, et al. 2014;89[4]:369–374), of which he was a coauthor, used random forests, a machine learning algorithm involving multiple decision trees, to predict whether a patient had myelodysplastic syndrome based on data from a hematology analyzer.
“One of the advantages of R,” says Dr. Master, “is that for almost any statistical or machine learning technique, someone has written a package for it already. For example, there was a very nice random forest package already built, so I didn’t have to write the software from scratch.”
Dr. Master has also regularly used R for equipment validation, noting that it summarizes and graphs validation data very rapidly. When he worked at Weill Cornell Medicine, he recalls, laboratory decision-makers were evaluating a vendor’s equipment and asked the vendor to graph large amounts of validation data in Levey-Jennings plots, which show performance precision over time, and to bring the plots to the next meeting. Dr. Master asked for a copy of the raw data so he could study it as well. When the vendor representatives returned to the lab, they were unable to provide the requested analysis. Dr. Master, however, had plotted the data using R.
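For readers unfamiliar with the chart, a Levey-Jennings plot places each control result against a center line at the mean, with limits at one, two, and three standard deviations. The sketch below computes those limits and flags points beyond two standard deviations. It is an illustration in Python rather than R, the control values are invented, and it is not a reconstruction of Dr. Master’s analysis.

```python
from statistics import mean, stdev


def levey_jennings_limits(baseline):
    """Center line, SD, and the +/-1, 2, 3 SD limits from baseline controls."""
    center = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation
    limits = {k: (center - k * sd, center + k * sd) for k in (1, 2, 3)}
    return center, sd, limits


def points_outside(values, center, sd, k=2):
    """Return the values lying more than k standard deviations from the center."""
    return [v for v in values if abs(v - center) > k * sd]


# Invented control results used to establish the limits.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
center, sd, limits = levey_jennings_limits(baseline)

# Invented later runs to check against those limits.
new_runs = [100.5, 104.0, 95.0]
outside = points_outside(new_runs, center, sd)

print("center:", center, "SD:", round(sd, 3))
for k, (low, high) in limits.items():
    print(f"+/-{k} SD: {low:.2f} to {high:.2f}")
print("outside 2 SD:", outside)
```

Plotting the runs in time order against these horizontal lines gives the chart itself; any plotting library can draw it once the limits are known.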
“It was a great example,” he says, “of how putting analytics in the hands of the laboratorians, rather than having to rely on someone else for the analysis, was tremendously useful to us in understanding the instruments and evaluating their suitability for use in our lab.”—Renee Caruthers
Digital Pathology Association introduces WSI resource
The international medical nonprofit Digital Pathology Association has created an online whole slide image educational resource for its members, called the Digital Anatomic Pathology Academy.
“The online resource was created to aid the international medical community in becoming adept with digital slides through practice and education even if they have no current access to a scanner,” according to a press release from the DPA.
The cloud-based platform, supported by PathPresenter, provides annotated digital slides with diagnoses and relevant information about morphology and ancillary testing. “One aim of this whole slide imaging education resource is to better familiarize pathologists with digital sign-out and practice in the digital health environment, which will improve patient care around the world,” said Marilyn M. Bui, MD, PhD, DPA immediate past-president, in the press release.
The platform, which is continuously updated, is available to all dues-paying members and to pathology residents, fellows, PhD students, and medical students who register for a free DPA membership.
Agilent and Visiopharm enter digital pathology partnership
Agilent Technologies and Visiopharm have entered an agreement to comarket Visiopharm’s portfolio of artificial intelligence-driven pathology solutions.
“The companies’ shared goal is to provide specific technologies, products, and services that will improve the standardization of pathology labs and accelerate accurate diagnoses,” according to a joint press release from the vendors.
“Agilent brings to the venture a portfolio of pathology staining management solutions while Visiopharm provides digital interpretation solutions. With [Agilent’s] complementary product portfolio and longstanding legacy of innovation and quality in this field, we see a very strong match,” said Michael Grunkin, CEO of Visiopharm, in the press statement.
Agilent Technologies, 800-227-9770
Proscia and UCSF using AI to address prostate cancer
The artificial intelligence-enabled digital pathology solutions provider Proscia and the University of California, San Francisco, have entered a partnership focused on AI applications.
Under the partnership, the volumes of diverse, high-quality digitized data amassed by UCSF will be used to ensure that Proscia’s computational pathology application for prostate cancer accounts for the variability that exists across a range of diagnoses, methods of biopsy and tissue preparation, tissue staining procedures, and digital scanning processes.
Proscia and UCSF also plan to extend the collaboration to validating the clinical efficacy of computational pathology applications for other pathology subspecialties, according to a press release from Proscia.
Proscia, 877-255-1341
NovoPath client first to submit 2019 MIPS data to CAP registry
NovoPath has announced that Rahway Pathology, which uses the NovoPath laboratory information system, has become the first practice in the United States to upload 2019 Merit-based Incentive Payment System data to the CAP Pathologists Quality Registry. The registry will transmit this data to the Centers for Medicare and Medicaid Services via the CMS submission portal.
“NovoPath captures reportable MIPS data for quality measures and automates the process of tracking cases eligible for MIPS reporting,” said Hina Kharbey, vice president of business strategy for NovoPath, in a company press release. “In addition, NovoPath clients can interface with a registry of their choice for reporting.”
NovoPath, 732-329-3209
Dr. Aller practices clinical informatics in Southern California.