Home >> ALL ISSUES >> 2014 Issues >> Big gain theory—data warehousing pays off

Big gain theory—data warehousing pays off

image_pdfCreate PDF
Dr. Dolan with Debbie Smith (left), director of the Enterprise Data Warehouse, and Debbie McClain, director of operations process improvement. RML has averaged about a three percent productivity increase per year for 11 years (measured by billed units per FTE), some of which is due to growth and economies of scale but much of it, they believe, due to improving process.

Dr. Dolan with Debbie Smith (left), director of the Enterprise Data Warehouse, and Debbie McClain, director of operations process improvement. RML has averaged about a three percent productivity increase per year for 11 years (measured by billed units per FTE), some of which is due to growth and economies of scale but much of it, they believe, due to improving process.

The laboratory is not trying to generate more testing, he says. “What we’re trying to do is avoid testing that’s not adding value. So we want to make it a package deal. Instead of patients having to go to the physician office to get a test ordered, the doctor can enroll them in this program and the tests will be performed quarterly at the lab, at reasonable cost for what they’re getting.”

Building a data warehouse does cost money, Dr. Dolan says. “But once you have it set up, the upkeep isn’t too bad, then you see major savings by improving processes, eliminating bottlenecks, and so on, because the analytical tools are able to do that for you. We drive all this data into the ‘view from 40,000 feet,’ to let us see our overall processes and where something goes astray. Then, once the warehouse determines the problem and solution, the warehouse never forgets. It will measure the metrics forever.”

As far as I know, there is no data warehouse that has the volume and types of patients that we have with data going back to 1998,” says Dr. Johnson. “We constantly add to it, and our view has always been that anything that can be extracted and stored, we will add—even if it is seemingly mundane or unrelated data.” That includes when and where a lab test was drawn, who the technologist was, when it was run, whether there was a delay, what instrument it was run on, what shift, and even lot numbers of reagents. Patient data such as gender, age, location of residence, and type of insurance and ordering physician data such as the specialty of the physician, where the practice is located, and many other data points also are added. Part of the point of collecting any and all collectable data, he says, is being able to look for unique relationships—relationships that are beyond what is obvious or known to be significant.

Dr. Johnson’s interest in the Enterprise Data Warehouse began about 10 years ago with the parameters on two different CBC instruments. “We were noticing significant variations between the instruments on the same sample despite changing reagent lots and ruling out other causes, and that started my interest in looking retrospectively at CBC data and trying to make useful assumptions or inferences based on the data. This led to developing reference ranges that were stratified by not only the patient’s gender but also their age by decade.”

He followed with a study on the limitations of certain CBC parameters such as the MCV versus the more preferred MCH for determining red blood cell size when samples are stored over time or there is a delay in their measurement. More recently, he has focused on platelet reference ranges personalized for the second and third trimester of a woman’s pregnancy versus the first trimester.

Using the data warehouse and its large numbers of patients to derive a reference population helps make better reference ranges. “Part of the problem with reference ranges is if you get 50 to 100 people, you’re hoping you’ve screened them adequately so that they are ‘normal.’ But a lot of times, they may not have been forthcoming with their medical history or may have disease conditions that are unknown to them or that they don’t think are otherwise significant.”

“When you’re dealing with more than 2 million unique patients, my belief is that by using a sufficient number of filters, you can ‘wash out’ the effect of any outliers or abnormals, because you’ll have so many otherwise ‘normal’ patients. That allows you to make reasonable inferences on what would be a normal hemoglobin value for a male at the age of 35 versus somebody who is 35 and female.”

Since RML is not strictly a reference lab for a specialty hospital—it includes rural hospitals, single and group physician practices, and home health agencies among other groups—its patient population includes a much higher concentration of “normals.” “Most of our lab work is outside of the health system, often at physician offices. It’s more of a ‘real world’ group of patients, which prevents significant skewing of the data.”

As a result, “our data is not strongly biased by having a large population of cancer patients,” for example. While RML does have a significant number of cancer patients, “we can use filters to weed those out such as by looking at who the ordering physician is, their previous pathology results, ICD-9 codes, inpatient pharmacy data, and radiology, among other things.”

What does the data warehouse contain? Everything that can be collected. “We know the demographics of the physician and patient, their gender, their age, often their race, and pharmaceutical data for their inpatient care up to five years and radiologic and other medical history sometimes going back to 1998. Using those pieces, plus all the other laboratory parameters such as kidney function, liver function, iron studies, immunology studies, nutrition-related parameters, numbers of visits to a hospital, numbers of visits to a physician, types of physicians seen, etc., you can develop very sophisticated filters to derive a reference population for whatever it is you are trying to study.”

The longitudinal data are helpful in interpreting samples and in some ways preferable to standard reference ranges, he says. “If someone has lived at a hemoglobin value of 10.5 g/dL—even though that’s probably mildly low for some women and some males—but they’ve lived at that for 10 years, that tells you something meaningful. If they’ve dropped 4 grams of hemoglobin in a span of 48 hours to 10.5, that’s different. So the relative change in the individual patients as well as the tempo of the changes are often useful—not only with hemoglobin but with many other parameters, including kidney function, liver function, thyroid function, autoimmune disease, microbiology—you name it.”

A project that he has been working on over the past couple of years is Monoclonal Gammopathy of Renal Significance (MGRS). “There are subsets of patients who have monoclonal gammopathy of undetermined clinical significance [MGUS], and some of them progress to renal failure and some of them do not. Some progress to more severe degrees of renal failure, and some progress faster than others.”

Johnson_Lawrence

Dr. Johnson

The question he and others are studying is whether or not there is a useful cutoff in the involved monoclonal serum free light chain value that might enable a clinician to better determine whether a patient is at increased risk of not only progressing to renal failure but also progressing in a shorter amount of time. “We know that these patients often do worse. The idea was to try to determine if there is a significant toxic level of an involved monoclonal light chain that may be significant and flag the clinician.” Under current practices, there is no agreed cutoff for the level of significance of an involved monoclonal serum free light chain, and how to best manage these patients is an open question. “It may be the case that earlier intervention may preserve renal function and improve the outcome, but that remains to be seen,” Dr. Johnson says.

Whether a therapeutic intervention would be warranted is something that needs to be addressed. These patients are sometimes followed less stringently than patients with a known cancer diagnosis. “Currently doctors will use serum free light chain testing in MGUS patients in order to see which of them are at risk for progressing to symptomatic plasma cell myeloma, but it is difficult to know which of these patients may progress to renal failure and whether or not to intervene prior to the development of a frank malignancy in order to preserve renal function.”

With the scale of the Enterprise Data Warehouse, “We now potentially have subsets of patients with MGRS who are at greater risk of progressing rapidly to severe degrees of renal failure.” They can be studied more closely to see if there is justification for therapeutic intervention, Dr. Johnson says.

“Because the data in the data warehouse is not skewed toward patients with myeloma or renal disease because the data is not strictly a referral population, but includes more general types of populations, that gives us the ability to look at the prevalence of disease in a somewhat unselected population group, including patients who are going to their doctors for routine checkups.” Other studies, he says, have had more specific patient populations selected by disease state. By comparison, most people don’t go to a subspecialty center unless something serious is wrong with them.

The goal is to be able to predict that a patient will have a higher likelihood of having a disease or bad outcome before it happens. “If you can prevent the diseases or complications, or ameliorate them prior to any therapeutic interventions becoming worthless or too risky, then you may have improved people’s health overall.”

He hopes other researchers will take the cutoffs indicated by his study based on the retrospective data and start to look prospectively at patients. “Looking at such massive numbers has given us a good starting point, and a prospective study will be necessary in order to see how the patients evolve and whether some patients do better with intervention versus others who do not.”

People tend to take a database and look for known associations, Dr. Johnson says. “But what if I have 500 million data points and I start randomly looking for associations? That gives us abilities we wouldn’t otherwise have. With such large amounts of data, you can look for things that may be unknown—or at least not intuitively obvious.”

The large-scale data allowed Dr. Johnson and fellow researchers to stratify platelet counts not only by gender and decade of age but also by stage of pregnancy. “We know that second- and third-trimester pregnant women often have lower platelet counts than first-trimester women and women who are not pregnant. But the degree of magnitude was not well characterized, and labeling a pregnant patient as thrombocytopenic has significant implications.”

Developing a good cutoff was important, because patients who are labeled as thrombocytopenic could get a potentially costly and invasive workup that is unnecessary. But through use of the data warehouse, “we could tell the OB/GYN what is the probable correct reference range for the platelet counts for a patient’s particular stage of pregnancy so that they do not overcall thrombocytopenia.”

Similarly, the large database allowed revisions of Hgb reference ranges for men and women. “It behooves you not to label somebody as abnormal if you can avoid it. We actually ratcheted the reference range down for average hemoglobin in males over 50 years of age, which allowed us not to overcall anemia.”

Subcategorization of the data is also more useful with the large database. “We can look at very unique patient populations such as Native Americans or other ethnic groups and find differences. We know there are differences in absolute neutrophil counts in Africans, Afro-Americans, and Afro-Caribbeans, and it’s often lower than the reference range that is generally quoted, as many of those studies are biased by the inclusion of a mainly Caucasian population.”

These more personalized reference ranges were possible because of the enormous amount of data. “Even after doing all our filtering, we had almost 100,000 males and 100,000 females and anywhere from 10,000 or more per decade of age. That’s a tremendous amount of statistical firepower.” The goal of medicine should be personalized care, Dr. Johnson says, “and the ability to look at longitudinal data even in individual patients will better enable us to achieve that goal.”

The current standard reference ranges for platelets are based on much smaller populations that usually do not include pregnant patients. “It stands to reason that you should adjust your reference ranges for the health status or disease condition of the patients, because those may affect the result. And one thing we’re going to start looking at is the reference ranges for the serum free light chain assay in patients with different degrees of renal failure.”

The ability to stratify reference ranges in those patients is tantalizing, he says, because the patients will no longer be labeled as having an abnormality that might provoke, for example, a bone marrow aspiration and biopsy.

Does the Enterprise Data Warehouse create a potential problem with having a “firehose” of data that overwhelms? Yes, says Dr. Johnson—if you don’t think you have the right tools to analyze the data. “But if you build enough filters to make the data more manageable, you get a patient population that is more ‘pure,’ if you will, for what you’re trying to study.”

The goal is to be able to predict disease, he says. This is an area where “we’re starting to dip our toes into the water, because we have the longitudinal data to follow patients. With disease prediction, we’re just in the infancy with MGRS. It’s one of our first attempts on a large scale to see whether a certain lab value with a certain cutoff seems to lead to a specific outcome.”

Even where no therapeutic intervention is available, there is a benefit to understanding that a patient is at risk of a bad outcome, Dr. Johnson emphasizes. “While you may not be able to cure the patients, you may be able to provide other interventions that improve their quality of life, allow them to plan for end-of-life issues, or allow you to study them more intensely. And you could use any data you derive from them to treat the next generation of patients with the same problem.”

Some clinician practices have changed because of the findings from the research using the data warehouse. “Our hemoglobin reference ranges have changed quite a bit of patient practice because we revised what we call ‘anemic.’ We also revised our cutoffs for microcytic and macrocytic red cells, and that changes outcome behaviors as well, because if you call someone microcytic, that might prompt an investigation for iron deficiency, blood loss being the most common cause in older patients and then cancer being the main concern, versus something else.”

“So improving cutoffs lets you more reliably avoid overcalling or even undercalling the diagnosis, and we’ve done that. We’ve actually used our cutoffs to generate algorithms to investigate anemia. We use the hemoglobin value, the MCH, and the absolute reticulocyte count, and that will prompt a certain subset of tests to be ordered before we get a peripheral blood smear.”

After reviewing the peripheral blood smear, additional tests may be ordered or suggested and then an integrated narrative report is issued, taking into account any available current or previous laboratory, radiologic, pharmacologic, and clinical data. “This method either gives us a clean diagnosis or it may prompt additional investigation. But the point is that it’s a multi-modal approach; we don’t interpret any one finding in isolation.” If clinicians say they are worried about a patient despite having issued a report, the pathologist, and by extension the laboratory, can also go back and look at all the data to see if there is an explanation or something that needs exploring.

The data warehouse also allows the lab to look at and flag potential problems with an instrument or variations within subsets of patients. “That may prompt us to see if there’s a real change in the patient or something else that was a problem, such as getting the wrong sample, or having a particular problem with a reagent lot on a certain shift. If you can do that in real time, you can react appropriately before anything deleterious occurs to the patient.”

Comparing test methods is another capability the data warehouse provides. Seven years ago, “we used the data warehouse when we were looking at human papillomavirus testing. There was an FDA-approved platform that was very expensive versus a non-FDA–approved platform, and we showed with large amounts of data that the tests were equivalent, even though the non-FDA–approved platform was cheaper. That saved our patients and clients a lot of money, and gave them the peace of mind to know they could trust the results of the non-FDA–approved platform.”

What he would like to do next is to take the data warehouse and perform correlation matrices. “We could just see what we could find out: what are possible correlations and what should prompt subsequent investigation. That takes a lot of computing power when you are dealing with so many variables,” he adds. But then again, “The data is already there. I have to use computer and analyst time to go back and mine it, but based on the findings, we can figure out how to target further efforts.”

In medicine as elsewhere, causation is always difficult to prove, Dr. Johnson says. “But if you have 2.3 million unique patients to look at, and you can reasonably say, ‘on average for 50 percent of my patients with a certain profile, when x happened, y occurred,’ then you may avoid a lot of unnecessary testing and therapy.” At the least it should prompt additional investigation into a potentially clinically significant association that might have previously gone unnoticed, he says.

Despite initial troubles appreciating the power of the data warehouse, manufacturers recently have come back to RML with ideas for using the data in order to better understand their instruments. “So we also use the data to look at where there may be weak points in an instrument that they may not be aware of, or perhaps additional uses for their assays that they haven’t envisioned. Sometimes it just takes the tincture of time for somebody to get an understanding of how the data warehouse may be very valuable and useful,” Dr. Johnson says.

The only limitation on uses of the data warehouse is the user’s imagination, he says. “With this database, there are a lot of questions that can be answered outright, or if not, the data can at least get you started on the right path to an answer.”

Dr. Dolan sees the health care system at a crisis point. “We’re on a path in spending that’s not sustainable. But using these powerful analytic tools, we can truly analyze where our problems are and develop strategies to fix them. We can redesign, reengineer, and then use the warehouse to monitor and fine-tune the strategies and make sure the problems stay fixed. The data warehouse gives us a tool to truly manage health care delivery.”
[hr]

Anne Paxton is a writer in Seattle.

BigGain_Dolan

CAP TODAY
X