Accuracy-based Surveys carve higher QA profile

CAP Today

October 2010
Feature Story

Anne Paxton

With its thousands of participants, the College’s proficiency testing offered to laboratories through the Surveys and EXCEL programs is acknowledged as the largest laboratory peer comparison program in the world. But tucked within the pages of the CAP Surveys catalog, a much smaller program—one that counts subscribers only by the dozens—is quietly emerging as a potentially even more powerful tool in ensuring quality laboratory testing.

It’s Accuracy-based Surveys—a program providing Surveys that use challenge specimens that are free from matrix effects and have target values traceable to certified reference materials. Unlike proficiency tests, which allow laboratories to satisfy accreditation and regulatory requirements by comparing their test results with those of other laboratories, Accuracy-based Surveys let laboratories compare their test results with international reference method results.

“Accuracy-based Surveys are about more than just fulfilling the regulatory requirements of PT and getting a passing score,” says Anthony Killeen, MD, PhD, chair of the CAP Working Group on Accuracy-based Surveys. “Knowing that a result is not just in agreement with the peer group but is also accurate is important. That information is provided by the Accuracy-based Surveys.” Dr. Killeen, associate professor of pathology and director of clinical laboratories at the University of Minnesota, says laboratories should be aware these Surveys are available—and that they’re growing in number.

The collection now includes the Accuracy-based Lipids Survey (ABL), Creatinine Accuracy Calibration Verification/Linearity (LN24), Glycohemoglobin Calibration Verification/Linearity (LN15), Glycohemoglobin (GH2), and Neonatal Bilirubin (NB, NB2), in addition to the 2010 entry: the Testosterone and Estradiol Accuracy Survey (ABS). With Accuracy-based Vitamin D (ABVD) on deck for April 2011, and another new analyte to be added each subsequent year, the program promises to strengthen the focus of laboratories and manufacturers on making accuracy a priority.

To supplement the Surveys, the College is also offering Commutable Frozen Serum (CFS) for a list of 15 other analytes, including cortisol, glucose, potassium, and sodium. “This material, which is free of matrix-related bias, was collected from donors by the College in 2003 for the Fresh Frozen Serum Study and has been stored at -70° C, and is now being made available for people who want to use it for studies,” says Nataliya Polyakov, MT(ASCP), senior technical specialist for the CAP Surveys program. Along with each serum specimen, the College supplies customers with a table of reference targets.

The distinction between proficiency testing and Accuracy-based Surveys is a crucial one, but the limitations of proficiency testing are not always understood, says William L. Roberts, MD, PhD, vice chair of the CAP Chemistry Resource Committee. “We actually include information in the front of each participant summary report. We try to make it very clear,” says Dr. Roberts, who is medical director of automated core laboratories, ARUP Laboratories, and professor of pathology at the University of Utah, Salt Lake City.

Routine proficiency testing can identify differences between analytical peer groups, says Dr. Killeen. But “the presence of matrix effects in routine proficiency testing materials often makes it difficult to determine whether these differences, or what proportion of these differences, are due to assay variations that could affect patient samples.”

Matrix effects are the combined effect of all components of the sample (other than the analyte of interest itself) on the measurement of the analyte. Different sample types, sources, or preparative methods can cause assay interference. To prepare proficiency testing challenge specimens, “the commercial providers actually collect plasma, and they convert it to serum by adding thrombin and calcium to it,” explains W. Gregory Miller, PhD, a consultant to the Chemistry Resource Committee who is director of clinical chemistry at Virginia Commonwealth University, Richmond. “It has to be dialyzed to remove the calcium, and when you dialyze you remove a lot of other things.”

Sometimes the serum is also delipidated to make it clearer, which removes certain proteins and other compounds of interest, he says. “So you end up with a serum-based material, but it’s protein-based, not serum. Then you add back most of the analytes of interest. And it simulates serum as closely as possible, but it has sufficient artifactual character that it has these matrix-related biases with many of the routine methods used in the clinical laboratory.”

For that reason, when these materials are used for proficiency testing, the laboratory is assessed only in terms of its ability to match the peer group. “Theoretically, all of the labs’ results could be wrong, or all of them could be right; due to the artifacts of these matrix-related biases, there’s not enough information to answer the question of accuracy.”

By contrast, the Accuracy-based Survey materials are carefully prepared, fresh-frozen, off-the-clot serum which has not been manipulated at all, except in some cases to add a bit of pure analyte to get an elevated concentration, Dr. Miller says. The creatinine Survey, for example, includes five samples that have been “spiked” with pure creatinine, which is available in pure crystalline form. Studies by the National Kidney Disease Education Program have demonstrated that adding creatinine and pooling serum from several different donors did not alter the matrix characteristics; the material was validated to be commutable with authentic patient samples. “So this material can be and has been used as evidence that the methods are properly calibrated for creatinine.”

The cost of an Accuracy-based Survey is not much greater than that of a regular proficiency test, Dr. Miller notes. “They are comparably priced, but the difference is that the samples, which are quite expensive to prepare, are targeted at only a few analytes, maybe one, or four or five, whereas the chemistry Survey for conventional proficiency testing, for example, has 70-plus analytes in the same bottle, for the same price.”

In the U.S., only the College is offering accuracy-based surveys, and they play an important role in quality assurance, says Gary L. Horowitz, MD, chair of the Chemistry Resource Committee. “If you’re doing a mass spectrometry method or other method unique to your site, you don’t have a peer group, and an Accuracy-based Survey is perfect for you.” In other cases, “the participants are people like me who might say, ‘I’m fine relative to my peer group, but I’d like to answer to a higher standard, to say my lab is not only doing this test well based on everyone doing it the same way, but also compared to the truth,’” says Dr. Horowitz, director of clinical chemistry at Beth Israel Deaconess Medical Center and associate professor, Harvard Medical School. “And if I find my method isn’t performing well, I will call up the manufacturer. The whole goal is to get enough labs to be able to make some pressure on analytes where it matters.” He expects that the forthcoming Accuracy-based Survey for 25-hydroxy vitamin D could be particularly important in that respect.

It has been documented that the Surveys produce a steady improvement in both agreement and precision among different methods, Dr. Miller says.

“The analytes that have been chosen are those that have clinical practice guidelines associated with them, so there’s a strong motivation from a patient management point of view to ensure uniformity and standardization of results. The manufacturers observe these results and react to them, because they want their method to be chosen when people go out shopping for methods.”

The concept of Accuracy-based Surveys evolved from a set of 1994 CAP proficiency tests that included a fresh-frozen serum material for several analytes, says Dr. Miller. “The sample was a fresh-frozen, off-the-clot serum to mimic an authentic sample that would have been collected from a patient.” As a result of that trial, “it was determined there were a number of biases for a group of commonly measured analytes.”

In 2003, in its Fresh Frozen Serum Study, the College conducted a similar trial with off-the-clot fresh-frozen serum for some 12 analytes. “And at that point, the assessment was that some methods were better standardized but others showed biases similar to those seen in the 1994 Surveys,” Dr. Miller says.

When frozen serum samples are sent as unknowns along with standard material, the standard material gives a pretty good reflection of the imprecision of the assay, but the biases seen with the two materials are found to have very little connection, says Alan Thomas Remaley, MD, PhD, senior staff member of the Department of Laboratory Medicine of the National Institutes of Health and a member of the Chemistry Resource Committee.

“You might have an assay that would run high on PTH, for example, but when you run a real serum sample, it doesn’t show that bias because the material is not commutable—meaning it doesn’t behave exactly like the real patients’ samples we develop these assays for.” So by looking at the non-commutable material, which is mostly what the CAP sends out for its Surveys, it’s possible that results may be over-interpreted, Dr. Remaley says. As a consequence, the CAP grades almost all Surveys participants against the mean of their peer group. “So if there’s a bias or matrix problem, that neutralizes that effect; you’re just seeing how you did compared to your peers.”

Following the 2003 comparisons of results, the College’s Chemistry Resource Committee decided to offer Accuracy-based Surveys on a more systematic basis, Dr. Miller says. “The first one was LN24 for creatinine. That was developed in collaboration with the National Kidney Disease Education Program, which was initiating a standardization program for creatinine, and was quite successful.”

But the hemoglobin A1c Survey goes back even further and is really the poster child for the Accuracy-based Surveys, Dr. Horowitz says. “Once the College started using real patient material for A1c proficiency testing, we could start making comments about how the methods were working in the field. We didn’t have to worry about grading against a peer group, because there were none of these so-called matrix effects. Any differences we saw between methods represented a real problem, such as a calibration issue.” While the use of real human blood makes the A1c Survey a little more expensive, “we’ve watched in concert with that Survey the methods improve over the years.” In an ideal world, using real patient material would be the best way to do all proficiency testing, he adds, but it’s expensive as well as impractical.

The Accuracy-based Survey for lipids (ABL), launched about two years ago, was chosen because there has been a long history of standardizing lipids, going back about 50 years, Dr. Remaley says. “There are cut points throughout the distribution of lipids—total cholesterol, triglycerides, HDL and LDL cholesterol—based on the National Cholesterol Education Program for how to calculate cardiovascular risk, when to treat patients, and how to monitor patients. Whenever you have national guidelines, it’s very important to accurately classify patients by risk.”

There was also a reference method in place for lipids, generally another requirement for an Accuracy-based Survey. The committee was pleased when the ABL Survey enrollment quickly grew to about 150 participants—far fewer than the 5,000-participant average for a chemistry proficiency test, but a healthy response for a Survey that is not required.

“Most of the participants were from reference labs and university hospitals, institutions that have a big influence over the rest of the lab community,” Dr. Remaley says, noting that the College wasn’t actually prepared to make enough reference material for much more than a couple hundred participants, as the process is tedious and costly. “Often the groups that participate are kind of pushing the frontier, or they’re large reference labs that want to make sure to get it right. And I think this will put pressure on the manufacturers if they do see biases.”

With the proficiency testing results, manufacturers have typically not been as responsive. “They might see their assay is running high, and sometimes we’ll contact them and almost invariably it turns out to be a matrix issue—that there’s something about the material we’re producing.” It’s a mistake to think that routine Survey materials can be used to assess analytical accuracy, as has been shown by the several fresh-frozen Surveys the CAP has conducted, Dr. Remaley says. But as a result, he thinks, “there are many people getting off the hook who aren’t running a good assay.”

The College’s Working Group on Accuracy-based Surveys, which is managing the rollout of additional Surveys over the next several years, was prompted by that experience to consider what else it should be doing, and vitamin D came up. A low vitamin D level might be responsible for myriad conditions, Dr. Remaley notes, and it’s becoming common to send the test out. “A lot of labs don’t offer it.” But some of those that do offer it have encountered problems.

As with lipids, Dr. Remaley explains, there are recommended levels of 25-hydroxy vitamin D for diagnosis of deficiencies, and there have been well-publicized struggles to get the assay right.

Vitamin D doesn’t yet have clinical practice guidelines associated with it, though there are published interpretive guidelines. “We now have a desirable range for vitamin D, but there is a lot of controversy about it,” Dr. Remaley says. “The current guidelines suggest that a large fraction of the population is vitamin D deficient.” However, “you want to make sure you get an accurate result, because a lot of people are right on the edge of not having enough.”

Typically, an Accuracy-based Survey would be geared to tests with established clinical practice guidelines. “But vitamin D is ahead of the curve simply because it has become such an important analyte in recent years, and it’s known that some methods on the market are not calibrated correctly, so there’s motivation to try to improve that,” Dr. Miller says.

Testosterone was another test that seemed to be an appropriate choice for Accuracy-based Surveys, based on the assay’s perceived inaccuracy. “Endocrinologists have been lamenting for a long time that the assays are okay for men but not for looking at results at the low end—say, for precocious puberty among children,” Dr. Remaley says. The current assays are probably not accurate enough, and they differ wildly from each other, he notes, citing an article in Clinical Chemistry that compared testosterone testing to flipping a coin.

“Testosterone testing, particularly in the context of working up men for hypogonadism, has increased tremendously,” Dr. Horowitz says, “because doctors are reading papers saying that a level below 300 ng/dL is very suggestive. As a result, labs need to know that their method gives accurate results at this concentration so that patients are not under- or overdiagnosed.”

Unfortunately, as yet there is no reference method for testosterone, Dr. Remaley says. But increasingly, mass spectrometry for small molecules like testosterone is becoming the de facto method. A Centers for Disease Control and Prevention committee is setting up a reference method for testosterone, as well as for a number of other mass spectrometry assays.

“There are emerging more and more national guidelines recommending what to do with certain results. This is a direction that CAP so far is supporting, and I think it will go a long way toward improving lab performance,” Dr. Remaley says. The FDA approves assays mostly through the 510(k) process, which is based on a comparison to an existing approved assay. That approach assesses accuracy only indirectly, by assuming that the existing assay is accurate, which may not always be true, he says.

It’s a gap the College is attempting to address. “CAP is committed to pressing for more accurate test results, and we will have more and more of these Accuracy-based Surveys using material that’s commutable to send to participants who are worried about their accuracy. And that will put pressure on the manufacturers to get it right,” Dr. Remaley says.

A distinct benefit of the CAP’s accuracy-based materials is that they have demonstrated the existence and extent of clinically significant variation between different manufacturers’ assays, Dr. Killeen says. That was the case with creatinine, and it served as an impetus to the extensive efforts that have dramatically improved the general level of performance of creatinine assays over the past few years. But while the Accuracy-based Surveys can reveal the problem, Dr. Killeen notes, “it takes effort on the part of IVD manufacturers to correct it.

“Clinical advocacy is important too. In the case of creatinine, the NKDEP played a key role in the push for harmonization of methods.” The “sweet spot,” Dr. Killeen says, “is where there is a convergence of interests by labs, IVD manufacturers, and a clinical advocacy group.”

The push to harmonize methods has had positive outcomes for lipids and glycohemoglobin as well. “There is pretty good data with the glycohemoglobin Survey that performance has improved considerably since it started,” Dr. Roberts says. “In fact, we have tightened the grading requirements for that Survey because of the medical need to have better-performing methods, so we anticipate additional improvements in performance over the next couple of years. We can look at mean value by vendor, and we can see for some vendors the mean value is very close to the true value, while for others it’s off a bit.”

Though the aim of the Accuracy-based Surveys is to be helpful, the CAP is not necessarily seeking to increase dramatically the raw numbers of participants. “We really don’t need to mail to 6,000 labs,” Dr. Horowitz says. With a relatively small number of methods representing the vast majority of labs, “if we had 10 running the Siemens test, 10 running the Abbott test, and so on, that might be enough to know how the whole method group performs.”

Accuracy-based Surveys are so far not a requirement for laboratories, but they address a limitation of proficiency testing that many have acknowledged, says Dr. Remaley. “The people volunteering to participate do understand the importance of accuracy; it’s an important patient safety issue that CAP has strongly supported. And Accuracy-based Surveys will benefit everyone, because they will drive the whole lab community to improve.”

“The College looks forward to continuing to be a leader in this area,” Dr. Roberts says.

Anne Paxton is a writer in Seattle.