Abstract
All pulmonologists, including those recently completing training, should be competent in critically evaluating and interpreting pulmonary function tests (PFTs). In addition, some authorities recommend that respiratory therapists learn to provide preliminary PFT interpretations for the medical directors of PFT labs. The 2005 American Thoracic Society/European Respiratory Society guidelines for interpreting PFTs lack recommendations for the best reference equations for lung volumes and diffusing capacity of the lung for carbon monoxide (DLCO), and lack reference equations for non-whites. The pre-test probability of lung disease should be determined using a short questionnaire. The “nonspecific pattern” occurs in about 15% of patients referred to a PFT lab, but it has many clinical correlates and the course is usually benign. Less common PFT patterns and those resulting from comorbid conditions (such as obesity, respiratory muscle weakness, or heart failure) are not discussed by the guidelines. More than half of patients with interstitial lung disease have a normal ratio of DLCO/VA (alveolar volume), and many have a normal total lung capacity.
- pulmonary function test
- PFT
- respiratory therapist
- lung volumes
- diffusing capacity
- DLCO
- guidelines
- restriction
- predicted values
Introduction
Incorrect or suboptimal interpretation of pulmonary function test (PFT) results can harm the patients for whom the tests were ordered. The clinicians who order PFTs usually do not have the expertise to optimally interpret the spreadsheet of numerical results. Training in PFT interpretation has been minimal in the vast majority of pulmonary medicine fellowship programs for 2 decades. The baby-boomer and more senior physicians with specified rotations for PFT lab training and decades of experience in PFT interpretation are retiring. There are no Internet-based training programs for PFT interpretation. One-day PFT interpretation postgraduate courses at the national meetings (American Thoracic Society [ATS] and American College of Chest Physicians) are sporadic. There is no credentialing for the medical directors of PFT labs (as there is for the medical directors of sleep labs).
The 2005 ATS/European Respiratory Society (ERS) guidelines for interpreting PFTs describe only the PFT patterns for the most common lung diseases (asthma, COPD, chest wall disorders, and interstitial lung disease [ILD]) (Fig. 1).1 Comorbid diseases that affect lung function are common, but are not discussed in the interpretation guidelines.
All fellowship-trained pulmonologists should be knowledgeable about the hardware (flow, volume, and gas concentration sensors), software (predicted values, lower limits of normal [LLN]) and clinical implications of PFTs. Just as respiratory therapists (RTs) provide advice to intensive-care physicians regarding ventilator settings, we believe that RTs who run PFT labs should become proficient in the interpretation of the results and provide preliminary interpretations. Most PFT systems have an automated interpretation software package, but we believe that pulmonologists and RTs are more informative than the computer. In some countries (such as Australia, New Zealand, and the United Kingdom), non-physician respiratory scientists provide the final PFT interpretations. This is not likely to occur in the United States, since the medical directors of PFT labs derive substantial income from interpreting the test results (Medicare Part B payments) and believe that their ability to correlate PFT with clinical and imaging findings provides a more useful interpretation.
The scientific basis for tests of ventilatory function (vital capacity [VC]) and gas exchange (diffusing capacity [DLCO]) dates back to the early 19th and 20th centuries, respectively.2,3 These tests have been in wide clinical use for the last half century, which has seen great advances in the technology for measuring flow, volume, pressure, and gas concentration, and for calculating final results and comparing them with reference values. Over these years, guidelines have been published to systematize and achieve uniformity in measuring, reporting, and interpreting PFT results; the most recent was a joint statement of the American Thoracic and European Respiratory Societies.1 Nevertheless, many questions, evidence gaps, and inconsistencies remain, which we discuss below.
The Pre-Test Probability of Disease
As with other medical tests, PFTs do not make a diagnosis by themselves. The interpretation of PFTs merely shifts the probability of lung disease up or down from the pre-test probability, which is determined from the patient's history, responses to therapy, and the results of prior tests (radiologic, microbiological, pathologic, et cetera). The blanket statement “clinical correlation is necessary” is often added to the end of a PFT interpretation, but a non-pulmonologist who orders a PFT often needs more help in using the results for medical decisions (short of a full consultation from a pulmonary sub-specialist). Ideally, the referring physician should provide the indications for the tests ordered (what the physician wishes to learn) and relevant clinical diagnoses.
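The shift from pre-test to post-test probability described above can be made concrete with the odds form of Bayes' rule. The following is a minimal sketch, not a method given in this paper: the function name and the likelihood ratio used in the example are illustrative assumptions, and real likelihood ratios for PFT patterns would have to come from published diagnostic accuracy studies.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability with a test's likelihood ratio.

    Both arguments are supplied by the user; nothing here is taken
    from the ATS/ERS guidelines or from this paper.
    """
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre_test_prob must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    # Probability -> odds, scale by the likelihood ratio, odds -> probability.
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical example: 20% pre-test probability and a finding
    # assumed (for illustration only) to carry a likelihood ratio of 4.
    print(round(post_test_probability(0.20, 4.0), 2))  # 0.5
```

The same abnormal result therefore means something different in a patient with a 20% pre-test probability than in one with a 50% pre-test probability (where the assumed likelihood ratio of 4 would give 80%), which is why the interpreter needs the clinical context.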
Unfortunately, the optimal information needed to estimate the pre-test probability of lung disease is often not easily available at the time of test interpretation (from an electronic medical record or from the physician who ordered the test), so we recommend that PFT labs routinely ask each patient a short set of questions to determine pre-test probability of common lung diseases (Table 1). We understand that the responses from patients may not be as accurate as the information from the medical record. Since obesity (a component of the metabolic syndrome) is very common and affects PFT results, we also recommend that an index of obesity be measured at the same time as standing height (which is required to determine predicted values for most PFT results). The most widely accepted index of obesity is body mass index (BMI), easily calculated from body weight and height. However, recent studies have shown that lung volumes are affected more by abdominal circumference than by BMI,4 so we recommend that waist size (at the level of the umbilicus) and hip circumference also be measured (using a tailor's cloth or similar tape) and provided (along with reference values) to the person who interprets PFT results.
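The obesity indices mentioned above are simple to compute at the time height is measured. The sketch below assumes metric inputs. BMI is the standard weight divided by height squared; the waist-to-hip ratio is shown only as one common way of combining the two circumferences, since the text asks that waist and hip measurements be reported but does not specify an index. The function names are illustrative.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI in kg/m^2 from body weight (kg) and standing height (cm)."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def waist_to_hip_ratio(waist_cm: float, hip_cm: float) -> float:
    """Waist circumference (at the umbilicus) divided by hip circumference.

    Assumption: the ratio is one possible summary; the paper only asks
    that both circumferences be measured and reported.
    """
    if waist_cm <= 0 or hip_cm <= 0:
        raise ValueError("circumferences must be positive")
    return waist_cm / hip_cm


if __name__ == "__main__":
    # Hypothetical patient: 81 kg, 180 cm, waist 100 cm, hip 100 cm.
    print(round(body_mass_index(81.0, 180.0), 1))      # 25.0
    print(round(waist_to_hip_ratio(100.0, 100.0), 2))  # 1.0
```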
Absolute Versus Predicted Values
PFT results in units of L, L/s, cm H2O, or mL/min/mm Hg are useful by themselves for comparisons during follow-up tests. However, when the patient is tested for the first time, these numerical results must be compared to reference or “predicted” values, which adjust for the patient's size (height), age, sex, and race/ethnicity. Many studies of reference (predicted) equations from population-based samples of healthy persons have been published and can be chosen by the medical director. These choices are important, because the predicted values (and the normal ranges) often differ substantially from one study to another. For example, a patient with a DLCO of 30 mL/min/mm Hg can be normal using the reference equations of Miller et al,5 but abnormal using the reference equations of Crapo and Morris.6 Furthermore, rules of thumb for determining abnormality, such as DLCO < 80% predicted or FEV1/FVC < 0.70, are frequently faulty, causing high false positive or high false negative rates.7 Abnormality should instead be defined by the 95% confidence interval (available from the published prediction equation), and the resulting limit should be printed on the final PFT report.
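The difference between a fixed percent-of-predicted cutoff and a statistically defined lower limit can be illustrated numerically. The sketch below assumes that residuals around the predicted value are normally distributed, so that the lower limit of normal (the 5th percentile) is the predicted value minus 1.645 times the standard error of the estimate (SEE) of the reference equation. The predicted value and SEE in the example are hypothetical numbers chosen for illustration and do not come from any of the cited reference studies.

```python
Z_5TH_PERCENTILE = 1.645  # one-sided 5th percentile of a normal distribution


def lower_limit_of_normal(predicted: float, see: float) -> float:
    """LLN as predicted - 1.645 * SEE (assumes normally distributed residuals)."""
    return predicted - Z_5TH_PERCENTILE * see


def is_below_lln(measured: float, predicted: float, see: float) -> bool:
    """True if the measured value falls below the statistical LLN."""
    return measured < lower_limit_of_normal(predicted, see)


def is_below_percent_predicted(measured: float, predicted: float,
                               cutoff: float = 0.80) -> bool:
    """True if the measured value is below a fixed fraction of predicted."""
    return measured < cutoff * predicted


if __name__ == "__main__":
    # Hypothetical DLCO values in mL/min/mm Hg.
    predicted, see, measured = 30.0, 4.0, 23.8
    print(round(lower_limit_of_normal(predicted, see), 2))   # 23.42
    print(is_below_percent_predicted(measured, predicted))   # True
    print(is_below_lln(measured, predicted, see))            # False
```

In this hypothetical case the 80% rule labels the result abnormal while the statistical limit does not, which is the kind of false positive the rule of thumb can produce.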
Restriction
The “standard” definition of restrictive impairment is a decreased total lung capacity (TLC). There are several problems with this definition. The instruments to measure static lung volumes are expensive and difficult to maintain, so they are not available in some settings, or, if available, the tests are not done because of limited time, the additional cost of the test, or the inability of the patient to perform the test. Prediction equations for TLC, functional residual capacity (FRC), and residual volume (RV) are less robust than the spirometry reference equations, because of small sample sizes of the reference studies, less precise definitions of “health,” and failure to include non-whites. In addition, the test method (body box, nitrogen washout, or multi-breath helium dilution) used by the reference study often differs from the instrument that is locally available. The quality control programs for instruments measuring lung volumes are often suboptimal, when compared to the daily 3.00-L calibration checks done for spirometers, reducing accuracy and reproducibility of the results.
The ATS/ERS statement on lung volumes8 provides a list of reference equations but makes no recommendation regarding which is the best set of reference equations. Prediction equations for lung volumes used in PFT labs are often from a different population from those used for spirometry. The resulting discrepancy can be seen in different predicted values for FVC versus slow VC (taken from the lung volume reference study). Optimally, the spirometry, lung volume, and DLCO reference equations would all come from the same study (such as those from a population-based study of adults in Michigan). However, according to the ATS/ERS guidelines,1 the best spirometry reference equations for North Americans are from the National Health and Nutrition Examination Survey (NHANES) III study,9 yet NHANES III did not measure lung volumes or DLCO (and neither has NHANES IV). One solution (which gives the appearance of internal consistency to the PFT report) is to calculate predicted TLC and RV using the VC/TLC ratio from the lung volume reference study and the predicted FVC from the NHANES III study.
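The harmonizing calculation described in the last sentence above can be sketched as follows. This is an illustration under two stated assumptions: the predicted slow VC is taken to equal the predicted FVC from the spirometry reference study, and predicted RV is taken as predicted TLC minus that VC. The VC/TLC ratio in the example is a hypothetical value, not one from a cited lung volume study.

```python
def harmonized_lung_volumes(predicted_fvc_l: float,
                            vc_tlc_ratio: float) -> tuple[float, float]:
    """Return (predicted TLC, predicted RV) in liters.

    predicted_fvc_l: predicted FVC from the spirometry reference equations.
    vc_tlc_ratio:    predicted VC/TLC ratio from the lung volume reference study.
    Assumes VC equals FVC in health and RV = TLC - VC.
    """
    if predicted_fvc_l <= 0:
        raise ValueError("predicted_fvc_l must be positive")
    if not 0.0 < vc_tlc_ratio < 1.0:
        raise ValueError("vc_tlc_ratio must be between 0 and 1")
    predicted_tlc = predicted_fvc_l / vc_tlc_ratio
    predicted_rv = predicted_tlc - predicted_fvc_l
    return predicted_tlc, predicted_rv


if __name__ == "__main__":
    # Hypothetical: predicted FVC 4.5 L, VC/TLC ratio 0.75.
    tlc, rv = harmonized_lung_volumes(4.5, 0.75)
    print(tlc, rv)  # 6.0 1.5
```

The resulting predicted VC, TLC, and RV are internally consistent on the report, although, as the text notes, this gives only the appearance of consistency; it does not make the underlying reference populations the same.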
RV is often less reduced than VC in both interstitial and chest wall restrictive disorders. Therefore, TLC may be within normal limits when VC is decreased. A decline in VC has long been recognized to be correlated with loss of lung compliance, a sensitive measure of impairment in interstitial diseases. For example, a 1965 study of 21 patients with asbestosis showed that when VC was below 80% of predicted (the lower limit then in use), static lung compliance had also become abnormal.10 In a recent series of 830 sarcoidosis patients, only 58 (7%) had a decreased TLC, while 25% of the remaining 772 patients showed decreased compliance, including 33 with a decreased VC.11 In patients with an increased pre-test probability of restriction, the pattern of a mildly decreased VC, normal TLC, and normal or mildly reduced RV rules out air-trapping as the cause of the low VC (even in adult smokers), and increases the post-test probability of a restrictive pattern. The converse pattern (a low TLC with a normal VC) is rare.12
The Nonspecific Pattern
Hyatt and co-workers described the pattern of a low FVC, normal TLC, and normal FEV1/FVC as “volume loss” perhaps due to “volume derecruitment.”13 About 10–15% of adults referred to a PFT lab have this nonspecific pattern.14 It can be due to obesity (a zero expiratory reserve volume), premature termination of the FVC maneuver (causing a falsely low FVC), or closure of small airways during forced exhalation (not measured by FEV1/FVC). The latter mechanism was shown to exist in some patients by using magnetic resonance imaging to image small airways in patients inhaling He-3 during induced bronchoconstriction.15 Experts have not decided whether to classify this as “restriction” or as “obstruction.” A 3-year follow-up study of 1,284 patients with the nonspecific pattern found no significant change in FEV1 in two thirds,12 while the other third developed a low TLC (restriction) or a low FEV1/FVC (obstruction). Many patients exposed to the fumes, dust, and gases at Ground Zero (after the World Trade Center attacks) who had asthma-like symptoms (suggesting reactive airways dysfunction) had “spirometric restriction” with a normal FEV1/FVC.16 Some had abnormally high airways resistance when measured using forced oscillation tests.17,18
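The definition of the nonspecific pattern (low FVC, normal TLC, normal FEV1/FVC) can be written as a small decision rule. The sketch below is a deliberate simplification and not a reproduction of the ATS/ERS flow chart: each value is compared only with its lower limit of normal, the other labels are included only to show where the nonspecific pattern sits, and all numbers in the example are hypothetical.

```python
def classify_pattern(fvc: float, fvc_lln: float,
                     fev1_fvc: float, fev1_fvc_lln: float,
                     tlc: float, tlc_lln: float) -> str:
    """Label a PFT result using lower limits of normal (simplified sketch)."""
    low_ratio = fev1_fvc < fev1_fvc_lln
    low_tlc = tlc < tlc_lln
    low_fvc = fvc < fvc_lln

    if low_ratio and low_tlc:
        return "mixed"
    if low_ratio:
        return "obstruction"
    if low_tlc:
        return "restriction"
    if low_fvc:
        # Low FVC with normal TLC and normal FEV1/FVC.
        return "nonspecific"
    return "normal"


if __name__ == "__main__":
    # Hypothetical adult: FVC 2.8 L (LLN 3.2), FEV1/FVC 0.78 (LLN 0.70),
    # TLC 5.6 L (LLN 5.0).
    print(classify_pattern(2.8, 3.2, 0.78, 0.70, 5.6, 5.0))  # nonspecific
```

Note that the rule requires a measured TLC; with spirometry alone, a low FVC with a normal FEV1/FVC cannot be separated into restriction and the nonspecific pattern.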
Spirometric restriction was seen in 8% of a series of 413 asthmatic patients after excluding all other causes (including obesity, chest wall disorders, ILD).19 FRC was not increased, DLCO was normal, and TLC was normal or reduced. An associated finding was a decreased or negative forced expiratory reserve volume.20 Asthma was confirmed by response to bronchodilators and methacholine. Increases or decreases in FVC and FEV1 were often equivalent (retaining a normal FEV1/FVC). This had also been reported in the early 1970s.21,22
DLCO Reference Equations
The ATS/ERS 2005 DLCO guidelines1,23 make no recommendation regarding the best DLCO reference equations; however, the choice will cause marked differences in identifying and quantifying abnormality. Many research studies24–26 have chosen DLCO reference equations from never-smokers of the study of Miller et al.5 Authors of DLCO reference equations from almost 1,000 healthy Australian adults27 noted that the “prediction equations (of 5 studied) that best fitted our sample were those of Miller and colleagues.” The Australian series was stratified to include older subjects; mean age was 57, compared with Miller et al's 43. Annual decrease in DLCO was greater starting at age 60. It is likely that the DLCO reference equations from healthy adults in Salt Lake City6 are too high. In one series of 204 adults, the 40% with DLCO below the Crapo and Morris LLN but above the Miller et al LLN had no identifiable disorder.5,6,28
Interpreting DLCO Results
DLCO is the only PFT (not counting invasive arterial blood gas analysis) that measures non-mechanical properties of the lung, in this case, gas exchange. Decreased DLCO as the only abnormality suggests pulmonary vascular or early parenchymal disease (see the lower left corner of Fig. 1). It may be seen in individuals with unsuspected emphysema noted on computed tomography (CT) scan.29,30 “Non-obstructive” emphysema or “emphysema with normal spirometry” is arousing interest as low dose CT scanning is increasingly being used to screen large numbers of smokers for lung cancer. In a series of 27 patients with isolated decrease in DLCO who underwent CT and echocardiography, 13 had emphysema, 11 of whom also had evidence of mild interstitial disease.25 Nine of the 14 patients without emphysema had pulmonary vascular or parenchymal disease. Another report of 1,777 patients, of whom 7% had an isolated decrease in DLCO, showed similar findings.31
In the presence of airways obstruction and hyperinflation, DLCO separates alveolar destruction (emphysema) from asthma and chronic bronchitis. A low DLCO predicts desaturation on exertion in patients with COPD32 or ILD, and predicts poor performance during the 6-minute walk test.33 Conversely, a normal DLCO makes oxygen desaturation during exercise unlikely. In one study, declining DLCO (and, to a lesser degree, declining FVC) was more important in predicting mortality in patients with ILD than the histologic classification.34 If DLCO was < 35% predicted, thoracoscopic biopsy was not useful in predicting mortality from ILD. A low DLCO predicts postoperative respiratory mortality and morbidity after lung resection in patients (Fig. 2).35,36
Interpreting DLCO/Alveolar Volume
We recommend that DLCO/alveolar volume (VA) be removed from PFT reports because of widespread misunderstanding and incorrect interpretations in the United States (and probably many other countries). In patients undergoing evaluations for an ILD seen on a chest x-ray or lung high-resolution CT, the pattern of restriction with a low DLCO but a normal DLCO/VA is commonly misinterpreted as “diffusing capacity is normal when corrected for the low lung volumes.” Such a statement misleads the physician who ordered the test to think that a chest wall disorder has caused the restriction, when this pattern is in fact consistent with an ILD (the likelihood of which increases with the pre-test probability of an ILD). About half of patients with an ILD diagnosed by a lung high-resolution CT and lung biopsy have a normal DLCO/VA.37–39 A diffuse loss of alveolar units causes lung volumes to fall and the DLCO to become markedly abnormal (as shown in the ATS/ERS PFT interpretation flow chart1), but the DLCO/VA often remains normal or only mildly reduced. These relationships make the DLCO much better for predicting oxygen desaturation on exercise.33,40 Examples of diseases that cause diffuse loss of alveolar units include idiopathic pulmonary fibrosis (IPF), lung involvement with connective tissue diseases (such as scleroderma or systemic lupus), hypersensitivity pneumonitis, Pneumocystis carinii pneumonia secondary to acquired immune deficiency syndrome, and chronic congestive heart failure.
Savvy PFT experts in the United Kingdom and Switzerland have long been frustrated by the misconception that “DLCO/VA corrects the DLCO for the alveolar volume.”41,42 Replacing the term DLCO/VA with KCO (the transfer coefficient) may help correct the widespread misinterpretations. The upper limit of the normal range of KCO may be more helpful for the differential diagnosis than the KCO LLN, because KCO increases exponentially when VA is reduced42,43 (Fig. 3). This relationship is due to an increase in the surface to volume ratio for diffusion per alveolus as the alveoli become smaller (with a submaximal inhalation to TLC). There are many causes of incomplete lung expansion that cause the VA to decrease and the KCO to increase: they include diaphragm weakness (secondary to neuromuscular disease); submaximal inhalation of test gas (a common error when performing DLCO breathing maneuvers); and chest wall restriction due to obesity, kyphosis, scoliosis, or a pleural effusion. In these cases, the DLCO itself decreases only slightly (by about 3% for every 10% decrease in the VA) and usually remains within the normal range.
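The arithmetic behind the apparently paradoxical rise in KCO with incomplete lung expansion can be worked through with the rule of thumb just given. The sketch below treats "about 3% for every 10% decrease in the VA" as a straight-line relation, which is an assumption made for illustration (the measured relationship in Fig. 3 is curved), and expresses VA, DLCO, and KCO as fractions of their values at full inflation. The function names are illustrative.

```python
DLCO_LOSS_PER_UNIT_VA_LOSS = 0.3  # ~3% DLCO loss per 10% VA loss (rule of thumb)


def expected_dlco_fraction(va_fraction: float) -> float:
    """Expected DLCO, as a fraction of its full-inflation value, when
    healthy lungs are only expanded to va_fraction of maximal VA.
    Linear approximation; an assumption, not a fitted model."""
    if not 0.0 < va_fraction <= 1.0:
        raise ValueError("va_fraction must be in (0, 1]")
    return 1.0 - DLCO_LOSS_PER_UNIT_VA_LOSS * (1.0 - va_fraction)


def expected_kco_fraction(va_fraction: float) -> float:
    """KCO = DLCO / VA, so its expected fraction is the ratio of the two."""
    return expected_dlco_fraction(va_fraction) / va_fraction


if __name__ == "__main__":
    for va in (1.0, 0.9, 0.8, 0.7):
        print(f"VA {va:.0%}: DLCO {expected_dlco_fraction(va):.1%}, "
              f"KCO {expected_kco_fraction(va):.1%}")
```

Under this approximation, inflating the lungs to only 80% of maximal VA leaves DLCO at about 94% of its full-inflation value while KCO rises to about 117%. A "normal" KCO at a reduced VA is therefore lower than expected for healthy lungs, which is why a normal DLCO/VA does not exclude parenchymal disease.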
The KCO also increases with a discrete loss of alveolar units (which decreases the VA), for example following a lobectomy, pneumonectomy, lobar collapse, or a localized alveolar infiltrate (as in some stages of sarcoidosis).42 Because the blood flow of lost areas of the lung is diverted to the remaining healthy lung, KCO increases slightly. On the other hand, DLCO declines (but relatively less than the decline in VA).
It will require widespread re-education of the medical directors of PFT labs for them to understand the pattern of an apparently paradoxical increase in DLCO/VA and KCO with incomplete lung expansion or a discrete loss of alveolar units. Meanwhile (perhaps for the next decade), we recommend that the DLCO/VA be removed from PFT reports. Using only the DLCO for the differential diagnosis (per the ATS/ERS 2005 interpretation diagram) is simpler and reduces misinterpretation rates.1
Reference Values for Non-Whites
The classification of PFT impairment requires a thorough understanding of reference values and the populations on which they are based. The NHANES III spirometry reference equations provide spirometric values for whites, African-Americans, and Mexican-Americans. However, black subjects from other backgrounds may not be comparable to African-Americans. A recent reanalysis of NHANES III data found no interaction of ethnicity with age or height for FVC, FEV1, or FEV1/FVC.44 Values for Mexican-Americans were similar to those for other whites, but values for African-Americans were lower. Precision of the prediction equations derived from the full sample was greater than from ethnic-specific subsets (the 95% confidence limits were therefore narrower). The Global Lungs Initiative has disseminated reference equations for predicted and LLN FEV1 for whites, African-Americans and East Asians (from China, Korea, and Thailand).45 The white population was drawn from Europe, the United States, Canada, Brazil, Chile, Mexico, Uruguay, Venezuela, Algeria, and Tunisia. There were minimal differences in FEV1/FVC between the 3 major racial/ethnic groups. The lung function of Hispanic subjects from countries other than Mexico (such as the Dominican Republic or Puerto Rico) may differ from that of Mexican-Americans. There may also be differences between healthy people from northern versus southern China, Japan, Korea, Vietnam, and other Asian countries. While no consensus currently exists (due to a paucity of data or standardized analyses of data collected in these countries), the Kiefer et al44 and Global Lungs Initiative45 analyses are useful.
It is likely that static lung volumes differ between healthy white and black people, but no adequate reference equations for TLC, FRC, and RV have been published for non-whites. It has been common practice to reduce the white predicted values for TLC by 12–15% and for FRC and RV by 7%. DLCO reference equations for non-whites are sorely lacking. A study of only 42 nonsmoking African-Americans46 suggested reducing the DLCO predicted value for whites by 12%. A small study of healthy 17–23-year-old male Army recruits47 reported 6% lower DLCO values for African-Americans, when compared to whites.
Summary
In summary, there are large gaps in the evidence for optimally interpreting PFTs. Well conducted studies of the PFT patterns from well characterized groups of patients with a wide variety of diseases are urgently needed. DLCO and lung volume reference equations from healthy African-Americans, Hispanics, and Asian-Americans are urgently needed.
Discussion
Hnatiuk:
What do you think about evaluating severity of decline in FVC or restrictive pattern by using the FEV1 rather than the FVC, in terms of percent predicted, as the current ATS guidelines suggest? Second, how do you interpret bronchoprovocation tests when they meet the change criteria for FEV1, but the FVC also declines by a similar amount, and there's no evidence of obstruction to be found?
Miller:
To answer the second question first, this is somebody who drops below the 20% for FEV1 but it's accompanied by a similar fall in FVC. I consider that a positive test. And, as I've shown, many patients with asthma who have this nonspecific pattern, when they provoke, that's exactly how they do it. When you give them the bronchodilator at the end, both FVC and FEV1 go up. So they're bronchially hyper-reactive, and that might mean they're closing some airways, as would fit the theory that was demonstrated using magnetic resonance imaging. I think it's a useful concept. Can you repeat the first question?
Hnatiuk:
The current ATS guidelines1 suggest you comment on the severity of the restrictive pattern not based on the FVC percent predicted but on the FEV1. Do you agree?
Miller:
I wouldn't take great issue with that, except that if you look at many restrictive impairments from ILDs (we did this with almost 3,000 patients1 with asbestosis, which is a large number), the FVC falls more than the FEV1. One of the outcomes of that is an increase in ratio, which a lot of us use as part of the pattern in ILD and restrictive impairment. FEV1 falls as well, but I don't see any reason for it to replace FVC. I use FVC in grading those impairments.
Culver:
I'd like to comment on that as well. Not to defend the committee, but I was on the committee in the proofreading stage, and I was initially a little surprised by that as well, because it was certainly different from what I had been used to. I have grown accustomed to it, and I actually quite like having that uniform guide to impairment. We know from exercise tests that 40 × FEV1 pretty much sets our ventilatory limitation, so although FEV1 doesn't correlate terribly well with dyspnea, which is obviously multifactorial, it does correlate with exercise performance to some extent.
I find it quite helpful in things like the nonspecific pattern or patients with mixed disease when you don't have lung volumes, and you don't know what's what, and clearly you don't know what the separate severity of the restrictive and obstructive components is. I've come to use the term “ventilatory impairment,” and I can say, “OK this person's got a 60% FEV1; they have a moderate ventilatory impairment.” I don't know if they've got mild this or moderate that, but I know they have an impairment and I can emphasize that.
Similarly, with the nonspecific pattern, instead of trying to decide if I think this is incipient obstruction or restriction, and you can waffle about that in the discussion, but the bottom line is they have a ventilatory impairment of X amount. So I think that's quite a handy thing. Sometimes I even suggest to the fellows that you can do it for every PFT: just start off on the first line, “This patient has a ventilatory impairment of X amount due to obstructive disease/due to mixed disease/due to I don't know what but that's the impairment they've got.”
Miller:
If I could answer that, I often use that phrase too. I think because of what I said that you do have a disproportionate decrease in FVC or slow vital capacity versus FEV1, and we're grading the severity of that impairment: we're not just relating it to effort tolerance, as we would for an exercise test. It makes more sense for that kind of impairment to grade it by the vital capacity.
Kaminsky:
May I comment on that also? Physiologically, it makes sense to me to grade it based on FVC, because that's the pure volume-related parameter. But in trying to get a paper published,1 we tried to grade the severity of obstruction in a mixed pattern, and the reviewers were very insistent on us at least describing why we weren't basing it on a straight FEV1. Their argument was (and I do buy this) that of all the parameters we have in pulmonary physiology, the FEV1 remains the most robust in terms of overall survival in many studies, in terms of independent risk factors for survival. So if the idea is to grade this for overall impairment of functional ability, FEV1 makes sense at an epidemiological level.
If your idea is to grade this in terms of degree of restriction, I agree that it should probably be on the basis of FVC or TLC: something purely volume-related. We get into semantics. My first paper that I ever published2 was on “reversible restrictive lung disease,” which is the very entity you've been talking about. A young woman who had pure restriction based on spirometry, low TLC, we did pressure-volume curves on her, shifted down to the right, very low compliance, looked like interstitial disease, but CT scan didn't show anything. We gave her bronchodilator and she completely reversed, including the pressure-volume curve. She ended up going for an open lung biopsy, and what she had was constrictive bronchiolitis. So she had involvement of the very smallest airways that was causing the air trapping, in her case smooth muscle must have somehow been involved as well, although we hypothesized that maybe surfactant was involved when you gave her a beta agonist and this was what allowed her to re-open her airways. But no doubt it was the very peripheral regions of the lung, the small airways, that were involved in that pattern.
Miller:
By the way, the first suggestions of that in individual patients go back to the early 1970s. One of them was by someone among my generation, Charlotte Culp at Einstein.
Salzman:
Going back to the example of that nonspecific pattern from the World Trade Center group,1 in our small study2 we theorized that while the World Trade Center injury seemed to be an airway disease injury, it may be a very small airway disease injury that was relatively silent in terms of affecting the FEV1 and FEV1/FVC. Because of predominant airway closure, you ended up having this normal FEV1/FVC and reduced but still “normal” range FVC. The site of narrowing may make a big difference in the classic obstructive pattern versus the nonspecific or, if you will, restrictive spirometric pattern.
Miller:
As Hyatt pointed out,1 it doesn't matter where the airway closes: if it's closed, it doesn't contribute to your measurable volume. It fits wherever it happens; it's likely more often in the smaller airways.
Footnotes
- Correspondence: Albert Miller MD, Pulmonary Division, Beth Israel Medical Center, 286 1st Avenue, New York NY 10010-4902. E-mail: almiller{at}chpnet.org.
- Dr Miller presented a version of this paper at the 48th Respiratory Care Journal Conference, “Pulmonary Function Testing,” held March 25–27, 2011, in Tampa, Florida.
- The authors have disclosed no conflicts of interest.
- Copyright © 2010 by Daedalus Enterprises Inc.