Abstract
BACKGROUND: The performance of spirometers is often measured only under ideal conditions, with a mechanical simulator reproducing the standard computer-generated expiratory American Thoracic Society (ATS) curves. Studies have questioned the relevance of these results to real-life conditions. The aim of this study was to evaluate the accuracy and precision of 5 office spirometers with a flow-volume simulator using both the ATS curves and flow-volume curves obtained from patients.
METHODS: We measured the FVC, peak expiratory flow, and FEV1 by applying different dynamic waveforms with a computer-driven syringe, the Hans Rudolph flow-volume simulator. In addition to the standard curves recommended by the ATS, we also tested curves obtained from subjects.
RESULTS: The precision of the office spirometers was good and comparable using the standard ATS curves. One device performed best in terms of accuracy and precision according to the ATS recommendations, but Bland-Altman analysis revealed significant biases in all devices, particularly with the curves obtained from subjects with severe COPD.
CONCLUSIONS: The overall quality of most spirometers makes them acceptable for the detection of pulmonary diseases. However, we demonstrated accuracy issues not revealed by the standard testing procedure. We propose to improve the testing of spirometers by implementing more realistic flow-volume curves and by refining the analysis of the results.
- chronic obstructive pulmonary disease
- COPD
- benchmarking
- instrumentation
- quality control
- spirometry
- technology assessment
Introduction
To extend and facilitate the diagnosis of respiratory diseases, office spirometers specially adapted to the practice of general medicine have been introduced on the market.1 A major objective of these office spirometers is to diagnose obstructive lung diseases such as asthma and COPD.2
The performance of spirometers is often assessed under ideal conditions with a mechanical flow-volume simulator designed to reproduce the standard expiratory American Thoracic Society (ATS) curves. These curves, the standard test waveform set,3 are generated by a computer following the standardized procedure outlined by the ATS/European Respiratory Society (ERS) in 2005.4 The validity of these results has previously been studied by comparing the performance of some office spirometers with that of standard laboratory devices in patients tested under real-life conditions. We showed that many of the investigated spirometers had defects despite ATS certification.5 We therefore hypothesized that the standard ATS curves do not represent some of the features routinely seen in flow-volume loops from patients.
The aim of this study was to evaluate the accuracy and precision of 5 different office spirometers using the Hans Rudolph flow-volume simulator (Hans Rudolph, Shawnee, Kansas) to generate both standard ATS curves and flow-volume curves obtained from healthy volunteers and subjects with respiratory disorders. We also aimed to refine the analysis of the data using the Bland-Altman method.6
QUICK LOOK
Current knowledge
The performance of spirometers is evaluated using a mechanical simulator that reproduces the standard American Thoracic Society curves generated by a computer. These evaluations are generally performed under ideal conditions and may not reflect the performance of spirometers in typical clinical use.
What this paper contributes to our knowledge
Independent testing of 5 office spirometers with a flow-volume simulator revealed important accuracy issues when using flow-volume loops obtained from actual patients. The data suggest that the standards should be re-evaluated using more realistic flow-volume curves. The results cannot determine whether such changes would lead to improvements that alter diagnostic results.
Methods
We asked the sales representatives of office spirometers available in Belgium to demonstrate one production model of their device and to make it available for research. Table 1 presents the 5 devices tested. All were reported by their manufacturers to comply with the ATS/ERS guidelines.
All devices report values at body temperature and pressure saturated (BTPS) conditions. In this study, we analyzed the performance of the spirometers only during the expiratory part of an FVC maneuver, for which the BTPS correction factor must be known. Among the 5 spirometers under investigation, only the PocketSpiro (MEC Group, Brussels, Belgium) allows switching between BTPS and ambient temperature and pressure conditions.
We scrupulously followed the manufacturers' instructions, in particular concerning the handling of the devices and the need for calibration checks. Each spirometer except the SpiroScout (Ganshorn, Niederlauer, Germany) requires a calibration before measurements. Depending on the device, this involves either a calibration check (MicroLoop, Micro Medical, Rochester, United Kingdom) or a recalibration according to ATS standards. Both procedures consist of delivering different flows with a calibrated 3-L syringe stored at room temperature.
To evaluate the different spirometers, we used the Hans Rudolph series 1120 flow-volume simulator. This device can generate flows of up to 16 L/s and volumes of up to 8.5 L. It can impose any flow-volume curve previously sampled with the Waveform Editor software.
Protocol
After calibration or a calibration check according to the manufacturer's instructions, we first injected ambient air into the spirometers using a reference syringe (Hans Rudolph 3-L calibration syringe, volume of 2,993.32 mL [−0.22%] at 22.6°C). This test was performed 3 times, at different speeds (maximum flows of 1, 2, and 3 L/s). During this step, the volume measured by the spirometers may slightly exceed the volume of air delivered by the calibration syringe. Indeed, any volume of air introduced into a spirometer is artificially increased by the BTPS factor, which corrects for the difference in temperature and water vapor pressure between the airways and the outside air. This also applies to the 3 L of ambient air delivered by the calibration syringe. We observed that, except for the PocketSpiro, the devices use a fixed expiratory BTPS factor; in other words, the volumes measured during expiration do not change with the ambient variables. We therefore obtained the expiratory BTPS factors from the manufacturers and applied them to correct the volumes measured by the spirometers. According to the ATS/ERS guidelines, the volumes measured with the 3-L syringe should meet the accuracy requirement of ± 3.5%.
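The correction described above can be sketched in Python. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, and the conventional ATPS-to-BTPS formula is shown only to indicate where a fixed expiratory factor typically comes from (in the study, the factors were supplied by the manufacturers). The reading of 3.24 L and the factor of 1.08 in the example are hypothetical; only the syringe volume and the ± 3.5% limit come from the text.

```python
def btps_factor(t_amb_c: float, p_bar_mmhg: float, p_h2o_mmhg: float) -> float:
    """Conventional ATPS -> BTPS volume conversion factor (illustrative only).

    t_amb_c:    temperature of the injected gas, degrees C
    p_bar_mmhg: barometric pressure, mm Hg
    p_h2o_mmhg: water vapour partial pressure of the injected gas, mm Hg
    47 mm Hg is the saturated water vapour pressure at 37 degrees C.
    """
    return (310.15 / (273.15 + t_amb_c)) * (p_bar_mmhg - p_h2o_mmhg) / (p_bar_mmhg - 47.0)


def syringe_check(measured_l: float, fixed_factor: float,
                  syringe_l: float, tol_pct: float = 3.5) -> tuple[float, bool]:
    """Remove a fixed expiratory BTPS factor from a syringe reading.

    Returns the percentage error versus the syringe volume and whether it
    lies within +/- tol_pct (the +/- 3.5% limit quoted in the text).
    """
    corrected_l = measured_l / fixed_factor  # back to ambient conditions
    pct_error = 100.0 * (corrected_l - syringe_l) / syringe_l
    return pct_error, abs(pct_error) <= tol_pct


# Hypothetical reading of 3.24 L from a device applying a fixed factor of 1.08,
# checked against the 2,993.32 mL reference syringe used in the study.
error_pct, ok = syringe_check(3.24, 1.08, 2.99332)
print(f"{error_pct:+.2f}% within limit: {ok}")
```

Dividing by the device's own fixed factor, rather than recomputing a factor from room conditions, mirrors the situation described in the text, where the expiratory factor does not vary with ambient variables.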
Thereafter, the instrument resistance of each device was calculated by submitting each spirometer to a range of constant flows (between 0.5 and 11 L/s) generated by the Hans Rudolph simulator. Pressure drop across the spirometer was measured using the pressure transducer of the simulator. The resistance of the tubing at a given flow was subtracted from the total resistance to obtain the instrument resistance. These measurements were done in triplicate.
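The subtraction step can be sketched as follows, assuming resistance is taken as pressure drop divided by flow at each constant flow and that the triplicate values are averaged. The function name and all numerical values are hypothetical; only the method and the 1.5 cm H2O/L/s ATS/ERS limit cited in the Results come from the text.

```python
from statistics import mean

MAX_RESISTANCE = 1.5  # cm H2O/L/s, ATS/ERS limit quoted in the Results


def instrument_resistance(flows_l_s, dp_total_runs, dp_tubing):
    """Mean instrument resistance (cm H2O/L/s) at each constant flow.

    flows_l_s:     constant flows delivered by the simulator, L/s
    dp_total_runs: triplicate pressure drops (cm H2O) across tubing + spirometer
    dp_tubing:     pressure drop (cm H2O) across the tubing alone at each flow
    """
    result = []
    for flow, runs, p_tubing in zip(flows_l_s, dp_total_runs, dp_tubing):
        r_tubing = p_tubing / flow
        # total resistance minus tubing resistance, averaged over the runs
        result.append(mean(p / flow - r_tubing for p in runs))
    return result


flows = [2.0, 7.0, 11.0]
runs = [[1.0, 1.1, 0.9], [6.0, 6.3, 6.0], [19.8, 20.0, 20.2]]
tubing = [0.4, 1.4, 2.2]
r = instrument_resistance(flows, runs, tubing)
too_high = [f for f, ri in zip(flows, r) if ri >= MAX_RESISTANCE]
print([round(x, 3) for x in r], too_high)
```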
Different waveforms from the computer-controlled simulator were introduced into the spirometers, each curve being imposed 3 times on each spirometer. The peak expiratory flow (PEF), FVC, and FEV1 measured by the spirometer were compared with the actual values delivered by the simulator. For this part of the study, 9 curves from the standard ATS test waveform set were selected for their complexity and to cover a broad range of all studied parameters. Five of these were volume-time curves (curves 1–4 and 12), and 4 were flow-time curves (curves 2, 8, 9, and 25). In addition to these standard curves, we selected 6 flow-time curves obtained from subjects: one healthy volunteer, 3 subjects with stage IV COPD, and 2 subjects with restrictive lung disease (Fig. 1 and Table 2). These spirometries, provided by the lung function laboratory of Ghent University Hospital (Ghent, Belgium), were performed on a Vmax Spectra (software version 12-7; SensorMedics, Yorba Linda, California). Flow, time, and integrated volume signals were recorded at a sampling rate of 500 Hz, saved in an Excel sheet, and subsequently loaded onto the Hans Rudolph simulator.
All the ATS and subject curves were tested under BTPS conditions. An external humidifier (HC150 with Ambient Tracking, Fisher & Paykel Healthcare, Auckland, New Zealand) and the heater of the lung simulator were used to obtain heated (37°C) and humidified (100% relative humidity) air. This setup allowed us to mimic human conditions with no more than 2 min between successive maneuvers. Temperature and humidity were checked before each flow generation (Dicon SM, Jumo, East Syracuse, New York).
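Deriving PEF, FEV1, and FVC from a flow-time record such as those sampled at 500 Hz can be sketched as below. This is a simplified illustration under stated assumptions, not the software of the Vmax Spectra or of the simulator: volume is obtained by trapezoidal integration, time zero is set by conventional back-extrapolation (the tangent to the volume-time curve at the point of peak flow), FVC is taken as the maximal integrated volume, and end-of-test criteria are ignored. The function name and the test curve are hypothetical.

```python
def spirometric_indices(flow_l_s, fs_hz=500.0):
    """Return (PEF in L/s, FEV1 in L, FVC in L, t0 in s) from expiratory flows."""
    dt = 1.0 / fs_hz
    # cumulative expired volume by trapezoidal integration
    volume = [0.0]
    for a, b in zip(flow_l_s, flow_l_s[1:]):
        volume.append(volume[-1] + 0.5 * (a + b) * dt)

    # peak flow = steepest point of the volume-time curve
    i_pef = max(range(len(flow_l_s)), key=flow_l_s.__getitem__)
    pef = flow_l_s[i_pef]

    # back-extrapolation: the tangent at peak flow crosses zero volume at t0
    t0 = i_pef * dt - volume[i_pef] / pef

    # FEV1 = volume at t0 + 1 s, linearly interpolated between samples
    k = max(t0 + 1.0, 0.0) / dt
    i = int(k)
    if i + 1 < len(volume):
        fev1 = volume[i] + (k - i) * (volume[i + 1] - volume[i])
    else:
        fev1 = volume[-1]  # record shorter than t0 + 1 s

    fvc = max(volume)
    return pef, fev1, fvc, t0


# Piecewise-linear test curve: 0 -> 8 L/s in 0.05 s, then linear fall to 0 over 2 s
rise = [8.0 * i / 25 for i in range(26)]
fall = [8.0 * (1 - j / 1000) for j in range(1, 1001)]
pef, fev1, fvc, t0 = spirometric_indices(rise + fall)
```

For this triangular test curve, the volume at peak flow is 0.2 L, so the back-extrapolated time zero falls 25 ms after the start of the record, and the FVC is the total area, 8.2 L.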
Analysis
The readings provided by the different spirometers with the 3-L syringe at 3 different flows were individually compared with the expected values. Instrument resistances were expressed in cm H2O/L/s and presented as means of 3 measurements for each different flow.
The repeatability and accuracy of the 3 measured variables (PEF, FEV1, and FVC) were calculated for each of the 5 spirometers, under BTPS conditions only, in accordance with the 2005 ATS/ERS guidelines. To investigate repeatability, we computed the span of each of the 3 variables for each of the 15 curves. For a given variable, the span was defined as the difference between the maximum and minimum values of the successive maneuvers: span = maximum − minimum. The percentage span was defined as: percentage span = 100 × (maximum − minimum)/average, where the average is the mean of the 3 successive maneuvers.
The accuracy of the spirometers was assessed in different ways. We first calculated the absolute and percentage deviations of the 3 variables for the 15 curves and, for each spirometer, counted the number of accuracy errors as defined by the ATS/ERS guidelines: deviation = average − standard; percentage deviation = 100 × (average − standard)/standard. The average is the mean of 3 successive maneuvers, and the standard is the target value of the curve (ATS or subject) delivered by the simulator.
We used the average of 3 measurements of each parameter for the Bland-Altman analysis. For each curve and each spirometer, we calculated the difference between the volume (or flow) measured by the spirometer and the volume (or flow) generated by the simulator (in practice, the target value of the ATS and subject curves); the bias was defined as the mean of these differences. The bias between each spirometer and the Hans Rudolph simulator was calculated separately for the ATS and subject curves. P < .05 was considered significant, indicating a bias significantly different from zero. A positive bias means that, on average, the tested spirometer overestimates the measurements with respect to the simulator, and a negative bias means that it underestimates them.
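The two repeatability measures defined above translate directly into code. A minimal sketch; the function names and the 3 readings are hypothetical.

```python
from statistics import mean


def span(values):
    """span = maximum - minimum of the successive maneuvers."""
    return max(values) - min(values)


def percentage_span(values):
    """percentage span = 100 x (maximum - minimum) / average."""
    return 100.0 * span(values) / mean(values)


fvc_readings = [3.02, 3.05, 2.99]  # three maneuvers of one curve, L (hypothetical)
print(round(span(fvc_readings), 3), round(percentage_span(fvc_readings), 2))
```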
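The deviation formulas and the error count can be sketched as follows. The function names are hypothetical, and the limit is passed in as a parameter; the defaults are the BTPS volume limit quoted in the Discussion (± 4.5% or ± 200 mL, whichever is greater), while limits for other conditions or for PEF would have to be taken from the ATS/ERS document itself. The maneuver values are hypothetical.

```python
from statistics import mean


def deviation(measured, standard):
    """Return (deviation, percentage deviation) of the mean of the maneuvers."""
    average = mean(measured)
    dev = average - standard
    return dev, 100.0 * dev / standard


def count_accuracy_errors(curves, rel_tol=0.045, abs_tol=0.200):
    """Count curves whose |deviation| exceeds max(rel_tol x standard, abs_tol).

    curves: iterable of (three measurements, standard) pairs, in litres.
    """
    errors = 0
    for measured, standard in curves:
        dev, _ = deviation(measured, standard)
        if abs(dev) > max(rel_tol * standard, abs_tol):
            errors += 1
    return errors


curves = [([2.90, 2.91, 2.92], 2.901),   # mean 2.910 L: within the limit
          ([2.10, 2.11, 2.117], 2.901)]  # mean 2.109 L: outside the limit
print(count_accuracy_errors(curves))
```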
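The bias and its test against zero can be sketched as below. Several details are assumptions not stated in the text: significance is assessed with a two-sided one-sample t test on the per-curve differences (equivalently, by checking whether the 95% CI of the bias excludes zero), the critical t values are hard-coded for up to 11 curves (enough for the 9 ATS and 6 subject curves), and the conventional Bland-Altman 95% limits of agreement (bias ± 1.96 SD) are returned as well. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def bland_altman(measured, target):
    """Bias (mean of measured - target), its significance, and 95% LoA.

    measured: averages of 3 maneuvers per curve for one spirometer
    target:   values generated by the simulator for the same curves
    Supports 2 to 11 curves (df = n - 1 must be in T_975).
    """
    diffs = [m - t for m, t in zip(measured, target)]
    n = len(diffs)
    bias = mean(diffs)
    sd = stdev(diffs)
    half_ci = T_975[n - 1] * sd / sqrt(n)
    significant = abs(bias) > half_ci  # 95% CI of the bias excludes zero
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, significant, limits


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # 6 hypothetical curves, L
device = [1.10, 2.12, 3.09, 4.11, 5.10, 6.08]  # reads about 0.1 L high
bias, significant, limits = bland_altman(device, targets)
```

A positive bias here corresponds to overestimation by the spirometer, matching the sign convention in the text.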
This study was approved by the ethics committee (B403201318648) of the Cliniques Universitaires Saint Luc (Brussels, Belgium).
Results
Volume Measurements With Calibration Syringe at Different Flows
Table 3 presents the volumes measured by the different spirometers after injection of 3 L of air with the calibration syringe at different flows under ambient conditions. Volume measurements were slightly affected by flow but remained within the ± 3.5% calibration limit proposed by the ATS/ERS guidelines.
Instrument Resistance
Figure 2 shows the resistance values of each spirometer for different flows. Resistance increased with flow except for the PocketSpiro, which uses a variable orifice membrane sensor. Four of the 5 spirometers exhibited a resistance of < 1.5 cm H2O/L/s at a flow of 14 L/s. The MicroLoop presented a resistance exceeding 1.5 cm H2O/L/s for flows above 7 L/s and did not comply with the recommendations of the ATS/ERS guidelines.
Repeatability
The repeatability of all spirometers tested under BTPS conditions is shown in Table 4. None of the measured values fell outside the limits defined by the ATS/ERS guidelines.
Accuracy
The results of the 9 ATS and 6 subject curves were analyzed separately. Table 5 presents the FVC data, and Table 6 shows, for each spirometer, the mean ± SD of the FEV1, FVC, and PEF generated by the simulator and the mean ± SD of the biases, in absolute values and as percentages. The number of tests falling outside the ATS/ERS limits is also given (errors).
Of the 5 spirometers, only the SpiroStar (Medriko, Kuopio, Finland) did not meet the ATS/ERS requirements for accuracy of PEF measurement (see Table 6). The PocketSpiro and SpiroScout, for their part, did not measure the FVC accurately for all the ATS curves. Figure 3 shows that the SpiroStar underestimated all the FEV1 values while meeting all the ATS/ERS criteria when the curves were measured under BTPS conditions. In this case, the bias was proportional to the measured FEV1 (r2 = 0.92), indicating a proportional error. In fact, almost all spirometers had a significant bias, that is, a systematic tendency to overestimate or underestimate the values. Table 6 shows that significant biases were also present for the FVC and PEF measurements.
Biases differed markedly between the ATS curves and the subject curves, those of the ATS curves more often remaining within acceptable limits. In particular, the FVC of the SpiroScout (Fig. 4) and the PEF of the SpiroStar were clearly beyond the ATS/ERS limits. Moreover, the biases for the curves from subjects with COPD exceeded those for the curves from healthy subjects and subjects with restrictive disorders, as illustrated by the Bland-Altman plot (see Fig. 3 and Table 6).
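A proportional error of the kind described for the SpiroStar can be screened for by regressing the differences on the measured values and inspecting the slope and r2. A minimal sketch with hypothetical data, in which a device reads 5% low on every curve; Bland-Altman plots conventionally use the mean of the two methods on the x-axis, whereas this sketch follows the text and uses the measured value. The function name is hypothetical.

```python
from statistics import mean


def proportional_error(measured, target):
    """Least-squares slope and r^2 of (measured - target) against measured."""
    diffs = [m - t for m, t in zip(measured, target)]
    mx, md = mean(measured), mean(diffs)
    sxx = sum((x - mx) ** 2 for x in measured)
    sdd = sum((d - md) ** 2 for d in diffs)
    sxd = sum((x - mx) * (d - md) for x, d in zip(measured, diffs))
    slope = sxd / sxx
    r2 = sxd ** 2 / (sxx * sdd)
    return slope, r2


targets = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical FEV1 targets, L
readings = [0.95 * t for t in targets]  # device reading 5% low
slope, r2 = proportional_error(readings, targets)
print(round(slope, 4), round(r2, 3))
```

A negative slope with an r2 close to 1 indicates that the underestimate grows with the measured value, which is the pattern a purely constant offset would not produce.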
Discussion
In this study, an independent test of 5 office spirometers using a validated flow-volume simulator revealed significant accuracy issues with some devices under BTPS conditions. Moreover, we highlighted some important disagreements in volumes and flows between the simulator and the office spirometers, which became apparent with curves obtained from subjects with severe COPD but remained unnoticed with the currently recommended ATS curves.
This study has some limitations that must be addressed before any further discussion and contextualization of the data. First, we tested only one device of each spirometer model, except for the MicroLoop, for which we requested another flow sensor to double-check the unexpectedly high resistance. The number of units of each model to be tested is not addressed in the ATS/ERS guidelines, which state only that a production spirometer should be tested with the pump system. Future studies could answer this question by testing several units of each spirometer instead of only one. We scrupulously followed the manufacturers' recommendations, performing a true calibration of the devices that require it and only a calibration check when this was recommended or feasible, thereby mimicking real-life conditions as closely as possible.
Another limitation is that we did not update the software of the devices during the course of the study, and new devices that reached the market after the start of the study were not included. From a theoretical point of view, we might have missed some of the defects that spirometers may present because we tested only a subset of the 50 curves currently recommended by the ATS/ERS guidelines. However, the 9 selected curves were among the most complex and extreme curves in the ATS set. Although our data are not ideally suited to true comparisons between office spirometers, they remain useful from a methodological point of view because the spirometers were tested with a sample of curves representing patients currently seen in daily practice, in addition to the recommended ATS curves. Admittedly, it would have been interesting to extend the present observations to a wider range of subject curves. Indeed, the curves obtained from our subjects with COPD had similar characteristics in terms of FEV1, precluding any generalization of our conclusions. Nevertheless, with only 3 COPD subjects randomly selected from the database of an out-patient clinic, we identified problems in 3 spirometers that were partly or completely overlooked with the conventional ATS curves.
We observed that both the simulator and the spirometers provided highly reproducible results and that the calibration check with the 3-L syringe remained within the expected limits. However, none of the devices tested was perfect in terms of accuracy for volumes and flows during a forced expiratory maneuver.
Indeed, the ATS/ERS guidelines define acceptable spirometer performance as < 3 accuracy errors for either FVC or FEV1 across the 24 ambient waveforms and require all 4 BTPS waveforms to be measured within the accuracy limits. Although the FEV1 always met these criteria, 3 spirometers produced individual deviations between the target FVC value and the value they measured. Moreover, the Bland-Altman analysis revealed a tendency for 3 spirometers to overestimate the real value. Thus, a spirometer can present a significant bias even if it fully meets the ATS/ERS requirements, as seen with the SpiroStar. The existence of a systematic bias and/or proportional errors may limit the interchangeability of measurements between different spirometers and calls into question the reliability of data generated with portable spirometers in epidemiologic studies. The clinical importance of the errors depends on the variable tested. The largest errors were found in the FVC measurements (see Table 5). FVC is a major outcome parameter in several clinical studies on the treatment of idiopathic pulmonary fibrosis7 and a robust predictor of survival in patients with interstitial lung diseases. Even assuming that the long-term repeatability of all the devices we tested is good, the presence of biases may preclude their interchangeability during the follow-up of patients.
As an example, using COPD curve 3 with a target FVC of 2.901 L, we obtained 2.109 L with the SpiroScout and 3.002 L with the SpiroStar. When we look at the average bias (see Table 6), we observe a significant negative bias with the subject set of curves with the SpiroScout (−0.281 L), but no significant bias with the SpiroStar (−0.002 L). During a longitudinal study or follow-up of patients, the biases we observed may become a clinically important issue if we have to replace the spirometer. Therefore, when a device must be replaced, if the same model is not available, we suggest choosing a model with the same characteristics, that is, the same bias.
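The arithmetic behind this example can be made explicit. Applying the ATS/ERS BTPS volume limit quoted in the Discussion (± 4.5% or ± 200 mL, whichever is greater) to a subject curve is done here only for illustration, since the guidelines define that limit for the standard waveforms.

```latex
\begin{aligned}
\text{SpiroScout: } & 2.109 - 2.901 = -0.792\ \text{L} \;(\approx -27.3\%) \\
\text{SpiroStar: }  & 3.002 - 2.901 = +0.101\ \text{L} \;(\approx +3.5\%) \\
\text{limit: }      & \max(0.045 \times 2.901,\ 0.200) \approx \max(0.131,\ 0.200) = 0.200\ \text{L}
\end{aligned}
```

On this single curve, the SpiroScout deviation is roughly 4 times the limit, the SpiroStar reading falls within it, and the 2 devices differ from each other by 0.893 L.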
A new and surprising finding of this study was that the SpiroScout, SpiroBank (MIR, Rome, Italy), and SpiroStar failed to accurately measure some of the FVC or PEF values of the curves obtained from subjects. This raises the question of whether the currently recommended ATS curves should be supplemented by a series of curves characterized by a prolonged, very low expiratory flow, as seen in patients with severe COPD.
Manufacturers claim that all spirometers presented in this study comply with the ATS/ERS requirements. This label implies that their accuracy and repeatability were checked with the standard curves using a computer-driven piston pump. The ATS/ERS guidelines state that, under ambient conditions, acceptable spirometer performance is defined as < 3 accuracy errors for either FVC or FEV1 across the 24 waveforms (< 5% error rate) and that, under BTPS conditions, all 4 curves should be measured within the limits (± 4.5% or ± 200 mL, whichever is greater). Because these errors often occur in the most complex curves, problems arising with very demanding curves, such as those with a steep rise time or the low expiratory flows seen in patients with severe COPD, may easily be missed, and an ATS/ERS compliance label may be granted to devices that do not deserve it.
Careful analysis of our data indicated that some office spirometers had difficulty measuring flows and volumes correctly when the curves presented the following features: short rise time, high-frequency oscillations, and low flows over a long expiratory time. Other variables, such as the extrapolated volume and a rapid decrease in flow after the PEF, might also influence the results. These characteristics are more frequently found in patients with severe COPD (Global Initiative for Chronic Obstructive Lung Disease stage IV). Because it is difficult to find a real-life COPD patient presenting all these features in one curve, such a curve, called the critical curve, was recently created by combining the critical features found in our COPD patients. Its flow-volume representation and its characteristics are given in Figure 5 and Table 7. It is characterized by a short rise time, a rapid decrease in flow after the PEF, and very low flows over a few seconds with high-frequency oscillations. In a preliminary study, we tested all the spirometers with this new curve under BTPS conditions and readily identified the 2 devices that had also presented problems in the present study. Whether this critical curve might be useful in the initial evaluation of newly marketed office spirometers remains to be investigated in a prospective study.
In light of these findings, we can imagine a new procedure to certify spirometers. If our concept of a critical curve could be implemented, the test procedure could start with 3 curves, each with one particular feature (low flow, short rise time, or high-frequency oscillations), followed by a fourth curve combining all these critical features. In a second step, some of the current ATS curves (perhaps 8–10 curves) would be retained to test the spirometers across the range of spirometric values measured in a sample of patients and healthy volunteers. Although it is more time-consuming, we favor testing under BTPS conditions, which are closer to real-life conditions. We also propose including in the expression of the results a Bland-Altman analysis, which is not mentioned in the ATS/ERS guidelines, because this method helps to identify problems due to proportional errors and bias. Finally, the spirometers should be tested under real-life conditions in a comparative assessment with standard validated spirometers in a substantial sample of patients and healthy volunteers. There is no consensus on the number of healthy volunteers and patients required to test spirometers, but published studies have used from 158 to 759 subjects. Such a study should not be performed with spirometers connected in series because we have demonstrated that this method is not valid.10
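The two acceptance rules quoted above can be written as simple predicates. This is a sketch with hypothetical names and data. Reading "< 3 accuracy errors for either FVC or FEV1" as fewer than 3 errors for each variable separately is an interpretation, and the ambient accuracy limit itself is not given here, so the ambient check takes the error counts as input.

```python
def within_btps_limit(measured_l, target_l, rel_tol=0.045, abs_tol=0.200):
    """True if |measured - target| <= max(4.5% of target, 200 mL)."""
    return abs(measured_l - target_l) <= max(rel_tol * target_l, abs_tol)


def ambient_acceptable(fvc_errors, fev1_errors):
    """Fewer than 3 accuracy errors across the 24 ambient waveforms.

    Assumption: the rule is applied to FVC and FEV1 separately.
    """
    return fvc_errors < 3 and fev1_errors < 3


def btps_acceptable(results):
    """All 4 BTPS waveforms measured within the limits.

    results: list of (measured, target) volumes in litres, one per waveform.
    """
    return len(results) == 4 and all(within_btps_limit(m, t) for m, t in results)


# Hypothetical device: passes both rules, although the rules do not consider
# on which waveforms its 2 ambient FVC errors occurred.
print(ambient_acceptable(2, 0))
print(btps_acceptable([(3.00, 3.10), (2.00, 2.05), (1.00, 1.10), (4.00, 4.15)]))
```

Because both rules only count or threshold errors, they cannot reveal whether failures cluster on the most demanding waveforms, which is the concern raised in the paragraph above.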
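To illustrate the kind of waveform described above, the sketch below synthesizes a flow-time curve with a short rise time, a rapid exponential fall after the PEF, and a long low-flow tail carrying high-frequency oscillations, sampled at 500 Hz as in the Methods. It is not the authors' critical curve (whose characteristics are given in Table 7): the function name, the shape, and every parameter value are hypothetical.

```python
import math


def critical_like_waveform(pef=4.0, rise_s=0.04, tau_s=0.15, tail_flow=0.10,
                           osc_amp=0.05, osc_hz=12.0, duration_s=6.0,
                           fs_hz=500.0):
    """Hypothetical expiratory flow-time curve (L/s), sampled at fs_hz.

    - linear rise from 0 to pef within rise_s (short rise time)
    - exponential fall with time constant tau_s towards tail_flow
    - sinusoidal oscillation of amplitude osc_amp at osc_hz that fades in
      as the flow approaches the low-flow tail
    """
    n = int(round(duration_s * fs_hz)) + 1
    flow = []
    for i in range(n):
        t = i / fs_hz
        if t <= rise_s:
            f = pef * t / rise_s
        else:
            s = t - rise_s
            decay = math.exp(-s / tau_s)
            f = tail_flow + (pef - tail_flow) * decay
            f += osc_amp * (1.0 - decay) * math.sin(2.0 * math.pi * osc_hz * s)
        flow.append(max(f, 0.0))
    flow[-1] = 0.0  # end of the maneuver
    return flow


curve = critical_like_waveform()
```

With these defaults, the oscillation amplitude never exceeds half the tail flow, so the flow stays positive until the final sample; whether any particular shape reproduces the failures reported here would have to be verified experimentally on a simulator.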
Conclusions
Independent testing of 5 office spirometers with a flow-volume simulator revealed important accuracy issues that became particularly apparent using flow-volume loops obtained in patients. These findings invite the scientific community to reconsider the currently accepted procedures to evaluate spirometers. We propose to improve the testing of spirometers by implementing new flow-volume curves that are more realistic than the original ones. By refining the analysis of the results, we may identify some biases not shown by the ATS/ERS procedure. However, whether such improvements may eventually affect the quality of the diagnostic process in patients with respiratory symptoms remains to be demonstrated.
Acknowledgments
We thank the MEC Group and particularly J.-Y. Moens Eng (Medical Electronic Construction, Brussels, Belgium) for allowing us to use his Hans Rudolph series 1120 flow-volume simulator.
Footnotes
- Correspondence: Giuseppe Liistro MD PhD, Pneumology Department, Cliniques Universitaires Saint Luc, Université Catholique de Louvain, 10 avenue Hippocrate, 1200 Brussels, Belgium. E-mail: giuseppe.liistro@uclouvain.be.
Dr Liistro presented a version of the paper at the American Thoracic Society 2011 International Conference, held May 13–18, 2011, in Denver, Colorado.
The authors have disclosed a relationship with the MEC Group.
- Copyright © 2014 by Daedalus Enterprises