Abstract
BACKGROUND: Test-retest reproducibility of the 6-min step test (6MST) is controversial in patients with COPD because the decision to perform a second test is influenced by interruptions, physiological overload, and the patient’s exercise tolerance. The aim of this study was to analyze the reproducibility of performance on the 6MST (ie, number of steps climbed and interruptions) and physiological variables in subjects with COPD, with and without poor exercise tolerance, and with and without interruptions during the test.
METHODS: Subjects performed 2 6MSTs (6MST1, 6MST2) with a minimum of 30 min rest between tests. Physiological variables were assessed with a gas analyzer. Subjects who performed ≤ 78 steps in the 6MST1 and ≤ 86 steps in the test with the higher number of steps performed (6MSTBEST) were considered to have poor exercise tolerance. Subjects were also stratified according to whether or not they interrupted the 6MSTBEST.
RESULTS: 40 subjects (31 men; FEV1 percent of predicted = 50.4 ± 13.5) participated in the study. The number of steps, interruptions, and physiological variables showed moderate to high reliability (intraclass correlation coefficient: 0.70–0.99, P < .001). Thirty-one (77.5%) subjects had a better performance during 6MST2 than 6MST1 (mean difference: 4.65 ± 5.59 steps, P < .001). Although the number of times subjects were interrupted was similar between the 2 tests (P = .66), the duration of these interruptions was shorter during 6MST2 (mean difference: –0.12 ± 0.39 s, P = .040). The improvement in the number of steps (6MST2 − 6MST1) did not differ between subjects who performed ≤ 78 steps (mean difference: 5.64 ± 5.32 steps; 10.3%; P < .001) and those who performed ≥ 79 steps (3.00 ± 5.82 steps; 6.13%; P = .08) on the 6MST1 (P = .15), or between subjects who performed ≤ 86 steps (5.39 ± 5.14 steps; 9.39%; P < .001) and those who performed ≥ 87 steps (2.92 ± 6.43 steps; 2.74%; P = .14) on the 6MSTBEST (P = .20).
CONCLUSIONS: Performance and physiological variables in the 6MST were reproducible, and a second test did not impose greater physiological overload. Two tests were essential for patients with poor exercise tolerance.
Introduction
Patients with COPD often show poor exercise tolerance and reduced ability to perform activities of daily living.1 Reduced functional capacity is associated with poor outcomes in COPD, such as number of hospitalizations and mortality.2,3 Functional capacity can be assessed objectively with field tests, such as the 6-min step test (6MST),4 which is a safe, inexpensive, and easy-to-administer test with reduced space requirements.5
The 6MST is valid and reliable when performed by the same assessor on a single day4,6,7; it also shows responsiveness to a physical training program8 for subjects with COPD. Two previous studies recommended the administration of a single 6MST,4,6 unlike recommendations for other self-paced field tests that assess functional capacity, in which a learning effect plays an important role in performance.9 da Costa et al6 compared the performance of 3 6MSTs and found no significant differences among tests; however, the use of analysis of variance for such comparisons may have contributed to a type-2 error. A learning effect of 6.28% in the second 6MST performance was observed in a study conducted by our group.7 However, the clinical relevance of this result has not been thoroughly investigated, and it cannot be concluded that performing a single 6MST is sufficiently reliable to assess functional capacity.
In our previous study, we reported that the 6MST has a submaximal physiological profile,7 similar to the 6-min-walk test (6MWT),10 and that the performance of 2 6MSTs with a 30-min rest interval does not appear to impose greater physiological overload.7 However, this recovery period could be insufficient for patients with poor exercise tolerance, as the 6MST may induce overloads near maximum effort. Therefore, it is relevant to analyze the responses in this group. Another interesting finding in our study was that most subjects interrupted the 6MST, and that these interruptions were reproducible (intraclass correlation coefficient [ICC] = 0.97, P < .001).7 There may be a reduction in the number and duration of interruptions as well as an improvement in performance on the second 6MST, as patients become familiar with the test and may feel more comfortable performing it. In addition, subjects with greater functional limitations have a stronger learning effect in field tests,11-13 although there is no evidence of this outcome for the 6MST. Such an effect would justify a second test, especially in these patients.
The aim of this study was to analyze the reproducibility of the 6MST considering performance and physiological variables in subjects with COPD, with and without poor exercise tolerance, and with and without interruptions during the test.
QUICK LOOK
Current Knowledge
Patients with COPD present reduced functional capacity that can be evaluated with the 6-min step test (6MST). The 6MST is safe, easy to perform in restricted spaces, and has a low equipment cost. In addition, it is a valid test and its outcome (number of steps climbed) presents high intra-rater reliability.
What This Paper Contributes to Our Knowledge
In subjects with COPD, the 6MST was reproducible and presented moderate to high reliability in its performance and physiological variables. The second 6MST did not cause a cardiovascular, ventilatory, or metabolic overload for the subjects. The fact that subjects with poor exercise tolerance had a significant improvement in the second 6MST indicates the need to perform a second test in this population.
Methods
This is a cross-sectional study, approved by the Ethics Committee for Research Involving Human Beings of State University of Santa Catarina (UDESC; 51369015.5.0000.0118). This study was performed at UDESC, Florianópolis, Santa Catarina, Brazil. Patients with COPD enrolled in the Center for Assistance, Teaching, and Research in Pulmonary Rehabilitation at UDESC participated in this study. The inclusion criteria were diagnosis of COPD (GOLD 2–4),14 age 40–80 y, use of medication as prescribed, and clinical stability in the last 4 weeks. The exclusion criteria were inability to perform the evaluations; use of home oxygen therapy; presence of other diseases in addition to COPD (ie, cardiovascular, pulmonary, neurological, and musculoskeletal diseases) that could interfere with the performance of the evaluations; exacerbations of COPD during the protocol; hospitalizations within the last 12 weeks; current smokers or smoking cessation for < 6 months; and participation in pulmonary rehabilitation programs in the last 6 months. Informed written consent was obtained from all subjects.
The protocol was implemented on 2 separate days. On day 1, pulmonary function was assessed and the modified Medical Research Council (mMRC) scale was applied. On day 2, 2 6MSTs were performed and physiological variables were measured. Lung function was assessed using a MasterScreen Body whole-body plethysmograph (Erich Jaeger, Friedberg, Germany), following the guidelines of the American Thoracic Society and the European Respiratory Society15,16 as well as reference values for the Brazilian population.17,18 The mMRC19 scale was used to measure dyspnea and grade subjects according to GOLD.14 Two 6MSTs (6MST1 and 6MST2) were performed on the same day, using a step 20 cm high. Subjects were instructed to perform the maximum number of steps in a self-paced manner during 6 min.4,6,20 Verbal encouragement was given throughout the test, and it was identical to that given during the 6MWT.9 The 6MST was interrupted in the following situations: subject’s request, heart rate > 85% of the predicted maximum heart rate,21,22 and SpO2 < 85%.6 Subjects were instructed to resume the test whenever they felt able, and when heart rate reached 10 beats/min below submaximal heart rate or SpO2 reached 88%. The interval between tests was 30 min or until signs and symptoms returned to baseline. The number of steps and the frequency and duration of interruptions were computed for performance analysis. Subjects were classified according to exercise tolerance: with poor (≤ 78 steps) and without poor (≥ 79 steps) exercise tolerance on the first 6MST (6MST1);4 and with poor (≤ 86 steps) and without poor (≥ 87 steps) exercise tolerance on the 6MST with the best performance (6MSTBEST; ie, the test with the higher number of steps performed).4 In addition, subjects were stratified into groups with and without interruptions during the 6MSTBEST.
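The stratification rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`StepTestPair`, `classify`) are hypothetical, ties between the two tests are resolved in favor of the first test, and "interrupted" is taken to mean at least one interruption during the best test. None of these details are specified in the protocol.

```python
from dataclasses import dataclass

# Cutoffs quoted in the Methods; the constant names are illustrative only.
POOR_TOLERANCE_CUTOFF_FIRST = 78  # steps on 6MST1
POOR_TOLERANCE_CUTOFF_BEST = 86   # steps on 6MSTBEST


@dataclass
class StepTestPair:
    """Step counts and interruption counts for one subject's two tests."""
    steps_test1: int
    steps_test2: int
    interruptions_test1: int
    interruptions_test2: int

    @property
    def best_is_second(self) -> bool:
        # Assumption: on a tie, the first test is treated as the best test.
        return self.steps_test2 > self.steps_test1

    @property
    def steps_best(self) -> int:
        return max(self.steps_test1, self.steps_test2)

    @property
    def interruptions_best(self) -> int:
        if self.best_is_second:
            return self.interruptions_test2
        return self.interruptions_test1


def classify(pair: StepTestPair) -> dict:
    """Return the three stratification flags for one subject."""
    return {
        "poor_tolerance_6mst1": pair.steps_test1 <= POOR_TOLERANCE_CUTOFF_FIRST,
        "poor_tolerance_best": pair.steps_best <= POOR_TOLERANCE_CUTOFF_BEST,
        "interrupted_best": pair.interruptions_best > 0,
    }


# Hypothetical subject: 76 steps with 1 interruption, then 82 steps with none.
example = classify(StepTestPair(76, 82, 1, 0))
```

Keeping the two cutoffs as separate constants mirrors the fact that a subject can change classification between the 6MST1 and the 6MSTBEST.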
The K4b2 portable metabolic monitoring device (Cosmed, Rome, Italy) was used to perform breath-by-breath analysis of the following physiological variables during the 6MST: heart rate, SpO2, breathing frequency, tidal volume (VT), minute ventilation (V̇E), oxygen uptake (V̇O2), carbon dioxide production (V̇CO2), oxygen pulse (V̇O2/heart rate), and metabolic equivalent of task (MET). The predicted maximum voluntary ventilation22 and ventilatory demand (V̇E/maximum voluntary ventilation) were calculated. The values of the physiological variables at rest and at test end as well as the change between these values (Δ = end – rest) were used for the analyses.
Statistical Analysis
Data were analyzed with SPSS Statistics 20.0 (IBM, Armonk, New York). The Shapiro-Wilk test assessed data distribution. The ICC was used to verify the test-retest reproducibility of performance and physiological variables. Reliability was classified as low (ICC < 0.40), moderate (ICC 0.40–0.75), or high (ICC > 0.75).23 The Bland-Altman plot was used to evaluate the agreement between performance and physiological variables for the test-retest comparison. The standard error of measurement (SEM) was calculated as $\mathrm{SEM} = \mathrm{SD} \times \sqrt{1 - \mathrm{ICC}}$, where SD is the standard deviation of 6MST1; the minimum detectable change (MDC) was calculated as $\mathrm{MDC} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$.24 Performance and physiological variables between test-retest were compared using the paired t test or the Wilcoxon test. To compare the variables between subjects classified according to exercise tolerance and interruption of the 6MST, the independent t test or the Mann-Whitney U test was used. The Pearson or Spearman correlation coefficient was used to verify the correlation between 6MST1 performance and the variation between test-retest (6MST2 – 6MST1). The chi-square test and the Cramer V coefficient were used to verify the association between the classifications regarding exercise tolerance and interruption. A significance level of 5% was set for this study. To achieve an ICC between test-retest of at least 0.50, with a bidirectional α of 0.05 and a power of 90%, the sample size was estimated to be 38 subjects.25
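As a minimal sketch of the agreement statistics named above, the following Python functions compute the SEM, the MDC, and Bland-Altman bias with 95% limits of agreement. The formulas assumed here are the commonly used forms, SEM = SD × √(1 − ICC) and MDC = 1.96 × √2 × SEM, which are consistent with the SEM and MDC values reported in the Results. The function names and the step counts in the usage lines are hypothetical, and the analyses in the study itself were run in SPSS, not with this code.

```python
import math
from statistics import mean, stdev


def sem(sd_first_test: float, icc: float) -> float:
    """Standard error of measurement from the SD of the first test and the ICC."""
    return sd_first_test * math.sqrt(1.0 - icc)


def mdc95(sem_value: float) -> float:
    """Minimum detectable change at the 95% confidence level."""
    return 1.96 * math.sqrt(2.0) * sem_value


def bland_altman(test1, test2):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of (test2 - test1); the limits of agreement are
    bias +/- 1.96 times the sample SD of the differences.
    """
    diffs = [second - first for first, second in zip(test1, test2)]
    bias = mean(diffs)
    sd_diff = stdev(diffs)
    return bias, bias - 1.96 * sd_diff, bias + 1.96 * sd_diff


# Hypothetical step counts for four subjects (not study data).
steps_test1 = [60, 72, 85, 90]
steps_test2 = [66, 75, 88, 91]
bias, lower, upper = bland_altman(steps_test1, steps_test2)
```

A positive bias in this convention means that, on average, more steps were climbed on the second test.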
Results
A total of 49 patients were eligible for this study. Of these, 9 subjects were excluded: 5 were unable to perform the 6MST (3 were unable to tolerate the mask for the metabolic system, and 2 reported hip pain); 1 subject exhibited arrhythmia during the test; and 3 subjects experienced a COPD exacerbation during the protocol. Therefore, the total sample included 40 subjects (Table 1).
Reproducibility in the Total Sample
The number of steps on the 6MST2 was 6.44% higher than on the 6MST1 (76.8 ± 28.7 vs 72.2 ± 29.8 steps, respectively; mean difference: 4.65 ± 5.59 [95% CI 2.86–6.44], P < .001). The number of interruptions on the 6MST1 and 6MST2 was similar (1 [range 0–4] vs 1 [range 0–4], P = .66, respectively); however, the duration of interruptions was shorter on the 6MST2 (0.99 ± 1.14 vs 1.11 ± 1.26 s; mean difference: –0.12 ± 0.39 [95% CI –0.24 to 0.004], P = .040). Thirty-one subjects (77.5%) performed better on the 6MST2. The SEM was 4.21, and the MDC was 11.7 steps. Only 5 subjects (12.5%) showed variation higher than the MDC in the 6MST2 performance.
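The summary figures in this paragraph can be checked by hand. The worked arithmetic below is a sketch under two assumptions that are not stated in the text: the 95% CI of the mean difference is based on the t distribution with 39 degrees of freedom, and the MDC is computed as 1.96 × √2 × SEM. Both assumptions reproduce the reported values.

```latex
\begin{align*}
\text{relative improvement} &= \frac{4.65}{72.2} \times 100\% \approx 6.44\% \\[4pt]
\text{95\% CI} &= 4.65 \pm t_{0.975,\,39} \times \frac{5.59}{\sqrt{40}}
  \approx 4.65 \pm 2.023 \times 0.884
  \approx 4.65 \pm 1.79
  \;\Rightarrow\; [2.86,\ 6.44] \\[4pt]
\mathrm{MDC} &= 1.96 \times \sqrt{2} \times \mathrm{SEM}
  \approx 1.96 \times 1.414 \times 4.21 \approx 11.7 \text{ steps} \\[4pt]
\text{subjects above the MDC} &= \frac{5}{40} \times 100\% = 12.5\%
\end{align*}
```

The mean improvement of 4.65 steps is therefore well below the MDC of 11.7 steps, which is the comparison the Discussion relies on.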
The 6MST performance showed high reliability (ICC 0.97–0.98, P < .001). Figure 1 shows the agreement of the number of steps between test-retest, and most subjects presented better performance on the 6MST2, especially those with worse performance. There was no correlation between the number of steps on the 6MST1 and its variation between test-retest in the total sample (r = –0.28, P = .08).
Subjects showed the same initial clinical conditions in both tests except for dyspnea (Table 2). There was a difference between initial and final physiological variables in the 6MST1 and 6MST2 (Table 2). There was no statistically significant difference in the change of physiological variables between test-retest, or between their final values, except for final heart rate (mean difference: –2.17 ± 6.45 beats/min [95% CI –4.24 to –0.11], P = .039) and final V̇O2/heart rate (mean difference: 0.40 ± 1.11 mL O2 consumed per heart beat [95% CI 0.04–0.76], P = .03). However, all variables showed moderate to high reliability (Table 3). Figure 2 shows the agreement between 6MST1 and 6MST2 for breathing frequency, V̇E/maximum voluntary ventilation, and the other plotted physiological variables, confirming the similar behavior of the variables between test-retest, with no trends to decrease or increase on the 6MST2.
Reproducibility Stratified by Exercise Tolerance and Interruption
Twenty-five (62.5%) subjects showed poor exercise tolerance considering 6MST1 performance; of these 25 subjects, 3 had performance ≥ 79 steps on the 6MST2. Considering the cutoff point of 86 steps for the 6MSTBEST, only 1 subject changed classification. Subjects with 6MST1 ≤ 78 steps showed a lower number of steps, and longer and more frequent interruptions of the 6MSTBEST than those with 6MST1 ≥79 steps (Table 4). Considering the cutoff point for the 6MSTBEST, the 6MST group with ≤ 86 steps also showed a lower number of steps and longer duration of interruptions in the 6MSTBEST than the 6MST group with ≥ 87 steps (Table 4). For all groups, the number of steps and the duration and frequency of interruptions between test-retest showed high reliability (ICC = 0.87–0.98, P ≤ .001). The variation in the number of steps (6MST2 – 6MST1) among subjects who performed ≤ 78 steps on the 6MST1 and ≤ 86 steps on the 6MSTBEST (5.64 ± 5.32 steps [95% CI 3.44–7.83], 10.3%; and 5.39 ± 5.14 steps [95% CI 3.40–7.38], 9.39%, P < .001 for both, respectively) and ≥ 79 steps on the 6MST1 and ≥ 87 steps on the 6MSTBEST (3.00 ± 5.82 steps [95% CI –0.22 to 6.22], 6.13%, P = .08; and 2.92 ± 6.43 steps [95% CI –1.17 to 7.00], 2.74%, P = .14) did not differ statistically (P = .15 and P = .20, respectively). The SEM and MDC of all groups are described in Table 4. The number of steps on the 6MST1 correlated with its variation between test-retest only in the group with ≥ 79 steps on the 6MST1 (r = –0.52, P = .045).
Twenty-four subjects (60%) interrupted the 6MSTBEST; of these, 18 subjects (75%) performed ≤ 78 steps on the 6MST1 (Cramer V = 0.32, P = .046). The subjects who interrupted the test performed fewer steps on the 6MSTBEST than those who did not interrupt the test (Table 4). Both groups showed better performance on the 6MST2 than on the 6MST1: those with test interruptions increased the number of steps by 4.79 ± 5.14 (95% CI 2.62–6.96, P = .001), and those without test interruption increased the number of steps by 4.44 ± 6.38 (95% CI 1.04–7.83, P = .02), representing an increase of 7% and 5%, respectively. This improvement in performance did not differ between groups (P = .85). In addition, the test-retest performance showed high reliability for those who interrupted the test (ICC = 0.99 [95% CI 0.92–1.00], P < .001) and for those who did not interrupt the test (ICC = 0.95 [95% CI 0.79–0.98], P < .001).
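The association between interruption and poor exercise tolerance can be illustrated with a short Python sketch. The 2 × 2 table used below is derived from the counts reported in the text (40 subjects, 25 with ≤ 78 steps on the 6MST1, 24 who interrupted the 6MSTBEST, 18 of whom had ≤ 78 steps); the function names are hypothetical, and the use of an uncorrected Pearson chi-square is an assumption, although it does reproduce the reported Cramer V and P value.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, n


def cramers_v_2x2(a: int, b: int, c: int, d: int):
    """Return (Cramer V, P value) for a 2 x 2 table."""
    chi2, n = chi_square_2x2(a, b, c, d)
    # For a 2 x 2 table, min(rows, columns) - 1 equals 1.
    v = math.sqrt(chi2 / n)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return v, p_value


# Rows: interrupted / did not interrupt the 6MSTBEST.
# Columns: <= 78 steps / >= 79 steps on the 6MST1.
v, p_value = cramers_v_2x2(18, 6, 7, 9)
```

The closed-form P value relies on the identity between the chi-square distribution with 1 degree of freedom and the squared standard normal, so no statistics library is needed.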
Subjects with poor exercise tolerance and those with test interruptions during the 6MST showed no difference regarding the initial physiological variables between 6MST1 and 6MST2 (P > .05), except for heart rate (6MST1: 88.0 ± 13.6 beats/min, 6MST2: 85.5 ± 12.4 beats/min, P = .02), breathing frequency (6MST1: 20.7 ± 3.70 breaths/min, 6MST2: 19.6 ± 4.06 breaths/min, P = .04), and dyspnea (6MST1: 0 [range 0–3] and 6MST2: 0 [range 0–3], P = .04) in the 6MSTBEST group with ≤ 86 steps.
Discussion
This study extensively investigated the reproducibility of performance and physiological variables in the 6MST. Its major outcomes endorse previous findings reported by our group, demonstrating that the performance and physiological variables in the 6MST show moderate to high reliability and that subjects improve performance on a second test (mean difference of 6.44%).7 Interestingly, when the sample was stratified by cutoff points, this difference ranged from 9% to 10% in subjects with poor exercise tolerance. We also observed that these subjects had more frequent test interruptions, and the duration of these interruptions decreased in the 6MST2.
A previous study6 reported high reliability (ICC = 0.94 [95% CI 0.89–0.97]) between 2 6MSTs performed on the same day, with a 30-min interval between tests. However, the authors identified no differences between the 2 performances, unlike what we observed in this study. Nevertheless, the mean difference of the test-retest in the study conducted by da Costa et al6 was similar to that found in this study (5.8 vs 4.6 steps, respectively). Two considerations should be taken into account: the smaller sample size reported by da Costa et al6 and the use of analysis of variance for repeated measures in the comparison of performance among the 3 tests may have increased the chance of a type-2 error. Pessoa et al4 suggest that there is no need for a second 6MST, given that 6MST1 demonstrates concurrent and predictive validity. However, the authors did not analyze reliability and did not compare performance between test-retest; they also highlighted that the sample was smaller than that recommended to assess validity, suggesting the need for further studies. In contrast, our results agree with the findings of Munari et al,7 namely that the 6MST is reliable and has a learning effect of approximately 6%; the power for this analysis was 97%.
The subjects in the total sample showed shorter interruptions and performed a greater number of steps on the 6MST2 in this study. However, this result must be carefully analyzed, given that the SEM we noted was close to that found by da Costa et al6 and was similar to the mean test-retest difference. Studies that aimed to evaluate reproducibility of other functional tests, such as the 6MWT and the Glittre ADL-test, verified a mean percentage difference in performance similar to what we noted in this study, suggesting the presence of a learning effect and therefore the value in performing 2 tests.11,26 We also identified that the mean difference between 6MST2 and 6MST1 was below the MDC and that only 5 subjects showed a difference above the MDC. Given that the MDC reflects the smallest change in the outcome of the instrument that can be considered as a real difference,24,27 the reliability and responsiveness of the instrument would be compromised if the mean difference exceeded the MDC.
Motivational factors play an important role in the performance of field tests, especially in self-paced tests.28 Patients may feel uneasy about performing an unknown test at its maximum work load due to feelings of anxiety, lack of confidence, or fear of experiencing dyspnea.29-31 Performing a second test gives patients the opportunity to become familiar with the task, and feeling more secure may lead to improved performance. This behavior is common in other functional tests,11,13 such as the 6-min walk test, in which patients with greater dyspnea in the first test are better able to control it, reducing this sensation on the second test and being more likely to improve performance.32 Although only a trend of correlation was found between the number of steps on the 6MST1 and its variation on the 6MST2 in the total sample, the Bland-Altman plot (Fig. 1) shows that subjects who took fewer steps on the 6MST1 exhibited increased performance on the 6MST2. It was also observed that, in mean percentage, the learning effect of subjects with poor exercise tolerance and with test interruptions was higher than for those without poor exercise tolerance and those with no test interruptions, although the differences were not statistically significant. In addition, because SEM was lower than the mean difference in the groups of patients with poor exercise tolerance and test interruptions, it is possible that the improvement in performance on the 6MST2 could be relevant for these subjects. In contrast, patients with reduced functional limitation usually reach levels close to the maximum overload on the first test, reducing the possibility of improving performance on the second test.33 In this study, in subjects without poor exercise tolerance, the number of steps tended to increase on the retest, and, in the correlation analysis, the greater the number of steps on the 6MST1, the smaller the change on the 6MST2. Therefore, in patients without poor exercise tolerance, a second test may not be necessary.
Our analyses indicate that the performance of a second test after a 30-min interval does not imply greater physiological overload to subjects. This occurred even in subjects with poor exercise tolerance and test interruptions. Similar results were found in the Glittre-ADL test,34 and therefore our study confirms that physiological variables return to baseline values after a short recovery period. Although differences in final heart rate and V̇O2/heart rate were observed between test-retest in the total sample, the final heart rate was slightly lower in the retest, indicating that the 6MST2 does not impose a greater cardiovascular overload, even with significant improvement in performance. Furthermore, the difference was quite small and probably not clinically relevant. Our previous study had already demonstrated the stabilization of heart rate from the fifth minute of the 6MST7; therefore, it is possible to ensure that subjects reach a stable heart rate even when performing 2 tests.
This was a complementary study to another study conducted by our group that provided additional information on reproducibility of performance and learning effect on the test.7 We suggest performing a second 6MST to enable a more reliable assessment of functional capacity, especially in subjects with poor performance on the 6MST, given that increases of 6–10% in the retest, as observed in this study, can easily be obtained in response to an intervention, impairing the interpretation of its effects.35 In addition, our results confirm the reliability of the number of steps and SEM of the 6MST, which may support further studies that intend to assess its responsiveness to interventions.24
Some limitations should be noted. A type-2 error may have occurred in secondary analyses due to the sample size. However, the calculated sample size was reached for the primary aim of this study, so this limitation does not appear to have significantly affected the major findings. Only a few subjects in the sample were classified as GOLD 4; therefore, the results should be cautiously extrapolated to this population.
Conclusions
The 6MST was reproducible in terms of performance and physiological variables for subjects with COPD, without imposing a greater overload when a second test was performed. In addition, performing 2 tests is essential for patients with poor exercise tolerance.
Footnotes
- Correspondence: Anamaria Fleig Mayer PhD, Departamento de Fisioterapia, Núcleo de Assistência, Ensino e Pesquisa em Reabilitação Pulmonar, Universidade do Estado de Santa Catarina, Rua Pascoal Simone 358, CEP 88080-350, Florianópolis, Santa Catarina, Brazil. E-mail: anamaria.mayer{at}udesc.br
This study was supported in part by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (Finance Code 001) and Fundação de Amparo à Pesquisa e Inovação do Estado de Santa Catarina (Termo de Outorga 2017TR645). The authors have disclosed no conflicts of interest.
- Copyright © 2021 by Daedalus Enterprises