Introduction

Weaning from mechanical ventilation represents an important issue, because both an early and a delayed extubation can burden the patient’s health, increasing the risk of infections and the length of hospital stay.

Although many patients are stable immediately after disconnection from mechanical ventilation, spontaneous breathing can become progressively less effective in sustaining adequate ventilation, sometimes requiring the reinstitution of mechanical ventilation. This underlines the importance of identifying predictors of weaning outcome. Many studies have assessed whether weaning outcome can be predicted reliably in critically ill patients [1, 2, 3, 4, 5, 6, 7, 8]. One of the major methodological limitations of many of these studies was the lack of blinding [9].

The aim of this study was to conduct a prospective, blinded evaluation of the most widely used predictors of weaning in a non-selected sample of critically ill patients. We analyzed several indexes, including airway occlusion pressure (P0.1), maximal inspiratory pressure (MIP), respiratory frequency to tidal volume (f/VT) ratio, P0.1 combined with MIP (P0.1/MIP) and with the f/VT ratio (P0.1 × f/VT), minute ventilation, respiratory rate, tidal volume and vital capacity. From a first group of patients (training set) we identified the threshold values for each index; we then tested the predictive accuracy of these values in a prospective-validation set of patients.

Patients and methods

Ninety-three patients were evaluated; their clinical characteristics are reported in Table 1. Before the weaning trial, all the patients were receiving pressure support ventilation 10–15 cmH2O and PEEP 3–5 cmH2O. All patients were intubated with orotracheal tubes, 7.5–8.5 mm internal diameter. The ventilator used was the Servo Ventilator 300 (Siemens, Sweden).

Discontinuation from mechanical ventilation was attempted when the primary physician judged that the patient was ready to be weaned, according to the following criteria: (a) the cause for starting mechanical ventilation had resolved or clearly improved; (b) body temperature was below 38.5°C; (c) hemoglobin was equal to or higher than 8 g/dl; (d) no intravenous sedatives had been given for at least 24 h before the weaning trial; (e) there were no clinical signs of left ventricular failure and no cardiac rhythm or conduction disturbances [10]. These are the standard clinical criteria commonly adopted in our intensive care unit when deciding whether a patient is ready to be weaned. When all these criteria were met, the ability of the patient to sustain spontaneous breathing was evaluated with a 2-h T-piece trial.

During the first 2 min after discontinuation of mechanical ventilation the following tests were performed. Six were single variables: vital capacity (ml/kg), tidal volume (ml/kg), airway occlusion pressure (cmH2O), minute ventilation (l), respiratory rate (breaths/min) and maximal inspiratory pressure (cmH2O). Three were derived variables: f/VT (breaths/min per l), P0.1/MIP and P0.1 × f/VT (cmH2O/breaths per min per l). Our measurement techniques have been extensively described in a previous study [3]. The data obtained were not available to the attending physician, who was therefore unaware of the results of the weaning tests and independently took the decision to continue the T-piece trial or reinstitute ventilatory support.

During the 2-h period of spontaneous breathing, tolerance was continuously evaluated by the attending physician. The trial was stopped if at least one of the following intolerance criteria was present: respiratory rate above 35 (breaths/min); PaO2 below 65 mmHg with FIO2 less than 0.6; pH 7.34 or less; heart rate equal to or above 130 beats/min or increased by 20% or more, or if arrhythmias appeared; systolic blood pressure without inotropes below 80 mmHg or above 200 mmHg; ineffective cough; uncoordinated thoracoabdominal movement; activation of the accessory muscles; agitation or depressed mental status [11].

If the patient had poor clinical tolerance, ventilation was restarted; if, however, the patient remained stable at the end of the 2 h, the endotracheal tube was removed. A weaning trial was considered a failure when the patient did not tolerate the spontaneous breathing trial and required reconnection to mechanical ventilation. Weaning was considered successful if spontaneous breathing was sustained for more than 48 h after extubation. Finally, extubation was considered a failure if the patient required reintubation within 48 h.

The study had two different parts: during the first one, data were used to select the cut-off value for weaning predictors. The selected values were those that resulted in the fewest false classifications. During the second part, the threshold value for each index was assessed prospectively in an additional group of patients. All the patients in the first 4 months of the study served as the training set and those in the following 4 months as the prospective-validation set.

A true positive (TP) result was defined as when a test predicted successful weaning and weaning actually occurred; a false positive (FP) result was defined as when a test predicted successful weaning but weaning failed; a false negative (FN) result was defined as when a test predicted weaning failure but it was indeed successful; a true negative (TN) result was defined as when a test predicted weaning failure and the patient really failed the weaning trial [5].

Receiver operating characteristic curve analysis was performed with MedCalc software version 6.10.001 (2001 Frank Schoonjans, Belgium) [12]. This analysis provides a powerful means of assessing a test’s ability to discriminate between two groups of patients with the advantage that the analysis does not depend on the threshold value selected. The value selected as the threshold value was the one that had the highest accuracy (minimal false negative and false positive results).
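The threshold selection described above (the cut-off giving the fewest false positive and false negative results) can be illustrated with a minimal sketch. This is not the MedCalc procedure used in the study; the function name `best_threshold`, the tie-breaking rule and the example data are assumptions made purely for illustration.

```python
from typing import List, Tuple


def best_threshold(values: List[float], weaned: List[bool],
                   higher_predicts_success: bool = True) -> Tuple[float, int]:
    """Return (threshold, errors) for the cut-off with the fewest FP + FN.

    The test is positive (predicts successful weaning) when value > threshold
    if higher_predicts_success is True, otherwise when value <= threshold.
    """
    # Candidate cut-offs: every observed value, plus one below the minimum so
    # that both "all positive" and "all negative" classifications are tried.
    candidates = [min(values) - 1.0] + sorted(set(values))
    best_t, best_errors = candidates[0], len(values) + 1
    for t in candidates:
        errors = 0
        for v, success in zip(values, weaned):
            predicted_success = (v > t) if higher_predicts_success else (v <= t)
            if predicted_success != success:
                errors += 1  # either a false positive or a false negative
        if errors < best_errors:  # assumption: ties keep the lowest threshold
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical index values and outcomes (True = successful weaning)
example_values = [15.0, 18.0, 20.0, 22.0, 25.0, 28.0, 30.0]
example_weaned = [False, False, True, False, True, True, True]
print(best_threshold(example_values, example_weaned))
```

For an index where lower values predict success (as with f/VT), the same function can be called with `higher_predicts_success=False`.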

Standard formulas were used to calculate the sensitivity TP/(TP+FN), specificity TN/(TN+FP), accuracy (TP+TN)/(TP+TN+FP+FN), likelihood ratio of positive test (ρ+) = sensitivity/(1-specificity) and likelihood ratio of negative test (ρ-) = (1-sensitivity)/specificity. Positive and negative likelihood ratios are independent of the prevalence of the disease. The cut-off values were then assessed in an additional group of 51 patients and the predictive performance of each index was evaluated by calculating the area under the receiver operating characteristic curve (AUC).
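As an illustration of the standard formulas above, the following sketch computes sensitivity, specificity, accuracy and both likelihood ratios from the four cells of a 2×2 table. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study tables.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the standard indices of test performance from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Likelihood ratios; guard against division by zero at the extremes.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts: 40 TP, 20 FP, 10 FN, 30 TN
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```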

According to an arbitrary guideline [13], one could distinguish between non-informative (AUC=0.5), less accurate (0.5<AUC≤0.7), moderately accurate (0.7<AUC≤0.9), highly accurate (0.9<AUC<1) and perfect test (AUC=1) [14]. If the 95% confidence interval for the area does not include the 0.5 value, there is evidence that the test has an ability to distinguish between the two groups [12].
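The guideline above can be sketched as a small classifier for AUC values. The AUC estimate shown here uses the rank-based (Mann-Whitney) formulation, which is a common non-parametric approach and is assumed here only for illustration; it is not claimed to be the exact algorithm of the software used in the study. The function names and example data are hypothetical.

```python
from typing import List


def auc_rank(success_values: List[float], failure_values: List[float]) -> float:
    """Rank-based AUC: probability that a randomly chosen successful patient
    has a higher index value than a randomly chosen failure (ties count 0.5)."""
    wins = 0.0
    for s in success_values:
        for f in failure_values:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5
    return wins / (len(success_values) * len(failure_values))


def auc_category(auc: float) -> str:
    """Map an AUC to the categories of the guideline quoted in the text."""
    if auc == 1.0:
        return "perfect"
    if auc > 0.9:
        return "highly accurate"
    if auc > 0.7:
        return "moderately accurate"
    if auc > 0.5:
        return "less accurate"
    if auc == 0.5:
        return "non-informative"
    # Values below 0.5 are not covered by the guideline (assumption).
    return "below chance (not covered by the guideline)"


# Hypothetical index values in weaned and non-weaned patients
example_auc = auc_rank([24.0, 26.0, 30.0], [20.0, 24.0, 28.0])
print(round(example_auc, 3), auc_category(example_auc))
```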

In the prospective-validation set, the prevalences of weaning success and weaning failure were calculated. We also calculated the likelihood ratios = (ρ+)/(ρ-) for each index in the prospective-validation set [15]. Likelihood ratios between 0.5 and 2.0 indicate that a weaning parameter is associated with only small changes in the post-test probability of success or failure. Likelihood ratios from 2 to 5 and from 0.3 to 0.5 correlate with small but potentially important changes in probability, while ratios of 5–10 or 0.1–0.3 correlate with more clinically important changes in probability. Ratios higher than 10 or lower than 0.1 correlate with very large changes in probability [16, 17, 18, 19].
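The interpretation bands for likelihood ratios quoted above can be written as a simple lookup. The text does not say how values falling exactly on a boundary (0.1, 0.3, 0.5, 2, 5, 10) should be classified, so the boundary handling below is an assumption, as are the function name and the label wording.

```python
def lr_band(lr: float) -> str:
    """Classify a likelihood ratio by the size of the change it produces
    in post-test probability, following the bands quoted in the text."""
    if lr <= 0:
        raise ValueError("likelihood ratio must be positive")
    if lr > 10 or lr < 0.1:
        return "very large"
    # Assumption: boundary values 5, 10, 0.1 and 0.3 fall in this band.
    if lr >= 5 or lr <= 0.3:
        return "clinically important"
    # Assumption: boundary values 0.5 and 2 fall in the "small" band.
    if lr > 2 or lr < 0.5:
        return "small but potentially important"
    return "small"


for value in (0.05, 0.2, 0.4, 1.0, 3.0, 7.0, 12.0):
    print(value, "->", lr_band(value))
```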

Finally, in the prospective-validation set, according to Sassoon [5], we used Bayes’ theorem to assess the performance of each test in predicting weaning outcome as a function of prevalence of weaning success and failure in our population. Bayes’ theorem allows the calculation of the probability of success or failure of weaning after the performance of a test (post-test probability). The formulae used to calculate post-test probability are shown in Table 5 [20].
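A minimal sketch of the post-test probability calculation by Bayes’ theorem follows. It expresses the probability of weaning success after a positive or a negative test in terms of sensitivity, specificity and the prevalence of weaning success. The function name is hypothetical, and the example combines the sensitivity and specificity quoted for P0.1 in the Discussion with the prevalence of success in the prospective-validation set purely for illustration; it is not meant to reproduce the values in Table 5.

```python
from typing import Tuple


def post_test_success(sensitivity: float, specificity: float,
                      prevalence_success: float) -> Tuple[float, float]:
    """Return (P(success | test positive), P(success | test negative))."""
    p_success = prevalence_success
    p_failure = 1.0 - prevalence_success
    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity
    # Bayes' theorem: true positives over all positive tests
    after_positive = (sensitivity * p_success) / (
        sensitivity * p_success + false_positive_rate * p_failure)
    # Bayes' theorem: false negatives over all negative tests
    after_negative = (false_negative_rate * p_success) / (
        false_negative_rate * p_success + specificity * p_failure)
    return after_positive, after_negative


# Illustrative inputs only: sensitivity 0.94, specificity 0.07, prevalence 0.72
pos, neg = post_test_success(0.94, 0.07, 0.72)
print(f"P(success | T+) = {pos:.3f}, P(success | T-) = {neg:.3f}")
```

With these illustrative inputs both post-test probabilities stay close to the pre-test probability of 0.72, which is what a test with little discriminative ability would be expected to show.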

The results are reported as means ± standard deviation. Comparison between proportions was made using the chi-square test (with Yates’ correction for continuity); comparison between means was made using the F-test: a probability of less than 0.05 was considered significant.

Results

We included 93 patients; their main characteristics are described in Table 1. Initially, we intended to distinguish between patients with extubation failure and those with weaning failure. However, we could not make such a distinction because only one patient in the study required reintubation within 48 h after extubation. The prevalence of extubation failure in this study was only 0.011 (1/93).

Table 1 Clinical characteristics and cause of the acute respiratory failure in the entire study population (ALI acute lung injury, ARDS acute respiratory distress syndrome, CHF cardiac heart failure, COPD chronic obstructive pulmonary disease, FW weaning failure, MV duration of mechanical ventilation before weaning trial, PEEP positive end-expiratory pressure, PSV pressure support ventilation, SAPS II Simplified Acute Physiologic Score II, SW successful weaning). Note that one patient with extubation failure was not included in the analysis

We did not observe significant differences between the “successful weaning” and “weaning failure” groups in clinical characteristics, diagnosis, sex, weight, height, duration of mechanical ventilation before the weaning trial or Simplified Acute Physiologic Score II, in either the training set or the prospective-validation set (Table 2). No statistically significant difference between the two groups was observed in heart rate, systolic blood pressure, pH or PaO2 (Table 3).

Table 2 Clinical characteristics, cause of acute respiratory failure and weaning outcome in the training set and in the prospective-validation set (see Table 1 for explanation of abbreviations). Note that one patient with extubation failure was not included in the prospective-validation set. p<0.05 was considered significant
Table 3 Index and clinical variables measured during the first 2 min after discontinuation of ventilator support (f/VT rapid shallow breathing, FW weaning failure, HR heart rate, MIP maximal inspiratory pressure, P0.1 airway occlusion pressure, RR respiratory rate, SBP systolic blood pressure, SW successful weaning, VC vital capacity, VMIN minute ventilation, VT tidal volume). Note that one patient with extubation failure was not included in the prospective-validation set. No significant differences were observed between SW and FW in the training set and in the prospective-validation set

Ninety patients had an inspiratory support level between 13 and 18 cmH2O. In order to homogenize this level, we applied by protocol an inspiratory pressure support of 15 cmH2O. Two patients were already ventilated with a pressure support level of 10 cmH2O that was not modified.

In the training set, the threshold values that discriminated between successful weaning and weaning failure are shown in Table 4. In this group the prevalence of weaning success was 0.54 and that of weaning failure was 0.46. The accuracy was 0.71 for maximal inspiratory pressure, slightly higher than the accuracy of vital capacity and tidal volume. These results were in accordance with the values of the likelihood ratio of positive test and likelihood ratio of negative test.

Table 4 Threshold values that best discriminated between patients successfully weaned and those who failed in the training set; > and ≤ indicate whether the values above the threshold or those below it were predictive of successful weaning (AUC area under receiver operating characteristic curve, CI confidence interval, FN false negative, FP false positive, f/VT rapid shallow breathing, MIP maximal inspiratory pressure, ρ+ likelihood ratio of positive test, ρ- likelihood ratio of negative test, P0.1 airway occlusion pressure, RR respiratory rate, SE standard error, VC vital capacity, VMIN minute ventilation, VT tidal volume). Larger values of ρ+ and smaller values of ρ- indicate greater diagnostic ability. ρ- and ρ+ are independent of prevalence of weaning outcome

In the prospective-validation set the prevalence of weaning success was 0.72 (37/51) while that of weaning failure was 0.28 (14/51). In the entire study population the prevalence of weaning success was 0.64 (59/92) and the prevalence of weaning failure was 0.36 (33/92). In the prospective-validation set the likelihood ratio values ranged between 0.69 and 1.87: therefore all the indexes were associated with small changes in the post-test probability of success or failure [9, 15] (Table 5). These results were in accordance with the values of the probability calculated by Bayes’ theorem and according to our prevalence of success or failure of weaning.

Table 5 Area under receiver operating characteristic curve (AUC), likelihood ratio (LR) and probability to predict weaning outcome according to the prevalence in the prospective-validation set (see Table 4 for explanation of abbreviations). Note that one patient with extubation failure was not included in the prospective-validation set. Formulae for estimation of post-test probabilities (Bayes’ theorem): P(W+) prevalence of weaning success (pre-test probability); P(NW-) prevalence of weaning failure (pre-test probability); P(T+∣W+) true positive rate (sensitivity); P(T+∣NW-) false positive rate; P(T-∣NW-) true negative rate (specificity); P(T-∣W+) false negative rate. Probability for weaning success if test is positive: \( P(W+ \mid T+) = \frac{P(T+ \mid W+) \times P(W+)}{P(T+ \mid W+) \times P(W+) + P(T+ \mid NW-) \times P(NW-)} \) Probability for weaning success if test is negative: \( P(W+ \mid T-) = \frac{P(T- \mid W+) \times P(W+)}{P(T- \mid W+) \times P(W+) + P(T- \mid NW-) \times P(NW-)} \)

The AUCs in the prospective-validation set are shown in Table 5. All the evaluated tests appeared to be poor predictors of weaning outcome, as suggested by the 95% confidence interval estimates. In particular, the integrative indexes did not show a greater ability to distinguish between successful weaning and weaning failure than their components: the AUC values for maximal inspiratory pressure and P0.1 were not significantly different from the area for P0.1/MIP (p=0.72 and p=0.07, respectively). Likewise, the AUC for P0.1 × f/VT was not different from the areas for f/VT (Fig. 1) and P0.1 (p=0.52 and p=0.16, respectively) [12].

Fig. 1 Receiver operating characteristic curve for respiratory frequency to tidal volume (f/VT) in the prospective-validation set. Area under the curve ± standard error and 95% confidence interval are given in Table 5

Discussion

The purpose of weaning indexes is to provide easy discrimination between those patients who can be successfully weaned from mechanical ventilation and those who are unable to be weaned. Many factors can influence the weaning outcome: the functional parameters used as indexes of weaning, the criteria used to define failure or success, the moment at which the patients are studied, different clinical practice from unit to unit and the different populations.

This study included a non-selected population of a general intensive care unit and reflected our everyday clinical practice. Particular care was taken to avoid the limitation represented by the lack of blinding, a bias frequently observed in previous studies [10]. Our results clearly show that all the evaluated indexes are poor predictors of weaning outcome and differ in part from those previously reported [21, 22, 23, 24].

In the prospective-validation set, likelihood ratios were between 0.61 and 1.87 for all the indexes evaluated. These values indicate that weaning parameters were associated with only small, clinically unimportant changes in the post-test probability of success or failure [9]. Applying Bayes’ theorem in the prospective-validation set, we also found that, given the prevalence of the weaning outcome in this group, all indexes were of little use in discriminating between those patients who could be successfully weaned and those in whom the weaning trial would have failed.

It is also important to emphasize that, in our study, the prevalence of weaning outcome (‘a priori’ probability) was not only determined by the patient population but also by other factors, including the physician’s clinical judgement and the standard protocol used in our intensive care unit.

Following the method proposed by Yang and Tobin [1], we determined the cut-off values by receiver operating characteristic curve analysis, which is itself independent of any specific cut-off value, and selected as the threshold the value that gave the highest accuracy (minimal false negative and false positive results). This approach assumes that the consequences of false positive and false negative results are equivalent and does not account for the pre-test probability. The concept is theoretically linked to the receiver operating characteristic curve through the optimality criterion S = [(1-P)/P] × CR, where S is the slope of the receiver operating characteristic curve at the optimal operating point, P denotes the prevalence in the target population and CR (cost ratio) = (CFP-CTN)/(CFN-CTP), in which CFP, CTN, CFN and CTP represent the utilities associated with the four possible test outcomes [25, 26].
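The optimality criterion can be evaluated directly once a prevalence and a set of costs are chosen. The sketch below is illustrative only: the function name and the unit costs are assumptions, and the example uses the training-set prevalence of weaning success (0.54) with equal costs for false positive and false negative results, the situation implicitly assumed when the threshold is chosen by highest accuracy.

```python
def optimal_roc_slope(prevalence: float, c_fp: float, c_tn: float,
                      c_fn: float, c_tp: float) -> float:
    """Slope S of the ROC curve at the optimal operating point:
    S = [(1 - P) / P] * CR, with CR = (CFP - CTN) / (CFN - CTP)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    if c_fn == c_tp:
        raise ValueError("CFN and CTP must differ for the cost ratio to exist")
    cost_ratio = (c_fp - c_tn) / (c_fn - c_tp)
    return (1.0 - prevalence) / prevalence * cost_ratio


# Hypothetical unit costs: errors cost 1, correct classifications cost 0,
# so the cost ratio is 1 and the slope depends on prevalence alone.
slope = optimal_roc_slope(prevalence=0.54, c_fp=1.0, c_tn=0.0, c_fn=1.0, c_tp=0.0)
print(f"S = {slope:.3f}")
```

With a cost ratio of 1, the slope reduces to (1-P)/P, so any departure from equal costs or a different prevalence would move the optimal operating point along the curve.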

A weakness of this approach is that it requires users to quantify the consequences of each possible test outcome. The slope approach also requires a smoothed function (e.g. binomial distribution), which introduces additional uncertainties. Therefore, plotting the true positive rate (sensitivity) as a function of the false positive rate (100-specificity) for different cut-off points provides a more practical solution to the problem.

In our patients, none of the indexes investigated appeared to be a good screening test, as they were all characterized by a high sensitivity and a low specificity. We observed the highest sensitivity and specificity for vital capacity and tidal volume. Minute ventilation showed a high sensitivity and a low specificity because, compared to vital capacity and tidal volume, it had more true positives and fewer true negatives. Vital capacity, tidal volume and minute ventilation showed a high proportion of false positives and false negatives (12+5, 6+17 and 5+21, respectively). The poor predictive value of these indexes was further supported by the respective values of the AUC and likelihood ratios, and by applying Bayes’ theorem based on the prevalence of the weaning outcome in this group.

Generally, a low predictive value for a test is observed when the study population is heterogeneous with respect to clinical characteristics and diagnosis (as it was in our study). A low predictive value can also depend on the way measures are taken and the method used to determine the cut-off value for sensitivity and specificity estimates. In our study, by using the receiver operating characteristic curve analysis, we chose the cut-off value which was associated with the smallest number of false positives and negatives.

Another possible explanation for the low predictive value of these indexes is the use of clinical criteria indicating the need to restart mechanical ventilation during the T-piece trial: these criteria could make respiratory rate and derived parameters (P0.1 × f/VT and f/VT) less useful for establishing the proportion of false negatives because, after the first 2 min, a respiratory rate of more than 35 was a sufficient criterion for the attending physician to stop the weaning trial. We considered it unethical to keep a patient on a T-piece and proceed to extubation when clear clinical signs of intolerance were present. Obviously, it was impossible to blind the attending physician to the respiratory rate, because the respiratory rate was used as a clinical criterion for confirming the ability of the patient to sustain spontaneous breathing.

The low discriminative ability of a test may also depend on the method used to take measurements. For example, in our study maximal inspiratory pressure was measured after expiration to functional residual capacity and not to residual volume [27]. The mean maximal inspiratory pressure for the entire study population was 22±7 cmH2O (SW: 24±7 and FW: 20±6 cmH2O, respectively). Such a relatively low value may be explained by the severity and old age of our case mix, which included many patients with COPD, ALI/ARDS and neurological disorders. Moreover, the group of patients with postoperative respiratory failure included only patients who underwent emergency surgery. P0.1 also was not a good test for screening, because it showed a high sensitivity (0.94) and a low specificity (0.07): these data were confirmed by using Bayes’ theorem.

According to a recent review [28], a distinction between weaning failure (inability to tolerate spontaneous breathing without ventilatory support) and extubation failure (inability to tolerate removal of the translaryngeal tube) has been increasingly recognized. This analysis was not made in our study because only one patient required reintubation within 24 h of extubation, after 2 h of spontaneous breathing.

Two arguments can explain our low reintubation rate. First, a spontaneous breathing trial (T-piece) can yield a low reintubation rate [29]. Recent studies [30] have shown that almost 76% of ventilated patients can be extubated after a 2-h spontaneous breathing trial. Moreover, we included clinical signs indicating respiratory muscle capacity and load imbalance, such as uncoordinated thoracoabdominal movements and activation of the accessory muscles of respiration as criteria for spontaneous breathing intolerance. The use of these criteria, along with the traditional criteria for monitoring a spontaneous breathing trial [11], made it easier to identify those patients who presented early signs of increased muscle load. In this way, probably, the prevalence of extubation failure was underestimated in favor of the prevalence of weaning failure.

Finally, the low ability of the evaluated tests to discriminate successful weaning and weaning failure can also be explained by the fact that they represent only a static measure, collected at a specific moment, whereas weaning is a dynamic process during which the physiologic variable measured is continuously influenced by the patient’s clinical condition.

On the basis of our results and those of other recent studies [29, 31], we suggest that weaning should be based on clinical evaluation and strict protocols, and that predictive tests add little to clinical judgment.

In conclusion, even when the methodological limitation represented by the lack of blinding of the physicians making decisions about the weaning process is avoided, none of the predictors of weaning studied is powerful enough to predict success: the systematic use of these weaning “predictors” is thus of little use clinically.