BACKGROUND: Ventilator management for children with hypoxemic respiratory failure may benefit from ventilator protocols, which rely on blood gases. Accurate noninvasive estimates for pH or PaCO2 could allow frequent ventilator changes to optimize lung-protective ventilation strategies. If these models are highly accurate, they can facilitate the development of closed-loop ventilator systems. We sought to develop and test algorithms for estimating pH and PaCO2 from measures of ventilator support, pulse oximetry, and end-tidal carbon dioxide pressure (PETCO2). We also sought to determine whether surrogates for changes in dead space can improve prediction.
METHODS: Algorithms were developed and tested using 2 data sets from previously published investigations. A baseline model estimated pH and PaCO2 from PETCO2 using the previously observed relationship between PETCO2 and PaCO2 or pH (using the Henderson-Hasselbalch equation). We developed a multivariate gaussian process (MGP) model incorporating other available noninvasive measurements.
RESULTS: The training data set had 2,386 observations from 274 children, and the testing data set had 658 observations from 83 children. The baseline model predicted PaCO2 within ± 7 mm Hg of the observed PaCO2 80% of the time. The MGP model improved this to ± 6 mm Hg. When the MGP model predicted PaCO2 between 35 and 60 mm Hg, the 80% prediction interval narrowed to ± 5 mm Hg. The baseline model predicted pH within ± 0.07 of the observed pH 80% of the time. The MGP model improved this to ± 0.05.
CONCLUSIONS: We have demonstrated a conceptual first step for predictive models that estimate pH and PaCO2 to facilitate clinical decision making for children with lung injury. These models may have some applicability when incorporated in ventilator protocols to encourage practitioners to maintain permissive hypercapnia when using high ventilator support. Refinement with additional data may improve model accuracy.
Ventilator management for children with acute lung injury (ALI) varies widely.1,2 Explicit ventilator protocols can standardize mechanical ventilation, provided that practitioners follow recommendations.1,3–6 Specifically, potentially injurious ventilator settings are frequently not reduced, even with normal or over-ventilated pH or PaCO2.1 In general, in the acute phase of illness, ventilator settings are changed based on arterial blood gases (ABGs), requiring an arterial catheter and frequent blood samples, which is challenging in children.7 Accurate and reliable noninvasive methods to estimate pH or PaCO2 could allow for more frequent ventilator changes during the acute phase of illness to maintain permissive hypercapnia and to help in clinical decision making.
Pulse oximetry (SpO2) is routinely used for clinical decision making,8 and clinicians change PEEP or FIO2 in response to either PaO2 or SpO2 both in the acute phase of illness and during weaning. However, clinicians most frequently make decisions to change ventilator rate, tidal volume, or peak inspiratory pressure during the acute phase of illness based on arterial pH or PaCO2. The most widely used noninvasive sensor to estimate adequacy of ventilation is end-tidal carbon dioxide pressure (PETCO2). However, the relationship between PETCO2 and PaCO2 changes as a function of alveolar dead space. Additionally, estimating pH from PETCO2 is confounded by changing metabolic acidosis.
At the bedside, one can estimate PaCO2 from PETCO2 using the alveolar dead-space fraction (AVDSF = [PaCO2 − PETCO2]/PaCO2).9 Although this is not the same as a dead-space-to-tidal-volume ratio, which requires volumetric capnography, it is a clinical surrogate.10–13 Although one can use this value, calculated from simultaneous measurement of PETCO2 and PaCO2, to estimate future PaCO2 from a known value of PETCO2, it will not perform well in the setting of changing alveolar dead space, as may be the case during the acute phase of respiratory illness. To date, most closed-loop algorithms incorporating PETCO2 for ventilator management have been applied to the weaning phase of mechanical ventilation.14 We sought to develop a predictive algorithm to estimate pH and PaCO2 that can account for changing alveolar dead space for application during the acute phase of illness.
We have previously demonstrated that noninvasive surrogates for intrapulmonary shunt (ie, oxygen saturation index [OSI] or SpO2/FIO2) are correlated with changes in AVDSF.8,9 We hypothesize that incorporating noninvasive measures of intrapulmonary shunt, noninvasive continuously available values from the ventilator, and previously known values from the ABGs will permit development of a predictive algorithm to estimate PaCO2 and pH accurately. Such an algorithm could be incorporated into a computer-ventilator protocol to encourage lung-protective behavior and permissive hypercapnia for children with lung injury.
Ventilator protocols based on measurements of arterial blood gases have proven useful in the management of pediatric hypoxemic respiratory failure. Accurate noninvasive measurements of arterial carbon dioxide could allow ventilator changes to optimize lung-protective ventilation without blood gas analysis.
What this paper contributes to our knowledge
In a group of mechanically ventilated pediatric patients with hypoxemic respiratory failure, a predictive model to estimate arterial carbon dioxide using the previously observed relationship between end-tidal carbon dioxide and PaCO2 or pH predicted PaCO2 within ± 6 mm Hg. The utility of these models in replacing arterial blood gases remains to be determined.
We developed and tested algorithms using data sets from previous studies on children with acute hypoxemic respiratory failure.1,8,9,15 We constructed data sets with simultaneous measurements of arterial pH, PaCO2, PaO2, pulse oximetry (when SpO2 was ≤ 97%), PETCO2 (measured via mainstream, with the same adapter size for all children as per ICU standards), and ventilator settings (mode, ventilator rate, peak inspiratory pressure, PEEP, exhaled tidal volume (mL/kg), and FIO2). We created composite variables for deficits in oxygenation, including the OSI (mean airway pressure × FIO2 × 100/SpO2) and SpO2/FIO2. We excluded measurements if there was a leak around the endotracheal tube of ≥ 20%,16 if it had been > 24 h since the previous ABG, or if there was only one ABG for an individual subject. The first ABG attained for each subject was used as baseline, and the algorithms generated estimates for pH and PaCO2 at the time of subsequent ABGs. Predicted values for pH and PaCO2 were compared with actual measured values. The study was approved by the Committee on Clinical Investigation at Children's Hospital Los Angeles with a waiver of informed consent (CCI-09-00126 and CCI-09-00287).
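The composite oxygenation variables defined above can be computed directly from charted values. The following is a minimal sketch, assuming FIO2 is expressed as a fraction and SpO2 as a percentage; the function names and the example values are hypothetical and are not taken from the study data.

```python
def oxygen_saturation_index(mean_airway_pressure, fio2, spo2):
    """OSI = mean airway pressure x FIO2 x 100 / SpO2.

    Assumes mean airway pressure in cm H2O, FIO2 as a fraction (0.21-1.0),
    and SpO2 in percent. Readings above 97% are rejected because the study
    data sets only used SpO2 values <= 97%.
    """
    if spo2 > 97:
        raise ValueError("SpO2 > 97% was not used in the study data sets")
    return mean_airway_pressure * fio2 * 100.0 / spo2


def spo2_fio2_ratio(spo2, fio2):
    """SpO2/FIO2, with SpO2 in percent and FIO2 as a fraction."""
    return spo2 / fio2


# Hypothetical bedside values, for illustration only.
osi = oxygen_saturation_index(mean_airway_pressure=15.0, fio2=0.6, spo2=92.0)
sf = spo2_fio2_ratio(spo2=92.0, fio2=0.6)
print(f"OSI = {osi:.1f}, SpO2/FIO2 = {sf:.0f}")
```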
Data Set 1: Single-Center Data Set
We assembled this data set from a single-institution retrospective study. Children (< 18 y of age) were included in this study if they were intubated and mechanically ventilated with at least one PaO2/FIO2 < 300 after intubation. Children with left ventricular dysfunction or cyanotic congenital heart disease were excluded. We have previously published the methods regarding data collection, subject characteristics, and ventilator support.1,15 We extracted data from the electronic medical record, time-ordered per subject. We extracted the closest charted value for SpO2 and PETCO2, which was at most 1 h before a study ABG. We have previously used this methodology to demonstrate that OSI, AVDSF, and SpO2/FIO2 correlate with mortality.9
Data Set 2: Multi-Center Data Set
We assembled this data set from a 6-center prospective study in children. Children (< 18 y of age) were included in this study if they were intubated and mechanically ventilated with an indwelling arterial line and SpO2 ≤ 97%. Children with left ventricular dysfunction or cyanotic congenital heart disease were excluded. We have previously published the methods regarding data collection, subject characteristics, and ventilator support.8 SpO2 and PETCO2 were recorded prospectively at the time of the ABG, with concurrent ventilator settings. Therefore, unlike with the single-center data set, PETCO2, SpO2, ventilator settings, and ABG results were simultaneous. ABG values were not recorded if the pulse oximetry waveform was inadequate or if the subject had received endotracheal tube suctioning or invasive procedures in the 30 min before the blood gas. PETCO2 was recorded when available as part of routine care; it was not used routinely for all ventilated subjects in some of the study ICUs.
We report the results of the statistical models trained on data set 1 and tested on data set 2. To predict both PaCO2 and pH, we created 2 models. The first models used the previous simultaneous values for PaCO2 and PETCO2 to calculate AVDSF, which was used to estimate the expected current value for PaCO2, based on a new PETCO2. We used AVDSF instead of the difference between PaCO2 and PETCO2 to control for proportionality inaccuracies as PaCO2 increased. The first model for pH used this estimate for PaCO2 with the serum bicarbonate from the previous ABG to predict pH using the Henderson-Hasselbalch equation.
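The baseline calculation described above can be sketched as follows. This is an illustration rather than the study code: it assumes the AVDSF is unchanged since the previous ABG, and it uses the conventional Henderson-Hasselbalch constants (pK of 6.1 and a CO2 solubility of 0.03 mmol/L per mm Hg), which are not stated in the text. Function names and example values are hypothetical.

```python
import math

PK = 6.1                # conventional pK for the bicarbonate buffer (assumption)
CO2_SOLUBILITY = 0.03   # mmol/L per mm Hg (assumption)


def avdsf(paco2, petco2):
    """Alveolar dead-space fraction: (PaCO2 - PETCO2) / PaCO2."""
    return (paco2 - petco2) / paco2


def estimate_paco2(prev_paco2, prev_petco2, current_petco2):
    """Estimate current PaCO2 from a new PETCO2, holding the previous AVDSF fixed."""
    fraction = avdsf(prev_paco2, prev_petco2)
    return current_petco2 / (1.0 - fraction)


def bicarbonate(ph, paco2):
    """Calculated bicarbonate (mmol/L) from pH and PaCO2."""
    return CO2_SOLUBILITY * paco2 * 10 ** (ph - PK)


def estimate_ph(prev_ph, prev_paco2, estimated_paco2):
    """Estimate pH from the estimated PaCO2 and the bicarbonate of the previous ABG."""
    hco3 = bicarbonate(prev_ph, prev_paco2)
    return PK + math.log10(hco3 / (CO2_SOLUBILITY * estimated_paco2))


# Hypothetical example: previous ABG with pH 7.35, PaCO2 50, PETCO2 40 (AVDSF 0.2);
# the PETCO2 then rises to 44 mm Hg.
paco2_now = estimate_paco2(prev_paco2=50.0, prev_petco2=40.0, current_petco2=44.0)
ph_now = estimate_ph(prev_ph=7.35, prev_paco2=50.0, estimated_paco2=paco2_now)
print(f"estimated PaCO2 = {paco2_now:.1f} mm Hg, estimated pH = {ph_now:.2f}")
```

Because the fraction is held fixed, the estimate scales PaCO2 in proportion to PETCO2, which is the property the text gives as the reason for preferring AVDSF over the simple PaCO2 − PETCO2 difference.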
The second models were generated using a multivariate gaussian process (MGP), a machine-learning technique. A gaussian process is a probability distribution over a function. The joint values of the function at any subset of times have a multivariate normal distribution, defined by its mean and covariance function. We used a squared-exponential covariance function. The resulting process can be thought of as a generalization of a Bayesian linear regression model applied to higher dimensions. The covariance function of an MGP with $n$ variables is represented as an $n \times n$ matrix $C(t,t')$, where the element $C(t,t')_{ij}$ is the covariance between variable $i$ at time $t$ and variable $j$ at time $t'$. We used a separable model, $C(t,t') = r(t,t')\,S$, where $r(t,t')$ is the temporal covariance between 2 time points, and $S \in \mathbb{R}^{n \times n}$ is the covariance matrix between the variables. For a set of observation times $t_1, t_2, \ldots, t_T$, the resulting observations are jointly gaussian with a covariance matrix $K \in \mathbb{R}^{nT \times nT}$. We exploited the separable nature of our model and the simultaneity of the observations to avoid explicit computations with such a large matrix. We estimated the covariance matrix S from the training data (data set 1).
When testing on data set 2, we assumed we knew the measurements for all components except for the current values of PaCO2 and pH. We predicted the mean and covariance of the marginal distributions of PaCO2 and pH at the current time given all known measurements for all components up until and including the current time.
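The conditioning step described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it builds the full covariance matrix explicitly as a Kronecker product for clarity (the study avoided this), uses only 2 variables, and all numbers (the mean, the between-variable covariance, the length scale, and the observations) are hypothetical.

```python
import numpy as np


def squared_exponential(times, length_scale):
    """Temporal covariance r(t, t') between every pair of observation times."""
    diff = times[:, None] - times[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def predict_unobserved(y, observed, mean, S, times, length_scale, jitter=1e-6):
    """Gaussian conditional mean and covariance of the unobserved entries of y.

    y        : (T, n) array, NaN where a value is unknown
    observed : (T, n) boolean mask of known entries
    mean     : (n,) prior mean of each variable
    S        : (n, n) covariance between variables
    """
    T, n = y.shape
    R = squared_exponential(times, length_scale)
    # Separable covariance: block (t, t') equals r(t, t') * S.
    K = np.kron(R, S) + jitter * np.eye(T * n)
    mu = np.tile(mean, T)
    obs = observed.ravel()
    mis = ~obs
    resid = y.ravel()[obs] - mu[obs]
    K_oo = K[np.ix_(obs, obs)]
    K_mo = K[np.ix_(mis, obs)]
    K_mm = K[np.ix_(mis, mis)]
    cond_mean = mu[mis] + K_mo @ np.linalg.solve(K_oo, resid)
    cond_cov = K_mm - K_mo @ np.linalg.solve(K_oo, K_mo.T)
    return cond_mean, cond_cov


# Hypothetical setup: columns are [PETCO2, PaCO2] in mm Hg, 2 time points 6 h apart.
S = np.array([[64.0, 56.0], [56.0, 64.0]])
mean = np.array([40.0, 45.0])
times = np.array([0.0, 6.0])
y = np.array([[38.0, 44.0],       # previous ABG: both values known
              [46.0, np.nan]])    # current time: PaCO2 is to be predicted
observed = ~np.isnan(y)

pred_mean, pred_cov = predict_unobserved(y, observed, mean, S, times, length_scale=12.0)
print(f"predicted PaCO2 = {pred_mean[0]:.1f} mm Hg, SD = {np.sqrt(pred_cov[0, 0]):.1f}")
```

The returned variance is what would give the prediction its interval; a production version would exploit the Kronecker structure rather than forming K directly.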
In addition to the AVDSF model, we computed a model based on minute ventilation, which is often calculated at the bedside (estimates the current PaCO2 from the current minute ventilation and previous PaCO2 and minute ventilation). It did not perform as well as AVDSF, so the results are not shown. We tested 2 additional models to account for changing dead space, including hierarchical linear regression and continuous time-based Bayesian network,17 but they were not superior to the MGP model, so the results are not shown. We have previously presented one of these models in abstract form.18
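For reference, the minute-ventilation estimate mentioned above can be sketched as follows. The text does not give the formula; this sketch assumes the commonly used bedside relationship in which PaCO2 is inversely proportional to minute ventilation (constant CO2 production and dead space). The function name and values are hypothetical.

```python
def estimate_paco2_from_minute_ventilation(prev_paco2, prev_minute_vent, current_minute_vent):
    """Assumes PaCO2 x minute ventilation stays constant between measurements."""
    return prev_paco2 * prev_minute_vent / current_minute_vent


# Hypothetical example: PaCO2 60 mm Hg at 4.0 L/min; ventilation increased to 5.0 L/min.
paco2_mv = estimate_paco2_from_minute_ventilation(60.0, 4.0, 5.0)
print(f"estimated PaCO2 = {paco2_mv:.0f} mm Hg")
```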
Our primary outcome was the accuracy of different algorithms to predict pH and PaCO2. To evaluate these outcomes, we generated 80% and 95% prediction intervals around the point estimate for pH and PaCO2. The purpose of this outcome was to assess whether the model could generate estimates that fall into a range that may be acceptable in certain clinical scenarios 80% or 95% of the time. To mimic the decisions a ventilator protocol would make, we binned observed and predicted values. PaCO2 was binned: < 35, 35–60, and > 60 mm Hg. pH was binned using guidance from the ARDS Network protocol: < 7.30, 7.30–7.44, ≥ 7.45.1,4 We report percent agreement between observed versus predicted bins, kappa statistics, and 80% and 95% prediction intervals for PaCO2 or pH within each of the predicted bins. We also report the percentage of observations that fall within Clinical Laboratory Improvement Amendments (CLIA) standards for PaCO2 (the greater of ± 5 mm Hg or 8%) and pH (± 0.04).19
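The outcome measures above can be sketched for PaCO2 as follows. This is an illustration under stated assumptions: the prediction interval is taken as an empirical nearest-rank quantile of the absolute error (the text does not specify how the intervals were computed), the CLIA percentage is applied relative to the observed value, and the paired values are hypothetical.

```python
import math


def bin_paco2(value):
    """Bins from the text: < 35, 35-60, and > 60 mm Hg."""
    if value < 35:
        return "low"
    if value <= 60:
        return "normal"
    return "high"


def percent_agreement(obs_bins, pred_bins):
    return sum(o == p for o, p in zip(obs_bins, pred_bins)) / len(obs_bins)


def cohen_kappa(obs_bins, pred_bins):
    """Agreement corrected for the agreement expected by chance."""
    n = len(obs_bins)
    po = percent_agreement(obs_bins, pred_bins)
    labels = set(obs_bins) | set(pred_bins)
    pe = sum((obs_bins.count(k) / n) * (pred_bins.count(k) / n) for k in labels)
    return (po - pe) / (1 - pe)


def within_clia_paco2(observed, predicted):
    """CLIA tolerance for PaCO2: the greater of 5 mm Hg or 8%."""
    return abs(predicted - observed) <= max(5.0, 0.08 * observed)


def interval_half_width(observed, predicted, coverage):
    """Nearest-rank quantile of the absolute error (an assumed definition)."""
    errors = sorted(abs(p - o) for o, p in zip(observed, predicted))
    return errors[math.ceil(coverage * len(errors)) - 1]


# Hypothetical paired values (mm Hg), for illustration only.
observed = [30.0, 40.0, 50.0, 62.0, 45.0, 58.0]
predicted = [33.0, 38.0, 52.0, 59.0, 44.0, 70.0]

obs_bins = [bin_paco2(v) for v in observed]
pred_bins = [bin_paco2(v) for v in predicted]
agreement = percent_agreement(obs_bins, pred_bins)
kappa = cohen_kappa(obs_bins, pred_bins)
clia = sum(within_clia_paco2(o, p) for o, p in zip(observed, predicted)) / len(observed)
half_width_80 = interval_half_width(observed, predicted, 0.80)
print(agreement, kappa, clia, half_width_80)
```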
Of the 398 children enrolled in the single-center retrospective study (data set 1), 274 met inclusion criteria, with SpO2 ≤ 97% and PETCO2 results available at most 1 h before the ABG. Of the 103 children without cyanotic congenital heart disease enrolled in the multi-center study (data set 2), 83 met inclusion criteria, with PETCO2 data available at the time of the ABG. Hence, this training data set (data set 1) had 2,386 observations (aligned PETCO2, SpO2, ABG, and ventilator data) from 274 children. The testing data set (data set 2) had 658 observations from 83 children. In general, the data sets were similar with respect to disease severity, blood gas parameters, and ventilator support. The subjects had moderate-to-severe lung injury, with a median FIO2 of 0.6 (interquartile range of 0.4–0.8), a median PaO2/FIO2 of 127 (interquartile range of 86–192), and a median oxygenation index of 12.6 (interquartile range of 6.7–22) in data set 1. Results for data set 2 were similar (Table 1). The median time between observations (ABGs) was 6.5 h in both data sets (Table 1).
Model Training (Data Set 1)
The model was trained (parameters were estimated) on data set 1. Variables included in the model were PETCO2, OSI, peak inspiratory pressure, PEEP, ventilator rate, tidal volume, minute ventilation, an interaction term of PETCO2/minute ventilation, dynamic compliance of the respiratory system, and previous values for pH, PaCO2, and PETCO2. Pressure support and SpO2/FIO2 were considered but not used in the model: pressure support varied little (the value was almost always 10 cm H2O), and SpO2/FIO2 was already captured in the calculation of OSI, which was included in the model.
Prediction of PaCO2 in the Testing Data Set (Data Set 2)
We constructed a baseline model to predict PaCO2 using the previous AVDSF with the current PETCO2. The predicted values were on average 0.3 ± 7.2 mm Hg (mean ± SD) higher than the observed values. Overall, 67.5% of the predicted PaCO2 would fall within CLIA standards (greater of ± 5 mm Hg or 8% difference) against the measured PaCO2. Eighty percent of the predicted values were within ± 7 mm Hg of the observed values, and 95% were within ± 13 mm Hg (Fig. 1A). When binning the observed and predicted PaCO2, the overall agreement was 89%, with a kappa of 0.76 (Table 2). The accuracy was best in the normal or low PaCO2 bins, where 80% of the predicted values were within ± 6 mm Hg of the observed values, and 95% were within ± 12 mm Hg (Table 3 and Fig. 2A).
The MGP model derived from data set 1 performed slightly better than the baseline model using AVDSF. The predicted values were on average 0.02 ± 6.1 mm Hg (mean ± SD) higher than the observed values. Overall, 73.6% of the predicted PaCO2 would fall within CLIA standards against the measured PaCO2. For the MGP model, 80% of the predicted values were within ± 6 mm Hg of the observed values and 95% were within ± 11 mm Hg (see Fig. 1B). When binning the observed and predicted PaCO2, the overall agreement was 91%, with a kappa of 0.80 (see Table 2). Within each PaCO2 bin, the prediction intervals were narrower than with the AVDSF model, and in the normal PaCO2 bin, 80% of the predicted values were within ± 5 mm Hg of the observed values, and 95% were within ± 10 mm Hg. In the low PaCO2 bin, 80% of the predicted values were within ± 3 mm Hg of the observed values, and 95% were within ± 4 mm Hg (see Table 3 and Fig. 2B).
Prediction of pH in the Testing Data Set (Data Set 2)
We used the predicted PaCO2 from the AVDSF model with the calculated serum bicarbonate from the previous ABG with the Henderson-Hasselbalch equation to predict pH. The predicted values were on average 0.004 ± 0.064 (mean ± SD) lower than the observed values. Overall, 59.3% of the predicted pH would fall within CLIA standards (± 0.04) against the measured pH. Using this model, 80% of the predicted pH values were within ± 0.07 of the observed values, and 95% were within ± 0.13 (Fig. 3A). When binning the observed and predicted pH, the overall agreement between the observed and predicted bins was 70%, with a kappa of 0.48 (Table 4). The best accuracy was in the normal pH bin, where 80% of the predicted values were within ± 0.06 of the observed values, and 95% were within ± 0.10 (see Table 3 and Fig. 4A).
The same MGP model (trained on data set 1) was used to predict pH for data set 2. In general, the MGP model was slightly superior to the model using AVDSF and the Henderson-Hasselbalch equation. The predicted values were on average 0.002 ± 0.05 (mean ± SD) lower than the observed values. Overall, 67.6% of predicted pH would fall within CLIA standards against the measured pH. Using the MGP model, 80% of the predicted pH values were within ± 0.05 of the observed values, and 95% were within ± 0.10 (see Fig. 3B). When binning the observed and predicted pH, the overall agreement between the observed and predicted bins was 72%, with a kappa of 0.49 (see Table 4). Within each pH bin, the prediction intervals were narrower than with the Henderson-Hasselbalch model, and in the normal pH bin, 80% of the predicted values were within ± 0.05 of the observed values, and 95% were within ± 0.10. In the high pH bin, 80% of the predicted values were within ± 0.05 of the observed values, and 95% were within ± 0.07 (see Table 3 and Fig. 4B).
Alternative Training and Testing Data Sets
We repeated the analysis using data set 2 as the training data set and data set 1 as the testing data set. In general, each model had larger prediction intervals and less agreement between observed and predicted bins of pH and PaCO2 (data not shown).
We have demonstrated a first step in the application of machine-learning algorithms to estimate pH and PaCO2 to facilitate decision making regarding ventilator management for children with moderate-to-severe lung injury. Over the entire range of predicted values, these models may not yet have an acceptable level of accuracy to replace blood gas sampling. However, these algorithms may be useful in certain clinical scenarios such as decreasing potentially injurious ventilator settings for children who fall in the over-ventilated range through a protocol with standardized ventilator decisions. With model refinement, they may become more clinically acceptable in other scenarios. Furthermore, although models that use previous known relationships between PETCO2 and PaCO2 perform reasonably well, we can modestly improve the accuracy by incorporating noninvasive markers of oxygenation as well as changes to ventilator settings. It may be that these are surrogates for changes in alveolar dead space. Our MGP model predicts PaCO2 with 80% prediction intervals of ± 6 mm Hg and pH with 80% prediction intervals of ± 0.05. It performed best in the middle or over-ventilated range of pH and PaCO2. For example, if the model predicted PaCO2 to be 50 mm Hg, 80% of the time, the actual PaCO2 (if one were to draw an ABG) would be between 45 and 55 mm Hg, and 95% of the time, it would be between 40 and 60 mm Hg. For pH in the over-ventilated range, for example, if the model predicted pH to be 7.45, 80% of the time, the actual pH would be between 7.40 and 7.50, and 95% of the time, it would be between 7.38 and 7.52. Overall, the model would predict PaCO2 with CLIA-acceptable equivalence to a blood gas machine 74% of the time and pH 68% of the time.
Some may believe that the confidence in these predictions is not adequate for clinical decision making. Although in medicine we strive for 95% certainty for statistical significance, clinical decisions are often based on much more uncertainty than 20%. For example, pulse oximetry in a low range (< 87%) may have 95% prediction intervals greater than ± 10% against co-oximetry.20 For example, if the pulse oximeter reading is 85%, then 50% of the time, the actual SaO2 on co-oximetry would lie between 75 and 83%, but for 95% certainty, the SaO2 could range from 64 to 89%. Nevertheless, pulse oximetry in a low range is routinely used for clinical decision making for children with cyanotic congenital heart disease. Although SpO2 is more accurate in the range frequently seen for children with ALI, this example is meant to illustrate that the parameters practitioners routinely use for clinical decisions relating to mechanical ventilation may have more uncertainty than 20%. As such, 80% certainty that the PaCO2 is in a range of 10 mm Hg or that the pH is in a range of 0.1 (as seen in our model in the normal or over-ventilated range) may be acceptable in certain situations to facilitate a clinical decision such as decreasing potentially injurious ventilator settings when the value is predicted to be normal or high, embracing a permissive hypercapnia strategy for ALI.
We believe that there are several potential applications of these algorithms. First, noninvasive estimates of PaCO2 or pH may decrease the number of ABGs. If clinicians can be 80% confident that the PaCO2 lies within a range of 10 mm Hg or that the pH lies within a range of 0.1, they may be willing to forgo an ABG and instead change the ventilator. This is likely applicable in the over-ventilated range for patients with ALI, encouraging more continual lung-protective behavior. Keeping the system open-loop, so that the decision to act on the estimate rests with the provider, allows providers flexibility regarding their comfort with the reported level of certainty, as there may be scenarios when this level of accuracy is not acceptable.
Second, these continuously available estimates may facilitate standardized assessment of ventilator support and adherence to ventilator protocols. For example, an open-loop computer protocol could be developed that requires an assessment every 2–4 h. The decision support tool could display an estimate for the predicted pH, with a prediction interval. Clinicians could accept or reject the protocol's recommendation or obtain an ABG if they are uncomfortable with the potential error in the prediction at that time point. As we see in this analysis, the majority of blood gas values for children with ALI lie in a normal range, where the model performs well (> 90% agreement in PaCO2 bins). We have previously demonstrated there are many lost opportunities to be lung-protective when clinicians do not wean the ventilator even when settings are high and the pH is normal or high.1 A continuously available estimate of pH or PaCO2 may make clinicians more willing to wean ventilator settings because they can more closely monitor the effects of their change. However, as is clear from the analysis, these algorithms have limitations, particularly in the low pH ranges, where there is substantially more uncertainty in the actual pH seen in the limits of agreement as well as in the Bland-Altman plots.
We specifically developed this model using data from children with lung injury. We felt it important to start with this population because it represents a more difficult scenario, as children with lung injury have dynamic and changing degrees of dead space. It is likely that both models that assume no change in dead space (such as the AVDSF model) and those that try to capture surrogates for changing dead space (ie, the MGP model) would perform even better in children with minimal or no lung injury. This should be tested. The median AVDSF in these 2 data sets was ∼0.23, in line with our previous publication on AVDSF demonstrating that such values are independently associated with mortality.9 As such, these data sets represent the lung-injured children we often take care of in our ICUs, for whom lung-protective ventilation has the potential to improve outcomes.
Although the predictive ability of the MGP algorithm is fair, it is not ready for a closed-loop system (where the provider's feedback is not required to change the ventilator) and will not replace blood gases. In fact, these models are reliant upon blood gases for their development and calibration and are meant to facilitate decision making at times between blood gases. This analysis was meant as a first step; the model must be refined with additional data and tighter prediction intervals before the loop on ventilation is ready to be closed.
To our knowledge, this is the first application of gaussian processes, a machine-learning technique, to predict pH or PaCO2. However, machine-learning techniques have been used extensively in medicine,21 in gene expression studies,22–24 for classification of cardiac arrhythmias,25 for predicting morbidity after coronary artery bypass surgery,26 and for predicting when weaning from ventilator support should begin.27 Gaussian processes have been applied in adults with ALI to model the pressure-volume curve to titrate PEEP.28 These techniques have been used in industries outside of medicine for years, and although the methods may appear complicated, the algorithms are not computationally challenging. Therefore, they can easily be applied in most ICUs using basic computers.
There are limitations to our analysis. First, this represents a secondary analysis of data, which is inherently limited in reliability and accuracy. Second, the algorithms performed worse when trained on the multi-center data set and tested on the single-center data set. We believe that this is a function of the simultaneous assessments of SpO2, PETCO2, and ABG values in the prospective multi-center data set compared with the retrospective single-center data set. Because the real-time application of the algorithm will be with controlled, continuously available data, we believe that it will more likely perform as it did with the multi-center data. This needs to be tested. Third, we elected not to split the multi-center data set into training and testing data because we were worried about sample size. Fourth, the testing data set was relatively small (83 subjects), and the algorithm should be evaluated in another group of subjects. Fifth, there were no hemodynamic data, which will also affect changes in dead space. Sixth, the MGP model had a more visible proportional bias on Bland-Altman analysis than the AVDSF model and may perform differently when applied to a different validation data set. This should be tested. Seventh, although such an algorithm is intended to reduce the frequency of blood gases, it is possible that the early phases of its deployment may prompt clinicians to get more blood gases to verify what the algorithm is displaying. In all likelihood, such a phenomenon would be transient, and once clinicians became more comfortable with its accuracy, they would draw fewer blood gases.
We may be able to improve model accuracy with temporally continuous values for PETCO2 and SpO2, but this may require alternative analytic methodologies to reduce the dimensionality of the data. We may also be able to improve the accuracy by incorporating hemodynamic variables such as heart rate and blood pressure. Finally, these algorithms will likely perform better in less acute phases of mechanical ventilation, such as weaning, when dead space changes less frequently. These hypotheses need to be tested.
Noninvasive, continuously available measurements and ventilator settings may be helpful for predictive algorithms that estimate PaCO2 and pH for children with hypoxemic respiratory failure. The current level of accuracy offers some applications, particularly for standardizing decisions about decreasing ventilator support in line with lung-protective strategies when the pH or PaCO2 is predicted to be normal or over-ventilated. With continued model refinement, it is possible that these algorithms can be used for decision support by incorporating them into computer-ventilator protocols. These algorithms should be refined and tested with additional prospectively gathered data from mechanically ventilated children with a wide range of severity of lung injury and hemodynamic support.
Correspondence: Robinder G Khemani MD MsCI, Children's Hospital Los Angeles, 4650 Sunset Boulevard, Mailstop 12, Los Angeles, CA 90027.
This work was supported by the Department of Anesthesiology and Critical Care Medicine, Children's Hospital Los Angeles, and by the Laura P. and Leland K. Whittier Virtual Pediatric Intensive Care Unit.
Dr Khemani presented a version of this paper at the ATS 2012 International Conference, held May 18–23, 2012, in San Francisco, California.
The authors have disclosed no conflicts of interest.
- Copyright © 2014 by Daedalus Enterprises