Progressive prediction of hospitalisation in the emergency department: uncovering hidden patterns to improve patient flow
  1. Yuval Barak-Corren1,2,
  2. Shlomo Hanan Israelit3,4,
  3. Ben Y Reis2,5
  1. 1 Rappaport Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel
  2. 2 Predictive Medicine Group, Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
  3. 3 Emergency Department, Rambam Healthcare Campus, Haifa, Israel
  4. 4 Elisha Hospital, Haifa, Israel
  5. 5 Harvard Medical School, Boston, Massachusetts, USA
  1. Correspondence to Yuval Barak-Corren, Rappaport Faculty of Medicine, Technion Israel Institute of Technology, Haifa, Israel; yuval.barakcorren{at}childrens.harvard.edu

Abstract

Introduction One of the factors contributing to ED crowding is the lengthy delay in transferring an admitted patient from the ED to an inpatient department (ie, boarding time). An earlier start of the admission process using an automatic hospitalisation prediction model could potentially shorten these delays and reduce crowding.

Methods Clinical, operational and demographic data were retrospectively collected on 80 880 visits to the ED of Rambam Health Care Campus in Haifa, Israel, from January 2011 to January 2012. Using these data, a logistic regression model was developed to predict patient disposition (hospitalisation vs discharge) at three progressive time points throughout the ED visit: within the first 10 min, within an hour and within 2 hours. The algorithm was trained on 50% of the data (n=40 440) and tested on the remaining 50%.

Results During the study time period, 58 197 visits ended in discharge and 22 683 in hospitalisation. Within 1 hour of presentation, our model was able to predict hospitalisation with a specificity of 90%, sensitivity of 94% and an AUC of 0.97. Early clinical decisions such as testing for calcium levels were found to be highly predictive of hospitalisations. In the Rambam ED, the use of such a prediction system would have the potential to save more than 250 patient hours per day.

Conclusions Data collected by EDs in electronic medical records can be used within a progressive modelling framework to predict patient flow and improve clinical operations. This approach relies on commonly available data and can be applied across different healthcare settings.

  • emergency department management
  • hospitalisations
  • efficiency
  • communications
  • research
  • operational

Key messages

What is already known on this subject?

  • Early prediction of patients’ disposition can improve ED flow.

  • There is widespread adoption of electronic medical records by hospitals.

What this study adds?

  • In a large tertiary hospital in Israel, we developed a predictive model for admission based on data available at 10, 60 and 120 min after arrival at the ED (ie, ‘progressive’ modelling).

  • The model identified 94% of the hospitalisations and 90% of the discharges, as early as 1 hour after presentation to the ED.

  • Information on clinical decisions made during the ED encounter can be used to predict hospitalisation.

Background and significance

EDs worldwide face ever-increasing patient loads.1–6 Between 2003 and 2009, the average time patients waited to be seen by an ED physician in the USA increased from 46.5 to 58.1 min;7 in Canada, the median triage-to-physician time increased from around 40 min to nearly 120 min between 2000 and 2007.8 This increase in waiting time leads to adverse clinical outcomes and a more stressful environment at the ED.9–12 Evidence suggests that mitigation of crowding in EDs is associated with better clinical outcomes.13

Several measures are employed to better manage ED crowding. One such measure is the standard triage system that prioritises patients according to the severity of their condition and the urgency of care they require; yet some debate exists regarding its validity and interobserver reproducibility.14 Other methods include: fast-track units, observation units and lean methodology in process design.15–17 Yet, given current levels of crowding, much additional room for improvement remains.

Evidence suggests that early outcome prediction (ie, hospitalisation or discharge) can improve ED patient flow.15 16 Due to the widespread use of electronic medical records (EMR), we set out to determine the value of EMR data for predicting hospitalisations. Given the progressive nature of ED treatment, where different types of data become available over time, we propose a progressive modelling approach—making use of available data in the early stages of an ED visit, and improving the accuracy of predictions as additional information becomes available later on.

This study builds on a growing field of predictive medicine related to emergency care, including studies that predict overall ED load,18 expected case mix19 and patient flow from the ED onwards.5 20 21

Objective

The objective of this study is to create and validate an automatic tool for the prediction of hospitalisations from the ED. The proposed model could serve as a practical solution for improving the operations of the ED and minimising waiting times for the patients, a problem relevant to many EDs worldwide.

Materials and methods

Data collection and preparation

The study was based on cross-sectional data collected at the ED of the Rambam Healthcare Campus, a 1000-bed tertiary care hospital in northern Israel. The data were collected retrospectively for the time period from January 2011 to January 2012; the data encompass 13 months of activity, 85 526 different visits and 65 117 unique patients. The Institutional Review Board of the Rambam Healthcare Campus approved the study.

The primary outcome was the clinical decision regarding admission: patients were released home (n=58 197) or admitted to an inpatient department (n=22 683). Visits with outcomes that did not clearly fall into these two categories were excluded from the study (n=4646 visits, 5.4%); these comprised patients who died (n=242, 0.3%), were released to other institutes (n=1298, 1.5%), had unknown outcomes (n=1957, 2.3%) or decided to leave on their own (n=1149, 1.3%); see online supplementary figure S-1. We provide results that include this population in the supplementary material (see online supplementary table S-1).

The data were de-identified and analysed on a per-visit basis; that is, multiple visits of the same patient were treated as separate instances (see online supplementary table S-2 for overall visit statistics per subject). For each visit, 43 parameters were collected, including demographics, triage score, chief complaint, vital signs, list of medications, lab results, history of prior visits, list of diagnoses and timestamps for the different steps in the ED care process (table 1). Categorical parameters were manually coded to numerical values. We automatically encoded and translated the Hebrew free text in the chief complaint parameter into a list of structured English codes using an encoding tool developed by Cohen and Elhadad (see online supplementary figure S-2).22 Due to computational limitations, we applied feature selection and reduced the number of variables for all textual parameters (chief complaint, medications and diagnosis) using the Natural Language Toolkit (NLTK) for Python; we selected only the terms with the highest frequency of use or the highest significance in discerning between hospitalised and discharged patients (see online supplementary tables S-3 to S-5). Outliers were removed from clinical parameters where values fell outside the physiological range (eg, oxygen saturation of 1000% or systolic BP of more than 1000 mm Hg) or were clearly implausible (eg, 60 hours until seen by a physician); such outliers accounted for less than 0.6% of cases for every parameter, and missing values were imputed with the mean value.
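
The outlier-removal and mean-imputation step described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the field names and the plausibility ranges in `PLAUSIBLE_RANGE` are assumptions chosen for the example, since the article gives only examples of implausible values.

```python
from statistics import mean

# Hypothetical plausibility ranges (assumed for illustration only); the
# article cites oxygen saturation of 1000% and systolic BP above
# 1000 mm Hg as examples of values that were removed.
PLAUSIBLE_RANGE = {
    "oxygen_saturation": (0.0, 100.0),
    "systolic_bp": (0.0, 300.0),
}

def clean_and_impute(values, low, high):
    """Treat out-of-range readings as missing, then fill gaps with the mean."""
    kept = [v if v is not None and low <= v <= high else None for v in values]
    observed = [v for v in kept if v is not None]
    if not observed:
        return kept  # nothing to impute from
    fill = mean(observed)
    return [fill if v is None else v for v in kept]

# Toy readings: one impossible value (1000.0) and one missing value (None).
saturation = [98.0, 1000.0, 94.0, None, 96.0]
low, high = PLAUSIBLE_RANGE["oxygen_saturation"]
cleaned = clean_and_impute(saturation, low, high)
print(cleaned)
```

In this sketch both the removed outlier and the originally missing reading are replaced by the mean of the remaining plausible values.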

Table 1

Univariate analysis of the parameters used in the model

Since lab tests were not taken for all subjects, we faced the problem of many missing lab results. Excluding records with missing labs, we were left with 17 501 records, leaving a biased sample with 89.5% (n=15 658) of the patients being hospitalised. In fact, a similar problem exists in single lab measurements such as calcium; among the 18 924 patients with data on calcium levels, 89.1% (n=16 861) were hospitalised. As a result, we decided to encode each lab-test as either missing (not taken) or existing (taken) while disregarding their actual values.
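
A minimal Python sketch of this encoding, in which each lab test becomes a binary "taken / not taken" indicator and the measured value is ignored. The lab names and the dictionary layout of a visit are hypothetical and serve only to illustrate the idea.

```python
def encode_lab_taken(visit, lab_names):
    """Map each lab test to 1 if a result exists for the visit, else 0."""
    return {f"{name}_taken": int(visit.get(name) is not None) for name in lab_names}

# Hypothetical lab panel and visits (not taken from the study data).
LAB_NAMES = ["calcium", "sodium", "creatinine"]
visit_a = {"calcium": 9.4, "sodium": 139.0}  # two tests ordered
visit_b = {}                                  # no blood work ordered

features_a = encode_lab_taken(visit_a, LAB_NAMES)
features_b = encode_lab_taken(visit_b, LAB_NAMES)
print(features_a)
print(features_b)
```

The design choice here mirrors the text: the actual calcium value (9.4 in the toy visit) never enters the feature vector, only the fact that the test was ordered.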

Table 2

Summary of bivariate analysis

Statistical analysis

The analysis of the data followed a three-step process. After running a univariate analysis (table 1) to identify the overall distribution of the different parameters and assess the number of missing or invalid records for each parameter, we continued with a bivariate analysis of the data, comparing each of the independent variables across the two cohorts, hospitalised and discharged (table 2). We ran t-tests for the continuous variables (eg, lab results) and χ² tests for the categorical parameters (eg, triage score). Lastly, we performed a multivariable analysis.
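
The two bivariate test statistics can be illustrated with a short, dependency-free Python sketch. The article does not say which variant of the t-test was used, so Welch's unequal-variance form is an assumption here, and the ages and triage counts are invented toy data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Toy continuous variable: age in each cohort (invented values).
age_hospitalised = [71, 64, 80, 58, 77]
age_discharged = [34, 45, 29, 52, 40]
t_stat = welch_t(age_hospitalised, age_discharged)

# Toy categorical variable. Rows: urgent / non-urgent triage;
# columns: hospitalised / discharged (invented counts).
triage_table = [[30, 10], [20, 40]]
chi2 = chi_square(triage_table)
print(t_stat, chi2)
```

Converting either statistic to a p value requires the corresponding reference distribution, which a statistics package would normally supply.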

The multivariable analysis was based on the data available at three different time points: the first analysis (T1) was based on data available within the first 10 min of admission to the ED and includes demographics, chief complaint, vitals and triage; the second analysis (T2) was based on data available between 10 min and 1 hour after admission, including lab results; the third analysis (T3) was based on data that are typically available between 1 and 2 hours after presentation and include the physician’s diagnosis (International Classification of Disease, revision 9 codes: ICD-9) and list of medications.

Three different modelling methods (logistic regression, naïve Bayes and C4.5 classification tree) were evaluated for the multivariable analysis. The evaluation was done at each of the previously mentioned time points (T1–T3) and on a randomly selected sub-cohort of 50% of the subjects. We used Orange BioLab statistical software for this comparison.23 All models were all-inclusive and no variables were omitted (see online supplementary table S-6). Comparing the models’ area under the curve values, the logistic regression model was found to be superior at all of these time-points and was chosen for the final analysis.
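
As an illustration of the comparison criterion, the area under the ROC curve can be computed directly from its rank interpretation: the probability that a randomly chosen hospitalised visit receives a higher score than a randomly chosen discharged visit. The sketch below uses invented scores for two hypothetical models and is not the Orange workflow used in the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# 1 = hospitalised, 0 = discharged (toy labels and scores).
labels = [1, 1, 1, 0, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.3, 0.4, 0.5, 0.2, 0.7],
}
aucs = {name: auc(scores, labels) for name, scores in candidate_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

The pairwise loop is quadratic in the number of visits, which is fine for a small example; a sort-based rank computation would be preferable for tens of thousands of records.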

The multivariable analysis was carried out in the following manner: for each of the three time points (T1–T3), a logistic regression model was built using a randomly selected training set that included all variables and half the subjects (n=40 440); each model was then validated on the remaining testing set (n=40 440). Models T2 and T3 were built both with and without the prediction results obtained from the preceding models; the preceding prediction results were introduced into the model as a zero-to-one continuous variable. The R statistics program and Stats package V.3.1.1 were used for these analyses.24
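
The progressive scheme can be sketched in Python as follows: a random 50/50 split, a T1 model, and a T2 model that receives the T1 predicted probability as an additional zero-to-one feature. Everything here is an assumption made for illustration: the synthetic visits, the single feature per time point, and the plain gradient-descent fit (the study itself used R).

```python
import math
import random

def predict(weights, x):
    """Logistic model: weights[0] is the bias, the rest match the features."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=500, lr=0.1):
    """Batch gradient descent on the logistic log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Synthetic visits (assumed): one T1 feature and one T2 feature each.
rng = random.Random(0)
visits = []
for _ in range(200):
    admitted = int(rng.random() < 0.28)
    older = int(rng.random() < (0.7 if admitted else 0.3))         # T1
    labs_ordered = int(rng.random() < (0.9 if admitted else 0.1))  # T2
    visits.append(({"t1": [older], "t2": [labs_ordered]}, admitted))

# Random split: half for training, half for testing.
rng.shuffle(visits)
half = len(visits) // 2
train, test = visits[:half], visits[half:]
y_train = [y for _, y in train]

model_t1 = fit_logistic([v["t1"] for v, _ in train], y_train)

def t2_row(v):
    """T2 features plus the T1 predicted probability as a 0-to-1 feature."""
    return v["t2"] + [predict(model_t1, v["t1"])]

model_t2 = fit_logistic([t2_row(v) for v, _ in train], y_train)
test_probs = [predict(model_t2, t2_row(v)) for v, _ in test]
```

Building the T2 model "without" the preceding prediction amounts to dropping the extra element that `t2_row` appends.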

Results

Univariate analysis of features

On average, 227 patients visited the Rambam Healthcare Campus ED each day, and the average rate of admission to the hospital was 28% (n=63) per day (table 1). Yet this is only the average rate of hospitalisations and the actual daily rates can vary greatly from day to day, both as a percentage of total visits (from 14% to over 35% hospitalisation rate), and in absolute numbers (from 23 to 88 hospitalisations per day) as shown in the online supplementary figure S-3.

Bivariate analysis of features

Bivariate analysis of the data revealed statistically significant differences between admitted and discharged patients for almost all the parameters measured (table 2). Hospitalisation was associated with an older age, with a higher number of prior admissions and with an overall higher number of prior ED visits in the past 2 years. Among the subjects with lab tests, we found small but statistically significant differences in lab results between the hospitalised and discharged patients, tending toward the pathological values of each measurement for the hospitalised patients. While these results were statistically significant, given the large amount of missing lab-test values we decided to encode each lab test as either existing or missing for the multivariable prediction model.

Multivariable predictive modelling

Using the logistic regression model at time-point T1 (10 min), we were able to correctly predict patient disposition in 79.0% of the cases with 51.0% sensitivity and 10% false-positive rate (table 3). The prediction was most influenced by the patient’s parents’ country of birth, initial classification made by the receptionist assigning the patient to a specific ED unit, and terms from the chief complaint such as ‘digestion’ and ‘jaundice’. Discharge was associated with chief-complaint terms like ‘needle’, ‘alcohol’ and ‘lightheadedness’ (see online supplementary table S-7).
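
Reporting sensitivity at a fixed 10% false-positive rate implies choosing a score cut-off from the discharged visits and then measuring how many hospitalisations exceed it. The following Python sketch shows one way to do this; the scores are invented and the function names are illustrative assumptions, not part of the study's code.

```python
def threshold_for_fpr(scores, labels, max_fpr=0.10):
    """Lowest cut-off whose false-positive rate does not exceed max_fpr.

    A visit is flagged as a predicted hospitalisation when its score is
    strictly above the cut-off.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    allowed = int(max_fpr * len(negatives))  # discharges we may flag wrongly
    return negatives[allowed] if allowed < len(negatives) else float("-inf")

def evaluate(scores, labels, cutoff):
    """Sensitivity, false-positive rate and accuracy at a given cut-off."""
    tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / len(labels),
    }

# Toy cohort: 5 hospitalised (1) and 10 discharged (0) visits.
labels = [1] * 5 + [0] * 10
scores = [0.95, 0.90, 0.85, 0.60, 0.30,
          0.80, 0.55, 0.50, 0.45, 0.40, 0.35, 0.25, 0.20, 0.15, 0.10]

cutoff = threshold_for_fpr(scores, labels, max_fpr=0.10)
metrics = evaluate(scores, labels, cutoff)
print(cutoff, metrics)
```

In practice the cut-off would be chosen on the training set and then applied unchanged to the testing set, so that the reported false-positive rate is not tuned on the evaluation data.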

Table 3

Comparing logistic regression performance at different time points and different combinations of time points

The second model (T2) uses features that are usually available later than 10 min, but within 1 hour of admission, and includes lab results. Using only the presence or absence of each lab result, we were able to predict patient disposition in 90.9% of the cases with 93.3% sensitivity and a 10% false-positive rate (table 3). We found that the lab values themselves were not very useful for predicting hospitalisation, but the decision to order each specific lab test was useful for prediction (see online supplementary table S-7).

After separately obtaining results for both T1 and T2 time points, we re-ran T2 this time with T1 results to generate a combined T1+T2 model (see online supplementary table S-8). The combined model did not lead to a significant improvement in our prediction capabilities. While T2 alone provided an accuracy of 90.9% in classification, adding model T1’s results improved it to 91.2%, a minor 0.3% improvement (table 3).

Model T3 was based on information that is typically available later than 1 hour from admission to the ED, but within 2 hours. It included the ICD-9 codes entered by the physician, the new medications prescribed at the ED and the time that elapsed until a physician saw the patient. Using model T3 we were able to predict patient disposition in 80.8% of the cases with 57.6% sensitivity and a 10% false-positive rate. Controlling for other factors, pancreatitis, coagulopathies and acute infections (as seen by the administration of intravenous antibiotics) were found to be most important in predicting hospitalisations (see online supplementary table S-7). The intradiagnoses and intramedications correlations are shown in online supplementary figures S-4 and S-5.

Combining all models T1+T2+T3 we predicted patient disposition in 91.8% of the cases with 96.3% sensitivity and 10% false-positive rate (table 3). Figure 1 demonstrates the impact of applying models T1 and T2 to a theoretical cohort of 100 patients that enter the ED. Model T3 is omitted from the figure since its contribution to the overall prediction is only marginal.

Figure 1

Illustration of the suggested prediction system for a theoretical cohort of 100 patients.  When patients first enter the ED it is unclear whether they will be admitted or not, but as more information becomes available the prediction system can start to discern between the hospitalised (red) and discharged (blue) patients. The three time-lapses in this figure depict this prediction process. On the left are 100 patients with an unidentified prognosis at the time of presentation. After 10 min we can apply model T1 and correctly identify the outcome (discharge or hospitalisation) in 79% of the cases. At 60 min after presentation, most of the lab-test results are available and we can use model T2 to correctly identify the outcome in 91% of the cases. Model T3 is not mentioned in this figure, since it provides only a minor improvement.

Discussion

The results of this study indicate that by using information commonly available in most EDs, we can accurately predict a patient’s likelihood of hospitalisation at different stages of the visit, with increasing levels of accuracy. Accurate and early prediction of admission can expedite the bed-coordination process and possibly shorten the boarding time.15 16 The results uncover the high predictive value of early routine clinical decisions made by ED nurses and physicians, such as which blood work to order, a finding that resonates with the results of studies such as the one carried out by Vaghasiya et al.25

The predictive performance of this progressive model compares favourably with other ED predictive models, as seen in table 4.

Table 4

Summary of results from a selected sample of studies of hospitalisation prediction. The high NPV results highlight the potential of automatically predicting discharges. The list of studies is based on Peck26 and our own literature review. The present study relates to the T1+T2+T3 results

In the specific case of the Rambam ED, it takes about 5.5 hours for the ED physicians to decide whether a patient will be hospitalised or not. It then takes an additional 3.5 hours until a bed is set in one of the departments (figure 2). This leads to a total of 9 hours of average waiting in the ED for the 28% of patients who are eventually hospitalised—hours that are spent in the congested ED while an inpatient bed is already available in many of these cases. The patient waits for the ED physicians to complete their evaluation and care and then for the inpatient department staff to make their preparations. Once the lab-test results are available (usually within an hour in our specific setting), or even when the tests are ordered, the suggested model (T1+T2) can already predict hospitalisation with a sensitivity of 94.4% and false-positive rate of 10%. Thus, on an average day of 227 visits to the ED and 63 admissions, the model would be expected to correctly recognise approximately 59 of these cases, saving each of these 59 patients up to 4.5 hours and saving the ED up to 265.5 patient-hours a day. For the discharged patients who spend on average 5 hours at the ED, our system could predict their discharge 4 hours in advance of the current discharge time, potentially shortening their stay at the ED and leading to a more efficient use of the hospital’s resources.
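
The arithmetic behind the estimated saving can be written out as a short Python calculation. The inputs are the figures quoted above; reading the 4.5 hours as the 5.5-hour decision time minus the roughly 1 hour needed for the T1+T2 prediction is an interpretation of the text, not a formula stated by the authors.

```python
DAILY_ADMISSIONS = 63        # average hospitalisations per day
SENSITIVITY_T1_T2 = 0.944    # sensitivity of the combined T1+T2 model
HOURS_TO_DECISION = 5.5      # current time until the admission decision
HOURS_TO_PREDICTION = 1.0    # assumed time at which T1+T2 can be applied

# Admissions the model would be expected to flag correctly each day.
correctly_flagged = round(DAILY_ADMISSIONS * SENSITIVITY_T1_T2)

# Upper bound on the head start gained per flagged patient.
hours_saved_each = HOURS_TO_DECISION - HOURS_TO_PREDICTION

# Upper bound on patient-hours saved per day.
patient_hours_saved = correctly_flagged * hours_saved_each

print(correctly_flagged, hours_saved_each, patient_hours_saved)
```

These values are upper bounds ("up to"), since they assume that an inpatient bed can be prepared during the entire head start.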

Figure 2

Comparison of ED flow with (b) and without (a) the prediction system.  The system advances the bed co-ordination step to an earlier stage, thus saving the boarding time for the patient.26

For the hospitalised patient, these 4.5 hours can be valuable, even if there is limited space in the inpatient department, as studies show that when EDs are overcrowded patients prefer to be in the hallways of the inpatient department vs boarded in the ED.33 Also, special areas within the ED could be created for these patients to better facilitate their needs. For the discharged patient, every hour saved is an hour away from work or at home, or time that could be better used for outpatient medical assessment.

The system, as we envision it, could be used either for its positive predictive value (ie, predicting hospitalisations) or for its negative predictive value (ie, predicting discharges), or for both uses. It could run in the background in the EMR, invisible to the clinical staff at the ED and only notifying the inpatient hospital departments or co-ordinators of patient placement (if available) when a hospitalisation is predicted. The inpatient department will then be able to prepare ahead for accepting a most likely inpatient, thus making the overall flow of patients through the hospital more efficient.

When the ED is overcrowded, prediction of discharges could also prove very useful. At the physician’s discretion and without interfering with the treatment of urgent cases, the system could present ED physicians with the likelihood of admission for each patient, thus helping them attend to the likely-to-be-released patients early on and prepare them for discharge, leaving the ED only with patients that require further investigation. This will improve the ED experience for both the patients and staff and will save precious time for the discharged patients. It will also unmask the to-be-admitted patients who might otherwise get lost in the crowd. This optimisation could only be possible after all urgent patients are stabilised and cared for.

As with any prediction system, potential errors may arise. These errors might serve to decrease the level of trust that ED physicians, nurses in inpatient departments and others put in the system. False-positive cases (about 10 out of the 164 discharges per day in our setting) can lead to unnecessary allocation of resources in the department. False-negative results (10 out of 63 hospitalisations per day) might provide false assurance to the inpatient department and improper preparation for incoming load. The system will remain beneficial so long as the cost of its misclassifications is relatively low compared with the benefits provided by the model’s successes.

These errors may be mitigated by the cancelling out of the false-positive results with the false-negative ones, though this compensation may be limited due to the differences between patients and between the time in day of hospitalisations. Like all prediction systems, implementation requires carefully communicating the predictions of the model to relevant hospital staff in a way that embraces the inherent uncertainty of predictive models, and avoids unwanted influence on the care provider.

Limitations

There are several limitations to our study. First, the data used in this study are unique to the Rambam ED, and may not be generalisable to other EDs as is. While the method for constructing these predictive models may be applied in other EDs, the results and odds ratios for each variable can vary between different settings. Second, the data themselves are imperfect and incomplete, and the model might therefore be sensitive to changes in ED protocols and working norms. In our specific use-case, 89% of the patients who had full blood work were hospitalised, thus most of the prediction was based on subjective clinical decisions, usually carried out by the ED nurses, of whether or not to order these tests. If new protocols were to require blood work for all patients, the model would have to be modified; periodic adjustment of the model is generally recommended. Third, not all patients have a ‘discharged’ or ‘hospitalised’ outcome. We omitted from our analysis the 5% of visits that had other outcomes. Another limitation of this model is that it does not predict the specific inpatient department that the patient will be admitted to. This can be studied in future work.

Conclusions

Hospitals today collect vast amounts of information in their EMR, sufficient to create useful predictions for admissions from the ED. The prediction system in this study can be implemented today without requiring significant modifications to established clinical process. It uses a simple logistic regression model, provides a binary and accurate output and has been tested on a large cohort of patients from our retrospective dataset. Integrated with an EMR system, our model can streamline the ED experience of patients, improve hospital operations and enhance co-ordination between different caregivers. Further studies will enable us to validate the generalisability of these findings to additional settings.

References

Footnotes

  • Contributors Study conception and design: YBC, SI, BYR. Acquisition of data: YBC, SI. Analysis and interpretation of data: YBC, SI, BYR. Drafting of manuscript: YBC, BYR. Critical revision: YBC, SI, BYR.

  • Funding This study was partially funded by the US National Library of Medicine, grant R01 LM009879.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.