Abstract
Purpose
Obstructive sleep apnea (OSA) may lead to life-threatening problems if it is left undiagnosed. Polysomnography is the “gold standard” for OSA diagnosis; however, it is expensive and not widely available. The objective of this systematic review is to identify and evaluate the available questionnaires for screening OSA.
Source
We carried out a literature search through MEDLINE, EMBASE, and CINAHL to identify eligible studies. The methodological validity of each study was assessed using the Cochrane Methods Group’s guideline.
Principal findings
Ten studies (n = 1,484 patients) met the inclusion criteria. The Berlin questionnaire was the most common questionnaire (four studies) followed by the Wisconsin sleep questionnaire (two studies). Four studies were conducted exclusively on “sleep-disorder patients”, and six studies were conducted on “patients without history of sleep disorders”. For the first group, pooled sensitivity was 72.0% (95% confidence interval [CI]: 66.0-78.0%; I2 = 23.0%) and pooled specificity was 61.0% (95% CI: 55.0-67.0%; I2 = 43.8%). For the second group, pooled sensitivity was 77.0% (95% CI: 73.0-80.0%; I2 = 78.1%) and pooled specificity was 53.0% (95% CI: 50-57%; I2 = 88.8%). The risk of verification bias could not be eliminated in eight studies due to insufficient reporting. Studies on snoring, tiredness, observed apnea, and high blood pressure (STOP) and STOP including body mass index, age, neck circumference, gender (Bang) questionnaires had the highest methodological quality.
Conclusion
The existing evidence regarding the accuracy of OSA questionnaires is associated with promising but inconsistent results. This inconsistency could be due to studies with heterogeneous design (population, questionnaire type, validity). STOP and STOP-Bang questionnaires for screening of OSA in the surgical population are suggested due to their higher methodological quality and easy-to-use features.
Résumé
Objectif
L’apnée obstructive du sommeil (AOS) peut provoquer des problèmes de santé fatals si elle n’est pas diagnostiquée. La polysomnographie est «l’étalon or» du diagnostic de l’AOS; cependant, cette méthode est onéreuse et n’est pas disponible partout. L’objectif de cette revue méthodique était d’identifier et d’évaluer les questionnaires de dépistage de l’AOS existants.
Source
Nous avons réalisé une recherche de la littérature dans les bases de données MEDLINE, EMBASE et CINAHL afin d’extraire les études admissibles. La validité méthodologique de chaque étude a été évaluée sur la base de la directive du Groupe de méthode de Cochrane.
Constatations principales
Dix études (n = 1484 patients) satisfaisaient aux critères d’inclusion. Le questionnaire de Berlin était le questionnaire le plus utilisé (quatre études), suivi par le questionnaire sur le sommeil de Wisconsin (deux études). Quatre études ont été menées exclusivement auprès de «patients avec troubles du sommeil», et six auprès de «patients sans antécédents de troubles du sommeil». Dans le premier groupe, la sensibilité pondérée était de 72,0% (intervalle de confiance [IC] 95%: 66,0-78,0%; I2 = 23,0%) et la spécificité pondérée de 61,0% (IC95%: 55,0-67,0%; I2 = 43,8 %). Dans le deuxième groupe, la sensibilité pondérée était de 77,0 % ([IC] 95%: 73,0-80,0%; I2 = 78,1 %) et la spécificité pondérée de 53,0% (IC95%: 50-57%; I2 = 88,8%). Le risque de biais de vérification n’a pas pu être éliminé dans huit des études en raison de présentation insuffisante des données. Les études sur le ronflement, la fatigue, l’apnée observée et une hypertension artérielle (études dites STOP pour l’acronyme anglais) et les questionnaires STOP incluant l’indice de masse corporelle, l’âge, la circonférence du cou, et le sexe (études dites Bang pour l’acronyme anglais) ont démontré la meilleure qualité méthodologique.
Conclusion
Les données probantes existantes concernant l’exactitude des questionnaires sur l’AOS sont associées à des résultats prometteurs mais peu constants. Ce manque de constance pourrait être lié à la conception hétéroclite des études (population, type de questionnaire, validité). Les questionnaires STOP et STOP-Bang sont suggérés pour dépister l’AOS chez les patients chirurgicaux en raison de leur qualité méthodologique supérieure et de leur facilité d’emploi.
Obstructive sleep apnea (OSA) is a significant medical problem affecting at least 2-26% of the general population.1 It is estimated that up to 93% of women and 82% of men with moderate to severe OSA remain undiagnosed.2 Obstructive sleep apnea is independently associated with an increased likelihood of hypertension, cardiovascular disease, and diminished quality of life.3 - 7
The “gold standard” for diagnosis of OSA is laboratory polysomnography (PSG); however, the occurrence of OSA is far more prevalent than can be handled by the available sleep laboratories. Therefore, a screening tool is necessary to stratify patients based on their clinical symptoms, their physical examinations, and their risk factors, in order to ascertain patients at high risk and in urgent need of PSG and/or further treatment and patients at low risk who may not need PSG.
Previous investigators have developed different diagnostic models for the clinical prediction of OSA. Rowley et al.8 prospectively studied the utility of four clinical prediction models (Crocker,9 Viner,10 Flemons,11 and Maislin12) and concluded that they are not sufficiently accurate to discriminate between patients with or without OSA. In addition, some of these clinical models require the assistance of a computer and sophisticated mathematical calculations.
In contrast to clinical diagnostic models, OSA questionnaires do not require complicated calculations to identify high-risk patients, and they are potentially easier for routine clinical applications. This systematic review aims to evaluate and compare the accuracy of existing questionnaires as screening instruments for OSA in adults.
Methods
This systematic review was carried out using the recommended methods established by the Cochrane Methods Group on Screening and Diagnostic Tests (see Notes) and by other authors.13 , 14
Literature search
In order to include all available evidence, a systematic search of the literature was carried out through the Cochrane Library, MEDLINE (from 1950 to April 2009), EMBASE (from 1980 to April 2009), and CINAHL (from 1990 to 2009) using the search strategy that was designed for each database. The search strategy was developed and executed by an expert librarian and included the following free-text and index terms: “obstructive sleep apnoea or apnea”, “hypopnea or hypopnoea”, “OSA or SHS or OSAHS”, “sleep related respiratory disorder”, “sleep disordered breathing”, “Sleep Apnea Syndromes”, “Risk Assessment”, “Mass Screening”, “validation studies”, “questionnaire”, “sensitivity”, “specificity”, “screen”, “risk”, “score or scale”, and “mass screening” (Appendix). The search was extended to checking the reference lists of the included papers.
The search results were evaluated by two independent reviewers (A.A., A.K.) to find the eligible articles for inclusion. First, obviously irrelevant items were excluded by reviewing the title and/or abstract of the records. Next, the full-text articles of the remaining papers were retrieved and carefully evaluated to determine if they met the following eligibility criteria: 1) The study used a patient-based questionnaire as a screening tool for OSA in adult subjects (≥18 yr); 2) The questionnaire’s accuracy was evaluated by comparing its results with the results of a PSG as the “gold standard”15 for diagnosing OSA; 3) OSA was clearly defined as apnea/hypopnea index (AHI), apnea index (AI), or respiratory disturbance index (RDI) ≥ 5; 4) Information was adequately presented to allow the construction of a 2 by 2 contingency table; 5) The questionnaire and full text paper were written in English. The studies that were found ineligible and excluded from our study are listed in Table 2.
Assessment of methodological quality
The methodological quality of each paper was assessed independently by the authors (A.A., A.K.), and disagreements were resolved by arbitration of the senior author (F.C.). Validity criteria assessing internal and external validity were explicitly described and coded according to the Cochrane Methods Group on Screening and Diagnostic Tests (see Notes). Internal validity included the following factors: study design, definition of the disease, blind execution of the index test (questionnaire) and the reference test (polysomnography), valid reference test, avoidance of verification bias, and independent interpretation of test results. External validity consisted of the following items: disease spectrum, clinical setting, demographic information, previous screening or referral filter, explicit cut-offs, percentage of missing patients, missing data management, and subject selection for polysomnography.
Data extraction and analysis
Data were extracted by two reviewers (A.A., A.K.) independently using standard data collection forms. In each study, the true positive, false positive, true negative, and false negative values were extracted for each AI, AHI, or RDI cut-off, and 2 by 2 contingency tables were constructed accordingly. The AI/AHI or RDI ≥ 5 were considered as diagnosis cut-offs for the existence of OSA. The AI/AHI or RDI ≥ 15 and 30 were considered as diagnosis cut-offs for moderate and severe OSA, respectively. Using the 2 by 2 contingency tables, we recalculated the following predictive parameters in each study: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and prevalence of OSA in each study. Results were not extracted that would require extrapolations from equations, graphs, or derivations from figures or tables. Studies were excluded from the review if there was inadequate information to draw the 2 by 2 contingency tables (seven papers).11 , 16 - 20
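The recalculation of predictive parameters from a 2 by 2 contingency table can be sketched as follows. This is a minimal illustration, not the authors' extraction code; the class name `ContingencyTable` and the example counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Questionnaire (index test) result cross-tabulated against PSG (reference)."""
    tp: int  # questionnaire positive, PSG positive
    fp: int  # questionnaire positive, PSG negative
    fn: int  # questionnaire negative, PSG positive
    tn: int  # questionnaire negative, PSG negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def prevalence(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Hypothetical counts, for illustration only.
example = ContingencyTable(tp=80, fp=30, fn=20, tn=70)
print(f"sensitivity={example.sensitivity:.2f} specificity={example.specificity:.2f}")
print(f"PPV={example.ppv:.2f} NPV={example.npv:.2f} prevalence={example.prevalence:.2f}")
```

One table of this kind would be built per study and per AI, AHI, or RDI cut-off.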
A validated computer program for meta-analysis of test accuracy data (Meta-DiSc,21 version 1.4, Hospital Ramón y Cajal, Madrid, Spain) was used to describe the overall accuracy of the questionnaires and to assess inconsistencies in accuracy parameters (sensitivity and specificity) across studies (heterogeneity). Accuracy parameters with a similar target population were analyzed together (sleep-disorder patients vs patients without history of sleep disorders). Inconsistency (I2) > 50% was considered significant. Subgroup analysis was carried out on studies using the same questionnaire to explore the reasons for heterogeneity. Meta-analysis was not carried out on other predictive parameters (PPV and NPV), as they are not related to the intrinsic quality of the questionnaires. However, all parameters were presented in the review for descriptive analysis.
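To make the inconsistency index concrete, the sketch below pools a proportion (such as sensitivity) across studies and computes I2 from a chi-square heterogeneity statistic Q, using I2 = 100 × (Q − df)/Q truncated at zero. It is an assumption-laden illustration and is not claimed to reproduce Meta-DiSc's exact computation; the function names and the example counts are hypothetical.

```python
from typing import Sequence


def pooled_proportion(events: Sequence[int], totals: Sequence[int]) -> float:
    """Pool a proportion by summing numerators and denominators across studies."""
    return sum(events) / sum(totals)


def i_squared(events: Sequence[int], totals: Sequence[int]) -> float:
    """I2 (in percent) from a chi-square test of homogeneity of proportions."""
    p = pooled_proportion(events, totals)
    if p in (0.0, 1.0):
        return 0.0  # every study reports the same extreme proportion
    q = sum(n * (e / n - p) ** 2 for e, n in zip(events, totals)) / (p * (1 - p))
    df = len(totals) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical data: true positives and PSG-positive patients in three studies.
true_positives = [40, 45, 90]
psg_positives = [50, 50, 100]
print(f"pooled sensitivity = {pooled_proportion(true_positives, psg_positives):.3f}")
print(f"I2 = {i_squared(true_positives, psg_positives):.1f}%")
```

Under the threshold stated above, an I2 value above 50% would flag the studies as significantly inconsistent.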
Results
Our extensive search strategy yielded 4,105 citations (Figure 1). After screening, most studies were eliminated based on the eligibility criteria, and only ten papers were considered for final inclusion in the review. The characteristics of the included studies are summarized in Table 1. The excluded studies11 , 16 - 20 , 22 - 24 and the reasons for their exclusion are listed in Table 2.
Subject characteristics
A total of 1,484 participants were in the included studies, and 350 of them were “patients with sleep-related disorders”, i.e., patients in the sleep clinics or those with habitual snoring. Weatherwax 2003, who studied the validity of a questionnaire on epileptic patients,25 was considered in this category, as up to one-third of epilepsy patients could have coexisting OSA.25 Studies on “patients without history of sleep disorders” included a total of 1,134 subjects (Table 1). In this group, Sharma 2006a enrolled patients using a pre-screening questionnaire that included questions related to OSA risk factors. The sample size for all studies ranged from 42 to 602 patients. The mean age of the subjects in the studies varied from 42 to 55 yr. The male ratio and body mass index (BMI) ranged from 45.8-79.3% and from 24.6-30.2, respectively. Four studies were performed in the USA,25 , 27 - 29 three in Canada,30 - 32 two in India,33 , 34 and one in Europe.35 Nine studies were prospective cohort, while one study30 was a retrospective chart review of patients who had already undergone PSG in a sleep laboratory (Table 1).
The questionnaire characteristics
There were eight questionnaires developed and/or validated in the included studies. The Berlin questionnaire was the most common questionnaire among the studies (four studies), followed by the Wisconsin sleep questionnaire (two studies). Apnea score (AS), Haraldsson’s questionnaire, and the Sleep Apnea scale of the Sleep Disorders Questionnaire (SA-SDQ) were validated in sleep-disorder patients only. The latter was a modified version of SA-SDQ, which was validated on epileptic patients. The original SA-SDQ questionnaire was developed by Douglas et al.;17 however, their study could not be included in this review due to insufficient data regarding the accuracy parameters (Table 2). The checklist of the American Society of Anesthesiologists (ASA), the STOP questionnaire (snoring, tiredness, observed apnea, and high blood pressure), and the STOP-Bang questionnaire (STOP including BMI, age, neck circumference, gender) were validated on surgical patients. The Wisconsin questionnaire was validated on patients selected from the general population, and the Berlin questionnaire was validated in differing populations, i.e., sleep clinic patients, general population, and surgical patients.
The details of each questionnaire and its scoring method are shown in Table 3. All eight questionnaires used loud snoring (or snoring) and stop breathing during sleep as two of the components of their questionnaires. In five questionnaires, daytime sleepiness was one of the questions. Three questionnaires noted body mass index and high blood pressure. A history of adenoidectomy or anatomical problems of airways was considered in three questionnaires, and measurement of neck circumference was used in two questionnaires. The summary of items used in the OSA questionnaires is shown in Table 4. Three questionnaires (ASA checklist, STOP, and STOP-Bang) used a yes/no format; however, others had Likert-type (frequency) questions. The number of questions in each questionnaire ranged from three to 12 items.
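As an illustration of how a yes/no questionnaire of this kind can be scored, the sketch below counts positive answers for STOP and STOP-Bang. The numeric cut-offs (BMI > 35, age > 50 yr, neck circumference > 40 cm) and the high-risk thresholds (two or more positive items for STOP, three or more for STOP-Bang) are assumptions taken from the commonly cited form of these questionnaires, not from the text above; Table 3 remains the authoritative description of the scoring methods.

```python
def stop_score(snoring: bool, tired: bool, observed_apnea: bool, high_bp: bool) -> int:
    """Number of positive answers on the four STOP items."""
    return sum([snoring, tired, observed_apnea, high_bp])


def stop_bang_score(
    snoring: bool,
    tired: bool,
    observed_apnea: bool,
    high_bp: bool,
    bmi: float,
    age: int,
    neck_cm: float,
    male: bool,
    bmi_cutoff: float = 35.0,      # assumed cut-off, see lead-in
    age_cutoff: int = 50,          # assumed cut-off, see lead-in
    neck_cutoff_cm: float = 40.0,  # assumed cut-off, see lead-in
) -> int:
    """STOP items plus BMI, age, neck circumference, and gender."""
    bang = sum([bmi > bmi_cutoff, age > age_cutoff, neck_cm > neck_cutoff_cm, male])
    return stop_score(snoring, tired, observed_apnea, high_bp) + bang


def is_high_risk(score: int, threshold: int) -> bool:
    """Threshold is assumed to be 2 for STOP and 3 for STOP-Bang."""
    return score >= threshold


# Hypothetical patient, for illustration only.
score = stop_bang_score(True, False, True, False, bmi=36.0, age=55, neck_cm=38.0, male=True)
print(score, is_high_risk(score, threshold=3))
```

Keeping the cut-offs as parameters makes it easy to substitute the values from Table 3 if they differ from the assumed defaults.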
The validation tool was one overnight sleep laboratory PSG in eight studies, except for Netzer’s study of the Berlin questionnaire. In Netzer’s study, the Berlin questionnaire was validated by the patient taking the ambulatory PSG recorder home. The patient was given instructions on how to use the recording device and was told to turn on the device at bedtime and to turn it off upon arising. One retrospective chart review on the Berlin questionnaire used the result of two consecutive nights of PSG as the “gold standard”. In seven studies, the AHI ≥ 5 was used as the PSG cut-off score for OSA diagnosis; AI ≥ 5 was used by one study, and RDI ≥ 5 or 10 was used by two studies.
Methodological quality of the included studies
In terms of the internal validity, all of the included studies used a valid reference test to verify the accuracy of the questionnaires (Table 5). Netzer 1999, however, partially attained this factor, as they used a portable PSG method for validation.28 Only three studies adequately addressed the validation process of the questionnaires (Chung 2008a, Chung 2008b, and Ahmadi 2008).30 - 32 The other included studies did not have specific information to clearly evaluate the risk of bias during the validation process of their questionnaires. More specifically, the following aspects were not specified in the papers: 1) blind execution of the PSG and questionnaire, i.e., those who performed the PSG were unaware of the results of the questionnaire (and vice versa); 2) avoidance of verification bias, i.e., interpretation of the PSG results was performed independent of the questionnaire results; and 3) interpretation of the PSG results was performed independent from the patient’s clinical history. Therefore, the risk of bias cannot be eliminated in these studies. Chung 2008 studies clearly specified all of the above criteria, and Ahmadi 2008, a retrospective study, adequately addressed items two and three, but not the first item (blind execution of PSG).30 Overall, the studies by Chung et al. on the STOP and STOP-Bang questionnaires had the highest internal validity.31 , 32
In terms of the external validity (generalizability), most of the studies met the appraisal items adequately (Table 5). In Sharma 2006a, a pre-screening set of questions was used to select the subjects out of the general population;33 therefore, this study is missing one of the important components of the external validity and is at risk of screening bias, i.e., those who were selected for the validation of the screening questionnaire already had some type of sleep-related problems, and they do not necessarily represent the target population of the study. Another important aspect of the external validity was the management of the missing data, which was only carried out in three studies (Chung 2008a, Chung 2008b and Young 1993).29 , 31 , 32 Chung et al. and Young et al. compared the basic characteristics (age, sex, BMI) of those patients who agreed to undergo PSG vs those who refused. Chung et al. showed a significant difference between the two groups, i.e., the PSG group had a higher BMI (30 ± 7 vs 28 ± 6; P < 0.05). Other characteristics in both studies were similar between the two groups.
Results of accuracy outcomes and other predictive parameters
In studies on “sleep-disorder patients”, the prevalence of OSA (AHI/AI or RDI ≥ 5) ranged from 42-76%. The sensitivity of different questionnaires in predicting OSA ranged from 59-81%, with Haraldsson’s questionnaire showing the highest sensitivity. The pooled sensitivity was 72.0% (95% CI: 66.0-78.0%; I2 = 23.0%). In this category, the specificity ranged from 46-80%, with Haraldsson’s questionnaire showing the highest specificity. The pooled specificity was 61.0% (95% CI: 55.0-67.0%; I2 = 43.8%). The PPV and NPV values ranged from 48-92% and from 57-72%, respectively (Table 6).
In studies on “patients without history of sleep disorders”, the prevalence of OSA ranged from 21-69%. The sensitivity of different questionnaires in predicting OSA (AHI/AI or RDI ≥ 5) ranged from 66-95% with the Wisconsin Sleep questionnaire showing the highest sensitivity followed by the Berlin and the STOP-Bang questionnaires (Table 7). The pooled sensitivity was 77.0% (95% CI: 73.0-80.0%; I2 = 78.1%). To explore the reasons for heterogeneity, i.e., I2 > 50%, we performed subgroup analysis on studies using the same questionnaire, i.e., the Berlin or the Wisconsin sleep questionnaire, but this did not yield consistent results. The pooled sensitivity for the Berlin questionnaire was 77.0% (95% CI: 71.0-82.0%; I2 = 79.4%) and for the Wisconsin questionnaire was 83.0% (95% CI: 76.0-88.0%; I2 = 84.0%). As shown in Table 7, the average sensitivity of the questionnaires varied from 66-86% in all studies except Wisconsin (Sharma 2006b), which was shown to have an unexpectedly high value (95%) of sensitivity. This study was carried out by the same author as another study in which a pre-screening set of questions was used to select the subjects out of the general population. After excluding this study as an outlier, the overall sensitivity was calculated as 76.0% (95% CI: 72.0-79.0%), and the index of heterogeneity of the results was reduced to I2 = 73%.
The specificity of different questionnaires in studies on “patients without history of sleep disorders” ranged from 38-95% with the Berlin questionnaire showing the highest specificity. The pooled specificity was 53.0% (95% CI: 50-57%; I2 = 88.8%). Subgroup analyses on studies using the same questionnaire, i.e., the Berlin or the Wisconsin sleep questionnaire, delivered inconsistent results. The pooled specificity for the Berlin questionnaire was 74.0% (95% CI: 65.0-81.0%; I2 = 90.7%) and for the Wisconsin questionnaire was 50.0% (95% CI: 46.0-52.0%; I2 = 90.9%). As shown in Table 7, the average specificity of the questionnaires ranged from 38-76% in all studies except the Berlin questionnaire (Sharma 2006a), which was shown to have an unexpectedly high value (97%) of specificity. In that study, a pre-screening set of questions was used to select the subjects out of the general population. After excluding this study as an outlier, the overall specificity was calculated as 51.0% (95% CI: 48.0-55.0%), and the index of heterogeneity of the results was reduced to I2 = 74.7%. The PPV and NPV values ranged from 28-96% and from 38-97%, respectively (Table 7).
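Part of the wide spread in PPV and NPV is expected, because predictive values depend on the prevalence of OSA in each study population as well as on sensitivity and specificity. The sketch below applies Bayes' theorem to the pooled sensitivity (77%) and pooled specificity (53%) reported above at the lowest and highest prevalence in this group (21% and 69%). Combining pooled estimates with individual prevalences in this way is an illustrative assumption, not an analysis from the review, and the function names are hypothetical.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Pooled estimates from the text; prevalences are the reported extremes.
for prevalence in (0.21, 0.69):
    print(
        f"prevalence={prevalence:.2f} "
        f"PPV={ppv(0.77, 0.53, prevalence):.2f} "
        f"NPV={npv(0.77, 0.53, prevalence):.2f}"
    )
```

With accuracy held fixed, PPV rises and NPV falls as prevalence increases, which is why these parameters were reported descriptively rather than pooled.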
The sensitivity of different questionnaires in predicting moderate OSA (AHI/AI ≥ 15) ranged from 54-93%, with STOP-Bang questionnaires showing the highest sensitivity (Table 8). The prevalence of moderate OSA was 8-70% among the included studies. The pooled sensitivity was 77.0% (95% CI: 73.0-81.0%; I2 = 85.6%). The specificity of the questionnaires in predicting moderate OSA varied from 37-97% with the Berlin questionnaires showing the highest specificity (Table 8). The pooled specificity was 44% (95% CI: 41.0-47.0%; I2 = 84.0%). The PPV and NPV ranged from 11-97% and from 48-97%, respectively.
With regard to predicting severe OSA (AHI/AI ≥ 30), the sensitivity was very variable in the studies ranging from 17-100%, with the STOP-Bang questionnaire having the highest value. The specificity varied from 36-97%, with the Berlin questionnaire showing the highest specificity (Table 9). The prevalence of severe OSA was 22-69% among the included studies. The pooled sensitivity for predicting severe OSA was 67% (95% CI: 60.0-73.0%; I2 = 96.8%). The respective value for pooled specificity was 45% (95% CI: 41.0-49.0%; I2 = 91.9%). The PPV and NPV ranged from 31-92% and from 34-100%, respectively.
Discussion
This systematic review identified and evaluated eight available patient-based questionnaires for screening of OSA. Among the questionnaires validated on “sleep-disorder patients”, we found Haraldsson’s questionnaire to have the highest sensitivity and specificity. The accuracy results are significantly heterogeneous in the studies on “patients without history of sleep disorders”, even in the studies on the same questionnaire. In terms of predicting the existence of OSA (AHI/AI ≥ 5), the Wisconsin and the Berlin questionnaires were shown to have the highest sensitivity and specificity, respectively. However, the validity of these studies is unclear due to the potential effects of pre-screening and the risk of verification bias. In terms of predicting moderate or severe OSA, the STOP-Bang and the Berlin questionnaires were found to have the highest sensitivity and specificity, respectively. The STOP and STOP-Bang questionnaires were found to have the highest methodological validity, reasonable accuracy, and easy-to-use features.
Due to the relatively high prevalence of undiagnosed OSA and its short- and long-term complications, a reliable screening tool is required for a quick prediction of OSA. A quick and reliable screening test would enable clinicians to detect the possibility of OSA during initial clinical visits and then determine those patients at high risk and either in need of further assessment or in need of immediate therapeutic treatment. Questionnaires can be appropriate tools for quick prediction of obstructive sleep apnea as they can be applied and scored easily as part of routine daily practice. This approach is tremendously important to anesthesia practitioners and to surgical patients, as there is insufficient time in the short preoperative period to complete an assessment of every patient with the standard diagnostic approach, i.e., sleep lab PSG.
An ideal screening questionnaire should have three important characteristics,36 namely, 1) Feasibility: Patients and healthcare providers should find the questionnaire user-friendly; 2) Accuracy: There should be a clear validation process that leads to high accuracy parameters; 3) Generalizability: Valid results should be realized when the questionnaire is used on different target populations, i.e., the questionnaire has been validated in different study populations.
The response rate is generally considered as an index factor of the feasibility of a questionnaire.36 However, only three of the ten studies that we evaluated reported their response rate.29 , 31 , 34 All three studies were on the general population, and the response rate ranged from 82-91%. We assumed that most patients who were referred to sleep clinics for their sleep-related complaints were more willing to respond to the questions. If this assumption is true, the response rate should be higher in this group of patients than in the general population. Thus, an easier and more straightforward questionnaire is needed to be used in the general population to achieve a higher response rate. Some authors have taken this need into consideration. While the older type of questionnaire, such as the SA-SDQ,25 included a greater number of questions that were more complicated, the newer questionnaires, such as STOP and STOP-Bang, have fewer and more straightforward questions.31 , 32
The imprecision we detected in the accuracy of these questionnaires as screening tools could have arisen from several factors. One factor could be the methodological quality of the papers,37 which was poorly reported in the majority of the studies we evaluated. Methodological quality was clearly reported in the studies by Chung et al. and Ahmadi 2008.30 - 32 Compared with the studies by Ahmadi 2008,8 , 30 the studies by Chung et al. 31 , 32 are considered to be of even higher quality due to their prospective design and the blind execution of PSG, i.e., the persons responsible for carrying out the standard PSG for the patients were not aware of the results of the questionnaires. Other studies did not address the validation process of the questionnaires, fully evaluate the validity of their results, or, more specifically, eliminate the risk of verification bias. This bias occurs when the disease status, i.e., OSA, is not determined in all subjects who are tested (or screened by the questionnaire) and when the probability of verification depends on the questionnaire results, other clinical variables, or both, rather than those of the “gold standard”, i.e., PSG.38
When verification of disease status is anticipated among high-risk patients, a bias is introduced that can markedly increase the apparent sensitivity of the questionnaire and reduce its specificity. Sharma 2006a studied the Berlin questionnaire in the general population and yielded high predictive parameter values (sensitivity 85%).33 This study has selection bias, as a primary “pre-screening” was performed before the subjects were selected for the questionnaire being studied. This primary screening was performed using a questionnaire covering four main variables of OSA (snoring, tiredness, obesity, and hypertension). Subjects answering yes to these questions proceeded to the next step to be selected as the subjects for the validation analysis of the study questionnaire. Consequently, the study subjects could not represent the general population. This approach resulted in heterogeneity when a meta-analysis was attempted.
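The direction of verification bias described above can be shown with simple arithmetic. The sketch below assumes a hypothetical cohort in which every questionnaire-positive patient undergoes PSG but only a fraction of questionnaire-negative patients do; all numbers and the function name `apparent_accuracy` are illustrative assumptions and are not drawn from any included study.

```python
from typing import Tuple


def apparent_accuracy(
    n: int,
    prevalence: float,
    sensitivity: float,
    specificity: float,
    verified_negative_fraction: float,
) -> Tuple[float, float]:
    """Sensitivity and specificity as they would appear among verified patients only."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity
    fn = diseased - tp
    tn = healthy * specificity
    fp = healthy - tn
    # Only some questionnaire-negative patients are verified by PSG.
    fn_verified = fn * verified_negative_fraction
    tn_verified = tn * verified_negative_fraction
    apparent_sensitivity = tp / (tp + fn_verified)
    apparent_specificity = tn_verified / (tn_verified + fp)
    return apparent_sensitivity, apparent_specificity


# Hypothetical cohort: true sensitivity 70%, true specificity 60%.
full = apparent_accuracy(1000, 0.3, 0.7, 0.6, verified_negative_fraction=1.0)
partial = apparent_accuracy(1000, 0.3, 0.7, 0.6, verified_negative_fraction=0.2)
print(f"full verification:    sens={full[0]:.2f} spec={full[1]:.2f}")
print(f"partial verification: sens={partial[0]:.2f} spec={partial[1]:.2f}")
```

When only one in five questionnaire-negative patients is verified, apparent sensitivity rises and apparent specificity falls even though the questionnaire itself is unchanged.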
Another factor in the inaccuracy of these questionnaires was the variety of the target populations among the different studies. In an effort to unify the subject populations, all studies were divided into two major groups: studies on patients “with sleep disorder problems”25 , 27 , 30 , 35 and studies on “patients without known sleep problems”.28 , 29 , 31 - 34 Due to the high prevalence of OSA in patients with sleep disorders, we could not use the first group of studies as a reference for the strength of the questionnaires designed to screen OSA in the general population. Recent evidence shows that test performance varies in different populations because of the severity of the disease. For example, a patient population with a higher disease prevalence may include more severely diseased patients; therefore, the test would perform better in this population.39 It is also important to emphasize that an ideal diagnostic test in a general population should have a relatively high specificity to minimize false positives; nevertheless, it should have sufficient sensitivity. Conversely, an ideal diagnostic test in a population with a high pre-test probability of disease should have higher sensitivity while maintaining high specificity.40
Lack of a standard definition for some factors involved in OSA questionnaires could also result in the heterogeneity of the data among the studies. For example, the cut-off numbers used for BMI ranged from 25-35, and different scales were used for snoring in the questionnaires. The Epworth Sleepiness Scale (ESS) is a standard questionnaire to measure daytime sleepiness;41 however, Osman et al. showed that this questionnaire had no value in distinguishing simple snorers from patients with OSA.42 Standard definition of factors is a prerequisite for standardizing questionnaire components and evaluating their values in different combinations.
It is important to screen OSA patients in the perioperative setting to identify those who require further management. An ideal questionnaire should be sensitive enough to detect any patients with OSA, but more importantly, it should recognize patients with severe forms of OSA, as they are more likely to have perioperative complications. In this regard, there is not enough evidence available in the literature to indicate the AHI/AI cut-off that places the patient at significantly increased risk of postoperative complications. However, the correlation between AHI scores and lifetime complications, such as motor vehicle accidents and the risk of atrial fibrillation, has been suggested in non-surgical populations.43 Therefore, we can hypothesize that the more apnea episodes patients experience, the more vulnerable they are to the effect of anesthesia and surgery. The Wisconsin sleep questionnaire was not validated for detection of severe OSA. The Berlin questionnaire, which was shown to have high sensitivity for detecting OSA (69-86%), was found to be relatively less sensitive in detecting moderate and severe cases, i.e., sensitivity: 54-79% and 17-87%, respectively. The STOP-Bang questionnaire was shown to have consistently high sensitivity for detecting OSA at different AHI cut-offs and severity levels (AHI ≥ 5: 84%, AHI ≥ 15: 93%, AHI ≥ 30: 100%). This was achieved at the expense of specificity, which is not the major concern in the preoperative setting, as screened patients will always be advised to confirm their diagnosis with postoperative PSG at their leisure.
A meta-analysis of clinical screening tests for obstructive sleep apnea was conducted by Ramachandran et al. in 2008.44 This study included eight papers on questionnaires and 18 articles on clinical prediction tests, including clinical scales, algorithms, and prediction models. We can distinguish our systematic review from Ramachandran’s review in at least three different areas, i.e., the main focus, the presentation of the results, and the conclusion. First, our systematic review is focused only on questionnaires, whereas Ramachandran’s review involves other types of clinical screening tools. Since questionnaires do not need sophisticated mathematical calculations, we consider them as being more practical and, therefore, more convenient to be used in the daily clinical practice. Second, although our method of presenting the details of our data is similar to Ramachandran’s method, we have included descriptive features, such as tables of the included and excluded studies and tables of the included questionnaires and quality assessments. While the included studies are extremely diverse in their quality, design, and patient population, the summary tables allow for individual interpretation of the available literature on the OSA questionnaires. Finally, because of the inconsistent results, even between studies regarding the same questionnaire, we did not make a definite conclusion regarding the most accurate questionnaire. However, we did recommend the STOP (or STOP-Bang) questionnaire in consideration of its high-quality methodology. Although Ramachandran et al. evaluated the quality of the papers, their review did not provide the details of the assessment, and this factor was not taken into consideration in the synthesis of their conclusion.
We conclude that questionnaires have the potential to identify patients at high risk of OSA, and this approach can raise the awareness of anesthesiologists and surgeons to the possibility of OSA in surgical patients. This approach may also facilitate early detection of patients who need further assessment and who would benefit from perioperative precautions for OSA. Identification of patients at risk of OSA could potentially reduce the rate of OSA-related postoperative complications.45 Because the literature is inconsistent, it is difficult to draw a definite conclusion regarding the most accurate OSA questionnaire available. However, we suggest the STOP questionnaire and its extended version, STOP-Bang, for OSA screening in surgical patients.
Although the STOP questionnaire was developed at our institution by the corresponding author of this review, we used a systematic approach to review the literature objectively, following standard guidelines. The STOP and STOP-Bang questionnaires have high methodological quality and reasonably accurate results. The scoring method is straightforward, and the acronym STOP (snoring, tiredness, observed apnea, and high blood pressure) makes it easy for clinicians to remember. For anesthesiologists facing a large number of patients in the preoperative clinic, this questionnaire could serve as a quick tool to screen for OSA in surgical patients and could increase the awareness of OSA precautions in perioperative management.
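To illustrate how simple the scoring is, the following Python sketch counts "yes" answers for STOP and for STOP-Bang. The item names follow the acronym given above. The thresholds, however, are assumptions that this section does not state: high risk at two or more "yes" answers for STOP and three or more for STOP-Bang, with BMI > 35 kg/m², age > 50 years, and neck circumference > 40 cm. They reflect how the questionnaire is commonly described and should be checked against the original STOP publication before any use. All class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed thresholds (not stated in this section; verify against the
# original STOP publication before relying on them).
STOP_HIGH_RISK_MIN_YES = 2
STOP_BANG_HIGH_RISK_MIN_YES = 3
BMI_THRESHOLD = 35.0          # kg/m^2
AGE_THRESHOLD = 50            # years
NECK_THRESHOLD_CM = 40.0      # cm


@dataclass
class StopAnswers:
    """Yes/no answers to the four STOP items."""
    snoring: bool
    tiredness: bool
    observed_apnea: bool
    high_blood_pressure: bool


@dataclass
class BangItems:
    """The four additional items that extend STOP to STOP-Bang."""
    bmi: float
    age: int
    neck_circumference_cm: float
    male: bool


def stop_score(answers: StopAnswers) -> int:
    """Count the 'yes' answers among the four STOP items."""
    return sum([
        answers.snoring,
        answers.tiredness,
        answers.observed_apnea,
        answers.high_blood_pressure,
    ])


def stop_bang_score(answers: StopAnswers, bang: BangItems) -> int:
    """Add one point per Bang item that exceeds its assumed threshold."""
    return stop_score(answers) + sum([
        bang.bmi > BMI_THRESHOLD,
        bang.age > AGE_THRESHOLD,
        bang.neck_circumference_cm > NECK_THRESHOLD_CM,
        bang.male,
    ])


def is_high_risk_stop(answers: StopAnswers) -> bool:
    return stop_score(answers) >= STOP_HIGH_RISK_MIN_YES


def is_high_risk_stop_bang(answers: StopAnswers, bang: BangItems) -> bool:
    return stop_bang_score(answers, bang) >= STOP_BANG_HIGH_RISK_MIN_YES
```

For example, a hypothetical 45-year-old man who snores, has high blood pressure, a BMI of 36 kg/m², and a neck circumference of 38 cm would score 2 on STOP and 4 on STOP-Bang under these assumed thresholds.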
Notes
Cochrane Methods Group on Screening and Diagnostic Tests, 2007. http://www.cochrane.org/cochrane/sadt.html.
References
Young T, Hutton R, Finn L, Badr S, Palta M. The gender bias in sleep apnea diagnosis. Are women missed because they have different symptoms? Arch Intern Med 1996; 156: 2445-51.
Young T, Evans L, Finn L, Palta M. Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women. Sleep 1997; 20: 705-6.
Shahar E, Whitney CW, Redline S, et al. Sleep-disordered breathing and cardiovascular disease: cross-sectional results of the Sleep Heart Health Study. Am J Respir Crit Care Med 2001; 163: 19-25.
Turkington PM, Sircar M, Allgar V, Elliott MW. Relationship between obstructive sleep apnoea, driving simulator performance, and risk of road traffic accidents. Thorax 2001; 56: 800-5.
Ing AJ, Ngu MC, Breslin AB. Obstructive sleep apnea and gastroesophageal reflux. Am J Med 2000; 108(Suppl 4): 120S-5S.
Idris I, Hall AP, O’Reilly J, et al. Obstructive sleep apnoea in patients with type 2 diabetes: aetiology and implications for clinical care. Diabetes Obes Metab 2009; 11: 733-41.
Abrams B. Long-term sleep apnea as a pathogenic factor for cell-mediated autoimmune disease. Med Hypotheses 2005; 65: 1024-7.
Rowley JA, Aboussouan LS, Badr MS. The use of clinical prediction formulas in the evaluation of obstructive sleep apnea. Sleep 2000; 23: 929-38.
Crocker BD, Olson LG, Saunders NA, et al. Estimation of the probability of disturbed breathing during sleep before a sleep study. Am Rev Respir Dis 1990; 142: 14-8.
Viner S, Szalai JP, Hoffstein V. Are history and physical examination a good screening test for sleep apnea? Ann Intern Med 1991; 115: 356-9.
Flemons WW, Whitelaw WA, Brant R, Remmers JE. Likelihood ratios for a sleep apnea clinical prediction rule. Am J Respir Crit Care Med 1994; 150: 1279-85.
Maislin G, Pack AI, Kribbs NB, et al. A survey screen for prediction of apnea. Sleep 1995; 18: 158-66.
Deville WL, Buntinx F, Bouter LM, et al. Conducting systematic reviews of diagnostic studies: didactic guidelines. BMC Med Res Methodol 2002; 2: 9.
Irwig L, Tosteson AN, Gatsonis C, et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med 1994; 120: 667-76.
Walter SD, Irwig L, Glasziou PP. Meta-analysis of diagnostic tests with imperfect reference standards. J Clin Epidemiol 1999; 52: 943-51.
Bliwise DL, Nekich JC, Dement WC. Relative validity of self-reported snoring as a symptom of sleep apnea in a sleep clinic population. Chest 1991; 99: 600-8.
Douglass AB, Bornstein R, Nino-Murcia G, et al. The Sleep Disorders Questionnaire I: creation and multivariate structure of SDQ. Sleep 1994; 17: 160-7.
Gurubhagavatula I, Maislin G, Pack AI. An algorithm to stratify sleep apnea risk in a sleep disorders clinic population. Am J Respir Crit Care Med 2001; 164: 1904-9.
Pouliot Z, Peters M, Neufeld H, Kryger MH. Using self-reported questionnaire data to prioritize OSA patients for polysomnography. Sleep 1997; 20: 232-6.
Rosenthal LD, Dolan DC. The Epworth sleepiness scale in the identification of obstructive sleep apnea. J Nerv Ment Dis 2008; 196: 429-31.
Zamora J, Abraira V, Muriel A, Khan K, Coomarasamy A. Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 2006; 6: 31.
Izci B, Ardic S, Firat H, Sahin A, Altinors M, Karacan I. Reliability and validity studies of the Turkish version of the Epworth Sleepiness Scale. Sleep Breath 2008; 12: 161-8.
Onen SH, Dubray C, Decullier E, Moreau T, Chapuis F, Onen F. Observation-based Nocturnal Sleep Inventory: screening tool for sleep apnea in elderly people. J Am Geriatr Soc 2008; 56: 1920-5.
Teculescu D, Guillemin F, Virion JM, et al. Reliability of the Wisconsin Sleep Questionnaire: a French contribution to international validation. J Clin Epidemiol 2003; 56: 436-40.
Weatherwax KJ, Lin X, Marzec ML, Malow BA. Obstructive sleep apnea in epilepsy patients: the Sleep Apnea scale of the Sleep Disorders Questionnaire (SA-SDQ) is a useful screening instrument for obstructive sleep apnea in a disease-specific population. Sleep Med 2003; 4: 517-21.
Malow BA, Levy K, Maturen K, Bowes R. Obstructive sleep apnea is common in medically refractory epilepsy patients. Neurology 2000; 55: 1002-7.
Kapuniai LE, Andrew DJ, Crowell DH, Pearce JW. Identifying sleep apnea from self-reports. Sleep 1988; 11: 430-6.
Netzer NC, Stoohs RA, Netzer CM, Clark K, Strohl KP. Using the Berlin Questionnaire to identify patients at risk for the sleep apnea syndrome. Ann Intern Med 1999; 131: 485-91.
Young T, Palta M, Dempsey J, Skatrud J, Weber S, Badr S. The occurrence of sleep-disordered breathing among middle-aged adults. N Engl J Med 1993; 328: 1230-5.
Ahmadi N, Chung SA, Gibbs A, Shapiro CM. The Berlin questionnaire for sleep apnea in a sleep clinic population: relationship to polysomnographic measurement of respiratory disturbance. Sleep Breath 2008; 12: 39-45.
Chung F, Yegneswaran B, Liao P, et al. STOP questionnaire: a tool to screen patients for obstructive sleep apnea. Anesthesiology 2008; 108: 812-21.
Chung F, Yegneswaran B, Liao P, et al. Validation of the Berlin questionnaire and American Society of Anesthesiologists checklist as screening tools for obstructive sleep apnea in surgical patients. Anesthesiology 2008; 108: 822-30.
Sharma SK, Vasudev C, Sinha S, Banga A, Pandey RM, Handa KK. Validation of the modified Berlin questionnaire to identify patients at risk for the obstructive sleep apnoea syndrome. Indian J Med Res 2006; 124: 281-90.
Sharma SK, Kumpawat S, Banga A, Goel A. Prevalence and risk factors of obstructive sleep apnea syndrome in a population of Delhi, India. Chest 2006; 130: 149-56.
Haraldsson PO, Carenfelt C, Knutsson E, Persson HE, Rinder J. Preliminary report: validity of symptom analysis and daytime polysomnography in diagnosis of sleep apnea. Sleep 1992; 15: 261-3.
Fitzpatrick R, Davey C, Buxton MJ, Jones DR. Evaluating patient-based outcome measures for use in clinical trials. Health Technol Assess 1998; 2: i-iv, 1-74.
Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995; 274: 645-51.
Punglia RS, D’Amico AV, Catalona WJ, Roehl KA, Kuntz KM. Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen. N Engl J Med 2003; 349: 335-42.
Leeflang MM, Bossuyt PM, Irwig L. Diagnostic test accuracy may vary with prevalence: implications for evidence-based diagnosis. J Clin Epidemiol 2009; 62: 5-12.
Ross SD, Sheinhait IA, Harrison KJ, et al. Systematic review and meta-analysis of the literature regarding the diagnosis of sleep apnea. Sleep 2000; 23: 519-32.
Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep 1991; 14: 540-5.
Osman EZ, Osborne J, Hill PD, Lee BW. The Epworth Sleepiness Scale: can it be used for sleep apnoea screening among snorers? Clin Otolaryngol Allied Sci 1999; 24: 239-41.
Shafazand S. Perioperative management of obstructive sleep apnea: ready for prime time? Cleve Clin J Med 2009; 76(Suppl 4): S98-103.
Ramachandran SK, Josephs LA. A meta-analysis of clinical screening tests for obstructive sleep apnea. Anesthesiology 2009; 110: 928-39.
Liao P, Yegneswaran B, Vairavanathan S, Zilberman P, Chung F. Postoperative complications in patients with obstructive sleep apnea: a retrospective matched cohort study. Can J Anesth 2009; 56: 819-28.
Acknowledgement
The authors thank Marina Englesakis, B.A. (Hons), M.L.I.S., Information Specialist, Librarian, Toronto Western Hospital, Toronto, Ontario, Canada, for her help with the literature search.
Funding disclosure
Funding was provided by the Department of Anesthesia, Toronto Western Hospital, University Health Network, University of Toronto.
Conflicts of interest
None declared.
Cite this article
Abrishami, A., Khajehdehi, A. & Chung, F. A systematic review of screening questionnaires for obstructive sleep apnea. Can J Anesth/J Can Anesth 57, 423–438 (2010). https://doi.org/10.1007/s12630-010-9280-x