Abstract
BACKGROUND: Sputum production and purulence were proposed as criteria for justifying the use of antimicrobial agents. The Sputum Color Chart was developed and validated to standardize the evaluation of sputum purulence. The aim of this study was to observe the reproducibility of the Sputum Color Chart among different categories of health caregivers.
METHODS: The color of 10 sputum samples was evaluated using photographs for intra- and inter-rater reliability. The observation was repeated 3 times. Eighteen volunteers from 6 categories of health caregivers (student in physiotherapy, senior chest physiotherapist, junior resident in pulmonology, medical microbiologist, pulmonologist, and general practitioner) were investigated.
RESULTS: Poor inter-rater reliability was observed for all categories with the exception of senior chest physiotherapists. The best intra-rater reliability was observed for microbiologists and senior chest physiotherapists. We found a large proportion (>40%) of important discrepancies in 2 categories (junior pulmonologist and general practitioner). The proportion of non-discrepancy between evaluators varied between 10 and 40%, depending on the category.
CONCLUSIONS: Even if the Sputum Color Chart is a useful tool for the clinician in the context of clinical deterioration, its reliability is not uniform across caregivers and caregiver categories.
Introduction
Sputum production is a common symptom in various pulmonary diseases. Indeed, it is typical in cystic fibrosis or COPD for physiopathological reasons, and patients with bronchiectasis frequently present with a longstanding productive cough.1
Sputum production is also associated with inflammatory processes in the airways2 and with the progression of obstruction3 in smokers. Additionally, it is related to the mortality rate in patients with COPD.4
Sputum color is related to the degree of infection. Bacterial colonization has been demonstrated to be low (5%), moderate (43.5%), and very important (86.4%) in mucoid, mucopurulent, and purulent sputum, respectively (P < .001).5 The purulence of sputum has also been associated with the bacterial load during exacerbations in subjects with COPD.6
The decision to prescribe an antimicrobial agent in patients with an exacerbation is often difficult. Sputum production and purulence were proposed as criteria for defining an exacerbation and justifying the use of antimicrobial agents.7 Usually, the prescription is ordered empirically, based on clinical evidence of purulence. Hence, the color of sputum could play an important role in clinical practice in lung diseases with productive cough even if that role is still debated.8 It has been proposed to take into account the sputum color to determine the clinical impact of COPD (the current consequences of the disease for the patient).9 However, the color can be difficult to evaluate, and when reported by patients, it is not reliable.10
For this purpose, Murray et al5 developed and validated a quantitative method (Sputum Color Chart) to standardize the evaluation. It allows clinicians to report sputum color by providing an accurate representation of the 3 major grades of color.
Such validated tools are necessary to facilitate management. A good inter-rater reproducibility was observed between the physicians and patients.5 However, some other psychometric properties must be investigated, as they are for questionnaires. A complete tool validation requires the evaluation of its reproducibility. Reproducibility concerns the degree to which repeated measurements in steady state provide similar answers. It includes reliability and agreement. These properties are usually evaluated by the weighted Cohen's kappa coefficient and the Bland-Altman analysis, respectively.11 Reliability concerns the power of distinction between patients.11 Agreement concerns the absolute measurement error, which refers to clinically important changes.11 The aim of this study was to observe the reproducibility of the Sputum Color Chart for 3 evaluators from each of 6 categories of health caregivers who can be routinely faced with sputum evaluation in their clinical practice.
QUICK LOOK
Current knowledge
Sputum color is related to the degree of infection, and the purulence of sputum has been associated with bacterial load in subjects with COPD exacerbation. The Sputum Color Chart was developed and validated to standardize the evaluation of sputum purulence.
What this paper contributes to our knowledge
Among health caregivers, we found poor inter-rater reliability for all categories except senior chest physiotherapists. Although the Sputum Color Chart may be useful in the context of assessing clinical deterioration, it is not reliable across health caregiver categories.
Methods
Evaluators
Volunteers with different clinical backgrounds and professional experience were recruited in a tertiary hospital. Eighteen volunteers from 6 categories of caregivers were investigated: 3 students in the last year of physiotherapy, 3 senior chest physiotherapists, 3 junior residents in pulmonology, 3 medical microbiologists, 3 pulmonologists, and 3 general practitioners. The only exclusion criterion was a diagnosis of color vision deficiency. All of the volunteers were unfamiliar with the Sputum Color Chart.
Design
Ten consecutive sputum samples were collected in duplicate from hospitalized patients independent of their characteristics and bacteriology. The selection of samples was based on consecutive bacteriological analysis requests. The test was ordered by a physician independent of the study. One sample was used for the study, and the other one was sent to the microbiology laboratory for bacteriological analysis. The 2 inclusion criteria were the presence of expectoration and a diagnosis of chronic respiratory disease. These 10 sputum samples were placed on a white support and were photographed. The pictures were sent by electronic message to the evaluators for reading with the Sputum Color Chart5 on a similar screen. The Sputum Color Chart is a chart using photographs of sputum representing the 3 typical gradations of color (mucoid [clear], mucopurulent [pale yellow or green], and purulent [dark yellow or green]). The reading was repeated 3 times.
Statistics
Statistical analyses were performed using SPSS 22.0 (IBM Corporation, Armonk, New York). A descriptive analysis was performed to describe the readings. Discrepancies (defined by the maximal difference between 3 readings) were observed in the triplicate readings of the same sputum by one evaluator and in the first reading by 3 evaluators of the same health caregiver category. Eighteen raters were included in the study because it was suggested that kappa can be validly applied when the number of subjects being rated is >2 × C² (where C is the number of categories in the assessment tool).12
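As an illustration only, the discrepancy defined above (the maximal difference between 3 readings) and the 2 × C² threshold can be sketched in a few lines of Python. The integer coding of the 3 grades (0, 1, 2), the function names, and the example readings are assumptions made for this sketch. They are not the study data and not the SPSS procedure used by the authors.

```python
# Assumed ordinal coding of the 3 Sputum Color Chart grades (hypothetical).
MUCOID, MUCOPURULENT, PURULENT = 0, 1, 2


def discrepancy(readings):
    """Maximal difference between the readings of one sputum sample."""
    return max(readings) - min(readings)


def discrepancy_proportions(readings_per_sample):
    """Proportion of samples with a discrepancy of 0, 1, or 2 grades."""
    counts = {0: 0, 1: 0, 2: 0}
    for readings in readings_per_sample:
        counts[discrepancy(readings)] += 1
    n = len(readings_per_sample)
    return {d: c / n for d, c in counts.items()}


def kappa_threshold(n_categories):
    """Threshold 2 x C^2 from the rule cited in the text."""
    return 2 * n_categories ** 2


# Hypothetical triplicate readings of 10 samples (illustrative values only).
example = [
    (0, 0, 0), (1, 1, 2), (2, 2, 2), (0, 1, 1), (1, 1, 1),
    (2, 1, 2), (0, 0, 2), (1, 1, 1), (2, 2, 2), (0, 0, 1),
]
props = discrepancy_proportions(example)
threshold = kappa_threshold(3)  # 3 grades on the chart
```

With the 3 grades of the chart, the threshold evaluates to 2 × 3² = 18, the number of raters quoted in the text.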
Cohen's kappa coefficient was calculated and Bland-Altman analysis was conducted to evaluate the reliability and the agreement, respectively. Kappa values were interpreted as follows: >0.80 was very good, 0.61–0.80 was good, 0.41–0.60 was moderate, 0.21–0.40 was fair, and <0.21 was poor.13 Bland-Altman analysis was performed for the greatest difference between 3 measurements of the same evaluator or the same health caregiver category and the mean of the 3 measurements. A Kruskal-Wallis test was performed to compare the results between the different categories of health caregivers for the first evaluation.
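For readers who want to see the computation, the following is a minimal sketch of Cohen's kappa for 2 raters, with an optional linear weighting, together with the interpretation bands listed above. The text does not say which SPSS options the authors used, so the linear weights, the function names, the 0/1/2 coding of the grades, and the example ratings are assumptions. The cut points between adjacent bands (for example, a value of 0.605) are also an assumed convention.

```python
def cohen_kappa(r1, r2, n_categories=3, weighted=False):
    """Cohen's kappa for two raters; linear weights if weighted=True."""
    n = len(r1)
    if n == 0 or n != len(r2):
        raise ValueError("ratings must be non-empty and of equal length")
    k = n_categories
    # Observed joint proportions and the marginal proportions of each rater.
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0/1 if unweighted, |i - j| / (k - 1) if linear.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if d_exp == 0:
        raise ValueError("kappa undefined: both raters used a single grade")
    return 1 - d_obs / d_exp


def interpret_kappa(kappa):
    """Bands quoted in the text; cut points between bands are assumed."""
    if kappa > 0.80:
        return "very good"
    if kappa > 0.60:
        return "good"
    if kappa > 0.40:
        return "moderate"
    if kappa >= 0.21:
        return "fair"
    return "poor"


# Hypothetical first readings of 10 samples by two raters (not study data).
rater_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
rater_b = [0, 1, 1, 1, 2, 1, 0, 2, 2, 1]
kappa_unweighted = cohen_kappa(rater_a, rater_b)
kappa_linear = cohen_kappa(rater_a, rater_b, weighted=True)
```

In this example all disagreements are between adjacent grades, so the weighted coefficient is higher than the unweighted one. The weighted form penalizes a mucoid versus purulent disagreement more than a disagreement of one grade.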
Results
Bacteriology and Subjects
Subjects were between 18 and 67 y old. They had various respiratory diseases (cystic fibrosis [n = 3] and COPD [n = 7]), and all were hospitalized for exacerbation.
Samples (Fig. 1) were characterized as follows: a normal flora for 6 samples, Pseudomonas aeruginosa in 2 samples, and Staphylococcus aureus in 2 samples.
Intra-Rater Reproducibility for the 6 Categories of Health Caregivers
The discrepancies, the reliability, and the agreement between the 3 readings of each evaluator from the 6 categories are presented in Figure 2 and Table 1. Only 20 of 54 Cohen's kappa coefficients were <0.60. We observed an absence of discrepancy in 30–100% of the secretions for all evaluators, independent of the category of caregivers. The proportion of the maximal discrepancy (=2) between the 3 readings was very low or null for all of the categories. Only one evaluator among the students in physiotherapy and one among the senior chest physiotherapists had a 2-point discrepancy for one secretion.
Inter-Rater Reproducibility for the 6 Categories of Health Caregivers
The reliability and the agreement between the 18 evaluators from the 6 categories are presented in Table 2. Comparison of the first reading of the 10 sputum samples by each evaluator showed a large proportion of important discrepancies (=2) in 2 categories (40 and 50% for students in the last year of physiotherapy and general practitioners, respectively) (Fig. 3). Only 3 of 18 Cohen's kappa coefficients were >0.60, and 2 of them were observed between 2 chest physiotherapists. All but the pulmonologists and the senior chest physiotherapists showed 2-point discrepancies for >20% of the secretions. The proportion of non-discrepancy between evaluators was very low and varied between 10 and 40%, depending on the category. The comparison between the categories showed a significant difference in the sputum readings (P = .02).
Discussion
We evaluated the intra- and inter-rater reliability and the agreement of the Sputum Color Chart for 3 evaluators from 6 categories of caregivers who can be routinely faced with sputum evaluation in their clinical practice. Our study highlighted a non-uniform reliability and agreement with the Sputum Color Chart. This observation was not related to the category of caregivers.
Sputum color is considered to be a key element in clinical evaluation of subjects with respiratory diseases. Indeed, sputum color appears to be regularly used in routine practice and in research.14,15 Changes in sputum are included in the widely accepted Anthonisen definition of exacerbations in COPD7 and in their evaluation.16 In a large study on COPD exacerbations, sputum color was used as a marker of exacerbation, and the authors demonstrated that sputum purulence was strongly associated with bacterial growth.17 Although this criterion has lower sensitivity and more limited applicability in children than in adults due to the difficulty of inducing children to expectorate,18 it is also mentioned in the literature regarding exacerbations in pediatric subjects. Indeed, sputum color is one of the clinical criteria that improves specificity of the non-cystic fibrosis bronchiectasis exacerbation definition in children.19 Moreover, Goeminne et al20 used the Sputum Color Chart to demonstrate a relationship between sputum purulence from non-cystic fibrosis bronchiectasis subjects and inflammatory markers in sputum, history of P. aeruginosa, any form of colonization, or modified Brody scores. They defined the Sputum Color Chart as a tool that may predict the degree of inflammation, gelatinolytic activity, and disease severity in these subjects.20 When children with cystic fibrosis are interviewed about symptom improvement after exacerbations, they report, among other improvements, a sputum color change.21
Because self-assessment of sputum color by patients was demonstrated to be unreliable,10 there is a need for validated tools. Although the Sputum Color Chart previously demonstrated a good inter-rater reliability between doctors and subjects,5 no study had previously investigated the intra-rater reliability for different health caregivers and the inter-rater reliability between different health professionals. These properties are important, since they are major factors in the therapeutic decision,9 notably the prescription of antibiotics.7 Indeed, in one study, subjects with COPD experiencing an exacerbation with purulent sputum were treated with antibiotics, and those with mucoid sputum were not. The evaluation of sputum purulence was based on sputum color evaluation with a 9-point color chart. The authors concluded that sputum color assessment could contribute to avoiding unnecessary antibiotic therapy.6
We observed poor global intra-rater reliability in our sample of raters. This means that many caregivers perceived a difference in sputum color where none existed. Since a change in sputum color gives the physician a reason to prescribe antibiotics, we can speculate that this parameter may lead to unjustified prescriptions. However, when looking specifically by category of caregivers, better intra-rater reliability was observed for microbiologists and senior chest physiotherapists than for other categories of caregivers (13 of 18 Cohen's kappa coefficients of both of these categories were >0.41, which means a moderate to very good reliability). A very good reliability (Cohen's kappa >0.80) was found for microbiologists. This could be explained by the dedicated routine practice based on the evaluation of sputum for this category of caregivers. Clinical experience probably plays a role, since it was previously shown to be related to a better agreement in different kinds of evaluations,22,23 even if this is not systematically verified for all tools.24–26
The intra-rater discrepancies for all of the evaluators are heterogeneous. Only one reading of sputum produced a 2-point discrepancy for one junior and one senior physiotherapist. Bias of agreement is null for 1 of 3 of the evaluators and always <0.5, which means less than one classification of discrepancy. This, too, is not related to the category of caregivers.
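To make the notion of bias of agreement concrete, here is a minimal sketch of a standard Bland-Altman computation (mean difference and 95% limits of agreement) between 2 sets of readings. It is a simplified stand-in, not the procedure of this study, which used the greatest difference between 3 measurements. The function name, the 0/1/2 coding of the grades, and the example readings are assumptions.

```python
from statistics import mean, stdev


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired readings."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two paired series of at least 2 readings")
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)   # mean difference between the two readings
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical repeated readings of 10 samples by one rater (not study data).
reading_1 = [0, 1, 2, 1, 1, 2, 0, 1, 2, 0]
reading_2 = [0, 1, 2, 0, 1, 1, 1, 1, 2, 0]
bias, lower, upper = bland_altman(reading_1, reading_2)
```

A bias below 0.5 in absolute value, as reported above, corresponds to an average shift of less than one grade on the chart.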
Poor inter-rater reliability was observed for all categories with the exception of senior chest physiotherapists. The level of training could influence inter-rater reliability. Its influence was previously demonstrated on auscultation reliability.27 Indeed, the students in physiotherapy and in pulmonology showed poor inter-rater reliability, with Cohen's kappa coefficient <0.20. However, we found a similar inter-rater reliability for pulmonologists. This can be considered surprising given the experience of this category of caregivers with sputum observation. Moreover, it raises questions because the antibiotic prescription is based partially on this observation. In clinical practice, when an antibiotic prescription is based only on a change in sputum color, it could be useful to combine sputum color evaluation from a physician and a microbiologist.
The globally poor intra- and inter-rater reliability of the Sputum Color Chart reflects the inherent difficulty of evaluating the color of secretions. Raters' disagreements on sputum color explain the negative Cohen's kappa coefficients that were calculated for a series of evaluations. Negative Cohen's kappa coefficients result when agreement occurs less often than predicted by chance alone. This suggests genuine disagreement between raters or an underlying issue with the instrument itself.28
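The origin of a negative coefficient can be illustrated with a small numerical sketch. The function name, the 0/1/2 coding of the grades, and the ratings below are hypothetical and deliberately extreme. They are not taken from the study.

```python
from collections import Counter


def unweighted_kappa(r1, r2):
    """Unweighted Cohen's kappa: (p_obs - p_exp) / (1 - p_exp)."""
    n = len(r1)
    # Observed proportion of samples on which the two raters agree.
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # Agreement expected by chance from each rater's grade frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_exp = sum(c1[g] * c2[g] for g in c1) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Two hypothetical raters who never give the same grade to a sample.
rater_a = [0, 1, 2, 0, 1, 2]
rater_b = [1, 2, 0, 1, 2, 0]
kappa = unweighted_kappa(rater_a, rater_b)
```

Here the observed agreement is 0, whereas chance alone would produce agreement on one third of the samples, so the coefficient is (0 − 1/3) / (1 − 1/3) = −0.5.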
Some limitations regarding this study need to be addressed. First, the sputum samples were presented as photographs, and this could influence the interpretation of the sputum color compared with true secretions. However, the Sputum Color Chart also uses photographs providing accurate representation of the 3 major grades of color, and the computer screen was always the same to avoid color modification due to the screen. Second, there was no control in the presentation of sputum. Third, generalizing our results to other raters should be performed with caution. Reliability could also differ according to other rater categories and other levels of skill. The individual level of training (as demonstrated previously for auscultation)25,26 of our raters potentially influenced our results, and our randomized sample might not represent all raters with similar disciplines and training. Nevertheless, these findings indicate that sputum color evaluation is complex.
Conclusions
Even if the Sputum Color Chart is a useful tool for the clinician in the context of clinical deterioration, its reliability is not uniform across caregivers and caregiver categories.
Footnotes
- Correspondence: Gregory Reychler PhD PT, Pneumology Unit, Cliniques Universitaires St-Luc (UCL), Avenue Hippocrate 10, 1200 Brussels, Belgium. E-mail: gregory.reychler@uclouvain.be.
The authors have disclosed no conflicts of interest.
- Copyright © 2016 by Daedalus Enterprises