Introduction

In the past two decades, significant progress has been made in defining and measuring patient-reported outcomes (PROs). A PRO instrument is defined as any measure of a patient’s health status that is directly elicited from the patient and assesses how the patient “survives, functions, or feels” in relation to his or her health condition [1, 2]. PROs are used for several purposes: (1) to serve as primary or secondary outcomes in clinical trials evaluating new pharmaceutical, behavioral, or surgical interventions [3], (2) to compare existing treatments, (3) to measure the impact of a disease and its treatments on daily functioning, (4) to aid in clinical decision-making, and (5) to analyze the costs and benefits of medical interventions [4, 5]. If used as an outcome measure in a clinical trial, a PRO instrument must meet rigorous psychometric criteria, including a well-defined conceptual framework, qualitative input from patients and health care providers, evidence of reliability and validity, and determination of the minimal important difference (MID) score to facilitate interpretation [1, 6, 7].

Although generic PRO instruments, such as the 36-Item Short Form Health Survey (SF-36) and the Quality of Well-Being Scale [8, 9], were initially the most widely used, numerous studies suggest that disease-specific measures are more sensitive to change and provide information that is more relevant for clinical interventions [2, 5, 10]. Efforts to develop reliable and valid PRO instruments have been very successful, resulting in numerous well-validated tools and a growing empirical and clinical database [11]. PROs are particularly important for patients with chronic diseases, such as cystic fibrosis (CF), whose daily lives are significantly impacted by their illness.

CF is an autosomal recessive disorder affecting multiple organs, including the lungs, pancreas, and digestive and reproductive systems. Median life expectancy has increased dramatically in the past 20 years, to 37.8 years in the United States; however, progression of lung disease continues to lead to shortened life spans [12], and in younger patients, achieving normal nutrition and growth is challenging. Despite the development of new palliative treatments for CF, patients experience frequent pulmonary exacerbations, hospitalizations, and segregation from peers due to multi-resistant bacteria. High rates of depression and anxiety have also been reported by both patients and caregivers [13, 14]. Beginning in adolescence, gender differences in both morbidity and mortality have been identified, with women experiencing greater declines in lung function than men [15, 16].

In addition, the treatment regimen for CF is highly complex and time-consuming, requiring 2–4 h of treatment every day. The treatment regimen includes multiple inhaled therapies, airway clearance two times per day, oral medications, and boosting calories to 110–200% of the recommended daily allowance. The challenges of adhering to this regimen include the time required, the complexity of using and cleaning the equipment, and its considerable cost [17]. These difficulties increase the importance of including PROs in clinical trials of new therapies, to assess patients’ perceptions of efficacy and treatment burden [18].

The Cystic Fibrosis Questionnaire was the first disease-specific PRO developed for CF patients and caregivers. It was developed using a life span approach, with developmentally appropriate versions for ages 6 years through adulthood. Its development began with a conceptual framework that included both disease-specific and generic domains, followed by qualitative interviews with patients, parents, and health care professionals. Cognitive testing of the items was then performed, with revisions to the items and scales as necessary [19–21].

The instrument underwent psychometric testing in its original version [7, 19–21] and was then slightly revised. This revision included adding two items to the Role Functioning and one item to the Social Functioning scales of the teen/adult version, deleting one item from the Emotional Functioning and adding one item to the Social Functioning scales of the child version, and rewording one item on the Treatment Burden scale of the parent version.

In its revised form, the CFQ-R is currently the most widely used and well-validated PRO for CF [4, 10] and is the only instrument with versions for young children and parent caregivers [19]. The CFQ-R has now been translated into 34 languages, is being used in both international clinical trials and quality improvement initiatives at CF care sites in the US, and has appeared in dozens of publications [4, 10]. The CFQ-R Respiratory Symptoms scale has been used as an outcome measure in clinical trials evaluating the efficacy of new therapies, including inhaled antibiotics [3, 22], hypertonic saline [23, 24], and gene correctors and potentiators [25]. The same scale was recently used as a primary endpoint to support the approval of aztreonam for inhalation, a new inhaled antibiotic, for patients with CF [3]. The purpose of the current study was to evaluate the psychometric properties of the CFQ-R across all developmental versions (school-age children and their parents, adolescents, and adults with CF), and to provide normative data on a large, national sample of patients with CF ages 6 years through adulthood to aid in interpretation of test scores. Test–retest reliability and the MID have been established previously [6, 7].

Objectives of the study

The overall objective of this study was to evaluate the psychometric properties of the CFQ-R in a national sample of children, teens, and adults with CF, as well as parent caregivers. Specific objectives included:

  1. Evaluation of CFQ-R scores for children, teens, and adults with CF and parent caregivers to provide normative data, estimation of floor and ceiling effects, and internal consistency.

  2. Evaluation of discriminant validity comparing patients seen for “well” versus “sick” visits and among stages of lung disease based on pulmonary function.

  3. Evaluation of gender differences between males and females with CF on specific domains of functioning.

  4. Evaluation of agreement between parent–child dyads.

  5. Evaluation of convergence between CFQ-R scores and health outcomes, including pulmonary function, body mass index (BMI), and number of courses of intravenous (IV) antibiotics.

Methods

The Epidemiologic Study of Cystic Fibrosis (ESCF) is a multicenter, longitudinal cohort study initiated by Genentech, Inc., in 1994 to collect data on care practices and outcomes of CF patients in the US. Beginning in 2003, health-related quality of life (HRQOL) was assessed using a disease-specific instrument, the CFQ-R, which was administered before evaluation of objective clinical measures (e.g., spirometry) in order to prevent any bias in completing the PRO [7, 19]. The child version has fewer items (35 items) than the parent (44 items) or teen/adult version (50 items; see Table 1 for a list of scales and exemplar items). Responses are made using 4-point Likert rating scales that include frequency (always, often, sometimes, never), intensity (a great deal, somewhat, a little, not at all), and true–false scales (very true, somewhat true, somewhat false, very false). Scores are standardized across scales and range from 0 to 100, with higher scores indicating better HRQOL.
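To illustrate how 4-point item responses can be placed on a 0 to 100 metric, the following is a minimal sketch that assumes a simple linear rescaling of the mean item response. The function name, the 1 to 4 item coding, and the rescaling rule are assumptions made for illustration; the text does not give the scoring algorithm, and details such as reverse-scored items and missing-data rules are not modeled.

```python
def scale_score(responses, min_resp=1, max_resp=4):
    """Rescale the mean of item responses onto 0-100 (hypothetical rule).

    Assumes every item is already coded so that a higher response
    means better HRQOL.
    """
    if not responses:
        raise ValueError("at least one item response is required")
    if any(r < min_resp or r > max_resp for r in responses):
        raise ValueError("response outside the assumed 1-4 coding")
    mean_resp = sum(responses) / len(responses)
    # Linear map: min_resp -> 0, max_resp -> 100
    return 100.0 * (mean_resp - min_resp) / (max_resp - min_resp)
```

Under this assumed rule, a respondent choosing the lowest category on every item of a scale scores 0, and one choosing the highest category on every item scores 100.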

Table 1 CFQ-R scales and exemplar items

Data on patient demographics, spirometry, anthropometric characteristics, and therapies were collected at each clinic encounter. For children ages 6–11 years, the CFQ-R was administered by a trained nurse coordinator. Children were trained to use the rating scales on two practice items that corresponded to blue and orange rating cards which were used during administration. Children and parents completed the CFQ-R in separate rooms. All sites entered the data using electronic data capture with edit checks. This study was approved by the institutional review board at each participating site, and all participants or their guardians provided written informed consent.

Study population

Patients completed the age-appropriate CFQ-R between February 2003 and November 2005. The first age-appropriate CFQ-R form for each patient was used in this analysis, regardless of whether patients completed measures at multiple assessments. A patient was considered “sick” when completing the CFQ-R if there was any indication of sickness (as noted by the clinician on the encounter form) within ±21 days of the CFQ-R date. A patient was classified as “well” when completing the CFQ-R if there was at least one well encounter within ±21 days of the CFQ-R date. If no encounters were observed within this window, the closest encounter within 90 days before administration of the CFQ-R was used; if this encounter was “well” and there were only well or missing pulmonary function tests since that date, then the patient was considered “well” at the point of CFQ-R administration. The remainder of CFQ-Rs (those not classified as sick or well) were considered ambiguous and excluded from analysis. Patients were categorized according to forced expiratory volume in 1 s (FEV1) % predicted score closest to their date of CFQ-R (normal 100+; mild 70 to <100; moderate 40 to <70; severe <40).
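The classification rules above can be sketched in code. This is a simplified illustration, not the study's implementation: the function names and the encounter representation are hypothetical, "sick" is assumed to take precedence when both sick and well encounters fall within the window, and the 90-day fallback rule is deliberately left out.

```python
from datetime import date


def fev1_stage(fev1_pct_predicted):
    """Map FEV1 % predicted to the four disease-stage categories."""
    if fev1_pct_predicted >= 100:
        return "normal"
    if fev1_pct_predicted >= 70:
        return "mild"
    if fev1_pct_predicted >= 40:
        return "moderate"
    return "severe"


def classify_visit(cfq_date, encounters, window_days=21):
    """Classify a CFQ-R administration as 'sick' or 'well'.

    encounters: list of (date, status) pairs, status 'sick' or 'well'.
    Returns None when no encounter falls inside the window; the text's
    90-day fallback would apply in that case and is not modeled here.
    """
    in_window = [
        status
        for enc_date, status in encounters
        if abs((enc_date - cfq_date).days) <= window_days
    ]
    if "sick" in in_window:  # assumed precedence: any sickness wins
        return "sick"
    if "well" in in_window:
        return "well"
    return None
```

For example, a CFQ-R completed on 15 January with a well encounter 14 days earlier and a sick encounter 17 days later would be classified as "sick" under the assumed precedence.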

Statistical analysis

Descriptive statistics were generated for demographic and clinical variables for each CFQ-R version (child, parent, teen/adult). Means and standard deviations of the domain scores were calculated within each version for all the CFQ-R forms and by visit type (sick versus well), disease stage (as defined by FEV1% predicted), and gender. For the sick-versus-well analysis, all available CFQ-R forms were used, including multiple forms per patient. Comparisons were drawn between groups within each version by calculating effect sizes (for visit type and gender) or by performing one-way analyses of variance (ANOVAs) with Tukey adjustment (for disease stage). P values should not be interpreted as tests of any specific null hypotheses, but rather as indexing the strengths of associations; hence, no adjustment for Type I error was made. The presence of linear trends of CFQ-R scores across disease stage categories was assessed for each version. Intraclass correlation coefficients were generated to assess agreement between parent–child dyads. Pearson correlation coefficients were calculated between CFQ-R scores and clinical outcomes (BMI, FEV1% predicted, and the number of IV antibiotic–treated exacerbations within the past year for each version).
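The text does not state which effect size statistic was used for the visit-type and gender comparisons. As one common choice, the sketch below computes a standardized mean difference (Cohen's d) with a pooled standard deviation; the function name and the choice of statistic are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

With well-visit scores as the first group and sick-visit scores as the second, a positive value indicates better HRQOL at well visits.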

Results

Among the 32,585 patients enrolled in the ESCF, 14,268 unique CFQ-Rs were completed by patients or their parents during a clinic visit. Of the completed CFQ-Rs, 9,566 (67.1%) were from visits classified as “well” and 3,871 (27.1%) from visits classified as “sick.” Another 831 (5.8%) were determined to be ambiguous and excluded from analysis. Therefore, 13,437 CFQ-Rs corresponding to 7,330 patients were included in the analysis: 4,679 teens and adults (ages ≥ 14 years), and 2,068 school-age children (ages 6–13 years). Additionally, 2,728 parents of school-age children (2,145 of whom were also included in the school-age children sample) completed proxy versions of the CFQ-R. Demographic and medical data for each sample are reported in Table 2. The demographic characteristics of this sample were similar to those reported in the national CF Foundation Patient Registry [26].

Table 2 Patient demographics and clinical characteristics by CFQ-R version

Descriptive statistics, floor and ceiling effects, and internal consistency

Means and standard deviations for each version of the CFQ-R (child, parent, teen/adult) are presented in Table 3. To examine potential floor and ceiling effects, distributions of scores (0–100 standardized values) were calculated for each domain and version of the CFQ-R. Respondents with scores less than or equal to 5 were considered at “floor,” and those with scores greater than or equal to 95 were considered at “ceiling.” Minimal floor effects were found across the eight domains of the CFQ-R Child version, ranging from 0.1 to 4.8%, with lowest scores found in Social and Emotional Functioning. Ceiling effects ranged from 6.9 to 46.5% and were most prevalent in three domains: Digestion (37.2%), Body Image (39.6%), and Eating (46.5%). Minimal floor effects were found across the 12 domains of the CFQ-R Parent instrument, ranging from 0.0 (Vitality and Emotional Functioning) to 16.8% (Weight). Ceiling effects ranged from 3.1 to 50.3% and were most prevalent in four domains: Body Image (32.8%), Eating (50.3%), School Functioning (34.6%), and Weight (37.9%). Similarly, few floor effects were found in the teen/adult instrument, ranging from 0.0 to 14.5%. Ceiling effects ranged from 3.1 (Vitality) to 66.3% (Eating) and were most prevalent in two domains: Eating (66.3%) and Weight (47.4%).
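The floor and ceiling definitions used here (scores of 5 or below, and 95 or above, on the 0 to 100 metric) translate directly into a short calculation. The sketch below is illustrative; the function name and the input format (a flat list of domain scores) are assumptions.

```python
def floor_ceiling_pct(scores, floor_cut=5.0, ceiling_cut=95.0):
    """Return (% at floor, % at ceiling) for one domain's scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    at_floor = sum(1 for s in scores if s <= floor_cut)
    at_ceiling = sum(1 for s in scores if s >= ceiling_cut)
    return 100.0 * at_floor / n, 100.0 * at_ceiling / n
```

For the toy input [0, 50, 100, 100], one of four scores is at floor and two of four are at ceiling, giving 25.0% and 50.0%.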

Table 3 CFQ-R domains by version

Internal consistency coefficients for each domain were calculated using the Cronbach alpha, with a majority of scales demonstrating adequate reliability using a standard of 0.70, or 0.60 for newly developed scales [27]. Reliability for the majority of the CFQ-R Child scales was acceptable, with lower reliabilities found on the Treatment Burden and Social Functioning scales. The majority of scales in the Parent version also demonstrated acceptable reliability, with lower reliabilities found on the Treatment Burden and School Functioning scales. Good reliability was also found for the teen/adult measure, with lower reliabilities on the Treatment Burden and Social Functioning scales (see Table 3).
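For readers unfamiliar with the statistic, Cronbach's alpha for a scale of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula; the function name and the input layout (one list of scores per item, aligned by respondent) are assumptions, and no handling of missing responses is included.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of k lists, one per item, each holding that item's
    scores for the same n respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("items must share the same respondents (n >= 2)")
    # Total scale score for each respondent
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance(totals)  # zero if all totals are identical
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```

Two perfectly correlated items yield an alpha of 1, while items that do not covary push alpha toward 0; values would then be compared against the 0.70 (or 0.60) standard cited above.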

Discriminant validity

First, discriminant validity was evaluated by comparing CFQ-R scores completed during a “sick” versus “well” visit. For the Child version, 636 sick CFQ-Rs and 1,943 well CFQ-Rs were completed. Significant differences between the groups were found in all CFQ-R domains, with children who were sick consistently reporting worse HRQOL scores than children who were well (Student’s t test; data not shown). Similar results were found for the CFQ-R Parent version: scores from 980 sick visits were significantly lower than scores from 2,984 well visits across all domains. Finally, for the teen/adult sample, 2,255 sick-visit scores were compared with 4,639 well-visit scores. With the exception of the Digestion scale on the Teen/Adult version, all comparisons were statistically significant, with lower scores reported by those completing the measure during a sick visit; comparisons and effect sizes are reported in Table 4. Thus, apart from that one scale, CFQ-R scores demonstrated strong discriminant validity across all versions.

Table 4 CFQ-R scores and effect sizes by visit type

Next, the ability of the CFQ-R scales to differentiate among levels of disease stage was evaluated using ANOVA with Tukey adjustments for all pairwise comparisons. Most scales were expected to vary by lung function, providing evidence of convergent validity. The Digestion scale was not expected to differ by lung function and, thus, provides a basis for comparison. Patients were categorized into four levels of disease stage on the basis of their FEV1% predicted score. Strong support was found for this hypothesis. Significant differences in CFQ-R scores were found between all levels of disease stage for a majority of CFQ-R Child scales, except Emotional and Social Functioning and Treatment Burden. For these domains, significant differences were generally found between those with normal or mild disease and those with severe disease (see Fig. 1). As hypothesized, scores on Digestion did not differ significantly by disease stage. Similar results were found for the CFQ-R Parent version, with significant differences found for a majority of domains except Emotional Functioning, School Functioning, Weight, and Treatment Burden. In general, these scales differentiated between patients with normal and mild disease versus those with severe disease. As expected, Digestion did not differ across levels of lung disease stage. For the CFQ-R Teen/Adult version, all scales differed significantly by disease stage, except for Eating Problems, Role Functioning, and Emotional Functioning, which differed significantly across all comparisons except that for normal versus mild disease. As hypothesized, Digestion scores did not differ significantly by any level of disease stage (see Fig. 1). Furthermore, significant linear trends across disease stage categories were observed for all CFQ-R scales and versions, except for Digestion (no trend for any version).

Fig. 1 Mean CFQ-R score by disease severity. *A significant linear trend was detected for all scales except Digestive Symptoms

Gender differences in teens and adults with CF

Based on well-established differences in pulmonary functioning and survival between males and females with CF, with more rapid declines in pulmonary functioning and a shorter life span observed for women, gender differences were expected on the CFQ-R Teen/Adult instrument. Specifically, social pressures to be thin were expected to affect females differentially, leading to higher scores for them than for males on domains related to eating problems, body image, and weight. This hypothesis was partially supported; significant gender differences were found in the predicted direction on two of the three scales. Teen and adult males reported significantly better HRQOL in all CFQ-R domains except Body Image and Weight, in which females reported better scores (Table 5).

Table 5 CFQ-R scores and effect sizes by gender

Agreement between parent–child dyads on the CFQ-R

Intraclass correlation coefficients were calculated between parent–child dyads to assess the extent of agreement (P values < 0.001, unless otherwise noted). Paired correlations indicated significant agreement between dyads across all scales (r’s, 0.26–0.56); however, stronger agreement was found on domains that measured more observable signs and symptoms, such as Physical Functioning (r = 0.46), Eating Problems (r = 0.56), and Respiratory Symptoms (r = 0.55). Lower agreement was found for Emotional Functioning (r = 0.28) and Treatment Burden (r = 0.26), with children reporting significantly worse Emotional Functioning than parents; in contrast, parents, who are primarily responsible for daily treatments, reported worse Treatment Burden than children.

Associations between CFQ-R scales and health outcomes

To evaluate the construct validity and clinical utility of the CFQ-R, correlations were calculated between CFQ-R scores and several key health outcomes, including FEV1% predicted, number of pulmonary exacerbations treated with IV antibiotics, and BMI. Given the large sample size, only correlations with magnitude of 0.20 or higher were reported. For children, moderate associations were found between the CFQ-R Physical Functioning and Respiratory Symptoms domains and FEV1% predicted, with r’s of 0.33 and 0.32, respectively. A similar pattern was observed for Physical Functioning and Respiratory Symptoms and number of IV-treated exacerbations, with r’s of −0.23 and −0.24, respectively. The CFQ-R Eating scale was also modestly correlated with BMI (r = 0.22). In all cases, better HRQOL was correlated with better health outcomes.

Similar patterns were found for the CFQ-R Parent version. Significant associations were found between the following CFQ-R scales and FEV1% predicted: Physical Functioning (r = 0.38), Respiratory Symptoms (r = 0.33), Health Perceptions (r = 0.30), and Body Image (r = 0.22). Number of exacerbations was also significantly correlated with Physical Functioning (r = −0.34), Respiratory Symptoms (r = −0.28), and Health Perceptions (r = −0.27). Finally, BMI was significantly associated with Weight (r = 0.42) and Eating Problems (r = 0.24). In all cases, higher CFQ-R scores were associated with better health outcomes.

For the CFQ-R Teen/Adult measure, strong associations were found between the following CFQ-R scales and both FEV1% predicted and number of exacerbations, respectively: Physical Functioning (r’s = 0.51 and −0.35), Role Functioning (r’s = 0.29 and −0.32), Vitality (r’s = 0.26 and −0.25), Health Perceptions (r’s = 0.40 and −0.30), and Respiratory Symptoms (r’s = 0.42 and −0.27). Significant associations were also found between FEV1% predicted and Social Functioning (r = 0.25), Body Image (r = 0.29), and Weight (r = 0.29). Moderate associations were found between BMI and Body Image (r = 0.34) and Weight (r = 0.44). In all cases, higher CFQ-R scores were correlated with better health outcomes.

Discussion

Overall, the results of this study supported the psychometric properties of the CFQ-R, a PRO measure that can be completed by patients across the life span from early childhood through adulthood, with a proxy report for parents of school-age children. We tested this instrument using a national sample that was similar to the CF Foundation Patient Registry data in terms of demographic and health variables, suggesting that our results are generalizable to the larger US community of CF patients. Minimal floor and ceiling effects were observed on a majority of scales, suggesting that the items cover a broad range of functioning, which allows for differentiation among patients. Scores on the CFQ-R discriminated between patients attending the sites for a sick visit versus a well visit and between patients who differed in lung function. Little evidence of “response shift” was observed: patients who had more severe pulmonary disease reported worse symptoms and greater impairments in daily functioning. Further, the CFQ-R scales were significantly associated with relevant health outcome variables (e.g., lung function, BMI).

As expected, teen and adult females reported worse HRQOL in a majority of domains when compared to males. This may be partially related to differences in health outcomes between women and men with CF, with women experiencing greater morbidity and earlier mortality. However, women reported better functioning on scales that measured body image and weight, which could be related to a societal preference for thinness in women, regardless of the health consequences of low body weight in CF. This suggests that targeted education on the links between poor nutritional status and worse lung function should be emphasized because of nutrition’s effects on long-term health. Clinicians may need to more directly address issues related to eating and weight gain in young women with CF.

Parents and children demonstrated good agreement on symptoms and behaviors that were observable, such as eating, ability to climb stairs, and frequency of cough. These results parallel those found in the child psychology literature, which has consistently found higher agreement on externalizing behaviors, such as oppositional behavior, than on internalizing symptoms of depression [28, 29]. In the recently published FDA Guidance, parent proxy measures were recommended only for symptoms and behaviors that can be observed by parents [1]. Our findings, and those reported in Belgium and Germany [30, 31], also show that children reported worse emotional functioning than their parents were aware of, suggesting that measuring both respondents’ perceptions of HRQOL is important. The CFQ-R is one of the few PRO tools with a validated version for children. We are currently developing a pictorial version of the CFQ-R for preschoolers (3–6 years) with CF [32].

This study had several limitations. First, the study coordinators at each site were responsible for entering the data into an electronic data capture system; however, there was no source verification. Second, although good discriminant validity was demonstrated across CFQ-R versions, the Child instrument was limited by less variability in disease stage as measured by lung function: few school-age children had FEV1 below 70% predicted [33]. Finally, although the means and standard deviations showed a reasonable distribution of responses and good discrimination between sick and well visits, ceiling effects were observed on the Eating Problems scale across the three versions. In general, both patients and parent caregivers tended to report minimal eating problems and relatively good HRQOL in this domain.

Our results demonstrated strong convergence between CFQ-R scales and key health outcomes. For example, good correlations were found between the CFQ-R Respiratory Symptoms scale and FEV1% predicted and number of IV antibiotic courses. Convergent validity was found between CFQ-R scales and both pulmonary and nutritional outcomes, indicating that these items accurately reflect their underlying constructs.

Newly available descriptive statistics (Online Resource appendices 1–4) have been calculated to provide clinicians with values that can be used to compare individual patients to this large, national sample, which should greatly facilitate interpretation of these scores. For patients with CFQ-R forms during “well” visits, score deciles were generated for each CFQ-R domain, stratified by disease severity (FEV1% predicted), gender, and age group (teens/adults and children). For example, providers can look up the normative CFQ-R values for an adolescent female with mild lung disease obtained during a “well” visit and compare her scores to a similar patient being seen that day. If that young woman scored, for instance, at the 10th percentile of same-age peers on the Eating Problems and Body Image domains, a referral to the dietitian on the multidisciplinary team could be made. Further, if a new medication is added to the CF treatment regimen, clinicians can monitor potential improvements in respiratory or digestive symptoms on the CFQ-R, as well as its impact on perceptions of treatment burden.

In conclusion, this measure has demonstrated robust psychometric properties in a large national sample of patients with CF and their parents. Strong evidence of reliability, discriminant validity, and convergence with pre-specified health outcomes was found. It is also one of the few PRO measures with developmentally appropriate versions across the life span.