Original article
Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example
Introduction
Most research, including health research, has relied on statistical significance to demonstrate the effectiveness of an intervention, differences between groups in variables of interest, or associations between variables. Statistical significance is based on hypothesis testing (Kirk, 1996). The null hypothesis states that there is no difference between groups or that an independent variable has no effect on the dependent variable. The alternative hypothesis states that the groups differ or that an independent variable does have an effect on the dependent variable. After the study is conducted, the statistical analysis yields the “p” value, which indicates the strength of the evidence against the null hypothesis. In practice, statistical significance analysis provides only a dichotomous answer: a result either is or is not statistically significant (in other words, there either is or is not enough evidence against the null hypothesis) (Sterne and Smith, 2001). Therefore, statistical significance does not indicate how important the result of the study is (Thompson, 1999, Ogles et al., 2001, Millis, 2003).
Statistical significance can also provide misleading results. A statistically significant difference between groups can be found if the sample size is large and the intersubject variability is low, even though the difference between groups is too small to be considered clinically important (Millis, 2003). Some authors have argued that tests of statistical significance are not generally useful and that confidence intervals (CIs) and measures of effect size should instead be the main focus of research findings, since they provide more complete information about the magnitude of the association between variables, changes after a treatment, or differences between groups (Olejnik and Algina, 2000, Sterne and Smith, 2001). For example, CIs contain all of the information provided by a significance test and, in addition, a range of values within which the true difference is likely to lie. This information helps researchers and clinicians understand the “magnitude of the effect” and offers a richer source of information than the simple yes/no dichotomy of hypothesis testing (McNeely and Warren, 2006).
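The dependence of statistical significance on sample size can be illustrated with a minimal Python sketch. It is not taken from the study: the group means, the common standard deviation, and the sample sizes are hypothetical, the function name `mean_difference_summary` is invented for illustration, and a large-sample normal approximation is assumed in place of a t-test.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_summary(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Two-sided z-test and CI for a difference between two group means.

    Assumes equal group sizes, a common standard deviation, and a
    normal approximation (an assumption made for this illustration).
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    se = sd * sqrt(2.0 / n_per_group)
    z = diff / se
    std_normal = NormalDist()
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value for a (1 - alpha) confidence interval.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {"diff": diff, "p_value": p_value, "ci": ci}


# Hypothetical data: the same 1-point difference, with SD = 10 in both groups.
small = mean_difference_summary(51.0, 50.0, sd=10.0, n_per_group=20)
large = mean_difference_summary(51.0, 50.0, sd=10.0, n_per_group=2000)

for label, result in (("n = 20 per group", small), ("n = 2000 per group", large)):
    low, high = result["ci"]
    print(f"{label}: diff = {result['diff']:.1f}, "
          f"p = {result['p_value']:.4f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these hypothetical numbers the observed difference is identical in both scenarios, yet only the larger sample gives a p value below 0.05. The confidence interval shows in both cases how large the true difference could plausibly be, which is the information a reader needs to judge whether it matters clinically.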
A result can be clinically relevant but might be neglected if statistical significance was not attained because of small sample sizes and high intersubject variability. Clinical relevance (also called clinical significance) assessment indicates whether the results are meaningful or not. In this way, the evaluation of clinical relevance can provide more useful results for health care clinicians as well as for clients receiving care, facilitating the transfer of knowledge into clinical practice (Musselman, 2007). Some authors in the areas of education (Kirk, 1996, Carnine, 1997) as well as health research (Millis, 2003, Musselman, 2007) have urged that research findings be reported in language that is familiar to practitioners. With the advancement of health care and the introduction of evidence-based practice, researchers need to provide information about their research that can be used in clinical practice and that can have an impact on health care and clinical decisions. “p” values alone cannot meet these requirements because the information they provide is limited; clinical researchers therefore need to present the clinical relevance of their results to help busy clinicians with interpretation.
Several methods to determine clinical relevance have been created to provide clinicians, clients and policy makers with standards of meaningful change. The most commonly used are “distribution-based methods” and “anchor-based methods”. Distribution-based methods are based on the statistical distribution and the psychometric properties of the outcomes. The calculation of the effect size, the minimal important difference (MID), and the standard error of measurement are examples of distribution-based methods to evaluate clinical relevance. Anchor-based methods involve the clients’ perspective in the assessment of clinical relevance and are used prospectively.
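Two of the distribution-based quantities named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculation used in the study: the effect size is taken to be Cohen's d with a pooled standard deviation, the standard error of measurement is computed from a reliability coefficient as SD × √(1 − reliability), and the scores, the reliability value, and both function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def standard_error_of_measurement(sd, reliability):
    """SEM from the sample SD and a reliability coefficient between 0 and 1."""
    return sd * sqrt(1.0 - reliability)


# Hypothetical scores on some outcome for patients and healthy controls.
patients = [12.0, 15.0, 11.0, 14.0, 13.0]
controls = [16.0, 18.0, 15.0, 17.0, 19.0]

d = cohens_d(patients, controls)
# Hypothetical outcome with SD = 10 and test-retest reliability of 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)

print(f"Cohen's d = {d:.2f}")
print(f"SEM = {sem:.2f}")
```

A negative d here only means that the first group scored lower than the second. How large d or a difference relative to the SEM must be to count as clinically relevant depends on the benchmark chosen, which this sketch does not settle.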
Clinical relevance is generally evaluated as the result of an intervention; however, it can also be assessed in other types of research, such as cross-sectional studies. In these studies, patients and controls are assessed on certain variables of interest, and it is important to know whether the differences found between groups are in fact clinically meaningful. A score on a given outcome in a cross-sectional study is interpreted by comparing the values obtained with those found in a reference population. Unfortunately, most of the outcomes used in clinical research lack “normative or reference values” to establish “normality” of health status. Thus the clinical relevance of results in this type of research is uncertain and difficult for the general practitioner to interpret. Therefore, other methods for assessing clinical relevance in cross-sectional studies need to be used in the absence of normative values for the outcomes of interest. In addition, information regarding clinical relevance for neck outcomes is lacking, and clinicians have difficulty interpreting results from research studies. Thus, the objectives of this paper were (1) to explore and analyze different methods to evaluate the clinical relevance of results, using as an example a cross-sectional study comparing different neck outcomes between subjects with temporomandibular disorders (TMD) and healthy controls, and (2) to discuss different issues regarding clinical relevance and statistical significance when interpreting these results.
Sample data
The data used for this example were obtained from a large study investigating the involvement of the cervical spine in patients with TMD. Details regarding this study are described elsewhere (Armijo-Olivo, 2010, Armijo-Olivo et al., 2010b). The general description of the sample is as follows.
Results
Mean differences between groups and their 95% confidence intervals for the variables of interest, as well as values for clinical relevance based on different methods (i.e. effect size and MID), are described in Tables 1, 2 and 3.
A summary of the specific results of each one of the variables is as follows.
Discussion
This study shows an example of how to evaluate the clinical relevance of research results using data obtained from a cross-sectional study comparing several outcomes used for evaluating neck musculoskeletal functioning in patients with TMD when compared with healthy subjects. The results of this study show that it is possible to have statistical significance without having clinical relevance, to have both statistical significance and clinical relevance, to have clinical relevance without having
Conclusion
The evaluation of clinical relevance in clinical research is crucial to simplify the transfer of knowledge from research into practice. Clinicians and researchers need to be aware of the importance of research results and should abandon the simplistic approach of interpreting results through statistical significance alone. This paper encourages researchers to assess and present the clinical relevance of their research results in addition to the statistical significance analysis. In addition, editors of
Acknowledgements
Alberta Provincial CIHR Training Program in Bone and Joint Health, Canadian Institutes of Health Research, Government of Chile (MECESUP Program), University Catholic of Maule, Physiotherapy Foundation of Canada through an Alberta Research Award and the University of Alberta. The authors would like to thank Martha Funabashi, Larissa Costa, and Anelise Silveira for their constructive feedback.
References (44)
- et al. Reduced endurance of the cervical flexor muscles in patients with concurrent temporomandibular disorders and neck disability. Manual Therapy (2010)
- et al. Is maximal strength of the cervical flexor muscles reduced in patients with temporomandibular disorders? Archives of Physical Medicine and Rehabilitation (2010)
- Upper cervical spine flexor muscles: age related performance in asymptomatic women. Australian Journal of Physiotherapy (1994)
- et al. Validity of a measure of the frequency of headaches with overt neck involvement, and reliability of measurement of cervical spine anthropometric and muscle performance factors. Archives of Physical Medicine and Rehabilitation (2000)
- et al. The reproducibility of natural head posture: a methodological study. American Journal of Orthodontics and Dentofacial Orthopedics (1988)
- et al. The value of RCT evidence depends on the quality of statistical analysis. Behaviour Research and Therapy (2008)
- Measuring the endurance capacity of the cervical short flexor muscle group. Australian Journal of Physiotherapy (1994)
- et al. Methods to explain the clinical significance of health status measures. Mayo Clinic Proceedings (2002)
- Effect magnitude: a different focus. Journal of Statistical Planning and Inference (2007)
- et al. Neck muscle endurance, self-report, and range of motion data from subjects with treated and untreated neck pain. Journal of Manipulative and Physiological Therapeutics (2005)
- Three methods for minimally important difference: no relationship was found with the net proportion of patients improving. Journal of Clinical Epidemiology
- Impaired health status, sleep disorders, and pain in the craniomandibular and cervical spinal regions. European Journal of Pain
- Clinical significance: history, application, and current practice. Clinical Psychology Review
- Measures of effect size for comparative studies: applications, interpretations, and limitations. Contemporary Educational Psychology
- Reliability of a clinical test for deep cervical flexor endurance. Journal of Manipulative and Physiological Therapeutics
- Age- and sex-specific reference values of a test of neck muscle endurance. Journal of Manipulative and Physiological Therapeutics
- Relationship between cervical musculoskeletal impairments and temporomandibular disorders: clinical and electromyographic variables. Faculty of Rehabilitation Medicine
- The association between neck disability and jaw disability. Journal of Oral Rehabilitation
- Making subjective judgments in quantitative studies: the importance of using effect sizes and confidence intervals. Human Resource Development Quarterly