In the context of intense interest in identifying what works in mental health, we sought to establish a consensus on what does not work—discredited psychological assessments and treatments used with children and adolescents. Applying a Delphi methodology, we engaged a panel of 139 experts to participate in a two-stage survey. Participants reported their familiarity with 67 treatments and 35 assessment techniques and rated each on a continuum from not at all discredited to certainly discredited. The composite results suggest considerable convergence in what is considered discredited and offer a first step in identifying discredited procedures in modern mental health practice for children and adolescents. It may prove as useful and easier to identify what does not work for youth as it is to identify what does work—as in evidence-based practice compilations. In either case, we can simultaneously avoid consensually identified discredited practices to eradicate what does not work and use inclusively defined evidence-based practices to promote what does work.

Which psychological tests and assessment procedures give us the most accurate data when assessing child and adolescent clients? Which psychotherapies consistently prove effective in treating the conditions we diagnose? The era of evidence-based practice (EBP) has inundated clinicians with lists of best practices, treatment guidelines, empirically supported therapies, practice guidelines, and reimbursable procedure codes. Dozens of compilations now offer, to varying degrees, evidence-based methods to employ with youth (e.g., American Psychiatric Association, 2006; Hersen & Sturmey, 2012; LeCroy, 2008; Rubin, 2011; Spirito & Kazak, 2005; Weisz & Kazdin, 2010). All are noble attempts to identify and disseminate what works in mental health. At the same time, relatively little attention has focused on identifying ineffective treatments and invalid tests for youth.
That is, what does not work beyond the passage of time alone, expectancy, base rates, or credible placebo. In those clinical circumstances when few validation studies or few randomized clinical trials exist, how can we, as practitioners, educators, and as an entire discipline, draw a line between methods that enjoy the confidence of the experts and those that experience widespread skepticism? Several authors have attempted to identify pseudoscientific, unvalidated, potentially harmful, or “quack” psychotherapies (e.g., Carroll, 2003; Della Sala, 1999; Eisner, 2000; Lilienfeld, 2007; Lilienfeld, Lynn, & Lohr, 2003; Singer & Lalich, 1996), including those for select youth disorders (e.g., Jacobson, Foxx, & Mulick, 2005). Parallel efforts have focused on identifying assessment measures of questionable validity on psychometric grounds (e.g., Hunsley, Crabb, & Mash, 2004; Hunsley & Mash, 2005). These pioneering efforts suffered from at least two prominent limitations. First, none of the efforts systematically relied on expert consensus to reach their conclusions. Instead, the authors assumed that a professional consensus already existed, or they selected entries on the basis of their own opinions. Second, these authors provided little logical differentiation between credible and noncredible treatments and between validated and unvalidated tests. This demarcation problem (Gardner, 2000)—the challenge of formulating sharp distinctions between validated and unvalidated—led to rather crude and dichotomous judgments. Previous efforts were often less than systematic. We took a different tack in identifying discredited procedures in mental health. We chose to conduct Delphi polls of mental health experts to secure a consensus and to establish more refined characterizations of treatments and tests ranging along a continuum from not at all discredited to certainly discredited.
Having previously focused on discredited procedures in both adults and the addictions (Norcross, Koocher, Fala, & Wexler, 2010; Norcross, Koocher, & Garofalo, 2006), we sought in this study to apply the same tack to identify discredited assessment and treatment methods used with youths. We searched broadly to collect nominations for discredited mental health treatments and tests via literature searches, electronic list requests, and peer consultations. Our inclusion criteria encompassed treatments and tests used professionally for mental health purposes during the past 100 years in the United States or Western Europe. Exclusion criteria were controversial theories of psychology that did not directly involve mental health (e.g., maternal employment as a cause of child maladjustment, intrauterine learning), unusual phenomena regarding youth (e.g., imaginary playmates, extrasensory perception) that have not yielded pertinent treatments, treatments or assessments that have never found advocacy in mental health (e.g., astrology, numerology), medications or biochemical substances (including conventional, herbal, naturopathic, or homeopathic preparations), and practices used primarily outside the United States and Western Europe. Using these criteria, we compiled and listed separately 59 candidate treatments and 30 candidate assessment procedures on a questionnaire. In the interest of inclusiveness, we listed all nominations received, even though some of the methods have acquired a body of published peer-reviewed support. The poll listed the 89 therapy and assessment methods and asked each participant to rate them using a 5-point Likert-type scale (per instructions provided next). Items were presented alphabetically and with reference to a particular purpose or condition.
For example, “acupuncture for treatment of childhood mental/behavioral disorders” and “applied kinesiology for treatment of ADHD” were listed under the treatment section, and “anatomically detailed dolls for use in diagnosing child sexual abuse” and “Blacky Test for personality assessment with children” were listed under the test section. In October 2012, we invited approximately 150 doctoral-level mental health experts to participate in our Delphi poll using personalized e-mail messages. We say “approximately” because, to comply with antispam policies, we contacted each potential participant individually to confirm that we had an accurate e-mail address and that the person would entertain our solicitation; this entailed probing more than 150 potential e-mail addresses to connect electronically with the participants. All were mental health professionals with demonstrated expertise in working with children and adolescents. Once 150 valid addresses were confirmed, we solicited those identified to serve on our panel.
Many of the invited participants met more than one selection criterion but are categorized here using the initial mechanism for identifying them. The 139 who ultimately agreed to participate were sent a link to a SurveyMonkey online questionnaire. All 139 completed and submitted the first round of the questionnaire. Following the standard Delphi procedure, our panel of experts answered the same items twice. In the first round, the experts answered the questions anonymously and without knowledge of the responses of their peers. During subsequent rounds, the experts were provided with anonymous data summarizing the responses of the entire panel and were given the opportunity to revise their ratings in light of the group judgment. The accuracy of probability forecasts increases over Delphi rounds, up to the second round (Ascher, 1978; Martino, 1972), and when statistical summaries are provided to the experts (Rowe, Wright, & McColl, 2005). Following the initial mailing and a subsequent reminder to the 139 responders to Round 1, we received 67 responses to Round 2. The response rate to Round 1 was 93% (139/150), and the Round 2 response rate was 48% (67/139). The experts were primarily child psychologists living in the United States. Eighty-five percent described themselves as child/adolescent psychologists and 8% as child/adolescent psychiatrists. More than half (53%) reported earning board certification in their specialty, 65% authored or edited a book in child mental health, and 65% served as an editorial board member of a peer-reviewed journal in child psychology or psychiatry. Approximately one third (37%) served as an editor or associate editor of a peer-reviewed journal in child psychology or psychiatry, and 55% currently or previously held peer-reviewed grant funding in child psychology or psychiatry. These categories are not mutually exclusive, of course; many panelists fit more than one. The average number of years of clinical experience was 26.3.
Women accounted for 41% of respondents. Participants providing their race/ethnicity characterized themselves as follows: 57 White/Caucasian; three Black/African American; and one each Native American, Hispanic, Asian American, and other. We presented the following instructions to the panelists when they linked to the SurveyMonkey site: For the purpose of this Delphi poll of experts, we operationally define discredited treatments and tests as those unable to consistently generate treatment outcomes (treatments) or valid assessment data (tests) beyond that obtained by the passage of time alone, expectancy, base rates, or credible placebo. Our use of the term “discredited” subsumes ineffective and detrimental interventions but forms a broader and more inclusive characterization. We are interested in identifying disproven practices. Please rate the extent to which you view the treatment or test as discredited along a continuum from “not at all discredited” to “certainly discredited.” A treatment or assessment tool can be discredited according to several types of evidence: peer-reviewed controlled research, clinical practice, and/or professional consensus. Please think in terms of the criteria for expert opinions as delineated in well-known court decisions such as Daubert v. Merrell Dow Pharmaceuticals (1993) or Kumho Tire Co. v. Carmichael (1999). In these cases the federal courts cited factors such as experimental testing, peer review, error rates, and acceptability in the relevant scientific community, some or all of which might prove helpful in determining the validity of a particular scientific theory or technique. We use a 5-point Likert-type format with the following ratings: 1 = Not at all discredited, 2 = Not likely discredited, 3 = Possibly discredited, 4 = Probably discredited, and 5 = Certainly discredited.
If you cannot make a rating because of unfamiliarity with the treatment or test, then kindly check the “not familiar with treatment/test” column. If you lack familiarity with the treatment/test's research or clinical use, then kindly check the “not familiar with research or clinical use” column. You may also mark both. If experts indicated that they were unfamiliar with a particular test or treatment, they could not numerically rate it. In this way, ratings were contributed by only those professionals who felt sufficiently cognizant of the procedure and its evidence base. Our Delphi poll results are summarized in the two tables that follow, which display the results from both rounds for treatments and assessments, respectively. The data in the tables are ranked in descending order from those regarded as most likely to least likely discredited in the second round of ratings (i.e., from high to low in column 6). The tables display the mean ratings and standard deviations of each item for both rounds, along with the percentage of panelists indicating unfamiliarity with the particular method. As expected with consensus-building procedures, the mean ratings in Round 2 tended toward less variability than in Round 1. Only 18 of the standard deviations for the 89 items (11 tests, seven treatments) evidenced an increase from Round 1 to Round 2. The panelists developed a greater consensus in their ratings on what comprised discredited procedures. Before proceeding to the treatment and assessment methods judged by the panel as discredited, we should note that several of those proposed as such in the public literature or in private discussions did not merit such condemnation according to the expert consensus. We would characterize as not discredited those methods receiving mean ratings in the second round between 1.0 and 2.5.
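The tabulation just described—per-item means and standard deviations computed only over panelists who actually rated the item, plus the percentage marking themselves unfamiliar, ranked from most to least discredited—can be sketched in a few lines. This is a hypothetical illustration with invented ratings, not the authors' analysis code; the item names and values are examples only.

```python
import statistics

# Hypothetical ratings on the 1-5 discredited scale.
# None marks a panelist who checked "not familiar" and gave no rating.
ratings = {
    "magnet therapy": [5, 5, 5, 4, None, 5],
    "Szondi Test": [5, 4, None, None, 5, 4],
    "Vineland Adaptive Behavior Scales": [1, 2, 1, None, 2, 1],
}

def summarize(scores):
    """Mean, SD, and % unfamiliar, using only the panelists who rated."""
    rated = [s for s in scores if s is not None]
    unfamiliar_pct = 100 * (len(scores) - len(rated)) / len(scores)
    return (round(statistics.mean(rated), 2),
            round(statistics.stdev(rated), 2),   # sample SD, as tables usually report
            round(unfamiliar_pct, 1))

# Rank items from most to least discredited by their mean rating.
table = sorted(((name, *summarize(s)) for name, s in ratings.items()),
               key=lambda row: row[1], reverse=True)
for name, mean, sd, pct in table:
    print(f"{name}: M={mean}, SD={sd}, {pct}% unfamiliar")
```

Excluding unfamiliar raters before averaging, rather than coding unfamiliarity as a midpoint score, keeps each mean anchored to informed judgments only, which is the logic the article describes.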
Among the assessment methods were the Balthazar Scale of Adaptive Behavior, Bender Visual Motor Gestalt Test, Connor's Symptom Checklist, Devereaux Child Behavior Checklist, Finger Localization Test, Jesness Inventory, Raven Standard Progressive Matrices, Tactile Localization Test, Vineland Adaptive Behavior Scales, and Wepman's Auditory Test for the uses or purposes stated on the questionnaire. The expert panelists were not necessarily recommending these tools for their identified purpose or other purposes, but as of 2012 they did not consider them as discredited. Among the psychological treatments, the not discredited methods included Communication Cards to improve social skills, Picture Exchange Communication System, Self-Control Training for treatment of ADHD, and the TEACCH approach for treatment of autism. The table of assessments presents the tools and tests in ranked order from the most to least discredited according to the expert panel. Eleven earned mean ratings above 4.25 on the 5-point scale (4 = probably discredited, 5 = certainly discredited). The expert consensus held that enneagrams, Szondi Test, Brain Balance, biorhythms, Hand Test, handwriting analysis, Animal Naming Test, Fairy Tale Test, Blacky Test, IQ test scale scores, and Holtzman Inkblot Test were discredited for their purported assessment uses among children and adolescents. Forty-two of the 59 listed treatment methods received average ratings above 4.25 in the second round. A greater proportion of the candidate treatments than of the assessments was judged discredited. The table of treatments presents them in ranked order according to the Round 2 mean rating.
Those receiving the highest discredited ratings (4.88 or higher) were magnet therapy, rebirthing therapy, past life regression therapy, crystal healing, Kirlian therapy, penduluming, Bio-Ching, JoyTouch, withholding food/water, aura therapy, Orgone Therapy, Astrotherapy, Conversion Therapy/Reparative Therapy for adolescent homosexuality, and Triggering Anger Therapy for treatment of reactive attachment disorder. We designed our research to identify a professional consensus concerning discredited assessment and treatment methods for youth. We conducted a Delphi poll to secure such a consensus and to establish more refined characterizations of treatments and tests. The results do suggest a continuum from not discredited to certainly discredited, according to our experts. It is useful to identify and avoid those practices professionally judged as ineffective, perhaps even detrimental; it is also useful to delineate those that are not. One person's opinion in a review article or book chapter does not constitute a collective or definitive judgment, nor is professional consensus a guarantee of truth. But insofar as science is a process and product of collecting replicated provisional “facts,” expert consensus is probably superior to individuals’ judgment. Even experts can be wrong, but less so than single practitioners and especially those marketing mental health practices. Despite the presence of several projective personality instruments in the “top eleven,” a number of other more popular or better-known projective tools—such as the Children's Apperception Test, Roberts Apperception Test, and Rorschach Inkblot technique—drew less harsh ratings.
In addition, some objective or inventory-style assessments, including the Connor's Symptom Checklist for diagnosing ADHD, the Devereaux Child Behavior Checklist for assessing ADHD, and the Vineland Adaptive Behavior Scales, were rated by the experts as “not likely discredited.” The instruments faring worst tended to be those with obsolete or sparse research and those relying on narrow theoretical approaches (e.g., the Blacky Test, which relies on Freud's psychosexual theory of development for construct validity) or those with highly suspect theoretical underpinnings (e.g., the Szondi Test's foundation in “hereditobiology”). Many of the treatment and assessment methods condemned as “discredited” by the experts maintain current adherents, as quick Internet searches will reveal. For example, one can become “board certified” as a “past life therapist” (http://www.ibrt.org/). At least one of the techniques rated as likely discredited, rebirthing therapy, has led to a documented patient death and a state law in Colorado banning its use (Josefson, 2001). When we urge fellow practitioners to refrain from using or teaching the methods consensually judged as discredited, we are frequently met with two immediate protests: “What do those experts know!” and “But it worked for (me, my client, my aunt).” Regarding the former, we remind protesters that dispassionate experts are imperfect, but less imperfect than biased individuals, and that they did excuse themselves from rating those methods with which they were unfamiliar. Regarding the latter, we acknowledge that all assessments and treatments will indeed appear to work for some clients some of the time, due to chance, time, accident, or placebo. The proper comparison is to outperforming chance and placebo, not whether a method happens to succeed on occasion.
The collective results remind us that “old (professional) habits die hard.” Many practitioners adhere to favored theories and treasured methods in which they were originally trained in graduate school. Both historical analysis (Kuhn, 1962) and empirical research (e.g., Neimeyer, Taylor, & Rozensky, 2012) suggest that the accelerating profusion of knowledge will probably translate into shorter durability of current knowledge. A recent Delphi poll indicated that the half-life of knowledge in professional psychology is expected to decrease within the next decade from nearly 9 years to just over 7 years (Neimeyer et al., 2012). Readers should bear in mind both practical and conceptual constraints when interpreting our results. On the practical side, first, our panel consisted of psychotherapists living and working in the United States; generalizations regarding the perspectives of experts in other countries are unwarranted. Second, our sample was largely composed of seasoned, doctoral-level psychologists and psychiatrists. Other professions or practitioners with different credentials may not share the same perspectives. Third, the response rate to the first round was high (139/150), but the response to the second round was less so (67/139). We cannot rule out the possibility of an unknown response bias. Fourth, we acknowledge that by not surveying experts in pseudoscientific interventions per se, the conclusions reached in this study may not reflect their particular consensus. It is possible that some experts in child and adolescent psychology and psychiatry know little about the “dark side” of their profession. Finally, many of the items had an even lower number of raters because the panelists could indicate that they were unfamiliar with the method and thus did not contribute to the mean rating.
We cannot say whether the experts’ lack of familiarity with some procedures altered the final ratings, although one would expect experts to possess familiarity with widely respected and widely shunned practices as standard of care knowledge. On the conceptual side, the experts’ ratings addressed particular uses or purposes of the assessment or treatment method. The validity is therefore conditional; usefulness is purpose- and context-specific. A therapy method considered discredited for youth might be considered more credible for another purpose or with a different population. The experts’ theoretical orientations, which we did not assess, might also have influenced their ratings. One might reasonably suspect that, say, a psychodynamic psychologist would respond more favorably to the credibility of projective devices than, say, a cognitive-behavioral psychologist. And these consensus ratings may well change with the passage of time and the publication of new research. Several of today's mainstream treatments and tests may be regarded as discredited 30 years from now, and several of those characterized as discredited in 2012 may emerge as EBP within a decade. Psychological science should strive to be vigilant and self-correcting. Yet these results leave us feeling encouraged. Psychology, in its scientific base, relies on evidence, and the discipline is making progress in differentiating science from pseudoscience, EBPs from discredited practices. We ardently hope that our Delphi poll sparks a broader, overdue discussion within the profession about discredited practices in working with some of our most vulnerable populations.
The risk to patients and practitioners in using discredited procedures is real; as Voltaire (1765) wrote in Questions sur les miracles: “Those who make you believe absurdities can make you commit atrocities.” It may prove as useful and easier to identify what does not work for youth (as in this study) as it is to identify what does work (as in the EBP compilations). In either case, we can simultaneously avoid (consensually identified) discredited practices to eradicate what does not work and use (inclusively defined) EBPs to promote what does work.
We express our gratitude to all the participants in our Delphi poll and take pleasure in acknowledging those who authorized us to share their names. They include Thomas Achenbach, Anne Marie Albano, Cindy Anderson, Barry Anton, Glen P. Aylward, Russell Barkley, Jeffery E. Barnett, William Bernet, Steve Boggs, Susan Campbell, Monit Cheung, Ann Davis, Andres De Los Reyes, Dennis D. Drotar, Mina Dulcan, Sheila M. Eyberg, Frank R. Ezzo, Kurt Freeman, Daniel Hiliker, Yo Jackson, Daphne Keen, Kristin Kutash, John Lavigne, Adam Lewin, Katherine A. Loveland, Eric Mash, Elizabeth McQuaid, Thomas Ollendick, Tonya Palermo, Brenda Payne, Mitch Prinstein, Cecil Reynolds, Stephen Shirk, Jennifer Shroff Pendley, Wendy Silverman, Douglas Tynan, Abby Wasserman, Robert Weis, Linda Wilmhurst, and Keith W Yeates.