Things We Don't Know: How reliable is psychological science?

Things We Don't Know Anymore

Our psychology editor Malte Elson explores the “replication crisis” and questions how confident we can be in established psychological findings. Image credit: Things We Don't Know / Giles Meakin (CC-BY)

The last few years haven’t been easy on psychological science. Don’t get me wrong: the field itself is flourishing, boasting an ever-increasing number of publications, journals, conferences, faculty positions, and university graduates all over the world, and it has gained more and more respect and acceptance in academia and in society. The case of Harvard evolutionary biologist and primate researcher Marc Hauser’s fraudulent publications was already fading from our minds when, in September 2011, the discovery of scientific misconduct by the Dutch social psychologist Diederik Stapel shook the foundations of psychological science. In at least 50 cases of fraud uncovered by the Levelt Committee, Stapel had doctored, mangled, or completely fabricated datasets in order to publish in the field’s top-ranked outlets, up to the most prestigious journals such as Science. Among Stapel’s highly regarded publications were findings on how untidy environments encourage racial discrimination^[1] and on how to reduce racist biases in judges’ legal decisions on minority defendants^[2]. Nullifying the content of these publications is a setback for social psychology and, to a somewhat lesser extent, for society overall.

Although scientists work in a highly competitive environment, we trust them to be committed to finding the truth. Someone who plays it smart, like Stapel, can quite easily abuse this trust for personal gain in the form of a prestigious academic career. Instead of looking for the truth, Stapel was on a quest for aesthetics, for beauty, as the New York Times quoted him saying. One might think this is not much of an issue: Stapel got caught, after all! He reached for the stars by committing fraud and was brought back down to reality when his deeds were uncovered, so the system works. But does it really?

There is good reason to believe that severe fraud, such as the fabrication of data in the Stapel affair, is extremely rare among scientists of all disciplines, including psychology. Not only does it take an enormous effort to keep such a fragile house of cards standing for long, but anyone who gets caught can expect no pardon from the scientific community. However, there is a large gray area of questionable research practices (QRPs) that are more widely accepted in the community and that a considerable number of scholars engage in. According to a survey of over 2,000 psychologists^[3], fabricating data does seem to be relatively rare (and is never perceived as justified), but failing to report all dependent measures or experimental conditions, selectively reporting studies that “worked”, and stopping data collection once a desired result has been achieved are surprisingly common in psychological science. Does that mean psychology as a whole is looking for beauty over truth?
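
To see why practices like these matter even when no data are invented, here is a minimal Python simulation of two of them. It is a sketch under simplifying assumptions, not something taken from the survey: every simulated study compares two groups drawn from the same population, so any “significant” result is a false positive. The function names, the sample sizes, the three outcome measures (simulated as independent), the schedule for peeking at the data, and the use of a z-test with a known standard deviation of 1 are all illustrative choices.

```python
import math
import random

ALPHA = 0.05  # conventional significance threshold


def z_test_p(a, b):
    """Two-sided p-value for a difference in group means.

    Simplifying assumption: both populations have a known standard
    deviation of 1, so a z-test stands in for the usual t-test.
    """
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = math.sqrt(1 / len(a) + 1 / len(b))
    return math.erfc(abs(diff / se) / math.sqrt(2))


def sample(rng, n):
    # No true effect: every group comes from the same N(0, 1) population.
    return [rng.gauss(0, 1) for _ in range(n)]


def honest_study(rng, n=20):
    # One pre-planned outcome measure, one pre-planned sample size.
    return z_test_p(sample(rng, n), sample(rng, n)) < ALPHA


def many_measures_study(rng, n=20, k=3):
    # k outcome measures (simulated as independent); report whichever "worked".
    return any(honest_study(rng, n) for _ in range(k))


def optional_stopping_study(rng, start=10, step=10, max_n=50):
    # Test after every `step` participants per group; stop at the first p < .05.
    a, b = sample(rng, start), sample(rng, start)
    while True:
        if z_test_p(a, b) < ALPHA:
            return True
        if len(a) >= max_n:
            return False
        a += sample(rng, step)
        b += sample(rng, step)


def false_positive_rate(study, trials=10_000, seed=1):
    # Fraction of simulated null studies that come out "significant".
    rng = random.Random(seed)
    return sum(study(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for name, study in [("honest", honest_study),
                        ("best of 3 measures", many_measures_study),
                        ("optional stopping", optional_stopping_study)]:
        print(f"{name}: {false_positive_rate(study):.3f}")
```

With these settings the honest study should come out significant about 5% of the time, as intended, while reporting the best of three measures or stopping at the first p < .05 should each nearly triple that rate (to around 14%). Real outcome measures are usually correlated, which would make the first inflation somewhat smaller, while combining several practices at once makes the overall inflation larger.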

Questionable research practices and methodological flexibility allow researchers to tweak data just a little to make results look “sexy”. Image credit: Marcus Holland-Moritz (CC-BY)

That is, of course, a rather grim reading of these findings, and probably an unfair comparison given the difference in scale between not reporting some of the measures from a larger experiment and making up data entirely. But there is a kernel of truth to it: psychology evidently prefers “positive” findings over “negative” or null ones. According to Fanelli^[4], approximately 90% of all empirical papers published in psychology’s top journals support the tested hypothesis, and this share seems to have kept increasing over the years^[5]. This led Arina Bones^[6], the satirical “alter ego” of social psychologist Brian Nosek, to remark humorously that, given psychologists’ almost clairvoyant abilities, empirically testing hypotheses seems largely unnecessary. This “aversion to the null” has at least two substantial consequences. First, the empirical literature in psychology suffers from massive publication bias, also called the “file drawer problem”, because studies yielding null findings very often end up in researchers’ file drawers instead of academic journals^[7]. Second, and even more serious, psychologists under pressure to “publish or perish” (i.e., publish or fail in their careers) try hard to achieve statistical significance in their research, and sometimes a little too hard. Commonly employed practices leave enough “methodological flexibility” to tweak the statistics just a little so that, more often than not, studies “pass” the significance test, which in turn dramatically increases their chance of getting published^[8]. Ioannidis’ warning^[9] that most published research findings are false might be particularly true for psychology.
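
The arithmetic behind Ioannidis’ warning can be sketched in a few lines. In his simplest model (no bias), the share of significant findings that reflect a true effect, the positive predictive value (PPV), depends on the prior odds R that a tested hypothesis is true, the statistical power 1 − β, and the significance level α: PPV = (1 − β)R / ((1 − β)R + α). The Python below implements that formula; the function name and all input values are hypothetical and are not estimates for psychology.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of statistically significant findings that reflect a true effect.

    prior_odds: ratio of true to null hypotheses among those tested (R).
    power:      probability that a study detects a true effect (1 - beta).
    alpha:      probability that a study "detects" an effect that is absent.
    Assumes no bias, i.e. no flexible analysis or selective reporting.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical scenarios; none of these numbers are estimates for psychology.
scenarios = [
    ("1 in 2 hypotheses true, power 0.80", 1.0, 0.80, 0.05),
    ("1 in 5 hypotheses true, power 0.35", 0.25, 0.35, 0.05),
    ("same, but flexibility inflates alpha to 0.15", 0.25, 0.35, 0.15),
]
for label, r, pw, a in scenarios:
    print(f"{label}: {positive_predictive_value(r, pw, a):.2f}")
```

These made-up scenarios print 0.94, 0.64, and 0.37. The third one is a deliberate simplification: flexible analysis inflates the effective α, but it can also raise the apparent power, which the sketch holds fixed.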

Academics from other disciplines often seem genuinely surprised when they hear about this state of affairs in psychology, given that there is an obvious solution to the problem: replication. With a sufficient number of replications, it should be easy to tell reliable findings apart from “false positives” (results that are statistically significant even though no true effect exists). However, psychologists have historically engaged in systematic replication efforts only rarely. According to Makel, Plucker, and Hegarty^[10], the replication rate in psychology’s empirical literature is only about 1%, and just half of those replications come from researchers not working at the original laboratory. Again, one apparent problem is that many psychology journals don’t publish replications, which makes it quite unattractive to spend any resources on them. But it’s not that simple.
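
Why replication helps can be made concrete with Bayes’ rule. The Python sketch below treats each study as a test that comes out significant with probability equal to its power if the effect is real, and with probability α if it is not, and it updates the odds that the effect is real after each study. The function name, the prior odds of 1:4, the power of 0.8, and the assumption that all studies are independent and analysed without flexibility are hypothetical choices for illustration.

```python
def probability_effect_is_real(prior_odds, results, power=0.8, alpha=0.05):
    """Update belief in an effect after a series of independent studies.

    results: list of booleans, True if a study was significant.
    Hypothetical assumption: every study has the same power and alpha and
    is analysed without flexible analysis or selective reporting.
    """
    odds = prior_odds
    for significant in results:
        if significant:
            # Significant: probability `power` if real, `alpha` if null.
            odds *= power / alpha
        else:
            # Not significant: probability 1 - power if real, 1 - alpha if null.
            odds *= (1 - power) / (1 - alpha)
    return odds / (1 + odds)


prior = 0.25  # hypothetical: 1 in 5 tested hypotheses is true
for history in ([True], [True, True], [True, False]):
    print(history, f"{probability_effect_is_real(prior, history):.2f}")
```

Under these assumptions, one significant study gives 0.80, a successful replication raises that to 0.98, and a failed replication lowers it to 0.46. A false positive survives an independent replication only with probability α, which is why replications filter them out; the same arithmetic shows that a failed replication carries information too. If the original result was obtained through flexible analysis, its effective α is higher than 0.05, which this sketch does not model.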

Replication is an important tenet of science because it provides confidence: if you drop something, you are very sure it will fall to the ground because you’ve seen it happen innumerable times. But if you’ve only seen something happen once, you can’t be sure it will happen every time.
Image credit: Tanya Hart (CC-BY)

About two months ago, the journal Social Psychology published an entire special issue^[11] dedicated to replicating the field’s “textbook classics”. Again, while this might seem trivial to academics from other disciplines, it is an almost revolutionary effort for psychologists. Some of the original findings were replicated; some were not. Besides the scientific value of each replication attempt, the lessons from the debate (and drama) they sparked seem even more important. There was a lengthy and quite heated back-and-forth between a team of replicators (Johnson, Cheung, and Donnellan) and the author of one “textbook classic” (Schnall) about who or what might have been responsible for the failure to replicate the original findings. As in similar earlier exchanges, psychologists can become defensive (rightfully or not) when their ideas are challenged by failed replications. Other commenters chimed in, too. Harvard psychologist Dan Gilbert, for instance, called Johnson and colleagues the “replication police” and “shameless little bullies” on Twitter. Some scholars have even started an astounding controversy over whether failed replications, as opposed to successful ones, have a place in psychology at all. Harvard’s Jason Mitchell, for example, argues that unsuccessful experiments have no meaningful scientific value and do not constitute scientific output. Mitchell has been met with severe criticism from a large number of commenters (e.g., Neuroskeptic), including other psychologists. Still, the fact that a tenured professor at one of the most prestigious research institutions in the world publicly doubts the usefulness of one of the hallmarks of empirical science suggests that psychology still has a long way to go before it attains the status other disciplines have already earned.

Every case of academic fraud is, of course, a major setback for accumulated scientific knowledge. Hidden questionable research practices and methodological flexibility should make us cautious about how conclusive many findings really are. Even studies that were done properly should be interpreted carefully, given the extremely low rate of replication. Taken together, it appears that psychology’s list of Things We Don’t Know might be longer than previously assumed. So what do we do? Despite the negative views on research practices in psychology expressed here, I’m happy to end on a positive note: there are efforts to address some of these problems. Initiatives like the Open Science Framework try to maximize transparency and replicability in psychological research by providing a platform for pre-registering hypotheses, hosting test materials and measures, and making datasets publicly available. Others discuss potential models to incentivize replications^[12]. I would argue that the crisis does not mean empirical research in psychology is a lost cause. Rather, it gives psychologists, particularly younger ones like myself, an opportunity to improve the credibility of the field by asking again some of the basic questions the discipline seemed to have answered, and ultimately to arrive at more reliable results through methodological rigor and high ethical standards. Because that’s our business: we are scientists. Well, almost.

References

[1] Stapel, D. A., & Lindenberg, S. (2011). Coping with chaos: How disordered contexts promote stereotyping and discrimination. Science, 332(6026), 251–253. doi:10.1126/science.1201068 [Retracted]
[2] Lammers, J., & Stapel, D. A. (2011). Racist biases in legal decisions are reduced by a justice focus. European Journal of Social Psychology, 41(3), 375–387. doi:10.1002/ejsp.783 [Retracted]
[3] John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524–532. doi:10.1177/0956797611430953
[4] Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLoS ONE, 5(4), e10068. doi:10.1371/journal.pone.0010068
[5] Fanelli, D. (2012). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891–904. doi:10.1007/s11192-011-0494-7
[6] Bones, A. K. (2012). We knew the future all along: Scientific hypothesizing is much more accurate than other forms of precognition - a satire in one part. Perspectives on Psychological Science, 7(3), 307–309. doi:10.1177/1745691612441216
[7] Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641. doi:10.1037//0033-2909.86.3.638
[8] Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366. doi:10.1177/0956797611417632
[9] Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124. doi:10.1371/journal.pmed.0020124
[10] Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542. doi:10.1177/1745691612460688
[11] Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45(3), 137–141. doi:10.1027/1864-9335/a000192
[12] Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve psychological science. Perspectives on Psychological Science, 7(6), 608–614. doi:10.1177/1745691612462586

Wednesday, 16 July 2014
