posted on 2020-02-04, 19:48 authored by Justin F. Landy, Miaolei (Liam) Jia, Isabel L. Ding, Domenico Viganola, Warren Tierney, Anna Dreber, Magnus Johannesson, Thomas Pfeiffer, Charles R. Ebersole, Quentin F. Gronau, Alexander Ly, Don van den Bergh, Maaten Marsman, Koen Derks, Eric-Jan Wagenmakers, Andrew Proctor, Daniel M. Bartels, Christopher W. Bauman, William J. Brady, Felix Cheung, Andrei Cimpian, Simone Dohle, Brent M. Donnellan, Adam Hahn, Michael P. Hall, William Jiménez-Leal, David J. Johnson, Richard E. Lucas, Benoit Monin, Andres Montealegre, Elizabeth Mullen, Jun Pang, Jennifer Ray, Diego A. Reinero, Jesse Reynolds, Walter Sowden, Daniel Storage, Runkun Su, Christina M. Tworek, Jay J. Van Bavel, Daniel Walco, Julian Wills, Xiaobing Xu, Chi Kai Yam, Xiaoyu Yang, William A. Cunningham, Martin Schweinsberg, Molly Urwitz, Eric Luis Uhlmann
To what extent are research results influenced by subjective decisions that scientists make as
they design studies? Fifteen research teams independently designed studies to answer five
original research questions related to moral judgments, negotiations, and implicit cognition.
Participants from two separate large samples (total N > 15,000) were then randomly assigned to
complete one version of each study. Effect sizes varied dramatically across different sets of
materials designed to test the same hypothesis: materials from different teams rendered
statistically significant effects in opposite directions for four out of five hypotheses, with the
narrowest range in estimates being d = -0.37 to +0.26. Meta-analysis and a Bayesian perspective
on the results revealed overall support for two hypotheses, and a lack of support for three
hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill
of the research team in designing materials, while considerable variability was attributable to the
hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly
correlated with study results, both across and within hypotheses. Crowdsourced testing of
research hypotheses helps reveal the true consistency of empirical support for a scientific claim.