A Credibility Crisis in Psychology?

Jerzy Marian Brzeziński

Adam Mickiewicz University, Poznań; Faculty of Psychology and Cognitive Science
https://orcid.org/0000-0003-1582-4013


Abstract

Interest in the overall result obtained by B. Nosek's team increased significantly, and not only among psychologists, after an article presenting the results of a large-scale international replication of psychological empirical research was published in Science (cf. Open Science Collaboration, 2015). While 97% of the original studies yielded statistically significant results (p < .05), only 36% of the replications did. The author of the present article argues that this result laid the ground for unjustified generalizations about the methodological weaknesses of psychology as an empirical science. Psychology is an empirical science, but it also has its peculiarities, owing to the specificity of its subject matter and its method (e.g., Orne, 1962, 1973; Rosenthal, 1966/2009; Rosenzweig, 1933). Equally importantly, psychology is not practiced in social or cultural isolation. Finally, psychological research is bound by rigorous ethical standards and constraints, and psychologists (like researchers in other fields) who publish the results of statistically analyzed empirical research are constrained by the editorial practices of scientific journals. Journals are interested only in papers that present statistically significant results (where "p < .05"!), which leads to the so-called file-drawer effect (Rosenthal, 1979). As the author strongly emphasizes, the debate cannot be limited to the statistical significance of psychological research (in particular to the power of statistical tests, a topic that has become popular in recent years). In this article, the author discusses, and presents his point of view on, the following problems: 1) the methodological specificity of psychology as an empirical science, 2) the triad of statistical significance (the problematic criterion of "p < .05"), effect size, and the power of a statistical test, 3) the socio-cultural context of psychological research, 4) researchers' failure to follow methodological and ethical guidelines, and 5) possible precautions and remedies.
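
To make the triad of significance level, effect size, and power concrete, the sketch below (not part of the original article) uses Python's statsmodels package for a two-sided, two-sample t test; the sample size of 30 per group and the use of Cohen's (1988) effect-size benchmarks are illustrative assumptions, not figures taken from the studies discussed here.

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sided, two-sample t test at alpha = .05 with n = 30 per group,
# for Cohen's (1988) small, medium, and large effect-size benchmarks.
for d in (0.2, 0.5, 0.8):
    power = analysis.solve_power(effect_size=d, nobs1=30, alpha=0.05,
                                 ratio=1.0, alternative="two-sided")
    print(f"d = {d:.1f}, n = 30 per group -> power = {power:.2f}")

# Sample size per group required to reach the conventional power of .80.
for d in (0.2, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                             ratio=1.0, alternative="two-sided")
    print(f"d = {d:.1f}, target power = .80 -> n per group = {n:.0f}")

Under these assumptions, a study with 30 participants per group has roughly a one-in-two chance of detecting a medium effect, which illustrates why underpowered original studies can yield significant results that later fail to replicate.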


Keywords:

science, intersubjectivity, stability, rationality, credibility, replication, psychological research, statistics, statistical test, confidence interval, p < .05, power of statistical test, effect size, data fishing, p-hacking, HARKing, interpersonal expectation, demand characteristics, file drawer effect, pre-registration of research


References

Aguinis, H., Villamor, I., & Ramani, R. S. (2021). MTurk research: Review and recommendations. Journal of Management, 47(4), 823–837. https://doi.org/10.1177/0149206320969787

Ajdukiewicz, K. (1949/2003). Zagadnienia i kierunki filozofii. Teoria poznania. Metafizyka [Issues and directions of philosophy. Epistemology. Metaphysics]. Czytelnik.

Ajdukiewicz, K. (1957/2020). O wolności nauki [On freedom of science]. Nauka, 2, 7–24. https://doi.org/10.24425/nauka.2020.132629

Ajdukiewicz, K. (1958). Zagadnienie racjonalności zawodnych sposobów wnioskowania [The issue of the rationality of unreliable ways of reasoning]. Studia Filozoficzne, 4, 14–29.

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). Author.

American Psychological Association Presidential Task Force on Evidence-Based Practice. (2006). Evidence-based practice in psychology. American Psychologist, 61(4), 271–285. https://doi.org/10.1037/0003-066X.61.4.271

American Psychological Association Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63(9), 839–851. https://doi.org/10.1037/0003-066X.63.9.839

Blanck, P. D. (Ed.). (1993). Interpersonal expectations: Theory, research, and applications. Cambridge University Press.

Brzeziński, J. (2012). Badania eksperymentalne w psychologii i pedagogice (wyd. popr.) [Experimental research in psychology and education (Rev. ed.)]. Wydawnictwo Naukowe Scholar.

Brzeziński, J. (2016). Towards a comprehensive model of scientific research and professional practice in psychology. Current Issues in Personality Psychology, 4(1), 2–10. https://doi.org/10.5114/cipp.2016.58442

Brzeziński, J. M. (2019). Metodologia badań psychologicznych. Wydanie nowe [Methodology of psychological research. New edition]. Wydawnictwo Naukowe PWN.

Brzeziński, J. M. (2023). Pytania do psychologów prowadzących badania naukowe [Questions for psychologists conducting research]. In A. Jonkisz, J. Poznański SJ, & J. Koszteyn (Eds.), Zrozumieć nasze postrzeganie i pojmowanie człowieka i świata. Profesorowi Józefowi Bremerowi SJ z okazji 70-lecia urodzin [To understand our perception and comprehension of the human and the world. Papers dedicated to Professor Józef Bremer SJ on the occasion of his 70th birthday] (pp. 289–311). Wydawnictwo Naukowe Akademii Ignatianum.

Brzeziński, J. M., & Oleś, P. K. (2021). O psychologii i psychologach. Między uniwersytetem a praktyką społeczną [On psychology and psychologists. Between university and social practice]. Wydawnictwo Naukowe PWN.

Brzeziński, J., & Siuta, J. (Eds.). (1991). Społeczny kontekst badań psychologicznych i pedagogicznych. Wybór tekstów [The social context of psychological and pedagogical research. A reader]. Wydawnictwo Naukowe UAM.

Brzeziński, J., & Siuta, J. (Eds.). (2006). Metodologiczne i statystyczne problemy psychologii. Wybór tekstów [Methodological and statistical problems of psychology. A reader]. Wydawnictwo Naukowe UAM.

Brzeziński, J., & Stachowski, R. (1981/1984). Zastosowanie analizy wariancji w eksperymentalnych badaniach psychologicznych (2nd ed.) [Application of analysis of variance in experimental psychological research]. Państwowe Wydawnictwo Naukowe.

Buchanan, E., & Scofield, J. E. (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50(3), 2586–2596. https://doi.org/10.3758/s13428-018-1035-6

Budzicz, Ł. (2015). Post-Stapelian psychology. Discussions on the reliability of data and publications in psychology. Annals of Psychology, 18(1), 25–40.

Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon’s Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149–154. https://doi.org/10.1177/1745691617706516

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997

Edwards, A. L. (1950/1960/1968/1972). Experimental design in psychological research. Holt, Rinehart and Winston.

Fisher, R. A. (1925/1938). Statistical methods for research workers (7th ed.). Oliver & Boyd.

Fisher, R. A. (1935/1971). The design of experiments (8th ed.). Oliver & Boyd.

Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. The Psychology Press, Taylor and Francis Group.

Grissom, R. J., & Kim, J. J. (2011). Effect sizes for research. Univariate and multivariate applications (2nd ed.). Routledge, Taylor and Francis Group.

Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? L. Erlbaum.

Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). Holt, Rinehart, and Winston. [1st ed. 1963: Statistics for psychologists; 5th ed. 1994: Statistics].

Henkel, R. E., & Morrison, D. E. (Eds.). (1970). The significance test controversy: A reader. Butterworths.

Keith, M. G., Tay, L., & Harms, P. D. (2017). Systems perspective of Amazon Mechanical Turk for organizational research: Review and recommendations. Frontiers in Psychology, 8, 1359. https://doi.org/10.3389/fpsyg.2017.01359

King, B. M., & Minium, E. W. (2003). Statistical reasoning in psychology and education (4th ed.). John Wiley & Sons.

Kirk, R. E. (1968/1982/1995). Experimental design: Procedures for the behavioral sciences. Brooks/Cole.

Kirk, R. E. (2012). Experimental design: Procedures for the behavioral sciences (4th ed.). Sage.

Labovitz, S. (1970). Criteria for selecting a significance level: A note on the sacredness of .05. In R. E. Henkel & D. E. Morrison (Eds.), The significance test controversy: A reader (pp. 166–171). Butterworths.

Larsen, R. J. (2005). Saul Rosenzweig (1907–2004). American Psychologist, 60(3), 259. https://doi.org/10.1037/0003-066X.60.3.259

Loftus, G. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.

Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler & J. Wixted (Eds.), Stevens' handbook of experimental psychology: Methodology in experimental psychology (pp. 339–390). John Wiley & Sons, Inc. https://doi.org/10.1002/0471214426.pas0409

Miller, A. G. (Ed.). (1972). The social psychology of psychological research. The Free Press.

Neuliep, J. W. (Ed.). (1991). Replication research in the social sciences. Sage.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716

Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17(11), 776–783. https://doi.org/10.1037/h0043424

Orne, M. T. (1973). Communication by the total experimental situation: Why it is important, how it is evaluated, and its significance for the ecological validity of findings. In P. Pliner, L. Krames, & T. Alloway (Eds.), Communication and affect: Language and thought (pp. 157–191). Academic Press. https://doi.org/10.1016/B978-0-12-558250-6.50014-6

Popper, K. (1974). The logic of scientific discovery. Hutchinson.

Reichenbach, H. (1938/1989). Trzy zadania epistemologii [Pol. transl. W. Sady: §1: The three tasks of epistemology. In H. Reichenbach, Experience and prediction (pp. 3–16). University of Chicago Press]. Studia Filozoficzne, 7–8, 205–212.

Rosenthal, R. (1966/2009). Experimenter effects in behavioral research (originally published by Appleton-Century-Crofts). In Artifacts in behavioral research: Robert Rosenthal and Ralph L. Rosnow's classic books (pp. 287–666). Oxford University Press.

Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge University Press.

Rosenzweig, S. (1933). The experimental situation as a psychological problem. Psychological Review, 40, 337–354.

Saad, D. (2021). Nowe narzędzia i techniki zwiększające trafność badań internetowych [Increasing validity of online research by implementing new tools and techniques]. com.press, 4(1), 106–121. https://doi.org/10.51480/compress.2021.4-1.248

Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). The Guilford Press.

Schwarzer, G. (2022). meta: General package for meta-analysis (Version 6.0-0) [R package]. https://cran.r-project.org/web/packages/meta/meta.pdf

Skipper, J. K., Jr., Guenther, A. L., & Nass, G. (1967/1970). The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. In R. E. Henkel & D. E. Morrison (Eds.), The significance test controversy: A reader (pp. 155–160). Butterworths.

Sosnowski, T., & Jarmakowska-Kostrzanowska, L. (2020). Do czego potrzebna jest moc statystyczna? [What is statistical power needed for?]. In M. Trojan & M. Gut (Eds.), Nowe technologie i metody w psychologii [New technologies and methods in psychology] (pp. 449–470). Liberi Libri. https://doi.org/10.47943/lib.9788363487430.rozdzial21

Trusz, S. (Ed.). (2013). Efekty oczekiwań interpersonalnych. Wybór tekstów [Interpersonal expectation effects. A reader]. Wydawnictwo Naukowe Scholar.

Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Webb, M. A., & Tangney, J. P. (2022). Too good to be true: Bots and bad data from Mechanical Turk. Perspectives on Psychological Science, 1–4. https://doi.org/10.1177/17456916221120027

Wilkinson, L., & Task Force on Statistical Inference, American Psychological Association, Science Directorate. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. https://doi.org/10.1037/0003-066X.54.8.594

Winer, B. J. (1962/1971). Statistical principles in experimental design. McGraw-Hill.

Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). McGraw-Hill.

Wolski, P. (2016a). Istotność statystyczna I. Nieodrobiona lekcja [Statistical significance I. A lesson not learned]. Rocznik Kognitywistyczny, 9, 27–35. https://doi.org/10.4467/20843895RK.16.003.5471

Wolski, P. (2016b). Istotność statystyczna II. Pułapki interpretacyjne [Statistical significance II. Interpretive pitfalls]. Rocznik Kognitywistyczny, 9, 59–70. https://doi.org/10.4467/20843895RK.16.006.6412

Wolski, P. (2016c). Istotność statystyczna III. Od rytuału do myślenia statystycznego [Statistical significance III. From ritual to statistical thinking]. Rocznik Kognitywistyczny, 9, 71–85. https://doi.org/10.4467/20843895RK.16.007.6413



Published
2023-10-26

How to cite

Brzeziński, J. M. (2023). A credibility crisis in psychology? The Review of Psychology, 66(1), 145–164. https://doi.org/10.31648/przegldpsychologiczny.9680
