A Credibility Crisis in Psychology?
Jerzy Marian Brzeziński
Adam Mickiewicz University, Poznań; Faculty of Psychology and Cognitive Science
https://orcid.org/0000-0003-1582-4013
Abstract
Interest in the overall result obtained by B. Nosek’s team increased significantly, and not only among psychologists, after an article presenting the results of a large-scale international replication of psychological empirical research was published in Science (cf. Open Science Collaboration, 2015). While 97% of the original studies had yielded statistically significant results (p < .05), only 36% of the results were significant in the replications. The author of the present article argues that this result laid the ground for unjustified generalizations about the methodological weaknesses of psychology as an empirical science. Psychology is an empirical science, but it also has its peculiarities, owing to the specificity of its subject matter and method (e.g., Orne, 1962, 1973; Rosenthal, 1966/2009; Rosenzweig, 1933). Equally importantly, psychology is not practiced in social or cultural isolation. Finally, psychological research is bound by rigorous ethical standards and constraints, and psychologists (like researchers in other fields) who publish the results of empirical research analyzed statistically are constrained by the editorial practices of scientific journals. Journals are interested only in papers that present statistically significant results (where “p < .05”!), which leads to the so-called file drawer effect (Rosenthal, 1979). As the author strongly emphasizes, the debate cannot be limited to the statistical significance of psychological research (and in particular to the power of statistical tests, which has become a popular topic in recent years). In this article, the author discusses, and presents his point of view on, the following problems: 1) the methodological specificity of psychology as an empirical science, 2) the triad of statistical significance (the problematic criterion of “p < .05”), effect size, and the power of a statistical test, 3) the socio-cultural context of psychological research, 4) researchers' failure to follow methodological and ethical guidelines, and 5) possible precautions and remedies.
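To make the triad of statistical significance, effect size, and power concrete, here is a minimal sketch in Python (not part of the original article; it assumes the numpy, scipy, and statsmodels packages and uses purely illustrative numbers). It simulates two groups with a small true difference, computes the p-value of an independent-samples t test, Cohen's d as the effect size, and the power of such a test, both for the simulated sample size and for the per-group sample size needed to reach 80% power.

```python
# Illustrative sketch only (not from the article): significance, effect size, and power.
# Assumes numpy, scipy, and statsmodels are installed; all numbers are made up.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

rng = np.random.default_rng(seed=1)

# Two simulated groups with a small true difference (population d = 0.2), n = 30 each.
control = rng.normal(loc=0.0, scale=1.0, size=30)
treatment = rng.normal(loc=0.2, scale=1.0, size=30)

# 1) Statistical significance: p-value of an independent-samples t test.
t_stat, p_value = stats.ttest_ind(treatment, control)

# 2) Effect size: Cohen's d based on the pooled standard deviation.
pooled_sd = np.sqrt((control.var(ddof=1) + treatment.var(ddof=1)) / 2)
cohens_d = (treatment.mean() - control.mean()) / pooled_sd

# 3) Power: chance of detecting d = 0.2 with n = 30 per group at alpha = .05,
#    and the per-group n needed to reach 80% power for the same effect.
analysis = TTestIndPower()
power_at_n30 = analysis.power(effect_size=0.2, nobs1=30, alpha=0.05)
n_for_80_power = analysis.solve_power(effect_size=0.2, power=0.80, alpha=0.05)

print(f"p = {p_value:.3f}, d = {cohens_d:.2f}")
print(f"power with n = 30 per group: {power_at_n30:.2f}")
print(f"n per group for 80% power:  {n_for_80_power:.0f}")
```

With a true effect of d = 0.2, a test with 30 participants per group has power of only about .12, whereas roughly 400 participants per group are needed to reach 80% power; chronically underpowered designs of this kind are one reason why nominally significant results so often fail to replicate.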
Keywords:
science, intersubjectivity, stability, rationality, credibility, replication, psychological research, statistics, statistical test, confidence interval, p < .05, power of statistical test, effect size, data fishing, p-hacking, HARKing, interpersonal expectation, demand characteristics, file drawer effect, pre-registration of research

References
Aguinis, H., Villamor, I., & Ramani, R. S. (2021). MTurk research: Review and recommendations. Journal of Management, 47(4), 823–837. https://doi.org/10.1177/0149206320969787
Ajdukiewicz, K. (1949/2003). Zagadnienia i kierunki filozofii. Teoria poznania. Metafizyka [Issues and directions of philosophy. Epistemology. Metaphysics]. Czytelnik.
Ajdukiewicz, K. (1957/2020). O wolności nauki [On freedom of science]. Nauka, 2, 7–24. https://doi.org/10.24425/nauka.2020.132629
Ajdukiewicz, K. (1958). Zagadnienie racjonalności zawodnych sposobów wnioskowania [The issue of the rationality of unreliable ways of reasoning]. Studia Filozoficzne, 4, 14–29.
American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.). Author.
American Psychological Association Presidential Task Force on Evidence-Based Practice. (2006). Evidence-based practice in psychology. American Psychologist, 61(4), 271–285. https://doi.org/10.1037/0003-066X.61.4.271
American Psychological Association Publications and Communications Board Working Group on Journal Article Reporting Standards. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63(9), 839–851. https://doi.org/10.1037/0003-066X.63.9.839
Blanck, P. D. (Ed.). (1993). Interpersonal expectations. Theory, research, and applications. Cambridge University Press.
Brzeziński, J. (2012). Badania eksperymentalne w psychologii i pedagogice (wyd. popr.) [Experimental research in psychology and education (Rev. ed.)]. Wydawnictwo Naukowe Scholar.
Brzeziński, J. (2016). Towards a comprehensive model of scientific research and professional practice in psychology. Current Issues in Personality Psychology, 4(1), 2–10. https://doi.org/10.5114/cipp.2016.58442
Brzeziński, J. M. (2019). Metodologia badań psychologicznych. Wydanie nowe [Methodology of psychological research. New edition]. Wydawnictwo Naukowe PWN.
Brzeziński, J. M. (2023). Pytania do psychologów prowadzących badania naukowe [Questions for psychologists conducting research]. In A. Jonkisz, J. Poznański SJ, & J. Koszteyn (Eds.), Zrozumieć nasze postrzeganie i pojmowanie człowieka i świata. Profesorowi Józefowi Bremerowi SJ z okazji 70-lecia urodzin [To understand our perception and comprehension of the human and the world. Papers dedicated to Professor Józef Bremer SJ on the occasion of his 70th birthday] (pp. 289–311). Wydawnictwo Naukowe Akademii Ignatianum.
Brzeziński, J. M., & Oleś, P. K. (2021). O psychologii i psychologach. Między uniwersytetem a praktyką społeczną [On psychology and psychologists. Between university and social practice]. Wydawnictwo Naukowe PWN.
Brzeziński, J., & Siuta, J. (Eds.). (1991). Społeczny kontekst badań psychologicznych i pedagogicznych. Wybór tekstów [The social context of psychological and pedagogical research. A reader]. Wydawnictwo Naukowe UAM.
Brzeziński, J., & Siuta, J. (Eds.). (2006). Metodologiczne i statystyczne problemy psychologii. Wybór tekstów [Methodological and statistical problems of psychology. A reader]. Wydawnictwo Naukowe UAM.
Brzeziński, J., & Stachowski, R. (1981/1984). Zastosowanie analizy wariancji w eksperymentalnych badaniach psychologicznych (2nd ed.) [Application of analysis of variance in experimental psychological research]. Państwowe Wydawnictwo Naukowe.
Buchanan, E., & Scofield, J. E. (2018). Methods to detect low quality data and its implication for psychological research. Behavior Research Methods, 50(3), 2586–2596. https://doi.org/10.3758/s13428-018-1035-6
Budzicz, Ł. (2015). Post-Stapelian psychology. Discussions on the reliability of data and publications in psychology. Annals of Psychology, 18(1), 25–40.
Budzicz, Ł. (2015). Post-Stapelian psychology. Discussions on the reliability of data and publications in psychology. Annals of Psychology, 18(1), 25–40.
Buhrmester, M. D., Talaifar, S., & Gosling, S. D. (2018). An evaluation of Amazon’s Mechanical Turk, its rapid rise, and its effective use. Perspectives on Psychological Science, 13(2), 149–154. https://doi.org/10.1177/1745691617706516
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). L. Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304–1312. https://doi.org/10.1037/0003-066X.45.12.1304
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. https://doi.org/10.1037/0003-066X.49.12.997
Edwards, A. L. (1950/1960/1968/1972). Experimental design in psychological research. Holt, Rinehart and Winston.
Fisher, R. A. (1925/1938). Statistical methods for research workers (7th ed.). Oliver & Boyd.
Fisher, R. A. (1935/1971). The design of experiments (8th ed.). Oliver & Boyd.
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research. A broad practical approach. The Psychology Press, Taylor and Francis Group.
Grissom, R. J., & Kim, J. J. (2011). Effect sizes for research. Univariate and multivariate applications (2nd ed.). Routledge, Taylor and Francis Group.
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? L. Erlbaum.
Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). Holt, Rinehart, and Winston. [1st ed. 1963: Statistics for psychologists; 5th ed. 1994: Statistics].
Henkel, R. E., & Morrison, D. E. (Eds.). (1970). The significance test controversy. A reader. Butterworths.
Keith, M. G., Tay, L., & Harms, P. D. (2017). Systems perspective of Amazon Mechanical Turk for organizational research: Review and recommendations. Frontiers in Psychology, 8, 1359. https://doi.org/10.3389/fpsyg.2017.01359
King, B. M., & Minium, E. W. (2003). Statistical reasoning in psychology and education (4th ed.). John Wiley & Sons.
Kirk, R. E. (1968/1982/1995). Experimental design: Procedures for the behavioral sciences. Brooks/Cole.
Kirk, R. E. (2012). Experimental design: Procedures for the behavioral sciences (4th ed.). Sage.
Labovitz, S. (1970). Criteria for selecting a significance level: A note on the sacredness of .05. In R. E. Henkel & D. E. Morrison (Eds.), The significance test controversy. A reader (pp. 166–171). Butterworths.
Larsen, R. J. (2005). Saul Rosenzweig (1907–2004). American Psychologist, 60(3), 259. https://doi.org/10.1037/0003-066X.60.3.259
Loftus, G. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171.
Loftus, G. R. (2002). Analysis, interpretation, and visual presentation of experimental data. In H. Pashler & J. Wixted (Eds.), Stevens' handbook of experimental psychology: Methodology in experimental psychology (pp. 339–390). John Wiley & Sons, Inc. https://doi.org/10.1002/0471214426.pas0409
Miller, A. G. (Ed.). (1972). The social psychology of psychological research. The Free Press.
Neuliep, J. W. (Ed.). (1991). Replication research in the social sciences. Sage.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17(11), 776–783. https://doi.org/10.1037/h0043424
Orne, M. T. (1973). Communication by the total experimental situation: Why it is important, how it is evaluated, and its significance for the ecological validity of findings. In P. Pliner, L. Krames, & T. Alloway (Eds.), Communication and affect: Language and thought (pp. 157–191). Academic Press. https://doi.org/10.1016/B978-0-12-558250-6.50014-6
Popper, K. (1974). The logic of scientific discovery. Hutchinson.
Reichenbach, H. (1938/1989). Trzy zadania epistemologii [Pol. transl. W. Sady: §1: The three tasks of epistemology. In H. Reichenbach, Experience and prediction (pp. 3–16). University of Chicago Press]. Studia Filozoficzne, 7-8, 205–212.
Rosenthal, R. (1966/2009). Experimenter effects in behavioral research. Appleton-Century-Crofts. In Artifacts in behavioral research: Robert Rosenthal and Ralph L. Rosnow's classic books (pp. 287–666). Oxford University Press.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge University Press.
Rosenzweig, S. (1933). The experimental situation as a psychological problem. Psychological Review, 40, 337–354.
Saad, D. (2021). Nowe narzędzia i techniki zwiększające trafność badań internetowych [Increasing validity of online research by implementing new tools and techniques]. com.press, 4(1), 106–121. https://doi.org/10.51480/compress.2021.4-1.248
Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). The Guilford Press.
Schwarzer, G. (2022). meta: General package for meta-analysis (Version 6.0-0) [R package]. https://cran.r-project.org/web/packages/meta/meta.pdf
Skipper, J. K., Jr., Guenther, A. L., & Nass, G. (1967/1970). The sacredness of .05: A note concerning the uses of statistical levels of significance in social science. In R. E. Henkel & D. E. Morrison (Eds.), The significance test controversy. A reader (pp. 155–160). Butterworths.
Sosnowski, T., & Jarmakowska-Kostrzanowska, L. (2020). Do czego potrzebna jest moc statystyczna? [What is statistical power needed for?]. In M. Trojan & M. Gut (Eds.), Nowe technologie i metody w psychologii [New technologies and methods in psychology] (pp. 449–470). Liberi Libri. https://doi.org/10.47943/lib.9788363487430.rozdzial21
Trusz, S. (Ed.). (2013). Efekty oczekiwań interpersonalnych. Wybór tekstów [Interpersonal expectation effects. A reader]. Wydawnictwo Naukowe Scholar.
Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
Webb, M. A., & Tangney, J. P. (2022). Too good to be true: Bots and bad data from Mechanical Turk. Perspectives on Psychological Science, 1–4. https://csl.mpg.de/427800/webb_tangney__too_good_to_be_true_2022.pdf; https://doi.org/10.1177/17456916221120027
Wilkinson, L., & the Task Force on Statistical Inference, American Psychological Association, Science Directorate. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. https://doi.org/10.1037/0003-066X.54.8.594
Winer, B. J. (1962/1971). Statistical principles in experimental design. McGraw-Hill.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). McGraw-Hill.
Wolski, P. (2016a). Istotność statystyczna I. Nieodrobiona lekcja [Statistical significance I. A lesson not learned]. Rocznik Kognitywistyczny, 9, 27–35. https://doi.org/10.4467/20843895RK.16.003.5471
Wolski, P. (2016b). Istotność statystyczna II. Pułapki interpretacyjne [Statistical significance II. Interpretive pitfalls]. Rocznik Kognitywistyczny, 9, 59–70. https://doi.org/10.4467/20843895RK.16.006.6412
Wolski, P. (2016c). Istotność statystyczna III. Od rytuału do myślenia statystycznego [Statistical significance III. From ritual to statistical thinking]. Rocznik Kognitywistyczny, 9, 71–85. https://doi.org/10.4467/20843895RK.16.007.6413
License
This work is licensed under a Creative Commons Attribution – NonCommercial – NoDerivatives 4.0 International License.