Statistical power of a test – an analysis of a test’s power, its role in the research methodology and the interpretation of (non-)significance in a low- (high-) powered test
Lilianna Jarmakowska-Kostrzanowska
Nicolaus Copernicus University in Toruń
https://orcid.org/0000-0003-3644-006X
Abstract
Aim
This study has two main aims: to present the statistical power of a test, and to discuss the main problems in analysing a test's power with a new-old tool. The tool is new because it is a recent addition to the researcher's standard toolbox, but it is old because it has long been recognized in statistics. The technical aspects of power analysis in relation to the p-value are also discussed.
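As a rough illustration of what a power analysis computes, the Python sketch below approximates the power of a two-sided, two-sample test of equal means from Cohen's d, the per-group sample size and α, and inverts the same approximation to obtain a required sample size. It is a sketch under stated assumptions, not the author's procedure: it assumes normally distributed data with equal variances and replaces the t distribution with the standard normal, so it slightly underestimates the sample size that an exact noncentral-t calculation gives. The names `approx_power` and `required_n_per_group` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample test of equal means.

    ASSUMPTION: normal (z) approximation to the t-test. Under H1 the test
    statistic is shifted by d * sqrt(n / 2); power is the probability that
    it falls in either rejection region.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def required_n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Per-group n from n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2, rounded up.

    ASSUMPTION: same normal approximation as approx_power; the negligible
    opposite-tail term is ignored, so the returned n reaches at least the
    target power under that approximation.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    for n in (20, 63, 100):
        print(f"n = {n:3d} per group, d = 0.5: power ~ {approx_power(0.5, n):.3f}")
    print("n per group for d = 0.5, power .80:", required_n_per_group(0.5))
```

For d = 0.5, α = .05 and a target power of .80, `required_n_per_group(0.5)` returns 63 per group, whereas exact t-test calculations give about 64. Note also that `approx_power(0.0, n)` equals α for any n: when there is no true effect, the "power" of the test is simply its Type I error rate.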
Hypotheses
Power analysis and statistical significance are concepts that originate from two different approaches to null hypothesis significance testing (NHST): power belongs to the Neyman-Pearson approach, and the p-value to Fisher's. The lack of consistency between these approaches within the NHST paradigm creates problems in interpreting test results.
Conclusions
The required sample size can be determined in a power analysis, but the results of a power analysis are not easy to interpret. There are no clear rules for interpreting a statistically non-significant result in a high-powered test or a significant result in a low-powered test. A test's power does not confirm a statistically significant result, nor does it prove the null hypothesis when the result is not statistically significant.
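The point that a non-significant result in a high-powered test does not establish the null hypothesis can be illustrated with a small Monte Carlo simulation. The Python sketch below is a hypothetical scenario, not data from the article: it assumes normally distributed scores with SD 1, a true standardized effect of d = 0.5 and 63 participants per group (the normal-approximation sample size for power of about .80 at α = .05), and it uses a two-sample z-test as a stand-in for the usual t-test. The helper names `two_sample_z_pvalue` and `rejection_rate` are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

STD_NORMAL = NormalDist()  # standard normal distribution


def two_sample_z_pvalue(x, y):
    """Two-sided p-value of a two-sample z-test.

    ASSUMPTION: the standard normal replaces the t distribution, a close
    approximation for the moderate sample sizes used here.
    """
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    z = (fmean(x) - fmean(y)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def rejection_rate(true_d, n_per_group, alpha=0.05, reps=2000, seed=1):
    """Share of simulated studies that reach p < alpha.

    Each hypothetical study draws two normal groups (SD 1) whose means
    differ by true_d, then tests the difference.
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(reps):
        treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if two_sample_z_pvalue(treated, control) < alpha:
            significant += 1
    return significant / reps


if __name__ == "__main__":
    print("true effect, d = 0.5:", rejection_rate(0.5, 63))
    print("no effect (H0 true):", rejection_rate(0.0, 63))
```

Under these assumptions, `rejection_rate(0.5, 63)` should come out close to .80, so roughly one simulated study in five is non-significant even though the effect is real by construction. With `true_d=0.0` the rate should stay close to α = .05, however carefully the sample size was planned.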
Keywords:
statistical significance, p-value, power analysis, power of a test
License
This work is available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.