Statistical power of a test – an analysis of a test’s power, its role in the research methodology and the interpretation of (non-)significance in a low- (high-) powered test

Lilianna Jarmakowska-Kostrzanowska

Nicolaus Copernicus University in Toruń
https://orcid.org/0000-0003-3644-006X


Abstract

Aim
This study has two main aims: to present the statistical power of a test and to discuss the main problems that arise in power analysis, a tool that is at once new and old. It is new because it is a recent addition to the researcher’s standard toolbox, and old because it has long been recognized in statistics. The technical aspects of power analysis in relation to the p-value are also discussed.

Hypotheses
Power analysis and statistical significance are concepts that originate from two different approaches to null hypothesis significance testing (NHST). The inconsistency between these approaches within the NHST paradigm creates problems in the interpretation of test results.

Conclusions
The required sample size can be determined in a power analysis, but the results of such an analysis are not easy to interpret. There are no clear rules for interpreting a statistically non-significant result in a high-powered test or a significant result in a low-powered test. A test’s power does not confirm a statistically significant result, nor does it disprove the null hypothesis when the result is not statistically significant.
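As an illustration of the first conclusion, the sketch below shows how a required sample size might be derived in an a priori power analysis. It is not taken from the article: it assumes a two-sided comparison of two independent group means, a standardized effect size d (Cohen's d), and a normal approximation in place of the exact t distribution. The function names `n_per_group` and `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (assumption: normal approximation, equal groups)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power for the same design with n observations per group.
    The negligible rejection probability in the opposite tail is ignored."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(d * math.sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    # A medium effect (d = 0.5) at alpha = 0.05 and power = 0.80
    print(n_per_group(0.5))                 # 63 per group under this approximation
    print(round(approx_power(0.5, 63), 3))  # slightly above 0.80
```

An exact calculation based on the noncentral t distribution typically gives a slightly larger sample size, so the normal approximation is best treated as a lower bound. The result also depends entirely on the effect size assumed in advance, which is one reason why the outcome of a power analysis is hard to interpret.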


Keywords:

statistical significance, p-value, power analysis, power of a test



Published
2021-12-30


Jarmakowska-Kostrzanowska, L. (2021). Statistical power of a test – an analysis of a test’s power, its role in the research methodology and the interpretation of (non-)significance in a low- (high-) powered test. Przegląd Psychologiczny, 64(4), 177–193. https://doi.org/10.31648/przegldpsychologiczny.7889
