The use of p-values is controversial?

Jim Bowie
Aug 9, 2022

Updated: Mar 27

Basic and Applied Social Psychology (BASP) has banned the use of p-values and confidence intervals in the articles it publishes. The publication “will require strong descriptive statistics, including effect sizes, encourage the presentation of frequency or distributional data when this is feasible, and encourage the use of larger sample sizes than is typical in much psychology research, because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem.”[1]
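The claim that descriptive statistics stabilize as the sample grows can be checked with a small simulation. The sketch below is only an illustration, not part of the BASP editorial: it assumes standard normal data, and the function name `spread_of_sample_means` and its parameters are hypothetical. It draws many samples of a given size and measures how much the sample mean varies from one sample to the next.

```python
import random
import statistics


def spread_of_sample_means(n, repeats=200, seed=0):
    """Return the standard deviation of `repeats` sample means, each from n draws.

    Assumption: data come from a standard normal distribution (mean 0, sd 1).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


# Larger samples should give sample means that vary less between repetitions.
for n in (10, 100, 1000):
    print(f"n={n:>4}  spread of sample means: {spread_of_sample_means(n):.3f}")
```

In theory the spread shrinks in proportion to 1/√n, so a hundredfold increase in sample size cuts the sampling error of the mean by about a factor of ten.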

The American Statistical Association (ASA), rather than abandoning or banning p-values, attempts to clarify their use and address their misuse through six principles, in order “to improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research” and “steer research into a ‘post p<0.05 era.’”[2]

This is a progressive concept. The p-value, judged against an α typically set at 0.05 (a threshold that was itself subjective from the beginning), is a general, isolated target that has been applied universally across research and practical applications, especially in business, with limited context to support its meaning or value, including the size or importance of the effect. This historically accepted practice decouples the final value from the effort and computations required to produce it, from the detailed and careful description of the problem through the statistical applications, and it dismisses the chain of results that feed the final product (the p-value).

Often, these values are reported several pages after the hypothesis is stated, and the disconnect drives a lack of comprehensive understanding. This flies in the face of a primary aspect of classic education in any scientific or mathematical field: “show your work.” This basic principle creates a history of context: what the raw data are and what they show (reproducibility), where the data came from, how they were collected and analyzed, which results were discarded and which were applied, and so on.

Data context is likely the most often abandoned piece of analyses today. Point-and-click “sticky” software allows us to enter the data and yield a result, with the latter being the prize. This is efficient and expedient but might not be effective. Unfortunately, without context, exploring the root cause or even the validity of results reported as a single number, such as the p-value, becomes a “voodoo” ceremony rather than research.

The BASP ban could “have the effect of increasing the quality of submitted manuscripts by liberating authors from the stultified structure of NHSTP thinking thereby eliminating an important obstacle to creative thinking.”[1] (NHSTP is the editorial's abbreviation for the null hypothesis significance testing procedure.) I agree. There is no standard process for research, because problems rarely standardize (standardization here refers not to repetition but to similarity in initial presentation). Removing the standard format framework from the chaos the world can present, while retaining the scientific method, will force analysts to think through their approaches and results and to present findings with context, ultimately reinforcing their findings and any associated recommendations.

Using a p-value, or comparing a critical value with a test statistic, provides a binary response, and the complexity of organizations, large and small, rarely affords the isolated use of such a response. However, banning p-values and confidence intervals might not be the optimal approach. An alternative would be to require context and additional statistical analyses and rigor alongside the p-value. Over time this could help to validate or invalidate its use, demonstrate levels of agreement, and refine its application and/or calculation.
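One way to picture reporting the p-value together with its context is a summary that keeps the descriptive statistics and an effect size next to it. The following is a minimal sketch under stated assumptions, not a prescribed method: the function name `two_group_report` is hypothetical, the effect size is Cohen's d with a pooled standard deviation, and the p-value comes from a large-sample normal approximation rather than a t distribution.

```python
import math
import statistics


def two_group_report(a, b):
    """Summarize two groups with descriptive statistics, an effect size, and a p-value."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    n_a, n_b = len(a), len(b)
    diff = mean_b - mean_a

    # Cohen's d: the mean difference in units of the pooled standard deviation.
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    cohens_d = diff / pooled_sd

    # Two-sided p-value from a large-sample normal approximation (an assumption;
    # small samples would call for a t distribution instead).
    z = diff / math.sqrt(var_a / n_a + var_b / n_b)
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))

    return {
        "n": (n_a, n_b),
        "means": (mean_a, mean_b),
        "sds": (math.sqrt(var_a), math.sqrt(var_b)),
        "mean_difference": diff,
        "cohens_d": cohens_d,
        "p_value": p_value,
    }


# Illustrative data only: two large groups whose means differ by a small amount.
group_a = [(i % 100) / 10 for i in range(10000)]
group_b = [x + 0.1 for x in group_a]
for key, value in two_group_report(group_a, group_b).items():
    print(f"{key}: {value}")
```

With these made-up groups the p-value falls below 0.05 even though the standardized effect is tiny, which is the kind of disagreement between "significant" and "important" that a binary response hides.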

USEFUL NOTES

“Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”[2] “A p-value provides one approach to summarizing the incompatibility between a particular set of data and a proposed model [this is the null hypothesis] for the data. The smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis…”[2]
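The informal definition above can be made concrete with a permutation test on the sample mean difference between two groups. This is a sketch under stated assumptions, not the ASA's procedure: the "specified statistical model" is taken to be that group labels are exchangeable (no difference between groups), and the function name `permutation_p_value` and the example data are hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Estimate a two-sided p-value for the difference in means of a and b.

    Assumed null model: group labels are exchangeable, so any reshuffling of
    the pooled data into two groups is as likely as the observed split.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        # Count reshuffles whose mean difference equals or exceeds the observed one.
        if abs(statistics.fmean(perm_a) - statistics.fmean(perm_b)) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Illustrative data only.
control = [12, 15, 11, 14, 13, 16, 12, 15]
treated = [14, 18, 15, 17, 16, 19, 15, 18]
print(f"estimated p-value: {permutation_p_value(control, treated):.4f}")
```

The returned number is the share of reshuffled data sets that look at least as extreme as the real one, which is why a small value signals incompatibility with the null model rather than the size or importance of the effect.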

[1] David Trafimow & Michael Marks (2015) Editorial, Basic and Applied Social Psychology, 37:1, 1-2, DOI: 10.1080/01973533.2015.1012991
[2] American Statistical Association Releases Statement on Statistical Significance and P-Values (http://amstat.tandfonline.com/doi/full/10.1080/00031305.2016.1154108?scroll=top&needAccess=true)
