How Scientists Massage Results With ‘P-Hacking’
Scientific pursuits are designed to find meaning in a maze of data. At least, that’s how it’s supposed to work.
According to some accounts, that façade began to crumble in 2010, when Cornell University social psychologist Daryl Bem published a decade’s worth of experiments in the prominent Journal of Personality and Social Psychology. Using widely accepted statistical methods, the paper claimed to show that extrasensory perception (ESP), essentially a ‘sixth sense’, is an observable phenomenon. Bem’s colleagues were unable to reproduce his results, and they were quick to blame the paper on what we now call “p-hacking”: massaging and over-analyzing data in search of statistically significant, publishable results.
To support or refute a hypothesis, the goal is to establish statistical significance by recording a “p-value” of less than 0.05, explains Benjamin Baer, a postdoctoral researcher and statistician at the University of Rochester who studies this problem. The ‘p’ in p-value stands for probability, and it measures how likely it is that the observed outcome could have arisen by chance if the null hypothesis were true.
For example, if you wanted to test whether all roses are red, you could count the red roses and the roses of other colors in your sample and run a hypothesis test to compare those counts. If the test yields a p-value of less than 0.05, you would have statistically significant grounds for claiming that only red roses are present.
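To make the rose example concrete, here is a minimal sketch using SciPy’s binomial test. The sample counts and the null hypothesis (that red and non-red roses are equally common) are our own framing of the example, not something from the studies discussed here.

```python
# Hypothetical rose sample: are red roses significantly more common?
# Null hypothesis: any given rose is red with probability 0.5.
from scipy.stats import binomtest

red, total = 47, 50  # made-up counts for illustration
result = binomtest(red, total, p=0.5, alternative="greater")
print(f"p-value: {result.pvalue:.2e}")  # far below 0.05 -> "statistically significant"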
Misusing p-values to support the idea that ESP exists may be relatively harmless, but the practice can have far more lethal consequences when it creeps into medical trials, says Baer. “I think the big risk is that the wrong decisions can be made, based on what it should be,” he explains.
Baer was the first author of a paper published in the journal PNAS at the end of 2021 that, together with his former Cornell University mentor and professor of statistics Martin Wells, examined how a newer statistic could improve on the use of p-values. The metric they examined is called the fragility index, and it is designed to complement and improve p-values.
The measure captures how fragile a data set is when some data points are flipped from positive results to negative ones. If changing just a few data points is enough to demote a result from statistically significant to not significant, the result is considered fragile.
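As a concrete illustration, here is a minimal sketch of that flip-counting idea for a standard 2×2 trial table, using Fisher’s exact test to judge significance. The trial counts below are hypothetical, and this is a simplified version of the procedure, not the exact code from any of the papers discussed.

```python
# A minimal sketch of the fragility index for a 2x2 trial table.
# Outcomes in the group with fewer events are flipped from "no event"
# to "event" one at a time until statistical significance is lost.
from scipy.stats import fisher_exact

def fragility_index(events_a, total_a, events_b, total_b, alpha=0.05):
    flips = 0
    while True:
        table = [[events_a, total_a - events_a],
                 [events_b, total_b - events_b]]
        p_value = fisher_exact(table)[1]
        if p_value >= alpha:
            return flips  # this many flipped outcomes erased significance
        # flip one patient's outcome in the group with fewer events
        if events_a <= events_b:
            events_a += 1
        else:
            events_b += 1
        flips += 1

# Hypothetical trial: 5/100 events on treatment vs. 18/100 on control.
print(fragility_index(5, 100, 18, 100))  # a small count means a fragile result
```

Run on these made-up numbers, only a handful of flipped outcomes are enough to push the p-value back above 0.05, which is exactly the kind of weakness the index is meant to expose.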
Physician Michael Walsh first proposed the fragility index in a 2014 paper in the Journal of Clinical Epidemiology. In it, he and his colleagues applied the fragility index to just under 400 randomized controlled trials with statistically significant results and discovered that one in four had a low fragility score.
However, the fragility index has yet to gain much momentum in medical trials. Some, like Rickey Carter of the Mayo Clinic, have criticized the approach as being too similar to the p-value and not enough of an improvement. “Ironically, the fragility index was a p-hacking approach,” says Carter.
To improve the fragility index and answer these criticisms, Baer, Wells, and their colleagues focused on two main modifications: making only sufficiently likely changes to the data, and generalizing the approach to work beyond 2×2 tables (which represent positive or negative outcomes for the control and experimental groups).
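To give a rough sense of the first modification, here is one way to restrict flips to “sufficiently likely” ones: refuse to count a flip whose outcome would be too improbable under the group’s observed event rate. This is an illustration under our own assumptions (the rate model and the threshold are ours), not the construction from Baer and Wells’s paper.

```python
# Hedged sketch: only count flips that are "sufficiently likely."
# This is an illustration, not the method from the PNAS paper.
from scipy.stats import fisher_exact

def restricted_fragility_index(events_a, total_a, events_b, total_b,
                               alpha=0.05, min_rate=0.05):
    flips = 0
    while True:
        table = [[events_a, total_a - events_a],
                 [events_b, total_b - events_b]]
        if fisher_exact(table)[1] >= alpha:
            return flips
        # a flip turns a non-event into an event in the low-event group;
        # demand that an event in that group is plausible under its rate
        low_is_a = events_a <= events_b
        rate = events_a / total_a if low_is_a else events_b / total_b
        if rate < min_rate:
            return None  # no sufficiently likely modification remains
        if low_is_a:
            events_a += 1
        else:
            events_b += 1
        flips += 1
```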
Despite the fragility index’s uphill battle so far, Baer believes it is a useful metric for medical statisticians, and he hopes the improvements made in the recent study will help convince others to adopt it.
“Talking to a victim’s family after a failed operation is a very different [experience] than being a statistician sitting at a desk doing math,” says Baer.