Data dredging is the practice of running many statistical tests on a data set and cherry-picking the results that show a promising or attractive finding. This produces a spurious excess of false-positive, statistically significant results. It typically happens when a data set is examined repeatedly with many statistical tests and only the results that reach statistical significance are reported or given attention.
If you run many repeated statistical tests (multiple comparisons) on a data set, some will be statistically significant by chance alone. Such a result may not reflect a true relationship; it is spurious, and any correlation found is due to chance.
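To make the "significant by chance" point concrete, here is a minimal sketch of the arithmetic. It assumes the tests are independent and that no true effect exists in any of them (both simplifications), and the function names are hypothetical, chosen only for illustration.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across num_tests tests.

    Assumes independent tests and that every null hypothesis is true.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    # Each test avoids a false positive with probability (1 - alpha).
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(alpha: float, num_tests: int) -> float:
    """Average number of false positives under the same assumptions."""
    return alpha * num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        rate = familywise_error_rate(0.05, m)
        print(f"{m:>3} tests: P(at least one false positive) = {rate:.3f}")
```

Under these assumptions, 20 tests at the usual 0.05 threshold give roughly a 64% chance of at least one spurious "significant" result, which is why reporting only the significant tests is misleading.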
Data dredging is also referred to as fishing, p-hacking, significance chasing or data snooping.
It is now common practice to register clinical trials and specify the primary endpoints and hypotheses in advance to avoid the bias of data dredging. Another safeguard against data dredging is the Bonferroni correction, which divides the significance threshold by the number of tests performed.
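As a rough illustration of the Bonferroni correction, the sketch below compares each p-value against the significance level divided by the number of tests. The function name and the example p-values are hypothetical and made up for illustration; they are not taken from any real study.

```python
from typing import List, Sequence


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag which p-values remain significant after Bonferroni correction."""
    num_tests = len(p_values)
    if num_tests == 0:
        return []
    # The per-test threshold shrinks as the number of tests grows.
    threshold = alpha / num_tests
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Made-up p-values from four hypothetical tests on one data set.
    p_values = [0.001, 0.02, 0.04, 0.30]
    uncorrected = [p <= 0.05 for p in p_values]
    corrected = bonferroni_significant(p_values, alpha=0.05)
    print("uncorrected:", uncorrected)
    print("bonferroni: ", corrected)
```

In this example three of the four tests pass the uncorrected 0.05 threshold, but only the first passes the corrected threshold of 0.05 / 4 = 0.0125. The correction is conservative, so it reduces false positives at the cost of some statistical power.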