Building an evidence base for data analysis.
A key issue whenever we see the results of a business or scientific report is whether we can trust the data analysis behind them. As more people are trained in data analysis, we need evidence about which tools and procedures lead to analyses that are replicable and reproducible.
Replicability of a study is the likelihood that an independent study aimed at the same question will give a result that is consistent with the original study.
Reproducibility is the capacity to re-analyze given data to obtain a consistent result.
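In code terms, reproducibility means that re-running the same analysis on the same data yields the same result. A minimal sketch in Python (hypothetical, not from any study cited here) is an analysis function with no hidden state, so its randomness is controlled by an explicit seed:

```python
import random
import statistics

def analyze(data, seed=42):
    """Bootstrap estimate of the mean: deterministic given the same
    data and seed, so re-running the analysis reproduces the result."""
    rng = random.Random(seed)  # local, seeded RNG: no hidden global state
    n = len(data)
    means = [statistics.mean(rng.choices(data, k=n)) for _ in range(1000)]
    return statistics.mean(means)

data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]
assert analyze(data) == analyze(data)  # reproducible: same inputs, same output
```

Replicability is the stronger property: an *independent* study, with newly collected data, should still reach a consistent conclusion.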
Roger Peng, of Johns Hopkins University, has written forcefully on the urgent need to address the reproducibility crisis in science (see Peng, 2015, in the references below).
The same crisis exists in business, in health, and in every other area that depends on data analysis; the underlying issue is whether we can trust any given analysis.
Evidence-based data analysis is an empirical approach: identify what demonstrably increases replicability and reproducibility, then recommend, and expect, that those methods, techniques, and tools be used in data analysis.
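One way to build such evidence is by simulation. The toy sketch below (entirely hypothetical; the procedures, effect size, and decision rule are invented for illustration) compares two analysis procedures, small samples versus large samples, by how often a reported finding survives an independent replication:

```python
import random
import statistics

def finds_effect(n, effect, rng):
    """One simulated study: sample n observations from N(effect, 1) and
    declare a finding if the sample mean exceeds ~2 standard errors."""
    xs = [rng.gauss(effect, 1.0) for _ in range(n)]
    se = statistics.stdev(xs) / n ** 0.5
    return statistics.mean(xs) > 2 * se

def replication_rate(n, effect=0.3, trials=2000, seed=1):
    """Among original studies that report a finding, how often does an
    independent replication with fresh data also find it?"""
    rng = random.Random(seed)
    originals = replicated = 0
    for _ in range(trials):
        if finds_effect(n, effect, rng):                 # original study
            originals += 1
            replicated += finds_effect(n, effect, rng)   # replication
    return replicated / originals
```

Comparing `replication_rate(20)` with `replication_rate(100)` makes the evidence concrete: under these invented settings, the larger-sample procedure produces findings that replicate far more often, which is the kind of empirical result an evidence-based recommendation could rest on.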
References

- Dawes, M., Summerskill, W., Glasziou, P., Cartabellotta, A., Martin, J., Hopayian, K., … & Osborne, J. (2005). Sicily statement on evidence-based practice. BMC Medical Education, 5(1), 1.
- Eddy, D. M. (2005). Evidence-based medicine: A unified approach. Health Affairs, 24(1), 9-17.
- Fisher, A., Anderson, G. B., Peng, R., & Leek, J. (2014). A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn. PeerJ, 2, e589.
- Goodman, S., & Greenland, S. (2007). Assessing the unreliability of the medical literature: A response to “Why most published research findings are false”. Johns Hopkins University, Dept. of Biostatistics Working Papers, Working Paper 135.
- Guyatt, G. H., Haynes, B., Jaeschke, R., et al. (2002). The philosophy of evidence-based medicine. In Guyatt, G., & Rennie, D. (Eds.), Users’ Guides to the Medical Literature. AMA Press, Chicago, IL.
- Ioannidis, J. P. (2005). Why most published research findings are false. Chance, 18(4), 40-47.
- Kupczynski, M. (2015). Significance tests and sample homogeneity loophole. arXiv preprint arXiv:1505.06349.
- Leek, J. T., & Peng, R. D. (2015). Opinion: Reproducible research can still be wrong: Adopting a prevention approach. Proceedings of the National Academy of Sciences, 112(6), 1645-1646.
- Leek, J. T., & Peng, R. D. (2015). Statistics: P values are just the tip of the iceberg. Nature, 520(7549), 612.
- Moonesinghe, R., Khoury, M. J., & Janssens, A. C. J. W. (2007). Most published research findings are false—But a little replication goes a long way. PLoS Medicine, 4(2), e28.
- Peng, R. D. (2009). Reproducible research and Biostatistics. Biostatistics, 10(3), 405-408.
- Peng, R. (2015). The reproducibility crisis in science: A statistical counterattack. Significance, 12(3), 30-32.
- Rödiger, S., Burdukiewicz, M., Blagodatskikh, K., Jahn, M., & Schierack, P. (2015). R as an environment for reproducible analysis of DNA amplification experiments. The R Journal, 7(1), 127-150.
- Timmermans, S., & Mauck, A. (2005). The promises and pitfalls of evidence-based medicine. Health Affairs, 24(1), 18-28.