Appendix - Recommended Statistical Significance Tests for NLP Tasks

Rotem Dror, Roi Reichart
DOI: https://doi.org/10.48550/arXiv.1809.01448
2018-09-05
Abstract: Statistical significance testing plays an important role when drawing conclusions from experimental results in NLP papers. In particular, it is a valuable tool for establishing the superiority of one algorithm over another. This appendix complements the guide to statistical significance testing in NLP presented by Dror et al. (2018) by proposing valid statistical tests for the common tasks and evaluation measures in the field.
Computation and Language