Understanding adversarial examples requires a theory of artefacts for deep learning

Cameron Buckner
DOI: https://doi.org/10.1038/s42256-020-00266-y
IF: 23.8
2020-11-23
Nature Machine Intelligence
Abstract: Deep neural networks are currently the most widespread and successful technology in artificial intelligence. However, these systems exhibit bewildering new vulnerabilities: most notably a susceptibility to adversarial examples. Here, I review recent empirical research on adversarial examples that suggests that deep neural networks may be detecting in them features that are predictively useful, though inscrutable to humans. To understand the implications of this research, we should contend with some older philosophical puzzles about scientific reasoning, helping us to determine whether these features are reliable targets of scientific investigation or just the distinctive processing artefacts of deep neural networks.
computer science, artificial intelligence, interdisciplinary applications