Abstract: Humans and animals exhibit a range of interesting behaviors in dynamic environments, and it is unclear how the brain actively reformats dense sensory information to enable these behaviors. Experimental neuroscience is undergoing a revolution in its ability to record and manipulate hundreds to thousands of neurons while an animal performs a complex behavior. As these paradigms enable unprecedented access to the brain, a natural question is how to distill the resulting data into interpretable insights about how neural circuits give rise to intelligent behaviors. The classical approach in systems neuroscience has been to ascribe well-defined operations to individual neurons and to describe how these operations combine into a circuit-level theory of neural computation. While this approach has had some success for small-scale recordings with simple stimuli designed to probe a particular circuit computation, it often leads to disparate descriptions of the same system across stimuli. Perhaps more strikingly, the response profiles of many neurons are difficult to describe succinctly in words, suggesting that new approaches are needed. In this thesis, we offer a different definition of interpretability that we show holds promise for yielding unified structural and functional models of neural circuits and for describing the evolutionary constraints that give rise to the response properties of the neural population, including those that have previously been difficult to describe individually. We demonstrate the utility of this framework across multiple brain areas and species to study the roles of recurrent processing in the primate ventral visual pathway; mouse visual processing; heterogeneity in rodent medial entorhinal cortex; and the facilitation of biological learning.

Under the Hood of Neural Networks: Characterizing Learned Representations by Functional Neuron Populations and Network Ablations

Which Neural Network Makes More Explainable Decisions? An Approach Towards Measuring Explainability

Understanding Neural Networks through Representation Erasure

How Do You Act? An Empirical Study to Understand Behavior of Deep Reinforcement Learning Agents

Ablation Studies in Artificial Neural Networks

Neural Networks Decoded: Targeted and Robust Analysis of Neural Network Decisions via Causal Explanations and Reasoning

Neural network interpretability with layer-wise relevance propagation: novel techniques for neuron selection and visualization

Unveiling the intrinsic dynamics of biological and artificial neural networks: from criticality to optimal representations

Functional Network: A Novel Framework for Interpretability of Deep Neural Networks

Investigating Neuron Ablation in Attention Heads: The Case for Peak Activation Centering

Saliency Suppressed, Semantics Surfaced: Visual Transformations in Neural Networks and the Brain

Neural Networks from Biological to Artificial and Vice Versa

On Functional Activations in Deep Neural Networks

Interpret Neural Networks by Extracting Critical Subnetworks

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience

A Goal-Driven Approach to Systems Neuroscience

Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

Intriguing properties of neural networks

Library network, a possible path to explainable neural networks