Abstract
Understanding the function learned by a neural network is crucial in many domains, e.g., to detect a model’s adaptation to concept drift in online learning. Existing global surrogate model approaches generate explanations by maximizing the fidelity between the neural network and a surrogate model on a sample basis, which can be very time-consuming. Therefore, these approaches are not applicable in scenarios where timely or frequent explanations are required. In this paper, we introduce a real-time approach for generating a symbolic representation of the function learned by a neural network. Our idea is to generate explanations via another neural network (called the Interpretation Network, or I-Net), which maps network parameters to a symbolic representation of the network function. We show that the training of an I-Net for a family of functions can be performed upfront and that subsequent generation of an explanation only requires querying the I-Net once, which is computationally very efficient and does not require training data. We empirically evaluate our approach for the case of low-order polynomials as explanations, and show that it achieves competitive results for various data and function complexities. To the best of our knowledge, this is the first approach that attempts to learn a mapping from neural networks to symbolic representations.
1. Introduction
The ability of artificial neural networks to act as general function approximators has led to impressive results in many application areas. However, the price for this universal applicability is the limited interpretability of the trained model. Overcoming this limitation is a subject of active research in the machine learning community [1]. Popular approaches for explaining the results of neural networks such as LIME [2], SHAP [3], or LRP [4] focus on the impact of different attributes on the predictions of the model for certain examples. While this provides a partial explanation for individual examples, it does not shed light on the complete network function. Especially when dealing with streaming data, uncovering the network function is very important, e.g., for detecting the adjustment of a model to concept drift or for the identification of catastrophic forgetting.
While there are existing approaches for constructing a compact representation of the network function, such as symbolic metamodeling [5] and symbolic regression [6], they generate their explanations on a sample basis. Generating explanations by maximizing the fidelity to the neural network on a sample basis means that the optimization process for finding a suitable explanation must be performed independently for each model we want to interpret. Since this optimization process is usually very time-consuming, it precludes the application of these methods in scenarios where timely explanations are required. Furthermore, they require access to the training data, or at least knowledge of its distribution [7].
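To make the cost of sample-based explanation concrete, the following is a minimal sketch of fitting a polynomial surrogate to a single network by least squares. It is an illustration only, not the procedure of [5] or [6]: the stand-in network with random weights, the uniform sampling range, the degree-2 polynomial family, and all names (`network`, `poly_features`, `fit_surrogate`) are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a trained network: one hidden tanh layer
# with fixed random weights (an assumption, not a trained model).
W1 = rng.normal(size=(2, 8))
b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 1))
b2 = rng.normal(size=1)


def network(x):
    """Evaluate the stand-in network on inputs of shape (n, 2)."""
    return (np.tanh(x @ W1 + b1) @ W2 + b2).ravel()


def poly_features(x):
    """Monomials of a degree-2 polynomial in two variables."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1 * x2, x1**2, x2**2])


def fit_surrogate(model, n_samples=1000):
    """Fit polynomial coefficients to the model's outputs on sampled inputs.

    The inputs are drawn uniformly from [-1, 1]^2, which assumes knowledge
    of the data distribution. The fit must be repeated for every model.
    """
    x = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    coeffs, *_ = np.linalg.lstsq(poly_features(x), model(x), rcond=None)
    return coeffs


coeffs = fit_surrogate(network)

# Fidelity of the surrogate to the network on fresh samples (mean absolute error).
x_test = rng.uniform(-1.0, 1.0, size=(500, 2))
mae = np.mean(np.abs(poly_features(x_test) @ coeffs - network(x_test)))
print("coefficients:", coeffs)
print("mean absolute error:", mae)
```

Even in this simple setting, each new or updated network requires drawing fresh samples and solving a new optimization problem, and the sampling step presupposes a known input distribution.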