Domain Level Interpretability: Interpreting Black-box Model with Domain-specific Embedding.

Ya-Lin Zhang, Caizhi Tang, Lu Yu, Jun Zhou, Longfei Li, Qing Cui, Fangfang Fan, Linbo Jiang, Xiaosong Zhao
DOI: https://doi.org/10.1145/3616855.3635688
2024-01-01
Abstract: The importance of incorporating interpretability into machine learning models has been increasingly emphasized. Previous literature has typically focused on feature-level interpretability, such as analyzing which features are important and how they influence the final decision. Real-world applications, however, often require domain-level interpretability, which concerns a group of related features rather than a single one. Domain-level interpretability has the potential to be more informative and easier to comprehend, yet research in this direction has been limited. In this paper, we address this gap and introduce DIDE, a method that obtains domain-level interpretability from domain-specific latent embeddings. To make the framework more effective, we draw inspiration from gradient smoothing and propose noise injection in the embedding space, which yields smoothed interpretability. We conduct extensive experiments to validate the effectiveness of DIDE and demonstrate its application in assisting daily business tasks at Alipay.