Title: ECCCos from the Black Box: Faithful Explanations through Energy-Constrained Conformal Counterfactuals
Keywords: Explainable AI, Counterfactual Explanations, Algorithmic Recourse, Energy-Based Models, Conformal Prediction
Abstract: (see paper)
Corresponding Author: p.altmeyer@tudelft.nl
Reviewer Nomination: Arie.vanDeursen@tudelft.nl
Primary Area: Interpretability and Explainability
Claims: Yes
Code of Ethics: Yes
Broader Impacts: A narrow focus on generating plausible counterfactuals may lead practitioners and researchers to believe that even a highly vulnerable black-box model has learned plausible representations of the data. Our work aims to mitigate this risk.
Limitations: Yes
Theory: While we do not include any theoretical results in the form of formal proofs, we approach the topic of Counterfactual Explanations from a new theoretical angle in this work. Where necessary, we have clearly stated our assumptions.
Experiments: Yes
Training Details: Yes
Error Bars: Yes
Compute: All of our experiments can be run locally on a personal machine. We provide details regarding training times and compute in the supplementary material.
Reproducibility: Yes
Safeguards: n/a
Licenses: Yes
Assets: Yes
Human Subjects: n/a
IRB Approvals: n/a
TLDR: We leverage ideas from energy-based modelling and conformal prediction to generate faithful Counterfactual Explanations that can distinguish trustworthy from unreliable models.