where $\text{yloss}$ denotes the primary loss function already introduced above and $\text{cost}$ is either a single penalty or a collection of penalties that are used to impose constraints through regularization. Following the convention in \citet{altmeyer2023endogenous}, we use $\mathbf{s}^\prime=\{ s_k\}_K$ to denote the $K$-dimensional array of counterfactual states. This is to explicitly account for the fact that we can generate multiple counterfactuals, as with DiCE \citep{mothilal2020explaining}, and may choose to traverse a latent representation $\mathcal{Z}$ of the feature space $\mathcal{X}$, as we will discuss further below.
Solutions to Equation~\ref{eq:general} are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In Figure~\ref{fig:adv}, for example, we apply the baseline approach proposed in \citet{wachter2017counterfactual} to MNIST data (centre panel). This approach solves Equation~\ref{eq:general} through gradient descent in the feature space with a penalty for the distance between the factual $x$ and the counterfactual $x^{\prime}$. The underlying classifier $M_{\theta}$ is a simple Multi-Layer Perceptron (MLP) with good test accuracy. For the generated counterfactual $x^{\prime}$, the model predicts the target label with high confidence (centre panel in Figure~\ref{fig:adv}). The explanation is valid by definition, even though it looks a lot like an Adversarial Example \citep{goodfellow2014explaining}. \citet{schut2021generating} make the connection between Adversarial Examples and Counterfactual Explanations explicit and propose using a Jacobian-Based Saliency Map Attack (JSMA) to solve Equation~\ref{eq:general}. They demonstrate that this approach yields realistic and sparse counterfactuals for Bayesian, adversarially robust classifiers. Applying their approach to our simple MNIST classifier does not yield a realistic counterfactual, but this one, too, is valid (right panel in Figure~\ref{fig:adv}).
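For illustration, the snippet below sketches such a penalized gradient search in Python (using PyTorch). It is a minimal sketch only: the classifier \texttt{clf}, the penalty weight \texttt{lam} and all other names are our own placeholder choices, not the implementation used for Figure~\ref{fig:adv}.
\begin{verbatim}
# Minimal sketch of a penalized gradient search for counterfactuals
# (in the spirit of Wachter et al.). All names are illustrative.
import torch
import torch.nn.functional as F

def gradient_counterfactual(clf, x_factual, target, lam=0.1, lr=0.05, steps=500):
    x_cf = x_factual.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = clf(x_cf.unsqueeze(0))
        yloss = F.cross_entropy(logits, torch.tensor([target]))  # fit to target
        cost = torch.norm(x_cf - x_factual, p=1)                 # stay close to factual
        (yloss + lam * cost).backward()
        opt.step()
        if logits.argmax(dim=1).item() == target:  # valid once the label flips
            break
    return x_cf.detach()
\end{verbatim}
Note that \citet{wachter2017counterfactual} weight the distance term by the median absolute deviation of each feature; the plain $\ell_1$ norm above is a simplification.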
The crucial difference between Adversarial Examples (AE) and Counterfactual Explanations (CE) is one of intent. While an AE is intended to go unnoticed, a CE should have certain desirable properties. The literature has made this explicit by introducing various so-called \textit{desiderata}. To properly serve both AI practitioners and individuals affected by AI decision-making systems, counterfactuals should be sparse, proximate~\citep{wachter2017counterfactual}, actionable~\citep{ustun2019actionable}, diverse~\citep{mothilal2020explaining}, plausible~\citep{joshi2019realistic,poyiadzi2020face,schut2021generating}, robust~\citep{upadhyay2021robust,pawelczyk2022probabilistically,altmeyer2023endogenous} and causal~\citep{karimi2021algorithmic}, among other things.
Now that we have a framework for evaluating Counterfactual Explanations in terms of their plausibility and conformity, we are interested in finding a way to generate counterfactuals that are as plausible and conformal as possible. We hypothesize that a narrow focus on plausibility may come at the cost of reduced conformity. Using a surrogate model for the generative task, for example, may improve plausibility but inadvertently yield counterfactuals that are more consistent with the surrogate than the Black Box Model itself. We suggest that one way to ensure model conformity is to rely strictly on the model itself. In this section, we introduce a novel framework that meets this requirement, works under minimal assumptions and does not impede the plausibility objective: Conformal Counterfactual Explanations.
\subsection{Plausible Counterfactuals through Minimal Uncertainty}
Our proposed methodology is built on the findings presented in~\citet{schut2021generating}. The authors demonstrate that it is not only possible but remarkably easy to generate plausible counterfactuals for Black Box Models that provide predictive uncertainty estimates. Their proposed algorithm solves Equation~\ref{eq:general} by greedily applying JSMA in the feature space with standard cross-entropy loss and no penalty at all. They show that this is equivalent to minimizing predictive uncertainty and hence yields counterfactuals for which the model $M_{\theta}$ predicts the target label $t$ with high confidence. Provided the model is well-calibrated, these counterfactuals are plausible, which the authors demonstrate empirically through benchmarks \citep{schut2021generating}.
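The following is a rough sketch of such a greedy search, again assuming a differentiable classifier \texttt{clf}; the step size \texttt{delta}, the stopping rule and all names are our own illustrative choices and do not reproduce the authors' exact implementation.
\begin{verbatim}
# Rough sketch of a greedy, JSMA-style search: in each step only the single
# most salient feature (largest loss gradient) is perturbed by a fixed amount.
import torch
import torch.nn.functional as F

def greedy_counterfactual(clf, x_factual, target, delta=0.1, max_steps=200):
    x_cf = x_factual.clone()
    for _ in range(max_steps):
        x_cf.requires_grad_(True)
        loss = F.cross_entropy(clf(x_cf.unsqueeze(0)), torch.tensor([target]))
        grad, = torch.autograd.grad(loss, x_cf)
        x_cf = x_cf.detach()
        i = grad.abs().argmax()            # most salient feature
        x_cf[i] -= delta * grad[i].sign()  # move it against the loss gradient
        if clf(x_cf.unsqueeze(0)).argmax(dim=1).item() == target:
            break
    return x_cf
\end{verbatim}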
The key observation is that maximizing the predicted probability of the target label is equivalent to minimizing the model's predictive uncertainty about the counterfactual: the closer the model output for the target class gets to one, the closer a standard uncertainty measure such as the predictive entropy gets to its minimum.
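To make this explicit, consider the predictive entropy of the model output (a standard uncertainty measure; the notation below is our own illustration and not taken from \citet{schut2021generating}):
\begin{equation}
  H\left(M_{\theta}(x^\prime)\right) = - \sum_{y \in \mathcal{Y}} p_{\theta}(y \mid x^\prime) \log p_{\theta}(y \mid x^\prime) ,
\end{equation}
which is non-negative and attains its minimum of zero exactly when all probability mass is placed on a single class. Driving $p_{\theta}(y=t \mid x^\prime)$ towards one therefore drives the predictive entropy towards its minimum.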
The approach proposed by \citet{schut2021generating} hinges on the crucial assumption that the Black Box Model provides predictive uncertainty estimates. The authors argue that in light of rapid advances in Bayesian Deep Learning (DL), this assumption is overall less costly than the engineering overhead induced by using surrogate models. This is even more true today, as recent work has put Laplace Approximation back on the map for truly effortless Bayesian DL \citep{immer2020improving,daxberger2021laplace,antoran2023sampling}. Nonetheless, the need for Bayesian methods may be too restrictive in some cases.
In looking for ways to lift that restriction, we found a promising alternative candidate for predictive uncertainty quantification (UQ) that we will briefly introduce next: Conformal Prediction.
\subsection{Conformal Prediction}
Conformal Prediction (CP) is a scalable and statistically rigorous approach to predictive UQ that works under minimal distributional assumptions \citep{angelopoulos2021gentle}. It has recently gained popularity in the Machine Learning community \citep{angelopoulos2021gentle,manokhin2022awesome}. Crucially for our intended application, CP is model-agnostic and can be applied at test time. This allows us to relax the assumption that the Black Box Model needs to learn to generate predictive uncertainty estimates during training. In other words, CP promises to provide a way to generate plausible counterfactuals for any standard discriminative model without the need for surrogate models.
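To give a flavour of the mechanics involved, the snippet below sketches standard split Conformal Prediction for classification, using one minus the softmax output of the true class as the non-conformity score; the variable names and the particular score are our own choices and merely illustrate the general recipe described in \citet{angelopoulos2021gentle}.
\begin{verbatim}
# Minimal sketch of split conformal prediction for classification.
# Non-conformity score: 1 - softmax probability assigned to the true class.
import numpy as np

def calibrate(softmax_cal, y_cal, alpha=0.1):
    """Conformal quantile computed on a held-out calibration set."""
    n = len(y_cal)
    scores = 1.0 - softmax_cal[np.arange(n), y_cal]
    q_level = np.ceil((n + 1) * (1 - alpha)) / n   # finite-sample correction
    return np.quantile(scores, q_level, method="higher")

def prediction_set(softmax_test, q_hat):
    """All labels whose non-conformity score falls below the quantile."""
    return np.where(1.0 - softmax_test <= q_hat)[0]
\end{verbatim}
Under exchangeability, the resulting prediction sets contain the true label with probability of at least $1-\alpha$, regardless of the underlying model.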
...
\subsection{Conformal Counterfactual Explanations}
The fact that conformal classifiers produce set-valued predictions introduces a challenge: it is not immediately obvious how to use such classifiers in the context of gradient-based counterfactual search. Put differently, it is not clear how to use set-valued outputs $M_{\theta}(x)$ in Equation~\ref{eq:general}. Fortunately, \citet{stutz2022learning} have recently proposed a framework for Conformal Training that also hinges on differentiability. To evaluate the performance of conformal classifiers during training, they introduce a custom loss function as well as a smooth set size penalty.
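To give a sense of how this can work, the sketch below implements one possible smooth set size penalty: soft set membership is obtained by passing the difference between the conformal quantile and the per-class non-conformity score through a sigmoid, so that the size of the prediction set becomes a differentiable quantity. The temperature, the score and all names are our own illustrative choices rather than the exact formulation of \citet{stutz2022learning}.
\begin{verbatim}
# Sketch of a smooth (differentiable) set size penalty. Illustrative only.
import torch

def smooth_set_size_penalty(logits, q_hat, temperature=0.1):
    probs = torch.softmax(logits, dim=-1)
    scores = 1.0 - probs                                        # per-class score
    membership = torch.sigmoid((q_hat - scores) / temperature)  # soft inclusion
    set_size = membership.sum(dim=-1)                           # differentiable size
    return torch.clamp(set_size - 1.0, min=0.0)                 # penalize sets > 1
\end{verbatim}
A penalty of this kind can, in principle, be added to the counterfactual search objective in Equation~\ref{eq:general} to favour counterfactuals for which the conformal classifier produces small, ideally singleton, prediction sets.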
...
\section{Experiments}