<p>We consider a simple binary classification problem. Let <span class="math inline">\((X_i, Y_i), \ i=1,...,n\)</span> denote our feature-label pairs and let <span class="math inline">\(\mu: \mathcal{X} \mapsto \mathcal{Y}\)</span> denote the mapping from features to labels. For illustration purposes, we will use linearly separable data.</p>
<p>While we could use a linear classifier in this case, let’s pretend we need a black-box model for this task and rely on a small Multi-Layer Perceptron (MLP):</p>
<p>We can fit this model to data to produce plug-in predictions.</p>
<p>Here we will instead use a specific case of CP called <em>split conformal prediction</em>, which can be summarized as follows:<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<ol type="1">
<li>Partition the training data into a proper training set and a separate calibration set: <span class="math inline">\(\mathcal{D}_n=\mathcal{D}^{\text{train}} \cup \mathcal{D}^{\text{cali}}\)</span>.</li>
<li>Train the machine learning model on the proper training set: <span class="math inline">\(\hat\mu_{i \in \mathcal{D}^{\text{train}}}(X_i,Y_i)\)</span>.</li>
</ol>
<p>The model <span class="math inline">\(\hat\mu_{i \in \mathcal{D}^{\text{train}}}\)</span> can now produce plug-in predictions.</p>
<p>Note that this represents the starting point in applications of Algorithmic Recourse: we have some pre-trained classifier <span class="math inline">\(M\)</span> for which we would like to generate plausible Counterfactual Explanations. Next, we turn to the calibration step.</p>
</div>
</div>
<ol start="3" type="1">
<li>Compute nonconformity scores, <span class="math inline">\(\mathcal{S}\)</span>, using the calibration data <span class="math inline">\(\mathcal{D}^{\text{cali}}\)</span> and the fitted model <span class="math inline">\(\hat\mu_{i \in \mathcal{D}^{\text{train}}}\)</span>.</li>
<li>For a user-specified desired coverage ratio <span class="math inline">\((1-\alpha)\)</span>, compute the corresponding quantile, <span class="math inline">\(\hat{q}\)</span>, of the empirical distribution of nonconformity scores, <span class="math inline">\(\mathcal{S}\)</span>.</li>
<li>For the given quantile and test sample <span class="math inline">\(X_{\text{test}}\)</span>, form the corresponding conformal prediction set:</li>
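<p>As a sketch of what this set typically looks like, assuming a generic nonconformity score function <span class="math inline">\(s(X,y)\)</span> (a symbol introduced here for illustration only), the set collects all labels whose score does not exceed the calibrated quantile:</p>
<p><span class="math display">\[C(X_{\text{test}})=\{y: s(X_{\text{test}},y) \le \hat{q}\}\]</span></p>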
<p>This is the default procedure used for classification and regression in <a href="https://github.com/pat-alt/ConformalPrediction.jl"><code>ConformalPrediction.jl</code></a>.</p>
<p>Using the package, we can apply Split Conformal Prediction as follows:</p>
<p>To be clear, all of the calibration steps (3 to 5) are post hoc, and none of them involves any changes to the model parameters. These are two important characteristics of Split Conformal Prediction (SCP) that make it particularly useful in the context of Algorithmic Recourse. Firstly, the fact that SCP involves post hoc calibration steps that happen after training ensures that we need not place any restrictions on the black-box model itself. This stands in contrast to the approach proposed by <span class="citation" data-cites="schut2021generating">Schut et al. (<a href="#ref-schut2021generating" role="doc-biblioref">2021</a>)</span>, which essentially restricts the class of models to Bayesian models. Secondly, the fact that the model itself is kept entirely intact ensures that the generated counterfactuals maintain fidelity to the model. Finally, note that we also have not resorted to a surrogate model to learn more about <span class="math inline">\(X \sim \mathcal{X}\)</span>. Instead, we have used the fitted model itself and a calibration data set to learn about the model’s predictive uncertainty.</p>
<p>In order to use CP in the context of gradient-based counterfactual search, we need it to be differentiable. <span class="citation" data-cites="stutz2022learning">Stutz et al. (<a href="#ref-stutz2022learning" role="doc-biblioref">2022</a>)</span> introduce a framework for training differentiable conformal predictors. They introduce a configurable loss function as well as a smooth set size penalty.</p>
<h3 data-number="1.3.1"><span class="header-section-number">1.3.1</span> Smooth Set Size Penalty</h3>
<p>Starting with the smooth set size penalty, <span class="citation" data-cites="stutz2022learning">Stutz et al. (<a href="#ref-stutz2022learning" role="doc-biblioref">2022</a>)</span> propose the following:</p>
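<p>In the notation of this section, the penalty can be written as below. This is a reconstruction based on our reading of Stutz et al. (2022), where <span class="math inline">\(\kappa\)</span> is a hyperparameter (typically <span class="math inline">\(\kappa=1\)</span>, so that singleton sets incur no penalty) and the sum runs over all classes <span class="math inline">\(k\)</span>:</p>
<p><span class="math display">\[\Omega(C_{\theta}(x;\tau))=\max\left(0, \sum_k C_{\theta,k}(x;\tau) - \kappa\right)\]</span></p>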
<p>Here, <span class="math inline">\(C_{\theta,k}(x;\tau)\)</span> is loosely defined as the probability that class <span class="math inline">\(k\)</span> is assigned to the conformal prediction set <span class="math inline">\(C\)</span>. In the context of Conformal Training, this penalty reduces the <strong>inefficiency</strong> of the conformal predictor.</p>
<p>In our context, we are not interested in improving the model itself, but rather in producing <strong>plausible</strong> counterfactuals. Provided that our counterfactual <span class="math inline">\(x^\prime\)</span> is already inside the target domain (<span class="math inline">\(\mathbb{I}_{y^\prime = t}=1\)</span>), penalizing <span class="math inline">\(\Omega(C_{\theta}(x;\tau))\)</span> corresponds to guiding counterfactuals into regions of the target domain that are characterized by low ambiguity: for <span class="math inline">\(\kappa=1\)</span>, the conformal prediction set includes only the target label <span class="math inline">\(t\)</span> as <span class="math inline">\(\Omega(C_{\theta}(x;\tau)) \rightarrow 0\)</span>. Arguably, less ambiguous counterfactuals are more <strong>plausible</strong>. Since the search is guided purely by properties of the model itself and (exchangeable) calibration data, counterfactuals also maintain <strong>high fidelity</strong>.</p>
<p>The left panel of <a href="#fig-losses">Figure <span class="quarto-unresolved-ref">fig-losses</span></a> shows the smooth size penalty in the two-dimensional feature space of our synthetic data.</p>
<p>The right panel of <a href="#fig-losses">Figure <span class="quarto-unresolved-ref">fig-losses</span></a> shows the configurable classification loss in the two-dimensional feature space of our synthetic data.</p>
<h2 data-number="1.4"><span class="header-section-number">1.4</span> Fidelity and Plausibility</h2>
<p>The main evaluation criteria we are interested in are <em>fidelity</em> and <em>plausibility</em>. Interestingly, we could also consider using these measures as penalties in the counterfactual search.</p>
<p><span class="theorem-title"><strong>Definition 1.1 (High-Fidelity Counterfactuals) </strong></span>Let <span class="math inline">\(\mathcal{X}_{\theta}|y = p_{\theta}(X|y)\)</span> denote the class-conditional distribution of <span class="math inline">\(X\)</span> defined by <span class="math inline">\(\theta\)</span>. Then for <span class="math inline">\(x^{\prime}\)</span> to be considered a high-fidelity counterfactual, we need: <span class="math inline">\(\mathcal{X}_{\theta}|t \approxeq \mathcal{X}^{\prime}\)</span> where <span class="math inline">\(t\)</span> denotes the target outcome.</p>
</div>
<p>We can generate samples from <span class="math inline">\(p_{\theta}(X|y)\)</span> following <span class="citation" data-cites="grathwohl2020your">Grathwohl et al. (<a href="#ref-grathwohl2020your" role="doc-biblioref">2020</a>)</span>. In <a href="#fig-energy">Figure <span class="quarto-unresolved-ref">fig-energy</span></a>, we have applied the methodology to our synthetic data.</p>
<p>As an evaluation metric and penalty, we could use the average distance of the counterfactual <span class="math inline">\(x^{\prime}\)</span> from these generated samples, for example.</p>
<p>We propose to define plausibility as follows:</p>
<div id="def-plausible" class="theorem definition">
<p><span class="theorem-title"><strong>Definition 1.2 (Plausible Counterfactuals) </strong></span>Formally, let <span class="math inline">\(\mathcal{X}|t\)</span> denote the conditional distribution of samples in the target class. As before, we have <span class="math inline">\(x^{\prime}\sim\mathcal{X}^{\prime}\)</span>. Then for <span class="math inline">\(x^{\prime}\)</span> to be considered a plausible counterfactual, we need: <span class="math inline">\(\mathcal{X}|t \approxeq \mathcal{X}^{\prime}\)</span>.</p>
</div>
<p>As an evaluation metric and penalty, we could use the average distance of the counterfactual <span class="math inline">\(x^{\prime}\)</span> from (potentially bootstrapped) training samples in the target class, for example.</p>
<p>Next, let’s generate counterfactual explanations for our synthetic data. We first wrap our model in a container that makes it compatible with <code>CounterfactualExplanations.jl</code>. Then we draw a random sample, determine its predicted label <span class="math inline">\(\hat{y}\)</span> and choose the opposite label as our target.</p>
<p>The generic Conformal Counterfactual Generator penalises only the set size:</p>
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. <span>“Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.”</span> In. <a href="https://openreview.net/forum?id=Hkxzx0NtDB">https://openreview.net/forum?id=Hkxzx0NtDB</a>.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. <span>“Generating <span>Interpretable Counterfactual Explanations By Implicit Minimisation</span> of <span>Epistemic</span> and <span>Aleatoric Uncertainties</span>.”</span> In <em>International <span>Conference</span> on <span>Artificial Intelligence</span> and <span>Statistics</span></em>, 1756–64. <span>PMLR</span>.
Stutz, David, Krishnamurthy Dj Dvijotham, Ali Taylan Cemgil, and Arnaud Doucet. 2022. <span>“Learning <span>Optimal</span> <span>Conformal</span> <span>Classifiers</span>.”</span> In. <a href="https://openreview.net/forum?id=t8O-4LKFVx">https://openreview.net/forum?id=t8O-4LKFVx</a>.
<li id="fn1"><p>Split conformal prediction is sometimes also referred to as <em>inductive</em> conformal prediction.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>
</main><!-- /main -->
<script id="quarto-html-after-body" type="application/javascript">
"title: High-Fidelity Counterfactual Explanations through Conformal Prediction\n",
"subtitle: Research Proposal\n",
"abstract: |\n",
" We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.\n",
"---"
],
"id": "0675cd89"
},
{
"cell_type": "code",
"metadata": {},
"source": [
"include(\"notebooks/setup.jl\")\n",
"eval(setup_notebooks)"
],
"id": "8a310cd5",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Motivation\n",
"\n",
"Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models but also enable affected individuals to challenge them through the means of Algorithmic Recourse. \n",
"\n",
"### Counterfactual Explanations or Adversarial Examples?\n",
"\n",
"Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\\in\\mathcal{X}$ into a black-box model $f: \\mathcal{X} \\mapsto \\mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\\in\\mathcal{Y}$. Formally, this boils down to defining some loss function $\\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so-generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In @fig-adv, for example, generic counterfactual search as in @wachter2017counterfactual has been applied to MNIST data.\n"
],
"id": "ea1eb4c1"
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Data:\n",
"counterfactual_data = load_mnist()\n",
"X, y = CounterfactualExplanations.DataPreprocessing.unpack_data(counterfactual_data)\n",
"{#fig-adv}\n",
"\n",
"The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be \"plausible\", \"realistic\" or \"feasible\". To fulfil this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, @poyiadzi2020face use density ... Finally, @karimi2021algorithmic argues that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHASE THIS LIKE THIS]. Other related approaches include ... All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGB). \n",
"\n",
"::: {#def-plausible}\n",
"\n",
"## Plausible Counterfactuals\n",
"\n",
"Formally, if $x \\sim \\mathcal{X}$ and for the corresponding counterfactual we have $x^{\\prime}\\sim\\mathcal{X}^{\\prime}$, then for $x^{\\prime}$ to be considered a plausible counterfactual, we need: $\\mathcal{X} \\approxeq \\mathcal{X}^{\\prime}$.\n",
"\n",
":::\n",
"\n",
"In the context of Algorithmic Recourse, it makes sense to strive for plausible counterfactuals, since anything else would essentially require individuals to move to out-of-distribution states. But it is worth noting that our ambition to meet this goal, may have implications on our ability to faithfully explain the behaviour of the underlying black-box model (arguably our principal goal). By essentially decoupling the task of learning plausible representations of the data from the model itself, we open ourselves up to vulnerabilities. Using a separate generative model to learn $\\mathcal{X}$, for example, has very serious implications for the generated counterfactuals. @fig-latent compares the results of applying REVISE [@joshi2019realistic] to MNIST data using two different Variational Auto-Encoders: while the counterfactual generated using an expressive (strong) VAE is compelling, the result relying on a less expressive (weak) VAE is not even valid. In this latter case, the decoder step of the VAE fails to yield values in $\\mathcal{X}$ and hence the counterfactual search in the learned latent space is doomed. \n",
"\n",
"{#fig-latent}\n",
"\n",
"> Here it would be nice to have another example where we poison the data going into the generative model to hide biases present in the data (e.g. Boston housing).\n",
"\n",
"- Latent can be manipulated: \n",
" - train biased model\n",
" - train VAE with biased variable removed/attacked (use Boston housing dataset)\n",
" - hypothesis: will generate bias-free explanations\n",
"\n",
"### From Plausible to High-Fidelity Counterfactuals {#sec-fidelity}\n",
"\n",
"In light of the findings, we propose to generally avoid using surrogate models to learn $\\mathcal{X}$ in the context of Counterfactual Explanations.\n",
"\n",
"::: {#prp-surrogate}\n",
"\n",
"## Avoid Surrogates\n",
"\n",
"Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.\n",
"\n",
":::\n",
"\n",
"In cases where the use of surrogate models cannot be avoided, we propose to weigh the plausibility of counterfactuals against their fidelity to the black-box model. In the context of Explainable AI, fidelity is defined as describing how an explanation approximates the prediction of the black-box model [@molnar2020interpretable]. Fidelity has become the default metric for evaluating Local Model-Agnostic Models, since they often involve local surrogate models whose predictions need not always match those of the black-box model. \n",
"\n",
"In the case of Counterfactual Explanations, the concept of fidelity has so far been ignored. This is not altogether surprising, since by construction and design, Counterfactual Explanations work with the predictions of the black-box model directly: as stated above, a counterfactual $x^{\\prime}$ is considered valid if and only if $f(x^{\\prime})=t$, where $t$ denote some target outcome. \n",
"\n",
"Does fidelity even make sense in the context of CE, and if so, how can we define it? In light of the examples in the previous section, we think it is urgent to introduce a notion of fidelity in this context, that relates to the distributional properties of the generated counterfactuals. In particular, we propose that a high-fidelity counterfactual $x^{\\prime}$ complies with the class-conditional distribution $\\mathcal{X}_{\\theta} = p_{\\theta}(X|y)$ where $\\theta$ denote the black-box model parameters. \n",
"\n",
"::: {#def-fidele}\n",
"\n",
"## High-Fidelity Counterfactuals\n",
"\n",
"Let $\\mathcal{X}_{\\theta}|y = p_{\\theta}(X|y)$ denote the class-conditional distribution of $X$ defined by $\\theta$. Then for $x^{\\prime}$ to be considered a high-fidelity counterfactual, we need: $\\mathcal{X}_{\\theta}|t \\approxeq \\mathcal{X}^{\\prime}$ where $t$ denotes the target outcome.\n",
"\n",
":::\n",
"\n",
"In order to assess the fidelity of counterfactuals, we propose the following two-step procedure:\n",
"\n",
"1) Generate samples $X_{\\theta}|y$ and $X^{\\prime}$ from $\\mathcal{X}_{\\theta}|t$ and $\\mathcal{X}^{\\prime}$, respectively.\n",
"2) Compute the Maximum Mean Discrepancy (MMD) between $X_{\\theta}|y$ and $X^{\\prime}$. \n",
"\n",
"If the computed value is different from zero, we can reject the null-hypothesis of fidelity.\n",
"\n",
"> Two challenges here: 1) implementing the sampling procedure in @grathwohl2020your; 2) it is unclear if MMD is really the right way to measure this. \n",
"\n",
"## Conformal Counterfactual Explanations\n",
"\n",
"In @sec-fidelity, we have advocated for avoiding surrogate models in the context of Counterfactual Explanations. In this section, we introduce an alternative way to generate high-fidelity Counterfactual Explanations. In particular, we propose Conformal Counterfactual Explanations (CCE), that is Counterfactual Explanations that minimize the predictive uncertainty of conformal models. \n",
"\n",
"### Minimizing Predictive Uncertainty\n",
"\n",
"@schut2021generating demonstrated that the goal of generating realistic (plausible) counterfactuals can also be achieved by seeking counterfactuals that minimize the predictive uncertainty of the underlying black-box model. Similarly, @antoran2020getting ...\n",
"- Distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification.\n",
"- Conformal prediction is instance-based. So is CE. \n",
"- Take any fitted model and turn it into a conformal model using calibration data.\n",
"- Our approach, therefore, relaxes the restriction on the family of black-box models, at the cost of relying on a subset of the data. Arguably, data is often abundant and in most applications practitioners tend to hold out a test data set anyway. \n",
"\n",
"> Does the coverage guarantee carry over to counterfactuals?\n",
"\n",
"### Generating Conformal Counterfactuals\n",
"\n",
"While Conformal Prediction has recently grown in popularity, it does introduce a challenge in the context of classification: the predictions of Conformal Classifiers are set-valued and therefore difficult to work with, since they are, for example, non-differentiable. Fortunately, @stutz2022learning introduced carefully designed differentiable loss functions that make it possible to evaluate the performance of conformal predictions in training. We can leverage these recent advances in the context of gradient-based counterfactual search ...\n",
"\n",
"> Challenge: still need to implement these loss functions. \n",
"\n",
"## Experiments\n",
"\n",
"### Research Questions\n",
"\n",
"- Is CP alone enough to ensure realistic counterfactuals?\n",
"- Do counterfactuals improve further as the models get better?\n",
"- Do counterfactuals get more realistic as coverage\n",
"- What happens as we vary coverage and setsize?\n",
"- What happens as we improve the model robustness?\n",
"- What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensemble, laplace)?\n",
"- What happens if we combine with DiCE, ClaPROAR, Gravitational?\n",
"- What about CE robustness to endogenous shifts [@altmeyer2023endogenous]?\n",
"\n",
"- Benchmarking:\n",
" - add PROBE [@pawelczyk2022probabilistically] into the mix.\n",
" - compare travel costs to domain shits.\n",
"\n",
"> Nice to have: What about using Laplace Approximation, then Conformal Prediction? What about using Conformalised Laplace? \n",
"\n",
"## References\n"
],
"id": "9f0a2e10"
}
],
"metadata": {
"kernelspec": {
"name": "julia-1.6",
"language": "julia",
"display_name": "Julia 1.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
%% Cell type:raw id:0675cd89 tags:
---
title: High-Fidelity Counterfactual Explanations through Conformal Prediction
subtitle: Research Proposal
abstract: |
We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.
---
%% Cell type:code id:8a310cd5 tags:
``` julia
include("notebooks/setup.jl")
eval(setup_notebooks)
```
%% Cell type:markdown id:ea1eb4c1 tags:
## Motivation
Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models but also enable affected individuals to challenge them through the means of Algorithmic Recourse.
### Counterfactual Explanations or Adversarial Examples?
Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\in\mathcal{X}$ into a black-box model $f: \mathcal{X} \mapsto \mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\in\mathcal{Y}$. Formally, this boils down to defining some loss function $\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so-generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In @fig-adv, for example, generic counterfactual search as in @wachter2017counterfactual has been applied to MNIST data.
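The generic search described above can be sketched in a few lines. This is a minimal illustration only: it assumes a hypothetical logistic classifier with hand-picked parameters `w` and `b` standing in for the black-box model $f$, and the names `counterfactual_search`, `η` and `max_steps` are ours rather than part of any package.

```julia
using LinearAlgebra

# Hypothetical stand-in for the black-box model f: a logistic classifier.
σ(z) = 1 / (1 + exp(-z))
f(x, w, b) = σ(dot(w, x) + b)

# Gradient of the binary cross-entropy loss ℓ(f(x), t) with respect to x.
# For a logistic model this has the closed form (f(x) - t) * w.
grad_loss(x, t, w, b) = (f(x, w, b) - t) .* w

# Take gradient steps in the feature space and stop as soon as the
# predicted label matches the target label t ∈ {0, 1}.
function counterfactual_search(x, t, w, b; η=0.1, max_steps=1_000)
    x_cf = copy(x)
    for _ in 1:max_steps
        yhat = f(x_cf, w, b) > 0.5 ? 1 : 0
        yhat == t && break
        x_cf .-= η .* grad_loss(x_cf, t, w, b)
    end
    return x_cf
end

w, b = [1.0, -1.0], 0.0      # assumed model parameters
x = [-2.0, 1.0]              # factual with predicted label 0
x_cf = counterfactual_search(x, 1, w, b)
```

Because the loop terminates the moment the label flips, `x_cf` ends up just across the decision boundary, which is exactly why a stripped-down counterfactual resembles an adversarial example.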
{#fig-adv}
The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be "plausible", "realistic" or "feasible". To fulfil this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, @poyiadzi2020face use density ... Finally, @karimi2021algorithmic argue that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHRASE THIS LIKE THIS]. Other related approaches include ... All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGP).
::: {#def-plausible}
## Plausible Counterfactuals
Formally, if $x \sim \mathcal{X}$ and for the corresponding counterfactual we have $x^{\prime}\sim\mathcal{X}^{\prime}$, then for $x^{\prime}$ to be considered a plausible counterfactual, we need: $\mathcal{X} \approxeq \mathcal{X}^{\prime}$.
:::
In the context of Algorithmic Recourse, it makes sense to strive for plausible counterfactuals, since anything else would essentially require individuals to move to out-of-distribution states. But it is worth noting that our ambition to meet this goal may have implications for our ability to faithfully explain the behaviour of the underlying black-box model (arguably our principal goal). By essentially decoupling the task of learning plausible representations of the data from the model itself, we open ourselves up to vulnerabilities. Using a separate generative model to learn $\mathcal{X}$, for example, has very serious implications for the generated counterfactuals. @fig-latent compares the results of applying REVISE [@joshi2019realistic] to MNIST data using two different Variational Auto-Encoders: while the counterfactual generated using an expressive (strong) VAE is compelling, the result relying on a less expressive (weak) VAE is not even valid. In this latter case, the decoder step of the VAE fails to yield values in $\mathcal{X}$ and hence the counterfactual search in the learned latent space is doomed.
{#fig-latent}
> Here it would be nice to have another example where we poison the data going into the generative model to hide biases present in the data (e.g. Boston housing).
- Latent can be manipulated:
- train biased model
- train VAE with biased variable removed/attacked (use Boston housing dataset)
- hypothesis: will generate bias-free explanations
### From Plausible to High-Fidelity Counterfactuals {#sec-fidelity}
In light of the findings, we propose to generally avoid using surrogate models to learn $\mathcal{X}$ in the context of Counterfactual Explanations.
::: {#prp-surrogate}
## Avoid Surrogates
Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.
:::
In cases where the use of surrogate models cannot be avoided, we propose to weigh the plausibility of counterfactuals against their fidelity to the black-box model. In the context of Explainable AI, fidelity is defined as describing how an explanation approximates the prediction of the black-box model [@molnar2020interpretable]. Fidelity has become the default metric for evaluating Local Model-Agnostic Models, since they often involve local surrogate models whose predictions need not always match those of the black-box model.
In the case of Counterfactual Explanations, the concept of fidelity has so far been ignored. This is not altogether surprising, since by construction and design, Counterfactual Explanations work with the predictions of the black-box model directly: as stated above, a counterfactual $x^{\prime}$ is considered valid if and only if $f(x^{\prime})=t$, where $t$ denotes some target outcome.
Does fidelity even make sense in the context of CE, and if so, how can we define it? In light of the examples in the previous section, we think it is urgent to introduce a notion of fidelity in this context that relates to the distributional properties of the generated counterfactuals. In particular, we propose that a high-fidelity counterfactual $x^{\prime}$ complies with the class-conditional distribution $\mathcal{X}_{\theta}|y = p_{\theta}(X|y)$, where $\theta$ denotes the black-box model parameters.
::: {#def-fidele}
## High-Fidelity Counterfactuals
Let $\mathcal{X}_{\theta}|y = p_{\theta}(X|y)$ denote the class-conditional distribution of $X$ defined by $\theta$. Then for $x^{\prime}$ to be considered a high-fidelity counterfactual, we need: $\mathcal{X}_{\theta}|t \approxeq \mathcal{X}^{\prime}$ where $t$ denotes the target outcome.
:::
In order to assess the fidelity of counterfactuals, we propose the following two-step procedure:
1) Generate samples $X_{\theta}|t$ and $X^{\prime}$ from $\mathcal{X}_{\theta}|t$ and $\mathcal{X}^{\prime}$, respectively.
2) Compute the Maximum Mean Discrepancy (MMD) between $X_{\theta}|t$ and $X^{\prime}$.
If the computed value is significantly different from zero, we can reject the null hypothesis of fidelity.
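To make the second step concrete, the following is a minimal sketch of a (biased) sample estimate of the squared MMD. It assumes a Gaussian kernel with a hand-picked bandwidth `γ`, and it uses synthetic Gaussian draws in place of actual samples from $\mathcal{X}_{\theta}|t$ and $\mathcal{X}^{\prime}$; the function names are illustrative only.

```julia
using Random, Statistics

# Gaussian (RBF) kernel with an assumed bandwidth γ.
kernel(x, y; γ=1.0) = exp(-γ * sum(abs2, x .- y))

# Biased (V-statistic) estimate of the squared MMD between two samples.
# Samples are stored column-wise: each column is one observation.
function mmd2(X, Y; γ=1.0)
    kxx = mean(kernel(a, b; γ=γ) for a in eachcol(X), b in eachcol(X))
    kyy = mean(kernel(a, b; γ=γ) for a in eachcol(Y), b in eachcol(Y))
    kxy = mean(kernel(a, b; γ=γ) for a in eachcol(X), b in eachcol(Y))
    return kxx + kyy - 2 * kxy
end

rng = MersenneTwister(42)
X_model = randn(rng, 2, 50)          # placeholder for samples from the model
X_close = randn(rng, 2, 50)          # counterfactuals from the same distribution
X_far = randn(rng, 2, 50) .+ 3.0     # counterfactuals from a shifted distribution

mmd_close = mmd2(X_model, X_close)
mmd_far = mmd2(X_model, X_far)
```

Even for two samples from the same distribution, the finite-sample estimate is generally not exactly zero, so in practice the value would have to be compared against a reference distribution (for instance, one obtained by permuting the pooled samples) rather than against zero itself.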
> Two challenges here: 1) implementing the sampling procedure in @grathwohl2020your; 2) it is unclear if MMD is really the right way to measure this.
## Conformal Counterfactual Explanations
In @sec-fidelity, we have advocated for avoiding surrogate models in the context of Counterfactual Explanations. In this section, we introduce an alternative way to generate high-fidelity Counterfactual Explanations. In particular, we propose Conformal Counterfactual Explanations (CCE), that is, Counterfactual Explanations that minimize the predictive uncertainty of conformal models.
### Minimizing Predictive Uncertainty
@schut2021generating demonstrated that the goal of generating realistic (plausible) counterfactuals can also be achieved by seeking counterfactuals that minimize the predictive uncertainty of the underlying black-box model. Similarly, @antoran2020getting ...
- Problem: restricted to Bayesian models.
- Solution: post-hoc predictive uncertainty quantification. In particular, Conformal Prediction.
### Background on Conformal Prediction
- Distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification.
- Conformal prediction is instance-based. So is CE.
- Take any fitted model and turn it into a conformal model using calibration data.
- Our approach, therefore, relaxes the restriction on the family of black-box models, at the cost of relying on a subset of the data. Arguably, data is often abundant and in most applications practitioners tend to hold out a test data set anyway.
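As a rough illustration of how a fitted model is turned into a conformal model using calibration data, consider the sketch below. It assumes that the model outputs class probabilities and uses one minus the probability of a label as the nonconformity score; the helper names (`score`, `calibrate`, `predict_set`) and the toy probabilities are made up for this example and do not reflect the API of `ConformalPrediction.jl`.

```julia
# Nonconformity score: one minus the probability assigned to label y.
score(probs, y) = 1 - probs[y]

# Score the calibration data and take the ceil((n + 1)(1 - α))-th
# smallest score as the threshold for the desired coverage 1 - α.
function calibrate(cal_probs, cal_labels; α=0.1)
    n = length(cal_labels)
    scores = sort([score(p, y) for (p, y) in zip(cal_probs, cal_labels)])
    k = ceil(Int, (n + 1) * (1 - α))
    return k > n ? Inf : scores[k]   # too few samples: trivial threshold
end

# Keep every label whose score does not exceed the threshold.
predict_set(probs, qhat) = [y for y in eachindex(probs) if score(probs, y) <= qhat]

# Made-up outputs of a fitted binary classifier on calibration data.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8],
             [0.95, 0.05], [0.4, 0.6], [0.1, 0.9], [0.7, 0.3]]
cal_labels = [1, 1, 2, 2, 2, 1, 2, 2, 1]

qhat = calibrate(cal_probs, cal_labels; α=0.1)
confident = predict_set([0.9, 0.1], qhat)     # far from the decision boundary
ambiguous = predict_set([0.55, 0.45], qhat)   # close to the decision boundary
```

Note that the model parameters are never touched: only the held-out calibration scores determine the threshold, which is what makes the procedure model-agnostic. Inputs close to the decision boundary receive larger prediction sets, and this is the notion of predictive uncertainty we aim to exploit.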
> Does the coverage guarantee carry over to counterfactuals?
### Generating Conformal Counterfactuals
While Conformal Prediction has recently grown in popularity, it does introduce a challenge in the context of classification: the predictions of Conformal Classifiers are set-valued and therefore difficult to work with, since they are, for example, non-differentiable. Fortunately, @stutz2022learning introduced carefully designed differentiable loss functions that make it possible to evaluate the performance of conformal predictions in training. We can leverage these recent advances in the context of gradient-based counterfactual search ...
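To give a sense of what such a differentiable loss can look like, here is a minimal sketch of a smooth set size penalty in the spirit of @stutz2022learning. It assumes that the hard thresholding step is replaced by a sigmoid with temperature `T` and that the predicted probability itself serves as the conformity score; the threshold `τ`, the temperature and the function names are illustrative assumptions rather than the authors' implementation.

```julia
# Sigmoid: a smooth replacement for the hard set-membership indicator.
σ(z) = 1 / (1 + exp(-z))

# Soft assignment of each label to the prediction set. A label whose
# score lies well above the threshold τ gets a value close to one.
soft_set(probs, τ; T=0.1) = σ.((probs .- τ) ./ T)

# Smooth set size penalty: soft set size in excess of κ, floored at zero.
smooth_size(probs, τ; κ=1, T=0.1) = max(0, sum(soft_set(probs, τ; T=T)) - κ)

τ = 0.4                                       # assumed calibrated threshold
penalty_ambiguous = smooth_size([0.5, 0.5], τ)
penalty_confident = smooth_size([0.95, 0.05], τ)
```

Since every operation is differentiable almost everywhere, the penalty can be added to the counterfactual search objective and minimized with respect to the input, pushing counterfactuals towards regions where the soft prediction set is close to a singleton.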
> Challenge: still need to implement these loss functions.
## Experiments
### Research Questions
- Is CP alone enough to ensure realistic counterfactuals?
- Do counterfactuals improve further as the models get better?
- Do counterfactuals get more realistic as coverage
- What happens as we vary coverage and set size?
- What happens as we improve the model robustness?
- What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensemble, Laplace)?
- What happens if we combine with DiCE, ClaPROAR, Gravitational?
- What about CE robustness to endogenous shifts [@altmeyer2023endogenous]?
- Benchmarking:
- add PROBE [@pawelczyk2022probabilistically] into the mix.
- compare travel costs to domain shifts.
> Nice to have: What about using Laplace Approximation, then Conformal Prediction? What about using Conformalised Laplace?