" We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data generating process. While this an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces an significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itsel to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.\n",
"Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models, but also enable affected individuals to challenge them though the means of Algorithmic Recourse. \n",
"\n",
"### From Adversarial Examples to Counterfactual Explanations\n",
"\n",
"Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\\in\\mathcal{X}$ into a black-box model $f: \\mathcal{X} \\mapsto \\mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\\in\\mathcal{Y}$. Formally, this boils down to defining some loss function $\\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped down counterfactual explanation is therefore little different from an adversarial example.\n",
"\n",
"> You may not like it, but this is what counterfactuals look like\n"
],
"id": "17241786"
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Data:\n",
"counterfactual_data = load_mnist()\n",
"X, y = CounterfactualExplanations.DataPreprocessing.unpack_data(counterfactual_data)\n",
"The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intened to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be \"plausible\" or \"realistic\". To fulfill this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. This ensures that the generated counterfactuals comply with the (learned) data-generating process (DGB). Similarly, @poyiadzi2020face use density ...\n",
"\n",
"- Show DiCE for weak MLP\n",
"- Show Latent for same weak MLP\n",
"- Latent can be manipulated: \n",
" - train biased model\n",
" - train VAE with biased variable removed/attacked (use Boston housing dataset)\n",
" - hypothesis: will generate bias-free explanations\n",
"\n",
"::: {#prp-surrogate}\n",
"\n",
"## Avoid Surrogates\n",
"\n",
"Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.\n",
"\n",
":::\n",
"\n",
"## Introduction to Conformal Prediction\n",
"\n",
"- distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification\n",
"\n",
"### Post-hoc\n",
"\n",
"- Take any fitted model and turn it into a conformal model using calibration data.\n",
"\n",
"### Intrinsic --- Conformal Training [MAYBE]\n",
"\n",
"- Model explicitly trained for conformal prediction.\n",
"\n",
"## Conformal Counterfactuals\n",
"\n",
"- Realistic counterfactuals by minimizing predictive uncertainty [@schut2021generating].\n",
"- Conformal prediction is instance-based. So is CE. \n",
"- Does the coverage guarantee carry over to counterfactuals?\n",
"\n",
"### Research Questions\n",
"\n",
"- Is CP alone enough to ensure realistic counterfactuals?\n",
"- Do counterfactuals improve further as the models get better?\n",
"- Do counterfactuals get more realistic as coverage\n",
"- What happens as we vary coverage and setsize?\n",
"- What happens as we improve the model robustness?\n",
"- What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensemble, laplace)?\n",
"\n",
"## Experiments\n",
"\n",
"- Maybe: conformalised Laplace\n",
"- Benchmarking:\n",
" - add PROBE into the mix\n",
" - compare travel costs to domain shits.\n",
"\n",
"## References\n"
],
"id": "8abba11d"
}
],
"metadata": {
"kernelspec": {
"name": "julia-(4-threads)-1.8",
"language": "julia",
"display_name": "Julia (4 threads) 1.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
---
title: Conformal Counterfactual Explanations
subtitle: Research Proposal
abstract: |
  We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.
---
## Motivation

Counterfactual Explanations are a powerful, flexible and intuitive way not only to explain black-box models but also to enable affected individuals to challenge them through the means of Algorithmic Recourse.
### From Adversarial Examples to Counterfactual Explanations
Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\in\mathcal{X}$ into a black-box model $f: \mathcal{X} \mapsto \mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\in\mathcal{Y}$. Formally, this boils down to defining some loss function $\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The counterfactuals generated in this way are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore hardly different from an Adversarial Example [@goodfellow2014explaining].
> You may not like it, but this is what counterfactuals look like
The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be "plausible" or "realistic". To fulfill this latter goal, researchers have proposed a variety of approaches. @joshi2019realistic were among the first to suggest that instead of searching for counterfactuals in the feature space, we can traverse a latent embedding learned by a surrogate generative model. This ensures that the generated counterfactuals comply with the (learned) data-generating process (DGP). Similarly, @poyiadzi2020face use density ...
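
For intuition, here is a minimal sketch of such a latent-space search. The classifier `f`, the surrogate VAE's `decoder` and an initial latent code `z₀` are assumed as given, and Zygote is assumed for automatic differentiation; this is an illustrative sketch, not the API of any particular package.

```{julia}
using Zygote

# Numerically stable softmax (defined here to keep the sketch self-contained):
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))

# Gradient-based counterfactual search in the latent space of a surrogate VAE:
# optimise the latent code z and decode it back into feature space, so that the
# result stays close to the learned data manifold.
function latent_counterfactual(f, decoder, z₀, t; η=0.1, n_steps=200)
    z = copy(z₀)
    for _ in 1:n_steps
        ℓ(ζ) = -log(softmax(f(decoder(ζ)))[t])   # target loss through the decoder
        z -= η * Zygote.gradient(ℓ, z)[1]        # gradient step in latent space
        argmax(f(decoder(z))) == t && break      # stop once the predicted label flips
    end
    return decoder(z)
end
```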
- Show DiCE for weak MLP
- Show Latent for same weak MLP
- Latent can be manipulated:
- train biased model
- train VAE with biased variable removed/attacked (use Boston housing dataset)
- hypothesis: will generate bias-free explanations
::: {#prp-surrogate}
## Avoid Surrogates
Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.
:::
## Introduction to Conformal Prediction
- distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification
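
Concretely, for a user-chosen error rate $\alpha \in (0,1)$ and exchangeable calibration and test data, conformal prediction turns any model's point predictions into prediction sets $C_\alpha(x)$ that satisfy the marginal coverage guarantee
$$
\mathbb{P}\left(y_{\text{test}} \in C_\alpha(x_{\text{test}})\right) \geq 1 - \alpha .
$$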
### Post-hoc
- Take any fitted model and turn it into a conformal model using calibration data.
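
A minimal, self-contained sketch of this post-hoc calibration step, using the simple nonconformity score $s(x,y) = 1 - \hat{p}_y(x)$ (not tied to any particular conformal prediction package):

```{julia}
using Statistics: quantile

# Split conformal prediction: calibrate a score threshold on held-out data,
# then form prediction sets for new inputs. `probs_cal` holds softmax outputs
# column-wise (classes × observations).
function calibrate(probs_cal::AbstractMatrix, y_cal::AbstractVector{Int}, α::Real)
    n = length(y_cal)
    scores = [1 - probs_cal[y_cal[i], i] for i in 1:n]    # nonconformity scores
    q_level = min(ceil((n + 1) * (1 - α)) / n, 1.0)       # finite-sample correction
    return quantile(scores, q_level)                      # calibrated threshold q̂
end

# Prediction set: all classes whose score does not exceed the threshold.
prediction_set(probs::AbstractVector, q̂) = findall(p -> 1 - p <= q̂, probs)
```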
### Intrinsic --- Conformal Training [MAYBE]
- Model explicitly trained for conformal prediction.
## Conformal Counterfactuals
- Realistic counterfactuals by minimizing predictive uncertainty [@schut2021generating].
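
One hypothetical way to operationalize this, sketched below under the assumption that a conformal threshold $\hat{q}$ has been calibrated as above (none of the following is an existing package API), is to augment the usual counterfactual loss with a smooth penalty on the size of the conformal prediction set at the candidate counterfactual, pushing the search towards low-uncertainty regions:

```{julia}
# Numerically stable softmax and logistic sigmoid (self-contained sketch):
softmax(z) = exp.(z .- maximum(z)) ./ sum(exp.(z .- maximum(z)))
σ(u) = 1 / (1 + exp(-u))

# Smooth prediction-set size: a relaxed count of the classes whose score
# 1 - p_k falls below the calibrated threshold q̂ (temperature τ).
function smooth_set_size(logits, q̂; τ=0.1)
    p = softmax(logits)
    return sum(σ.((q̂ .- (1 .- p)) ./ τ))
end

# Combined objective for a candidate counterfactual x′ with target class t:
# target loss plus an uncertainty penalty weighted by λ.
conformal_loss(f, x′, t, q̂; λ=0.5) =
    -log(softmax(f(x′))[t]) + λ * smooth_set_size(f(x′), q̂)
```

The code blocks below load the packages used in this proposal and set up the MNIST example referred to in the motivation section; a sketch of the counterfactual search itself follows them.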
```{julia}
using CounterfactualExplanations
using CounterfactualExplanations.Data: load_mnist
using CounterfactualExplanations.Models: load_mnist_mlp
using Images
using MLDatasets
using MLDatasets: convert2image
using Plots
```
```{julia}
# Data:
counterfactual_data = load_mnist()
X, y = CounterfactualExplanations.DataPreprocessing.unpack_data(counterfactual_data)
input_dim, n_obs = size(counterfactual_data.X)
M = load_mnist_mlp()
# Factual label and instance:
factual_label = 8
# Draw a random instance that the model currently predicts as the factual label:
x = reshape(X[:,rand(findall(predict_label(M, counterfactual_data).==factual_label))],input_dim,1)
```
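
Given this setup, the actual search can be run with the package's generic gradient-based generator. The following assumes the `GenericGenerator`/`generate_counterfactual` interface of CounterfactualExplanations.jl (names may differ across package versions); the target class is an arbitrary choice for illustration:

```{julia}
# Sketch: gradient-based counterfactual search for the factual instance x.
# Assumes CounterfactualExplanations.jl exports GenericGenerator and
# generate_counterfactual; target class 3 is an arbitrary illustrative choice.
target = 3
generator = GenericGenerator()
ce = generate_counterfactual(x, target, counterfactual_data, M, generator)
```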