diff --git a/_freeze/dev/proposal/execute-results/html.json b/_freeze/dev/proposal/execute-results/html.json index 7363fbbea883221afad92e1423ebb6227154412e..1d98ce871632c4fdbe1f483ab37133fa1a8ee291 100644 --- a/_freeze/dev/proposal/execute-results/html.json +++ b/_freeze/dev/proposal/execute-results/html.json @@ -1,7 +1,7 @@ { - "hash": "a2ac106a6b675eafee9a47455706943d", + "hash": "d7b4f9bf7f4bff7ce610fc8be4dcfb8b", "result": { - "markdown": "---\ntitle: Conformal Counterfactual Explanations\nsubtitle: Research Proposal\nabstract: |\n We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data generating process. While this an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces an significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itsel to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.\n---\n\n\n\n## Motivation\n\nCounterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models, but also enable affected individuals to challenge them though the means of Algorithmic Recourse. \n\n### From Adversarial Examples to Counterfactual Explanations\n\nMost state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\\in\\mathcal{X}$ into a black-box model $f: \\mathcal{X} \\mapsto \\mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\\in\\mathcal{Y}$. Formally, this boils down to defining some loss function $\\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped down counterfactual explanation is therefore little different from an adversarial example.\n\n> You may not like it, but this is what counterfactuals look like\n\n\n\n\n\nThe crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intened to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be \"plausible\" or \"realistic\". To fulfill this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. This ensures that the generated counterfactuals comply with the (learned) data-generating process (DGB). 
Similarly, @poyiadzi2020face use density ...\n\n- Show DiCE for weak MLP\n- Show Latent for same weak MLP\n- Latent can be manipulated: \n - train biased model\n - train VAE with biased variable removed/attacked (use Boston housing dataset)\n - hypothesis: will generate bias-free explanations\n\n::: {#prp-surrogate}\n\n## Avoid Surrogates\n\nSince we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.\n\n:::\n\n## Introduction to Conformal Prediction\n\n- distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification\n\n### Post-hoc\n\n- Take any fitted model and turn it into a conformal model using calibration data.\n\n### Intrinsic --- Conformal Training [MAYBE]\n\n- Model explicitly trained for conformal prediction.\n\n## Conformal Counterfactuals\n\n- Realistic counterfactuals by minimizing predictive uncertainty [@schut2021generating].\n- Problem: restricted to Bayesian models.\n- Solution: post-hoc predictive uncertainty quantification. \n- Conformal prediction is instance-based. So is CE. \n- Does the coverage guarantee carry over to counterfactuals?\n\n### Research Questions\n\n- Is CP alone enough to ensure realistic counterfactuals?\n- Do counterfactuals improve further as the models get better?\n- Do counterfactuals get more realistic as coverage\n- What happens as we vary coverage and setsize?\n- What happens as we improve the model robustness?\n- What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensemble, laplace)?\n\n## Experiments\n\n- Maybe: conformalised Laplace\n- Benchmarking:\n - add PROBE into the mix\n - compare travel costs to domain shits.\n\n## References\n\n", + "markdown": "---\ntitle: High-Fidelity Counterfactual Explanations through Conformal Prediction\nsubtitle: Research Proposal\nabstract: |\n We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.\n---\n\n\n\n## Motivation\n\nCounterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models but also enable affected individuals to challenge them through the means of Algorithmic Recourse. \n\n### Counterfactual Explanations or Adversarial Examples?\n\nMost state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\\in\\mathcal{X}$ into a black-box model $f: \\mathcal{X} \\mapsto \\mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\\in\\mathcal{Y}$. 
Formally, this boils down to defining some loss function $\\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so-generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In @fig-adv, for example, generic counterfactual search as in @wachter2017counterfactual has been applied to MNIST data.\n\n\n\n\n\n{#fig-adv}\n\nThe crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be \"plausible\", \"realistic\" or \"feasible\". To fulfil this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, @poyiadzi2020face use density ... Finally, @karimi2021algorithmic argues that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHASE THIS LIKE THIS]. Other related approaches include ... All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGB). \n\n::: {#def-plausible}\n\n## Plausible Counterfactuals\n\nFormally, if $x \\sim \\mathcal{X}$ and for the corresponding counterfactual we have $x^{\\prime}\\sim\\mathcal{X}^{\\prime}$, then for $x^{\\prime}$ to be considered a plausible counterfactual, we need: $\\mathcal{X} \\approxeq \\mathcal{X}^{\\prime}$.\n\n:::\n\nIn the context of Algorithmic Recourse, it makes sense to strive for plausible counterfactuals, since anything else would essentially require individuals to move to out-of-distribution states. But it is worth noting that our ambition to meet this goal, may have implications on our ability to faithfully explain the behaviour of the underlying black-box model (arguably our principal goal). By essentially decoupling the task of learning plausible representations of the data from the model itself, we open ourselves up to vulnerabilities. Using a separate generative model to learn $\\mathcal{X}$, for example, has very serious implications for the generated counterfactuals. @fig-latent compares the results of applying REVISE [@joshi2019realistic] to MNIST data using two different Variational Auto-Encoders: while the counterfactual generated using an expressive (strong) VAE is compelling, the result relying on a less expressive (weak) VAE is not even valid. In this latter case, the decoder step of the VAE fails to yield values in $\\mathcal{X}$ and hence the counterfactual search in the learned latent space is doomed. \n\n{#fig-latent}\n\n> Here it would be nice to have another example where we poison the data going into the generative model to hide biases present in the data (e.g. 
Boston housing).\n\n- Latent can be manipulated: \n - train biased model\n - train VAE with biased variable removed/attacked (use Boston housing dataset)\n - hypothesis: will generate bias-free explanations\n\n### From Plausible to High-Fidelity Counterfactuals {#sec-fidelity}\n\nIn light of the findings, we propose to generally avoid using surrogate models to learn $\\mathcal{X}$ in the context of Counterfactual Explanations.\n\n::: {#prp-surrogate}\n\n## Avoid Surrogates\n\nSince we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.\n\n:::\n\nIn cases where the use of surrogate models cannot be avoided, we propose to weigh the plausibility of counterfactuals against their fidelity to the black-box model. In the context of Explainable AI, fidelity is defined as describing how an explanation approximates the prediction of the black-box model [@molnar2020interpretable]. Fidelity has become the default metric for evaluating Local Model-Agnostic Models, since they often involve local surrogate models whose predictions need not always match those of the black-box model. \n\nIn the case of Counterfactual Explanations, the concept of fidelity has so far been ignored. This is not altogether surprising, since by construction and design, Counterfactual Explanations work with the predictions of the black-box model directly: as stated above, a counterfactual $x^{\\prime}$ is considered valid if and only if $f(x^{\\prime})=t$, where $t$ denote some target outcome. \n\nDoes fidelity even make sense in the context of CE, and if so, how can we define it? In light of the examples in the previous section, we think it is urgent to introduce a notion of fidelity in this context, that relates to the distributional properties of the generated counterfactuals. In particular, we propose that a high-fidelity counterfactual $x^{\\prime}$ complies with the class-conditional distribution $\\mathcal{X}_{\\theta} = p_{\\theta}(X|y)$ where $\\theta$ denote the black-box model parameters. \n\n::: {#def-fidele}\n\n## High-Fidelity Counterfactuals\n\nLet $\\mathcal{X}_{\\theta}|y = p_{\\theta}(X|y)$ denote the class-conditional distribution of $X$ defined by $\\theta$. Then for $x^{\\prime}$ to be considered a high-fidelity counterfactual, we need: $\\mathcal{X}_{\\theta}|t \\approxeq \\mathcal{X}^{\\prime}$ where $t$ denotes the target outcome.\n\n:::\n\nIn order to assess the fidelity of counterfactuals, we propose the following two-step procedure:\n\n1) Generate samples $X_{\\theta}|y$ and $X^{\\prime}$ from $\\mathcal{X}_{\\theta}|t$ and $\\mathcal{X}^{\\prime}$, respectively.\n2) Compute the Maximum Mean Discrepancy (MMD) between $X_{\\theta}|y$ and $X^{\\prime}$. \n\nIf the computed value is different from zero, we can reject the null-hypothesis of fidelity.\n\n> Two challenges here: 1) implementing the sampling procedure in @grathwohl2020your; 2) it is unclear if MMD is really the right way to measure this. \n\n## Conformal Counterfactual Explanations\n\nIn @sec-fidelity, we have advocated for avoiding surrogate models in the context of Counterfactual Explanations. In this section, we introduce an alternative way to generate high-fidelity Counterfactual Explanations. In particular, we propose Conformal Counterfactual Explanations (CCE), that is Counterfactual Explanations that minimize the predictive uncertainty of conformal models. 
\n\n### Minimizing Predictive Uncertainty\n\n@schut2021generating demonstrated that the goal of generating realistic (plausible) counterfactuals can also be achieved by seeking counterfactuals that minimize the predictive uncertainty of the underlying black-box model. Similarly, @antoran2020getting ...\n\n- Problem: restricted to Bayesian models.\n- Solution: post-hoc predictive uncertainty quantification. In particular, Conformal Prediction. \n\n### Background on Conformal Prediction\n\n- Distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification.\n- Conformal prediction is instance-based. So is CE. \n- Take any fitted model and turn it into a conformal model using calibration data.\n- Our approach, therefore, relaxes the restriction on the family of black-box models, at the cost of relying on a subset of the data. Arguably, data is often abundant and in most applications practitioners tend to hold out a test data set anyway. \n\n> Does the coverage guarantee carry over to counterfactuals?\n\n### Generating Conformal Counterfactuals\n\nWhile Conformal Prediction has recently grown in popularity, it does introduce a challenge in the context of classification: the predictions of Conformal Classifiers are set-valued and therefore difficult to work with, since they are, for example, non-differentiable. Fortunately, @stutz2022learning introduced carefully designed differentiable loss functions that make it possible to evaluate the performance of conformal predictions in training. We can leverage these recent advances in the context of gradient-based counterfactual search ...\n\n> Challenge: still need to implement these loss functions. \n\n## Experiments\n\n### Research Questions\n\n- Is CP alone enough to ensure realistic counterfactuals?\n- Do counterfactuals improve further as the models get better?\n- Do counterfactuals get more realistic as coverage\n- What happens as we vary coverage and setsize?\n- What happens as we improve the model robustness?\n- What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensemble, laplace)?\n- What happens if we combine with DiCE, ClaPROAR, Gravitational?\n- What about CE robustness to endogenous shifts [@altmeyer2023endogenous]?\n\n- Benchmarking:\n - add PROBE [@pawelczyk2022probabilistically] into the mix.\n - compare travel costs to domain shits.\n\n> Nice to have: What about using Laplace Approximation, then Conformal Prediction? What about using Conformalised Laplace? \n\n## References\n\n", "supporting": [ "proposal_files/figure-html" ], diff --git a/bib.bib b/bib.bib index 1b1dca6bd5a502a042e3f4f22c7b5d51e0bf6afb..6aab437ec5f1041e804a49acd56ad55a1ff8a5b2 100644 --- a/bib.bib +++ b/bib.bib @@ -2403,4 +2403,26 @@ shorttitle = {Probabilistically {Robust} {Recourse}}, } +@InProceedings{stutz2022learning, + author = {Stutz, David and Dvijotham, Krishnamurthy Dj and Cemgil, Ali Taylan and Doucet, Arnaud}, + date = {2022-05}, + title = {Learning {Optimal} {Conformal} {Classifiers}}, + language = {en}, + url = {https://openreview.net/forum?id=t8O-4LKFVx}, + urldate = {2023-02-13}, + abstract = {Modern deep learning based classifiers show very high accuracy on test data but this does not provide sufficient guarantees for safe deployment, especially in high-stake AI applications such as medical diagnosis. Usually, predictions are obtained without a reliable uncertainty estimate or a formal guarantee. 
Conformal prediction (CP) addresses these issues by using the classifier's predictions, e.g., its probability estimates, to predict confidence sets containing the true class with a user-specified probability. However, using CP as a separate processing step after training prevents the underlying model from adapting to the prediction of confidence sets. Thus, this paper explores strategies to differentiate through CP during training with the goal of training model with the conformal wrapper end-to-end. In our approach, conformal training (ConfTr), we specifically "simulate" conformalization on mini-batches during training. Compared to standard training, ConfTr reduces the average confidence set size (inefficiency) of state-of-the-art CP methods applied after training. Moreover, it allows to "shape" the confidence sets predicted at test time, which is difficult for standard CP. On experiments with several datasets, we show ConfTr can influence how inefficiency is distributed across classes, or guide the composition of confidence sets in terms of the included classes, while retaining the guarantees offered by CP.}, + file = {:stutz2022learning - Learning Optimal Conformal Classifiers.pdf:PDF}, +} + +@InProceedings{grathwohl2020your, + author = {Grathwohl, Will and Wang, Kuan-Chieh and Jacobsen, Joern-Henrik and Duvenaud, David and Norouzi, Mohammad and Swersky, Kevin}, + date = {2020-03}, + title = {Your classifier is secretly an energy based model and you should treat it like one}, + language = {en}, + url = {https://openreview.net/forum?id=Hkxzx0NtDB}, + urldate = {2023-02-13}, + abstract = {We propose to reinterpret a standard discriminative classifier of p(y{\textbar}x) as an energy based model for the joint distribution p(x, y). In this setting, the standard class probabilities can be easily computed as well as unnormalized values of p(x) and p(x{\textbar}y). Within this framework, standard discriminative architectures may be used and the model can also be trained on unlabeled data. We demonstrate that energy based training of the joint distribution improves calibration, robustness, and out-of-distribution detection while also enabling our models to generate samples rivaling the quality of recent GAN approaches. We improve upon recently proposed techniques for scaling up the training of energy based models and present an approach which adds little overhead compared to standard classification training. 
Our approach is the first to achieve performance rivaling the state-of-the-art in both generative and discriminative learning within one hybrid model.}, + file = {:grathwohl2020your - Your Classifier Is Secretly an Energy Based Model and You Should Treat It like One.pdf:PDF}, +} + @Comment{jabref-meta: databaseType:biblatex;} diff --git a/build/dev/proposal.html b/build/dev/proposal.html index eea1c51fdb44093cd8598de4d0afbc624a710e4f..afcdf81474b13c00302a378341998419caace1f0 100644 --- a/build/dev/proposal.html +++ b/build/dev/proposal.html @@ -7,7 +7,7 @@ <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes"> -<title>Conformal Counterfactual Explanations</title> +<title>High-Fidelity Counterfactual Explanations through Conformal Prediction</title> <style> code{white-space: pre-wrap;} span.smallcaps{font-variant: small-caps;} @@ -67,18 +67,19 @@ div.csl-indent { <ul> <li><a href="#motivation" id="toc-motivation" class="nav-link active" data-scroll-target="#motivation">Motivation</a> <ul class="collapse"> - <li><a href="#from-adversarial-examples-to-counterfactual-explanations" id="toc-from-adversarial-examples-to-counterfactual-explanations" class="nav-link" data-scroll-target="#from-adversarial-examples-to-counterfactual-explanations">From Adversarial Examples to Counterfactual Explanations</a></li> + <li><a href="#counterfactual-explanations-or-adversarial-examples" id="toc-counterfactual-explanations-or-adversarial-examples" class="nav-link" data-scroll-target="#counterfactual-explanations-or-adversarial-examples">Counterfactual Explanations or Adversarial Examples?</a></li> + <li><a href="#sec-fidelity" id="toc-sec-fidelity" class="nav-link" data-scroll-target="#sec-fidelity">From Plausible to High-Fidelity Counterfactuals</a></li> </ul></li> - <li><a href="#introduction-to-conformal-prediction" id="toc-introduction-to-conformal-prediction" class="nav-link" data-scroll-target="#introduction-to-conformal-prediction">Introduction to Conformal Prediction</a> + <li><a href="#conformal-counterfactual-explanations" id="toc-conformal-counterfactual-explanations" class="nav-link" data-scroll-target="#conformal-counterfactual-explanations">Conformal Counterfactual Explanations</a> <ul class="collapse"> - <li><a href="#post-hoc" id="toc-post-hoc" class="nav-link" data-scroll-target="#post-hoc">Post-hoc</a></li> - <li><a href="#intrinsic-conformal-training-maybe" id="toc-intrinsic-conformal-training-maybe" class="nav-link" data-scroll-target="#intrinsic-conformal-training-maybe">Intrinsic — Conformal Training [MAYBE]</a></li> + <li><a href="#minimizing-predictive-uncertainty" id="toc-minimizing-predictive-uncertainty" class="nav-link" data-scroll-target="#minimizing-predictive-uncertainty">Minimizing Predictive Uncertainty</a></li> + <li><a href="#background-on-conformal-prediction" id="toc-background-on-conformal-prediction" class="nav-link" data-scroll-target="#background-on-conformal-prediction">Background on Conformal Prediction</a></li> + <li><a href="#generating-conformal-counterfactuals" id="toc-generating-conformal-counterfactuals" class="nav-link" data-scroll-target="#generating-conformal-counterfactuals">Generating Conformal Counterfactuals</a></li> </ul></li> - <li><a href="#conformal-counterfactuals" id="toc-conformal-counterfactuals" class="nav-link" data-scroll-target="#conformal-counterfactuals">Conformal Counterfactuals</a> + <li><a href="#experiments" id="toc-experiments" class="nav-link" data-scroll-target="#experiments">Experiments</a> <ul 
class="collapse"> <li><a href="#research-questions" id="toc-research-questions" class="nav-link" data-scroll-target="#research-questions">Research Questions</a></li> </ul></li> - <li><a href="#experiments" id="toc-experiments" class="nav-link" data-scroll-target="#experiments">Experiments</a></li> <li><a href="#references" id="toc-references" class="nav-link" data-scroll-target="#references">References</a></li> </ul> </nav> @@ -87,7 +88,7 @@ div.csl-indent { <header id="title-block-header" class="quarto-title-block default"> <div class="quarto-title"> -<h1 class="title">Conformal Counterfactual Explanations</h1> +<h1 class="title">High-Fidelity Counterfactual Explanations through Conformal Prediction</h1> <p class="subtitle lead">Research Proposal</p> </div> @@ -103,7 +104,7 @@ div.csl-indent { <div> <div class="abstract"> <div class="abstract-title">Abstract</div> - <p>We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data generating process. While this an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces an significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itsel to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.</p> + <p>We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion.</p> </div> </div> @@ -111,17 +112,31 @@ div.csl-indent { <section id="motivation" class="level2"> <h2 class="anchored" data-anchor-id="motivation">Motivation</h2> -<p>Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models, but also enable affected individuals to challenge them though the means of Algorithmic Recourse.</p> -<section id="from-adversarial-examples-to-counterfactual-explanations" class="level3"> -<h3 class="anchored" data-anchor-id="from-adversarial-examples-to-counterfactual-explanations">From Adversarial Examples to Counterfactual Explanations</h3> -<p>Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. 
The key idea is to perturb inputs <span class="math inline">\(x\in\mathcal{X}\)</span> into a black-box model <span class="math inline">\(f: \mathcal{X} \mapsto \mathcal{Y}\)</span> in order to change the model output <span class="math inline">\(f(x)\)</span> to some pre-specified target value <span class="math inline">\(t\in\mathcal{Y}\)</span>. Formally, this boils down to defining some loss function <span class="math inline">\(\ell(f(x),t)\)</span> and taking gradient steps in the minimizing direction. The so generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped down counterfactual explanation is therefore little different from an adversarial example.</p> +<p>Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models but also enable affected individuals to challenge them through the means of Algorithmic Recourse.</p> +<section id="counterfactual-explanations-or-adversarial-examples" class="level3"> +<h3 class="anchored" data-anchor-id="counterfactual-explanations-or-adversarial-examples">Counterfactual Explanations or Adversarial Examples?</h3> +<p>Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs <span class="math inline">\(x\in\mathcal{X}\)</span> into a black-box model <span class="math inline">\(f: \mathcal{X} \mapsto \mathcal{Y}\)</span> in order to change the model output <span class="math inline">\(f(x)\)</span> to some pre-specified target value <span class="math inline">\(t\in\mathcal{Y}\)</span>. Formally, this boils down to defining some loss function <span class="math inline">\(\ell(f(x),t)\)</span> and taking gradient steps in the minimizing direction. The so-generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In <a href="#fig-adv">Figure 1</a>, for example, generic counterfactual search as in <span class="citation" data-cites="wachter2017counterfactual">Wachter, Mittelstadt, and Russell (<a href="#ref-wachter2017counterfactual" role="doc-biblioref">2017</a>)</span> has been applied to MNIST data.</p> +<div id="fig-adv" class="quarto-figure quarto-figure-center anchored"> +<figure class="figure"> +<p><img src="www/you_may_not_like_it.png" class="img-fluid figure-img"></p> +<p></p><figcaption class="figure-caption">Figure 1: You may not like it, but this is what stripped-down counterfactuals look like. Here we have used <span class="citation" data-cites="wachter2017counterfactual">Wachter, Mittelstadt, and Russell (<a href="#ref-wachter2017counterfactual" role="doc-biblioref">2017</a>)</span> to generate multiple counterfactuals for turning an 8 (eight) into a 3 (three).</figcaption><p></p> +</figure> +</div> +<p>The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be “plausibleâ€, “realistic†or “feasibleâ€. To fulfil this latter goal, researchers have come up with a myriad of ways. <span class="citation" data-cites="joshi2019realistic">Joshi et al. 
(<a href="#ref-joshi2019realistic" role="doc-biblioref">2019</a>)</span> were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, <span class="citation" data-cites="poyiadzi2020face">Poyiadzi et al. (<a href="#ref-poyiadzi2020face" role="doc-biblioref">2020</a>)</span> use density … Finally, <span class="citation" data-cites="karimi2021algorithmic">Karimi, Schölkopf, and Valera (<a href="#ref-karimi2021algorithmic" role="doc-biblioref">2021</a>)</span> argues that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHASE THIS LIKE THIS]. Other related approaches include … All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGB).</p> +<div id="def-plausible" class="theorem definition"> +<p><span class="theorem-title"><strong>Definition 1 (Plausible Counterfactuals) </strong></span>Formally, if <span class="math inline">\(x \sim \mathcal{X}\)</span> and for the corresponding counterfactual we have <span class="math inline">\(x^{\prime}\sim\mathcal{X}^{\prime}\)</span>, then for <span class="math inline">\(x^{\prime}\)</span> to be considered a plausible counterfactual, we need: <span class="math inline">\(\mathcal{X} \approxeq \mathcal{X}^{\prime}\)</span>.</p> +</div> +<p>In the context of Algorithmic Recourse, it makes sense to strive for plausible counterfactuals, since anything else would essentially require individuals to move to out-of-distribution states. But it is worth noting that our ambition to meet this goal, may have implications on our ability to faithfully explain the behaviour of the underlying black-box model (arguably our principal goal). By essentially decoupling the task of learning plausible representations of the data from the model itself, we open ourselves up to vulnerabilities. Using a separate generative model to learn <span class="math inline">\(\mathcal{X}\)</span>, for example, has very serious implications for the generated counterfactuals. <a href="#fig-latent">Figure 2</a> compares the results of applying REVISE <span class="citation" data-cites="joshi2019realistic">(<a href="#ref-joshi2019realistic" role="doc-biblioref">Joshi et al. 2019</a>)</span> to MNIST data using two different Variational Auto-Encoders: while the counterfactual generated using an expressive (strong) VAE is compelling, the result relying on a less expressive (weak) VAE is not even valid. In this latter case, the decoder step of the VAE fails to yield values in <span class="math inline">\(\mathcal{X}\)</span> and hence the counterfactual search in the learned latent space is doomed.</p> +<div id="fig-latent" class="quarto-figure quarto-figure-center anchored"> +<figure class="figure"> +<p><img src="www/mnist_9to4_latent.png" class="img-fluid figure-img"></p> +<p></p><figcaption class="figure-caption">Figure 2: Counterfactual explanations for MNIST using a Latent Space generator: turning a nine (9) into a four (4).</figcaption><p></p> +</figure> +</div> <blockquote class="blockquote"> -<p>You may not like it, but this is what counterfactuals look like</p> +<p>Here it would be nice to have another example where we poison the data going into the generative model to hide biases present in the data (e.g. 
Boston housing).</p> </blockquote> -<p>The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intened to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be “plausible†or “realisticâ€. To fulfill this latter goal, researchers have come up with a myriad of ways. <span class="citation" data-cites="joshi2019realistic">Joshi et al. (<a href="#ref-joshi2019realistic" role="doc-biblioref">2019</a>)</span> were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. This ensures that the generated counterfactuals comply with the (learned) data-generating process (DGB). Similarly, <span class="citation" data-cites="poyiadzi2020face">Poyiadzi et al. (<a href="#ref-poyiadzi2020face" role="doc-biblioref">2020</a>)</span> use density …</p> <ul> -<li>Show DiCE for weak MLP</li> -<li>Show Latent for same weak MLP</li> <li>Latent can be manipulated: <ul> <li>train biased model</li> @@ -129,60 +144,84 @@ div.csl-indent { <li>hypothesis: will generate bias-free explanations</li> </ul></li> </ul> +</section> +<section id="sec-fidelity" class="level3"> +<h3 class="anchored" data-anchor-id="sec-fidelity">From Plausible to High-Fidelity Counterfactuals</h3> +<p>In light of the findings, we propose to generally avoid using surrogate models to learn <span class="math inline">\(\mathcal{X}\)</span> in the context of Counterfactual Explanations.</p> <div id="prp-surrogate" class="theorem proposition"> <p><span class="theorem-title"><strong>Proposition 1 (Avoid Surrogates) </strong></span>Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model.</p> </div> +<p>In cases where the use of surrogate models cannot be avoided, we propose to weigh the plausibility of counterfactuals against their fidelity to the black-box model. In the context of Explainable AI, fidelity is defined as describing how an explanation approximates the prediction of the black-box model <span class="citation" data-cites="molnar2020interpretable">(<a href="#ref-molnar2020interpretable" role="doc-biblioref">Molnar 2020</a>)</span>. Fidelity has become the default metric for evaluating Local Model-Agnostic Models, since they often involve local surrogate models whose predictions need not always match those of the black-box model.</p> +<p>In the case of Counterfactual Explanations, the concept of fidelity has so far been ignored. This is not altogether surprising, since by construction and design, Counterfactual Explanations work with the predictions of the black-box model directly: as stated above, a counterfactual <span class="math inline">\(x^{\prime}\)</span> is considered valid if and only if <span class="math inline">\(f(x^{\prime})=t\)</span>, where <span class="math inline">\(t\)</span> denote some target outcome.</p> +<p>Does fidelity even make sense in the context of CE, and if so, how can we define it? In light of the examples in the previous section, we think it is urgent to introduce a notion of fidelity in this context, that relates to the distributional properties of the generated counterfactuals. 
In particular, we propose that a high-fidelity counterfactual <span class="math inline">\(x^{\prime}\)</span> complies with the class-conditional distribution <span class="math inline">\(\mathcal{X}_{\theta} = p_{\theta}(X|y)\)</span> where <span class="math inline">\(\theta\)</span> denote the black-box model parameters.</p> +<div id="def-fidele" class="theorem definition"> +<p><span class="theorem-title"><strong>Definition 2 (High-Fidelity Counterfactuals) </strong></span>Let <span class="math inline">\(\mathcal{X}_{\theta}|y = p_{\theta}(X|y)\)</span> denote the class-conditional distribution of <span class="math inline">\(X\)</span> defined by <span class="math inline">\(\theta\)</span>. Then for <span class="math inline">\(x^{\prime}\)</span> to be considered a high-fidelity counterfactual, we need: <span class="math inline">\(\mathcal{X}_{\theta}|t \approxeq \mathcal{X}^{\prime}\)</span> where <span class="math inline">\(t\)</span> denotes the target outcome.</p> +</div> +<p>In order to assess the fidelity of counterfactuals, we propose the following two-step procedure:</p> +<ol type="1"> +<li>Generate samples <span class="math inline">\(X_{\theta}|y\)</span> and <span class="math inline">\(X^{\prime}\)</span> from <span class="math inline">\(\mathcal{X}_{\theta}|t\)</span> and <span class="math inline">\(\mathcal{X}^{\prime}\)</span>, respectively.</li> +<li>Compute the Maximum Mean Discrepancy (MMD) between <span class="math inline">\(X_{\theta}|y\)</span> and <span class="math inline">\(X^{\prime}\)</span>.</li> +</ol> +<p>If the computed value is different from zero, we can reject the null-hypothesis of fidelity.</p> +<blockquote class="blockquote"> +<p>Two challenges here: 1) implementing the sampling procedure in <span class="citation" data-cites="grathwohl2020your">Grathwohl et al. (<a href="#ref-grathwohl2020your" role="doc-biblioref">2020</a>)</span>; 2) it is unclear if MMD is really the right way to measure this.</p> +</blockquote> </section> </section> -<section id="introduction-to-conformal-prediction" class="level2"> -<h2 class="anchored" data-anchor-id="introduction-to-conformal-prediction">Introduction to Conformal Prediction</h2> -<ul> -<li>distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification</li> -</ul> -<section id="post-hoc" class="level3"> -<h3 class="anchored" data-anchor-id="post-hoc">Post-hoc</h3> -<ul> -<li>Take any fitted model and turn it into a conformal model using calibration data.</li> -</ul> -</section> -<section id="intrinsic-conformal-training-maybe" class="level3"> -<h3 class="anchored" data-anchor-id="intrinsic-conformal-training-maybe">Intrinsic — Conformal Training [MAYBE]</h3> +<section id="conformal-counterfactual-explanations" class="level2"> +<h2 class="anchored" data-anchor-id="conformal-counterfactual-explanations">Conformal Counterfactual Explanations</h2> +<p>In <a href="#sec-fidelity">Section 1.2</a>, we have advocated for avoiding surrogate models in the context of Counterfactual Explanations. In this section, we introduce an alternative way to generate high-fidelity Counterfactual Explanations. 
In particular, we propose Conformal Counterfactual Explanations (CCE), that is Counterfactual Explanations that minimize the predictive uncertainty of conformal models.</p> +<section id="minimizing-predictive-uncertainty" class="level3"> +<h3 class="anchored" data-anchor-id="minimizing-predictive-uncertainty">Minimizing Predictive Uncertainty</h3> +<p><span class="citation" data-cites="schut2021generating">Schut et al. (<a href="#ref-schut2021generating" role="doc-biblioref">2021</a>)</span> demonstrated that the goal of generating realistic (plausible) counterfactuals can also be achieved by seeking counterfactuals that minimize the predictive uncertainty of the underlying black-box model. Similarly, <span class="citation" data-cites="antoran2020getting">Antorán et al. (<a href="#ref-antoran2020getting" role="doc-biblioref">2020</a>)</span> …</p> <ul> -<li>Model explicitly trained for conformal prediction.</li> +<li>Problem: restricted to Bayesian models.</li> +<li>Solution: post-hoc predictive uncertainty quantification. In particular, Conformal Prediction.</li> </ul> </section> -</section> -<section id="conformal-counterfactuals" class="level2"> -<h2 class="anchored" data-anchor-id="conformal-counterfactuals">Conformal Counterfactuals</h2> +<section id="background-on-conformal-prediction" class="level3"> +<h3 class="anchored" data-anchor-id="background-on-conformal-prediction">Background on Conformal Prediction</h3> <ul> -<li>Realistic counterfactuals by minimizing predictive uncertainty <span class="citation" data-cites="schut2021generating">(<a href="#ref-schut2021generating" role="doc-biblioref">Schut et al. 2021</a>)</span>.</li> -<li>Problem: restricted to Bayesian models.</li> -<li>Solution: post-hoc predictive uncertainty quantification.</li> +<li>Distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification.</li> <li>Conformal prediction is instance-based. So is CE.</li> -<li>Does the coverage guarantee carry over to counterfactuals?</li> -</ul> -<section id="research-questions" class="level3"> -<h3 class="anchored" data-anchor-id="research-questions">Research Questions</h3> -<ul> -<li>Is CP alone enough to ensure realistic counterfactuals?</li> -<li>Do counterfactuals improve further as the models get better?</li> -<li>Do counterfactuals get more realistic as coverage</li> -<li>What happens as we vary coverage and setsize?</li> -<li>What happens as we improve the model robustness?</li> -<li>What happens as we improve the model’s ability to incorporate predictive uncertainty (deep ensemble, laplace)?</li> +<li>Take any fitted model and turn it into a conformal model using calibration data.</li> +<li>Our approach, therefore, relaxes the restriction on the family of black-box models, at the cost of relying on a subset of the data. Arguably, data is often abundant and in most applications practitioners tend to hold out a test data set anyway.</li> </ul> +<blockquote class="blockquote"> +<p>Does the coverage guarantee carry over to counterfactuals?</p> +</blockquote> +</section> +<section id="generating-conformal-counterfactuals" class="level3"> +<h3 class="anchored" data-anchor-id="generating-conformal-counterfactuals">Generating Conformal Counterfactuals</h3> +<p>While Conformal Prediction has recently grown in popularity, it does introduce a challenge in the context of classification: the predictions of Conformal Classifiers are set-valued and therefore difficult to work with, since they are, for example, non-differentiable. 
Fortunately, <span class="citation" data-cites="stutz2022learning">Stutz et al. (<a href="#ref-stutz2022learning" role="doc-biblioref">2022</a>)</span> introduced carefully designed differentiable loss functions that make it possible to evaluate the performance of conformal predictions in training. We can leverage these recent advances in the context of gradient-based counterfactual search …</p> +<blockquote class="blockquote"> +<p>Challenge: still need to implement these loss functions.</p> +</blockquote> </section> </section> <section id="experiments" class="level2"> <h2 class="anchored" data-anchor-id="experiments">Experiments</h2> +<section id="research-questions" class="level3"> +<h3 class="anchored" data-anchor-id="research-questions">Research Questions</h3> <ul> -<li>Maybe: conformalised Laplace</li> -<li>Benchmarking: +<li><p>Is CP alone enough to ensure realistic counterfactuals?</p></li> +<li><p>Do counterfactuals improve further as the models get better?</p></li> +<li><p>Do counterfactuals get more realistic as coverage</p></li> +<li><p>What happens as we vary coverage and setsize?</p></li> +<li><p>What happens as we improve the model robustness?</p></li> +<li><p>What happens as we improve the model’s ability to incorporate predictive uncertainty (deep ensemble, laplace)?</p></li> +<li><p>What happens if we combine with DiCE, ClaPROAR, Gravitational?</p></li> +<li><p>What about CE robustness to endogenous shifts <span class="citation" data-cites="altmeyer2023endogenous">(<a href="#ref-altmeyer2023endogenous" role="doc-biblioref">Altmeyer et al. 2023</a>)</span>?</p></li> +<li><p>Benchmarking:</p> <ul> -<li>add PROBE into the mix</li> +<li>add PROBE <span class="citation" data-cites="pawelczyk2022probabilistically">(<a href="#ref-pawelczyk2022probabilistically" role="doc-biblioref">Pawelczyk et al. 2022</a>)</span> into the mix.</li> <li>compare travel costs to domain shits.</li> </ul></li> </ul> +<blockquote class="blockquote"> +<p>Nice to have: What about using Laplace Approximation, then Conformal Prediction? What about using Conformalised Laplace?</p> +</blockquote> +</section> </section> <section id="references" class="level2 unnumbered"> @@ -190,15 +229,39 @@ div.csl-indent { </section> <div id="quarto-appendix" class="default"><section class="quarto-appendix-contents" role="doc-bibliography"><h2 class="anchored quarto-appendix-heading">References</h2><div id="refs" class="references csl-bib-body hanging-indent" role="doc-bibliography"> +<div id="ref-altmeyer2023endogenous" class="csl-entry" role="doc-biblioentry"> +Altmeyer, Patrick, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, and Cynthia Liem. 2023. <span>“Endogenous <span>Macrodynamics</span> in <span>Algorithmic</span> <span>Recourse</span>.â€</span> In <em>First <span>IEEE</span> <span>Conference</span> on <span>Secure</span> and <span>Trustworthy</span> <span>Machine</span> <span>Learning</span></em>. +</div> +<div id="ref-antoran2020getting" class="csl-entry" role="doc-biblioentry"> +Antorán, Javier, Umang Bhatt, Tameem Adel, Adrian Weller, and José Miguel Hernández-Lobato. 2020. <span>“Getting a Clue: <span>A</span> Method for Explaining Uncertainty Estimates.â€</span> <a href="https://arxiv.org/abs/2006.06848">https://arxiv.org/abs/2006.06848</a>. +</div> +<div id="ref-grathwohl2020your" class="csl-entry" role="doc-biblioentry"> +Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. 
<span>“Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.â€</span> In. <a href="https://openreview.net/forum?id=Hkxzx0NtDB">https://openreview.net/forum?id=Hkxzx0NtDB</a>. +</div> <div id="ref-joshi2019realistic" class="csl-entry" role="doc-biblioentry"> Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. <span>“Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.â€</span> <a href="https://arxiv.org/abs/1907.09615">https://arxiv.org/abs/1907.09615</a>. </div> +<div id="ref-karimi2021algorithmic" class="csl-entry" role="doc-biblioentry"> +Karimi, Amir-Hossein, Bernhard Schölkopf, and Isabel Valera. 2021. <span>“Algorithmic Recourse: From Counterfactual Explanations to Interventions.â€</span> In <em>Proceedings of the 2021 <span>ACM Conference</span> on <span>Fairness</span>, <span>Accountability</span>, and <span>Transparency</span></em>, 353–62. +</div> +<div id="ref-molnar2020interpretable" class="csl-entry" role="doc-biblioentry"> +Molnar, Christoph. 2020. <em>Interpretable Machine Learning</em>. <span>Lulu. com</span>. +</div> +<div id="ref-pawelczyk2022probabilistically" class="csl-entry" role="doc-biblioentry"> +Pawelczyk, Martin, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, and Himabindu Lakkaraju. 2022. <span>“Probabilistically <span>Robust</span> <span>Recourse</span>: <span>Navigating</span> the <span>Trade</span>-Offs Between <span>Costs</span> and <span>Robustness</span> in <span>Algorithmic</span> <span>Recourse</span>.â€</span> <em>arXiv Preprint arXiv:2203.06768</em>. +</div> <div id="ref-poyiadzi2020face" class="csl-entry" role="doc-biblioentry"> Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. 2020. <span>“<span>FACE</span>: <span>Feasible</span> and Actionable Counterfactual Explanations.â€</span> In <em>Proceedings of the <span>AAAI</span>/<span>ACM Conference</span> on <span>AI</span>, <span>Ethics</span>, and <span>Society</span></em>, 344–50. </div> <div id="ref-schut2021generating" class="csl-entry" role="doc-biblioentry"> Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. <span>“Generating <span>Interpretable Counterfactual Explanations By Implicit Minimisation</span> of <span>Epistemic</span> and <span>Aleatoric Uncertainties</span>.â€</span> In <em>International <span>Conference</span> on <span>Artificial Intelligence</span> and <span>Statistics</span></em>, 1756–64. <span>PMLR</span>. </div> +<div id="ref-stutz2022learning" class="csl-entry" role="doc-biblioentry"> +Stutz, David, Krishnamurthy Dj Dvijotham, Ali Taylan Cemgil, and Arnaud Doucet. 2022. <span>“Learning <span>Optimal</span> <span>Conformal</span> <span>Classifiers</span>.â€</span> In. <a href="https://openreview.net/forum?id=t8O-4LKFVx">https://openreview.net/forum?id=t8O-4LKFVx</a>. +</div> +<div id="ref-wachter2017counterfactual" class="csl-entry" role="doc-biblioentry"> +Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. <span>“Counterfactual Explanations Without Opening the Black Box: <span>Automated</span> Decisions and the <span>GDPR</span>.â€</span> <em>Harv. JL & Tech.</em> 31: 841. 
+</div> </div></section></div></main> <!-- /main column --> <script id="quarto-html-after-body" type="application/javascript"> diff --git a/build/dev/www/mnist_9to4_latent.png b/build/dev/www/mnist_9to4_latent.png new file mode 100644 index 0000000000000000000000000000000000000000..8f6b3a53c7295ef3678d841e79dc5e0bd38f6e3c Binary files /dev/null and b/build/dev/www/mnist_9to4_latent.png differ diff --git a/build/dev/www/you_may_not_like_it.png b/build/dev/www/you_may_not_like_it.png new file mode 100644 index 0000000000000000000000000000000000000000..4b48e585ebcd285c9aa0b2bfabe1c1ff627bbbba Binary files /dev/null and b/build/dev/www/you_may_not_like_it.png differ diff --git a/dev/proposal.qmd b/dev/proposal.qmd index 0987858cbce35735b33c08560de62100851f91ee..08bdb0e73a427b874d46f3b7a2790ef8cb22bcec 100644 --- a/dev/proposal.qmd +++ b/dev/proposal.qmd @@ -1,41 +1,100 @@ --- -title: Conformal Counterfactual Explanations +title: High-Fidelity Counterfactual Explanations through Conformal Prediction subtitle: Research Proposal abstract: | - We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data generating process. While this an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces an significant engineering overhead, but also reallocates the task of creating realistic model explanations from the model itsel to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion. + We propose Conformal Counterfactual Explanations: an effortless and rigorous way to produce realistic and faithful Counterfactual Explanations using Conformal Prediction. To address the need for realistic counterfactuals, existing work has primarily relied on separate generative models to learn the data-generating process. While this is an effective way to produce plausible and model-agnostic counterfactual explanations, it not only introduces a significant engineering overhead but also reallocates the task of creating realistic model explanations from the model itself to the generative model. Recent work has shown that there is no need for any of this when working with probabilistic models that explicitly quantify their own uncertainty. Unfortunately, most models used in practice still do not fulfil that basic requirement, in which case we would like to have a way to quantify predictive uncertainty in a post-hoc fashion. --- +```{julia} +using CounterfactualExplanations +using CounterfactualExplanations.Data: load_mnist +using CounterfactualExplanations.Models: load_mnist_mlp +using Images +using MLDatasets +using MLDatasets: convert2image +using Plots +www_path = "dev/www" +``` + ## Motivation -Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models, but also enable affected individuals to challenge them though the means of Algorithmic Recourse. 
+Counterfactual Explanations are a powerful, flexible and intuitive way to not only explain black-box models but also enable affected individuals to challenge them through the means of Algorithmic Recourse. ### Counterfactual Explanations or Adversarial Examples? -Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\in\mathcal{X}$ into a black-box model $f: \mathcal{X} \mapsto \mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\in\mathcal{Y}$. Formally, this boils down to defining some loss function $\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped down counterfactual explanation is therefore little different from an adversarial example. - -> You may not like it, but this is what counterfactuals look like ... - -The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intened to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be "plausible", "realistic" or "feasible". To fulfill this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, @poyiadzi2020face use density ... Finally, @karimi2021algorithmic argue that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHASE THIS LIKE THIS]. Other related approaches include ... All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGB). +Most state-of-the-art approaches to generating Counterfactual Explanations (CE) rely on gradient descent in the feature space. The key idea is to perturb inputs $x\in\mathcal{X}$ into a black-box model $f: \mathcal{X} \mapsto \mathcal{Y}$ in order to change the model output $f(x)$ to some pre-specified target value $t\in\mathcal{Y}$. Formally, this boils down to defining some loss function $\ell(f(x),t)$ and taking gradient steps in the minimizing direction. The so-generated counterfactuals are considered valid as soon as the predicted label matches the target label. A stripped-down counterfactual explanation is therefore little different from an adversarial example. In @fig-adv, for example, generic counterfactual search as in @wachter2017counterfactual has been applied to MNIST data. 
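+To make the generic search procedure concrete, the following is a minimal, self-contained sketch (illustrative only; it assumes a simple linear probabilistic classifier $f(x)=\sigma(w^\top x+b)$ with a cross-entropy loss, and all names are placeholders): starting from the factual $x$, we repeatedly take gradient steps on the input until the predicted label matches the target. The code cell below instead relies on `CounterfactualExplanations.jl` to generate the counterfactuals shown in @fig-adv.
+
+```julia
+# Minimal sketch of generic gradient-based counterfactual search (not the package implementation).
+# Binary linear classifier f(x) = σ(w' * x + b) with cross-entropy loss ℓ(f(x), t).
+σ(z) = 1 / (1 + exp(-z))
+
+function generic_counterfactual(x, w, b, t; η=0.1, maxiter=1000)
+    x′ = copy(x)
+    for _ in 1:maxiter
+        p = σ(w' * x′ + b)           # model output f(x′)
+        round(Int, p) == t && break  # valid as soon as the predicted label matches the target
+        ∇x′ = (p - t) .* w           # gradient of the cross-entropy loss with respect to the input
+        x′ = x′ .- η .* ∇x′          # gradient step in the minimizing direction
+    end
+    return x′
+end
+
+# Toy usage: perturb x until the model assigns the target label t = 1.
+w, b = [1.0, -2.0], 0.0
+x = [-1.0, 1.0]
+x′ = generic_counterfactual(x, w, b, 1)
+```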
+ +```{julia} +# Data: +counterfactual_data = load_mnist() +X, y = CounterfactualExplanations.DataPreprocessing.unpack_data(counterfactual_data) +input_dim, n_obs = size(counterfactual_data.X) +M = load_mnist_mlp() +# Target: +factual_label = 8 +x = reshape(X[:,rand(findall(predict_label(M, counterfactual_data).==factual_label))],input_dim,1) +target = 3 +factual = predict_label(M, counterfactual_data, x)[1] +# Search: +n_ce = 3 +generator = GenericGenerator() +ces = generate_counterfactual(x, target, counterfactual_data, M, generator; num_counterfactuals=n_ce) +``` + +```{julia} +image_size = 200 +p1 = plot( + convert2image(MNIST, reshape(x,28,28)), + axis=nothing, + size=(image_size, image_size), + title="Factual" +) +plts = [p1] + +counterfactuals = CounterfactualExplanations.counterfactual(ces) +phat = target_probs(ces) +for x in zip(eachslice(counterfactuals; dims=3), eachslice(phat; dims=3)) + ce, _phat = (x[1],x[2]) + _title = "p(y=$(target)|x′)=$(round(_phat[1]; digits=3))" + plt = plot( + convert2image(MNIST, reshape(ce,28,28)), + axis=nothing, + size=(image_size, image_size), + title=_title + ) + plts = [plts..., plt] +end +plt = plot(plts...; size=(image_size * (n_ce + 1),image_size), layout=(1,(n_ce + 1))) +savefig(plt, joinpath(www_path, "you_may_not_like_it.png")) +``` + +{#fig-adv} + +The crucial difference between adversarial examples and counterfactuals is one of intent. While adversarial examples are typically intended to go unnoticed, counterfactuals in the context of Explainable AI are generally sought to be "plausible", "realistic" or "feasible". To fulfil this latter goal, researchers have come up with a myriad of ways. @joshi2019realistic were among the first to suggest that instead of searching counterfactuals in the feature space, we can instead traverse a latent embedding learned by a surrogate generative model. Similarly, @poyiadzi2020face use density ... Finally, @karimi2021algorithmic argues that counterfactuals should comply with the causal model that generates them [CHECK IF WE CAN PHASE THIS LIKE THIS]. Other related approaches include ... All of these different approaches have a common goal: they aim to ensure that the generated counterfactuals comply with the (learned) data-generating process (DGB). ::: {#def-plausible} ## Plausible Counterfactuals -Since we are in the business of explaining a black-box model, the task of learning realistic representations of the data should not be reallocated from the model itself to some surrogate model. +Formally, if $x \sim \mathcal{X}$ and for the corresponding counterfactual we have $x^{\prime}\sim\mathcal{X}^{\prime}$, then for $x^{\prime}$ to be considered a plausible counterfactual, we need: $\mathcal{X} \approxeq \mathcal{X}^{\prime}$. ::: -Formally, if $x \sim \mathcal{X}$ and for the corresponding counterfactual we have $x^{\prime}\sim\mathcal{X}x^{\prime}$, then +In the context of Algorithmic Recourse, it makes sense to strive for plausible counterfactuals, since anything else would essentially require individuals to move to out-of-distribution states. But it is worth noting that our ambition to meet this goal, may have implications on our ability to faithfully explain the behaviour of the underlying black-box model (arguably our principal goal). By essentially decoupling the task of learning plausible representations of the data from the model itself, we open ourselves up to vulnerabilities. 
Using a separate generative model to learn $\mathcal{X}$, for example, has very serious implications for the generated counterfactuals. @fig-latent compares the results of applying REVISE [@joshi2019realistic] to MNIST data using two different Variational Auto-Encoders: while the counterfactual generated using an expressive (strong) VAE is compelling, the result relying on a less expressive (weak) VAE is not even valid. In this latter case, the decoder step of the VAE fails to yield values in $\mathcal{X}$ and hence the counterfactual search in the learned latent space is doomed.
-
+{#fig-latent}
+
+> Here it would be nice to have another example where we poison the data going into the generative model to hide biases present in the data (e.g. Boston housing).

-- Show DiCE for weak MLP
-- Show Latent for same weak MLP
 - Latent can be manipulated:
  - train biased model
  - train VAE with biased variable removed/attacked (use Boston housing dataset)
  - hypothesis: will generate bias-free explanations

+### From Plausible to High-Fidelity Counterfactuals {#sec-fidelity}
+
+In light of these findings, we propose to generally avoid using surrogate models to learn $\mathcal{X}$ in the context of Counterfactual Explanations.
+
 ::: {#prp-surrogate}

 ## Avoid Surrogates

@@ -44,25 +103,56 @@ Since we are in the business of explaining a black-box model, the task of learni

 :::

-## Introduction to Conformal Prediction
+In cases where the use of surrogate models cannot be avoided, we propose to weigh the plausibility of counterfactuals against their fidelity to the black-box model. In the context of Explainable AI, fidelity is defined as the degree to which an explanation approximates the prediction of the black-box model [@molnar2020interpretable]. Fidelity has become the default metric for evaluating Local Model-Agnostic Methods, since these often involve local surrogate models whose predictions need not always match those of the black-box model.

-- distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification
+In the case of Counterfactual Explanations, the concept of fidelity has so far been ignored. This is not altogether surprising, since by construction and design, Counterfactual Explanations work with the predictions of the black-box model directly: as stated above, a counterfactual $x^{\prime}$ is considered valid if and only if $f(x^{\prime})=t$, where $t$ denotes the target outcome.

-### Post-hoc
+Does fidelity even make sense in the context of CE, and if so, how can we define it? In light of the examples in the previous section, we think it is urgent to introduce a notion of fidelity in this context that relates to the distributional properties of the generated counterfactuals. In particular, we propose that a high-fidelity counterfactual $x^{\prime}$ complies with the class-conditional distribution $\mathcal{X}_{\theta}|y = p_{\theta}(X|y)$, where $\theta$ denotes the black-box model parameters.

-- Take any fitted model and turn it into a conformal model using calibration data.
+::: {#def-fidele}
+
+## High-Fidelity Counterfactuals
+
+Let $\mathcal{X}_{\theta}|y = p_{\theta}(X|y)$ denote the class-conditional distribution of $X$ defined by $\theta$. Then for $x^{\prime}$ to be considered a high-fidelity counterfactual, we need: $\mathcal{X}_{\theta}|t \approxeq \mathcal{X}^{\prime}$, where $t$ denotes the target outcome.
+
+:::
+
+In order to assess the fidelity of counterfactuals, we propose the following two-step procedure:
+
+1) Generate samples $X_{\theta}|t$ and $X^{\prime}$ from $\mathcal{X}_{\theta}|t$ and $\mathcal{X}^{\prime}$, respectively.
+2) Compute the Maximum Mean Discrepancy (MMD) between $X_{\theta}|t$ and $X^{\prime}$.
+
+If the computed value is significantly different from zero, we can reject the null hypothesis of fidelity.

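+
+As a starting point, the MMD step could be implemented from scratch along the lines sketched below. This is only a rough sketch under placeholder assumptions: the Gaussian kernel, its bandwidth `σ`, the seed and the toy samples are all illustrative, and a permutation test would still be needed to judge whether an estimate differs significantly from zero.
+
+```{julia}
+using Statistics, Random
+
+# Gaussian (RBF) kernel between two observations (column vectors):
+gauss_kernel(x, y; σ=1.0) = exp(-sum(abs2, x .- y) / (2σ^2))
+
+# Average kernel value over all pairs of columns of A and B:
+mean_kernel(A, B; σ=1.0) = mean(gauss_kernel(a, b; σ=σ) for a in eachcol(A), b in eachcol(B))
+
+# Biased estimate of the squared MMD between two samples (observations in columns):
+mmd²(X, Y; σ=1.0) = mean_kernel(X, X; σ=σ) - 2 * mean_kernel(X, Y; σ=σ) + mean_kernel(Y, Y; σ=σ)
+
+# Toy illustration: a sample from the same distribution vs. a shifted one.
+Random.seed!(2023)
+A = randn(2, 100)
+B_same = randn(2, 100)
+B_shift = randn(2, 100) .+ 2.0
+mmd²(A, B_same)     # small (samples come from the same distribution)
+mmd²(A, B_shift)    # clearly larger (samples come from different distributions)
+```
+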
-### Intrinsic --- Conformal Training [MAYBE]

+> Two challenges here: 1) implementing the sampling procedure in @grathwohl2020your; 2) it is unclear if MMD is really the right way to measure this.

-- Model explicitly trained for conformal prediction.

+## Conformal Counterfactual Explanations

-## Conformal Counterfactuals

+In @sec-fidelity, we have advocated for avoiding surrogate models in the context of Counterfactual Explanations. In this section, we introduce an alternative way to generate high-fidelity Counterfactual Explanations. In particular, we propose Conformal Counterfactual Explanations (CCE), that is, Counterfactual Explanations that minimize the predictive uncertainty of conformal models.
+
+### Minimizing Predictive Uncertainty
+
+@schut2021generating demonstrated that the goal of generating realistic (plausible) counterfactuals can also be achieved by seeking counterfactuals that minimize the predictive uncertainty of the underlying black-box model. Similarly, @antoran2020getting ...

-- Realistic counterfactuals by minimizing predictive uncertainty [@schut2021generating].
 - Problem: restricted to Bayesian models.
-- Solution: post-hoc predictive uncertainty quantification.
+- Solution: post-hoc predictive uncertainty quantification. In particular, Conformal Prediction.
+
+### Background on Conformal Prediction
+
+- Distribution-free, model-agnostic and scalable approach to predictive uncertainty quantification.
 - Conformal prediction is instance-based. So is CE.
-- Does the coverage guarantee carry over to counterfactuals?
+- Take any fitted model and turn it into a conformal model using calibration data.
+- Our approach, therefore, relaxes the restriction on the family of black-box models, at the cost of relying on a subset of the data. Arguably, data is often abundant and in most applications practitioners tend to hold out a test data set anyway.
+
+> Does the coverage guarantee carry over to counterfactuals?
+
+### Generating Conformal Counterfactuals
+
+While Conformal Prediction has recently grown in popularity, it does introduce a challenge in the context of classification: the predictions of Conformal Classifiers are set-valued and therefore difficult to work with, since they are, for example, non-differentiable. Fortunately, @stutz2022learning introduced carefully designed differentiable loss functions that make it possible to evaluate the performance of conformal predictions during training. We can leverage these recent advances in the context of gradient-based counterfactual search ...
+
+> Challenge: still need to implement these loss functions.

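+
+As a rough starting point, one could approximate the key ingredient, a differentiable penalty on the size of the conformal prediction set, along the lines sketched below. This is not the exact loss proposed by @stutz2022learning, only a minimal illustration of the underlying smoothing idea: the simple nonconformity score $1-\hat{p}(y|x)$, the temperature `T`, the calibration threshold `q̂` and the toy linear model are all placeholder assumptions.
+
+```{julia}
+using Flux   # provides softmax, σ (sigmoid) and gradients (via Zygote)
+
+# Smooth (differentiable) size of the conformal prediction set for given logits:
+# classes whose nonconformity score falls below the calibration threshold q̂ are
+# counted softly via a sigmoid with temperature T.
+function soft_set_size(logits::AbstractVector, q̂::Real; T::Real=0.1)
+    p = softmax(logits)                         # predicted class probabilities
+    scores = 1 .- p                             # simple nonconformity score: 1 - p(y|x)
+    return sum(Flux.σ.((q̂ .- scores) ./ T))     # soft count of classes inside the set
+end
+
+# Toy example: a hypothetical linear "black-box" and one candidate counterfactual.
+W, b = randn(3, 2), randn(3)                    # placeholder model parameters
+f_toy(x) = W * x .+ b                           # logits for 3 classes, 2 features
+x′ = randn(2)                                   # candidate counterfactual
+q̂ = 0.9                                         # hypothetical calibration quantile
+
+# The penalty is differentiable in x′ and could be added to the counterfactual loss:
+penalty = soft_set_size(f_toy(x′), q̂)
+grad = Flux.gradient(z -> soft_set_size(f_toy(z), q̂), x′)[1]
+```
+
+Minimizing such a penalty during counterfactual search would push $x^{\prime}$ towards regions where the conformal prediction set is small, that is, where the conformal model is least uncertain.
+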
+## Experiments
+
 ### Research Questions

@@ -72,14 +162,15 @@ Since we are in the business of explaining a black-box model, the task of learni
 - Is CP alone enough to ensure realistic counterfactuals?
 - Do counterfactuals improve further as the models get better?
 - Do counterfactuals get more realistic as coverage increases?
 - What happens as we vary coverage and set size?
 - What happens as we improve the model robustness?
 - What happens as we improve the model's ability to incorporate predictive uncertainty (deep ensembles, Laplace approximation)?
+- What happens if we combine with DiCE, ClaPROAR, Gravitational?
+- What about CE robustness to endogenous shifts [@altmeyer2023endogenous]?

-## Experiments
-
-- Maybe: conformalised Laplace
 - Benchmarking:
- - add PROBE into the mix
+ - add PROBE [@pawelczyk2022probabilistically] into the mix.
 - compare travel costs to domain shifts.
+
+> Nice to have: What about using Laplace Approximation, then Conformal Prediction? What about using Conformalised Laplace?

 ## References


diff --git a/dev/www/you_may_not_like_it.png b/dev/www/you_may_not_like_it.png
new file mode 100644
index 0000000000000000000000000000000000000000..4b48e585ebcd285c9aa0b2bfabe1c1ff627bbbba
Binary files /dev/null and b/dev/www/you_may_not_like_it.png differ