Commit da719d42 authored by pat-alt

changing format

parent ae9a78ac
1 merge request: !8985 overshooting
@@ -12,7 +12,7 @@ Elements of this explanation are already scattered across the paper, but we agre
2. Core innovation: need more visualizations in 2D/3D space
Following the reviewer's suggestion, we have plotted the distance of randomly generated MNIST images from images in the target class against their energy-constrained score. As expected, this relationship is positive: the higher the distance, the higher the corresponding generative loss. The size of this relationship appears to depend positively on the model's generative property: the observed relationships are stronger for joint energy models.
3. Structural clarity: add a flow chart
@@ -20,15 +20,13 @@ Figure 2 shows our synthetic linearly separable data in the feature space, so th
Adding a systematic flowchart is a great idea. Due to limited scope, may we suggest adding the following flowchart to the appendix? Alternatively, we may swap out Figure 2 for the flowchart.
# Reviewer 2
## Weaknesses:
1. Why the embedding?
We agree that for any type of surrogate model, there is a risk of introducing bias. In exceptional cases, however, it may be necessary to accept some degree of bias in favour of plausibility. Our results for \textit{ECCCo+} demonstrate this tradeoff, as we discuss in Section 6.3. In the context of PCA, the introduced bias can be explained intuitively: by constraining the counterfactual search to the space spanned by the first $n_z$ principal components, the search is sensitive only to the variation in the data explained by those components. In other words, we would expect counterfactuals to be less sensitive to small variations in features that do not typically vary much. It is therefore an intuitive finding that \textit{ECCCo+} tends to generate less noisy counterfactual images, for example (the same is true for \textit{REVISE}). To our mind, restricting the search space to the first $n_z$ components quite literally corresponds to denoising the search space and hence the resulting counterfactuals. We will highlight this rationale in Section 6.3.
We think that the bias introduced by PCA may be acceptable in some cases, precisely because it "will not add any information on the input distribution" as the reviewer correctly points out. To maintain faithfulness, we want to avoid introducing additional information through surrogate models as much as possible. We will make this intuition clearer in Section 6.3.
@@ -52,11 +50,11 @@ As we mention in the additional author response, we investigated different distan
4. Faithfulness measure biased?
We have taken measures to not unfairly bias our generator with respect to the unfaithfulness metric: instead of penalizing the unfaithfulness metric directly, we penalize model energy in our preferred implementation. In contrast, \textit{Wachter} penalizes the closeness criterion directly and hence does particularly well in this regard. That being said, \textit{ECCCo} is of course designed to generate faithful explanations first and foremost and therefore has an advantage with respect to our faithfulness metric. In lieu of other established metrics to measure faithfulness, we can only point out that \textit{ECCCo} achieves strong performance for other commonly used metrics as well. With respect to \textit{validity}, for example, which as we have explained corresponds to \textit{fidelity}, \textit{ECCCo} typically outperforms \textit{REVISE} and \textit{Schut}.
Our joint energy models (JEMs) are indeed explicitly trained to model $\mathcal{X}|y$ and the same quantity is used in our proposed faithfulness metric. However, the faithfulness metric itself is not computed with respect to samples generated by our JEMs. It is computed with respect to counterfactuals generated by merely constraining model energy, and we would therefore argue that it is not unfairly biased. Our empirical findings support this argument: firstly, \textit{ECCCo} achieves high faithfulness also for classifiers that have not been trained to model $\mathcal{X}|y$; secondly, our additional results in the appendix for \textit{ECCCo-L1} show that if we do indeed explicitly penalize the unfaithfulness metric, we achieve even better results in this regard.
6. Test with unreliable models
We would argue that the simple multi-layer perceptrons (MLPs) are unreliable, especially compared to ensembles, joint energy models and convolutional neural networks for our image datasets. Simple neural networks have been shown to be vulnerable to adversarial attacks, which makes them susceptible to implausible counterfactual explanations, as we point out in Section 3. Our results support this notion, in that they demonstrate that faithful model explanations coincide with high plausibility only if the model itself has been trained to be more reliable. Consistent with the idea proposed by the reviewer, we originally considered introducing ``poisoned'' VAEs as well, to illustrate what we identify as the key vulnerability of \textit{REVISE}: if the underlying VAE is trained on poisoned data, this can be expected to adversely affect counterfactual outcomes as well. We ultimately discarded this idea due to limited scope and because we decided that Section 3 sufficiently illustrates our thinking.
@@ -12,34 +12,45 @@ include("$(pwd())/experiments/setup_env.jl")
## Counterfactual Path - MNIST
```{julia}
# Load the trained models and the corresponding dataset:
data_name = "mnist"
plt_order = ["MLP", "MLP Ensemble", "LeNet-5", "JEM", "JEM Ensemble"]
models = Serialization.deserialize("models/$(data_name)_models.jls")
data = eval(Meta.parse("load_$(data_name)()"))
# Keep only those models that were actually serialized:
plt_order = plt_order[[x in collect(keys(models)) for x in plt_order]]
```
```{julia}
using Plots.PlotMeasures

n_samp = 100    # number of target-class samples to compare against
n_rand = 500    # number of randomly perturbed factual samples per model
σ = 0.1         # standard deviation of the random perturbation

plts = []
for (mod_name, model) in models
    Δ = []      # distances from the target class
    L = []      # energy-constrained scores
    for i in 1:n_rand
        # Draw a random factual/target pair of labels:
        factual = rand(data.y_levels)
        target = rand(data.y_levels[data.y_levels .!= factual])
        t = get_target_index(data.y_levels, target)
        # Model energy for the target class:
        E(x) = -logits(model, x)[t, :]
        # Sample images from the target class and one perturbed factual image:
        x_samp = data.X[:, rand(findall(data.output_encoder.labels .== target), n_samp)]
        x_rand = data.X[:, rand(findall(data.output_encoder.labels .== factual), 1)]
        x_rand .+= Float32.(randn(size(x_samp, 1), 1)) .* σ
        # Average distance of the perturbed image from the target class:
        δ = mean(map(y -> norm(x_rand .- y), eachcol(x_samp)))
        push!(Δ, δ)
        push!(L, E(x_rand)[1])
    end
    plt = scatter(
        Δ, L;
        label="", title=mod_name, smooth=true,
        lc=:red, lw=2
    )
    push!(plts, plt)
end
# Arrange the panels according to plt_order:
plot(
    plts[sortperm(collect(keys(models)))[invperm(sortperm(plt_order))]]...,
    layout=(1, length(models)), size=(1200, 1100 / length(models)),
    margin=5mm
)
```
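To quantify the relationship shown in the scatter plots, one could also report the per-model correlation between distance and energy. A minimal, self-contained sketch with toy data standing in for one model's `Δ` and `L` vectors (the numbers below are illustrative only, not results):

```{julia}
using Statistics

# Toy stand-ins for one model's distance (Δ) and energy (L) vectors:
Δ = collect(1.0:0.1:10.0)
L = 0.5 .* Δ .+ 0.2 .* randn(length(Δ))

# A positive correlation supports the claim that higher distance from the
# target class coincides with a higher energy-constrained score:
println("cor(Δ, L) = ", round(cor(Δ, L); digits=3))
```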
File added
%File: anonymous-submission-latex-2024.tex
\documentclass[letterpaper]{article} % DO NOT CHANGE THIS
\usepackage[submission]{aaai24} % DO NOT CHANGE THIS
\usepackage{times} % DO NOT CHANGE THIS
\usepackage{helvet} % DO NOT CHANGE THIS
\usepackage{courier} % DO NOT CHANGE THIS
\usepackage[hyphens]{url} % DO NOT CHANGE THIS
\usepackage{graphicx} % DO NOT CHANGE THIS
\urlstyle{rm} % DO NOT CHANGE THIS
\def\UrlFont{\rm} % DO NOT CHANGE THIS
\usepackage{natbib} % DO NOT CHANGE THIS AND DO NOT ADD ANY OPTIONS TO IT
\usepackage{caption} % DO NOT CHANGE THIS AND DO NOT ADD ANY OPTIONS TO IT
\frenchspacing % DO NOT CHANGE THIS
\setlength{\pdfpagewidth}{8.5in} % DO NOT CHANGE THIS
\setlength{\pdfpageheight}{11in} % DO NOT CHANGE THIS
%
% These are recommended to typeset algorithms but not required. See the subsubsection on algorithms. Remove them if you don't have algorithms in your paper.
\usepackage{algorithm}
% \usepackage{algorithmic}
%
% These are recommended to typeset listings but not required. See the subsubsection on listings. Remove this block if you don't have listings in your paper.
% \usepackage{newfloat}
% \usepackage{listings}
% \DeclareCaptionStyle{ruled}{labelfont=normalfont,labelsep=colon,strut=off} % DO NOT CHANGE THIS
% \lstset{%
% basicstyle={\footnotesize\ttfamily},% footnotesize acceptable for monospace
% numbers=left,numberstyle=\footnotesize,xleftmargin=2em,% show line numbers, remove this entire line if you don't want the numbers.
% aboveskip=0pt,belowskip=0pt,%
% showstringspaces=false,tabsize=2,breaklines=true}
% \floatstyle{ruled}
% \newfloat{listing}{tb}{lst}{}
% \floatname{listing}{Listing}
%
% Keep the \pdfinfo as shown here. There's no need
% for you to add the /Title and /Author tags.
\pdfinfo{
/TemplateVersion (2024.1)
}
\usepackage{amsfonts} % blackboard math symbols
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{caption}
\usepackage{graphicx}
\usepackage{algpseudocode}
\usepackage{import}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{array}
\usepackage{multirow}
\usepackage{placeins}
% Numbered Environments:
\newtheorem{definition}{Definition}[section]
\newtheorem{question}{Research Question}[section]
% Bibliography
% \bibliographystyle{unsrtnat}
% \setcitestyle{numbers,square,comma}
% Algorithm
\renewcommand{\algorithmicrequire}{\textbf{Input:}}
\renewcommand{\algorithmicensure}{\textbf{Output:}}
% DISALLOWED PACKAGES
% \usepackage{authblk} -- This package is specifically forbidden
% \usepackage{balance} -- This package is specifically forbidden
% \usepackage{color} (if used in text) -- This package is specifically forbidden
% \usepackage{CJK} -- This package is specifically forbidden
% \usepackage{float} -- This package is specifically forbidden
% \usepackage{flushend} -- This package is specifically forbidden
% \usepackage{fontenc} -- This package is specifically forbidden
% \usepackage{fullpage} -- This package is specifically forbidden
% \usepackage{geometry} -- This package is specifically forbidden
% \usepackage{grffile} -- This package is specifically forbidden
% \usepackage{hyperref} -- This package is specifically forbidden
% \usepackage{navigator} -- This package is specifically forbidden
% (or any other package that embeds links such as navigator or hyperref)
% \usepackage{indentfirst} -- This package is specifically forbidden
% \usepackage{layout} -- This package is specifically forbidden
% \usepackage{multicol} -- This package is specifically forbidden
% \usepackage{nameref} -- This package is specifically forbidden
% \usepackage{savetrees} -- This package is specifically forbidden
% \usepackage{setspace} -- This package is specifically forbidden
% \usepackage{stfloats} -- This package is specifically forbidden
% \usepackage{tabu} -- This package is specifically forbidden
% \usepackage{titlesec} -- This package is specifically forbidden
% \usepackage{tocbibind} -- This package is specifically forbidden
% \usepackage{ulem} -- This package is specifically forbidden
% \usepackage{wrapfig} -- This package is specifically forbidden
% DISALLOWED COMMANDS
% \nocopyright -- Your paper will not be published if you use this command
% \addtolength -- This command may not be used
% \balance -- This command may not be used
% \baselinestretch -- Your paper will not be published if you use this command
% \clearpage -- No page breaks of any kind may be used for the final version of your paper
% \columnsep -- This command may not be used
% \newpage -- No page breaks of any kind may be used for the final version of your paper
% \pagebreak -- No page breaks of any kind may be used for the final version of your paper
% \pagestyle -- This command may not be used
% \tiny -- This is not an acceptable font size.
% \vspace{- -- No negative value may be used in proximity of a caption, figure, table, section, subsection, subsubsection, or reference
% \vskip{- -- No negative value may be used to alter spacing above or below a caption, figure, table, section, subsection, subsubsection, or reference
\setcounter{secnumdepth}{2} %May be changed to 1 or 2 if section numbers are desired.
% The file aaai24.sty is the style file for AAAI Press
% proceedings, working notes, and technical reports.
%
% Title
% Your title must be in mixed case, not sentence case.
% That means all verbs (including short verbs like be, is, using, and go),
% nouns, adverbs, adjectives should be capitalized, including both words in hyphenated terms, while
% articles, conjunctions, and prepositions are lower case unless they
% directly follow a colon or long dash
\title{Faithful Model Explanations through\\
Energy-Constrained Conformal Counterfactuals}
\author{
%Authors
% All authors must be in the same font size and format.
Written by AAAI Press Staff\textsuperscript{\rm 1}\thanks{With help from the AAAI Publications Committee.}\\
AAAI Style Contributions by Pater Patel Schneider,
Sunil Issar,\\
J. Scott Penberthy,
George Ferguson,
Hans Guesgen,
Francisco Cruz\equalcontrib,
Marc Pujol-Gonzalez\equalcontrib
}
\affiliations{
%Affiliations
\textsuperscript{\rm 1}Association for the Advancement of Artificial Intelligence\\
% If you have multiple authors and multiple affiliations
% use superscripts in text and roman font to identify them.
% For example,
% Sunil Issar\textsuperscript{\rm 2},
% J. Scott Penberthy\textsuperscript{\rm 3},
% George Ferguson\textsuperscript{\rm 4},
% Hans Guesgen\textsuperscript{\rm 5}
% Note that the comma should be placed after the superscript
1900 Embarcadero Road, Suite 101\\
Palo Alto, California 94303-3310 USA\\
% email address must be in roman text type, not monospace or sans serif
proceedings-questions@aaai.org
%
% See more examples next
}
%Example, Single Author, ->> remove \iffalse,\fi and place them surrounding AAAI title to use it
% \iffalse
% \title{My Publication Title --- Single Author}
% \author {
% Author Name
% }
% \affiliations{
% Affiliation\\
% Affiliation Line 2\\
% name@example.com
% }
% \fi
% \iffalse
% %Example, Multiple Authors, ->> remove \iffalse,\fi and place them surrounding AAAI title to use it
% \title{My Publication Title --- Multiple Authors}
% \author {
% % Authors
% First Author Name\textsuperscript{\rm 1},
% Second Author Name\textsuperscript{\rm 2},
% Third Author Name\textsuperscript{\rm 1}
% }
% \affiliations {
% % Affiliations
% \textsuperscript{\rm 1}Affiliation 1\\
% \textsuperscript{\rm 2}Affiliation 2\\
% firstAuthor@affiliation1.com, secondAuthor@affilation2.com, thirdAuthor@affiliation1.com
% }
% \fi
\begin{document}
\subsubsection{Experiment results: linguistic explanation of results}
Following the suggestion by the reviewer, we plan to add the following linguistic explanation in a prominent place in Section 6:
"Overall, our findings demonstrate that \textit{ECCCo} produces plausible counterfactuals if and only if the black-box model itself has learned plausible explanations for the data. Thus, \textit{ECCCo} avoids the risk of generating plausible but potentially misleading explanations for models that are highly susceptible to implausible explanations. We, therefore, believe that \textit{ECCCo} can help researchers and practitioners to generate explanations they can trust and discern unreliable from trustworthy models."
Elements of this explanation are already scattered across the paper, but we agree that it would be useful to highlight this notion in Section 6.
\subsubsection{Core innovation: need more visualizations}
Following the reviewer's suggestion, we have plotted the distance of randomly generated MNIST images from images in the target class against their energy-constrained score. As expected, this relationship is positive: the higher the distance, the higher the corresponding generative loss. The size of this relationship appears to depend positively on the model's generative property: the observed relationships are stronger for joint energy models.
\subsubsection{Structural clarity: add a flow chart}
Adding a systematic flowchart is a great idea. Due to the limited scope, may we suggest adding the following flowchart to the appendix? Alternatively, we may swap out Figure 2 for the flowchart.
\subsubsection{Why use an embedding?}
We agree that for any type of surrogate model, there is a risk of introducing bias. In exceptional cases, however, it may be necessary to accept some degree of bias in favour of plausibility. Our results for \textit{ECCCo+} demonstrate this tradeoff, as we discuss in Section 6.3. In the context of PCA, the introduced bias can be explained intuitively: by constraining the counterfactual search to the space spanned by the first $n_z$ principal components, the search is sensitive only to the variation in the data explained by those components. In other words, we would expect counterfactuals to be less sensitive to small variations in features that do not typically vary much. It is therefore an intuitive finding that \textit{ECCCo+} tends to generate less noisy counterfactual images, for example (the same is true for \textit{REVISE}). To our mind, restricting the search space to the first $n_z$ components quite literally corresponds to denoising the search space and hence the resulting counterfactuals. We will highlight this rationale in Section 6.3.
We think that the bias introduced by PCA may be acceptable in some cases, precisely because it ``will not add any information on the input distribution'' as the reviewer correctly points out. To maintain faithfulness, we want to avoid introducing additional information through surrogate models as much as possible. We will make this intuition clearer in Section 6.3.
Another argument in favour of using a lower-dimensional latent embedding is the reduction in computational costs, which can be prohibitive for high-dimensional input data. We will highlight this in Section 5.
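To make the denoising intuition concrete, consider the following minimal sketch (our illustration, not code from the paper; it assumes \texttt{MultivariateStats.jl} and a feature matrix \texttt{X} with observations stored as columns):
\begin{verbatim}
using MultivariateStats

X = rand(Float32, 784, 1000)      # hypothetical (features x samples) data
n_z = 16                          # number of principal components
M = fit(PCA, X; maxoutdim=n_z)    # fit PCA on the training data
z = predict(M, X[:, 1])           # encode: project onto n_z components
x_hat = reconstruct(M, z)         # decode: variation outside the first
                                  # n_z components is stripped away
\end{verbatim}
Any search conducted over \texttt{z} rather than \texttt{x} is, by construction, insensitive to variation in the data that lies outside the span of the first $n_z$ components.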
\subsubsection{What are ``epsilon'' and ``s''?}
From the paper: ``$\mathbf{r}_j \sim \mathcal{N}(\mathbf{0},\mathbf{I})$ is the stochastic term and the step-size $\epsilon_j$ is typically polynomially decayed. [...] To allow for faster sampling, we follow the common practice of choosing the step-size $\epsilon_j$ and the standard deviation of $\mathbf{r}_j$ separately.'' We go on to explain in the appendix that we use the following biased sampler:
$$
\hat{\mathbf{x}}_{j+1} \leftarrow \hat{\mathbf{x}}_j - \frac{\phi}{2} \nabla_{\hat{\mathbf{x}}_j} \mathcal{E}_{\theta}(\hat{\mathbf{x}}_j|\mathbf{y}^+) + \sigma \mathbf{r}_j, \quad j=1,\dots,J
$$
where "consistent with~\citet{grathwohl2020your}, we have specified $\phi=2$ and $\sigma=0.01$ as the default values for all of our experiments". Intuitively, $\epsilon_j$ determines the size of gradient updates and random noise in each iteration of SGLD.
Regarding $s(\cdot)$, this was an oversight, apologies. In the appendix we explain that ``[the calibration dataset] is then used to compute so-called nonconformity scores: $\mathcal{S}=\{s(\mathbf{x}_i,\mathbf{y}_i)\}_{i \in \mathcal{D}_{\text{cal}}}$ where $s: (\mathcal{X},\mathcal{Y}) \mapsto \mathbb{R}$ is referred to as \textit{score function}.'' We will add this in Section 4.2 of the main paper.
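As an illustrative example (a common choice in the conformal prediction literature, not necessarily the one used in our experiments), the score function may be taken to be one minus the softmax output assigned to the true label,
$$
s(\mathbf{x}_i,\mathbf{y}_i) = 1 - \hat{p}_{\theta}(\mathbf{y}_i|\mathbf{x}_i),
$$
so that calibration points the model classifies confidently and correctly receive low nonconformity scores.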
\subsubsection{Euclidean distance}
As we mentioned in the additional author response, we investigated different distance metrics. We found that the overall qualitative results were largely independent of the exact metric. For the high-dimensional image data, we nonetheless decided to report the results for a dissimilarity metric that is more appropriate in that context. All of our distance-based metrics are computed with respect to features, not latent features. This is because, as the reviewer correctly points out, we would expect certain discrepancies between distances evaluated in the feature space and distances evaluated in the latent space of the VAE, for example. Working in the feature space does come with higher computational costs, but the evaluation of counterfactuals was generally less costly than generating counterfactuals in the first place. In cases where high dimensionality leads to prohibitive computational costs, we would suggest either reducing the number of nearest neighbors or working in a lower-dimensional subspace that is independent of the underlying classifier itself (such as PCA).
\subsubsection{Faithfulness metric: is it fair?}
We have taken measures to not unfairly bias our generator with respect to the unfaithfulness metric: instead of penalizing the unfaithfulness metric directly, we penalize model energy in our preferred implementation. In contrast, \textit{Wachter} penalizes the closeness criterion directly and hence does particularly well in this regard. That being said, \textit{ECCCo} is of course designed to generate faithful explanations first and foremost and therefore has an advantage with respect to our faithfulness metric. In lieu of other established metrics to measure faithfulness, we can only point out that \textit{ECCCo} achieves strong performance for other commonly used metrics as well. With respect to \textit{validity}, for example, which as we have explained corresponds to \textit{fidelity}, \textit{ECCCo} typically outperforms \textit{REVISE} and \textit{Schut}.
Our joint energy models (JEMs) are indeed explicitly trained to model $\mathcal{X}|y$ and the same quantity is used in our proposed faithfulness metric. However, the faithfulness metric itself is not computed with respect to samples generated by our JEMs. It is computed with respect to counterfactuals generated by merely constraining model energy, and we would therefore argue that it is not unfairly biased. Our empirical findings support this argument: firstly, \textit{ECCCo} achieves high faithfulness also for classifiers that have not been trained to model $\mathcal{X}|y$; secondly, our additional results in the appendix for \textit{ECCCo-L1} show that if we do indeed explicitly penalize the unfaithfulness metric, we achieve even better results in this regard.
\subsubsection{Test with unreliable models}
We would argue that the simple multi-layer perceptrons (MLPs) are unreliable, especially compared to ensembles, joint energy models and convolutional neural networks for our image datasets. Simple neural networks have been shown to be vulnerable to adversarial attacks, which makes them susceptible to implausible counterfactual explanations, as we point out in Section 3. Our results support this notion, in that they demonstrate that faithful model explanations coincide with high plausibility only if the model itself has been trained to be more reliable. Consistent with the idea proposed by the reviewer, we originally considered introducing ``poisoned'' VAEs as well, to illustrate what we identify as the key vulnerability of \textit{REVISE}: if the underlying VAE is trained on poisoned data, this can be expected to adversely affect counterfactual outcomes as well. We ultimately discarded this idea due to limited scope and because we decided that Section 3 sufficiently illustrates our thinking.
\FloatBarrier
\bibliography{../bib}
\end{document}