Commit 0a6eb3c5 authored by pat-alt

more on author response

parent c5c9ba0e
1 merge request: !8373 aries comments
# Author Response
1. We have applied our method to additional, commonly used real-world tabular datasets.
2. Constraining the energy directly:
    1. Better results across the board, in particular for image data.
    2. Derived from the JEM loss function, and hence more theoretically grounded.
    3. No sampling overhead.
    4. The energy constraint does not depend on differentiability.
    5. Benchmarks are no longer biased with respect to the unfaithfulness metric (addressing a reviewer concern).
3. Counterfactual explanations do not scale well to high-dimensional input data:
    1. We have added native support for multi-processing and multi-threading.
    2. We have run more extensive experiments, including fine-tuning of hyperparameter choices.
    3. For image data, we use PCA to map counterfactuals to a lower-dimensional latent space, which not only reduces the cost of gradient computations but also leads to higher plausibility (see the first sketch after this list).
    4. PCA is much less costly and interventionist than a VAE: principal components merely represent variation in the data; nothing else about the data is learned by the surrogate.
        1. ECCCo-$\Delta$ (latent) remains faithful, although not as faithful as standard ECCCo-$\Delta$.
4. We have revisited the mathematical notation.
5. We have moved the introduction of conformal prediction forward and added more detail in line with reviewer feedback.
6. We have extended the limitations section.
7. Distance metrics:
    1. We have revisited the distance metrics and decided to use the L2 norm for plausibility and faithfulness.
    2. Originally, we used the L1 norm, in line with how the closeness criterion is commonly evaluated. In this context, however, the L1 norm implicitly addresses the desire for sparsity.
    3. For image data, we investigated various additional distance metrics (see the second sketch after this list):
        1. Cosine similarity.
        2. Euclidean distance.
        3. Ultimately, we chose to rely on structural dissimilarity.
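
To illustrate point 3, here is a minimal sketch of the latent-space search, assuming a PyTorch classifier and scikit-learn's PCA as the surrogate. The data, model, dimensions, step count and penalty weight are all placeholders, and only the proximity penalty of the objective is shown:

```python
# Illustrative sketch only, not the exact implementation: gradient-based
# counterfactual search in a PCA latent space. Because the PCA decoder is
# a fixed linear map, back-propagating through it is cheap.
import numpy as np
import torch
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 784)).astype(np.float32)  # stand-in for image data
model = torch.nn.Linear(784, 2)                           # stand-in classifier

pca = PCA(n_components=50).fit(X_train)       # surrogate captures variation only
W = torch.tensor(pca.components_, dtype=torch.float32)    # (50, 784) linear decoder
mu = torch.tensor(pca.mean_, dtype=torch.float32)

x0 = torch.tensor(X_train[:1])                            # factual instance
z = torch.tensor(pca.transform(X_train[:1]), dtype=torch.float32, requires_grad=True)
target = torch.tensor([1])                                # desired label y+

opt = torch.optim.Adam([z], lr=0.1)
for _ in range(200):
    x_cf = z @ W + mu                         # decode back to input space
    loss = torch.nn.functional.cross_entropy(model(x_cf), target) \
        + 0.1 * torch.norm(x_cf - x0, p=1)    # proximity penalty (lambda_1 term)
    opt.zero_grad(); loss.backward(); opt.step()

x_counterfactual = (z @ W + mu).detach()      # final counterfactual in input space
```

Because the decoder is a fixed linear map, gradients are taken with respect to a 50-dimensional code rather than all 784 raw inputs, and the search is confined to the principal subspace of the data, which is what tends to improve plausibility.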
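Likewise, for point 7, a hedged sketch of the three image distances; the SciPy and scikit-image implementations shown here are illustrative and may differ from the exact benchmark code:

```python
# Illustrative definitions of the distance metrics discussed above.
import numpy as np
from scipy.spatial.distance import cosine, euclidean
from skimage.metrics import structural_similarity

x = np.random.rand(28, 28)     # factual image
x_cf = np.random.rand(28, 28)  # counterfactual image

d_l2 = euclidean(x.ravel(), x_cf.ravel())   # L2 norm (plausibility, faithfulness)
d_cos = cosine(x.ravel(), x_cf.ravel())     # 1 - cosine similarity
ssim = structural_similarity(x, x_cf, data_range=1.0)
d_dssim = (1.0 - ssim) / 2.0                # structural dissimilarity (DSSIM)
```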
We begin by stating our proposed objective function, which involves tailored loss and penalty functions:
\begin{equation} \label{eq:eccco}
\begin{aligned}
\mathbf{Z}^\prime &= \arg \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda_{1} {\text{dist}(f(\mathbf{Z}^\prime),\mathbf{x})} \\
&+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \}
\end{aligned}
\end{equation}
The first penalty term involving $\lambda_1$ induces proximity, as in~\citet{wachter2017counterfactual}. Our default choice for $\text{dist}(\cdot)$ is the L1 norm due to its sparsity-inducing properties. The second penalty term involving $\lambda_2$ induces faithfulness by constraining the energy of the generated counterfactual. The third and final penalty term involving $\lambda_3$ ensures that the generated counterfactual is associated with low predictive uncertainty.
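For intuition, one possible concretisation of the energy term (an illustrative JEM-style choice, with the conditioning on the target class left implicit, rather than a definition fixed elsewhere in this excerpt) identifies the energy with the negative logit assigned to $\mathbf{y}^+$:
\begin{equation}
\mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)) = -M_{\theta}(f(\mathbf{Z}^\prime))\left[\mathbf{y}^+\right]
\end{equation}
Under this choice, minimising the penalty moves the counterfactual towards inputs to which the model itself assigns high unnormalised density for $\mathbf{y}^+$, without any sampling overhead.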