Author Response
- Applied to additional commonly used real-world tabular datasets
- Constraining energy directly
- Better results across the board, in particular for image data
- Derived from the JEM loss function -> more theoretically grounded (see the energy sketch below the list)
- No sampling overhead.
- Energy does not depend on differentiability.
- Benchmarks no longer biased with respect to unfaithfulness metric (addressing reviewer concern).
- Counterfactual explanations do not scale well to high-dimensional input data
- We have added native support for multi-processing and multi-threading.
- We have run more extensive experiments including fine-tuning hyperparameter choices.
- For image data we use PCA to map counterfactuals to a lower-dimensional latent space, which not only reduces the cost of gradient computations but also leads to higher plausibility (see the PCA sketch below the list).
- PCA is much less costly and interventionist than a VAE: principal components merely represent variation in the data; nothing else about the data is learned by the surrogate.
- ECCCo-(latent) remains faithful, although not as faithful as standard ECCCo-.
- We have revisited the mathematical notation.
- We have moved the introduction of conformal prediction forward and added more detail in line with reviewer feedback.
- We have extended the limitations section.
- Distance metrics
  - We have revisited the distance metrics and decided to use the L2 norm for both plausibility and faithfulness.
  - Originally, we used the L1 norm in line with how the closeness criterion is commonly evaluated. But in this context the L1 norm implicitly addresses the desire for sparsity.
  - In the case of image data, we investigated various additional distance metrics:
    - Cosine similarity
    - Euclidean distance
  - Ultimately we chose to rely on structural dissimilarity (see the distance-metric sketch below the list).
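For concreteness, here is a minimal sketch of what an energy-penalized counterfactual objective can look like, assuming a PyTorch classifier. The `energy` function follows the JEM definition (negative logsumexp of the logits); `lam`, the function names, and the plain cross-entropy target term are illustrative assumptions, not the paper's exact objective. Because the penalty only evaluates the classifier's own logits, it adds no sampling overhead.

```python
import torch
import torch.nn.functional as F

def energy(model, x):
    # JEM-style free energy: E(x) = -logsumexp_y f(x)[y].
    return -torch.logsumexp(model(x), dim=-1)

def counterfactual_loss(model, x_cf, target, lam=0.1):
    # Cross-entropy pulls the candidate towards the target class;
    # the energy penalty (illustrative weight `lam`) keeps the
    # candidate in regions the model assigns high density.
    ce = F.cross_entropy(model(x_cf), target)
    return ce + lam * energy(model, x_cf).mean()
```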
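A minimal sketch of the PCA latent mapping for image data, assuming scikit-learn; the component count, variable names, and random stand-in data are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 784))   # stand-in for flattened images
x = X_train[:1]                          # factual instance, shape (1, 784)

# Fit PCA once on the training data; the components capture only
# variation in the data -- nothing else is learned by the surrogate.
pca = PCA(n_components=50).fit(X_train)

z = pca.transform(x)                     # encode into the 50-dim latent space
# ... gradient steps on the counterfactual would perturb z here ...
x_cf = pca.inverse_transform(z)          # decode candidate back to input space
```

Gradients are then taken with respect to the 50 latent coefficients rather than all 784 inputs, which is where the cost saving comes from.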
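For illustration, the distance metrics discussed above, assuming NumPy arrays and scikit-image; defining structural dissimilarity as (1 - SSIM) / 2 is a common convention and an assumption here, not taken from the response:

```python
import numpy as np
from skimage.metrics import structural_similarity

def l1(a, b):
    # L1 norm: implicitly rewards sparse changes (the closeness criterion).
    return np.abs(a - b).sum()

def l2(a, b):
    # L2 norm: now used for both plausibility and faithfulness.
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def dssim(img_a, img_b):
    # Structural dissimilarity derived from SSIM (assumed convention).
    ssim = structural_similarity(
        img_a, img_b, data_range=float(img_a.max() - img_a.min())
    )
    return (1.0 - ssim) / 2.0
```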