where $\hat{\mathbf{x}}_{\theta}$ denotes samples generated using SGLD (Equation
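As a hypothetical illustration of the sampler referenced above, the SGLD update can be sketched as follows; the function name, step size and the `grad_energy` callable are assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def sgld_sample(grad_energy, x0, step=0.01, n_steps=100, rng=None):
    """Stochastic Gradient Langevin Dynamics (sketch): approximately sample
    from the density implied by an energy function E via the update
    x_{t+1} = x_t - (step / 2) * grad E(x_t) + sqrt(step) * eps_t,
    where eps_t is standard Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step * grad_energy(x) + np.sqrt(step) * noise
    return x
```

For a quadratic energy $E(\mathbf{x}) = \|\mathbf{x}\|^2 / 2$ the gradient is simply $\mathbf{x}$, and the chain targets a standard Gaussian.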
The first two terms in Equation~\ref{eq:eccco} correspond to the counterfactual search objective defined in~\citet{wachter2017counterfactual}, which merely penalises the distance of counterfactuals from their factual values. The additional two penalties in ECCCo ensure that counterfactuals conform with the model's generative property and lead to minimally uncertain predictions, respectively. The hyperparameters $\lambda_1, \dots, \lambda_3$ can be used to balance the different objectives: for example, we may choose to incur larger deviations from the factual in favour of conformity with the model's generative property by choosing lower values of $\lambda_1$ and relatively higher values of $\lambda_2$.
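The structure of the composite objective can be sketched as below. This is a minimal illustration only: the callables `yloss`, `energy` and `set_size`, the $\ell_1$ distance, and the default weights are assumptions standing in for the paper's actual loss, energy and conformal set-size terms.

```python
import numpy as np

def eccco_objective(x_cf, x_factual, yloss, energy, set_size,
                    lambdas=(0.1, 1.0, 1.0)):
    """Sketch of a composite counterfactual loss: prediction loss plus
    three weighted penalties, lambdas = (lambda_1, lambda_2, lambda_3)."""
    l1, l2, l3 = lambdas
    return (yloss(x_cf)
            + l1 * np.linalg.norm(x_cf - x_factual, ord=1)  # closeness to the factual
            + l2 * energy(x_cf)      # conformity with the model's generative property
            + l3 * set_size(x_cf))   # predictive uncertainty via conformal set size
```

Lowering $\lambda_1$ relative to $\lambda_2$ then trades off proximity to the factual against conformity, as described above.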
\begin{minipage}[t]{0.45\textwidth}
\captionof{figure}{Using surrogates can improve plausibility, but also increases vulnerability. Counterfactuals for turning an 8 (eight) into a 3 (three): original image (left); counterfactual produced using REVISE \citep{joshi2019realistic} with a well-specified surrogate (centre); and a counterfactual produced using REVISE \citep{joshi2019realistic} with a poorly specified surrogate (right).}\label{fig:vae}
\end{minipage}
\hfill
\begin{minipage}[c]{0.45\textwidth}
\captionof{algorithm}{Generating ECCCos (For more details, see Appendix~\ref{app:eccco})}\label{alg:eccco}
\begin{algorithmic}[1]
\Require $\mathbf{x}, \mathbf{y}^*, M_{\theta}, f, \Lambda, \alpha, \mathcal{D}, T, \eta, m, M$ \linebreak where $M_{\theta}(\mathbf{x}) \neq \mathbf{y}^*$
where $\hat{q}$ denotes the $(1-\alpha)$-quantile of $\mathcal{S}$ and $\alpha$
Observe from Equation~\ref{eq:scp} that Conformal Prediction operates on an instance-level basis, much like Counterfactual Explanations are local. The prediction set for an individual instance $\mathbf{x}_i$ depends only on the characteristics of that sample and the specified error rate. Intuitively, the set is more likely to include multiple labels for samples that are difficult to classify, so the set size is indicative of predictive uncertainty. To see why this effect is exacerbated by small choices for $\alpha$, consider the case of $\alpha=0$, which requires that the true label be covered by the prediction set with probability one.
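The instance-level construction described above can be sketched with split conformal prediction. The nonconformity score $s(\mathbf{x}, \mathbf{y}) = 1 - \hat{p}(\mathbf{y} \mid \mathbf{x})$ and the finite-sample quantile correction follow standard practice and are assumptions here, not necessarily this paper's exact choices:

```python
import numpy as np

def conformal_prediction_set(softmax_cal, y_cal, softmax_test, alpha=0.1):
    """Split conformal prediction (sketch): calibrate the quantile q_hat of
    nonconformity scores on held-out data, then include in the prediction
    set every label whose score does not exceed q_hat."""
    n = len(y_cal)
    # Nonconformity scores on the calibration set (one minus true-class probability).
    scores = 1.0 - softmax_cal[np.arange(n), y_cal]
    # Finite-sample corrected (1 - alpha)-quantile level, capped at 1.
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, q_level, method="higher")
    # Labels whose scores conform at level alpha for a single test instance.
    return np.where(1.0 - softmax_test <= q_hat)[0]
```

Harder test instances, whose softmax mass is spread over several classes, yield larger sets, which is precisely why set size can serve as an uncertainty signal.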