minor things

Commit fcb015a6 authored 2 years ago by pat-alt
--- a/paper/sections/methodology_2.rmd
+++ b/paper/sections/methodology_2.rmd
@@ -64,7 +64,7 @@ MMD({X}^\prime,\tilde{X}^\prime) &= \frac{1}{m(m-1)}\sum_{i=1}^m\sum_{j\neq i}^m
 \end{aligned}
 \end{equation}

-where $X=\{x_1,...,x_m\}$, $\tilde{X}=\{\tilde{x}_1,...,\tilde{x}_n\}$ represent independent and identically distributed samples drawn from probability distributions $\mathcal{X}$ and $\mathcal{\tilde{X}}$ respectively @gretton2012kernel. MMD is a measure of the distance between the kernel mean embeddings of $\mathcal{X}$ and $\mathcal{\tilde{X}}$ in a Reproducing Kernel Hilbert Space, $\mathcal{H}$ [@berlinet2011reproducing]. An important consideration is the choice of the kernel function $k(\cdot,\cdot)$. In our implementation we make use of a Gaussian kernel with a constant length-scale parameter of $0.5$. As the Gaussian kernel captures all moments of distributions $\mathcal{X}$ and $\mathcal{\tilde{X}}$, we have that $MMD(X,\tilde{X})=0$ if and only if $X=\tilde{X}$. Conversely, larger values $MMD(X,\tilde{X})>0$ indicate that it is more likely that $\mathcal{X}$ and $\mathcal{\tilde{X}}$ are different distributions. In our context, large values therefore indicate that a domain shift indeed seems to have occurred.
+where $X=\{x_1,...,x_m\}$, $\tilde{X}=\{\tilde{x}_1,...,\tilde{x}_n\}$ represent independent and identically distributed samples drawn from probability distributions $\mathcal{X}$ and $\mathcal{\tilde{X}}$ respectively @gretton2012kernel. MMD is a measure of the distance between the kernel mean embeddings of $\mathcal{X}$ and $\mathcal{\tilde{X}}$ in a Reproducing Kernel Hilbert Space, $\mathcal{H}$ [@berlinet2011reproducing]. An important consideration is the choice of the kernel function $k(\cdot,\cdot)$. In our implementation, we make use of a Gaussian kernel with a constant length-scale parameter of $0.5$. As the Gaussian kernel captures all moments of distributions $\mathcal{X}$ and $\mathcal{\tilde{X}}$, we have that $MMD(X,\tilde{X})=0$ if and only if $X=\tilde{X}$. Conversely, larger values $MMD(X,\tilde{X})>0$ indicate that it is more likely that $\mathcal{X}$ and $\mathcal{\tilde{X}}$ are different distributions. In our context, large values, therefore, indicate that a domain shift indeed seems to have occurred.

 To assess the statistical significance of the observed shifts under the null hypothesis that samples $X$ and $\tilde{X}$ were drawn from the same probability distribution, we follow @arcones1992bootstrap. To that end, we combine the two samples and generate a large number of permutations of $X + \tilde{X}$. Then, we split the permuted data into two new samples $X^\prime$ and $\tilde{X}^\prime$ having the same size as the original samples. Then under the null hypothesis, we should have that $MMD(X^\prime,\tilde{X}^\prime)$ be approximately equal to $MMD(X,\tilde{X})$. The corresponding $p$-value can then be calculated by counting how these two quantities are not equal.
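
The permutation procedure described in the hunk above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the paper's implementation. The function names (`gaussian_kernel`, `mmd`, `mmd_permutation_test`), the kernel parametrization $\exp(-\lVert a-b\rVert^2/(2\ell^2))$ with $\ell=0.5$, and the number of permutations are assumptions. The p-value here follows one common convention: the share of permuted statistics that are at least as large as the observed one, with an add-one correction (also an assumption).

```python
import numpy as np


def gaussian_kernel(a, b, length_scale=0.5):
    """Gaussian kernel matrix between rows of a and b.

    Assumed parametrization: exp(-||a - b||^2 / (2 * length_scale^2)).
    """
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


def mmd(x, x_tilde, length_scale=0.5):
    """Sample MMD estimate: within-sample terms exclude i == j."""
    m, n = len(x), len(x_tilde)
    k_xx = gaussian_kernel(x, x, length_scale)
    k_yy = gaussian_kernel(x_tilde, x_tilde, length_scale)
    k_xy = gaussian_kernel(x, x_tilde, length_scale)
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def mmd_permutation_test(x, x_tilde, n_permutations=1000, length_scale=0.5, seed=0):
    """Permutation test of H0: both samples come from the same distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd(x, x_tilde, length_scale)
    pooled = np.vstack([x, x_tilde])
    m = len(x)
    n_at_least_as_large = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        # Split the shuffled pool into two samples of the original sizes.
        x_perm, x_tilde_perm = pooled[perm[:m]], pooled[perm[m:]]
        if mmd(x_perm, x_tilde_perm, length_scale) >= observed:
            n_at_least_as_large += 1
    # Add-one correction (assumed convention) keeps the p-value above zero.
    p_value = (n_at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value
```

For example, `mmd_permutation_test(x, x_tilde, n_permutations=200)` returns the observed statistic together with its p-value. A small p-value suggests that a shift between the two samples has occurred.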