(01_errorprop)=
# Propagation of uncertainty
## Functions of random variables
In engineering and the sciences we often work with functions of random variables: when estimating or modelling something, the output is a function of the random input variables, see {numref}`functions`.
```{figure} ../figures/ErrorPropagation/01_Functions.png
---
height: 350px
name: functions
---
Output $X$ of a model is a function of the random input $Y$.
```
Some simple examples are:
* conversion of temperature measured in degrees Celsius to temperature in degrees Fahrenheit: $T_f = q(T_c)=\frac{9}{5}T_c+32$
* taking the mean of $m$ repeated measurements $Y_i$: $\hat{X}=q(Y_1,\ldots,Y_m)=\frac{1}{m}\sum_{i=1}^m Y_i$
* subsurface temperature $T_z$ as a function of depth $Z$, surface temperature $T_0$ and a known constant $a$: $T_z = T_0 + aZ$
* wind loading $F$ on a building as a function of the area of the building face $A$, wind pressure $P$ and drag coefficient $C$: $F = A\cdot P\cdot C$
* evaporation $Q$ from the Bowen ratio energy balance as a function of the net radiation $R$, ground heat flux $G$ and Bowen ratio $B$: $Q =q(R,G,B) =\frac{R-G}{1-B}$
{numref}`temp` shows an example of the distribution of the average July temperature in a city, both in degrees Celsius and in degrees Fahrenheit. Due to the change of units, the PDF is transformed: the mean is shifted and the variance changes.
```{figure} ../figures/ErrorPropagation/01_Temp.png
---
height: 600px
name: temp
---
Distribution of temperature in degrees Celsius and degrees Fahrenheit.
```
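A quick way to see this transformation is to simulate it. The sketch below is a minimal check, assuming (hypothetically) that the July temperature is normally distributed with a mean of 18 °C and a standard deviation of 2 °C:
```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed July temperature in degrees Celsius: mean 18 °C, standard deviation 2 °C
t_c = rng.normal(loc=18.0, scale=2.0, size=100_000)

# Linear conversion to degrees Fahrenheit
t_f = 9 / 5 * t_c + 32

# The mean is shifted and scaled; the standard deviation is only scaled (the offset 32 drops out)
print(f"mean: {t_c.mean():.2f} °C -> {t_f.mean():.2f} °F (expected {9 / 5 * 18 + 32:.2f})")
print(f"std : {t_c.std():.2f} °C -> {t_f.std():.2f} °F (expected {9 / 5 * 2:.2f})")
```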
The question we are interested in is: **how does the statistical uncertainty in the input data propagate to the output variables?**
Here, we will only consider the propagation of the mean (= expectation) and (co-)variances, and not the transformation of the full PDF or CDF.
The general form of our problem will be given as follows. Consider the single function of $m$ random variables
$$
X = q(Y)=q(Y_1,\ldots,Y_m)
$$
with the mean and covariance matrix of $Y$ known:
$$
\mathbb{E}(Y)=\mu_Y, \quad \mathbb{D}(Y)=\Sigma_Y
$$
**What is then the mean and variance of $X$?**
Let's start with the mean for the case that the function is linear:
$$
q(Y)=a_1 Y_1+ a_2 Y_2 +\cdots+ a_m Y_m + c
$$
with the $a_i$ and $c$ deterministic constants. Since the expectation operator is a linear operator, we have:
$$
\mathbb{E}(q(Y))=\mathbb{E}(a_1 Y_1+ a_2 Y_2 +\cdots+ a_m Y_m + c)= a_1 \mathbb{E}(Y_1)+\cdots+ a_m \mathbb{E}(Y_m)+c
$$
But what if the function is non-linear? Then we can use the [Taylor series](PM_taylor) approximation of $q(Y)$.
## Function of one random variable
We will first look at the simplest case, where we have a function of a single random variable, $X=q(Y)$, with the Taylor approximation:
$$
q(Y)\approx q(\mu_Y) +\left(\frac{\partial q}{\partial Y}\right)_0(Y-\mu_Y) + \frac{1}{2!} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0(Y-\mu_Y)^2 + \text{H.O.T.}
$$
where we take the mean $y_0=\mu_Y$ as the logical point of expansion for the random variable $Y$ (H.O.T. stands for higher-order terms). The subscript $_0$ indicates that the partial derivatives are evaluated at $y_0=\mu_Y$.
Due to the linearity of the expectation we then find as a second-order approximation of $\mathbb{E}(q(Y))$, known as *mean propagation law*:
$$
\begin{align*}
\mathbb{E}(X)=\mathbb{E}(q(Y))&\approx \mathbb{E}(q(\mu_Y) +\left(\frac{\partial q}{\partial Y}\right)_0(Y-\mu_Y) + \frac{1}{2!} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0(Y-\mu_Y)^2)\\
&= \mathbb{E}(q(\mu_Y)) +\left(\frac{\partial q}{\partial Y}\right)_0\mathbb{E}\left((Y-\mu_Y)\right) + \frac{1}{2} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0\mathbb{E}\left((Y-\mu_Y)^2\right)\\
&= q(\mu_Y)+\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0\sigma_Y^2
\end{align*}
$$
where in order to arrive at the last equation we should recognize that:
* $\mathbb{E}(q(\mu_Y))=q(\mu_Y)$ (since $\mu_Y$ is deterministic and known)
* $\mathbb{E}(Y-\mu_Y)=\mathbb{E}(Y)-\mu_Y= 0$
* $\mathbb{E}\left((Y-\mu_Y)^2\right)=\sigma_Y^2$.
For the variance of $X=q(Y)$ it generally suffices to use a first-order approximation. The result follows as:
$$
\sigma_X^2 =\left(\frac{\partial q}{\partial Y}\right)_0^2\sigma_Y^2
$$
and is referred to as the *variance propagation law*.
### Example $X=Y^2$
$$
\begin{align*}
\mathbb{E}(X)&\approx \mu_Y^2 + \frac{1}{2}\cdot 2 \cdot \sigma_Y^2= \mu_Y^2+\sigma_Y^2\\
\sigma_X^2 &\approx \left( 2\mu_Y\right)^2\sigma_Y^2 = 4\mu_Y^2\sigma_Y^2
\end{align*}
$$
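These approximations can be checked with a small Monte Carlo simulation; the values of $\mu_Y$ and $\sigma_Y$ below are assumed example values (for a normally distributed $Y$ the mean result is in fact exact, while the variance result is a first-order approximation):
```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example values for Y ~ N(mu_Y, sigma_Y^2)
mu_Y, sigma_Y = 4.0, 0.5
y = rng.normal(mu_Y, sigma_Y, size=1_000_000)
x = y**2

# Propagation laws for q(Y) = Y^2
mean_prop = mu_Y**2 + sigma_Y**2        # second-order mean propagation
var_prop = (2 * mu_Y)**2 * sigma_Y**2   # first-order variance propagation

print(f"mean: simulated {x.mean():.3f} vs propagated {mean_prop:.3f}")
print(f"var : simulated {x.var():.3f} vs propagated {var_prop:.3f}")
```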
## Function of two random variables
Let's consider the case that we have one function of two random variables, $Y = [Y_1\; \;Y_2]^T$ with known mean and covariance matrix:
$$
\mathbb{E}(Y)=\mu_Y =\begin{bmatrix}\mu_1\\ \mu_2 \end{bmatrix}, \quad \Sigma_Y= \begin{bmatrix}\sigma_1^2 & Cov(Y_1,Y_2)\\ Cov(Y_1,Y_2)&\sigma_2^2 \end{bmatrix}
$$
The Taylor series approximations of $X=q(Y_1,Y_2)$ follow as:
$$
\begin{align*}
\mathbb{E}(X)&\approx q(\mu_Y)+\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y_1^2}\right)_0\sigma_1^2 +\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y_2^2}\right)_0\sigma_2^2 + \left(\frac{\partial^2 q}{\partial Y_1 \partial Y_2}\right)_0Cov(Y_1,Y_2)\\
\sigma_X^2 &\approx \left(\frac{\partial q}{\partial Y_1}\right)_0^2\sigma_1^2 + \left(\frac{\partial q}{\partial Y_2}\right)_0^2\sigma_2^2 + 2\left(\frac{\partial q}{\partial Y_1}\right)_0\left(\frac{\partial q}{\partial Y_2}\right)_0Cov(Y_1,Y_2)
\end{align*}
$$
These are thus the mean and variance propagation laws for a function of two random variables. Pay attention to the $\approx$-sign.
If $Y_1$ and $Y_2$ are independent, we have that $Cov(Y_1,Y_2)=0$, such that the last term in both equations disappears.
:::{card} Exercise mathematical pendulum
We will measure the period of one oscillation $T$ of a pendulum, as well as the length $L$ of the pendulum. Both measurements are affected by random errors, and therefore the 'calculated' gravitational acceleration $G$ is a function of two random variables:
$G=q(L,T)= 4\pi^2 \frac{L}{T^2}$
Apply the mean and variance propagation laws to approximate the mean and variance of $G$ given that the expectations
$\mathbb{E}(L)= \mu_L$ and $\mathbb{E}(T)= \mu_T$, as well as the variances $\sigma^2_L$ and $\sigma^2_T$ are known, and the covariance $Cov(L,T)=0$.
```{admonition} Solution
:class: tip, dropdown
First determine the first- and second-order partial derivatives of the function $q(L,T)$ to obtain:
$$
\mathbb{E}(G)\approx 4\pi^2 \frac{\mu_L}{\mu_T^2} + 12\pi^2 \frac{\mu_L}{\mu_T^4}\sigma_T^2
$$
$$
\sigma^2_G \approx 16\pi^4\frac{1}{\mu_T^4}\sigma_L^2 +64\pi^4 \frac{\mu_L^2}{\mu_T^6}\sigma_T^2
$$
```
:::
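A minimal numerical sketch of the solution above; the measurement statistics are hypothetical values, chosen only for illustration:
```python
import numpy as np

# Hypothetical measurement statistics
mu_L, sigma_L = 1.00, 0.005   # pendulum length [m]
mu_T, sigma_T = 2.00, 0.010   # oscillation period [s]

# Mean and variance propagation for q(L, T) = 4 pi^2 L / T^2, with Cov(L, T) = 0
mean_G = 4 * np.pi**2 * mu_L / mu_T**2 + 12 * np.pi**2 * mu_L / mu_T**4 * sigma_T**2
var_G = 16 * np.pi**4 / mu_T**4 * sigma_L**2 + 64 * np.pi**4 * mu_L**2 / mu_T**6 * sigma_T**2

print(f"E(G)    = {mean_G:.4f} m/s^2 (approx.)")
print(f"sigma_G = {np.sqrt(var_G):.4f} m/s^2 (approx.)")
```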
## Functions of $n$ random variables
The propagation laws for a single function $X=q(Y_1,\ldots,Y_n)$ of $n$ random variables are as follows:
$$
\mathbb{E}(X)\approx q(\mu_Y)+\frac{1}{2} \sum_{i=1}^{n}\left(\frac{\partial^2 q}{\partial Y_i^2}\right)_0\sigma_i^2 + \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n} \left(\frac{\partial^2 q}{\partial Y_i \partial Y_j}\right)_0Cov(Y_i,Y_j)
$$
$$
\sigma_X^2 \approx \sum_{i=1}^{n}\left(\frac{\partial q}{\partial Y_i}\right)_0^2\sigma_i^2 + \sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n}\left(\frac{\partial q}{\partial Y_i}\right)_0\left(\frac{\partial q}{\partial Y_j}\right)_0Cov(Y_i,Y_j)
$$
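For an arbitrary (differentiable) function $q$ the first-order variance propagation law can be implemented generically. The sketch below approximates the gradient at $\mu_Y$ with central finite differences; the helper name `propagate_variance` and the numerical values are illustrative assumptions, not part of any existing code base:
```python
import numpy as np

def propagate_variance(q, mu_Y, Sigma_Y, h=1e-6):
    """First-order variance propagation for X = q(Y_1, ..., Y_n).

    The gradient of q is approximated with central finite differences
    evaluated at the mean mu_Y; Sigma_Y is the covariance matrix of Y.
    """
    mu_Y = np.asarray(mu_Y, dtype=float)
    grad = np.zeros(mu_Y.size)
    for i in range(mu_Y.size):
        step = np.zeros(mu_Y.size)
        step[i] = h
        grad[i] = (q(mu_Y + step) - q(mu_Y - step)) / (2 * h)
    # grad @ Sigma_Y @ grad = sum_i sum_j (dq/dY_i)(dq/dY_j) Cov(Y_i, Y_j)
    return grad @ Sigma_Y @ grad

# Example: the pendulum function, with assumed means and a diagonal covariance matrix
q = lambda y: 4 * np.pi**2 * y[0] / y[1]**2
print(propagate_variance(q, [1.0, 2.0], np.diag([0.005**2, 0.010**2])))
```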
(01_LinearProp)=
## Linear propagation laws of mean and covariance
### Linear function of two random variables
Consider a linear function of two random variables
$$
X = q(Y)=a_1 Y_1+ a_2 Y_2 + c
$$
We can now show that $\mathbb{E}(q(Y))= a_1 \mathbb{E}(Y_1)+a_2 \mathbb{E}(Y_2)+c$ using our Taylor approximation. The first-order partial derivatives follow as
$$
\frac{\partial q}{\partial Y_1}= a_1, \; \frac{\partial q}{\partial Y_2}= a_2
$$
All higher-order derivatives are zero, and consequently all higher-order terms in the Taylor series vanish. The expectation of $q(Y)$ therefore follows as
$$
\mathbb{E}(q(Y))= q(\mu_1,\mu_2)=a_1 \mu_1 + a_2\mu_2 + c
$$
which is exact (i.e., not an approximation anymore).
### Exercise
In a similar fashion derive the variance of $X$, which is also an exact result.
```{admonition} Solution
:class: tip, dropdown
Only the first-order partial derivatives of the function $q(Y_1,Y_2)$ are needed here, which are $a_1$ and $a_2$ (all higher-order derivatives are zero). The variance propagation law then gives:
$$
\sigma_X^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + 2a_1 a_2 Cov(Y_1,Y_2)
$$
Note that it does not depend on the deterministic constant $c$.
```
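A quick simulation check of this exact result, with assumed values for $a_1$, $a_2$, $c$ and the covariance matrix of $Y$:
```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed example values
a1, a2, c = 2.0, -1.0, 5.0
mu_Y = np.array([1.0, 3.0])
Sigma_Y = np.array([[4.0, 1.5],
                    [1.5, 2.0]])

y = rng.multivariate_normal(mu_Y, Sigma_Y, size=500_000)
x = a1 * y[:, 0] + a2 * y[:, 1] + c

var_exact = a1**2 * Sigma_Y[0, 0] + a2**2 * Sigma_Y[1, 1] + 2 * a1 * a2 * Sigma_Y[0, 1]
print(f"variance: simulated {x.var():.3f} vs exact {var_exact:.3f}")
```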
### Linear functions of $n$ random variables
Note that the linear function of two random variables can also be written as $q(Y) = \begin{bmatrix} a_1 & a_2\end{bmatrix}\begin{bmatrix}Y_1 \\ Y_2\end{bmatrix}+c$. We will now generalize to the case where we have $m$ linear functions of $n$ variables, which can be written as a linear system of equations:
$$
X= \begin{bmatrix} X_1\\ X_2 \\ \vdots \\ X_m \end{bmatrix}= \begin{bmatrix} a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n} \\ \vdots&\vdots&\ddots&\vdots \\ a_{m1}&a_{m2}&\dots&a_{mn} \end{bmatrix} \begin{bmatrix} Y_1\\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} +\begin{bmatrix} c_1\\ c_2 \\ \vdots \\ c_m \end{bmatrix}=\mathrm{A}Y+\mathrm{c}
$$
with known mean $\mathbb{E}(Y)$ and covariance matrix $\Sigma_Y$, and $\mathrm{c}$ a vector of deterministic constants.
The linear propagation laws of the mean and covariance matrix are given by
$$
\mathbb{E}(X) = \mathrm{A}\mathbb{E}(Y)+\mathrm{c}
$$
$$
\Sigma_{X} =\mathrm{A}\Sigma_Y \mathrm{A}^T
$$
These are exact results, since for linear functions the higher-order terms of the Taylor series are zero and thus there is no approximation error.
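With NumPy the linear propagation laws are a direct transcription of the formulas above; the matrices below are an assumed example:
```python
import numpy as np

# Assumed example: m = 2 linear functions of n = 3 random variables
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])
c = np.array([1.0, 0.0])
mu_Y = np.array([1.0, 2.0, 3.0])
Sigma_Y = np.diag([0.5, 1.0, 2.0])

mu_X = A @ mu_Y + c          # E(X) = A E(Y) + c
Sigma_X = A @ Sigma_Y @ A.T  # Sigma_X = A Sigma_Y A^T

print(mu_X)     # [ 5. -1.]
print(Sigma_X)  # [[3. 1.]
                #  [1. 3.]]
```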
### Exercise
Consider the linear system of equations
$$
X=\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}Y_1 \\ Y_2\end{bmatrix}
$$
with
$$
\mu_Y = \begin{bmatrix}0 \\ 0\end{bmatrix},\; \Sigma_Y= \begin{bmatrix}3&0 \\ 0&3\end{bmatrix}
$$
Apply the linear propagation laws to find $\mathbb{E}(X)=\mu_X$ and $\Sigma_X$.
```{admonition} Solution
:class: tip, dropdown
$$
\mu_X=\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}0 \\ 0\end{bmatrix}=\begin{bmatrix}0 \\ 0\end{bmatrix}
$$
$$
\Sigma_X = \begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}3&0 \\ 0&3\end{bmatrix}\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}^T=\begin{bmatrix}6&-3 \\ -3&15\end{bmatrix}
$$
```
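The solution can also be verified numerically, for instance:
```python
import numpy as np

# Numerical check of the exercise above
A = np.array([[1.0, 1.0],
              [1.0, -2.0]])
Sigma_Y = 3.0 * np.eye(2)

print(A @ Sigma_Y @ A.T)   # [[ 6. -3.]
                           #  [-3. 15.]]
```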