(01_errorprop)=
# Propagation of uncertainty
## Functions of random variables
In engineering and the sciences we often work with functions of random variables: when estimating or modelling something, the output is a function of the random input variables, see {numref}`functions`.
```{figure} ../figures/ErrorPropagation/01_Functions.png
---
height: 350px
name: functions
---
Output $X$ of a model is a function of the random input $Y$.
```
Some simple examples are:
* conversion of temperature measured in degrees Celsius to temperature in degrees Fahrenheit: $T_f = q(T_c)=\frac{9}{5}T_c+32$
* taking the mean of $m$ repeated measurements $Y_i$: $\hat{X}=q(Y_1,\ldots,Y_m)=\frac{1}{m}\sum_{i=1}^m Y_i$
* subsurface temperature $T_z$ as a function of depth $Z$, surface temperature $T_0$ and a known constant $a$: $T_z = T_0 + aZ$
* wind loading $F$ on a building as a function of the area of the building face $A$, wind pressure $P$ and drag coefficient $C$: $F = A\cdot P\cdot C$
* evaporation $Q$ from the Bowen ratio energy balance as a function of the net radiation $R$, ground heat flux $G$ and Bowen ratio $B$: $Q =q(R,G,B) =\frac{R-G}{1-B}$
{numref}`temp` shows an example of the distribution of the average July temperature in a city, both in degrees Celsius and in degrees Fahrenheit. Due to the change of units, the PDF is transformed: the mean is shifted and the variance changes.
```{figure} ../figures/ErrorPropagation/01_Temp.png
---
height: 600px
name: temp
---
Distribution of temperature in degrees Celsius and degrees Fahrenheit.
```
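A quick way to see this transformation is to simulate it. The sketch below is a minimal check, assuming (hypothetically) that the July temperature is normally distributed with a mean of 18 °C and a standard deviation of 2 °C:
```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed July temperature in degrees Celsius: mean 18 °C, standard deviation 2 °C
t_c = rng.normal(loc=18.0, scale=2.0, size=100_000)

# Linear conversion to degrees Fahrenheit
t_f = 9 / 5 * t_c + 32

# The mean is shifted and scaled; the standard deviation is only scaled (the offset 32 drops out)
print(f"mean: {t_c.mean():.2f} °C -> {t_f.mean():.2f} °F (expected {9 / 5 * 18 + 32:.2f})")
print(f"std : {t_c.std():.2f} °C -> {t_f.std():.2f} °F (expected {9 / 5 * 2:.2f})")
```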
The question we are interested in is: **how does the statistical uncertainty in the input data propagate to the output variables?**
Here, we will only consider the propagation of the mean (= expectation) and (co-)variances, and not the transformation of the full PDF or CDF.
The general form of our problem will be given as follows. Consider the single function of $m$ random variables
$$
X = q(Y)=q(Y_1,\ldots,Y_m)
$$
with the mean and covariance matrix of $Y$ known:
$$
\mathbb{E}(Y)=\mu_Y, \quad \mathbb{D}(Y)=\Sigma_Y
$$
**What is then the mean and variance of $X$?**
Let's start with the mean for the case that the function is linear:
$$
q(Y)=a_1 Y_1+ a_2 Y_2 +\cdots+ a_m Y_m + c
$$
with the $a_i$ and $c$ deterministic constants. Since the expectation operator is a linear operator, we have:
$$
\mathbb{E}(q(Y))=\mathbb{E}(a_1 Y_1+ a_2 Y_2 +\cdots+ a_m Y_m + c)= a_1 \mathbb{E}(Y_1)+\cdots+ a_m \mathbb{E}(Y_m)+c
$$
But what if the function is non-linear? Then we can use the [Taylor series](PM_taylor) approximation of $q(Y)$.
## Function of one random variable
We will first look at the simplest case, where we have a function of a single random variable, $X=q(Y)$, with the Taylor approximation:
$$
q(Y)\approx q(\mu_Y) +\left(\frac{\partial q}{\partial Y}\right)_0(Y-\mu_Y) + \frac{1}{2!} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0(Y-\mu_Y)^2 + \text{H.O.T.}
$$
where we take the mean $y_0=\mu_Y$ as the logical point of expansion for the random variable $Y$ (H.O.T. stands for higher-order terms). The subscript $_0$ indicates that the partial derivatives are evaluated at $y_0=\mu_Y$.
Due to the linearity of the expectation we then find as a second-order approximation of $\mathbb{E}(q(Y))$, known as *mean propagation law*:
$$
\begin{align*}
\mathbb{E}(X)=\mathbb{E}(q(Y))&\approx \mathbb{E}(q(\mu_Y) +\left(\frac{\partial q}{\partial Y}\right)_0(Y-\mu_Y) + \frac{1}{2!} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0(Y-\mu_Y)^2)\\
&= \mathbb{E}(q(\mu_Y)) +\left(\frac{\partial q}{\partial Y}\right)_0\mathbb{E}\left((Y-\mu_Y)\right) + \frac{1}{2} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0\mathbb{E}\left((Y-\mu_Y)^2\right)\\
&= q(\mu_Y)+\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y^2}\right)_0\sigma_Y^2
\end{align*}
$$
where in order to arrive at the last equation we should recognize that:
* $\mathbb{E}(q(\mu_Y))=q(\mu_Y)$ (since $\mu_Y$ is deterministic and known)
* $\mathbb{E}(Y-\mu_Y)=\mathbb{E}(Y)-\mu_Y= 0$
* $\mathbb{E}\left((Y-\mu_Y)^2\right)=\sigma_Y^2$.
For the variance of $X=q(Y)$ it generally suffices to use a first-order approximation. The result follows as:
$$
\sigma_X^2 =\left(\frac{\partial q}{\partial Y}\right)_0^2\sigma_Y^2
$$
and is referred to as the *variance propagation law*.
### Example $X=Y^2$
$$
\begin{align*}
\mathbb{E}(X)&\approx \mu_Y^2 + \frac{1}{2}\cdot 2 \cdot \sigma_Y^2= \mu_Y^2+\sigma_Y^2\\
\sigma_X^2 &\approx \left( 2\mu_Y\right)^2\sigma_Y^2 = 4\mu_Y^2\sigma_Y^2
\end{align*}
$$
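These approximations can be checked with a small Monte Carlo simulation; the values of $\mu_Y$ and $\sigma_Y$ below are assumed example values (for a normally distributed $Y$ the mean result is in fact exact, while the variance result is a first-order approximation):
```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed example values for Y ~ N(mu_Y, sigma_Y^2)
mu_Y, sigma_Y = 4.0, 0.5
y = rng.normal(mu_Y, sigma_Y, size=1_000_000)
x = y**2

# Propagation laws for q(Y) = Y^2
mean_prop = mu_Y**2 + sigma_Y**2        # second-order mean propagation
var_prop = (2 * mu_Y)**2 * sigma_Y**2   # first-order variance propagation

print(f"mean: simulated {x.mean():.3f} vs propagated {mean_prop:.3f}")
print(f"var : simulated {x.var():.3f} vs propagated {var_prop:.3f}")
```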
## Function of two random variables
Let's consider the case that we have one function of two random variables, $Y = [Y_1\; \;Y_2]^T$ with known mean and covariance matrix:
$$
\mathbb{E}(Y)=\mu_Y =\begin{bmatrix}\mu_1\\ \mu_2 \end{bmatrix}, \quad \Sigma_Y= \begin{bmatrix}\sigma_1^2 & Cov(Y_1,Y_2)\\ Cov(Y_1,Y_2)&\sigma_2^2 \end{bmatrix}
$$
The Taylor series approximations of $X=q(Y_1,Y_2)$ follow as:
$$
\begin{align*}
\mathbb{E}(X)&\approx q(\mu_Y)+\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y_1^2}\right)_0\sigma_1^2 +\frac{1}{2} \left(\frac{\partial^2 q}{\partial Y_2^2}\right)_0\sigma_2^2 + \left(\frac{\partial^2 q}{\partial Y_1 \partial Y_2}\right)_0Cov(Y_1,Y_2)\\
\sigma_X^2 &\approx \left(\frac{\partial q}{\partial Y_1}\right)_0^2\sigma_1^2 + \left(\frac{\partial q}{\partial Y_2}\right)_0^2\sigma_2^2 + 2\left(\frac{\partial q}{\partial Y_1}\right)_0\left(\frac{\partial q}{\partial Y_2}\right)_0Cov(Y_1,Y_2)
\end{align*}
$$
These are thus the mean and variance propagation laws for a function of two random variables. Pay attention to the $\approx$-sign.
If $Y_1$ and $Y_2$ are independent, we have that $Cov(Y_1,Y_2)=0$, such that the last term in both equations disappears.
:::{card} Exercise mathematical pendulum
We will measure the period of one oscillation $T$ of a pendulum, as well as the length $L$ of the pendulum. Both measurements are affected by random errors, and therefore the 'calculated' gravitational acceleration $G$ is a function of two random variables:
$G=q(L,T)= 4\pi^2 \frac{L}{T^2}$
Apply the mean and variance propagation laws to approximate the mean and variance of $G$ given that the expectations
$\mathbb{E}(L)= \mu_L$ and $\mathbb{E}(T)= \mu_T$, as well as the variances $\sigma^2_L$ and $\sigma^2_T$ are known, and the covariance $Cov(L,T)=0$.
```{admonition} Solution
:class: tip, dropdown
First determine the first- and second-order partial derivatives of the function $q(L,T)$ to obtain:
$$
\mathbb{E}(G)\approx 4\pi^2 \frac{\mu_L}{\mu_T^2} + 12\pi^2 \frac{\mu_L}{\mu_T^4}\sigma_T^2
$$
$$
\sigma^2_G \approx 16\pi^4\frac{1}{\mu_T^4}\sigma_L^2 +64\pi^4 \frac{\mu_L^2}{\mu_T^6}\sigma_T^2
$$
```
:::
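A minimal numerical sketch of the solution above; the measurement statistics are hypothetical values, chosen only for illustration:
```python
import numpy as np

# Hypothetical measurement statistics
mu_L, sigma_L = 1.00, 0.005   # pendulum length [m]
mu_T, sigma_T = 2.00, 0.010   # oscillation period [s]

# Mean and variance propagation for q(L, T) = 4 pi^2 L / T^2, with Cov(L, T) = 0
mean_G = 4 * np.pi**2 * mu_L / mu_T**2 + 12 * np.pi**2 * mu_L / mu_T**4 * sigma_T**2
var_G = 16 * np.pi**4 / mu_T**4 * sigma_L**2 + 64 * np.pi**4 * mu_L**2 / mu_T**6 * sigma_T**2

print(f"E(G)    = {mean_G:.4f} m/s^2 (approx.)")
print(f"sigma_G = {np.sqrt(var_G):.4f} m/s^2 (approx.)")
```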
## Functions of $n$ random variables
The propagation laws for a single function $X=q(Y_1,\ldots,Y_n)$ of $n$ random variables are as follows:
$$
\mathbb{E}(X)\approx q(\mu_Y)+\frac{1}{2} \sum_{i=1}^{n}\left(\frac{\partial^2 q}{\partial Y_i^2}\right)_0\sigma_i^2 + \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n} \left(\frac{\partial^2 q}{\partial Y_i \partial Y_j}\right)_0Cov(Y_i,Y_j)
$$
$$
\sigma_X^2 \approx \sum_{i=1}^{n}\left(\frac{\partial q}{\partial Y_i}\right)_0^2\sigma_i^2 + \sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n}\left(\frac{\partial q}{\partial Y_i}\right)_0\left(\frac{\partial q}{\partial Y_j}\right)_0Cov(Y_i,Y_j)
$$
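For an arbitrary (differentiable) function $q$ the first-order variance propagation law can be implemented generically. The sketch below approximates the gradient at $\mu_Y$ with central finite differences; the helper name `propagate_variance` and the numerical values are illustrative assumptions, not part of any existing code base:
```python
import numpy as np

def propagate_variance(q, mu_Y, Sigma_Y, h=1e-6):
    """First-order variance propagation for X = q(Y_1, ..., Y_n).

    The gradient of q is approximated with central finite differences
    evaluated at the mean mu_Y; Sigma_Y is the covariance matrix of Y.
    """
    mu_Y = np.asarray(mu_Y, dtype=float)
    grad = np.zeros(mu_Y.size)
    for i in range(mu_Y.size):
        step = np.zeros(mu_Y.size)
        step[i] = h
        grad[i] = (q(mu_Y + step) - q(mu_Y - step)) / (2 * h)
    # grad @ Sigma_Y @ grad = sum_i sum_j (dq/dY_i)(dq/dY_j) Cov(Y_i, Y_j)
    return grad @ Sigma_Y @ grad

# Example: the pendulum function, with assumed means and a diagonal covariance matrix
q = lambda y: 4 * np.pi**2 * y[0] / y[1]**2
print(propagate_variance(q, [1.0, 2.0], np.diag([0.005**2, 0.010**2])))
```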
(01_LinearProp)=
## Linear propagation laws of mean and covariance
### Linear function of two random variables
Consider a linear function of two random variables
$$
X = q(Y)=a_1 Y_1+ a_2 Y_2 + c
$$
We can now show that $\mathbb{E}(q(Y))= a_1 \mathbb{E}(Y_1)+a_2 \mathbb{E}(Y_2)+c$ using our Taylor approximation. The first-order partial derivatives follow as
$$
\frac{\partial q}{\partial Y_1}= a_1, \; \frac{\partial q}{\partial Y_2}= a_2
$$
All higher-order derivatives are zero, and consequently all higher-order terms in the Taylor series vanish. The expectation of $q(Y)$ therefore follows as
$$
\mathbb{E}(q(Y))= q(\mu_1,\mu_2)=a_1 \mu_1 + a_2\mu_2 + c
$$
which is exact (i.e., not an approximation anymore).
### Exercise
In a similar fashion derive the variance of $X$, which is also an exact result.
```{admonition} Solution
:class: tip, dropdown
Only the first-order partial derivatives of the function $q(Y_1,Y_2)$ are needed here, which are $a_1$ and $a_2$ (all higher-order derivatives are zero). The variance propagation law then gives:
$$
\sigma_X^2 = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + 2a_1 a_2 Cov(Y_1,Y_2)
$$
Note that it does not depend on the deterministic constant $c$.
```
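A quick simulation check of this exact result, with assumed values for $a_1$, $a_2$, $c$ and the covariance matrix of $Y$:
```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed example values
a1, a2, c = 2.0, -1.0, 5.0
mu_Y = np.array([1.0, 3.0])
Sigma_Y = np.array([[4.0, 1.5],
                    [1.5, 2.0]])

y = rng.multivariate_normal(mu_Y, Sigma_Y, size=500_000)
x = a1 * y[:, 0] + a2 * y[:, 1] + c

var_exact = a1**2 * Sigma_Y[0, 0] + a2**2 * Sigma_Y[1, 1] + 2 * a1 * a2 * Sigma_Y[0, 1]
print(f"variance: simulated {x.var():.3f} vs exact {var_exact:.3f}")
```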
### Linear functions of $n$ random variables
Note that the linear function of two random variables can also be written as $q(Y) = \begin{bmatrix} a_1 & a_2\end{bmatrix}\begin{bmatrix}Y_1 \\ Y_2\end{bmatrix}+c$. We will now generalize to the case where we have $m$ linear functions of $n$ variables, which can be written as a linear system of equations:
$$
X= \begin{bmatrix} X_1\\ X_2 \\ \vdots \\ X_m \end{bmatrix}= \begin{bmatrix} a_{11}&a_{12}&\dots&a_{1n}\\a_{21}&a_{22}&\dots&a_{2n} \\ \vdots&\vdots&\ddots&\vdots \\ a_{m1}&a_{m2}&\dots&a_{mn} \end{bmatrix} \begin{bmatrix} Y_1\\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} +\begin{bmatrix} c_1\\ c_2 \\ \vdots \\ c_m \end{bmatrix}=\mathrm{A}Y+\mathrm{c}
$$
with known mean $\mathbb{E}(Y)$ and covariance matrix $\Sigma_Y$, and $\mathrm{c}$ a vector of deterministic constants.
The linear propagation laws of the mean and covariance matrix are given by
$$
\mathbb{E}(X) = \mathrm{A}\mathbb{E}(Y)+\mathrm{c}
$$
$$
\Sigma_{X} =\mathrm{A}\Sigma_Y \mathrm{A}^T
$$
These are exact results, since for linear functions the higher-order terms of the Taylor series are zero and thus there is no approximation error.
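With NumPy the linear propagation laws are a direct transcription of the formulas above; the matrices below are an assumed example:
```python
import numpy as np

# Assumed example: m = 2 linear functions of n = 3 random variables
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])
c = np.array([1.0, 0.0])
mu_Y = np.array([1.0, 2.0, 3.0])
Sigma_Y = np.diag([0.5, 1.0, 2.0])

mu_X = A @ mu_Y + c          # E(X) = A E(Y) + c
Sigma_X = A @ Sigma_Y @ A.T  # Sigma_X = A Sigma_Y A^T

print(mu_X)     # [ 5. -1.]
print(Sigma_X)  # [[3. 1.]
                #  [1. 3.]]
```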
### Exercise
Consider the linear system of equations
$$
X=\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}Y_1 \\ Y_2\end{bmatrix}
$$
with
$$
\mu_Y = \begin{bmatrix}0 \\ 0\end{bmatrix},\; \Sigma_Y= \begin{bmatrix}3&0 \\ 0&3\end{bmatrix}
$$
Apply the linear propagation laws to find $\mathbb{E}(X)=\mu_X$ and $\Sigma_X$.
```{admonition} Solution
:class: tip, dropdown
$$
\mu_X=\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}0 \\ 0\end{bmatrix}=\begin{bmatrix}0 \\ 0\end{bmatrix}
$$
$$
\Sigma_X = \begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}\begin{bmatrix}3&0 \\ 0&3\end{bmatrix}\begin{bmatrix}1&1 \\ 1&-2\end{bmatrix}^T=\begin{bmatrix}6&-3 \\ -3&15\end{bmatrix}
$$
```
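The solution can also be verified numerically, for instance:
```python
import numpy as np

# Numerical check of the exercise above
A = np.array([[1.0, 1.0],
              [1.0, -2.0]])
Sigma_Y = 3.0 * np.eye(2)

print(A @ Sigma_Y @ A.T)   # [[ 6. -3.]
                           #  [-3. 15.]]
```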