Commit a1fc67fe authored by Patricia Mares Nasarre's avatar Patricia Mares Nasarre

Merge branch 'week8-robert' into 'week8'

Week 8 adjustments from Robert

See merge request !394
parents 00b96945 0e3a1ec8
......@@ -91,12 +91,25 @@ parts:
- file: probability/GOF
- file: probability/Loc-scale.ipynb
- file: multivariate/Introduction
title: Multivariate distributions
- file: multivariate/overview
title: Multivariate Distributions
sections:
- file: multivariate/AND_OR.md
- file: multivariate/Multi_gaussian.md
- file: multivariate/Other_multi.md
- file: multivariate/events
title: Dependent Events
- file: multivariate/variables
title: Continuous Variables
- file: multivariate/gaussian
title: Multivariate Gaussian
- file: multivariate/nongaussian
title: Non-Gaussian Distributions
- file: multivariate/functions
title: Functions of Random Variables
sections:
- file: rr/prob-design/one-random-variable
- file: rr/prob-design/two-random-variables
- file: rr/prob-design/exercise
sections:
- file: rr/prob-design/exercise_solutions
# - caption: Q2 Topics
# numbered: 2
......@@ -441,7 +454,11 @@ parts:
- file: external/learn-python/book/08/handling_errors.ipynb
title: Handling Errors
- file: external/learn-python/book/08/asserts.ipynb
- file: programming/week_1_7
sections:
- file: external/learn-programming/book/python/oop/classes
- caption: Fundamental Concepts
numbered: 2
chapters:
......
Subproject commit d44691b9b3e567fa23ea28062f8bee3af9a4cfd2
Subproject commit 1e53d4f545797d14fd57f8ef28fec4bcc2fb14ea
Subproject commit 9edbcc9bb17dac02a3616bd63c704f10406e603d
Subproject commit f4c10f32e1e161d93b6018b387a85c3294055f87
# Multivariate continuous distributions
Challenges in Civil Engineering and Geosciences involve working in data-scarce scenarios (epistemic uncertainty) with natural phenomena of a stochastic nature (aleatoric uncertainty). Univariate continuous distributions assist us in modelling the uncertainty of a variable so that this uncertainty can later be included in the decision-making process or a risk analysis, among others. However, these problems are typically complex and usually involve more than one variable. For instance, when assessing an algae bloom in a water body, different variables need to be considered, such as the amount of nutrients in the water (nitrogen and phosphorus), the dissolved oxygen in the water or the temperature of the water body. Sometimes (although not frequently) these variables are not related to each other, so we can consider them independent. Frequently, however, these variables are generated by the same drivers and are thus dependent on each other. For instance, the algae bloom is influenced by the temperature and the availability of nutrients, but the availability of nutrients can also be influenced by the water temperature. Therefore, there can be a relationship between these variables that an accurate analysis needs to consider. This is where multivariate probability distributions are helpful, as they allow us to model the distribution of not only one variable but several at the same time, thus accounting for their dependence.
## Revisiting correlation
Probabilistic dependence between random variables is usually quantified through correlation coefficients. Previously, you have already been introduced to the concept of [correlation](correl) and the Pearson correlation coefficient, which is given by
$$
\rho(X_1,X_2)=\frac{Cov(X_1,X_2)}{\sigma_{X_1} \sigma_{X_2}}
$$
where $X_1$ and $X_2$ are two random variables, $Cov(X_1,X_2)$ is their covariance, and $\sigma_{X_1}$ and $\sigma_{X_2}$ are the standard deviations of $X_1$ and $X_2$. The coefficient is bounded, $-1 \leq \rho(X_1,X_2) \leq 1$, where $\rho(X_1,X_2)=-1$ indicates a perfect negative linear correlation and $\rho(X_1,X_2)=1$ a perfect positive linear correlation. If $\rho(X_1,X_2) \to 0$, we say that $X_1$ and $X_2$ are independent[^note]. That is, having information about $X_1$ does not provide us with information about $X_2$. The interactive element below allows you to play around with the correlation value yourself. Observe how the distribution's density contours, or the scattered data, change when you adjust the correlation value.
<iframe src="../_static/elements/element_correlation.html" width="600" height="400" frameborder="0"></iframe>
[^note]: That is an intuitive definition of independence. For a more formal definition of independence, visit the next page of the chapter.
(multivariate_events)=
# Basic concepts to start
Before going further into _continuous multivariate distributions,_ we will start with a reminder of some basic concepts you have seen previously: independence, AND and OR probabilities, and conditional probability, illustrated by considering the probability of _events._
```{admonition} Event
:class: tip
In probability theory, an _event_ is considered to be the outcome of an experiment with a specific probability.
**some other stuff**
If you also need further practice or to revisit other concepts such as mutually exclusive events or collectively exhaustive, you can go [here](https://teachbooks.github.io/learn-probability/section_01/Must_know_%20probability_concepts.html).
```
## Discrete Events
As we are working towards multivariate _continuous_ distributions, we will refer to these as _discrete_ events to distinguish them from continuous random variables.
In this case:
- the total probability of the sample space is still 1
- each event is defined by a random variable
- to facilitate the Venn diagram and "event-based" analogies, we will only consider binary cases for each event, i.e., $\leq$ and $>$ cases (more than two outcomes could be illustrated, but the binary case suffices here)
## AND and OR probabilities: Venn diagrams
Great. Now let's review a few key concepts (quickly!).
Let's move back to discrete events to explain what AND and OR probabilities are. Imagine two events, A and B. These can be, for instance, the fact that it rains today (A) and the fact that the temperature is below 10 degrees (B). Each of these events will have a probability of occurring, denoted here as $P(A)$ and $P(B)$, respectively.
**idea**: simply list condition, total probability, independence rule, Bayes rule (lays out terms and usage), then the following sections briefly illustrate these things with the flood example and Venn diagrams.
## Case Study
```{figure} ../events.png
Imagine two events, A and B:
- A represents river A flooding
- B represents river B flooding
Each of these events will have a probability of occurring, denoted here as $P(A)$ and $P(B)$, respectively.
```{figure} ./figures/venn-events.png
---
......@@ -20,7 +46,7 @@ Venn diagram of the events A and B.
The AND probability or intersection of the events A and B, $P(A \cap B)$, is defined as the probability that both events happen at the same time and, thus, it would be represented in our diagram as shown in the figure below.
```{figure} ../intersection.png
```{figure} ./figures/venn-intersection.png
---
......@@ -28,6 +54,16 @@ The AND probility or intersection of the events A and B, $P(A \cap B)$, is defin
Venn diagram of the events A and B, and AND probability.
```
Thus we have two ways of describing the same probability:
- intersection
- AND

We will use these interchangeably. Keep an eye out for related (English) words: both, together, joint, ...
### OR probability
The OR probability or union of the events A and B, $P(A \cup B)$, is defined as the probability that either one of the two events happen or both of them. This probability can be computed as
$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$
That is, we add the probabilities of occurrence of the events A and B and subtract the probability of their intersection, to avoid counting it twice.
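As a quick numerical check of the inclusion-exclusion rule, consider some hypothetical probabilities (the values below are made up purely for illustration):

```python
# Hypothetical probabilities for two events A and B (illustrative values only)
p_a = 0.30    # P(A)
p_b = 0.20    # P(B)
p_and = 0.06  # P(A AND B), the intersection

# OR probability via inclusion-exclusion: add both, subtract the overlap
p_or = p_a + p_b - p_and
print(p_or)  # 0.44 (up to floating-point rounding)
```

Note that without subtracting $P(A \cap B)$ we would count the overlap region of the Venn diagram twice.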
## Independence
When two random variables, X and Y, are independent, it means that the occurrence or value of one variable does not influence the occurrence or value of the other variable.
Formally, X and Y are considered independent **if and only if** the joint probability function (or cumulative distribution function) can be factorized into the product of their marginal probability functions (or cumulative distribution functions). That is,
$F(x, y) = P(X \leq x \cap Y \leq y) = P(X \leq x)P(Y \leq y) = F_X(x)F_Y(y)$
The relationship above highlights the connection between the joint cumulative distribution function (CDF) and the marginal CDFs of two independent random variables, X and Y.
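A minimal sketch of this factorization, assuming standard normal marginals with zero correlation (so X and Y are independent) and using `scipy.stats`:

```python
from scipy.stats import norm, multivariate_normal

# Bivariate standard normal with zero correlation => X and Y independent
joint = multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]])

x, y = 0.5, -0.2
F_joint = joint.cdf([x, y])            # joint CDF F(x, y)
F_product = norm.cdf(x) * norm.cdf(y)  # product of marginals F_X(x) * F_Y(y)

# For independent variables the two agree (up to numerical integration error)
print(F_joint, F_product)
```

With a nonzero correlation in the covariance matrix, the two values would no longer match, which is exactly what the factorization criterion detects.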
Definition of independence
Definition of AND and OR probabilities using Venn diagrams
## AND and OR probabilities from samples
Move to continuous distributions and compute them from samples
**We can illustrate samples in the Venn diagram as dots with labels:**
- simple counting exercises will illustrate the probabilities
## Conditional probability
......
(multivar_functions)=
# Functions of Random Variables
In contrast to earlier weeks, here there is dependence between the input random variables.
binary cases are **boring**
Up to this point we have only considered binary cases with two RVs (both or either river flooding); in other words, the problem of interest is based on some set of A and B. What about when the problem is described by a continuous function? Then we have a (familiar) function of random variables! We will illustrate this case in the next section. **This is where the 1- and 2-random-variable pages from 2023 can be included.**
**1- and 2-random variable cases from 2023.**
......@@ -5,4 +5,6 @@ Definition of bivariate Gaussian
Move to 3D
Analytical conditionalization of the 3D Gaussian: 2D margin!
**case study**: return to the river flooding case and illustrate the effect of dependence. figure and table.
(mulitivar)=
# Multivariate Distributions
Challenges in all branches of science and engineering involve working in situations where uncertainty plays a significant role: prior chapters of this book focused a lot on _error_, but often we must also deal with data-scarce scenarios (often categorized as _epistemic_ uncertainty) and natural phenomena with a significant _stochastic_ nature (often categorized as _aleatoric_ uncertainty). As seen in the previous chapter, univariate continuous distributions can assist us in modelling uncertainty associated with a specific variable in order to quantitatively account for uncertainty in general; the distribution helps inform the decision-making process or risk analysis. However, our problems of interest are typically complex and usually involve more than one variable: a _multivariate_ situation.
Consider, for example, when assessing algae blooms in a water body (e.g., a freshwater lake), different variables must be considered, such as nutrients in the water (nitrogen and phosphorus) and their concentration, dissolved oxygen and temperature of the water body. Sometimes (although not frequently), these variables are not related to each other, so we can consider them _independent._ For example, the amount of nitrogen and phosphorus that reaches the lake may not be related to the water temperature of the lake. However, truly _independent_ situations rarely occur in reality, and the variables are _dependent_ on each other. For example, the concentration of nitrogen and phosphorus changes with time and the reaction rates are dependent on the temperature of the water; thus if you are interested in these quantities over time, temperature is certainly a _dependent variable_ of interest.
Although dependent relationships can often be quantified using deterministic relationships (e.g., mechanistic or phenomenological models), probability distributions are also capable of capturing this behavior. This is where _multivariate probability distributions_ are helpful, as they allow us to model the distribution of not only one variable but several at the same time, thus accounting for their dependence.
**Revisiting correlation**
Probabilistic dependence between random variables is usually quantified through correlation coefficients. Previously, you have already been introduced to the concept of [correlation](correl) and the Pearson correlation coefficient, which is given by
$$
\rho(X_1,X_2)=\frac{Cov(X_1,X_2)}{\sigma_{X_1} \sigma_{X_2}}
$$
where $X_1$ and $X_2$ are random variables, $Cov(X_1,X_2)$ is their covariance, and $\sigma_{X_1}$ and $\sigma_{X_2}$ are the standard deviations of $X_1$ and $X_2$. The coefficient is bounded, $-1 \leq \rho(X_1,X_2) \leq 1$, where $\rho(X_1,X_2)=-1$ indicates a perfect negative linear correlation and $\rho(X_1,X_2)=1$ a perfect positive linear correlation. If $\rho(X_1,X_2) \to 0$, we say that $X_1$ and $X_2$ are independent[^note]. That is, having information about $X_1$ does not provide us with information about $X_2$. The interactive element below allows you to play around with the correlation value yourself. Observe how the distribution's _density_ contours, or a scatter plot of _samples,_ change when you adjust the correlation.
<iframe src="../_static/elements/element_correlation.html" width="600" height="400" frameborder="0"></iframe>
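The Pearson coefficient can also be estimated directly from samples; a sketch with NumPy, assuming a bivariate Gaussian with a true correlation of 0.8 (an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(1)
rho_true = 0.8  # illustrative value
samples = rng.multivariate_normal(
    [0, 0], [[1, rho_true], [rho_true, 1]], size=5_000
)

# Sample estimate of the Pearson correlation coefficient
rho_hat = np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]
print(rho_hat)  # close to 0.8
```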
**Overview of this Chapter**
Our ultimate goal is to construct and validate a model to quantify probability for combinations of more than one random variable of interest (i.e., to quantify various types of uncertainty). Specifically,
$$
f_X(x) \;\; \textrm{and} \;\; F_X(x)
$$
where $X$ is a vector of continuous random variables and $f$ and $F$ are the multivariate probability density function (PDF) and cumulative distribution functions (CDF), respectively. Often we will use _bivariate_ situations (two random variables) to illustrate key concepts, for example:
$$
f_{X_1,X_2}(x_1,x_2) \;\; \textrm{and} \;\; F_{X_1,X_2}(x_1,x_2)
$$
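For the bivariate Gaussian case, both of these functions are readily available in `scipy.stats`; a minimal sketch with arbitrary parameter values:

```python
from scipy.stats import multivariate_normal

# Bivariate Gaussian with an arbitrary correlation of 0.5
rv = multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]])

f = rv.pdf([1.0, 0.5])  # joint density f_{X1,X2}(1.0, 0.5)
F = rv.cdf([1.0, 0.5])  # joint CDF F_{X1,X2}(1.0, 0.5)
print(f, F)
```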
This chapter begins with a refresher on some fundamental aspects of probability theory that are typically covered in BSc courses on the subject, for example, dependence/independence, probability of binary events and conditional probability. Using the _bivariate_ paradigm, we will build a foundation on which to apply the multivariate Gaussian distribution (introduced in earlier chapters), as well as introduce alternative _multivariate distributions._ The chapter ends with a brief introduction to _copulas_ as a straightforward approach for evaluating two random variables that are dependent _and_ described by non-Gaussian marginal distributions (the previous chapter).
[^note]: That is an intuitive definition of independence. For a more formal definition of independence, visit the next page of the chapter.
(multivariate_variables)=
# Multivariate Random Variables
## Independence
When two random variables, X and Y, are independent, it means that the occurrence or value of one variable does not influence the occurrence or value of the other variable.
Formally, X and Y are considered independent **if and only if** the joint probability function (or cumulative distribution function) can be factorized into the product of their marginal probability functions (or cumulative distribution functions). That is,
$F(x, y) = P(X \leq x \cap Y \leq y) = P(X \leq x)P(Y \leq y) = F_X(x)F_Y(y)$
The relationship above highlights the connection between the joint cumulative distribution function (CDF) and the marginal CDFs of two independent random variables, X and Y.
Definition of independence
### Illustration
Definition of AND and OR probabilities using bivariate plots and compared to the Venn diagrams.
## Samples
Compute probabilities from samples.
Illustrate the difference between the theoretical and empirical probabilities. Include a table that summarizes them and describe how this can be used to validate the multivariate distribution (**obviously** we should illustrate a case where dependence is important: many observations where _both_ rivers flood).
**Illustrate explicitly that this is the thing that is inaccurate in the example:**
$$
P(X_1>x_1 \cap X_2>x_2)
$$
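A sketch of how an independence assumption gets this joint exceedance wrong, using a bivariate Gaussian with an illustrative correlation of 0.7:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rho = 0.7  # illustrative positive dependence
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
x1 = x2 = 1.0

# Joint exceedance assuming independence (product of marginal tails)
p_indep = norm.sf(x1) * norm.sf(x2)

# Exact joint exceedance via the joint CDF:
# P(X1>x1, X2>x2) = 1 - F1(x1) - F2(x2) + F(x1, x2)
p_joint = 1 - norm.cdf(x1) - norm.cdf(x2) + joint.cdf([x1, x2])

# Empirical check with samples
rng = np.random.default_rng(0)
s = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=50_000)
p_emp = np.mean((s[:, 0] > x1) & (s[:, 1] > x2))

# Positive dependence makes the joint exceedance larger than the
# independence assumption predicts
print(p_indep, p_joint, p_emp)
```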
**So now we need a way to describe dependence!**
# Week 1.7: OOP
This chapter contains a lot of information that is useful for improving your programming skills; however, you are not required to learn all of it, nor to memorize everything for the exam.
You should be able to understand the fundamental concepts of classes and object-oriented programming (OOP) in Python, as well as the key principles of encapsulation, inheritance, and polymorphism in OOP, which will enable you to better understand and use the classes that are everywhere in Python packages. For example, the class `rv_continuous` in `scipy.stats`, which is used for defining probability distributions, is used heavily in MUDE!
One way to check whether you understand this material sufficiently: by the end of Week 1.7 you should realize why OOP is so useful for the continuous parametric distributions we use, recognizing that all distributions can be defined and used in the same way. For example:
- methods like `.pdf()` and `.cdf()` can be used predictably
- you can easily define distributions in terms of parameters or moments
- fitting distributions is also straightforward
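A minimal sketch of this uniformity using `scipy.stats` (the distribution families and parameter values below are chosen arbitrarily):

```python
from scipy.stats import norm, gumbel_r

# Two different distribution families, one identical interface
for dist in (norm(loc=5, scale=2), gumbel_r(loc=5, scale=2)):
    print(dist.pdf(6.0), dist.cdf(6.0), dist.mean())

# Fitting works the same predictable way for every family
data = norm(loc=5, scale=2).rvs(size=1_000, random_state=1)
loc_hat, scale_hat = norm.fit(data)
print(loc_hat, scale_hat)  # close to 5 and 2
```

This is polymorphism in action: every distribution class inherits its interface from `rv_continuous`, so code written against `.pdf()`, `.cdf()`, and `.fit()` works regardless of which family you pick.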
For non-probability topics, this will help you recognize why objects in Python packages like NumPy have syntax like `object.mean()` or `object.shape`.
We hope you enjoy this eye-opening experience!