Case 1: Wave impacts on a crest wall¶
**What's the propagated uncertainty? *How large will be the horizontal force?***
In this project, you have chosen to work on the uncertainty of wave periods and wave heights in the Alboran sea to estimate the impacts on a crest wall: a concrete element installed on top of mound breakwater. You have observations from buoys of the significant wave height ($H$) and the peak wave period ($T$) each hour for several years. As you know, $H$ and $T$ are hydrodynamic variables relevant to estimate wave impacts on the structure. The maximum horizontal force (exceeded by 0.1% of incoming waves) can be estimated using the following equation (USACE, 2002).
$$ F_h = \left( A_1 + A_2 \frac{H}{A_c} \right) \rho g C_h L_{0p} $$
where $A_1=-0.016$ and $A_2=0.025$ are coefficients that depend on the geometry of the structure, $A_c=3m$ is the elevation of the frontal berm of the structure, $\rho$ is the density of water, $g$ is the gravity acceleration, $C_h=2m$ is the crown wall height, and $L_{0p}=\frac{gT^2}{2\pi}$ is the wave length in deep waves. Thus, the previous equation is reduced to
$$ F_h = 255.4 H T^2 -490.4 T^2 $$
The goal of this project is:
- Choose a reasonable distribution function for $H$ and $T$.
- Fit the chosen distributions to the observations of $H$ and $T$.
- Assuming $H$ and $d$ are independent, propagate their distributions to obtain the distribution of $F_h$.
- Analyze the distribution of $F_h$.
Importing packages¶
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from math import ceil, trunc
plt.rcParams.update({'font.size': 14})
1. Explore the data¶
First step in the analysis is exploring the data, visually and through its statistics.
# Import
T10, T90 = np.genfromtxt('dataset_T.csv', delimiter=",", unpack=True, skip_header=True)
# plot time series
fig, ax = plt.subplots(2, 1, figsize=(10, 7), layout = 'constrained')
ax[0].plot(T10,'k')
ax[0].set_xlabel('Time')
ax[0].set_ylabel('Min temperature (degrees)')
ax[0].grid()
ax[1].plot(T90,'k')
ax[1].set_xlabel('Time')
ax[1].set_ylabel('Max temperature (degrees)')
ax[1].grid()
# Statistics for H
print(stats.describe(T10))
# Statistics for d
print(stats.describe(T90))
Task 1: Describe the data based on the previous statistics:
2. Empirical distribution functions¶
Now, we are going to compute and plot the empirical PDF and CDF for each variable. Note that you have the pseudo-code for the empirical CDF in the reader.
Task 2: Define a function to compute the empirical CDF.
def ecdf(var):
x = np.sort(var) # sort the values from small to large
n = x.size # determine the number of datapoints\
y = np.arange(1, n+1) / (n+1)
return [y, x]
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].hist(T10, edgecolor='k', linewidth=0.2,
color='cornflowerblue', label='Min temperature (degrees)', density = True)
axes[0].set_xlabel('Random variable (X)')
axes[0].set_ylabel('density')
axes[0].hist(T90, edgecolor='k', linewidth=0.2, alpha = 0.5,
color='grey', label='Max temperature (degrees)', density = True)
axes[0].set_title('PDF', fontsize=18)
axes[0].grid()
axes[0].legend()
axes[1].step(ecdf(T10)[1], ecdf(T10)[0],
color='cornflowerblue', label='Min temperature (degrees)')
axes[1].set_xlabel('Random variable (X)')
axes[1].set_ylabel('${P[X \leq x]}$')
axes[1].step(ecdf(T90)[1], ecdf(T90)[0],
color='grey', label='Max temperature (degrees)')
axes[1].set_title('CDF', fontsize=18)
axes[1].legend()
axes[1].grid()
Task 3: TO UPDATE Based on the results of Task 1 and the empirical PDF and CDF, select one distribution to fit to each variable. For $H$, select between Exponential or Gaussian distribution, while for $T$ choose between Uniform or Gumbel.
3. Fitting a distribution¶
Task 4: Fit the selected distributions to the observations using MLE.
Hint: Use Scipy built in functions (watch out with the parameters definition!).
params_T10 = stats.uniform.fit(T10)
params_T90 = stats.uniform.fit(T90)
4. Assessing goodness of fit¶
Task 5: Assess the goodness of fit of the selected distribution using:
Hint: You have Kolmogorov-Smirnov test implemented in Scipy.
#Graphical method
#Logscale
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].step(ecdf(T10)[1], 1-ecdf(T10)[0],
color='k', label='Min temperature (degrees)')
axes[0].plot(ecdf(T10)[1], 1-stats.uniform.cdf(ecdf(T10)[1], *params_T10),
'--', color = 'grey', label='Uniform')
axes[0].set_xlabel('Min temperature (degrees)')
axes[0].set_ylabel('${P[X > x]}$')
axes[0].set_title('Min temperature', fontsize=18)
axes[0].set_yscale('log')
axes[0].legend()
axes[0].grid()
axes[1].step(ecdf(T90)[1], 1-ecdf(T90)[0],
color='k', label='Max temperature (degrees)')
axes[1].plot(ecdf(T90)[1], 1-stats.uniform.cdf(ecdf(T90)[1], *params_T90),
'--', color = 'grey', label='Uniform')
axes[1].set_xlabel('Max temperature (degrees)')
axes[1].set_ylabel('${P[X > x]}$')
axes[1].set_title('Max temperature', fontsize=18)
axes[1].set_yscale('log')
axes[1].legend()
axes[1].grid()
# QQplot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
axes[0].plot([trunc(min(H)), ceil(max(H))], [trunc(min(H)), ceil(max(H))], 'k')
axes[0].scatter(ecdf(H)[1], stats.expon.ppf(ecdf(H)[0], *params_H),
color='grey', label='Exponential')
axes[0].set_xlabel('Observed H (m)')
axes[0].set_ylabel('Estimated H (m)')
axes[0].set_title('H', fontsize=18)
axes[0].set_xlim([trunc(min(H)), ceil(max(H))])
axes[0].set_ylim([trunc(min(H)), ceil(max(H))])
axes[0].legend()
axes[0].grid()
axes[1].plot([trunc(min(T)), ceil(max(T))], [trunc(min(T)), ceil(max(T))], 'k')
axes[1].scatter(ecdf(T)[1], stats.gumbel_r.ppf(ecdf(T)[0], *params_T),
color='grey', label='Gumbel')
axes[1].set_xlabel('Observed T (s)')
axes[1].set_ylabel('Estimated T (s)')
axes[1].set_title('T', fontsize=18)
axes[1].set_xlim([trunc(min(T)), ceil(max(T))])
axes[1].set_ylim([trunc(min(T)), ceil(max(T))])
axes[1].legend()
axes[1].grid()
#KStest
_, p_H = stats.kstest(H,stats.expon.cdf, args=params_H)
_, p_T = stats.kstest(T,stats.gumbel_r.cdf, args=params_T)
print('The p-value for the fitted Gumbel distribution to H is:', round(p_H, 3))
print('The p-value for the fitted Uniform distribution to d is:', round(p_T, 3))
Task 6: Interpret the results of the GOF techniques. How does the selected parametric distribution perform?
5. Propagating the uncertainty¶
Using the fitted distributions, we are going to propagate the uncertainty from $H$ and $T$ to $F_h$ assuming that $H$ and $T$ are independent.
Task 7:
Draw 10,000 random samples from the fitted distribution functions for $H$ and $T$.
Compute $F_h$ for each pair of samples.
Compute $F_h$ for the observations.
Plot the PDF and exceedance curve in logscale of $F_h$ computed using both the simulations and the observations.
# Here, the solution is shown for the Lognormal distribution
# Draw random samples
rs_H = stats.expon.rvs(*params_H, size = 10000)
rs_T = stats.gumbel_r.rvs(*params_T, size = 10000)
#Compute Fh
rs_Fh = 255.4 * rs_H * rs_T**2 - 490.4*rs_T**2
#repeat for observations
Fh = 255.4 * H * T**2 - 490.4*T**2
#plot the PDF and the CDF
fig, axes = plt.subplots(1, 2, figsize=(12, 7))
axes[0].hist(rs_Fh, edgecolor='k', linewidth=0.2, density = True, label = 'From simulations')
axes[0].hist(Fh, edgecolor='k', facecolor = 'orange', alpha = 0.5, linewidth=0.2,
density = True, label = 'From observations')
axes[0].set_xlabel('Horizontal force (kN)')
axes[0].set_ylabel('density')
axes[0].set_title('PDF', fontsize=18)
axes[0].legend()
axes[0].grid()
axes[1].step(ecdf(rs_Fh)[1], 1-ecdf(rs_Fh)[0], label = 'From simulations')
axes[1].step(ecdf(Fh)[1], 1-ecdf(Fh)[0], color = 'orange', label = 'From observations')
axes[1].set_xlabel('Horizontal force (kN)')
axes[1].set_ylabel('${P[X > x]}$')
axes[1].set_title('Exceedance plot', fontsize=18)
axes[1].set_yscale('log')
axes[1].legend()
axes[1].grid()
Task 8: Interpret the figures above, answering the following questions:
If you run the code in the cell below, you will obtain a scatter plot of both variables. Explore the relationship between both variables and answer the following questions:
Task 9:
Observe the plot below. What differences do you observe between the generated samples and the observations?
Compute the correlation between $H$ and $T$ for the samples and for the observartions. Are there differences?
What can you improve into the previous analysis? Do you have any ideas/suggestions on how to implement those suggestions?
fig, axes = plt.subplots(1, 1, figsize=(7, 7))
axes.scatter(rs_H, rs_T, 40, 'k', label = 'Simulations')
axes.scatter(H, T, 40, 'r','x', label = 'Observations')
axes.set_xlabel('Wave height, H (m)')
axes.set_ylabel('Wave period, T (s)')
axes.legend()
axes.grid()
End of notebook.