GA 1.3: Modelling Road Deformation using Non-Linear Least-Squares¶


CEGM1000 MUDE: Week 1.3. Due: Friday, September 20, 2024.

Notes:

  • Don't forget to read the "Assignment Context" section of the README; it contains important information needed to understand this analysis.
  • The Markdown questions and answers in this notebook are not graded; they are to help you make observations and describe results, which will eventually go in your Report.md.

In [1]:
import numpy as np
from scipy import interpolate
from scipy.stats import norm
import pandas as pd
import matplotlib.pyplot as plt


from functions import *

np.set_printoptions(precision=3)

Part 0: Dictionary Review¶

As described above, several functions in this assignment require the use of a Python dictionary to make it easier to keep track of important data, variables and results for the various models we will be constructing and validating.

It may be useful to revisit PA 1.1, where there was a brief introduction to dictionaries. That PA contains all the dictionary information you need for GA 1.3. A read-only copy is here and the source code (notebook) is here.

$\textbf{Task 0.1}$

Read and run the cell below to make sure you remember how to use a dictionary.

Modify the function to print some of the other key-value pairs of the dictionary.

It may also be useful to use the cell below when working on later tasks in this assignment.

In [ ]:
my_dictionary = {'key1': 'value1',
                 'key2': 'value2',
                 'name': 'Dictionary Example',
                 'a_list': [1, 2, 3],
                 'an_array': np.array([1, 2, 3]),
                 'a_string': 'hello'
                 }

def function_that_uses_my_dictionary(d):
    print(d['key1'])

    # SOLUTION:
    print(d['name'])
    print(d['a_list'])
    print(d['an_array'])
    print(d['a_string'])

    if 'new_key' in d:
        print('new_key exists and has value:', d['new_key'])
    return

function_that_uses_my_dictionary(my_dictionary)

$\textbf{Task 0.2}$

Test your knowledge by adding a new key new_key and then executing the function to print the value.

In [ ]:
YOUR_CODE_HERE
function_that_uses_my_dictionary(my_dictionary)
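A minimal sketch of one possible answer (the key name and value here are only an illustration):

# Add a new key-value pair, then re-run the function so it prints the new value
my_dictionary['new_key'] = 'new_value'
function_that_uses_my_dictionary(my_dictionary)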

Task 1: Preparing the data¶

Within this assignment you will work with two types of data: InSAR data and GNSS data. The cell below will load the data and visualize the observed displacements over time. In this task we use the package pandas, which is very useful for handling time series. We will learn how to use it later in the quarter; for now, you only need to recognize that it imports the data as a dataframe object, which we then convert into a numpy array using the code below.

Tip: note that we have converted all observations to millimeters.

In [4]:
gnss = pd.read_csv('./data/gnss_observations.csv')
times_gnss = pd.to_datetime(gnss['times'])
y_gnss = (gnss['observations[m]']).to_numpy()*1000

insar = pd.read_csv('./data/insar_observations.csv')
times_insar = pd.to_datetime(insar['times'])
y_insar = (insar['observations[m]']).to_numpy()*1000

gw = pd.read_csv('./data/groundwater_levels.csv')
times_gw = pd.to_datetime(gw['times'])
y_gw = (gw['observations[mm]']).to_numpy()

Task 1.1:

Once you have used the cell above to import the data, investigate the data sets using the code cell below. Then provide some relevant summary information in the Markdown cell.

Hint: at a minimum, you should be able to tell how many data points are in each data set and get an understanding of the mean and standard deviation of each. Make sure you compare the different data sets and use consistent units.

In [ ]:
YOUR_CODE_HERE
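One possible way to summarize the data sets (a sketch; the exact statistics you report are up to you):

# Number of observations, mean and standard deviation of each data set (all in mm)
for name, y in [('GNSS', y_gnss), ('InSAR', y_insar), ('Groundwater', y_gw)]:
    print(f'{name:12s}: n = {len(y):5d}, mean = {np.mean(y):8.2f} mm, std = {np.std(y):7.2f} mm')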

You may have noticed that the groundwater data are available at different times than the GNSS and InSAR data. You will therefore have to interpolate the groundwater data to the same times for further analysis. You can use the SciPy function interpolate.interp1d (read its documentation).

The cells below do the following:

  1. Define a function to convert the time unit
  2. Convert the time stamps for all data
  3. Use interp1d to interpolate the groundwater measurements at the time of the satellite measurements
In [7]:
def to_days_years(times):
    '''Convert the observation times to days and years.'''
    
    times_datetime = pd.to_datetime(times)
    time_diff = (times_datetime - times_datetime[0])
    days_diff = (time_diff / np.timedelta64(1,'D')).astype(int)
    
    days = days_diff.to_numpy()
    years = days/365
    
    return days, years
In [8]:
days_gnss,  years_gnss  = to_days_years(times_gnss)
days_insar, years_insar = to_days_years(times_insar)
days_gw,    years_gw    = to_days_years(times_gw)

interp = interpolate.interp1d(days_gw, y_gw)

GW_at_GNSS_times = interp(days_gnss)
GW_at_InSAR_times = interp(days_insar)

Task 1.2:

Answer/complete the code and Markdown cells below:

  1. What is interp? (what kind of object is it, and how does it work?)
  2. How did the groundwater observation array change? Be quantitative.

In [ ]:
YOUR_CODE_HERE
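A possible starting point for exploring these questions (a sketch):

# interp is a callable interp1d object: it maps day values to interpolated groundwater levels
print(type(interp))

# Compare the size of the original and interpolated groundwater arrays
print('Original groundwater observations:', len(y_gw))
print('Interpolated at GNSS times:       ', len(GW_at_GNSS_times))
print('Interpolated at InSAR times:      ', len(GW_at_InSAR_times))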

Write your answer in this Markdown cell.

Task 1.3:

Create a single plot to compare observed displacement for the GNSS and InSAR data sets.

In [ ]:
plt.figure(figsize=(15,5))
plt.plot(YOUR_CODE_HERE, YOUR_CODE_HERE,
         'o', mec='black', label = 'GNSS')
plt.plot(YOUR_CODE_HERE, YOUR_CODE_HERE,
         'o', mec='black', label = 'InSAR')
plt.legend()
plt.ylabel('Displacement [mm]')
plt.xlabel('Time')
plt.show()
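A sketch of how the placeholders above could be filled in, assuming displacement is plotted against the original time stamps:

plt.figure(figsize=(15, 5))
plt.plot(times_gnss, y_gnss, 'o', mec='black', label='GNSS')
plt.plot(times_insar, y_insar, 'o', mec='black', label='InSAR')
plt.legend()
plt.ylabel('Displacement [mm]')
plt.xlabel('Time')
plt.show()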

Task 1.4: Describe the datasets based on the figure above and your observations from the previous tasks. What kind of deformation do you see? And what are the differences between both datasets? Be quantitative.

Before we move on, it is time to do a little bit of housekeeping.

Have you found it confusing to keep track of two sets of variables---one for each data type? Let's use a dictionary to store relevant information about each model. We will use this in the plotting functions for this task (and again next week), so make sure you take the time to see what is happening. Review also Part 0 at the top of this notebook if you need a refresher on dictionaries.

Task 1.5:

Run the cell below to define a dictionary for storing information about the two (future) models.

In [11]:
model_insar = {'data_type': 'InSAR',
               'y':y_insar,
               'times':times_insar,
               'groundwater': GW_at_InSAR_times
               }

model_gnss = {'data_type': 'GNSS',
               'y':y_gnss,
               'times':times_gnss,
               'groundwater': GW_at_GNSS_times
               }

Task 2: Set-up linear functional model¶

We want to investigate how we could model the observed displacements of the road. Because the road is built in the Green Heart, we expect that the observed displacements are related to the groundwater level. Furthermore, we assume that the displacements can be modeled using a constant velocity. The model is defined as $$ d = d_0 + vt + k \ \textrm{GW}, $$ where $d$ is the displacement, $t$ is time and $\textrm{GW}$ is the groundwater level (which we assume to be deterministic).

Therefore, the model has 3 unknowns:

  1. $d_0$, as the initial displacement at $t_0$;
  2. $v$, as the displacement velocity;
  3. $k$, as the 'groundwater factor', which can be seen as the response of the soil to changes in the groundwater level.

As a group you will construct the functional model that is defined as $$ \mathbb{E}(Y) = \mathrm{A x}. $$

Task 2.1:

Construct the design matrix $A$ (for both InSAR and GNSS observations), then show the first 5 observations and confirm the dimensions of $A$.

In [ ]:
YOUR_CODE_HERE
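One possible construction of the design matrix, following the model $d = d_0 + vt + k\,\textrm{GW}$ (a sketch; the variable names A_insar and A_gnss are illustrative, and time in years is an assumption, so interpret the velocity accordingly):

# Columns: constant term, time, groundwater level
A_insar = np.column_stack((np.ones(len(y_insar)), years_insar, GW_at_InSAR_times))
A_gnss = np.column_stack((np.ones(len(y_gnss)), years_gnss, GW_at_GNSS_times))

print('First 5 rows of A (InSAR):\n', A_insar[0:5, :])
print('Shape of A (InSAR):', A_insar.shape)
print('Shape of A (GNSS): ', A_gnss.shape)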

$\textbf{Task 2.2}$

Answer the following questions:

  • What is the dimension of the observables' vector $Y$?
  • What are the unknowns of the functional model?
  • What is the redundancy for this model?

Task 2.3:

Add the A matrix to the dictionaries for each model. This will be used to plot results later in the notebook.

In [ ]:
model_insar['A'] = YOUR_CODE_HERE
model_gnss['A'] = YOUR_CODE_HERE

Task 3: Set-up stochastic model¶

We will use the Best Linear Unbiased Estimator (BLUE) to solve for the unknown parameters. Therefore we also need a stochastic model, which is defined as $$ \mathbb{D}(Y) = \Sigma_{Y}, $$ where $\Sigma_{Y}$ is the covariance matrix of the observables' vector.

Task 3.1:

Construct the covariance matrix for each type of data and assume that

  • the observables are independent

  • the observables are normally distributed

  • the observables' standard deviation is

    • $\sigma_\textrm{InSAR} = 2$ mm
    • $\sigma_\textrm{GNSS} = 15$ mm

In [ ]:
# YOUR_CODE_HERE
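A sketch of one way to build the covariance matrices, assuming independent observations (diagonal matrices) with the standard deviations listed above; the variable names are illustrative:

std_insar = 2    # mm
std_gnss = 15    # mm

# Diagonal covariance matrices: variance on the diagonal, zero covariances elsewhere
Sigma_Y_insar = np.identity(len(y_insar)) * std_insar**2
Sigma_Y_gnss = np.identity(len(y_gnss)) * std_gnss**2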

$\textbf{Task 3.2}$

Answer the following questions:

  • What information is contained in the covariance matrix?
  • How do you implement the assumption that all observations are independent?
  • What is the dimension of $\Sigma_{Y}$?
  • How do you create $\Sigma_{Y}$?

Write your answer in this cell.

Task 3.3:

Add Sigma_Y to the dictionaries for each model.

In [17]:
model_insar['Sigma_Y'] = YOUR_CODE_HERE
model_gnss['Sigma_Y'] = YOUR_CODE_HERE

Task 4: Apply best linear unbiased estimation¶

Task 4.1:

Write a function to apply BLUE in the cell below and use the function to estimate the unknowns for the model using the data.

Compute the modeled displacements ($\hat{\mathrm{y}}$), and corresponding residuals ($\hat{\mathrm{\epsilon}}$), as well as associated values (as requested by the blank code lines).

Note on code implementation: you'll see that the functions in this assignment use a dictionary; this greatly reduces the number of input/output variables needed in a function. However, it can make the code inside the function more difficult to read due to the key syntax (e.g., dict['variable_1'] versus variable_1). To make this assignment easier to implement, we have split these functions into three parts: 1) define variables from the dictionary, 2) perform the analysis, 3) add results to the dictionary. Note that this is not the most efficient way to write this code; it is done here specifically for clarity and to help you focus on writing the equations properly and understanding the meaning of each term.

In [18]:
def BLUE(d):
    """Calculate the Best Linear Unbiased Estimator
    
    Uses dict as input/output:
      - inputs defined from existing values in dict
      - outputs defined as new values in dict
    """

    y = d['y']
    A = d['A']
    Sigma_Y = d['Sigma_Y']

    Sigma_X_hat = YOUR_CODE_HERE
    x_hat = YOUR_CODE_HERE
    
    y_hat = YOUR_CODE_HERE

    e_hat = YOUR_CODE_HERE

    Sigma_Y_hat = YOUR_CODE_HERE
    std_Y_hat = YOUR_CODE_HERE

    Sigma_e_hat = YOUR_CODE_HERE
    std_e_hat = YOUR_CODE_HERE

    d['Sigma_X_hat'] = Sigma_X_hat
    d['x_hat'] = x_hat
    d['y_hat'] = y_hat
    d['e_hat'] = e_hat
    d['Sigma_Y_hat'] = Sigma_Y_hat
    d['std_Y_hat'] = std_Y_hat
    d['Sigma_e_hat'] = Sigma_e_hat
    d['std_e_hat'] = std_e_hat

    return d
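For reference, a complete sketch of how the blanks could be filled in using the standard BLUE equations (the function name BLUE_sketch is hypothetical; the structure mirrors the function above):

def BLUE_sketch(d):
    """A possible completion of the BLUE function above (sketch)."""
    y, A, Sigma_Y = d['y'], d['A'], d['Sigma_Y']

    Sigma_Y_inv = np.linalg.inv(Sigma_Y)
    Sigma_X_hat = np.linalg.inv(A.T @ Sigma_Y_inv @ A)   # covariance of estimated parameters
    x_hat = Sigma_X_hat @ A.T @ Sigma_Y_inv @ y          # estimated parameters

    y_hat = A @ x_hat                                    # modeled displacements
    e_hat = y - y_hat                                    # residuals

    Sigma_Y_hat = A @ Sigma_X_hat @ A.T                  # covariance of modeled displacements
    std_Y_hat = np.sqrt(np.diag(Sigma_Y_hat))

    Sigma_e_hat = Sigma_Y - Sigma_Y_hat                  # covariance of residuals
    std_e_hat = np.sqrt(np.diag(Sigma_e_hat))

    d.update(Sigma_X_hat=Sigma_X_hat, x_hat=x_hat, y_hat=y_hat, e_hat=e_hat,
             Sigma_Y_hat=Sigma_Y_hat, std_Y_hat=std_Y_hat,
             Sigma_e_hat=Sigma_e_hat, std_e_hat=std_e_hat)
    return d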

Task 4.2:

Now that you have completed the function, apply it to our two models and then print values for the estimated parameters.

In [ ]:
model_insar = BLUE(model_insar)
x_hat_insar = model_insar['x_hat']

YOUR_CODE_HERE

model_gnss = BLUE(model_gnss)
x_hat_gnss = model_gnss['x_hat']

YOUR_CODE_HERE
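One way the printing could be done (a sketch, assuming the parameter order [d0, v, k] and time in years in the design matrix, as in the earlier sketch):

print('InSAR:  d0 = %.2f mm,  v = %.2f mm/year,  k = %.3f [-]'
      % (x_hat_insar[0], x_hat_insar[1], x_hat_insar[2]))
print('GNSS:   d0 = %.2f mm,  v = %.2f mm/year,  k = %.3f [-]'
      % (x_hat_gnss[0], x_hat_gnss[1], x_hat_gnss[2]))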

Task 4.3: Do the values that you just estimated make sense? Explain, using quantitative results.

Hint: all you need to do is use the figures created above to verify that the parameter values printed above are reasonable (e.g., order of magnitude, units, etc.).

Task 5: Evaluate the precision¶

Task 5:

What is the precision of the final estimates?

Print the full covariance matrix of your estimates, and give an interpretation of the numbers in the covariance matrix.

In [ ]:
Sigma_X_hat_insar = model_insar['Sigma_X_hat']

YOUR_CODE_HERE

Sigma_X_hat_gnss = model_gnss['Sigma_X_hat']

YOUR_CODE_HERE
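A sketch of how the covariance matrices and the standard deviations of the estimates could be printed:

print('Sigma_X_hat (InSAR):\n', Sigma_X_hat_insar)
print('Standard deviations of estimates (InSAR):', np.sqrt(np.diag(Sigma_X_hat_insar)))

print('\nSigma_X_hat (GNSS):\n', Sigma_X_hat_gnss)
print('Standard deviations of estimates (GNSS): ', np.sqrt(np.diag(Sigma_X_hat_gnss)))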

Task 6: Present and reflect on estimation results¶

Task 6.1:

Complete the function below to help us compute the confidence intervals, then apply the function. Use a confidence interval of 96% in your analysis.

Hint: it can be used in exactly the same way as the BLUE function above, although it has one extra input.

In [ ]:
def get_CI(d, alpha):
    """Compute the confidence intervals.
    
    Uses dict as input/output:
      - inputs defined from existing values in dict
      - outputs defined as new values in dict
    """

    std_e_hat = d['std_e_hat']
    std_Y_hat = d['std_Y_hat']

    k = YOUR_CODE_HERE
    CI_Y_hat = YOUR_CODE_HERE
    CI_res = YOUR_CODE_HERE

    d['alpha'] = alpha
    d['CI_Y_hat'] = CI_Y_hat
    d['CI_res'] = CI_res

    return d
In [30]:
model_insar = YOUR_CODE_HERE
model_gnss = YOUR_CODE_HERE
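A possible completion (a sketch, assuming two-sided intervals based on the normal distribution and that alpha is the significance level, so alpha = 0.04 gives a 96% confidence interval; get_CI_sketch is a hypothetical name mirroring get_CI above):

def get_CI_sketch(d, alpha):
    """Compute the confidence intervals (sketch of get_CI above)."""
    k = norm.ppf(1 - 0.5 * alpha)          # critical value for a two-sided interval
    d['alpha'] = alpha
    d['CI_Y_hat'] = k * d['std_Y_hat']     # half-width of CI of modeled displacements
    d['CI_res'] = k * d['std_e_hat']       # half-width of CI of residuals
    return d

# Applying it to both models with a 96% confidence interval
model_insar = get_CI_sketch(model_insar, 0.04)
model_gnss = get_CI_sketch(model_gnss, 0.04)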

At this point we have all the important results entered in our dictionary and we will be able to use the plots that have been written for you in the next Tasks. In case you would like to easily see all of the key-value pairs that have been added to the dictionary, you can run the cell below:

In [ ]:
print("Keys and Values (type) for model_insar:")
for key, value in model_insar.items():
    print(f"{key:16s} -->    {type(value)}")
print("\nKeys and Values (type) for model_gnss:")
for key, value in model_gnss.items():
    print(f"{key:16s} -->    {type(value)}")

Task 6.2:

Read the contents of the file functions.py and identify what it does: you should be able to recognize that the functions use our model dictionary as an input and create three different figures. Note also that the functions that create the figures have already been imported at the top of this notebook.

Use the functions provided to visualize the results of our two models.

Note: remember that you will have to use the same function to look at both models when writing your interpretation in the Report.

In [ ]:
_, _ = plot_model(YOUR_CODE_HERE)
In [ ]:
_, _ = plot_residual(YOUR_CODE_HERE)
In [ ]:
_, _ = plot_residual_histogram(YOUR_CODE_HERE)
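A sketch of how the three functions could be called for both models, assuming each takes one of the model dictionaries as input (as described above) and returns two values, matching the placeholders:

_, _ = plot_model(model_insar)
_, _ = plot_model(model_gnss)

_, _ = plot_residual(model_insar)
_, _ = plot_residual(model_gnss)

_, _ = plot_residual_histogram(model_insar)
_, _ = plot_residual_histogram(model_gnss)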

End of notebook.


© Copyright 2024 MUDE TU Delft. This work is licensed under a CC BY 4.0 License.