GA 1.4: Modelling Road Deformation using Non-Linear Least-Squares¶

No description has been provided for this image No description has been provided for this image

CEGM1000 MUDE: Week 1.4. Due: Friday, September 27, 2024.

Part 0: Recap of GA 1.3¶

As you (hopefully!) recall, last week we

Task 2: Set-up linear functional model¶

We want to investigate observed displacements of a in the Green Heart of the Netherlands, where groundwater levels play a significant role. The model was defined as:

$$ d = d_0 + vt + k \ \textrm{GW}, $$

where $d$ is the displacement, $t$ is time and $\textrm{GW}$ is the groundwater level (that we assume to be deterministic).

Therefore, the model has 3 unknowns:

  1. $d_0$, as the initial displacement at $t_0$;
  2. $v$, as the displacement velocity;
  3. $k$, as the 'groundwater factor', which can be seen as the response of the soil to changes in the groundwater level.

We used BLUE to create a linear model, which we conveniently stored in a dictionary.

In [1]:
import numpy as np
from scipy import interpolate
from scipy.stats import norm
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, fixed
from scipy.stats.distributions import chi2

from functions import *

np.set_printoptions(precision=3)

Task 0.1:

Run the cell below to load two dictionaries (objects) into the Python variable space, m1 and m2. You can uncomment and run the code to check the key:value pairs.

Note that the objects are stored in and loaded from "pickle" files, a native Python serialization format (in non-technical speak: a pickle file saves our variables to use them later, like we do here). The model dictionaries were defined and saved in the file setup.py, which you are welcome to read if you are curious.

In [2]:
m1_blue = load_pickle_file('m1_blue.pickle')
m2_blue = load_pickle_file('m2_blue.pickle')

It's not very easy to get a sense of the contents of the dictionary, so we made a simple function to help summarize it:

In [3]:
model_summary(m1_blue)
model_summary(m2_blue)
Summary of Model
----------------
  Data type: InSAR
  Model type: BLUE
  Number of observations: 61
  Model parameters:
    x_0 =  9.174  +/-  2.128
    x_1 = -0.024  +/-  0.001
    x_2 =  0.202  +/-  0.016
----------------

Summary of Model
----------------
  Data type: GNSS
  Model type: BLUE
  Number of observations: 730
  Model parameters:
    x_0 =  1.181  +/-  4.647
    x_1 = -0.021  +/-  0.003
    x_2 =  0.160  +/-  0.035
----------------

In [4]:
for key in m1_blue.keys():
    print(key)
data_type
model_type
times
y
days
groundwater
groundwater_data
A
std
Sigma_Y
Sigma_X_hat
x_hat
y_hat
e_hat
Sigma_Y_hat
std_y
Sigma_e_hat
std_e_hat
k
CI_Y
CI_res
CI_Y_hat
alpha

Task 0.2:

Run the cell below to activate the widget. Play with the parameter values until you are completely familiar with what each one does, and confirm that the model results from last week are indeed those that minimize the mean of the squared residuals.

In [5]:
x0_slider = widgets.FloatSlider(value=0, min=-10, max=10, step=0.1, description='x0')
x1_slider = widgets.FloatSlider(value=0, min=-0.1, max=0.1, step=0.001, description='x1')
x2_slider = widgets.FloatSlider(value=1, min=-1, max=1, step=0.01, description='x2')

interact(model_widget,
         x0=x0_slider, x1=x1_slider, x2=x2_slider, x3=fixed(None),
         m=[('Data Type 1', m1_blue), ('Data Type 2', m2_blue)]);
interactive(children=(FloatSlider(value=0.0, description='x0', max=10.0, min=-10.0), FloatSlider(value=0.0, de…

Preparing for Another Set of Models¶

Hopefully you found it useful for keeping track of our models in dictionaries. If not, well---maybe you will after this GA, because we are going to do it again! The next cell takes care of initializing a new set of dictionaries for our work

Tip: if you would like to see a list of the keys and values in a dictionary, you can try adapting this code:

print("Keys and Values:")
for key, value in YOUR_DICTIONARY_NAME.items():
    print(f"{key:16s} -->    {value}")

Task 0.3:

Run the cell below to initialize the dictionaries for our non-linear least squares models. Once again, we will have a model for GNSS and another for INSAR data.

In [6]:
def initialize_new_dict(d_old):
    d = {}
    d['data_type'] = d_old['data_type']
    d['model_type'] = 'Non-Linear Least Squares'
    d['times'] = d_old['times']
    d['y'] = d_old['y']
    d['Sigma_Y'] = d_old['Sigma_Y']
    d['days'] = d_old['days']
    d['groundwater'] = d_old['groundwater']
    d['groundwater_data'] = d_old['groundwater_data']
    return d

m1 = initialize_new_dict(m1_blue)
m2 = initialize_new_dict(m2_blue)

Task 0.4:

Confirm that the stochastic model is transferred properly by printing the appropriate key-value pair in the dictionary.

In [7]:
# YOUR_CODE_HERE

# SOLUTION:
# First uncomment and run this to quickly view the keys:
# for key in m1.keys():
#     print(key)
# Clearly Sigma_Y is what we need:
print(m1['Sigma_Y'])
print(m2['Sigma_Y'])
[[4. 0. 0. ... 0. 0. 0.]
 [0. 4. 0. ... 0. 0. 0.]
 [0. 0. 4. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 4. 0. 0.]
 [0. 0. 0. ... 0. 4. 0.]
 [0. 0. 0. ... 0. 0. 4.]]
[[225.   0.   0. ...   0.   0.   0.]
 [  0. 225.   0. ...   0.   0.   0.]
 [  0.   0. 225. ...   0.   0.   0.]
 ...
 [  0.   0.   0. ... 225.   0.   0.]
 [  0.   0.   0. ...   0. 225.   0.]
 [  0.   0.   0. ...   0.   0. 225.]]

Part 1: Set-up Non-Linear Functional Model¶

In the model we fitted using BLUE, we only considered a linear velocity and a dependency on the groundwater level. However, when a heavy construction is built on 'soft' soil layers, compaction of the upper layers will be observed. This compaction is relatively large in the beginning but decreases when time passes. We can approach the behavior with a simplified model, assuming an exponential decay.

Please note that this is very simplified model that does not necessarily rely on physics.

The new model is defined as $$ d = d_0 + R \ \left(1-\exp\left(\frac{-t}{a}\right)\right) + k \ \textrm{GW}, $$ where $d$ is the displacement, $t$ is the time and $\textrm{GW}$ is the groundwater level.

Therefore, the new model has 4 unknowns:

  1. $d_0$, as the initial displacement at $t_0$;
  2. $R$, which can be seen as the response of the soil layers to the extra weight of the road. Notice that when it is zero, there would be no compaction due to the extra loading;
  3. $a$, as scaling parameter that represents the memory of the system;
  4. $k$, as 'groundwater factor' that can be seen as the response of the layers due to changes in the groundwater level.

It can be seen that the model is non-linear. We will therefore use non-linear least-squares to solve for the unknown parameters.

Functional Model: Python Implementation¶

We will use a Python function to evaluate the non-linear function. Recall that the general form of the functional model is $\mathrm{Y}=q(\mathrm{X})$, where $\mathrm{X}$ represents the vector of model parameters. Note also that to implement this model in practice often additional (deterministic!) parameters are required. This can be written in a Python function with the following form of input arguments:

Y = functional_model(X, <auxiliary_arguments>)

Where <auxiliary_arguments> will be different in type and/or number on a case-by-case basis. Your code will generally be more compact and adaptable to other cases if the parameters are specified in a list, array or tuple.

Task 1.1:

Read the code below to understand the functional model (and fill in the missing parameter values). Be sure to understand which objects are the parameters of interest, and what the auxiliary arguments are (type and number).

In [8]:
def functional_model(X, d):
    """Functional model, q: ground surface displacement.

    Inputs:
      X: tuple, list or array of parameters
         (d, R, a, k)
      d: dictionary of model parameters

    Outputs: ndarray of ground level

    Note: "times" is not included as an argument because
    the model is only configured to compute the functional
    model at times where the groundwater measurements have
    been interpolated (i.e., the times of the satellite
    observations). To use this model for prediction, the
    interpolation function must be incorporated in the
    dictionary and/or the function. 
    """
    
    # Hint: use d['days'] and d['groundwater'] for
    #       the deterministic parameters

    # y = (YOUR_CODE_HERE
    #      + YOUR_CODE_HERE*(1 - np.exp(-d['days']/YOUR_CODE_HERE))
    #      + YOUR_CODE_HEREX[3]*d['groundwater']
    #      )
    
    # SOLUTION:
    y = (X[0]
         + X[1]*(1 - np.exp(-d['days']/X[2]))
         + X[3]*d['groundwater']
         )
    
    return y

m1['functional_model'] = functional_model
m2['functional_model'] = functional_model

Task 1.2:

Use the widget below to gain understanding of the non-linear model and its parameters, as well as the role that the deterministic parameters (observation times and groundwater measurements) play in the functional model.

In [9]:
x0_slider = widgets.FloatSlider(value=0, min=-10, max=10, step=0.1, description='x0')
x1_slider = widgets.FloatSlider(value=0, min=-50, max=50, step=1, description='x1')
x2_slider = widgets.FloatSlider(value=1, min=10, max=1000, step=10, description='x2')
x3_slider = widgets.FloatSlider(value=1, min=-1, max=1, step=0.01, description='x3')

interact(model_widget,
         x0=x0_slider, x1=x1_slider, x2=x2_slider, x3=x3_slider,
         m=[('Data Type 1', m1), ('Data Type 2', m2)]);
interactive(children=(FloatSlider(value=0.0, description='x0', max=10.0, min=-10.0), FloatSlider(value=0.0, de…

Task 1.3:

Choose initial values for the model parameters. Use the code and Markdown cells below to justify your decision. We suggest two possible approaches: a) use the functional model and make a plot to see if you can get it in the right order of magnitude, or b) make an inference about what the values might be using knowledge about each term in the model.

In [10]:
# d_i = YOUR_CODE_HERE
# R_i = YOUR_CODE_HERE
# a_i = YOUR_CODE_HERE
# k_i = YOUR_CODE_HERE

# SOLUTION (values don't need to be exactly the same)
d_i = 9
R_i = -25
a_i = 300
k_i = 0.15

initial_guess = (d_i, R_i, a_i, k_i)

Write your answer in this Markdown cell.

$\textbf{Solution}$

  • For $d_0$ and $k$ you could use the estimated values from the linear model

  • For $R$: realize that it is the difference between displacement at start and end of the observation interval (look at plot with data).

  • For $a$: you could plot an exponential function $R \left(1-\exp\left(\frac{-t}{a}\right)\right)$ and try different values of $a$ to see which one would fit well here.

Task 8.2:

Set up the Jacobian matrix for the non-linear least-squares by completing the function below.

In [11]:
def jacobian(X, d):
    """Compute Jacobian of the functional model.

    Functional model, q: ground surface displacement.

    Inputs:
      X: tuple, list or array of parameters
         (d, R, a, k)
      d: dictionary of model parameters

    Outputs: The Jacobian
             (partial derivatives w.r.t. parameters)

    Note: "times" is not included as an argument because
    the model is only configured to compute the functional
    model at times where the groundwater measurements have
    been interpolated (i.e., the times of the satellite
    observations). To use this model for prediction, the
    interpolation function must be incorporated in the
    dictionary and/or the function. 
    """
    
    # Hint: use d['days'] and d['groundwater'] for
    #       the deterministic parameters


    # J1 = 
    # J2 = 
    # J3 = 
    # J4 = 
    # j = YOUR_CODE_HERE

    # SOLUTION
    J1 = np.ones(len(d['days']))
    J2 = 1 - np.exp(-d['days']/X[2])
    J3 = -X[1]*d['days']/X[2]**2 * np.exp(-d['days']/X[2])
    J4 = np.ones(len(d['days']))*d['groundwater']
    J = np.column_stack((J1, J2, J3, J4))
    
    return J

Task 8.2:

Confirm that the Jacobian function works properly by using it with the initial parameter values established above. If the function is called successfully, check the result by printing the first few rows of the Jacobian matrix and checking the dimensions.

Tip: it would be useful for the Report if you also printed the dimensions and/or redundancy of the model.

In [12]:
# YOUR_CODE_HERE

# SOLUTION:
test_J = jacobian((d_i, R_i, a_i, k_i), m1)

print ('The first 5 rows of the Jacobian matrix (INSAR):')
print (test_J[0:5,:])

n_2 = np.shape(test_J)[1]
print(f'\nThe number of unknowns is {n_2}')
print(f'The redundancy (InSAR) is {m1["y"].shape[0] - n_2}')
print(f'The redundancy (GNSS) is {m2["y"].shape[0] - n_2}')
The first 5 rows of the Jacobian matrix (INSAR):
[[ 1.000e+00  0.000e+00  0.000e+00 -1.097e+02]
 [ 1.000e+00  3.921e-02  3.203e-03 -1.067e+02]
 [ 1.000e+00  7.688e-02  6.154e-03 -1.038e+02]
 [ 1.000e+00  1.131e-01  8.869e-03 -1.065e+02]
 [ 1.000e+00  1.479e-01  1.136e-02 -1.173e+02]]

The number of unknowns is 4
The redundancy (InSAR) is 57
The redundancy (GNSS) is 726

9. Set-up Gauss-Newton iteration algorithm¶

Task 9:

Set up a Gauss-Newton iteration algorithm by completing the function below.

Important Note about Implmentation

The iteration scheme reuses the function BLUE() from last week, which has been included in the files functions.py (imported at the top of this notebook). During the iteration scheme, specific values of the dictionary are "repurposed" to create a linear system of equations. This facilitates the use of BLUE, which otherwise can only solve for linear functional models.

After the iteration is completed successfully, the function should reset the "repurposed" values in the dictionary to their true value (i.e., values representing the non-linear system).

In [13]:
def gauss_newton_iteration(X0, d):
    """Use Gauss-Newton iteration to find non-linear parameters.
    
    Inputs:
      X0: initial guess for the parameters (d, R, a, k)
      d: dictionary of model parameters

    Outputs: dictionary with the non-linear model results.
    """

    x_norm = 1000 # initialize stop criteria

    x_hat_i = np.zeros((50, 4))
    x_hat_i[0,:] = X0
    iteration = 0

    # Define observations as an array
    # (we will overwrite it temporarily)
    y_obs = d['y']

    while x_norm >= 1e-12 and iteration < 49:

        # y_i = YOUR_CODE_HERE
        # dy = YOUR_CODE_HERE
        # J = YOUR_CODE_HERE

        # SOLUTION
        y_i = functional_model(x_hat_i[iteration, :], d)
        dy = y_obs - y_i
        J = jacobian(x_hat_i[iteration, :], d)

        # d[YOUR_CODE_HERE] = YOUR_CODE_HERE
        # d[YOUR_CODE_HERE] = YOUR_CODE_HERE
        # Hints for previous line:
        #   - re-use your function BLUE
        #   - you will need to repurpose two dictionary
        #     keys to utilize the solution scheme of BLUE
        #     that can solve linear equations

        # SOLUTION
        d['y'] = dy
        d['A'] = J

        d = BLUE(d)
        
        # x_hat_i[iteration+1,:] = YOUR_CODE_HERE
        # Hints for previous line:
        #   - now repurpose a result stored in the dictionary

        # SOLUTION
        x_hat_i[iteration+1,:] = x_hat_i[iteration,:] + d['x_hat'].T

        # x_norm = YOUR_CODE_HERE
        
        # SOLUTION
        x_norm = d['x_hat'].T @ np.linalg.inv(d['Sigma_X_hat']) @ d['x_hat']

        # Update the iteration number
        iteration += 1

        if iteration==49:
            print("Number of iterations too large, check initial values.")

    # # Store general results from the iterative process
    # d['x_hat_all_iterations'] = YOUR_CODE_HERE
    # d['iterations_completed'] = YOUR_CODE_HERE

    # # Store the linear values and "Reset" the non-linear ones
    # # Two sets of values correspond to Y and X
    # d['dy'] = YOUR_CODE_HERE
    # d['y'] = YOUR_CODE_HERE
    
    # d['dx'] = YOUR_CODE_HERE
    # d['x_hat'] = YOUR_CODE_HERE

    # SOLUTION
    # Store general results from the iterative process
    d['x_hat_all_iterations'] = x_hat_i[0:iteration+1, :]
    d['iterations_completed'] = iteration

    # Store the linear values and "Reset" the non-linear ones
    # Two sets of values correspond to Y and X
    d['dy'] = d['y']
    d['y'] = y_obs
    
    d['dx'] = d['x_hat']
    d['x_hat'] = d['x_hat_all_iterations'][iteration,:]
    
    return d

Write your answer in this Markdown cell.

Solution:

We will use the 'weighted squared norm' of $\Delta \hat{\mathrm{x}}_{[i]}$, with the inverse covariance matrix $\Sigma_{\hat{X}}^{-1}$ as the weight matrix. In this way we account for different precisions of the parameters (high precision means we want the deviation to be smaller), as well as different order of magnitudes of the parameters.

10. Apply Gauss-Newton iteration¶

Task 10.1:

Apply Gauss-Newton iteration on your model (run the code you completed above).

For each unknown parameter, plot your estimates versus the iteration number (horizontal axis: iteration number, vertical axis: your estimate per iteration).

For this Task, your output should include 8 plots at minimum (presenting additional values is also fine if it is relevant to your interpretation answers below).

In [14]:
m1 = gauss_newton_iteration(initial_guess, m1)
m2 = gauss_newton_iteration(initial_guess, m2)

print('\n InSAR Reults for each iteration (#Interations =',
      m1['iterations_completed'], ')')
print(m1['x_hat_all_iterations'])

print('\n GNSS Reults for each iteration (#Interations =',
      m2['iterations_completed'], ')')
print(m2['x_hat_all_iterations'])
 InSAR Reults for each iteration (#Interations = 7 )
[[ 9.000e+00 -2.500e+01  3.000e+02  1.500e-01]
 [ 1.273e+01 -1.982e+01  1.407e+02  1.726e-01]
 [ 1.296e+01 -2.162e+01  1.768e+02  1.719e-01]
 [ 1.297e+01 -2.192e+01  1.793e+02  1.713e-01]
 [ 1.297e+01 -2.192e+01  1.794e+02  1.713e-01]
 [ 1.297e+01 -2.192e+01  1.794e+02  1.713e-01]
 [ 1.297e+01 -2.192e+01  1.794e+02  1.713e-01]
 [ 1.297e+01 -2.192e+01  1.794e+02  1.713e-01]]

 GNSS Reults for each iteration (#Interations = 10 )
[[ 9.000e+00 -2.500e+01  3.000e+02  1.500e-01]
 [ 3.747e+00 -1.777e+01  2.435e+02  1.425e-01]
 [ 3.895e+00 -1.812e+01  2.269e+02  1.418e-01]
 [ 3.939e+00 -1.815e+01  2.247e+02  1.416e-01]
 [ 3.945e+00 -1.815e+01  2.242e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]
 [ 3.946e+00 -1.815e+01  2.241e+02  1.415e-01]]

Tip

Remember we stored all iterations in the dictionary as an ndarray with key x_hat_all_iterations. To slice an ndarray inside a dictionary you can use the following syntax: my_dict['key_to_ndarray'][1, :], which would return the second row in the ndarray.

In [15]:
# def plot_fit_iteration(d):
#     """Plot value of each parameter, each iteration."""
#     plt.figure(figsize = (15,4))
#     plt.subplots_adjust(top = 2)

#     plt.subplot(2,2,1)
#     YOUR_CODE_HERE
    
#     plt.subplot(2,2,2)
#     YOUR_CODE_HERE
    
#     plt.subplot(2,2,3)
#     YOUR_CODE_HERE
    
#     plt.subplot(2,2,4)
#     YOUR_CODE_HERE
    
def plot_fit_iteration(d):
    """Plot value of each parameter, each iteration."""
    plt.figure(figsize = (15,4))
    plt.subplots_adjust(top = 2)

    plt.subplot(2,2,1)
    plt.plot(d['x_hat_all_iterations'][:,0], linewidth=4)
    plt.title('Estimated offset')
    plt.ylabel('Offset [mm]')
    plt.xlabel('Number of iterations [-]')

    plt.subplot(2,2,2)
    plt.plot(d['x_hat_all_iterations'][:,1], linewidth=4)
    plt.title('Estimated R value')
    plt.ylabel('Estimated R value [mm]')
    plt.xlabel('Number of iterations [-]')

    plt.subplot(2,2,3)
    plt.plot(d['x_hat_all_iterations'][:,2], linewidth=4)
    plt.title('Estimated $a$ value')
    plt.ylabel('a value [days]')
    plt.xlabel('Number of iterations [-]')

    plt.subplot(2,2,4)
    plt.plot(d['x_hat_all_iterations'][:,3], linewidth=4)
    plt.title('Estimated GW factor')
    plt.ylabel('Estimated GW factor [-]')
    plt.xlabel('Number of iterations [-]')
In [16]:
plot_fit_iteration(m1)
No description has been provided for this image
In [17]:
plot_fit_iteration(m2)
No description has been provided for this image

Task 10.2:

  • Does your iteration converge? If not, find out why and provide an explanation.

  • After how many iterations does it converge?

Write your answer in this Markdown cell.

$\textbf{Solution}$

It converged! With InSAR only 5 iterations, with GNSS 8 iterations. This might be due to the difference in precision and number of observations.

11. Assess the precision of the estimates¶

Task 11:

What is the quality of the final estimates?

Provide the full covariance matrix of your estimates, and give an interpretation of the numbers in the covariance matrix.

In [18]:
def show_std(Sigma_Xhat, data_type):
    print ('The standard deviation for',
           data_type + '-offset is',
           np.round(np.sqrt(Sigma_Xhat[0,0]),2), 'mm')
    print ('The standard deviation for',
           data_type + '-R is',
           np.round(np.sqrt(Sigma_Xhat[1,1]),2), 'mm')
    print ('The standard deviation for',
           data_type + '-a is',
           np.round(np.sqrt(Sigma_Xhat[2,2]),2), 'days')
    print ('The standard deviation for',
           data_type + '-the ground water factor',
           np.round(np.sqrt(Sigma_Xhat[3,3]),3), '[-]')
In [19]:
print ('Covariance matrix of estimated parameters (InSAR):')
print (m1['Sigma_X_hat'], '\n')
show_std(m1['Sigma_X_hat'], 'InSAR')
print ('\nCovariance matrix of estimated parameters (GNSS):')
print (m2['Sigma_X_hat'], '\n')
show_std(m2['Sigma_X_hat'], 'GNSS')
Covariance matrix of estimated parameters (InSAR):
[[ 4.771e+00 -5.522e-01 -7.605e+00  3.183e-02]
 [-5.522e-01  1.017e+00  4.726e+00  2.657e-03]
 [-7.605e+00  4.726e+00  4.461e+02  6.328e-02]
 [ 3.183e-02  2.657e-03  6.328e-02  2.771e-04]] 

The standard deviation for InSAR-offset is 2.18 mm
The standard deviation for InSAR-R is 1.01 mm
The standard deviation for InSAR-a is 21.12 days
The standard deviation for InSAR-the ground water factor 0.017 [-]

Covariance matrix of estimated parameters (GNSS):
[[ 2.306e+01 -2.474e+00 -7.557e+01  1.514e-01]
 [-2.474e+00  4.674e+00  2.167e+00  6.718e-03]
 [-7.557e+01  2.167e+00  6.317e+03  4.459e-01]
 [ 1.514e-01  6.718e-03  4.459e-01  1.289e-03]] 

The standard deviation for GNSS-offset is 4.8 mm
The standard deviation for GNSS-R is 2.16 mm
The standard deviation for GNSS-a is 79.48 days
The standard deviation for GNSS-the ground water factor 0.036 [-]

Write your answer in this Markdown cell.

$\textbf{Solution}$

The precision of the estimated offset $\hat{d}_0$ and $\hat{k}$ is approximately 2 mm and 0.02, respectively. With GNSS the of those parameters is a approximately a factor 2 worse due to the higher noise in the data. However, due to the high number of observations, the precision is still rather good. Note that the outliers do not have an impact on the precision, since the covariance matrix does not depend on the data (and outliers are non-random errors that are not accounted for in the covariance matrix). Precision of parameters is good if compared to estimated values, except for $a$ and for GNSS also $R$ is less precise (std of 35 days, while estimated value is -15 days). This can only be explained by the higher noise in GNSS data, such that the exponential signal 'drowns' in the noise.

12. Present the estimation results¶

Task 12.1:

Compute the modeled displacements ($\hat{\mathrm{y}}$), and corresponsing residuals ($\hat{\mathrm{\epsilon}}$). Visualize the results in two graphs and add the confidence bounds ($t$-versus-displacement and $t$-versus-residuals).

Also create a histogram of the residuals where you plot the normal distribution (which you can estimate from the histogram) as well and report the mean and sigma of the residuals.

In [20]:
m1['y_hat'] = functional_model(m1['x_hat'], m1)
m1 = get_CI(m1, 0.04)
plot_model(m1)
plot_residual(m1);
No description has been provided for this image
No description has been provided for this image
In [21]:
m2['y_hat'] = functional_model(m2['x_hat'], m2)
m2 = get_CI(m2, 0.04)
plot_model(m2)
plot_residual(m2);
No description has been provided for this image
No description has been provided for this image

Task 12.2:

Answer the following questions:

  1. Do you see any systematic effect?
  2. Give your interpretation for any discrepancy between observations and the model?
  3. What is the mean value of the residuals and what does this value tells you?
  4. And what is the empirical standard deviation of the residuals? Do you recognise this value?

Write your answer in this Markdown cell.

$\textbf{Solution}$

The InSAR discrepancies look as expected, no systematic effects: the residuals have a zero-mean and standard deviation close to the precision of the observables. This tells us that the precision that we used for the observations corresponds to the true noise level.

For GNSS there are still many residuals at the start of the observation period which are outside the confidence bounds, also resulting in slightly larger empirical standard deviation of the residuals. The effect of the outliers is also visible when comparing the fitted model with GNSS and InSAR : for GNSS it is pulled 'downwards'.

13. Strategies to improve?¶

Task 13:

In order to get a better fit to the data (smaller residuals) for this case study, which of the following strategies could help? (elaborate on your answer)

  1. better observations?
  2. a more complicated geophysical model?
  3. better initial values?
  4. more observations?

Write your answer in this Markdown cell.

$\textbf{Solution}$

  1. better observations will help, and should result in smaller residuals.
  2. a more complicated geophysical model will help if it is able to capture the signal. However, since we don't see any systematic effects in the InSAR residuals, it is not expected that much gain can be expected. Including more parameters in the model will help to get smaller residuals, but is there still a geophysical explanation...?
  3. better initial values will not help, since solution converged to valid results.
  4. more observations generally helps, as long as they are not corrupted by outliers or systematic effects.

14. Apply hypothesis test¶

In the assignment we used two different models:

  • A linear model
  • A model with linear and power components

Now we are going to test which model fits the data better. We will do this with the Generalized Likelihood Ratio (GLR) test for both the GNSS and InSAR observations

Task 14:

The critical value is computed with level of significance $\alpha=0.005$ the test statistics is computed in the code below.

Answer the following questions:

  1. What is the null hypothesis $H_0$ and alternative hypothesis $H_a$ in this test?
  2. How can you compute the test statistic? What is its distribution?

In [22]:
q = 1 #Degree of freedom
alpha = 0.005 

# Critical value
k = chi2.ppf(1 - alpha, df=q)
print(f'The critical value is {np.round(k, 3)}')
The critical value is 7.879
In [23]:
t1_insar = m1_blue['e_hat'].T @ np.linalg.inv(m1_blue['Sigma_Y']) @ m1_blue['e_hat']
t2_insar = m1['e_hat'].T @ np.linalg.inv(m1['Sigma_Y']) @ m1['e_hat']
t_insar = t1_insar - t2_insar
print(f'The test statistic for InSAR data is {np.round(t_insar, 3)}')
The test statistic for InSAR data is 95.616
In [24]:
t1_gnss = m2_blue['e_hat'].T @ np.linalg.inv(m2_blue['Sigma_Y']) @ m2_blue['e_hat']
t2_gnss = m2['e_hat'].T @ np.linalg.inv(m2['Sigma_Y']) @ m2['e_hat']
t_gnss = t1_gnss - t2_gnss
print(f'The test statistic for GNSS data is {np.round(t_gnss, 3)}')
The test statistic for GNSS data is 7.788

Write your answer in this Markdown cell.

$\textbf{Solution}$

  1. The null hypothesis is that we assume that the linear model is correct, the alternative hypothesis that the model is incorrect.
  2. The test statistic is the difference of the weighted squared norms of residuals, and has a Chi-squared distribution with 1 degree of freedom, since there is 1 extra parameter in the alternative hypothesis model as compared to the null hypothesis.

15. Interpretation of test outcomes¶

Task 15:

Answer the following questions:

  1. What are the test statistic values for the two models (linear model and exponential model) for both the InSAR and the GNSS observations?
  2. What is to be concluded based on the individual test outcomes (i.e., for GNSS and InSAR)?
  3. Please compare the different test outcomes with InSAR and GNSS and discuss/explain them here.

Write your answer in this Markdown cell.

$\textbf{Solution}$

  • For InSAR the the test statistic is 115.5 which is significantly larger than the critical value. Therefore the exponential model is accepted in favour of the linear one.

  • For GNSS the outcomes is different: test statistic is equal to 6.4, which is smaller than the critical value, resulting in acceptance of the null hypothesis (i.e., linear model). The reason is that the GNSS data is much noisier and contains many outliers, such that an exponential trend cannot be distinguished.

16. How to deal with 2 datasets?¶

Data acquisition and processing comes with a price. Note that in a real situation you would not look at a time series of only one point. For Sentinel-1 data you may have to pay, collecting GNSS data at different locations also costs money.

Task 16: How will you monitor the deformation if you have both GNSS and InSAR data at your disposal?

Write your answer in this Markdown cell.

Solution:

  • Use the observations together (i.e., estimate the unknown parameters using the GNSS and InSAR observations at the same time, which would result in 791 observations). With BLUE we would of course apply proper weights, taking into account the different precisions.
  • Disclaimer: the outliers in the GNSS data were added manually, and do not necessarily represent reality, which means that you cannot conclude from this assignment that InSAR is better than GNSS. Without the outliers, GNSS would have given you different results and then the larger noise would be compensated by the higher sampling rate.

    End of notebook.

    Creative Commons License TU Delft MUDE

    © Copyright 2024 MUDE TU Delft. This work is licensed under a CC BY 4.0 License.