A. What does the acronym FAIR stand for?
Model answer:
B. What does the adjective "Interoperable" stand for in FAIR data?
Model answer:
C. Is FAIR data Open Data?
Model answer:
D. Can FAIR principles prevent the fabrication of data (i.e., deliberate creation of data to propound a particular hypothesis with greater conviction)?
Model answer:
A. Using the finite difference method, this PDE can be rewritten in the following discretized mathematical formulation:
What kind of approximation of the time derivative is used in the equation above?
Model answer:
B. The same equation can be discretized with different choices for the finite difference approximations. Below a vectorized implementation is given:
Give a mathematical expression for the update of $u^{n+1}_{i,j}$ from relevant $u$ values at time step $n$ that is implemented in the code block above. The expected answer is an equation in a similar notation as the equation given in part A above.
Model answer:
C. If nothing is added to the code above, what kind of boundary conditions are applied by default? Motivate your answer.
Model answer:
Dirichlet boundary conditions where $u$ is fixed at its initial values, because $u$ at the boundary is always copied from the previous time step and never updated.
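The reasoning above can be illustrated with a small sketch. It assumes, purely for illustration, an explicit five-point update on a 2D grid stored as a list of lists; the function name `step`, the coefficient `coeff` and the grid values are hypothetical and do not reproduce the exam's code. The loop ranges play the role of an interior slice such as `u[1:-1, 1:-1]` in a vectorized NumPy version.

```python
def step(u, coeff):
    """One explicit update of the interior points of a 2D grid (list of lists)."""
    ny, nx = len(u), len(u[0])
    # Start from a copy of the previous time step: boundary values are carried over.
    u_new = [row[:] for row in u]
    # Only interior points (1 .. n-2) are overwritten.
    for i in range(1, ny - 1):
        for j in range(1, nx - 1):
            u_new[i][j] = u[i][j] + coeff * (
                u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            )
    return u_new


# Hypothetical initial condition: 1 on the boundary, 0 in the interior.
u0 = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
u1 = step(u0, coeff=0.1)
```

After the step, the first and last rows and columns of `u1` still equal those of `u0`, which is exactly a Dirichlet condition fixed at the initial values.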
Recall the weak form of the Poisson equation is given as:
$$ \int_\Omega \nu \nabla w \cdot \nabla u \, d\Omega - \int_\Gamma w\nu \nabla u \cdot \mathbf{n} \, d\Gamma = \int_\Omega w f \, d\Omega $$
Assume a Robin boundary condition is applied to the boundary $\Gamma$:
$$ \alpha u + \nu \nabla u \cdot \mathbf{n} = \beta $$
Following the conventional notation, let the $\mathbf{N}$-matrix contain shape functions and the $\mathbf{B}$-matrix contain shape function derivatives. In the discretized form with the finite element method, terms with the coefficients $\nu$, $f$, $\alpha$ and $\beta$ appear. Select the correct form of the following terms.
A. With $\nu$?
Model answer:
Substitute $\nu \nabla u\cdot \mathbf{n}=\beta - \alpha u$ into the weak form and use $u=\mathbf{N}\mathbf{u}$, $w=\mathbf{N}\mathbf{w}$, $\nabla u=\mathbf{B}\mathbf{u}$, $\nabla w=\mathbf{B}\mathbf{w}$. We can then eliminate $\mathbf{w}$ from all terms and move $\mathbf{u}$ out of the integrals: $$\left(\int_\Omega \mathbf{B}^T\nu\mathbf{B}\,\mathrm{d}\Omega + \int_\Gamma \mathbf{N}^T \alpha\mathbf{N} \,\mathrm{d}\Gamma\right) \mathbf{u} = \int_\Omega \mathbf{N}^T f \,\mathrm{d}\Omega + \int_\Gamma \mathbf{N}^T \beta \,\mathrm{d}\Gamma$$
So the answer is: $\int_{\Omega}\mathbf{B}^T\nu \mathbf{B} \, d\Omega$
B. With α?
Model answer: See the model answer in part A, so the answer is $\int_\Gamma\mathbf{N}^T\alpha \mathbf{N} \, d\Gamma$
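As a minimal illustration of the two matrix terms, the sketch below assumes a 1D two-node linear element of length `h`, so that $\mathbf{B} = [-1/h, 1/h]$ is constant, and a Robin boundary that consists of a single end node. The function names and all numerical values are hypothetical.

```python
def element_stiffness(nu, h):
    """Integral of B^T nu B over a 1D two-node element with B = [-1/h, 1/h]."""
    k = nu / h  # (1/h) * nu * (1/h) integrated over a length h
    return [[k, -k], [-k, k]]


def add_robin_at_node(K, f, node, alpha, beta):
    """Add the Robin terms at a boundary point where the shape function of `node` is 1.

    In 1D the boundary integral reduces to a point evaluation, so
    N^T alpha N adds alpha to K[node][node] and N^T beta adds beta to f[node].
    """
    K[node][node] += alpha
    f[node] += beta


# Hypothetical values: nu = 2, h = 0.5, Robin condition at local node 1.
K = element_stiffness(nu=2.0, h=0.5)
f = [0.0, 0.0]
add_robin_at_node(K, f, node=1, alpha=3.0, beta=5.0)
```

The $\alpha$ term ends up on the left-hand side (in the matrix) and the $\beta$ term on the right-hand side (in the vector), matching the discretized equation above.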
C. For a 4-node quadrilateral (2D) element, what is the size of the B-matrix?
Model answer:
Three cities ($C_1, C_2, C_3$) are supplied with water from three different sources ($S_1, S_2, S_3$). The first ($S_1$) is a major reservoir; the other two ($S_2, S_3$) are local sources. Source $S_2$ can supply cities $C_1$ and $C_3$, while $S_1$ and $S_3$ can supply all cities. The cities have a minimum consumption of $R_1, R_2, R_3$, respectively. The local sources can only supply a maximum of $Q_2$ and $Q_3$ of water volume. The reservoir can supply a maximum of $Q_1$, but a minimum supply of $Q_{min}$ must be imposed.
Establish the model that allows obtaining the optimum solution for the problem of supplying the cities in the most economical way, knowing that the cost of supplying city $j$ from source $i$ is given by $c_{ij}$, expressed in monetary units (m.u.) per unit of water volume.
Model answer:
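One way to set up such a model is sketched here, under the assumption that $x_{ij} \ge 0$ denotes the water volume supplied from source $S_i$ to city $C_j$ (a symbol not defined in the problem statement). Since $S_2$ cannot supply $C_2$, no variable $x_{22}$ is introduced:

$$
\begin{aligned}
\min \quad & \sum_{i,j} c_{ij} x_{ij} \\
\text{s.t.} \quad & Q_{min} \le x_{11} + x_{12} + x_{13} \le Q_1 \\
& x_{21} + x_{23} \le Q_2 \\
& x_{31} + x_{32} + x_{33} \le Q_3 \\
& x_{11} + x_{21} + x_{31} \ge R_1 \\
& x_{12} + x_{32} \ge R_2 \\
& x_{13} + x_{23} + x_{33} \ge R_3 \\
& x_{ij} \ge 0
\end{aligned}
$$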
Alternatively, this question may be solved as follows:
Consider the following table of the SIMPLEX method for solving an LP maximization problem:
Solve the problem
Model answer:
X2 enters the basis and S1 leaves the basis:
In the next table we choose S3 to leave, but it could have been S2 as well because they have the same ratio of the independent term to the coefficient in the column of the variable that is going to enter the basis.
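The minimum ratio test with a tie, as described above, can be sketched as follows. The function name and the numbers are hypothetical; they are not taken from the exam's tableau.

```python
def leaving_candidates(rhs, column, tol=1e-12):
    """Return the row indices that attain the minimum ratio rhs / column.

    Only rows with a strictly positive coefficient in the entering column
    take part in the ratio test.
    """
    ratios = {i: b / a for i, (b, a) in enumerate(zip(rhs, column)) if a > tol}
    if not ratios:
        return []  # no positive coefficient: the problem is unbounded in this direction
    best = min(ratios.values())
    return [i for i, r in ratios.items() if abs(r - best) <= tol]


# Hypothetical tableau column: rows 1 and 2 both give a ratio of 3.
candidates = leaving_candidates(rhs=[8.0, 6.0, 9.0], column=[-1.0, 2.0, 3.0])
```

When more than one row is returned, either of them may leave the basis, which is the situation with S2 and S3 above.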
In the next diagram you can see the solving process of the branch and bound for the minimization of an integer programming problem with two decision variables. The number in the upper right corner represents the solving order in the tree:
Is the process finished? That is, are there more nodes to be explored? Why?
Model answer:
The process is finished. This is a minimization problem. Three nodes have been explored. Node 2, which was found after the relaxed solution was branched on variable x1, has produced an integer solution x1=5, x2=0. This solution has an objective function value of 60. The next solution (node 3), x1=4, x2=3.3, is not an integer solution and at the same time results in a worse objective function value (65), which means that it is not worth branching the problem on variable x2.
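The pruning argument used here can be written as a one-line rule. The sketch below assumes a minimization problem, where the relaxed value of a node is a lower bound for every solution in its subtree; the function name is hypothetical.

```python
def worth_branching(node_value, incumbent_value):
    """Minimization: branch only if the node's relaxed value beats the incumbent.

    Adding constraints can only keep or increase the relaxed objective, so a node
    whose value is already no better than the best integer solution is pruned.
    """
    return node_value < incumbent_value


# Values from the answer above: incumbent 60 (node 2), relaxed value 65 (node 3).
branch_node_3 = worth_branching(node_value=65.0, incumbent_value=60.0)
```

With these values `branch_node_3` is `False`, so node 3 is pruned by bound.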
A continuous time signal $x(t)$ is given as:
With $A_1 = 1$, $A_2 = 0.1$, $f_1 = 10$ Hz and $f_2 = 80$ Hz
The signal has been sampled in three experiments, each time using a different sampling frequency $f_s$ and a different measurement duration $T_{meas}$. The frequency domain plots (magnitude spectrum in logarithmic scale, as a result of the DFT) are shown below; the spectrum is double sided, but only shown for positive frequencies and, as commonly done in practice, up to the Nyquist frequency. The values of $X_k$, straight from the fft-implementation, have been divided by $N$, the number of samples.
Determine, for each experiment, the sampling frequency $f_s$, as well as the measurement duration $T_{meas}$. Only the final numerical answers are asked!
Some useful formulas: $f_{Nyquist} = f_s/2$, $T_{meas} = 1/f_0$ (with $f_0$ the frequency resolution of the DFT)
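The relations above can be sketched in code. The function names and the example values ($f_s = 100$ Hz, $T_{meas} = 2$ s) are hypothetical and are not the values of any of the three experiments. The `apparent_frequency` helper assumes the standard folding (aliasing) behaviour of a sampled sinusoid, which matters when a component lies above the Nyquist frequency.

```python
def spectrum_axis(fs, t_meas):
    """Return (number of samples, frequency resolution f0, Nyquist frequency)."""
    n = round(fs * t_meas)
    f0 = 1.0 / t_meas
    f_nyquist = fs / 2.0
    return n, f0, f_nyquist


def sampling_from_spectrum(f_nyquist, f0):
    """Invert the relations: read f_nyquist and f0 off a plot, get (fs, t_meas)."""
    return 2.0 * f_nyquist, 1.0 / f0


def apparent_frequency(f, fs):
    """Frequency at which a sinusoid of frequency f shows up when sampled at fs."""
    return abs(f - fs * round(f / fs))


n, f0, f_nyq = spectrum_axis(fs=100.0, t_meas=2.0)
fs_back, t_back = sampling_from_spectrum(f_nyq, f0)
peak_10 = apparent_frequency(10.0, fs=100.0)  # below Nyquist: unchanged
peak_80 = apparent_frequency(80.0, fs=100.0)  # above Nyquist: folded back
```

In this hypothetical setting the 80 Hz component would appear at 20 Hz, so the position of the second peak in a plot helps identify the sampling frequency.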
A. Experiment A
Model answer:
B. Experiment B
Model answer:
C. Experiment C
Model answer:
A tide gauge station has been installed to measure the hourly sea-level variations relative to a vertical datum (reference system). These measurements are usually connected to a stable benchmark next to the tide gauge station (just to show variations with respect to a reference system). Therefore, there is a shift of approximately 3 m (the correct value should be determined) between the mean sea level (MSL) and the benchmark.
Sea level is subject to variations due to many variables like wind speed, pressure, and global warming (sea-level rise). One such variation is caused by the forces induced by celestial bodies like the Moon and Sun (main contributors), called the tide. The two major tidal constituents are the so-called $M_2$ (semi-lunar constituent) and $S_2$ (semi-solar constituent). Their periods are $T_{M_2} = 12.4206$ hours and $T_{S_2} = 12$ hours. We only use these two major constituents, and therefore ignore others.
A high-end tide gauge has been installed to measure the sea-level variations in the vertical direction. Up to now, we have collected 45 days of hourly data (so $m = 24 \cdot 45 = 1080$ observations). We assume that the measurements have been collected independently with a precision of $\sigma = 5$ cm (independent and normally distributed). The time series of the measurements $y = [y_1, \ldots, y_m]^T$ at time instances $t = [1, \ldots, m]^T$ (so $t$ in hours) is as follows:
Based on the information provided above, and the fact that sea-level rise can be neglected here because it cannot be determined by this time series (45 days of data are too short and usually much longer time spans are required), we are interested in the functional and stochastic model $y = Ax + e$, $D(y) = Q_{yy}$.
A. Specify the first and last row of the A-matrix and its dimensions. Also, specify $Q_{yy}$.
Model answer:
The model of observation equations should include an intercept, a sum of sine and cosine for $M_2$, a sum of sine and cosine for $S_2$, and noise. From this, we find as observation equation $y(t) = y_0 + a_1 \cos(2 \pi f_{M_2} t) + b_1 \sin(2 \pi f_{M_2} t) + a_2 \cos(2 \pi f_{S_2} t) + b_2 \sin(2 \pi f_{S_2} t) + e_t$. Here, the vector of unknowns is $x = [y_0, a_1, b_1, a_2, b_2]^T$ of size $n=5$. The terms $f_{M_2} = 1/T_{M_2}$ and $f_{S_2} = 1/T_{S_2}$ are the frequencies of the tidal constituents $M_2$ and $S_2$, respectively.
As a result, we can write the first row of A as the elements of the observation equation for the first time point $t=1$, so $A_1 = [1, \cos(2 \pi f_{M_2}), \sin(2 \pi f_{M_2}), \cos(2 \pi f_{S_2}), \sin(2 \pi f_{S_2})]$. The last row of A is for the last time point $t=1080$, so $A_{1080} = [1, \cos(2160 \pi f_{M_2}), \sin(2160 \pi f_{M_2}), \cos(2160 \pi f_{S_2}), \sin(2160 \pi f_{S_2})]$.
Dimension of A is 1080×5.
The covariance matrix is of size m×m=1080×1080 as follows:
$Q_{yy} = \sigma^2 I_{1080} = 25\, I_{1080}$ cm$^2$ $= 0.0025\, I_{1080}$ m$^2$, with $I_{1080}$ an identity matrix of size 1080.
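The construction of the A-matrix and of $Q_{yy}$ can be sketched with the standard library only. The variable and function names are hypothetical; the periods, the number of observations and the precision are those given in the problem. Because $Q_{yy}$ is diagonal, only its diagonal is stored here.

```python
import math

T_M2 = 12.4206  # period of M2 in hours
T_S2 = 12.0     # period of S2 in hours
M = 1080        # number of hourly observations (45 days)
SIGMA = 0.05    # measurement precision in metres (5 cm)


def design_row(t):
    """Row of A for time t (hours): intercept, cos/sin of M2, cos/sin of S2."""
    arg_m2 = 2.0 * math.pi * t / T_M2
    arg_s2 = 2.0 * math.pi * t / T_S2
    return [1.0, math.cos(arg_m2), math.sin(arg_m2), math.cos(arg_s2), math.sin(arg_s2)]


A = [design_row(t) for t in range(1, M + 1)]  # 1080 rows, 5 columns
q_diag = [SIGMA**2] * M  # diagonal of Qyy in m^2; all off-diagonal entries are zero

first_row = A[0]
last_row = A[-1]
```

Since 1080 hours is exactly 90 periods of $S_2$, the $S_2$ entries of the last row are (up to rounding) 1 and 0.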
Assume that the two tidal constituents are not known a priori, so we want to use a spectral analysis technique to identify them. We want to compute the power spectral density (PSD) using least-squares harmonic estimation (LS-HE).
B. Sketch a plot of the expected PSD from the data set, where the horizontal axis is the frequency (cycle/hour). Add relevant numerical values on the horizontal axis.
Model answer:
We will have two signals with periods of $T_{M_2} = 12.4206$ hours and $T_{S_2} = 12$ hours. This leads to the frequencies $f_{M_2} = 1/12.4206 = 0.080511$ cycle/hour and $f_{S_2} = 1/12 = 0.083333$ cycle/hour. Therefore we have two peaks at these two frequencies:
Based on the observations in the past 45 days, we are now going to predict sea levels for the future. The functional part of the prediction comes from the settings of part A. For the stochastic part, we have computed the normalized Auto-Covariance Function (ACF) and partial ACF (PACF) of the least squares residuals $\hat e$. They are as follows:
We want to use the available data of the time series to predict the sea levels for the coming day (so 24 hours from $t = m + 1$ to $t = m + 24$). Two functional and stochastic parts can contribute to the prediction ($y_P = y_F + y_S$).
C. For the stochastic part, we need to specify the ARMA(p,q) process. How do you determine appropriate orders for the ARMA(p,q) process? So p,q=? What kinds of parameters ($\beta_i$'s or $\theta_i$'s) of the ARMA process should we estimate?
Model answer:
There is a tail-off in the ACF and a cut-off in the PACF at lag 3 (two non-zero lags). Therefore the best ARMA model is ARMA(2,0), which is indeed AR(2). Therefore, there is no $\theta_i$ to estimate. We just need to estimate $\beta_1$ and $\beta_2$.
D. Based on the results of part C, write an expression for $y_S$ at the time instance $t=m+1$, i.e. $y_S(m+1)=$?
Model answer:
The AR(2) process for the time instance $t=m+1$ is $y_S(m+1) = \beta_1 y_S(m) + \beta_2 y_S(m-1)$. (This is the process as described in the hint of 8c, dropping out all terms except for those containing $\beta_1$ and $\beta_2$.)
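The recursion can be extended to the full 24-hour prediction window by feeding each predicted value back in. The sketch below assumes the future noise terms are replaced by their zero expectation; the function name, the coefficients and the residual values are hypothetical.

```python
def ar2_forecast(residuals, beta1, beta2, steps):
    """Predict the stochastic part for `steps` epochs after the last residual."""
    history = list(residuals[-2:])  # y_S(m-1), y_S(m)
    predictions = []
    for _ in range(steps):
        nxt = beta1 * history[-1] + beta2 * history[-2]
        predictions.append(nxt)
        history.append(nxt)  # the prediction becomes the newest "observation"
    return predictions


# Hypothetical last two residuals (in metres) and hypothetical AR coefficients.
y_s = ar2_forecast([0.02, 0.04], beta1=0.5, beta2=0.2, steps=24)
```

The first element of `y_s` is $y_S(m+1)$; the total prediction then follows from adding the functional part, $y_P = y_F + y_S$.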
A. Assume you have a dataset with N=100 data points and would like to train a linear basis function model with weights w which could potentially be too complex and overfit the data. You then decide to introduce an $L_2$ regularization term λ to the loss function and do a model selection study. Assume the number of basis functions is fixed and you cannot afford to collect more data.
Model answer:
B. A regularization term λ is added to the loss function of a neural network and a model selection study is performed by computing the mean squared error (MSE) over a validation dataset for different values of λ. The results of this study are shown above.
Regarding these results, mark all the options that are TRUE; consider that each wrong answer will result in negative points, but the lowest score for this sub-question is 0 (we will not subtract points from the rest of the exam):
Model answer:
C. Consider the dataset with five data samples {$x_1, x_2, x_3, x_4, x_5$} = {−1.6, −0.2, 0, 1.6, 2.2} shown above. Using the Euclidean distance, we perform K-means clustering to find the global optimum (minimum objective) when the number of clusters is K=3. Which single data sample forms a cluster on its own? (Euclidean distance between a and b: $d = \sqrt{(a-b)^2}$.)
Model answer:
D. Consider again the previous dataset. This time K-means clustering with Euclidean distance is used to find the global optimum (minimum objective) for K=2. What are the centroids of the final clusters?
Model answer:
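For a dataset this small, the global optimum in parts C and D can be checked by brute force. The sketch below assumes the usual K-means objective (sum of squared distances to the cluster centroid) and simply enumerates every assignment of the five samples to K non-empty clusters; the function names are hypothetical.

```python
from itertools import product

DATA = [-1.6, -0.2, 0.0, 1.6, 2.2]


def centroid(cluster):
    return sum(cluster) / len(cluster)


def sse(cluster):
    """Sum of squared distances of the cluster members to their centroid."""
    c = centroid(cluster)
    return sum((x - c) ** 2 for x in cluster)


def best_partition(data, k):
    """Enumerate all labelings with k non-empty clusters; return (cost, clusters)."""
    best = None
    for labels in product(range(k), repeat=len(data)):
        if len(set(labels)) < k:
            continue  # skip labelings that leave a cluster empty
        clusters = [[x for x, lab in zip(data, labels) if lab == g] for g in range(k)]
        cost = sum(sse(c) for c in clusters)
        if best is None or cost < best[0] - 1e-12:
            best = (cost, sorted(clusters))
    return best


cost3, clusters3 = best_partition(DATA, 3)
cost2, clusters2 = best_partition(DATA, 2)
centroids2 = [centroid(c) for c in clusters2]
```

Under these assumptions the enumeration isolates $x_1 = -1.6$ as the single-sample cluster for K=3, and gives the clusters {−1.6, −0.2, 0} and {1.6, 2.2} with centroids −0.6 and 1.9 for K=2.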
E. We perform principal component analysis on a given dataset. Consider the explained variance ratio with respect to the principal component number shown in the figure. What is the lowest dimension that guarantees a total explained variance ratio of 95%?
Model answer:
You are asked to evaluate the system reliability of a 2 m diameter oil pipeline that is currently operating in an earthquake region. The main objective is to evaluate the probability of failure, which in this case is defined as: the annual probability of a leak from the pipeline due to one of three different failure modes caused by an earthquake, $M_i$:
Each failure mode is dependent on whether or not an earthquake occurs, which has an annual probability of occurrence of 10%. For simplicity, consider each failure mode to be mutually exclusive, and that damage can only occur once per year per failure scenario. In other words:
$ P(M_1 \cup M_2 \cup M_3) = P(M_1) + P(M_2) + P(M_3)$
The probabilities of each failure mode have already been assessed and are summarized in the following table:
A. Construct the FD curve (i.e., the FN curve, except with damage in place of fatalities on the x-axis) for leakage of the pipeline segment due to each of the 3 failure modes. Don’t worry about the scale of your plot being precise, so long as the FD values are clearly indicated at each point.
Model answer:
B. Failure of one of the pipeline segments is a function of the horizontal and vertical acceleration, $X_1$ and $X_2$, respectively. The limit state can be described by a function, illustrated in the figure, where the failure region is represented by $\Omega$. If $f_{X_1,X_2}(x_1, x_2)$ is the multivariate probability distribution of the random variables $X_1$ and $X_2$, which of the following best defines the probability of failure?
Model answer:
The correct answer is 4. A short explanation for each choice is provided here:
As the pipeline is made up of many individual segments, you would like to perform a system reliability analysis to evaluate the probability of failure for the entire pipeline. Should you consider the pipeline to be a series or parallel system, and what will be the role of dependence between segments on the calculated failure probability for the entire pipeline? (use this information for the next 2 questions)
C. Should you consider the pipeline to be a series or parallel system?
Model answer:
A multi-segment pipeline is a series system, since if one segment fails, the entire pipeline has a leak and no longer functions as designed.
D. What is the quantitative effect of positive dependence between segments on the calculated failure probability? (Choose only one)
Model answer:
Consider 2 events, A and B, using the independent case as a frame of reference, i.e., when $P(A\cap B)=P(A)P(B)$. Positive dependence will cause an increase in the joint ("and") probability, $P(A\cap B)$, which in turn leads to a decrease in the union ("or") probability, $P(A\cup B)=P(A) + P(B) - P(A\cap B)$. This means that the series failure probability would decrease with positive dependence. Thus answer B is correct.
Answer 3. is incorrect because the problem is asking about dependence, why would you assume an independent case?!
Answer 4. is incorrect because: 1) failures in the segments are probably not mutually exclusive (can have multiple leaks), and 2) even if they were mutually exclusive the probability would increase, since this can be considered as an extreme case of negative dependence between elements. Note that this increase only applies to the case of mutually exclusive events being quantified with the limit of ρ = -1; you should not interpret this as meaning 1. is correct!
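The effect of dependence on a two-segment series system can be checked numerically. The sketch below uses a hypothetical segment failure probability of 0.1 and compares three cases for the joint probability: independence, full positive dependence, and mutual exclusivity.

```python
def union_probability(p_a, p_b, p_joint):
    """P(A or B) = P(A) + P(B) - P(A and B): failure of a two-segment series system."""
    return p_a + p_b - p_joint


p = 0.1  # hypothetical failure probability of each segment

independent = union_probability(p, p, p * p)      # P(A and B) = P(A) P(B)
fully_dependent = union_probability(p, p, p)      # P(A and B) = P(A) = P(B)
mutually_exclusive = union_probability(p, p, 0.0)  # P(A and B) = 0
```

This gives roughly 0.19, 0.10 and 0.20, respectively: positive dependence lowers the series failure probability relative to the independent case, while mutual exclusivity raises it.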
The annual probability of one or more leaks is $P_1 = 0.2$ and is based on the current operating procedure of inspecting the pipeline once per year (n = 1). However, experience within the pipeline industry indicates the failure probability can be reduced with additional inspections, such that $P_n =P_1/n$. Environmental consequences of a leak have been estimated to be D=100,000 euros, and each inspection costs 1,000 euros. Repair costs are negligible.
E. Find the optimal number of inspections per year, n, that minimizes total annual expected cost due to a pipeline leak.
Model answer:
Total annual expected cost is given by: $C_{tot}(n) = \frac{P_1}{n} D + (n-1) C_I$
where $C_I$ is the inspection cost. The optimum is found by solving:
$\frac{dC_{tot}(n)}{dn} = -\frac{P_1}{n^2} D + C_I = 0$
$n = \sqrt{P_1 D / C_I} = 4.47 \Rightarrow$ 5 inspections
Technically the problem should be formulated with $(n-1) * C_I$ instead of $n * C_I$, but this makes almost no difference when finding the optimum, and is not explicitly stated in the exam question so no points were taken off for this.
Also, to determine whether the number of inspections should be 4 or 5, the best approach would be to compare the total expected costs for both and choose the lower. Rounding up or down does not guarantee the optimal choice. Here both values give the same total expected cost, so n=4 is actually better since the total investment is less. No points were taken off for making a proper choice of n when deciding between 4 and 5.
Note: if calculation is done for a long project lifetime, an interest rate r can be assumed and D and $C_I$ should be multiplied by 1/r. Terms cancel, resulting in 5 inspections. It is incorrect to compare the investment to either total risk or change in total risk.
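The comparison between 4 and 5 inspections can be checked directly. The sketch below evaluates the annual cost function with the $(n-1)$ inspection term for integer $n$, using the values given in the question; the variable and function names are hypothetical.

```python
import math

P1 = 0.2        # annual leak probability with one inspection per year
D = 100_000.0   # environmental consequences of a leak (euros)
C_I = 1_000.0   # cost of one inspection (euros)


def total_cost(n):
    """Expected annual cost: risk P1/n * D plus (n - 1) additional inspections."""
    return P1 * D / n + (n - 1) * C_I


n_continuous = math.sqrt(P1 * D / C_I)  # stationary point of the cost function
costs = {n: total_cost(n) for n in range(1, 11)}
```

The stationary point is about 4.47, and both $n=4$ and $n=5$ give a total of 8,000 euros (5,000 + 3,000 versus 4,000 + 4,000), so the two options tie on expected cost and differ only in how much is spent on inspections.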