MLT | OPPE | Sample Questions

Instructions

The duration of the exam is \(2\) hours.
There are \(17\) questions in this OPPE.
\(16\) of them are NAT and \(1\) is MCQ. The MCQ question is question-(5).
Read each question in Colab, write the solution code in Colab, and then enter the answer in the portal.
For most questions, the data is given in a cell called the DATA CELL. Run the DATA CELL before running the solution cell. Do not edit the DATA CELL under any circumstances.
After completing the exam you will have to do two things:
- Click the submit button on the portal. If you do not submit on the portal, you will get zero marks.
- Upload the Colab notebook as a .ipynb file using the form that is provided. If you do not upload the file, you will get zero marks.
Make sure that you run all the cells before the cell you are currently working on, and then run the current cell. This can be done using Ctrl + F8, which runs all the cells in sequence from the first one up to the current cell. Running only the current cell repeatedly might cause problems. If Ctrl + F8 doesn't work for you, click on Runtime in the toolbar and then click Run before.
Note that some questions have random numbers generated with specific seed values. So, it is important that you run the cells in the sequence in which they are presented. For such questions, you will find the following message at the end of the cell:

# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL

# RUN THIS CELL WITHOUT FAIL
# ONLY THEN PROCEED TO THE QUESTIONS
import numpy as np
import matplotlib.pyplot as plt

Question-1

The matrix M is of shape (m, n). Enter \(3m - 2n\) as your answer.

The variable M is defined in the cell given below.

# DATA CELL
# DO NOT EDIT THIS CELL
rng = np.random.default_rng(seed = 1001)
m, k, n = rng.integers(100, 1000, 3)
A = rng.integers(0, 5, (m, k))
B = rng.integers(0, 5, (k, n))
M = A @ B

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL
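One possible approach is sketched below, assuming M is a 2-D NumPy array. The helper name shape_score and the toy matrix M_toy are hypothetical; M_toy only stands in for the M produced by the data cell.

```python
import numpy as np

def shape_score(M):
    """Return 3m - 2n for a 2-D array M of shape (m, n)."""
    m, n = M.shape
    return 3 * m - 2 * n

# Toy stand-in for the data cell's M: a (4, 2) matrix times a (2, 3) matrix.
M_toy = np.ones((4, 2), dtype=int) @ np.ones((2, 3), dtype=int)
print(shape_score(M_toy))  # 3*4 - 2*3 = 6
```

In the notebook, the same helper would be called as shape_score(M) after running the data cell.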

Question-2

Matrix M is of shape (n, n). Find the dot product of the \(230^{th}\) row of \(M\) and the \(158^{th}\) column of \(M\). Your answer should be an integer.

NOTE: Rows and columns are counted from one, not zero. For example, consider the matrix \(M\):

\[\begin{split} M = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix} \end{split}\]

The first row of \(M\) is \([1, 2, 3]\). The second column of \(M\) is \([2, 5, 8]\).

The variable M is defined in the cell given below.

# DATA CELL
# DO NOT EDIT THIS CELL
rng = np.random.default_rng(seed = 1001)
n = rng.integers(100, 300)
M = rng.integers(0, 5, (n, n))

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL
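Since the question counts from one while NumPy indexes from zero, each index has to be shifted by one. The sketch below illustrates this on the \(3 \times 3\) example from the note above; the helper name row_col_dot is hypothetical.

```python
import numpy as np

def row_col_dot(M, row, col):
    """Dot product of the row-th row and the col-th column, both counted from one."""
    return int(M[row - 1, :] @ M[:, col - 1])

# The example matrix from the note on terminology.
M_toy = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
print(row_col_dot(M_toy, 1, 2))  # [1, 2, 3] . [2, 5, 8] = 36
```

For this question, the call would be row_col_dot(M, 230, 158) after running the data cell.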

Question-3

Solve for \(x\) in the following equation using matplotlib:

\[ x \sin x = e^x, \quad -1 \leq x \leq 0 \]

There is exactly one solution for the given range of \(x\). Enter your answer correct to three decimal places.

# SOLUTION
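A value read off a plot can be cross-checked numerically. The sketch below is not the plotting approach the question asks for; it assumes that a solution of the equation is a root of \(f(x) = x \sin x - e^x\) and locates the sign change of \(f\) on a fine grid. Passing the same x and f arrays to plt.plot would show the crossing visually.

```python
import numpy as np

# f(x) = x sin x - e^x; a root of f is a solution of x sin x = e^x.
x = np.linspace(-1, 0, 100_001)
f = x * np.sin(x) - np.exp(x)

# Indices of the grid intervals across which f changes sign.
idx = np.flatnonzero(np.sign(f[:-1]) != np.sign(f[1:]))

# Midpoint of the first such interval; the grid spacing is 1e-5.
x_root = 0.5 * (x[idx[0]] + x[idx[0] + 1])
print(round(x_root, 3))
```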

Question-4

A perceptron model has weight vector w of shape (d, ). Find the label of a test data-point x_test. Recall that labels lie in \(\{-1, 1\}\).

The variables w and x_test are defined in the cell given below.

# DATA CELL
# DO NOT EDIT THIS CELL
rng = np.random.default_rng(seed = 1003)
d = rng.integers(100, 200)
w = rng.uniform(0, 10, d)
x_test = rng.uniform(-10, 10, d)

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL
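A perceptron predicts the sign of \(\mathbf{w}^T \mathbf{x}\). The sketch below assumes that a score of exactly zero is assigned the label \(1\); that tie-breaking rule is an assumption, as are the helper name perceptron_predict and the toy vectors.

```python
import numpy as np

def perceptron_predict(w, x):
    """Return 1 if w.x >= 0, else -1 (a score of zero maps to 1 by assumption)."""
    return 1 if w @ x >= 0 else -1

# Toy vectors standing in for w and x_test from the data cell.
w_toy = np.array([1.0, 2.0])
print(perceptron_predict(w_toy, np.array([3.0, -1.0])))   # score  1 -> label  1
print(perceptron_predict(w_toy, np.array([-3.0, 1.0])))   # score -1 -> label -1
```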

Question-5

(X_1, y_1) and (X_2, y_2) are two datasets. The data matrix for each dataset is of shape (d, n). Each label vector has shape (n, ). Use a perceptron to determine if the datasets are linearly separable or not.

NOTE: If the dataset is linearly separable, your code should terminate within \(20\) complete passes (epochs) over the entire dataset. If it is taking more than that, it could mean one of two things:

- your code is wrong
- the dataset is not linearly separable

The variables X_1, y_1, X_2 and y_2 are defined in the cell given below.

# DATA CELL
# DO NOT EDIT THIS CELL
# Dataset-1
X_1 = np.array([[-1, -1,  1,  2, -2, -1,  1, -2,  2, -2],
                [ 1,  0,  1, -2,  0,  1,  2,  2,  1,  0],
                [ 2,  2, -2,  0,  2,  2, -2,  2, -2,  2],
                [-1,  2,  2, -1,  0,  0, -1,  2,  2,  0]])
y_1 = np.array([ 1,  1, -1, -1,  1,  1, -1,  1, -1,  1])
# Dataset-2
X_2 = np.array([[-1, -2, -2, -2,  0,  0,  0,  2,  0,  2],
                [-2, -1, -1, -1,  1,  0, -1, -2, -2,  1],
                [ 0,  0,  1, -1,  2, -1,  1, -1, -1, -1],
                [-2, -2,  1,  1, -2,  0,  0,  1,  0, -1]])
y_2 = np.array([-1, -1,  1, -1,  1, -1,  1,  1, 1, -1])
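The 20-epoch check described in the note can be sketched as follows. This is one possible implementation under stated assumptions: the weights start at zero, the points are visited in column order, a score of zero is predicted as \(1\), and the update on a mistake is \(\mathbf{w} \leftarrow \mathbf{w} + y_i \mathbf{x}_i\). The function name and the toy datasets are hypothetical.

```python
import numpy as np

def perceptron_converges(X, y, max_epochs=20):
    """Run a perceptron on X of shape (d, n); report whether an epoch had no mistakes."""
    d, n = X.shape
    w = np.zeros(d)
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            y_hat = 1 if w @ X[:, i] >= 0 else -1
            if y_hat != y[i]:
                w = w + y[i] * X[:, i]   # perceptron update on a mistake
                mistakes += 1
        if mistakes == 0:
            return True, w               # a full pass with no mistakes
    return False, w

# Toy separable dataset: the label follows the sign of the first feature.
X_sep = np.array([[1, 2, -1, -2],
                  [1, 1, -1, -1]])
y_sep = np.array([1, 1, -1, -1])
# Toy non-separable dataset: the same point carries both labels.
X_bad = np.array([[1, 1],
                  [1, 1]])
y_bad = np.array([1, -1])
print(perceptron_converges(X_sep, y_sep)[0], perceptron_converges(X_bad, y_bad)[0])
```

In the notebook, the same function would be called on (X_1, y_1) and (X_2, y_2).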

Common Data for questions (6) to (9)

Consider a dataset X of shape (d, n) for a clustering problem with three clusters. The initial cluster centers are given as c_1, c_2 and c_3. Each center is of shape (d, ). Run the K-Means algorithm on this dataset until convergence. Follow the algorithm given below:

Initialize centers to c_1, c_2, c_3
Until convergence
- Compute cluster membership
- Recompute cluster centers

x_test is a test point of shape (d, ). Let c_1_final, c_2_final, c_3_final be the cluster centers after convergence.

NOTE: The variables X, c_1, c_2, c_3 and x_test are all given to you in the code cell given below.

# DATA CELL
# DO NOT EDIT THIS CELL
rng = np.random.default_rng(seed = 1009)
n = rng.integers(24, 25)
d = rng.integers(2, 3)
cen = rng.integers(-10, 10, (3, d))
cov = np.eye(d)
X = np.zeros((d, n))
for i in range(3):
    X[:, n // 3 * i: n // 3 * (i + 1)] = rng.multivariate_normal(cen[i], cov, n // 3).T
x_test = rng.integers(-2, 4, (d, ))
c_1, c_2, c_3 = (cen[0] + rng.uniform(-2, 2, (d, )),
                 cen[1] + rng.uniform(-2, 2, (d, )),
                 cen[2] + rng.uniform(-2, 2, (d, )))

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL
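The two-step loop described above can be sketched as follows. It assumes that convergence means the cluster memberships stop changing and that a cluster left with no points keeps its previous center; both are assumptions, as are the function name k_means and the toy data.

```python
import numpy as np

def k_means(X, init_centers, max_iter=100):
    """Lloyd's algorithm for X of shape (d, n) and init_centers of shape (k, d)."""
    centers = np.array(init_centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Squared distance from every center to every point: shape (k, n).
        dists = ((X.T[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=0)
        if labels is not None and np.array_equal(new_labels, labels):
            break                        # memberships unchanged: converged
        labels = new_labels
        for j in range(len(centers)):
            if np.any(labels == j):      # an empty cluster keeps its old center
                centers[j] = X[:, labels == j].mean(axis=1)
    return centers, labels

# Toy data: three well-separated pairs of points in two dimensions.
X_toy = np.array([[0, 0, 10, 10, 20, 20],
                  [0, 1,  0,  1,  0,  1]], dtype=float)
init_toy = np.array([[1, 0], [9, 0], [19, 0]], dtype=float)
centers_toy, labels_toy = k_means(X_toy, init_toy)
print(centers_toy)
print(np.linalg.norm(centers_toy[0]))
```

In the notebook, the call would be k_means(X, np.array([c_1, c_2, c_3])); the rows of the returned array then play the role of c_1_final, c_2_final and c_3_final.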

Question-6

Enter the Euclidean norm of c_1_final as your answer correct to two decimal places. You can use np.linalg.norm(c_1_final).

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL

Question-7

Enter the Euclidean norm of c_2_final as your answer correct to two decimal places. You can use np.linalg.norm(c_2_final).

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL

Question-8

Enter the Euclidean norm of c_3_final as your answer correct to two decimal places. You can use np.linalg.norm(c_3_final).

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL

Question-9

To which cluster does the point x_test belong? Enter 1, 2 or 3 as your answer.

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL
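A test point is assigned to the cluster whose final center is nearest in Euclidean distance. The sketch below returns a one-based cluster number to match the answer format; the helper name nearest_cluster and the toy centers are hypothetical.

```python
import numpy as np

def nearest_cluster(x, centers):
    """One-based index of the center closest to x in Euclidean distance."""
    dists = np.linalg.norm(np.asarray(centers) - x, axis=1)
    return int(np.argmin(dists)) + 1

# Toy centers standing in for c_1_final, c_2_final, c_3_final.
centers_toy = [np.array([0.0, 0.5]), np.array([10.0, 0.5]), np.array([20.0, 0.5])]
print(nearest_cluster(np.array([9.0, 2.0]), centers_toy))  # nearest to the second center: 2
```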

Common Data for questions (10) and (11)

Consider a hard-margin SVM trained on a linearly separable dataset \((\mathbf{X}, \mathbf{y})\). The optimal weight vector obtained is \(\mathbf{w}^{*}\):

\[\begin{split} \begin{aligned} \mathbf{X} &= \begin{bmatrix} 2 & 2 & 0 & 1 & 1 & 0 & -1 & -2 & -3\\ 0 & -1 & -1 & 1 & 3 & 1 & -1 & 0 & 1 \end{bmatrix}\\\\ \mathbf{y} &= \begin{bmatrix} -1 & -1 & -1 & -1 & 1 & 1 & 1 & 1 & 1 \end{bmatrix}^T\\\\ \mathbf{w}^{*} &= \begin{bmatrix} -2 & 1 \end{bmatrix}^T \end{aligned} \end{split}\]

\(\mathbf{X}\) has shape \((d, n)\), \(\mathbf{y}\) has shape \((n, )\) and \(\mathbf{w}^{*}\) has shape \((d, )\).

Question-10

Find the number of support vectors.

# SOLUTION
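One way to approach this is to compute the margin \(y_i \, \mathbf{w}^{*T} \mathbf{x}_i\) of every data-point. The sketch below assumes the convention that support vectors are the points whose margin equals \(1\), that is, the points lying on the supporting hyperplanes; the variable names are hypothetical and the data is copied from the common data above.

```python
import numpy as np

# Data copied from the common data; columns of X are data-points.
X = np.array([[2,  2,  0, 1, 1, 0, -1, -2, -3],
              [0, -1, -1, 1, 3, 1, -1,  0,  1]])
y = np.array([-1, -1, -1, -1, 1, 1, 1, 1, 1])
w_star = np.array([-2, 1])

# Margin y_i * (w^T x_i) of every data-point.
margins = y * (w_star @ X)

# Assumed convention: support vectors are the points whose margin equals 1.
num_support = int(np.sum(np.isclose(margins, 1)))
print(margins, num_support)
```

Every margin being at least \(1\) is also a quick check that \(\mathbf{w}^{*}\) satisfies the hard-margin constraints.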

Question-11

Find the label of the test point \(\begin{bmatrix}2 & 1\end{bmatrix}^T\). Your answer should be in the set \(\{1, -1\}\).

# SOLUTION

Common Data for questions (12) to (17)

Run the following cell to get the training and test dataset. The following variables are used in the cell:

X_train = Training dataset

y_train = label vector corresponding to training dataset

X_test = Test dataset

y_test = label vector corresponding to test dataset

# DATA CELL
# DO NOT EDIT THIS CELL
X_train = np.array([
    [1, 0, 1, -1, 2],
    [2, 1, 0, -1, 2],
    [0, 1, 2, 3, 1]
])
y_train = np.array([1, 2, 0, 3, 1])
X_test = np.array([
    [1, 0, 2, 1, 4],
    [0, 1, 3, 1, 2],
    [-1, 0, 3, 1, -1]
])
y_test = np.array([-1, 0, 2, 1, 2])

Question-12

If we learn a linear regression model on the training dataset, how many weights need to be learned by the model?

# SOLUTION
# RUN THE DATA CELL BEFORE RUNNING THE SOLUTION CELL

Question-13

If \(\mathbf{w}\) is the weight vector learnt using the least-squares linear regression model (normal equation method), what is the Euclidean norm of \(\mathbf{w}\)? Enter your answer correct to two decimal places.

# SOLUTION
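The normal-equation method can be sketched as follows. It assumes the \((d, n)\) layout used elsewhere in this document, so each column of X_train is a data-point, and it assumes a model without a bias term; under those assumptions the weights solve \((\mathbf{X}\mathbf{X}^T)\mathbf{w} = \mathbf{X}\mathbf{y}\). The data is copied from the data cell.

```python
import numpy as np

# Training data copied from the data cell; columns are data-points (assumed (d, n) layout).
X_train = np.array([
    [1, 0, 1, -1, 2],
    [2, 1, 0, -1, 2],
    [0, 1, 2, 3, 1]
])
y_train = np.array([1, 2, 0, 3, 1])

# Normal equations for X of shape (d, n): (X X^T) w = X y.
w = np.linalg.solve(X_train @ X_train.T, X_train @ y_train)
print(w.shape)                       # one weight per feature (row of X_train)
print(round(np.linalg.norm(w), 2))   # Euclidean norm of w
```

np.linalg.solve is used here instead of forming the inverse explicitly; it requires \(\mathbf{X}\mathbf{X}^T\) to be invertible, and np.linalg.pinv is a common fallback when it is not.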

Question-14

Find the root mean square error on the training dataset using the model defined in question \(13\).

\[ \text{RMSE} = \sqrt{\dfrac{1}{n}\sum\limits_{i=1}^{n} (y_i- \widehat{y}_i)^2} \]

Enter your answer correct to three decimal places.

# SOLUTION
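The RMSE formula above translates directly into NumPy. The helper name rmse and the toy labels are hypothetical; with a data matrix X of shape (d, n) and a weight vector w, the predictions would be X.T @ w.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between labels y and predictions y_hat."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Toy check: the errors are (0, 0, -2), so RMSE = sqrt(4 / 3).
print(rmse([1, 2, 3], [1, 2, 5]))
```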

Question-15

Find the root mean square error on the test dataset using the model defined in question \(13\). Enter your answer correct to two decimal places.

# SOLUTION

Question-16

Learn the ridge regression model on the training dataset defined in question \(12\) for \(\lambda = 0.01, 0.1, 1,\) and \(10\). Which value of \(\lambda\) gives the least training error?

# SOLUTION

Question-17

Learn the ridge regression model on the training dataset defined in question \(12\) for \(\lambda = 0.01, 0.1, 1,\) and \(10\). Which value of \(\lambda\) gives the least test error?

# SOLUTION
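The comparison across values of \(\lambda\) in questions (16) and (17) can be sketched as follows. It assumes the \((d, n)\) layout, no bias term, the closed form \(\mathbf{w} = (\mathbf{X}\mathbf{X}^T + \lambda \mathbf{I})^{-1}\mathbf{X}\mathbf{y}\), and that "error" means the RMSE without the penalty term; the helper names are hypothetical and the data is copied from the data cell.

```python
import numpy as np

# Data copied from the data cell; columns are data-points (assumed (d, n) layout).
X_train = np.array([[1, 0, 1, -1, 2], [2, 1, 0, -1, 2], [0, 1, 2, 3, 1]])
y_train = np.array([1, 2, 0, 3, 1])
X_test = np.array([[1, 0, 2, 1, 4], [0, 1, 3, 1, 2], [-1, 0, 3, 1, -1]])
y_test = np.array([-1, 0, 2, 1, 2])

def fit_ridge(X, y, lam):
    """Closed-form ridge weights for X of shape (d, n)."""
    d = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)

def rmse(X, y, w):
    """RMSE of the linear model w on the dataset (X, y)."""
    return np.sqrt(np.mean((y - X.T @ w) ** 2))

lambdas = [0.01, 0.1, 1, 10]
train_err, test_err = {}, {}
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)
    train_err[lam] = rmse(X_train, y_train, w)
    test_err[lam] = rmse(X_test, y_test, w)

# Value of lambda with the least training error, then the least test error.
print(min(train_err, key=train_err.get), min(test_err, key=test_err.get))
```

Because the penalty only pulls the weights away from the least-squares fit, the training error cannot decrease as \(\lambda\) grows; the test error has no such guarantee and has to be computed.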