
Assignment Frequently Asked Questions (FAQ)

Assignment 1

Q: The assignment says "Then you can modify the visualize() function to draw the linear decision boundary". Doesn't visualize() draw the decision boundary as it is, i.e., without modifications?

A: The provided visualize() function draws the decision boundary using a Monte Carlo method: it classifies a large number of points that evenly cover the entire region, so the boundary appears only implicitly. You are asked to modify visualize(), or implement a new function, to draw the decision boundary from its equation instead. Your decision boundary should be a straight line in the figure, and it should coincide with the boundary produced by the Monte Carlo method (i.e., by the provided visualize() function).
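
For example, a minimal matplotlib sketch (not the provided visualize() function; the coefficients below are placeholders for your trained model's values) that draws the line w_1 x_1 + w_2 x_2 + b = 0.5 explicitly:

    # Sketch: draw the boundary w1*x1 + w2*x2 + b = 0.5 as an explicit line.
    import numpy as np
    import matplotlib.pyplot as plt

    w1, w2, b = 1.2, -0.8, 0.1       # placeholder coefficients from your model

    x1 = np.linspace(-3, 3, 100)
    x2 = (0.5 - b - w1 * x1) / w2    # solve w1*x1 + w2*x2 + b = 0.5 for x2
    plt.plot(x1, x2, "k-", label="decision boundary")
    plt.legend()
    plt.show()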

Q: Where are the linear coefficients in the table coming from (e.g., bias = -0.07)? We get different numbers from the code you posted (bias = +0.5).

A: Those numbers come from one run of the sample linear regression code by the TA. Because of the random initialization, re-running the sample code will of course give a different result: different coefficients, a different accuracy, and a different equation for the decision boundary.

Q: About decision boundary.

A: Please note that only the decision boundaries of logistic regression and the linear neural network are required; there is no linear equation that represents the decision boundary of a deep (nonlinear) neural network.

This question is designed to help you understand the essence of classification problems. A suggested way to compute the boundaries of these two models: write out the linear function the model computes; the decision boundary is then the set of inputs where the output crosses the classification threshold. For example, in linear regression you may write the linear function f(x) = w_1 x_1 + w_2 x_2 + b. The classification threshold is 0.5, so f(x) = 0.5 defines the line that separates the two classes, i.e., the decision boundary. Logistic regression works almost the same way: its prediction is σ(f(x)), and σ(f(x)) = 0.5 exactly when f(x) = 0, so f(x) = 0 is the boundary. For a linear neural network, first note that a linear NN computes a linear function even if it has multiple layers; compute that single linear function and proceed as above.

Here Prof. Schulte gives a reference, which is very helpful in showing how to extract a single linear model from a multi-layer linear network (Theorem 6).
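
To illustrate the idea, here is a small NumPy sketch, with made-up placeholder weights, of how a two-layer linear network collapses into a single linear function whose boundary line you can then plot as in the sketch above:

    # Sketch of the idea behind Theorem 6: a multi-layer *linear* network
    # collapses to a single linear function.
    import numpy as np

    W1 = np.array([[0.8, -0.3], [0.1, 0.5]])   # layer 1: h = W1 x + b1
    b1 = np.array([0.2, -0.1])
    W2 = np.array([[0.6, -0.4]])               # layer 2: y = W2 h + b2
    b2 = np.array([0.05])

    # y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
    W = W2 @ W1        # single combined weight vector
    b = W2 @ b1 + b2   # single combined bias
    # The decision boundary is then the line where W x + b equals the threshold.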

Q: About the required accuracy.

A: The required accuracies, 72% and 100%, are the accuracies needed to get full marks. You are not encouraged to spend lots of time retraining again and again without any improvement method. The grading is not based on accuracy alone: if your model does not reach the required accuracy you may lose a few points, but if the other parts of your assignment are done well you will still get a high score. So don't waste your time retraining the model again and again, relying only on random initialization to eventually land on a high-accuracy model.

When you encounter this situation, there are some suggestions:

  • Try to analyze why it happens. First, make sure there are no errors in your model. Then visualize the loss value during training, the decision boundary of your trained model, etc.
  • Look for ways to improve. There are many training tricks you can use in deep learning. For this project, you can try: different initialization methods, different learning rates (visualize the loss curve to help you find a suitable one), different optimization methods, and early stopping (see the sketch below).
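
For instance, a minimal early-stopping sketch in PyTorch; the model, data, and patience value are placeholders, not the assignment's setup:

    import torch

    # Synthetic stand-in data: 2-D points labeled by the side of a line they fall on.
    x_train = torch.randn(200, 2)
    y_train = (x_train.sum(dim=1, keepdim=True) > 0).float()
    x_val = torch.randn(100, 2)
    y_val = (x_val.sum(dim=1, keepdim=True) > 0).float()

    model = torch.nn.Linear(2, 1)              # placeholder model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = torch.nn.BCEWithLogitsLoss()

    best_val, patience, bad_epochs = float("inf"), 10, 0
    for epoch in range(500):
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(x_train), y_train)
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_val), y_val).item()
        if val_loss < best_val - 1e-4:         # improved: save and reset counter
            best_val, bad_epochs = val_loss, 0
            torch.save(model.state_dict(), "best.pt")
        else:
            bad_epochs += 1
            if bad_epochs >= patience:         # stop when validation stalls
                break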

Q: About logistic regression and gradient descent. The parameters in logistic regression and the error function are fixed, so how can I improve my accuracy?

A: You can tune the hyperparameters of gradient descent, such as the learning rate and the number of iterations, to improve the learning.
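
As a sketch of what such tuning looks like, here is plain gradient descent for logistic regression on synthetic data; lr and n_iters are the hyperparameters to experiment with:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))               # synthetic stand-in data
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    w, b = np.zeros(2), 0.0
    lr, n_iters = 0.1, 1000                     # hyperparameters to tune
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)                  # predicted probabilities
        grad_w = X.T @ (p - y) / len(y)         # gradient of the cross-entropy loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b

    accuracy = np.mean((sigmoid(X @ w + b) > 0.5) == y)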

Assignment 2

Q: Project 1(b): improve your LeNet-5.

A: In this subquestion, you can use any approaches you want to improve the performance of your model. Here are some examples.

  • Tune hyperparameters such as the learning rate, regularization weights, and number of epochs
  • Add dropout (see the sketch after this list)
  • Try different optimizers to improve training performance and convergence
  • Add new convolutional layers
  • Apply data augmentation to the original dataset
  • Use transfer learning or fine-tuning
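
As a sketch of two of these tricks in PyTorch (the transform choices and layer sizes are illustrative, not required; note that a horizontal flip only makes sense for images whose labels survive mirroring, e.g. CIFAR-style photos rather than digits):

    import torch.nn as nn
    from torchvision import transforms

    # Data augmentation: random crops and flips applied to the training images.
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])

    # Dropout inserted between the fully connected layers of a LeNet-5-style head.
    classifier = nn.Sequential(
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(84, 10),
    )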

But in (a) your implemented model should be a LeNet-5, and you should explain any changes you made to reach a high accuracy.

Q: How can I accelerate the training process to reduce the time required for model training?

A:

  • Consider running your models for a few epochs, such as 5 or 10, to obtain initial insights into the training process and identify good hyperparameters.

  • Be aware that when working on Google Colab, the first epoch may take much longer than subsequent epochs due to image loading, so it's normal to experience a delay during initial training. For example, with a T4 GPU on Google Colab, the first run through 20 epochs took 39 minutes; the second run through 20 epochs took 10 minutes.

Assignment 3

Q: How do I implement an RNN?

A: In Keras, you can call SimpleRNN. In PyTorch, you can use torch.nn.RNN. Note that torch.nn.RNN does not offer a sigmoid activation function (see the feature request); you can use the hyperbolic tangent (tanh) instead.
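
A minimal PyTorch sketch, with placeholder sizes, that uses the tanh nonlinearity:

    import torch.nn as nn

    class CharRNN(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.RNN(embed_dim, hidden_dim,
                              nonlinearity="tanh", batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, h0=None):
            e = self.embed(x)            # (batch, seq_len, embed_dim)
            out, hn = self.rnn(e, h0)    # out: (batch, seq_len, hidden_dim)
            return self.out(out), hn     # logits over the vocabulary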

Q: What about the out-of-memory problem when implementing Project Q1?

A: Some students report that an out-of-memory (OOM) problem occurs when implementing Q1. The 'English Literature' file contains about 14,000 words. If you set the unk frequency threshold to 1, about 7,400 words remain; with a threshold of 2, about 5,000 words remain. In general, this amount of data is not too large, but you can try a larger unk threshold if an OOM problem occurs.
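
A minimal sketch of applying an unk frequency threshold (the file name is a placeholder, and the exact threshold semantics are an assumption; check them against the assignment's definition):

    from collections import Counter

    words = open("english_literature.txt").read().split()  # placeholder file name
    counts = Counter(words)

    unk_threshold = 2        # raise this if you run into OOM problems
    vocab = {w for w, c in counts.items() if c > unk_threshold}
    tokens = [w if w in vocab else "<unk>" for w in words]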

Q: In Project Part 2 it says "(So the mini-batch size varies for the sentences with different lengths)".

A: You may be confused by this description. It means that the memory size of a mini-batch varies for sentences of different lengths. You can simply ignore this sentence when implementing Part 2.

Q: Project Part 2 mini-batch.

A: We ask you to process one sentence per mini-batch. However, because equipment varies, if training takes too long you are allowed to put several sentences in one mini-batch or to train on a subset containing more than 50% of the data.

Q: Project Part 4, the 80% validation accuracy.

A: (Hint) Pay attention to your data preprocessing. The 80% accuracy is not difficult to achieve. The key idea of Part 4 (and perhaps of other questions) is data preprocessing: for example, stop-word removal, punctuation handling, and normalizing uppercase and lowercase.
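
A minimal sketch of these preprocessing steps (the stop-word list here is a tiny placeholder; libraries such as NLTK provide fuller lists):

    import string

    STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in"}  # placeholder subset

    def preprocess(text):
        text = text.lower()                                    # normalize case
        text = text.translate(str.maketrans("", "", string.punctuation))
        return [w for w in text.split() if w not in STOP_WORDS]

    print(preprocess("The quick, brown Fox!"))  # ['quick', 'brown', 'fox']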

Updated Wed March 13 2024, 22:54 by oschulte.