Presented by Thức NC
Can you write a traditional rule-based program to tell the difference between an orange and an apple?
How many orange & green pixels?
How about these images?
Compress a 3D Swiss Roll onto a new 2D feature subspace
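As a minimal sketch of the idea, the snippet below generates a Swiss Roll with NumPy and projects it onto a 2D subspace with PCA. Note that PCA is a linear method, so it only flattens the roll; nonlinear techniques such as LLE or Isomap are what actually "unroll" the sheet. The generation formulas are one common parameterization, not taken from the slides.

```python
import numpy as np

# Generate a 3D Swiss Roll: a 2D sheet (parameterized by t and height)
# rolled up in 3D space. This parameterization is a common convention.
rng = np.random.default_rng(0)
n = 500
t = 1.5 * np.pi * (1 + 2 * rng.random(n))   # position along the roll
height = 10 * rng.random(n)                  # position across the roll
points = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

# Project onto a 2D subspace with PCA (top-2 principal directions).
# A linear projection cannot truly unroll the sheet; nonlinear methods
# such as LLE or Isomap are needed for that.
centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
embedding = centered @ vt[:2].T              # shape (n, 2)
print(embedding.shape)
```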
Q1. Suppose you want to develop a supervised machine learning model to predict whether a given email is "spam" or "not spam". Which of the following statements are true?
Q2. Suppose an online shoe store wants to create a supervised ML model that will provide personalized shoe recommendations to users. That is, the model will recommend certain pairs of shoes to Marty and different pairs of shoes to Janet. Which of the following statements are true?
Q3. Suppose you are working on weather prediction, and you would like to predict whether or not it will be raining at 5pm tomorrow. You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
Q4. Suppose you are working on stock market prediction, and you would like to predict the price of a particular stock tomorrow (measured in dollars). You want to use a learning algorithm for this. Would you treat this as a classification or a regression problem?
Q5. Some of the problems below are best addressed using a supervised learning algorithm, and the others with an unsupervised learning algorithm. Which of the following would you apply supervised learning to?
Cricket database: Chirps per Minute vs. Temperature. What is the relationship?
The relationship seems to be linear
High loss in the left model; low loss in the right model.
| House size (X) | Price (Y) | X (min-max normalized) | Y (min-max normalized) |
|---|---|---|---|
| 1,100 | 199,000 | 0.00 | 0.00 |
| 1,400 | 245,000 | 0.22 | 0.22 |
| 1,425 | 319,000 | 0.24 | 0.58 |
| 1,550 | 240,000 | 0.33 | 0.20 |
| 1,600 | 312,000 | 0.37 | 0.55 |
| 1,700 | 279,000 | 0.44 | 0.39 |
| 1,700 | 310,000 | 0.44 | 0.54 |
| 1,875 | 308,000 | 0.57 | 0.53 |
| 2,350 | 405,000 | 0.93 | 1.00 |
| 2,450 | 324,000 | 1.00 | 0.61 |
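The normalized columns above can be reproduced with min-max scaling, which maps each column onto [0, 1] via (x − min) / (max − min). A quick check:

```python
# Min-max normalization: (x - min) / (max - min) maps a column to [0, 1].
sizes  = [1100, 1400, 1425, 1550, 1600, 1700, 1700, 1875, 2350, 2450]
prices = [199_000, 245_000, 319_000, 240_000, 312_000, 279_000,
          310_000, 308_000, 405_000, 324_000]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

x_norm = [round(v, 2) for v in min_max(sizes)]
y_norm = [round(v, 2) for v in min_max(prices)]
print(x_norm)   # matches the table's normalized X column
print(y_norm)   # matches the table's normalized Y column
```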
With initial parameters a = 0.45 (intercept) and b = 0.75 (slope), the model predicts Y' = a + bX. The per-example loss shown is ½(Y' − Y)² (half the squared error, which keeps the gradient formulas simple).

| X | Y | a | b | Y' = a + bX | loss = ½(Y' − Y)² |
|---|---|---|---|---|---|
| 0.00 | 0.00 | 0.45 | 0.75 | 0.45 | 0.101 |
| 0.22 | 0.22 | | | 0.62 | 0.077 |
| 0.24 | 0.58 | | | 0.63 | 0.001 |
| 0.33 | 0.20 | | | 0.70 | 0.125 |
| 0.37 | 0.55 | | | 0.73 | 0.016 |
| 0.44 | 0.39 | | | 0.78 | 0.078 |
| 0.44 | 0.54 | | | 0.78 | 0.030 |
| 0.57 | 0.53 | | | 0.88 | 0.062 |
| 0.93 | 1.00 | | | 1.14 | 0.010 |
| 1.00 | 0.61 | | | 1.20 | 0.176 |
| | | | | Total: | 0.677 |
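The prediction and loss columns can be checked numerically. The sketch below assumes the per-example loss is ½(Y' − Y)², which is the convention that matches the tabulated values:

```python
# Reproduce the predictions and losses for Y' = a + b*X with
# a = 0.45, b = 0.75 and per-example loss 1/2 * (Y' - Y)^2.
X = [0.00, 0.22, 0.24, 0.33, 0.37, 0.44, 0.44, 0.57, 0.93, 1.00]
Y = [0.00, 0.22, 0.58, 0.20, 0.55, 0.39, 0.54, 0.53, 1.00, 0.61]
a, b = 0.45, 0.75

preds  = [a + b * x for x in X]
losses = [0.5 * (yp - y) ** 2 for yp, y in zip(preds, Y)]
total = sum(losses)
print(round(total, 2))   # ~0.67 (the slide's 0.677 sums the rounded rows)
```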
Per-example gradients of the loss f(a, b) = ½(Y' − Y)²: ∂f/∂a = Y' − Y and ∂f/∂b = (Y' − Y)·X.

| X | Y | a | b | Y' | loss = ½(Y' − Y)² | ∂f/∂a = Y' − Y | ∂f/∂b = (Y' − Y)X |
|---|---|---|---|---|---|---|---|
| 0.00 | 0.00 | 0.45 | 0.75 | 0.45 | 0.101 | 0.45 | 0.00 |
| 0.22 | 0.22 | | | 0.62 | 0.077 | 0.39 | 0.09 |
| 0.24 | 0.58 | | | 0.63 | 0.001 | 0.05 | 0.01 |
| 0.33 | 0.20 | | | 0.70 | 0.125 | 0.50 | 0.17 |
| 0.37 | 0.55 | | | 0.73 | 0.016 | 0.18 | 0.07 |
| 0.44 | 0.39 | | | 0.78 | 0.078 | 0.39 | 0.18 |
| 0.44 | 0.54 | | | 0.78 | 0.030 | 0.24 | 0.11 |
| 0.57 | 0.53 | | | 0.88 | 0.062 | 0.35 | 0.20 |
| 0.93 | 1.00 | | | 1.14 | 0.010 | 0.14 | 0.13 |
| 1.00 | 0.61 | | | 1.20 | 0.176 | 0.59 | 0.59 |
| | | | | Total: | 0.677 | 3.30 | 1.55 |
One gradient-descent update with learning rate η = 0.1: new a = 0.45 − 0.1 × 3.30 = 0.12 and new b = 0.75 − 0.1 × 1.55 ≈ 0.60, which cuts the total loss from 0.677 to 0.161.

| X | Y | a | b | Y' | loss | ∂f/∂a | ∂f/∂b | new a | new b | new Y' | new loss |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.00 | 0.00 | 0.45 | 0.75 | 0.45 | 0.101 | 0.45 | 0.00 | 0.12 | 0.60 | 0.12 | 0.007 |
| 0.22 | 0.22 | | | 0.62 | 0.077 | 0.39 | 0.09 | | | 0.25 | 0.000 |
| 0.24 | 0.58 | | | 0.63 | 0.001 | 0.05 | 0.01 | | | 0.26 | 0.051 |
| 0.33 | 0.20 | | | 0.70 | 0.125 | 0.50 | 0.17 | | | 0.32 | 0.007 |
| 0.37 | 0.55 | | | 0.73 | 0.016 | 0.18 | 0.07 | | | 0.34 | 0.022 |
| 0.44 | 0.39 | | | 0.78 | 0.078 | 0.39 | 0.18 | | | 0.38 | 0.000 |
| 0.44 | 0.54 | | | 0.78 | 0.030 | 0.24 | 0.11 | | | 0.38 | 0.012 |
| 0.57 | 0.53 | | | 0.88 | 0.062 | 0.35 | 0.20 | | | 0.46 | 0.002 |
| 0.93 | 1.00 | | | 1.14 | 0.010 | 0.14 | 0.13 | | | 0.67 | 0.054 |
| 1.00 | 0.61 | | | 1.20 | 0.176 | 0.59 | 0.59 | | | 0.72 | 0.006 |
| | | | | Total: | 0.677 | 3.30 | 1.55 | | | | 0.161 |
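The whole update step can be verified in a few lines. This sketch assumes the loss is f(a, b) = Σ ½(Y' − Y)² and a learning rate of 0.1, which is inferred from the tabulated update (0.45 − 0.1 × 3.30 = 0.12); small differences from the table come from its per-row rounding.

```python
# One batch gradient-descent step for Y' = a + b*X with loss
# f(a, b) = sum of 1/2 * (Y' - Y)^2 and learning rate 0.1
# (the rate is inferred from the table: 0.45 - 0.1*3.30 = 0.12).
X = [0.00, 0.22, 0.24, 0.33, 0.37, 0.44, 0.44, 0.57, 0.93, 1.00]
Y = [0.00, 0.22, 0.58, 0.20, 0.55, 0.39, 0.54, 0.53, 1.00, 0.61]
a, b, lr = 0.45, 0.75, 0.1

errors = [a + b * x - y for x, y in zip(X, Y)]     # Y' - Y per example
grad_a = sum(errors)                               # sum of df/da, ~3.3
grad_b = sum(e * x for e, x in zip(errors, X))     # sum of df/db, ~1.5
new_a, new_b = a - lr * grad_a, b - lr * grad_b

new_loss = sum(0.5 * (new_a + new_b * x - y) ** 2 for x, y in zip(X, Y))
print(round(new_a, 2), round(new_b, 2), round(new_loss, 2))
```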
Q1. Suppose gradient descent is used to try to find the minimum of the function $f(x,y) = 1+x^2+y^2$ starting at the point $(1,1)$. What will the x and y coordinates be after the first step, given the learning rate of $0.5$?
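The step can be checked numerically: the gradient of f is (∂f/∂x, ∂f/∂y) = (2x, 2y), and each coordinate is updated by subtracting the learning rate times its partial derivative.

```python
# One gradient-descent step on f(x, y) = 1 + x^2 + y^2
# starting from (1, 1) with learning rate 0.5.
# Gradient: (df/dx, df/dy) = (2x, 2y).
x, y, lr = 1.0, 1.0, 0.5
x, y = x - lr * 2 * x, y - lr * 2 * y
print(x, y)   # (0.0, 0.0): one step lands exactly on the minimum
```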
Q2. Suppose gradient descent is used on function $f(x)=x^2-1$. If the gradient descent begins at $x=-1$ and uses the learning rate of $1.0$, in how many steps will it converge to the global minimum $x=0$?
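Iterating the update makes the answer visible: with df/dx = 2x and learning rate 1.0, each step computes x ← x − 2x = −x, so the iterate just flips sign.

```python
# Gradient descent on f(x) = x^2 - 1 from x = -1 with learning rate 1.0.
# Gradient: df/dx = 2x, so the update is x <- x - 1.0 * 2x = -x.
x = -1.0
trajectory = [x]
for _ in range(4):
    x = x - 1.0 * 2 * x
    trajectory.append(x)
print(trajectory)   # [-1.0, 1.0, -1.0, 1.0, -1.0]: it oscillates forever
```

Because the step size exactly overshoots the minimum each time, the iterates bounce between −1 and 1 and never converge to x = 0.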