Left stairs should have a dark x1, x3, and x4, and a light x2. So it makes sense to assign negative and positive weights respectively. For simplicity, we choose the following:
Now we must choose the bias. In a perfect left stairs image, x1, x3, and x4 will be 0, and x2 will be 1, so the weighted sum will be 1. As the image becomes less perfect, the weighted sum decreases.
b = -0.8 seems like a reasonable choice, as it forces the weighted sum to be close to 1 but allows for a bit of noise.
Each perceptron (P1 and P2 above) outputs a 0 or a 1. If either model outputs a 1, we want the final model to classify the image as stairs. In other words, we want their sum to be greater than or equal to 1. We can achieve this by setting both weights to 1 and the bias term to -0.5, like this:
[Diagram: P1 and P2 each connect to a final perceptron with weight w = 1; the final perceptron has bias b = -0.5 and applies a step function to produce the output.]
Notice how this perceptron mimics an OR operator.
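To see the OR behavior, here is a small sketch of that final perceptron. The weights of 1 and the bias of -0.5 come straight from above; the function name is just for illustration.

```python
# The final perceptron: a weight of 1 on each input and a bias of -0.5.
# With binary inputs from P1 and P2, it behaves exactly like OR.

def or_perceptron(p1, p2):
    return 1 if 1 * p1 + 1 * p2 - 0.5 >= 0 else 0

for p1 in (0, 1):
    for p2 in (0, 1):
        print(p1, p2, "->", or_perceptron(p1, p2))
# 0 0 -> 0
# 0 1 -> 1
# 1 0 -> 1
# 1 1 -> 1
```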
Our final model looks like this:
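As a stand-in for the figure, here is a sketch of the full two-layer model under stated assumptions: the pixel weights for P1 repeat the illustrative values above, and P2 (right stairs) is assumed to simply mirror P1; only the overall structure and the biases of -0.8 and -0.5 are taken from the text.

```python
# Sketch of the full multilayer perceptron.
# Hidden layer: P1 detects left stairs, P2 detects right stairs (assumed mirror of P1).
# Output layer: the OR perceptron from above.

def step(z):
    return 1 if z >= 0 else 0

def p1_left_stairs(x1, x2, x3, x4):
    return step(-x1 + x2 - x3 - x4 - 0.8)

def p2_right_stairs(x1, x2, x3, x4):
    return step(x1 - x2 - x3 - x4 - 0.8)

def stairs_model(x1, x2, x3, x4):
    return step(p1_left_stairs(x1, x2, x3, x4) +
                p2_right_stairs(x1, x2, x3, x4) - 0.5)

print(stairs_model(0, 1, 0, 0))  # perfect left stairs  -> 1
print(stairs_model(1, 0, 0, 0))  # perfect right stairs -> 1
print(stairs_model(1, 1, 1, 1))  # non-stairs image     -> 0
```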
Food for thought
This example highlights how a multilayer perceptron can emulate conditional logic. This makes it a powerful prediction model.
Once you're convinced that an MLP can be a good predictor, the question becomes: how do we scale it?
Imagine connecting thousands of perceptrons across dozens of layers. Given some training dataset, how could we possibly find the optimal set of weights and biases?