1.0 | Introduction
The perceptron learning algorithm is a binary classification algorithm that learns a linear decision boundary to separate data points belonging to different classes. Given a set of input features $x_1, \dots, x_n$, it computes the weighted sum $w_0 + w_1 x_1 + \dots + w_n x_n$ and assigns a class label according to the sign of the result.
2.0 | History
In 1943, Warren McCulloch and Walter Pitts introduced the first model of an artificial neuron. Their model was primarily theoretical and conceptual, focused on mimicking the behaviour of biological neurons. A McCulloch-Pitts neuron sums its input signals and applies a step function to determine whether it fires (outputs 1) or not (outputs 0).
In 1958, Frank Rosenblatt published a paper titled The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, which introduced a method for updating weights based on prediction errors. This marked a significant advancement over the McCulloch-Pitts model from 1943, which did not include any mechanism for learning or weight adjustment.
3.0 | Neural Activation
A neuron is activated if it receives a sufficient amount of stimuli. Whether the stimuli are sufficient is determined by a threshold $\theta$: the neuron fires when the weighted sum of its inputs reaches $\theta$. The threshold can be absorbed into the weights as a bias term $w_0 = -\theta$ with a constant input $x_0 = 1$; this is why the code below prepends $1.0$ to each feature vector. The perceptron activation function predicts a label $\hat{y} \in \{-1, +1\}$:

$$\hat{y} = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x}) = \begin{cases} +1 & \text{if } \mathbf{w} \cdot \mathbf{x} \ge 0, \\ -1 & \text{otherwise}. \end{cases}$$
4.0 | The Decision Boundary
Definition: The decision boundary is the hyperplane that separates the input space into two regions, each associated with a different class label. It is the set of points at which the output of the perceptron switches from one class to the other. The boundary is linear: a straight line in 2D, a plane in 3D, and a hyperplane in higher dimensions. The perceptron is guaranteed to converge if the data is linearly separable; if it is not, the weights never settle and both in-sample and out-of-sample performance suffer.
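Concretely, the boundary consists of the inputs where the weighted sum equals zero. For the two-feature case used in the examples below, solving $\mathbf{w} \cdot \mathbf{x} = 0$ for the second feature gives the line that the plotting code draws; this short derivation is added here to connect the definition to the slope and intercept computed in the code:

$$w_0 + w_1 x_1 + w_2 x_2 = 0 \;\Longrightarrow\; x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2},$$

i.e. a line with slope $-w_1/w_2$ and intercept $-w_0/w_2$.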
4.1 | Learning the Decision Boundary
The decision boundary is determined by the edge-weights: the modifiable parameters of the perceptron that are adjusted during training to define the boundary. During training, the perceptron adjusts these weights based on misclassifications until a suitable decision boundary is found. Note that the perceptron learning algorithm does not use gradient descent for updating weights; instead, it employs a simple update rule based on misclassifications.
4.2 | Weights Update Rule
During training, the perceptron updates its weights whenever it misclassifies a data point. The update rule is as follows:

$$w_i \leftarrow w_i + \eta\,(y - \hat{y})\,x_i$$

where $\eta$ is the learning rate, $y$ is the true label, $\hat{y}$ is the predicted label, and $x_i$ is the $i$-th input (with $x_0 = 1$ for the bias weight $w_0$). When the prediction is correct, $y - \hat{y} = 0$ and the weights are left unchanged.
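As a small worked illustration (the numbers are mine, not from the original): with labels in $\{-1, +1\}$, a misclassification means $y - \hat{y} = \pm 2$, so each weight moves by $2\eta x_i$ toward the correct side of the boundary.

import numpy as np

w = np.array([0.0, 1.0, -1.0])  # [bias weight, w1, w2]
x = np.array([1.0, 2.0, 3.0])   # constant 1.0 prepended for the bias
y = 1                           # true label

y_hat = np.sign(np.dot(w, x))   # sign(0 + 2 - 3) = -1: a misclassification
w = w + 0.01 * (y - y_hat) * x  # (y - y_hat) = 2, so w changes by 0.02 * x
print(w)                        # [ 0.02  1.04 -0.94]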
5.0 | Two Python Examples
In general, one trains a model on a training set and then tests the out-of-sample performance on a test set. This is what I did below, for both a linearly separable and a non-linearly separable dataset. The linearly separable set does not represent anything in reality. The non-linearly separable dataset contains attributes (features) that are used to predict whether a case of breast cancer is benign or malignant.
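The code below assumes that numpy, pandas, and matplotlib have already been imported and that DataFrames named train and test with a 'label' column in $\{-1, +1\}$ exist. Since the original setup is not shown here, the following is only a minimal sketch of how the breast cancer data could be prepared; the split size, random seed, and variable names are assumptions:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load the features into a DataFrame and attach the ground-truth labels
data = load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['label'] = np.where(data.target == 1, 1, -1)  # map {0, 1} to {-1, +1} to match np.sign

# Hold out a test set for out-of-sample evaluation (assumed 80/20 split)
train, test = train_test_split(df, test_size=0.2, random_state=42)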
5.1 | Example 1: Linearly Separable Dataset
def predict_neuron_label(w, x):
    return np.sign(np.dot(w, x))

# Weight update rule
def update_weights(w, x, y, learning_rate):
    y_hat = predict_neuron_label(w, x)
    for i in range(len(w)):
        w[i] = w[i] + learning_rate * (y - y_hat) * x[i]
    return w

# Train the perceptron
w = np.random.rand(3)
learning_rate = 0.01
for i in range(1000):
    for index, row in train.iterrows():
        x = np.array([1.0, row['mean radius'], row['mean concave points']])
        y = row['label']
        w = update_weights(w, x, y, learning_rate)

# Plot the decision boundary
x_vals = np.linspace(5, 30, 100)
slope = -w[1] / w[2]
intercept = -w[0] / w[2]
y_vals = slope * x_vals + intercept
plt.plot(x_vals, y_vals, '-g', label='Decision Boundary', linewidth=2)
plt.scatter(train['mean radius'], train['mean concave points'], c=train['label'], cmap='coolwarm')
plt.xlabel('mean radius')
plt.ylabel('mean concave points')
plt.ylim(-0.05, 0.25)
plt.title('Training set')
plt.show()
As per a trivial visual inspection, it is clear that the accuracy on the training set is perfect: every point lies on the correct side of the decision boundary. Similarly, the perceptron manages to classify all test-sample points correctly, although this is not guaranteed in general.
5.2 | Example 2: Non-Linearly Separable Dataset
This is the set whose features are used to predict benign versus malignant cases of breast cancer. The dataset needed to be slightly processed. More specifically, the ground-truth labels contained the values $0$ and $1$, which were mapped to $-1$ and $+1$ to match the output of the sign activation function (the np.where line in the setup sketch above illustrates one way to do this).
The decision boundary on the training set is displayed below.
Subsequently, I calculated the accuracy of the model; see the Python code below.
correct_predictions = 0
false_predictions = 0
for index, row in train.iterrows():
    x = np.array([1.0, row['mean radius'], row['mean concave points']])
    y = row['label']
    # compute_output_linalg is assumed to be defined earlier in the post;
    # it plays the same role as predict_neuron_label above
    y_hat = compute_output_linalg(w, x)
    if y_hat != y:
        false_predictions += 1
    else:
        correct_predictions += 1

print('Accuracy: ', correct_predictions / len(train))
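For reference, the same accuracy can be computed without an explicit loop. A vectorized equivalent (my own rewrite, assuming the same train frame and the learned weights w):

# Design matrix with the constant bias input in the first column
X = np.column_stack([
    np.ones(len(train)),
    train['mean radius'],
    train['mean concave points'],
])
y_true = train['label'].to_numpy()
accuracy = np.mean(np.sign(X @ w) == y_true)
print('Accuracy:', accuracy)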
This resulted in an accuracy of 0.884. Recall that since the dataset is not linearly separable, it is impossible for the perceptron to find a perfect decision boundary. The next figure shows the decision boundary on the test set.
The accuracy on the test set was about 0.868, an almost identical out-of-sample performance, suggesting that the learned boundary generalizes reasonably well.
6.0 | Final Remarks
The perceptron learning algorithm exemplifies the power and simplicity of early single-layer neural network models. By iteratively adjusting weights based on misclassifications, it effectively learns a linear decision boundary, showcasing its ability to handle basic binary classification tasks. While the algorithm has limitations, especially with non-linearly separable data, its foundational principles paved the way for more complex and robust machine learning models, for example by introducing more layers and non-linear activation functions. Nonetheless, the perceptron remains an important stepping stone in the evolution of neural networks and continues to be a valuable educational tool for understanding the basics of machine learning.