1.0 | Introduction
The perceptron learning algorithm is a binary classification algorithm that learns a linear decision boundary to separate data points belonging to different classes. Given a set of input features \(\textbf{x}\) and corresponding class labels \(\textbf{y}\), the perceptron algorithm updates the weights iteratively until a suitable decision boundary is found.
2.0 | History
In 1943 Warren McCulloch and Walter Pitts introduced the first model of an artificial neuron. Their model was primarly theoretical and conceptual, focused on mimicking the behaviour of biological neurons. A McCulloch neuron sums its input signals and applies a step function to determine if it fires (outputs 1) or not (outputs 0).
In 1958, Frank Rosenblatt published a paper titled The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain, which introduced a method for updating weights based on prediction errors. This marked a significant advancement over the McCulloch-Pitts model from 1943, which did not include any mechanism for learning or weight adjustment.
3.0 | Neural Activation
A neuron is activate if it recieves a sufficient amount of stimuli. A sufficient amount of stimuli is set by a threshold \(T\). If a neuron recieves T or more amounts of stimuli it activate and subsequently fire a signal.
The threshold \(T\) is being represented by the bias \(b\). In general, it is often represented by another weight and input combination \((w_{0}, x_{0})\), where \(x_{0} = 1\) and \(b = w_{0}\). All inputs and edge weights are real numbers. All source-nodes get multiplied by their respective edge weight and then summed together. Their sum \(\sum_{i=0}^{n} w_i \times x_i\) can be viewed as a dot product between vector weight-vector w and input-vector x, i.e. \(\textbf{w} \cdot \textbf{x}\). The activation function is then applied. In general, there are several plausible activataion functions. For example, the sigmoid, tanh or ReLu. In the context of the perceptron, the applied activation function is the sign function, \(sign(\textbf{w} \cdot \textbf{x})\).
The perceptron activation function predicts a \(-1\) if \(sign(\textbf{w} \cdot \textbf{x}) < 0\) and a \(1\) if \(sign(\textbf{w} \cdot \textbf{x}) \geq 0\).
4.0 | The Decision Boundary
Definition: The decision boundary is a hypersurface (in higher dimensions) or a hyperplane (in lower dimensions) that separates the input space into different regions, each associated with a different class label. It represents the point where the output of the perceptron changes from one class to another. This hyperplane is linear because it separates the input space into two regions by a straight line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions). The perceptron is guaranteed to converge if the data is linearly separable. If not, both its in-sample and out-of-sample performance will suffer.
4.1 | Learning the Decision Boundary
The decision boundary is generated from the edge-weights. The edge-weights are the modifiable variables within the perceptron learning algorithm that are adjusted during the training process to define the decision boundary.
During the training process, the perceptron adjusts the edge-weights based on misclassifications until a suitable decision boundary is found. Note, the perceptron learning algorithm does not utilize gradient descent for updating weights. Instead, it employs a simple update rule based on misclassifications.
4.2 | Weights Update Rule
During training, the perceptron updates its weights whenever it misclassifies a data point. The update rule is as follows:
\(w_{i} = w_{i} + \eta(y-\hat{y})x_{i}\).
Where \(\eta\) is the learning rate, \(x_i\) each indivdual point, \(y\) the ground truth labels, \(\hat{y}\) the predicted labels and \(w_{i}\) the weights. The weights are initialzed without restriction. Note that the weights are only updated if there is a mismatch between the ground truth labels and the predicted labels. The weights-update-rule is either set to terminate after a set amount of iteration or upon a tolerance threshold is reached. The resulting weights suffice to construct the decision boundary. In 2D, the decision boundary would be defined by a line.
5.0 | Two Python Examples
In general, one trains a model on a training set and then tests the out-of-sample performance on a test set. This is what I did below, for both a linearly seperable and non-linearaly seperable dataset. The linearly seperable set does not represent anything in reality. The non-linearly seperable dataset contains attributes (features) that are utilized in order to predict whether or not the breastcancer is benign or malignant.
5.1 | Example 1: Linearly Seperable Dataset
def predict_neuron_label(w, x):
return np.sign(np.dot(w, x))
# Weight update rule
def update_weights(w, x, y, learning_rate):
= predict_neuron_label(w, x)
y_hat for i in range(len(w)):
= w[i] + learning_rate * (y - y_hat) * x[i]
w[i] return w
# Train the perceptron
= np.random.rand(3)
w = 0.01
learning_rate for i in range(1000):
for index, row in train.iterrows():
= np.array([1.0, row['mean radius'], row['mean concave points']])
x = row['label']
y = update_weights(w, x, y, learning_rate)
w
# Plot the decision boundary
= np.linspace(5, 30, 100)
x_vals = -w[1] / w[2]
slope = -w[0] / w[2]
intercept = slope * x_vals + intercept
y_vals '-g', label='Decision Boundary', linewidth=2)
plt.plot(x_vals, y_vals, 'mean radius'], train['mean concave points'], c=train['label'], cmap='coolwarm')
plt.scatter(train['mean radius')
plt.xlabel('mean concave points')
plt.ylabel(-0.05, 0.25)
plt.ylim('Training set')
plt.title( plt.show()
As per a trivial visual inspection, it is clear that the accuracy is \(100\%\), i.e. the boundary perfectly seperates each point into its class of belonging.
Similarily the perceptron manages to perfectly classify all test-sample points correctly. Although this is not a necessity.
5.2 | Example 2: Non-Linearly Seperable Dataset
This is the set which contains features used to predict benign versus malignant case of breast cancer. The dataset needed to be slightly processed. More specifically, the ground-truth labels contained \(1\)’s and \(0\)’s instead of \(1\)’s and \(-1\)’s. The weight-update rule expects the labels to be either \(1\)’s or \(-1\)’s, else it will not function as expected. Also, I chose precisely two features to be used as model inputs. Precisely two so that the weights may be converted to linear form. This is necessitated to plot the decision boundary in 2D. I cherry-picked two features, mean radius and mean concave points.
The decision boundary on the training set is displayed below.
Subsequently I calculated the accuracy of the model, see Python code below.
= 0
correct_predictions = 0
false_predictions for index, row in train.iterrows():
= np.array([1.0, row['mean radius'], row['mean concave points']])
x = row['label']
y = compute_output_linalg(w, x)
y_hat if y_hat != y:
+= 1
false_predictions else:
+= 1
correct_predictions
print('Accuracy: ', correct_predictions / len(train))
Which resulted in an accuracy of 0.884. Recall that since the dataset is not linearly seperable, it is impossible for the perceptron to find a perfect decision boundary. The next figure finds the decision boundary on the test set.
The accuracy on the test set was about 0.868. A almost \(87\%\) out of sample accuracy is nontheless quite impressive. Even by only using two features (albeit important features) the perceptron managed to generate a predicitve model with great accuracy.
6.0 | Final Remarks
The perceptron learning algorithm exemplifies the power and simplicity of early single-layered neural network models. By iteratively adjusting weights based on misclassifications, it effectively learns a linear decision boundary, showcasing its ability to handle basic binary classification tasks. While the algorithm has limitations, especially with non-linearly separable data, its foundational principles have paved the way for more complex and robust machine learning models. For example, by introduced more layers and a non-linear activation function. Nonetheless the perceptron remains an important stepping stone in the evolution of neural networks and continues to be a valuable educational tool in understanding the basics of machine learning.