Lecture Notes for Machine Learning: Gaussian Discriminant Analysis
The following content is summarized from Leture5 of the Course Stanford CS229.
Discriminative learning algorithms and generative learning algorithms differ fundamentally in how they model data and make predictions.
Discriminative Learning Algorithm
- Directly model the decision boundary between different classes.
- Method: Learn the conditional probability P(y ∣ x), where y is the class label and x is the input feature.
- Example Algorithms: Logistic Regression, Support Vector Machines (SVM), Neural Networks.
Generative Learning Algorithm
- Model how the data is generated for each class.
- Method: Learn the joint probability P(x, y), which can be factored into P(x∣y)P(y). The algorithm derives P(y∣x) using Bayes’ theorem from this.
- Example Algorithms: Naive Bayes, Gaussian Mixture Models (GMM), Hidden Markov Models (HMM).
Focus on the generative learning algorithm:
Recall the Bayes’ theorem:
Bayes’ Theorem plays a key role in some generative learning algorithms, especially for classification tasks. These algorithms model P(x∣y) (the likelihood) and P(y) (the prior probability), and then use Bayes’ Theorem to compute P(y∣x), which is the posterior probability used for classification.
The process is as follows:
- Generative models learn the distribution P(x∣y) for each class y and the prior probability P(y).
- When new data x needs to be classified, Bayes’ Theorem is applied to compute P(y∣x), and the class y with the highest posterior probability is chosen.
Gaussian Discriminant Analysis
Gaussian Discriminant Analysis (GDA) is a classification method based on a generative model used for classification problems, particularly when the data follows a Gaussian distribution.
GDA assumes that the data for each class is generated from a multivariate Gaussian distribution. Specifically, for each class y, the corresponding feature x is assumed to follow a Gaussian distribution.
The probability density function of the Gaussian distribution is given by:
GDA typically involves the following steps:
Estimate the prior probability for each class P(y).
Estimate the mean μc for each class:
Estimate the covariance matrix Σc for each class:
Using Bayes’ theorem, calculate the posterior probability and choose the class with the highest posterior probability:
GDA has two main variants: Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), which are applied based on different assumptions.
- Linear Discriminant Analysis (LDA): Assumes all classes share the same covariance matrix, Σc=Σ, resulting in linear decision boundaries.
- Quadratic Discriminant Analysis (QDA): Allows each class to have its own covariance matrix Σc, leading to quadratic decision boundaries.
For the case of Linear Discriminant Analysis (LDA) for binary classification
The log-likelihood function can be written as follows
The optimal solution of the parameters
The figure below shows the multivariate Gaussian distribution in 3D and contour plots under different covariance matrices.
The figure below illustrates the concept of discriminative learning algorithms and generative learning algorithms, represented by logistic regression and GDA, respectively. It can be observed that the decision boundaries of the two approaches differ slightly.
GDA and Logistic Regression