Difference Between PCA and AutoEncoder
PCA
Suppose there are m n-dimensional data, $ X_{n \times m} = [x_1, x_2,…, x_m]$, where each $x$ is an n-dimensional column vector.
Reduce Dimensionality
- Decentralize the data: $x_{i} = x_{i} - \frac{1}{m} \sum_{j=1}^{m} x_{j}$ and update $X$
- Calculate the Covariance matrix: $C = \frac{1}{m} XX^T$
- Take eigenvalue decomposition of the Covariance matrix $C$ and get the eigenvector matrix (the eigenvector is arranged in columns from the related largest to smallest eigenvalues). Take the first $k$ columns to form the matrix $P_{n \times k}$
- Project the original data into the $P$ coordinate system to get the dimensionality reduced data: $Y_{k \times m} = P_{n \times k}^T \times X_{n \times m}$, which is a linear transformation. The dimension of the data after PCA is changed compared to the origin data.
Data Reconstruction
PCA is lossy, that is, the compressed data does not maintain all the information of the original data, so the compressed data can not be restored back to the original high-dimensional data, but the restored data can be regarded as an approximation of the original data: $X_{n \times m}^{’} = P_{n \times k} Y_{k \times m}$
AutoEncoder
Encoder
The original data $X$ is input, and then compressed according to the network model, the original high dimensional data $X$ is compressed into low dimensional data C, and these low dimensional data is usually customarily referred to as latent vector, the original data after the activation function operation of the nonlinear hidden layer, the original data will be transformed into a low dimensional space, this space is considered to be the high-feature space. After the original data is operated by the activation function of the nonlinear hidden layer, the original data will be transformed to a low-dimensional space, which is considered as the high-feature space. AutoEncoder is a non-linear transformation. The dimension of the data after Encoder is the same as the origin data.
Decoder
Convert the original implicit layer data back into the original data space.
How to design the network
For simple datasets such as MNIST, a network with 1-2 hidden layers is usually sufficient. However, a network with 3 or more hidden layers can capture more complex features, but can also lead to overfitting.
As for MNIST, the number of nodes in the hidden layer should decrease layer by layer, usually the number of nodes in the last layer of the encoder, i.e., the dimension of the potential space, can be chosen from 32 to 128.
Code Example:
AutoEncoder on MNIST dataset:
self.encoder = nn.Sequential(
nn.Linear(28 * 28, 128),
nn.ReLU(True),
nn.Linear(128, 64),
nn.ReLU(True),
nn.Linear(64, 32)
)
# Design of the network can be changed
self.decoder = nn.Sequential(
nn.Linear(32, 64),
nn.ReLU(True),
nn.Linear(64, 128),
nn.ReLU(True),
nn.Linear(128, 28 * 28),
nn.Tanh()
)
# Design of the network can be changed