
Sleeping Everyday


Solve the problem of exporting LaTeX in Markdown to PDF

First, download VSCode and install the Markdown All in One extension.

Second, press Ctrl + Shift + P and run the command "Print current document to HTML".

An HTML file is created with the LaTeX equations rendered correctly. It can then be opened in a browser and printed to PDF.

Permanently change the pip mirror source in Python

To change the mirror source to the Tsinghua mirror, type the following command in the terminal:

pip config set global.index-url

Install CUDA in a Conda Virtual Environment

First, run the code in the terminal to determine the CUDA version:


CUDA Version (shown, for example, in the header of nvidia-smi output) is the highest version of CUDA that the installed driver supports, so the version of CUDA we install needs to be <= CUDA Version (the driver is backward compatible).

Run the code in order in the Terminal:

conda create -n env_name python=3.10
conda activate env_name

# conda search cudatoolkit --info
conda install cudatoolkit=11.8.0
conda install cudnn

# Install command from the PyTorch official website (pick the pytorch-cuda version your driver supports):
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

After installing CUDA, run the code in order in the terminal to verify that it was installed successfully:

conda activate env_name
python
import torch
print(torch.cuda.is_available())  # True if PyTorch can use the GPU
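
For a slightly more detailed check, here is a small sketch that reports whether PyTorch can see a CUDA device and which CUDA version it was built with. The helper name `cuda_report` is made up for this note, and the import is guarded so the script also runs in an environment where PyTorch is not installed yet.

```python
def cuda_report():
    """Return a small dict describing whether PyTorch can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment
        return {
            "torch_installed": False,
            "cuda_available": False,
            "cuda_build": None,
            "device_name": None,
        }
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        # CUDA version PyTorch was built with (None for CPU-only builds)
        "cuda_build": torch.version.cuda,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


print(cuda_report())
```

If `cuda_available` is False while `torch_installed` is True, a CPU-only build of PyTorch or a driver that is too old are the usual suspects.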

Image Pre-processing/Transformation

from torchvision import transforms

train_tfm = transforms.Compose([
    # Resize the images to a fixed size
    transforms.Resize((128, 128)),

    # Do some image enhancement (augmentation) here

    # ToTensor() should be the last of the PIL-image transformations
    transforms.ToTensor(),
])

Geometric Transformations

# Rotation
transform_rotate = transforms.RandomRotation(degrees=30)

# Translation
transform_translate = transforms.RandomAffine(degrees=0, translate=(0.1, 0.1))

# Flipping
transform_flip = transforms.RandomHorizontalFlip(p=0.5)

# Scaling
transform_scale = transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0))

# Shearing
transform_shear = transforms.RandomAffine(degrees=0, shear=20)

Color Transformations

# Brightness Adjustment
transform_brightness = transforms.ColorJitter(brightness=0.5)

# Contrast Adjustment
transform_contrast = transforms.ColorJitter(contrast=0.5)

# Saturation Adjustment
transform_saturation = transforms.ColorJitter(saturation=0.5)

# Hue Adjustment
transform_hue = transforms.ColorJitter(hue=0.2)

Cropping and Padding

# Random Cropping
transform_random_crop = transforms.RandomCrop(size=224)

# Padding
transform_padding = transforms.Pad(padding=4)

Image Enhancement

# Random Erasing (works on tensors, not PIL images, so place it after ToTensor())
transform_random_erasing = transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0)

Learning Diffusers


With pip

pip install --upgrade diffusers[torch]

With conda

conda install -c conda-forge diffusers


Call a pretrained model uploaded to the Hugging Face Hub directly through diffusers:

import torch
from diffusers import DDPMPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the butterfly pipeline
butterfly_pipeline = DDPMPipeline.from_pretrained(

# Create 8 images
images = butterfly_pipeline(batch_size=8).images

# View the result


Step 0: Log in and define some useful functions

# Login
from huggingface_hub import notebook_login
notebook_login()  # copy the token in when prompted
import numpy as np
import torch
import torch.nn.functional as F
from matplotlib import pyplot as plt
from PIL import Image

def show_images(x):
    """Given a batch of images x, make a grid and convert to PIL"""
    x = x * 0.5 + 0.5  # Map from (-1, 1) back to (0, 1)
    grid = torchvision.utils.make_grid(x)
    grid_im = grid.detach().cpu().permute(1, 2, 0).clip(0, 1) * 255
    grid_im = Image.fromarray(np.array(grid_im).astype(np.uint8))
    return grid_im

Step 1: Download a training dataset

For this example, we’ll use a dataset of images from the Hugging Face Hub. Specifically, this collection of 1000 butterfly pictures.

import torchvision
from datasets import load_dataset
from torchvision import transforms

# Load dataset from the Hugging Face Hub
dataset = load_dataset("huggan/smithsonian_butterflies_subset", split="train")

# Or load images from a local folder (commented out so it does not overwrite the dataset above)
# dataset = load_dataset("imagefolder", data_dir="path/to/folder")

# We'll train on 32-pixel square images, but you can try larger sizes too
image_size = 32
# You can lower your batch size if you're running out of GPU memory
batch_size = 64

# Define data augmentations
preprocess = transforms.Compose(
    [
        transforms.Resize((image_size, image_size)),  # Resize
        transforms.RandomHorizontalFlip(),  # Randomly flip (data augmentation)
        transforms.ToTensor(),  # Convert to tensor (0, 1)
        transforms.Normalize([0.5], [0.5]),  # Map to (-1, 1)
    ]
)

def transform(examples):
    images = [preprocess(image.convert("RGB")) for image in examples["image"]]
    return {"images": images}

# Apply the transform on the fly whenever images are loaded
dataset.set_transform(transform)

# Create a dataloader from the dataset to serve up the transformed images in batches
train_dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=batch_size, shuffle=True
)

View the first 8 image examples in the dataset:

xb = next(iter(train_dataloader))["images"].to(device)[:8]
print("X shape:", xb.shape)
show_images(xb).resize((8 * 64, 64), resample=Image.NEAREST)

Step 2: Define the Scheduler

Our plan for training is to take these input images and add noise to them, then feed the noisy images to the model. And during inference, we will use the model predictions to iteratively remove noise. In diffusers, these processes are both handled by the scheduler.

The noise schedule determines how much noise is added at different timesteps.
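
To make the schedule concrete, here is a minimal NumPy sketch of the forward (noising) process in closed form, $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, which is assumed here to match the DDPMScheduler defaults (check your installed version). The function names `linear_alpha_bar` and `add_noise` are made up for this note and are not the diffusers implementation.

```python
import numpy as np


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0, noise, t, alpha_bar):
    """Closed-form forward process: shrink the clean signal, grow the noise."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


alpha_bar = linear_alpha_bar()
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 32, 32))  # a fake image scaled to (-1, 1)
noise = rng.standard_normal(x0.shape)          # noise from N(0, I)

for t in (0, 499, 999):
    xt = add_noise(x0, noise, t, alpha_bar)
    print(
        f"t={t:4d}  signal weight={np.sqrt(alpha_bar[t]):.3f}  "
        f"noise weight={np.sqrt(1.0 - alpha_bar[t]):.3f}"
    )
```

At small t the image is almost untouched, and at the last timestep it is almost pure noise, which is why the model can start sampling from random noise.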

from diffusers import DDPMScheduler
# Define a scheduler
noise_scheduler = DDPMScheduler(num_train_timesteps=1000)
# Add noise and view the noising process
# The core is add_noise()
timesteps = torch.linspace(0, 999, 8).long().to(device)
noise = torch.randn_like(xb)  # Sample noise from the standard Gaussian N(0, I)
noisy_xb = noise_scheduler.add_noise(xb, noise, timesteps)
print("Noisy X shape", noisy_xb.shape)
show_images(noisy_xb).resize((8 * 64, 64), resample=Image.NEAREST)

Step 3: Define the Model

Most diffusion models use architectures that are some variant of a U-Net and that’s what we’ll use here.

from diffusers import UNet2DModel

# Create a model
model = UNet2DModel(
    sample_size=image_size,  # the target image resolution
    in_channels=3,  # the number of input channels, 3 for RGB images
    out_channels=3,  # the number of output channels
    layers_per_block=2,  # how many ResNet layers to use per UNet block
    block_out_channels=(64, 128, 128, 256),  # More channels -> more parameters
    down_block_types=(  # one entry per value in block_out_channels
        "DownBlock2D",  # a regular ResNet downsampling block
        "DownBlock2D",
        "AttnDownBlock2D",  # a ResNet downsampling block with spatial self-attention
        "AttnDownBlock2D",
    ),
    up_block_types=(
        "AttnUpBlock2D",
        "AttnUpBlock2D",  # a ResNet upsampling block with spatial self-attention
        "UpBlock2D",
        "UpBlock2D",  # a regular ResNet upsampling block
    ),
).to(device)

When dealing with higher-resolution inputs you may want to use more down and up-blocks, and keep the attention layers only at the lowest resolution (bottom) layers to reduce memory usage.

# Check that passing in a batch of data and some random timesteps produces an output the same shape as the input data:
with torch.no_grad():
    model_prediction = model(noisy_xb, timesteps).sample
print(model_prediction.shape)

Step 4: Training: Create a Training Loop

For each batch of data, we

  • Sample some random timesteps
  • Noise the data accordingly
  • Feed the noisy data through the model
  • Compare the model predictions with the target (i.e. the noise in this case) using mean squared error as our loss function
  • Update the model parameters via loss.backward() and optimizer.step()

# Set the noise scheduler
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000, beta_schedule="squaredcos_cap_v2"
)
# Training loop
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

losses = []

# Loop through the training epoch
for epoch in range(30):
    # Loop through all data
    for step, batch in enumerate(train_dataloader):
        # Set the Scheduler
        # Load the data
        clean_images = batch["images"].to(device)

        # Sample noise to add to the images
        noise = torch.randn(clean_images.shape).to(clean_images.device)
        bs = clean_images.shape[0]

        # Sample a random timestep for each image
        timesteps = torch.randint(
            0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
        ).long()
        # Add noise to the clean images according to the noise magnitude at each timestep
        noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

        # Input the noisy_images to the model and Get the model prediction
        noise_pred = model(noisy_images, timesteps, return_dict=False)[0]

        # Calculate the loss
        loss = F.mse_loss(noise_pred, noise)

        # Update the model parameters with the optimizer
        loss.backward()
        losses.append(loss.item())
        optimizer.step()
        optimizer.zero_grad()

    if (epoch + 1) % 5 == 0:
        loss_last_epoch = sum(losses[-len(train_dataloader) :]) / len(train_dataloader)
        print(f"Epoch:{epoch+1}, loss: {loss_last_epoch}")
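
# Plot the loss and, as one option for the second panel, its log
fig, axs = plt.subplots(1, 2, figsize=(12, 4))
axs[0].plot(losses)
axs[1].plot(np.log(losses))
plt.show()

Step 5: Generate Images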
  • Method 1: Create a pipeline
from diffusers import DDPMPipeline
image_pipe = DDPMPipeline(unet=model, scheduler=noise_scheduler)
pipeline_output = image_pipe()
pipeline_output.images[0]  # view the generated image
# Save a pipeline to a local folder like so:
image_pipe.save_pretrained("my_pipeline")
# Inspecting the folder contents (shell command):
ls my_pipeline/
# Output: model_index.json  scheduler  unet
# The `scheduler` and `unet` subfolders contain everything needed to re-create those components.
# For example, inside the `unet` folder you'll find the model weights (`diffusion_pytorch_model.bin`)
# alongside a config file which specifies the UNet architecture.
  • Method 2: Writing a Sampling Loop
# Random starting point (8 random images):
sample = torch.randn(8, 3, 32, 32).to(device)

# A specific process of denoising and generating the images
for i, t in enumerate(noise_scheduler.timesteps):
    # Get model pred
    with torch.no_grad():
        residual = model(sample, t).sample

    # Update sample with step
    sample = noise_scheduler.step(residual, t, sample).prev_sample


Difference Between PCA and AutoEncoder


Suppose there are $m$ data points of dimension $n$, $X_{n \times m} = [x_1, x_2, \dots, x_m]$, where each $x_i$ is an $n$-dimensional column vector.

Reduce Dimensionality
  1. Center the data: $x_{i} \leftarrow x_{i} - \frac{1}{m} \sum_{j=1}^{m} x_{j}$, and update $X$
  2. Calculate the covariance matrix: $C = \frac{1}{m} XX^T$
  3. Take the eigenvalue decomposition of the covariance matrix $C$ to get the eigenvector matrix, with the eigenvectors arranged as columns in order of their eigenvalues from largest to smallest. Take the first $k$ columns to form the matrix $P_{n \times k}$
  4. Project the centered data into the $P$ coordinate system to get the dimensionality-reduced data: $Y_{k \times m} = P_{n \times k}^T X_{n \times m}$, which is a linear transformation. PCA changes the dimension of the data, from $n$ to $k$.
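
The four steps above can be sketched in NumPy as follows. This is a minimal illustration on random toy data, keeping the column-per-sample layout of $X_{n \times m}$. The function name `pca_reduce` is made up for this note.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the columns of X (shape n x m) onto the top-k principal directions.

    Returns (Y, P, mean): Y is k x m, P is n x k, mean is n x 1.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                          # step 1: center the data
    C = Xc @ Xc.T / X.shape[1]             # step 2: covariance matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(C)   # step 3: eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder from largest to smallest
    P = eigvecs[:, order[:k]]              # first k eigenvectors as columns
    Y = P.T @ Xc                           # step 4: project the centered data
    return Y, P, mean


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))          # n = 5 features, m = 200 samples (toy data)
Y, P, mean = pca_reduce(X, k=2)
print(Y.shape, P.shape)                    # (2, 200) (5, 2)
```

`np.linalg.eigh` is used because the covariance matrix is symmetric. It returns eigenvalues in ascending order, hence the explicit reordering.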
Data Reconstruction

PCA is lossy: the compressed data does not retain all the information of the original data, so it cannot be restored exactly to the original high-dimensional data. The restored data can still be regarded as an approximation of the original (centered) data: $X'_{n \times m} = P_{n \times k} Y_{k \times m}$
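
As a rough check of how lossy the reconstruction is, the sketch below compresses random toy data with different values of $k$ and measures the mean squared reconstruction error. The helper name `pca_reconstruction_error` is made up for this note, and the mean is added back so the result is compared against the uncentered data.

```python
import numpy as np


def pca_reconstruction_error(X, k):
    """Mean squared error between X (n x m) and its rank-k PCA reconstruction."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    eigvals, eigvecs = np.linalg.eigh(Xc @ Xc.T / X.shape[1])
    P = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # n x k, top-k eigenvectors
    Y = P.T @ Xc                                   # compress to k x m
    X_rec = P @ Y + mean                           # reconstruct and add the mean back
    return float(np.mean((X - X_rec) ** 2))


rng = np.random.default_rng(0)
X = rng.standard_normal((6, 300))                  # n = 6 features, m = 300 samples
errors = [pca_reconstruction_error(X, k) for k in range(1, 7)]
print([round(e, 4) for e in errors])
```

The error shrinks as $k$ grows and is numerically zero when $k = n$, since nothing is discarded in that case.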



The original data $X$ is fed into the network and compressed by the encoder into low-dimensional data $C$, customarily referred to as the latent vector. After passing through the activation functions of the nonlinear hidden layers, the original data is transformed into a low-dimensional space, which is regarded as a high-level feature space. Unlike PCA, AutoEncoder is a non-linear transformation. The output of the full AutoEncoder (encoder followed by decoder) has the same dimension as the original data.


Convert the hidden-layer (latent) data back into the original data space.

How to design the network

For simple datasets such as MNIST, a network with 1-2 hidden layers is usually sufficient. A network with 3 or more hidden layers can capture more complex features, but can also lead to overfitting.

For MNIST, the number of nodes in the hidden layers should decrease layer by layer. The number of nodes in the last layer of the encoder, i.e., the dimension of the latent space, can usually be chosen from 32 to 128.
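
To see how the layer widths shrink and then grow back, here is a NumPy-only sketch of the forward pass with random, untrained weights. It only illustrates shapes, not training. The sizes 784 -> 128 -> 64 -> 32 are one choice within the range mentioned above, and `init_layers` and `forward` are names made up for this note.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def init_layers(sizes, rng):
    """Random (untrained) weights and zero biases for consecutive layer sizes."""
    return [
        (rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, layers):
    """Apply each linear layer followed by ReLU; also record every layer's width."""
    widths = [x.shape[1]]
    for W, b in layers:
        x = relu(x @ W + b)
        widths.append(x.shape[1])
    return x, widths


rng = np.random.default_rng(0)
encoder = init_layers([28 * 28, 128, 64, 32], rng)  # widths shrink layer by layer
decoder = init_layers([32, 64, 128, 28 * 28], rng)  # mirror image of the encoder

batch = rng.uniform(0.0, 1.0, size=(16, 28 * 28))   # 16 fake flattened 28x28 images
latent, enc_widths = forward(batch, encoder)
recon, dec_widths = forward(latent, decoder)
print(enc_widths, dec_widths)                       # [784, 128, 64, 32] [32, 64, 128, 784]
```

The latent batch has 32 values per image, while the reconstruction has the same 784 values per image as the input.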

Code Example:

AutoEncoder on MNIST dataset:

self.encoder = nn.Sequential(
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
)
# Design of the network can be changed
self.decoder = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 28 * 28),
    nn.Sigmoid(),  # keeps outputs in (0, 1), assuming inputs are scaled to that range
)
# Design of the network can be changed

Study how to build a GitHub repository


First, create a GitHub repository and set the repository name.

Run the code in order in PowerShell under the target directory:

cd dir_name
git init
git add .
git commit -m "first commit"
git remote add origin
git push origin main

Study how to build a blog

Create a new post

( •̀ .̫ •́ )✧

Run the code in order in the PowerShell:

cd D:/html
D:/hugo_dir/hugo new post/

Push to Github


Run the code in order in the PowerShell:

cd public
git add .
git commit -m "test"
git push origin main

My First Blog

Hello Serendpity’s Blog

This is Serendpity’s first blog post. Love you all! ( •̀ ω •́ )y

