torchvision.transforms
torchvision has been updated, so some of this code may need to be changed to use torchvision.transforms.v2.
When an image is converted to a PyTorch tensor, the pixel values are scaled to the range [0.0, 1.0].
This conversion can be done with torchvision.transforms.ToTensor(). It converts a PIL image with pixel values in the range [0, 255] to a PyTorch FloatTensor of shape (C, H, W) with values in the range [0.0, 1.0].
After transforming the image to a tensor, we may normalize it.
Normalization
- Normalizing an image means rescaling its pixel values so that, for each channel, the mean becomes 0.0 and the standard deviation becomes 1.0.
- It brings the data into a common range and reduces skewness, which helps the model learn faster and more reliably.
- It can help mitigate the vanishing and exploding gradient problems.
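The arithmetic behind the first point can be sketched in plain NumPy: subtract each channel's mean and divide by its standard deviation. The random array is a stand-in for a real image in (C, H, W) layout.

```python
import numpy as np

# Hypothetical image in (C, H, W) layout with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((3, 8, 8))

# One mean and one standard deviation per channel, taken over H and W.
mean = image.mean(axis=(1, 2), keepdims=True)
std = image.std(axis=(1, 2), keepdims=True)

# Per-channel standardization: subtract the mean, divide by the std.
normalized = (image - mean) / std

print(normalized.mean(axis=(1, 2)))  # close to 0 for every channel
print(normalized.std(axis=(1, 2)))   # close to 1 for every channel
```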
In PyTorch
torchvision.transforms.Normalize()
Normalizes a tensor image with a mean and standard deviation.
Parameters:
- mean: Sequence of means, one for each channel
- std: Sequence of standard deviations, one for each channel
- inplace: Whether to perform the operation in place
Returns: Normalized tensor image
To normalize an image in PyTorch, we need to:
- Load and visualize the image and plot its pixel values using PIL
- Transform the image to a tensor using torchvision.transforms.ToTensor()
- Calculate the mean and standard deviation
- Normalize the image
- Visualize the normalized image
- Verify the normalization
Load the image
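from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
img_path = "image.jpg"  # placeholder: replace with the path to your own image
img = Image.open(img_path)
img_np = np.array(img)
# flatten to 1-D so that all pixel values go into one histogram
plt.hist(img_np.ravel(), bins=50, density=True)
plt.xlabel("pixel values")
plt.ylabel("relative frequency")
plt.title("distribution of pixels")
plt.show()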
The resulting histogram shows how the pixel values are distributed over the range [0, 255].
Transforming images to Tensors
import numpy as np
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
# define custom transform function
transform = transforms.Compose([
    transforms.ToTensor()  # note the parentheses: ToTensor must be instantiated
])
# transform PIL image to tensor
img_tr = transform(img)
img_np = img_tr.numpy()
# plot the pixel values of the tensor
plt.hist(img_np.ravel(), bins=50, density=True)
plt.xlabel("pixel values")
plt.ylabel("relative frequency")
plt.title("distribution of pixels")
plt.show()
The pixel values are now scaled into the range [0.0, 1.0].
Calculate mean and std
mean, std = img_tr.mean([1,2]), img_tr.std([1,2])
print("mean and std before normalize:")
print("Mean of the image:", mean)
print("Std of the image:", std)
This calculates the mean and std of the image for each of the three channels: red, green, and blue.
For images that are similar to those in ImageNet, we can use the ImageNet mean and std instead.
Normalizing the images
from torchvision import transforms
# define custom transform function
transform_norm = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])  # mean and std are the variables computed above
# convert to tensor and normalize in one step
img_normalized = transform_norm(img)
img_np = img_normalized.numpy()
# plot the pixel values after normalization
plt.hist(img_np.ravel(), bins=50, density=True)
plt.xlabel("pixel values")
plt.ylabel("relative frequency")
plt.title("distribution of pixels")
plt.show()
Visualize the normalized image
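img_normalized = transform_norm(img)
# convert to NumPy and move channels last: (C, H, W) -> (H, W, C)
img_normalized = img_normalized.numpy().transpose(1, 2, 0)
# for RGB float data, imshow clips values outside [0, 1], so colors look distorted
plt.imshow(img_normalized)
plt.xticks([])
plt.yticks([])
plt.show()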
Calculate the mean and std again
img_nor = transform_norm(img)
mean, std = img_nor.mean([1,2]), img_nor.std([1,2])
print("Mean and Std of normalized image:")
print("Mean of the image:", mean)
print("Std of the image:", std)