lecture1-cs231n

overview



Image Classification
A Core Task in Computer Vision
Today:
● The image classification task

● Two basic data-driven approaches to image classification
○ K-nearest neighbor and linear classifier

An image is a tensor of integers in [0, 255], e.g. 800 × 600 × 3 (3 RGB channels).

Writing an algorithm that recognizes a class of objects directly by hand is nearly impossible. Complicating factors include:
● Illumination: lighting changes
● Background clutter: the object blends into its surroundings
● Occlusion: only part of the object is visible
● Deformation: the object can change shape
● ...
There is no obvious way to hard-code an algorithm for recognizing a cat, or any other class.

Machine Learning: Data-Driven Approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning algorithms to train a classifier
  3. Evaluate the classifier on new images

KNN: K-Nearest Neighbor Classifier

KNN simply memorizes the entire training set. At test time it finds the K training images most similar to the test image by distance (i.e. the K images with the smallest pixel-wise difference), then predicts the label by majority vote among them.

```python
import numpy as np

class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # "training" here just means memorizing all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict the label for """
        # prediction scans every training image, computes its distance to the
        # test image, and returns the label of the closest one
        num_test = X.shape[0]
        # make sure the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):  # the original used xrange, which is Python 2 only
            # find the nearest training image to the i-th test image using the
            # L1 distance (sum of absolute value differences): subtract the test
            # image from every training image elementwise, then sum per image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # one distance per training image
            min_index = np.argmin(distances)  # get the index with the smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

        return Ypred
```
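The class above only implements the k = 1 special case, while the validation loop later assumes a `predict` that accepts `k`. A minimal sketch of such a method, with the voting logic being my own illustration rather than code from the lecture:

```python
    # inside NearestNeighbor, replacing the predict above:
    def predict(self, X, k=1):
        """ k-NN variant: majority vote among the k nearest training examples """
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # indices of the k smallest distances (argpartition avoids a full sort)
            nearest = np.argpartition(distances, k)[:k]
            labels = self.ytr[nearest]
            # majority vote: pick the most common label among the k neighbors
            values, counts = np.unique(labels, return_counts=True)
            Ypred[i] = values[np.argmax(counts)]
        return Ypred
```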

Two distance metrics:

L1 (Manhattan) distance: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$
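As a quick numpy illustration (my own sketch, not from the notes), both distances between two flattened images:

```python
import numpy as np

I1 = np.random.randint(0, 256, size=3072).astype(np.float64)  # a flattened 32x32x3 image
I2 = np.random.randint(0, 256, size=3072).astype(np.float64)

d1 = np.sum(np.abs(I1 - I2))          # L1 (Manhattan) distance
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2 (Euclidean) distance
print(d1, d2)
```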
Hyperparameters
Hyperparameters must be set before learning begins. Two choices to make:
● What is the best value of k to use?
● What is the best distance to use?
These are hyperparameters: choices about the algorithms themselves. They are very problem/dataset-dependent; you must try them all out and see what works best.
The k-NN classifier requires a setting of k, so which value works best? Likewise we can choose among different distance functions, such as the L1 and L2 norms, and there are other choices we may not even have considered (e.g. dot products). All such choices are called hyperparameters. They come up very frequently in the design of machine learning algorithms that learn from data, and it is usually not at all obvious what values they should take.

Training, validation, and test sets
(Figure: split the labeled data into training, validation, and test sets.)
Example dataset: CIFAR-10
● 10 classes
● 50,000 training images
● 10,000 testing images
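The code below assumes flattened arrays Xtr_rows, Ytr, Xte_rows, Yte already exist. A minimal sketch of producing them, assuming a load_CIFAR10 helper like the one in the official course notes (the data path is a placeholder):

```python
# hypothetical helper that returns CIFAR-10 as numpy arrays:
# Xtr: 50,000 x 32 x 32 x 3, Ytr: 50,000, Xte: 10,000 x 32 x 32 x 3, Yte: 10,000
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/')

# flatten each image into a single row of 32*32*3 = 3072 numbers
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)  # 50,000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)  # 10,000 x 3072
```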

```python
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is the 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :]  # take the first 1000 images for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :]   # keep the remaining 49,000 images for training
Ytr = Ytr[1000:]

# find the hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:

    # use a particular value of k and evaluate on the validation data
    nn = NearestNeighbor()
    nn.train(Xtr_rows, Ytr)
    # here we assume a modified NearestNeighbor class whose predict takes k
    Yval_predict = nn.predict(Xval_rows, k=k)
    acc = np.mean(Yval_predict == Yval)
    print('accuracy: %f' % (acc,))  # print() for Python 3

    # keep track of what works on the validation set
    validation_accuracies.append((k, acc))  # append (k, accuracy) tuples
```
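Following the "run on the test set only once" rule from the summary below, one would then pick the best k from validation_accuracies and evaluate it a single time on the test data. A sketch:

```python
# pick the k with the highest validation accuracy
best_k, best_acc = max(validation_accuracies, key=lambda pair: pair[1])

# evaluate on the test set exactly once, with the chosen hyperparameter
nn = NearestNeighbor()
nn.train(Xtr_rows, Ytr)
Yte_predict = nn.predict(Xte_rows, k=best_k)
print('test accuracy: %f' % np.mean(Yte_predict == Yte))
```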

K-Nearest Neighbors: Summary
● In image classification we start with a training set of images and labels, and must predict labels on the test set
● The K-Nearest Neighbors classifier predicts labels based on the K nearest training examples
● Distance metric and K are hyperparameters
● Choose hyperparameters using the validation set
● Only run on the test set once, at the very end!

Summary:

● We introduced the image classification problem: given a set of images labeled with class labels, the algorithm must predict labels for unseen images, and is evaluated by its prediction accuracy.
● We introduced a simple classifier, the Nearest Neighbor classifier. It has several hyperparameters (such as the value of k or the choice of distance metric), and choosing them well is not trivial.
● The correct way to set hyperparameters is to split the original training set into a training set and a validation set, try different hyperparameter values on the validation set, and keep the one that performs best.
● If training data is scarce, use cross-validation, which reduces noise when selecting the best hyperparameters (see the sketch after this list).
● Once the best hyperparameters are found, run the algorithm on the test set once and only once, and report that result as the algorithm's performance.
● The Nearest Neighbor classifier reaches roughly 40% accuracy on CIFAR-10. It is simple to implement, but it must store all the training data and is computationally expensive at test time.
● Finally, comparing raw pixels with L1 or L2 distances is inadequate: images end up grouped more by background and color than by their semantic content.
● In later lectures we will address these problems and eventually reach solutions above 90% accuracy, which can discard the training set once learning is complete and classify an image in under a millisecond.
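A minimal sketch of that cross-validation loop, assuming the k-aware NearestNeighbor from earlier (the fold count and k values are illustrative):

```python
num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

for k in [1, 3, 5, 10, 20]:
    accuracies = []
    for fold in range(num_folds):
        # use one fold for validation, the rest for training
        X_val, y_val = X_folds[fold], y_folds[fold]
        X_train = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
        y_train = np.concatenate(y_folds[:fold] + y_folds[fold + 1:])
        nn = NearestNeighbor()
        nn.train(X_train, y_train)
        accuracies.append(np.mean(nn.predict(X_val, k=k) == y_val))
    # averaging over folds gives a less noisy estimate of this k's quality
    print('k = %d, mean accuracy = %f' % (k, np.mean(accuracies)))
```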

Linear Classifier

f(x,W) = Wx + b
Given a flattened input image x, compute one score per class from the parameters W and b; the class with the highest score is the prediction. W is a weight matrix with one row per class, and b is a bias vector.
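A minimal numpy sketch of the score computation (shapes assume CIFAR-10: 10 classes, 3072-dimensional flattened images; the random W and b are placeholders, not trained values):

```python
import numpy as np

num_classes, D = 10, 32 * 32 * 3
W = np.random.randn(num_classes, D) * 0.01  # weight matrix, one row per class
b = np.zeros(num_classes)                   # bias vector

x = np.random.randint(0, 256, size=D).astype(np.float64)  # a flattened image

scores = W.dot(x) + b            # f(x, W) = Wx + b, one score per class
predicted_class = np.argmax(scores)
```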
Three ways to view a linear classifier:
1. Algebraic viewpoint: f(x, W) = Wx + b as a matrix-vector product.
2. Visual viewpoint: reshape the learned parameters W and b back into images; each class's row of W acts as a template matched against the input.
3. Geometric viewpoint: each class score is a linear function over pixel space, so each class is separated from the others by a hyperplane.

Reference:
https://iphysresearch.github.io/blog/tag/cs231n/

