lecture1-cs231n

overview



Image Classification
A Core Task in Computer Vision
Today:
● The image classification task

● Two basic data-driven approaches to image classification
○ K-nearest neighbor and linear classifier

An image is a tensor of integers in [0, 255], e.g. 800 × 600 × 3 (3 RGB channels).

Writing an algorithm that recognizes a class of objects directly by hand is nearly impossible. Complicating factors include:
● Illumination: lighting changes
● Background clutter: the object blends into its surroundings
● Occlusion: only part of the object is visible
● Deformation: the object can change shape
● ...
There is no obvious way to hard-code an algorithm for recognizing a cat, or any other class.

Machine Learning: Data-Driven Approach

  1. Collect a dataset of images and labels
  2. Use Machine Learning algorithms to train a classifier
  3. Evaluate the classifier on new images

KNN: K-Nearest Neighbor Classifier

KNN simply memorizes the entire training set. At test time it finds the K training images most similar to the test image by distance (i.e. the K images with the smallest pixel-wise difference), then predicts the label by majority vote among them.

```python
import numpy as np

class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        """ X is N x D where each row is an example. y is 1-dimensional of size N """
        # "training" here just means memorizing all the training data
        self.Xtr = X
        self.ytr = y

    def predict(self, X):
        """ X is N x D where each row is an example we wish to predict the label for """
        # prediction scans every training image, computes its distance to the
        # test image, and returns the label of the closest one
        num_test = X.shape[0]
        # make sure the output type matches the input type
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

        # loop over all test rows
        for i in range(num_test):  # the original used xrange, which is Python 2 only
            # find the nearest training image to the i-th test image using the
            # L1 distance (sum of absolute value differences): subtract the test
            # image from every training image elementwise, then sum per image
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)  # one distance per training image
            min_index = np.argmin(distances)  # get the index with the smallest distance
            Ypred[i] = self.ytr[min_index]    # predict the label of the nearest example

        return Ypred
```
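The class above only implements the k = 1 special case, while the validation loop later assumes a `predict` that accepts `k`. A minimal sketch of such a method, with the voting logic being my own illustration rather than code from the lecture:

```python
    # inside NearestNeighbor, replacing the predict above:
    def predict(self, X, k=1):
        """ k-NN variant: majority vote among the k nearest training examples """
        num_test = X.shape[0]
        Ypred = np.zeros(num_test, dtype=self.ytr.dtype)
        for i in range(num_test):
            distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
            # indices of the k smallest distances (argpartition avoids a full sort)
            nearest = np.argpartition(distances, k)[:k]
            labels = self.ytr[nearest]
            # majority vote: pick the most common label among the k neighbors
            values, counts = np.unique(labels, return_counts=True)
            Ypred[i] = values[np.argmax(counts)]
        return Ypred
```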

Two distance metrics:

L1 (Manhattan) distance: $d_1(I_1, I_2) = \sum_p |I_1^p - I_2^p|$

L2 (Euclidean) distance: $d_2(I_1, I_2) = \sqrt{\sum_p (I_1^p - I_2^p)^2}$
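As a quick numpy illustration (my own sketch, not from the notes), both distances between two flattened images:

```python
import numpy as np

I1 = np.random.randint(0, 256, size=3072).astype(np.float64)  # a flattened 32x32x3 image
I2 = np.random.randint(0, 256, size=3072).astype(np.float64)

d1 = np.sum(np.abs(I1 - I2))          # L1 (Manhattan) distance
d2 = np.sqrt(np.sum((I1 - I2) ** 2))  # L2 (Euclidean) distance
print(d1, d2)
```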
Hyperparameters
Hyperparameters must be set before learning begins. Two choices to make:
● What is the best value of k to use?
● What is the best distance to use?
These are hyperparameters: choices about the algorithms themselves. They are very problem/dataset-dependent; you must try them all out and see what works best.
The k-NN classifier requires a setting of k, so which value works best? Likewise we can choose among different distance functions, such as the L1 and L2 norms, and there are other choices we may not even have considered (e.g. dot products). All such choices are called hyperparameters. They come up very frequently in the design of machine learning algorithms that learn from data, and it is usually not at all obvious what values they should take.

Training, validation, and test sets
(Figure: split the labeled data into training, validation, and test sets.)
Example dataset: CIFAR-10
● 10 classes
● 50,000 training images
● 10,000 testing images
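The code below assumes flattened arrays Xtr_rows, Ytr, Xte_rows, Yte already exist. A minimal sketch of producing them, assuming a load_CIFAR10 helper like the one in the official course notes (the data path is a placeholder):

```python
# hypothetical helper that returns CIFAR-10 as numpy arrays:
# Xtr: 50,000 x 32 x 32 x 3, Ytr: 50,000, Xte: 10,000 x 32 x 32 x 3, Yte: 10,000
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/')

# flatten each image into a single row of 32*32*3 = 3072 numbers
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)  # 50,000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)  # 10,000 x 3072
```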

```python
# assume we have Xtr_rows, Ytr, Xte_rows, Yte as before
# recall Xtr_rows is the 50,000 x 3072 matrix
Xval_rows = Xtr_rows[:1000, :]  # take the first 1000 images for validation
Yval = Ytr[:1000]
Xtr_rows = Xtr_rows[1000:, :]   # keep the remaining 49,000 images for training
Ytr = Ytr[1000:]

# find the hyperparameters that work best on the validation set
validation_accuracies = []
for k in [1, 3, 5, 10, 20, 50, 100]:

    # use a particular value of k and evaluate on the validation data
    nn = NearestNeighbor()
    nn.train(Xtr_rows, Ytr)
    # here we assume a modified NearestNeighbor class whose predict takes k
    Yval_predict = nn.predict(Xval_rows, k=k)
    acc = np.mean(Yval_predict == Yval)
    print('accuracy: %f' % (acc,))  # print() for Python 3

    # keep track of what works on the validation set
    validation_accuracies.append((k, acc))  # append (k, accuracy) tuples
```
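Following the "run on the test set only once" rule from the summary below, one would then pick the best k from validation_accuracies and evaluate it a single time on the test data. A sketch:

```python
# pick the k with the highest validation accuracy
best_k, best_acc = max(validation_accuracies, key=lambda pair: pair[1])

# evaluate on the test set exactly once, with the chosen hyperparameter
nn = NearestNeighbor()
nn.train(Xtr_rows, Ytr)
Yte_predict = nn.predict(Xte_rows, k=best_k)
print('test accuracy: %f' % np.mean(Yte_predict == Yte))
```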

K-Nearest Neighbors: Summary
● In image classification we start with a training set of images and labels, and must predict labels on the test set
● The K-Nearest Neighbors classifier predicts labels based on the K nearest training examples
● Distance metric and K are hyperparameters
● Choose hyperparameters using the validation set
● Only run on the test set once, at the very end!

Summary:

● We introduced the image classification problem: given a set of images labeled with class labels, the algorithm must predict labels for unseen images, and is evaluated by its prediction accuracy.
● We introduced a simple classifier, the Nearest Neighbor classifier. It has several hyperparameters (such as the value of k or the choice of distance metric), and choosing them well is not trivial.
● The correct way to set hyperparameters is to split the original training set into a training set and a validation set, try different hyperparameter values on the validation set, and keep the one that performs best.
● If training data is scarce, use cross-validation, which reduces noise when selecting the best hyperparameters (see the sketch after this list).
● Once the best hyperparameters are found, run the algorithm on the test set once and only once, and report that result as the algorithm's performance.
● The Nearest Neighbor classifier reaches roughly 40% accuracy on CIFAR-10. It is simple to implement, but it must store all the training data and is computationally expensive at test time.
● Finally, comparing raw pixels with L1 or L2 distances is inadequate: images end up grouped more by background and color than by their semantic content.
● In later lectures we will address these problems and eventually reach solutions above 90% accuracy, which can discard the training set once learning is complete and classify an image in under a millisecond.
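A minimal sketch of that cross-validation loop, assuming the k-aware NearestNeighbor from earlier (the fold count and k values are illustrative):

```python
num_folds = 5
X_folds = np.array_split(Xtr_rows, num_folds)
y_folds = np.array_split(Ytr, num_folds)

for k in [1, 3, 5, 10, 20]:
    accuracies = []
    for fold in range(num_folds):
        # use one fold for validation, the rest for training
        X_val, y_val = X_folds[fold], y_folds[fold]
        X_train = np.concatenate(X_folds[:fold] + X_folds[fold + 1:])
        y_train = np.concatenate(y_folds[:fold] + y_folds[fold + 1:])
        nn = NearestNeighbor()
        nn.train(X_train, y_train)
        accuracies.append(np.mean(nn.predict(X_val, k=k) == y_val))
    # averaging over folds gives a less noisy estimate of this k's quality
    print('k = %d, mean accuracy = %f' % (k, np.mean(accuracies)))
```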

Linear Classifier

f(x,W) = Wx + b
Given a flattened input image x, compute one score per class from the parameters W and b; the class with the highest score is the prediction. W is a weight matrix with one row per class, and b is a bias vector.
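A minimal numpy sketch of the score computation (shapes assume CIFAR-10: 10 classes, 3072-dimensional flattened images; the random W and b are placeholders, not trained values):

```python
import numpy as np

num_classes, D = 10, 32 * 32 * 3
W = np.random.randn(num_classes, D) * 0.01  # weight matrix, one row per class
b = np.zeros(num_classes)                   # bias vector

x = np.random.randint(0, 256, size=D).astype(np.float64)  # a flattened image

scores = W.dot(x) + b            # f(x, W) = Wx + b, one score per class
predicted_class = np.argmax(scores)
```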
Three ways to view a linear classifier:
1. Algebraic viewpoint: f(x, W) = Wx + b as a matrix-vector product.
2. Visual viewpoint: reshape the learned parameters W and b back into images; each class's row of W acts as a template matched against the input.
3. Geometric viewpoint: each class score is a linear function over pixel space, so each class is separated from the others by a hyperplane.

Reference:
https://iphysresearch.github.io/blog/tag/cs231n/

