MXNet Tutorial: Handwritten Digit Recognition – MNIST (Translation / Commentary)
Translation: ClassCat Sales Information Co., Ltd.
Date: 02/21/2017

* This page is a translation of the Handwritten Digit Recognition Tutorial from the official MXNet site, with supplementary explanations added where appropriate:
http://mxnet.io/tutorials/python/mnist.html

Handwritten Digit Classification – A simple example that classifies handwritten digits from the MNIST dataset using an MLP and a convolutional network.

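* The sample code has been verified to run, but it has been extended and modified where appropriate.
* Most of the images were created by the translator, but some are quoted from a fork of the GitHub repository.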

Introduction

This tutorial walks through a classic computer vision application: identifying handwritten digits with a neural network.

Loading the Data

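First we fetch the MNIST dataset, which is commonly used for handwritten digit recognition. Each image in this dataset has been resized to 28×28 pixels, with grayscale values between 0 and 255. The following code downloads the images and their corresponding labels and loads them into numpy arrays.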

import gzip
import os
import struct

import numpy as np

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2


def download_data(url, force_download=True):
    # Download `url` into the current directory and return the local file name.
    fname = url.split("/")[-1]
    if force_download or not os.path.exists(fname):
        urlretrieve(url, fname)
    return fname


def read_data(label_url, image_url):
    # Labels: 8-byte header (magic, count), then one byte per label.
    with gzip.open(download_data(label_url)) as flbl:
        magic, num = struct.unpack(">II", flbl.read(8))
        label = np.frombuffer(flbl.read(), dtype=np.int8)
    # Images: 16-byte header (magic, count, rows, cols), then one byte per pixel.
    with gzip.open(download_data(image_url), 'rb') as fimg:
        magic, num, rows, cols = struct.unpack(">IIII", fimg.read(16))
        image = np.frombuffer(fimg.read(), dtype=np.uint8).reshape(len(label), rows, cols)
    return (label, image)

path='http://yann.lecun.com/exdb/mnist/'
(train_lbl, train_img) = read_data(
    path+'train-labels-idx1-ubyte.gz', path+'train-images-idx3-ubyte.gz')
(val_lbl, val_img) = read_data(
    path+'t10k-labels-idx1-ubyte.gz', path+'t10k-images-idx3-ubyte.gz')
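The files read above use the IDX layout: a big-endian header (a magic number followed by the dimension sizes) and then raw unsigned bytes. As a minimal offline sketch of that layout, the snippet below builds a tiny fake image file in memory and parses it the same way read_data does. The helper names make_idx_images and parse_idx_images are hypothetical, and the magic value 2051 is the one used by the MNIST image files.

```python
import gzip
import io
import struct

import numpy as np


def make_idx_images(images):
    """Serialize a uint8 array of shape (num, rows, cols) as a gzipped IDX3 blob."""
    num, rows, cols = images.shape
    header = struct.pack(">IIII", 2051, num, rows, cols)  # 2051 = magic for image files
    return gzip.compress(header + images.astype(np.uint8).tobytes())


def parse_idx_images(blob):
    """Parse a gzipped IDX3 blob back into a (num, rows, cols) uint8 array."""
    with gzip.open(io.BytesIO(blob), "rb") as f:
        magic, num, rows, cols = struct.unpack(">IIII", f.read(16))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return magic, data.reshape(num, rows, cols)


# Two fake 3x3 "images" stand in for the real 28x28 digits.
fake = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)
magic, parsed = parse_idx_images(make_idx_images(fake))
print(magic, parsed.shape)
```

Because the round trip needs no network access, it is a convenient way to check the header handling before pointing the real loader at the download URLs.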

Plot the first 10 images and print their labels.

%matplotlib inline
import matplotlib.pyplot as plt
for i in range(10):
    plt.subplot(1,10,i+1)
    plt.imshow(train_img[i], cmap='Greys_r')
    plt.axis('off')
plt.show()
print('label: %s' % (train_lbl[0:10],))

label: [5 0 4 1 9 2 1 3 1 4]

Next we create data iterators for MXNet. A data iterator is similar to a Python iterator: each call to next() returns one batch of data. A batch contains several images together with their corresponding labels. The images are stored in a 4-D array of shape $(batch\_size, num\_channels, width, height)$. For the MNIST dataset there is only one color channel, and both width and height are 28. In addition, the training images are usually shuffled, which tends to speed up training.

import mxnet as mx

def to4d(img):
    return img.reshape(img.shape[0], 1, 28, 28).astype(np.float32)/255

batch_size = 100
train_iter = mx.io.NDArrayIter(to4d(train_img), train_lbl, batch_size, shuffle=True)
val_iter = mx.io.NDArrayIter(to4d(val_img), val_lbl, batch_size)
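To make the idea of a data iterator concrete, here is a minimal NumPy-only sketch of shuffled mini-batching. The function iterate_batches and the stand-in arrays are hypothetical; this is not how NDArrayIter is implemented, and it does not try to reproduce how NDArrayIter treats a final partial batch.

```python
import numpy as np


def iterate_batches(data, labels, batch_size, shuffle=False, seed=0):
    """Yield (data_batch, label_batch) pairs, optionally in shuffled order."""
    order = np.arange(len(data))
    if shuffle:
        np.random.RandomState(seed).shuffle(order)
    for start in range(0, len(data), batch_size):
        idx = order[start:start + batch_size]
        yield data[idx], labels[idx]


# Stand-in data with the MNIST layout: 250 single-channel 28x28 images.
images = np.zeros((250, 1, 28, 28), dtype=np.float32)
labels = np.arange(250) % 10

batches = list(iterate_batches(images, labels, batch_size=100, shuffle=True))
print(len(batches), batches[0][0].shape, batches[-1][0].shape)
```

Each batch keeps images and labels aligned because both are indexed with the same shuffled positions.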

Multilayer Perceptron

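A multilayer perceptron consists of several fully connected layers. Given an $n \times m$ input matrix $X$, a fully connected layer outputs a matrix $Y$ of size $n \times k$, where $k$ is often called the hidden size. The layer has two parameters: an $m \times k$ weight matrix $W$ and a $1 \times k$ bias vector $b$, which is added to every row. The output is computed as

$$Y = X W + b$$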
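Shape bookkeeping is an easy way to check this layer: with one sample per row, an $n \times m$ input multiplied by an $m \times k$ weight matrix gives the $n \times k$ output, and a length-$k$ bias is broadcast over the rows. The following is a minimal NumPy sketch; the sizes and random values are illustrative assumptions, not values taken from MXNet.

```python
import numpy as np

n, m, k = 4, 784, 128            # batch size, input features, hidden size
rng = np.random.RandomState(0)
X = rng.rand(n, m)               # one flattened 28x28 image per row
W = rng.randn(m, k) * 0.01       # weight matrix
b = np.zeros(k)                  # one bias per hidden unit

Y = X.dot(W) + b                 # b is broadcast across the n rows
print(Y.shape)
```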

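The output of a fully connected layer is often fed into an activation layer, which applies an element-wise operation. Two common choices are the sigmoid function and the rectifier (or "relu") function, which outputs the maximum of 0 and its input.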
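Both activations are one-liners when written element-wise. This is a small NumPy sketch for illustration only; the function names are chosen here and are not MXNet APIs.

```python
import numpy as np


def sigmoid(x):
    """Squash each element into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    """Keep positive elements and replace negative ones with 0."""
    return np.maximum(0.0, x)


x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
print(sigmoid(x))
```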

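The last fully connected layer usually has a hidden size equal to the number of classes in the dataset. A softmax layer is then stacked on top, mapping its input to probability scores. Assume again that the input $X$ has size $n \times m$; row $i$ is then mapped to: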

$$ \left[\frac{\exp(x_{i1})}{\sum_{j=1}^m \exp(x_{ij})},\ldots, \frac{\exp(x_{im})}{\sum_{j=1}^m \exp(x_{ij})}\right] $$
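The softmax above can be sketched row-wise in NumPy as follows. Subtracting each row's maximum before exponentiating is a common numerical-stability trick that is not part of the formula itself; it leaves the result unchanged because the factor cancels between numerator and denominator.

```python
import numpy as np


def softmax(X):
    """Apply softmax independently to each row of a 2-D array."""
    shifted = X - X.max(axis=1, keepdims=True)   # avoids overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(scores)
print(probs.sum(axis=1))
```

Each row of the result is non-negative and sums to 1, so it can be read as a probability distribution over the classes.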

Defining a multilayer perceptron in MXNet is straightforward, as shown below:

# Create a place holder variable for the input data
data = mx.sym.Variable('data')
# Flatten the data from 4-D shape (batch_size, num_channel, width, height) 
# into 2-D (batch_size, num_channel*width*height)
data = mx.sym.Flatten(data=data)

# The first fully-connected layer
fc1  = mx.sym.FullyConnected(data=data, name='fc1', num_hidden=128)
# Apply relu to the output of the first fully-connected layer
act1 = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")

# The second fully-connected layer and the corresponding activation function
fc2  = mx.sym.FullyConnected(data=act1, name='fc2', num_hidden = 64)
act2 = mx.sym.Activation(data=fc2, name='relu2', act_type="relu")

# The third fully-connected layer; note that the hidden size should be 10, which is the number of unique digits
fc3  = mx.sym.FullyConnected(data=act2, name='fc3', num_hidden=10)
# The softmax and loss layer
mlp  = mx.sym.SoftmaxOutput(data=fc3, name='softmax')

# We visualize the network structure with output size (the batch_size is ignored.)
shape = {"data" : (batch_size, 1, 28, 28)}
mx.viz.plot_network(symbol=mlp, shape=shape)
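As a quick sanity check on the size of this network, the number of trainable parameters can be worked out from the layer sizes alone: each fully connected layer contributes inputs × outputs weights plus one bias per output. The helper fc_params below is a hypothetical name used only for this arithmetic.

```python
def fc_params(num_inputs, num_hidden):
    """Weights plus biases of one fully connected layer."""
    return num_inputs * num_hidden + num_hidden


# Flattened 1x28x28 input, then the three num_hidden values used above.
layer_sizes = [1 * 28 * 28, 128, 64, 10]
counts = [fc_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(counts, sum(counts))  # [100480, 8256, 650] 109386
```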

The network definition and the data iterators are now ready, so we can start training.

import logging
logging.getLogger().setLevel(logging.DEBUG)

model = mx.model.FeedForward(
    symbol = mlp,       # network structure
    num_epoch = 10,     # number of data passes for training 
    learning_rate = 0.1 # learning rate of SGD 
)
model.fit(
    X=train_iter,       # training data
    eval_data=val_iter, # validation data
    batch_end_callback = mx.callback.Speedometer(batch_size, 200) # output progress for each 200 data batches
)

INFO:root:Start training with [cpu(0)]
INFO:root:Epoch[0] Batch [200]	Speed: 19178.20 samples/sec	Train-accuracy=0.112900
INFO:root:Epoch[0] Batch [400]	Speed: 24869.00 samples/sec	Train-accuracy=0.110750
INFO:root:Epoch[0] Batch [600]	Speed: 25210.35 samples/sec	Train-accuracy=0.140050
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=2.726
INFO:root:Epoch[0] Validation-accuracy=0.258200
INFO:root:Epoch[1] Batch [200]	Speed: 25492.90 samples/sec	Train-accuracy=0.421850
INFO:root:Epoch[1] Batch [400]	Speed: 25741.42 samples/sec	Train-accuracy=0.745950
INFO:root:Epoch[1] Batch [600]	Speed: 25685.61 samples/sec	Train-accuracy=0.835700
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=2.355
INFO:root:Epoch[1] Validation-accuracy=0.848900
INFO:root:Epoch[2] Batch [200]	Speed: 26176.50 samples/sec	Train-accuracy=0.855650
INFO:root:Epoch[2] Batch [400]	Speed: 25339.32 samples/sec	Train-accuracy=0.885600
INFO:root:Epoch[2] Batch [600]	Speed: 25406.15 samples/sec	Train-accuracy=0.904950
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=2.368
INFO:root:Epoch[2] Validation-accuracy=0.911900
INFO:root:Epoch[3] Batch [200]	Speed: 26090.96 samples/sec	Train-accuracy=0.914800
INFO:root:Epoch[3] Batch [400]	Speed: 23369.18 samples/sec	Train-accuracy=0.927450
INFO:root:Epoch[3] Batch [600]	Speed: 24945.86 samples/sec	Train-accuracy=0.936700
INFO:root:Epoch[3] Resetting Data Iterator
INFO:root:Epoch[3] Time cost=2.455
INFO:root:Epoch[3] Validation-accuracy=0.935800
INFO:root:Epoch[4] Batch [200]	Speed: 25774.72 samples/sec	Train-accuracy=0.940500
INFO:root:Epoch[4] Batch [400]	Speed: 25342.44 samples/sec	Train-accuracy=0.946850
INFO:root:Epoch[4] Batch [600]	Speed: 17439.10 samples/sec	Train-accuracy=0.950700
INFO:root:Epoch[4] Resetting Data Iterator
INFO:root:Epoch[4] Time cost=2.738
INFO:root:Epoch[4] Validation-accuracy=0.948900
INFO:root:Epoch[5] Batch [200]	Speed: 25016.45 samples/sec	Train-accuracy=0.952700
INFO:root:Epoch[5] Batch [400]	Speed: 25466.91 samples/sec	Train-accuracy=0.957000
INFO:root:Epoch[5] Batch [600]	Speed: 24626.90 samples/sec	Train-accuracy=0.959550
INFO:root:Epoch[5] Resetting Data Iterator
INFO:root:Epoch[5] Time cost=2.425
INFO:root:Epoch[5] Validation-accuracy=0.957700
INFO:root:Epoch[6] Batch [200]	Speed: 25067.18 samples/sec	Train-accuracy=0.960950
INFO:root:Epoch[6] Batch [400]	Speed: 24772.56 samples/sec	Train-accuracy=0.963550
INFO:root:Epoch[6] Batch [600]	Speed: 15472.62 samples/sec	Train-accuracy=0.965000
INFO:root:Epoch[6] Resetting Data Iterator
INFO:root:Epoch[6] Time cost=2.911
INFO:root:Epoch[6] Validation-accuracy=0.964000
INFO:root:Epoch[7] Batch [200]	Speed: 25828.42 samples/sec	Train-accuracy=0.966100
INFO:root:Epoch[7] Batch [400]	Speed: 25189.55 samples/sec	Train-accuracy=0.968550
INFO:root:Epoch[7] Batch [600]	Speed: 25013.47 samples/sec	Train-accuracy=0.969950
INFO:root:Epoch[7] Resetting Data Iterator
INFO:root:Epoch[7] Time cost=2.399
INFO:root:Epoch[7] Validation-accuracy=0.965300
INFO:root:Epoch[8] Batch [200]	Speed: 26268.23 samples/sec	Train-accuracy=0.971450
INFO:root:Epoch[8] Batch [400]	Speed: 19536.00 samples/sec	Train-accuracy=0.972100
INFO:root:Epoch[8] Batch [600]	Speed: 12375.30 samples/sec	Train-accuracy=0.974000
INFO:root:Epoch[8] Resetting Data Iterator
INFO:root:Epoch[8] Time cost=3.439
INFO:root:Epoch[8] Validation-accuracy=0.967000
INFO:root:Epoch[9] Batch [200]	Speed: 25806.82 samples/sec	Train-accuracy=0.974350
INFO:root:Epoch[9] Batch [400]	Speed: 24996.94 samples/sec	Train-accuracy=0.975600
INFO:root:Epoch[9] Batch [600]	Speed: 24979.98 samples/sec	Train-accuracy=0.977250
INFO:root:Epoch[9] Resetting Data Iterator
INFO:root:Epoch[9] Time cost=2.408
INFO:root:Epoch[9] Validation-accuracy=0.969700
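The structure of this log follows from a little arithmetic. The sketch below assumes the standard MNIST training set size of 60,000 images and uses a rounded throughput of 25,000 samples/sec read off the log; the variable names are illustrative.

```python
num_train = 60000      # size of the MNIST training set
batch_size = 100       # as set earlier in this tutorial
log_every = 200        # second argument passed to Speedometer

batches_per_epoch = num_train // batch_size
log_points = list(range(log_every, batches_per_epoch + 1, log_every))
approx_seconds = num_train / 25000.0   # rounded speed taken from the log

print(batches_per_epoch, log_points, approx_seconds)  # 600 [200, 400, 600] 2.4
```

This matches the three progress lines per epoch (batches 200, 400 and 600) and the reported time cost of roughly 2.4 seconds per epoch.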

Once training has finished, we can predict a single image.

plt.imshow(val_img[0], cmap='Greys_r')
plt.axis('off')
plt.show()
prob = model.predict(val_img[0:1].astype(np.float32)/255)[0]
assert max(prob) > 0.99, "Low prediction accuracy."
print('Classified as %d with probability %f' % (prob.argmax(), max(prob)))

Classified as 7 with probability 0.999889

We can also evaluate the accuracy on a given data iterator.

valid_acc = model.score(val_iter)
print('Validation accuracy: %f%%' % (valid_acc * 100,))
assert valid_acc > 0.95, "Low validation accuracy."

Validation accuracy: 96.960000%
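Conceptually, classification accuracy is the fraction of samples whose highest-probability class equals the true label. The following NumPy sketch illustrates that definition on made-up numbers; the function accuracy is hypothetical and is not meant to describe the internals of model.score.

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of rows whose argmax matches the corresponding label."""
    return float((probs.argmax(axis=1) == labels).mean())


# Four made-up predictions over two classes.
probs = np.array([[0.1, 0.9],
                  [0.8, 0.2],
                  [0.3, 0.7],
                  [0.6, 0.4]])
labels = np.array([1, 0, 0, 0])
print(accuracy(probs, labels))  # 0.75
```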

Furthermore, the following code can be used to recognize a digit drawn in a box on a web page.

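import base64

from IPython.display import HTML
import cv2
import numpy as np

def classify(img):
    # `img` is a data URL string: strip the header, then decode the base64 PNG payload.
    img = base64.b64decode(img[len('data:image/png;base64,'):])
    img = cv2.imdecode(np.frombuffer(img, np.uint8), -1)
    # Use the alpha channel as the stroke mask and shrink it to the 28x28 input size.
    img = cv2.resize(img[:,:,3], (28,28))
    img = img.astype(np.float32).reshape((1,1,28,28))/255.0
    return model.predict(img)[0].argmax()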

HTML(filename="mnist_demo.html")
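The first step of classify is to turn the data URL sent by the browser into raw PNG bytes. That step can be sketched with the standard library alone; the helper data_url_to_bytes is a hypothetical name, and the test payload below is only the 8-byte PNG signature rather than a real drawing.

```python
import base64

PREFIX = "data:image/png;base64,"


def data_url_to_bytes(data_url):
    """Strip the data-URL header and decode the base64 payload."""
    if not data_url.startswith(PREFIX):
        raise ValueError("not a base64 PNG data URL")
    return base64.b64decode(data_url[len(PREFIX):])


# Build a tiny data URL from the PNG file signature and decode it again.
png_signature = b"\x89PNG\r\n\x1a\n"
url = PREFIX + base64.b64encode(png_signature).decode("ascii")
raw = data_url_to_bytes(url)
print(raw == png_signature)  # True
```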




Result:

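(Translator's note: The following is a snapshot of the demo actually embedded in a page and recognizing a digit handwritten by the translator.)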

Convolutional Neural Networks

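Note that the fully connected layers above simply reshape each image into a vector during training. This ignores the spatial information that pixels are correlated along both the horizontal and vertical dimensions. A convolutional layer aims to address this drawback by using a more structured weight $W$. Instead of a plain matrix-matrix multiplication, it uses a 2-D convolution to obtain the output.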
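A 2-D convolution slides a small kernel over the image and takes a weighted sum at each position. The sketch below is a deliberately naive NumPy version with no padding and stride 1. Like most deep learning libraries it does not flip the kernel (strictly speaking a cross-correlation), and the function name conv2d_valid is chosen here for illustration.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(28 * 28, dtype=np.float64).reshape(28, 28)
kernel = np.ones((5, 5)) / 25.0          # a simple 5x5 averaging kernel
print(conv2d_valid(image, kernel).shape)  # (24, 24)
```

The same 25 weights are reused at every position, which is what makes the weight "more structured" than a dense matrix over all 784 pixels.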

A layer can also have multiple feature maps, each with its own weight matrix, to capture different features.

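Besides the convolutional layer, the other major change in a convolutional neural network is the addition of pooling layers. A pooling layer reduces an $n \times m$ image patch (this size is often called the kernel size) to a single value, making the network less sensitive to spatial location.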
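Max pooling, the variant used in the network below, keeps only the largest value in each patch. Here is a minimal NumPy sketch under the assumption of a square window and no padding; max_pool2d is a hypothetical helper, not an MXNet function.

```python
import numpy as np


def max_pool2d(image, size=2, stride=2):
    """Reduce each size x size patch of a 2-D array to its maximum."""
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = image[r:r + size, c:c + size].max()
    return out


x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 10, 13, 14],
              [11, 12, 15, 16]])
print(max_pool2d(x))  # [[ 4  8] [12 16]]
```

Moving a feature by one pixel inside a patch leaves the pooled value unchanged, which is the sense in which the network becomes less sensitive to spatial location.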

data = mx.symbol.Variable('data')
# first conv layer
conv1 = mx.sym.Convolution(data=data, kernel=(5,5), num_filter=20)
tanh1 = mx.sym.Activation(data=conv1, act_type="tanh")
pool1 = mx.sym.Pooling(data=tanh1, pool_type="max", kernel=(2,2), stride=(2,2))
# second conv layer
conv2 = mx.sym.Convolution(data=pool1, kernel=(5,5), num_filter=50)
tanh2 = mx.sym.Activation(data=conv2, act_type="tanh")
pool2 = mx.sym.Pooling(data=tanh2, pool_type="max", kernel=(2,2), stride=(2,2))
# first fullc layer
flatten = mx.sym.Flatten(data=pool2)
fc1 = mx.symbol.FullyConnected(data=flatten, num_hidden=500)
tanh3 = mx.sym.Activation(data=fc1, act_type="tanh")
# second fullc
fc2 = mx.sym.FullyConnected(data=tanh3, num_hidden=10)
# softmax loss
lenet = mx.sym.SoftmaxOutput(data=fc2, name='softmax')
mx.viz.plot_network(symbol=lenet, shape=shape)
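The spatial sizes flowing through this network can be traced by hand. The sketch below assumes what the code above implies: convolutions without padding and with stride 1, and 2×2 pooling with stride 2. The helper names are illustrative.

```python
def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, kernel, stride):
    """Output width of an unpadded pooling layer."""
    return (size - kernel) // stride + 1


size = 28
size = conv_out(size, 5)       # conv1: 28 -> 24
size = pool_out(size, 2, 2)    # pool1: 24 -> 12
size = conv_out(size, 5)       # conv2: 12 -> 8
size = pool_out(size, 2, 2)    # pool2: 8 -> 4
flat = 50 * size * size        # 50 feature maps from conv2, flattened
print(size, flat)              # 4 800
```

Under these assumptions the Flatten layer hands an 800-dimensional vector per image to the first fully connected layer.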

Because LeNet is more complex than the previous multilayer perceptron, we use the GPU instead of the CPU for training.

model = mx.model.FeedForward(
    ctx = mx.gpu(0),     # use GPU 0 for training, others are same as before
    symbol = lenet,       
    num_epoch = 10,     
    learning_rate = 0.1)
model.fit(
    X=train_iter,  
    eval_data=val_iter, 
    batch_end_callback = mx.callback.Speedometer(batch_size, 200)
) 
assert model.score(val_iter) > 0.98, "Low validation accuracy."

(Translator's note: The log below was obtained on a CPU.)

INFO:root:Start training with [cpu(0)]
INFO:root:Epoch[0] Batch [200]	Speed: 718.21 samples/sec	Train-accuracy=0.112550
INFO:root:Epoch[0] Batch [400]	Speed: 720.94 samples/sec	Train-accuracy=0.110750
INFO:root:Epoch[0] Batch [600]	Speed: 732.30 samples/sec	Train-accuracy=0.112050
INFO:root:Epoch[0] Resetting Data Iterator
INFO:root:Epoch[0] Time cost=82.999
INFO:root:Epoch[0] Validation-accuracy=0.113500
INFO:root:Epoch[1] Batch [200]	Speed: 664.57 samples/sec	Train-accuracy=0.408550
INFO:root:Epoch[1] Batch [400]	Speed: 651.25 samples/sec	Train-accuracy=0.889900
INFO:root:Epoch[1] Batch [600]	Speed: 730.05 samples/sec	Train-accuracy=0.935100
INFO:root:Epoch[1] Resetting Data Iterator
INFO:root:Epoch[1] Time cost=88.266
INFO:root:Epoch[1] Validation-accuracy=0.947700
INFO:root:Epoch[2] Batch [200]	Speed: 714.93 samples/sec	Train-accuracy=0.949900
INFO:root:Epoch[2] Batch [400]	Speed: 652.99 samples/sec	Train-accuracy=0.965900
INFO:root:Epoch[2] Batch [600]	Speed: 733.28 samples/sec	Train-accuracy=0.970400
INFO:root:Epoch[2] Resetting Data Iterator
INFO:root:Epoch[2] Time cost=85.936
INFO:root:Epoch[2] Validation-accuracy=0.974400
INFO:root:Epoch[3] Batch [200]	Speed: 743.78 samples/sec	Train-accuracy=0.971700
INFO:root:Epoch[3] Batch [400]	Speed: 736.98 samples/sec	Train-accuracy=0.977950
INFO:root:Epoch[3] Batch [600]	Speed: 748.92 samples/sec	Train-accuracy=0.978450
INFO:root:Epoch[3] Resetting Data Iterator
INFO:root:Epoch[3] Time cost=80.793
INFO:root:Epoch[3] Validation-accuracy=0.980400
INFO:root:Epoch[4] Batch [200]	Speed: 715.04 samples/sec	Train-accuracy=0.979100
INFO:root:Epoch[4] Batch [400]	Speed: 689.19 samples/sec	Train-accuracy=0.983000
INFO:root:Epoch[4] Batch [600]	Speed: 734.42 samples/sec	Train-accuracy=0.983700
INFO:root:Epoch[4] Resetting Data Iterator
INFO:root:Epoch[4] Time cost=84.282
INFO:root:Epoch[4] Validation-accuracy=0.983300
INFO:root:Epoch[5] Batch [200]	Speed: 740.29 samples/sec	Train-accuracy=0.982300
INFO:root:Epoch[5] Batch [400]	Speed: 737.16 samples/sec	Train-accuracy=0.986000
INFO:root:Epoch[5] Batch [600]	Speed: 742.74 samples/sec	Train-accuracy=0.986600
INFO:root:Epoch[5] Resetting Data Iterator
INFO:root:Epoch[5] Time cost=81.135
INFO:root:Epoch[5] Validation-accuracy=0.983700
INFO:root:Epoch[6] Batch [200]	Speed: 753.02 samples/sec	Train-accuracy=0.985550
INFO:root:Epoch[6] Batch [400]	Speed: 748.51 samples/sec	Train-accuracy=0.988600
INFO:root:Epoch[6] Batch [600]	Speed: 746.93 samples/sec	Train-accuracy=0.988700
INFO:root:Epoch[6] Resetting Data Iterator
INFO:root:Epoch[6] Time cost=80.118
INFO:root:Epoch[6] Validation-accuracy=0.985100
INFO:root:Epoch[7] Batch [200]	Speed: 746.74 samples/sec	Train-accuracy=0.988000
INFO:root:Epoch[7] Batch [400]	Speed: 748.86 samples/sec	Train-accuracy=0.989950
INFO:root:Epoch[7] Batch [600]	Speed: 750.64 samples/sec	Train-accuracy=0.990250
INFO:root:Epoch[7] Resetting Data Iterator
INFO:root:Epoch[7] Time cost=80.202
INFO:root:Epoch[7] Validation-accuracy=0.985600
INFO:root:Epoch[8] Batch [200]	Speed: 749.68 samples/sec	Train-accuracy=0.989800
INFO:root:Epoch[8] Batch [400]	Speed: 745.11 samples/sec	Train-accuracy=0.991150
INFO:root:Epoch[8] Batch [600]	Speed: 735.06 samples/sec	Train-accuracy=0.991300
INFO:root:Epoch[8] Resetting Data Iterator
INFO:root:Epoch[8] Time cost=80.796
INFO:root:Epoch[8] Validation-accuracy=0.986600
INFO:root:Epoch[9] Batch [200]	Speed: 729.48 samples/sec	Train-accuracy=0.991300
INFO:root:Epoch[9] Batch [400]	Speed: 749.17 samples/sec	Train-accuracy=0.992200
INFO:root:Epoch[9] Batch [600]	Speed: 739.76 samples/sec	Train-accuracy=0.992300
INFO:root:Epoch[9] Resetting Data Iterator
INFO:root:Epoch[9] Time cost=81.223
INFO:root:Epoch[9] Validation-accuracy=0.987500

Note that with the same hyperparameters LeNet reaches a validation accuracy of 98.7%, improving on the roughly 97.0% obtained by the multilayer perceptron above.

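Because the variable model now holds the parameters of the new CNN, we can go back to the digit recognition box above and check whether the new model improves classification accuracy.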

Next Steps

End.
