PyTorch 0.4.1 examples (code walkthrough) : Image classification – Oxford 17 Category Flowers (VGG)

Translation : ClassCat Co., Ltd. Sales Information
Created : 08/07/2018 (0.4.1)

* This page draws on sample code from the following pytorch/examples and keras/examples repositories on GitHub:

* Feel free to link to this page; we would appreciate a note at sales-info@classcat.com if you do.

VGG

We provide our own PyTorch 0.4.1 samples together with brief explanations of the code.
Knowledge at the level of the beginner tutorials is assumed.

After walking through MNIST / Fashion-MNIST / CIFAR-10 & CIFAR-100,
we tried AlexNet on the 17 Category Flower Dataset.

This time we try VGG.

University of Oxford: 17 Category Flower Dataset

Here we use a classic dataset provided by the University of Oxford.
See 17 Category Flower Dataset for details of the dataset.

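VGG 11 model definition

An implementation of VGG is included in torchvision, but we first implement it ourselves.

Note : For details of VGG, see Very Deep Convolutional Networks for Large-Scale Image Recognition (by Karen Simonyan & Andrew Zisserman).

VGG usually refers to VGG 16 and VGG 19, but to get a first feel for the problem we defined the somewhat simpler VGG 11.
In this case the layer stack of the feature extractor looks like this :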

* Block 1 *
Conv2d (out_channels=64, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 2 *
Conv2d (out_channels=128, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 3 *
Conv2d (out_channels=256, kernel_size=3)
Conv2d (out_channels=256, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 4 *
Conv2d (out_channels=512, kernel_size=3)
Conv2d (out_channels=512, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 5 *
Conv2d (out_channels=512, kernel_size=3)
Conv2d (out_channels=512, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)
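
The stack above halves the spatial resolution once per block. As a rough sketch of that arithmetic, the following plain-Python snippet traces the feature map size through the five blocks. The 224x224 input size and the padding of 1 on every 3x3 convolution are assumptions made for this illustration, and the helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of an unpadded max pooling layer along one dimension."""
    return (size - kernel) // stride + 1


# Number of 3x3 convolutions in each of the five blocks listed above.
VGG11_CONVS_PER_BLOCK = [1, 1, 2, 2, 2]


def feature_map_sizes(input_size, convs_per_block):
    """Return the spatial size after each block's pooling layer."""
    sizes = []
    size = input_size
    for n_convs in convs_per_block:
        for _ in range(n_convs):
            size = conv_out(size)  # with padding=1 the size is unchanged
        size = pool_out(size)      # 2x2 pooling with stride 2 halves it
        sizes.append(size)
    return sizes


# Assumed input resolution: 224x224.
sizes = feature_map_sizes(224, VGG11_CONVS_PER_BLOCK)
flat_features = 512 * sizes[-1] * sizes[-1]  # block 5 outputs 512 channels
print(sizes)          # [112, 56, 28, 14, 7]
print(flat_features)  # 25088
```

Under these assumptions the extractor ends with a 512 x 7 x 7 tensor, so a fully connected layer placed after it would take 25088 input features.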

Coded in a straightforward way, this becomes the following.
The split into blocks is only for convenience; the layers could equally be collected into a single nn.Sequential :

import torch.nn as nn


class VGG11(nn.Module):
    def __init__(self, num_classes):
        super(VGG11, self).__init__()

        self.block1_output = nn.Sequential (
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block2_output = nn.Sequential (
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block3_output = nn.Sequential (
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block4_output = nn.Sequential (
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block5_output = nn.Sequential (
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        # 512 * 7 * 7 assumes 224x224 inputs: five 2x2 poolings give 224 / 32 = 7.
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )


    def forward(self, x):
        x = self.block1_output(x)
        x = self.block2_output(x)
        x = self.block3_output(x)
        x = self.block4_output(x)
        x = self.block5_output(x)

        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
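
As noted, the five blocks can also be collected into a single nn.Sequential. The sketch below is one possible way to do that, building the extractor from a configuration list in the style commonly used for VGG variants. The names VGG11_CFG and make_features are hypothetical, and the 224x224 dummy input is an assumption used only to check the output shape.

```python
import torch
import torch.nn as nn

# Output channels of each convolution; "M" marks a max pooling layer.
# The layout mirrors the five VGG 11 blocks above.
VGG11_CFG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def make_features(cfg, in_channels=3):
    """Build the convolutional feature extractor as one nn.Sequential."""
    layers = []
    for v in cfg:
        if v == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, v, kernel_size=3, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = v
    return nn.Sequential(*layers)


features = make_features(VGG11_CFG)

# Shape check with a dummy batch; 224x224 is an assumed input size.
with torch.no_grad():
    out = features(torch.zeros(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 512, 7, 7])
```

The advantage of this form is that a deeper variant only needs a different list; the drawback is that individual blocks are no longer addressable by name.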

Training / evaluating the VGG11 model

We train for 50 epochs with learning rates of 0.008, 0.006 and 0.004 :

Summary of the results :

0.008 : 69.29 %
0.006 : 81.89 %
0.004 : 81.89 %

A learning rate around 5.0e-3 appears to work fine.
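
The training script itself is not shown here, so the following is only a minimal sketch of how such a learning rate sweep might be organized. Everything in it is an assumption for illustration: the SGD optimizer with momentum, the cross entropy loss, the tiny stand-in network TinyNet, and the random synthetic data that replaces the flower images. It does not reproduce the accuracies listed above.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


class TinyNet(nn.Module):
    """Hypothetical stand-in so the sketch runs quickly; not the article's VGG."""

    def __init__(self, num_classes=17):
        super(TinyNet, self).__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        return self.fc(x.view(x.size(0), -1))


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over the training data and return the mean loss."""
    model.train()
    total_loss, n_samples = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        n_samples += images.size(0)
    return total_loss / n_samples


def evaluate(model, loader, device):
    """Return classification accuracy in percent."""
    model.eval()
    correct, n_samples = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            n_samples += labels.size(0)
    return 100.0 * correct / n_samples


torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic data in place of the flower images (17 classes).
dataset = TensorDataset(torch.randn(32, 3, 32, 32), torch.randint(0, 17, (32,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

criterion = nn.CrossEntropyLoss()
results = {}
for lr in (0.008, 0.006, 0.004):
    model = TinyNet().to(device)
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(2):  # the article trains for 50 epochs
        train_one_epoch(model, loader, criterion, optimizer, device)
    # Real code would evaluate on a held-out split, not the training data.
    results[lr] = evaluate(model, loader, device)

print(results)
```

Creating a fresh model and optimizer for every learning rate keeps the runs independent, which is what makes the per-rate accuracies comparable.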

VGG 13 (with batch normalization layers) model definition

Since VGG 11 seems to work reasonably well, we next define VGG 13 :

* Block 1 *
Conv2d (out_channels=64, kernel_size=3)
Conv2d (out_channels=64, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 2 *
Conv2d (out_channels=128, kernel_size=3)
Conv2d (out_channels=128, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 3 *
Conv2d (out_channels=256, kernel_size=3)
Conv2d (out_channels=256, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 4 *
Conv2d (out_channels=512, kernel_size=3)
Conv2d (out_channels=512, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)

* Block 5 *
Conv2d (out_channels=512, kernel_size=3)
Conv2d (out_channels=512, kernel_size=3)
MaxPool2d(kernel_size=2, stride=2)
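
Compared with VGG 11, this stack adds one convolution to each of the first two blocks. To get a feel for how little that changes the size of the extractor, here is a small plain-Python sketch that counts convolution parameters for both stacks. It assumes 3 input channels and a bias term on every convolution, and it ignores batch normalization and the fully connected layers; the helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus biases of a single 2D convolution."""
    return in_ch * out_ch * kernel * kernel + out_ch


def count_conv_params(block_channels, convs_per_block, in_ch=3):
    """Sum convolution parameters over all blocks of the extractor."""
    total = 0
    for out_ch, n_convs in zip(block_channels, convs_per_block):
        for _ in range(n_convs):
            total += conv_params(in_ch, out_ch)
            in_ch = out_ch  # the next convolution sees this many channels
    return total


# Output channels of the five blocks, shared by VGG 11 and VGG 13.
BLOCK_CHANNELS = [64, 128, 256, 512, 512]

vgg11_convs = count_conv_params(BLOCK_CHANNELS, [1, 1, 2, 2, 2])
vgg13_convs = count_conv_params(BLOCK_CHANNELS, [2, 2, 2, 2, 2])

print(vgg11_convs)                # 9220480
print(vgg13_convs)                # 9404992
print(vgg13_convs - vgg11_convs)  # 184512
```

The two extra convolutions sit in the low-channel blocks, so they add only about 2 % to the convolutional parameter count.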

◆ In the code, we added batch normalization layers and weight initialization :

import torch.nn as nn
import torch.nn.init as init


class VGG13(nn.Module):
    def __init__(self, num_classes):
        super(VGG13, self).__init__()

        self.block1_output = nn.Sequential (
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block2_output = nn.Sequential (
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block3_output = nn.Sequential (
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block4_output = nn.Sequential (
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.block5_output = nn.Sequential (
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

        self._initialize_weights()


    def forward(self, x):
        x = self.block1_output(x)
        x = self.block2_output(x)
        x = self.block3_output(x)
        x = self.block4_output(x)
        x = self.block5_output(x)
        #print(x.size())

        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                #init.orthogonal_(m.weight.data, gain=init.calculate_gain('relu'))
                init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                init.constant_(m.weight, 1)
                init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                init.normal_(m.weight, 0, 0.01)
                init.constant_(m.bias, 0)
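
To make the convolution initialization above more concrete: with mode='fan_out' and nonlinearity='relu', kaiming_normal_ draws weights from a zero-mean normal distribution whose standard deviation is sqrt(2 / fan_out), where fan_out is out_channels times the kernel area. The following sketch checks this on a single layer; the layer sizes and the fixed seed are arbitrary choices for illustration.

```python
import math

import torch
import torch.nn as nn
import torch.nn.init as init

torch.manual_seed(0)

# An arbitrary example layer with the same kernel size as the VGG blocks.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
init.kaiming_normal_(conv.weight, mode='fan_out', nonlinearity='relu')

# fan_out = out_channels * kernel_height * kernel_width
fan_out = conv.out_channels * conv.kernel_size[0] * conv.kernel_size[1]
expected_std = math.sqrt(2.0 / fan_out)  # the gain for ReLU is sqrt(2)
measured_std = conv.weight.std().item()

print(fan_out)       # 1152
print(expected_std)  # about 0.0417
print(measured_std)  # close to expected_std, up to sampling noise
```

Scaling by fan_out rather than fan_in aims to keep the magnitude of the gradients roughly constant in the backward pass.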

Training / evaluating the VGG 13 (with batch normalization layers) model

In summary :

0.001 : 72.44 %
0.0005 : 83.46 %
0.00025 : 81.10 %
0.0001 : 82.68 %

That's all.
