PyTorch Lightning 1.1 : research: CIFAR100 (GoogLeNet)

Author: ClassCat Sales Information
Date: 02/25/2021 (1.1.x)

* This page reports the results of an experiment carried out on CIFAR100 by adapting the following CIFAR10 resources:

* You are welcome to link to this page, but we would appreciate a note to sales-info@classcat.com.

 


research: CIFAR100 (GoogLeNet)

Specifications

  • Total params: 6,402,564 (6.4M)
  • Trainable params: 6,402,564
  • Non-trainable params: 0

 
Results

  • GoogLeNet
  • {'test_acc': 0.7184000015258789, 'test_loss': 1.179699182510376}
  • 100 epochs; Wall time: 2h 23min 33s
  • Tesla T4
  • ReduceLROnPlateau

 

CIFAR100 DataModule

from typing import Any, Callable, Optional, Sequence, Union
 
from pl_bolts.datamodules.vision_datamodule import VisionDataModule
#from pl_bolts.datasets import TrialCIFAR10
#from pl_bolts.transforms.dataset_normalizations import cifar10_normalization
from pl_bolts.utils import _TORCHVISION_AVAILABLE
from pl_bolts.utils.warnings import warn_missing_pkg
 
if _TORCHVISION_AVAILABLE:
    from torchvision import transforms
    #from torchvision import transforms as transform_lib
    from torchvision.datasets import CIFAR100
else:  # pragma: no cover
    warn_missing_pkg('torchvision')
    CIFAR100 = None

def cifar100_normalization():
    if not _TORCHVISION_AVAILABLE:  # pragma: no cover
        raise ModuleNotFoundError(
            'You want to use `torchvision` which is not installed yet, install it with `pip install torchvision`.'
        )

    normalize = transforms.Normalize(
        mean=[x / 255.0 for x in [129.3, 124.1, 112.4]],
        std=[x / 255.0 for x in [68.2, 65.4, 70.4]],
        # cifar10
        #mean=[x / 255.0 for x in [125.3, 123.0, 113.9]],
        #std=[x / 255.0 for x in [63.0, 62.1, 66.7]],
    )
    return normalize
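
# The per-channel mean/std above are the commonly quoted CIFAR-100 statistics.
# A minimal sketch of how they can be recomputed from the training set
# (assumes torchvision is available and the data can be downloaded to './data'):
#
#   import torch
#   ds = CIFAR100('./data', train=True, download=True, transform=transforms.ToTensor())
#   data = torch.stack([img for img, _ in ds])   # (50000, 3, 32, 32), values in [0, 1]
#   print(data.mean(dim=(0, 2, 3)) * 255)        # roughly [129.3, 124.1, 112.4]
#   print(data.std(dim=(0, 2, 3)) * 255)         # roughly [68.2, 65.4, 70.4]
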
class CIFAR100DataModule(VisionDataModule):
    """
    .. figure:: https://3qeqpr26caki16dnhd19sv6by6v-wpengine.netdna-ssl.com/wp-content/uploads/2019/01/
        Plot-of-a-Subset-of-Images-from-the-CIFAR-10-Dataset.png
        :width: 400
        :alt: CIFAR-10
    Specs:
        - 10 classes (1 per class)
        - Each image is (3 x 32 x 32)
    Standard CIFAR10, train, val, test splits and transforms
    Transforms::
        mnist_transforms = transform_lib.Compose([
            transform_lib.ToTensor(),
            transforms.Normalize(
                mean=[x / 255.0 for x in [125.3, 123.0, 113.9]],
                std=[x / 255.0 for x in [63.0, 62.1, 66.7]]
            )
        ])
    Example::
        from pl_bolts.datamodules import CIFAR10DataModule
        dm = CIFAR10DataModule(PATH)
        model = LitModel()
        Trainer().fit(model, datamodule=dm)
    Or you can set your own transforms
    Example::
        dm.train_transforms = ...
        dm.test_transforms = ...
        dm.val_transforms  = ...
    """
    name = "cifar100"
    dataset_cls = CIFAR100
    dims = (3, 32, 32)

    def __init__(
        self,
        data_dir: Optional[str] = None,
        val_split: Union[int, float] = 0.2,
        num_workers: int = 16,
        normalize: bool = False,
        batch_size: int = 32,
        seed: int = 42,
        shuffle: bool = False,
        pin_memory: bool = False,
        drop_last: bool = False,
        *args: Any,
        **kwargs: Any,
    ) -> None:
        """
        Args:
            data_dir: Where to save/load the data
            val_split: Percent (float) or number (int) of samples to use for the validation split
            num_workers: How many workers to use for loading data
            normalize: If true, applies image normalization
            batch_size: How many samples per batch to load
            seed: Random seed to be used for train/val/test splits
            shuffle: If true shuffles the train data every epoch
            pin_memory: If true, the data loader will copy Tensors into CUDA pinned memory before
                        returning them
            drop_last: If true drops the last incomplete batch
        """
        super().__init__(  # type: ignore[misc]
            data_dir=data_dir,
            val_split=val_split,
            num_workers=num_workers,
            normalize=normalize,
            batch_size=batch_size,
            seed=seed,
            shuffle=shuffle,
            pin_memory=pin_memory,
            drop_last=drop_last,
            *args,
            **kwargs,
        )

    @property
    def num_samples(self) -> int:
        train_len, _ = self._get_splits(len_dataset=50_000)
        return train_len

    @property
    def num_classes(self) -> int:
        """
        Return:
            10
        """
        return 100

    def default_transforms(self) -> Callable:
        if self.normalize:
            cf100_transforms = transforms.Compose([transforms.ToTensor(), cifar100_normalization()])
        else:
            cf100_transforms = transforms.Compose([transforms.ToTensor()])

        return cf100_transforms
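 

A quick smoke test of the datamodule (a sketch; it assumes the dataset can be downloaded to the default data_dir and uses the pl_bolts VisionDataModule API shown above):

dm = CIFAR100DataModule(batch_size=32, num_workers=2)
dm.prepare_data()   # downloads CIFAR-100 if necessary
dm.setup()          # builds the train/val split
x, y = next(iter(dm.train_dataloader()))
print(x.shape, y.shape)                 # torch.Size([32, 3, 32, 32]) torch.Size([32])
print(dm.num_classes, dm.num_samples)   # 100 40000 (with the default val_split=0.2)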

 

Model

import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, input_channels, n1x1, n3x3_reduce, n3x3, n5x5_reduce, n5x5, pool_proj):
        super().__init__()

        #1x1conv branch
        self.b1 = nn.Sequential(
            nn.Conv2d(input_channels, n1x1, kernel_size=1),
            nn.BatchNorm2d(n1x1),
            nn.ReLU(inplace=True)
        )

        #1x1conv -> 3x3conv branch
        self.b2 = nn.Sequential(
            nn.Conv2d(input_channels, n3x3_reduce, kernel_size=1),
            nn.BatchNorm2d(n3x3_reduce),
            nn.ReLU(inplace=True),
            nn.Conv2d(n3x3_reduce, n3x3, kernel_size=3, padding=1),
            nn.BatchNorm2d(n3x3),
            nn.ReLU(inplace=True)
        )

        #1x1conv -> 5x5conv branch
        #we use 2 3x3 conv filters stacked instead
        #of 1 5x5 filters to obtain the same receptive
        #field with fewer parameters
        self.b3 = nn.Sequential(
            nn.Conv2d(input_channels, n5x5_reduce, kernel_size=1),
            nn.BatchNorm2d(n5x5_reduce),
            nn.ReLU(inplace=True),
            nn.Conv2d(n5x5_reduce, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5),
            nn.ReLU(inplace=True),
            nn.Conv2d(n5x5, n5x5, kernel_size=3, padding=1),
            nn.BatchNorm2d(n5x5),
            nn.ReLU(inplace=True)
        )

        #3x3pooling -> 1x1conv
        #same conv
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(input_channels, pool_proj, kernel_size=1),
            nn.BatchNorm2d(pool_proj),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)
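
# The four branches preserve the spatial size, so the concatenated output has
# n1x1 + n3x3 + n5x5 + pool_proj channels; e.g. for a3 below:
# 64 + 128 + 32 + 32 = 256. A quick shape check (sketch):
#
#   Inception(192, 64, 96, 128, 16, 32, 32)(torch.randn(1, 192, 16, 16)).shape
#   # -> torch.Size([1, 256, 16, 16])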


class GoogleNet(nn.Module):

    def __init__(self, num_class=100):
        super().__init__()
        self.prelayer = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 192, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(192),
            nn.ReLU(inplace=True),
        )

        #although the prelayer here differs from the original GoogLeNet stem,
        #we keep the stage names a3, b3, ... from the paper
        self.a3 = Inception(192, 64, 96, 128, 16, 32, 32)
        self.b3 = Inception(256, 128, 128, 192, 32, 96, 64)

        ##"""In general, an Inception network is a network consisting of
        ##modules of the above type stacked upon each other, with occasional
        ##max-pooling layers with stride 2 to halve the resolution of the
        ##grid"""
        self.maxpool = nn.MaxPool2d(3, stride=2, padding=1)

        self.a4 = Inception(480, 192, 96, 208, 16, 48, 64)
        self.b4 = Inception(512, 160, 112, 224, 24, 64, 64)
        self.c4 = Inception(512, 128, 128, 256, 24, 64, 64)
        self.d4 = Inception(512, 112, 144, 288, 32, 64, 64)
        self.e4 = Inception(528, 256, 160, 320, 32, 128, 128)

        self.a5 = Inception(832, 256, 160, 320, 32, 128, 128)
        self.b5 = Inception(832, 384, 192, 384, 48, 128, 128)

        #input feature size: 4*4*1024
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.dropout = nn.Dropout2d(p=0.4)
        self.linear = nn.Linear(1024, num_class)

    def forward(self, x):
        x = self.prelayer(x)
        x = self.maxpool(x)
        x = self.a3(x)
        x = self.b3(x)

        x = self.maxpool(x)

        x = self.a4(x)
        x = self.b4(x)
        x = self.c4(x)
        x = self.d4(x)
        x = self.e4(x)

        x = self.maxpool(x)

        x = self.a5(x)
        x = self.b5(x)

        #"""It was found that a move from fully connected layers to
        #average pooling improved the top-1 accuracy by about 0.6%,
        #however the use of dropout remained essential even after
        #removing the fully connected layers."""
        x = self.avgpool(x)
        x = self.dropout(x)
        x = x.view(x.size()[0], -1)
        x = self.linear(x)

        return x

def googlenet():
    return GoogleNet()

net = googlenet()
print(net)
y = net(torch.randn(1, 3, 32, 32))
print(y.size())
GoogleNet(
  (prelayer): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Conv2d(64, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (7): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
  )
  (a3): Inception(
    (b1): Sequential(
      (0): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(96, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(192, 16, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (b3): Inception(
    (b1): Sequential(
      (0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (a4): Inception(
    (b1): Sequential(
      (0): Conv2d(480, 192, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(480, 96, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(96, 208, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(208, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(480, 16, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(16, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(48, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(480, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (b4): Inception(
    (b1): Sequential(
      (0): Conv2d(512, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(112, 224, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(24, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (c4): Inception(
    (b1): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(512, 24, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(24, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (d4): Inception(
    (b1): Sequential(
      (0): Conv2d(512, 112, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(112, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(512, 144, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(144, 288, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(512, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (e4): Inception(
    (b1): Sequential(
      (0): Conv2d(528, 256, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(528, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(528, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(528, 128, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (a5): Inception(
    (b1): Sequential(
      (0): Conv2d(832, 256, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(832, 160, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(160, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(832, 32, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(32, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (b5): Inception(
    (b1): Sequential(
      (0): Conv2d(832, 384, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (b2): Sequential(
      (0): Conv2d(832, 192, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
    )
    (b3): Sequential(
      (0): Conv2d(832, 48, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
      (3): Conv2d(48, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (4): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): ReLU(inplace=True)
      (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (8): ReLU(inplace=True)
    )
    (b4): Sequential(
      (0): MaxPool2d(kernel_size=3, stride=1, padding=1, dilation=1, ceil_mode=False)
      (1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1))
      (2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (3): ReLU(inplace=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (dropout): Dropout2d(p=0.4, inplace=False)
  (linear): Linear(in_features=1024, out_features=100, bias=True)
)
torch.Size([1, 100])
from torchsummary import summary
 
summary(googlenet().to('cuda'), (3, 32, 32))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 32, 32]           1,728
       BatchNorm2d-2           [-1, 64, 32, 32]             128
              ReLU-3           [-1, 64, 32, 32]               0
            Conv2d-4           [-1, 64, 32, 32]          36,864
       BatchNorm2d-5           [-1, 64, 32, 32]             128
              ReLU-6           [-1, 64, 32, 32]               0
            Conv2d-7          [-1, 192, 32, 32]         110,592
       BatchNorm2d-8          [-1, 192, 32, 32]             384
              ReLU-9          [-1, 192, 32, 32]               0
        MaxPool2d-10          [-1, 192, 16, 16]               0
           Conv2d-11           [-1, 64, 16, 16]          12,352
      BatchNorm2d-12           [-1, 64, 16, 16]             128
             ReLU-13           [-1, 64, 16, 16]               0
           Conv2d-14           [-1, 96, 16, 16]          18,528
      BatchNorm2d-15           [-1, 96, 16, 16]             192
             ReLU-16           [-1, 96, 16, 16]               0
           Conv2d-17          [-1, 128, 16, 16]         110,720
      BatchNorm2d-18          [-1, 128, 16, 16]             256
             ReLU-19          [-1, 128, 16, 16]               0
           Conv2d-20           [-1, 16, 16, 16]           3,088
      BatchNorm2d-21           [-1, 16, 16, 16]              32
             ReLU-22           [-1, 16, 16, 16]               0
           Conv2d-23           [-1, 32, 16, 16]           4,640
      BatchNorm2d-24           [-1, 32, 16, 16]              64
             ReLU-25           [-1, 32, 16, 16]               0
           Conv2d-26           [-1, 32, 16, 16]           9,248
      BatchNorm2d-27           [-1, 32, 16, 16]              64
             ReLU-28           [-1, 32, 16, 16]               0
        MaxPool2d-29          [-1, 192, 16, 16]               0
           Conv2d-30           [-1, 32, 16, 16]           6,176
      BatchNorm2d-31           [-1, 32, 16, 16]              64
             ReLU-32           [-1, 32, 16, 16]               0
        Inception-33          [-1, 256, 16, 16]               0
           Conv2d-34          [-1, 128, 16, 16]          32,896
      BatchNorm2d-35          [-1, 128, 16, 16]             256
             ReLU-36          [-1, 128, 16, 16]               0
           Conv2d-37          [-1, 128, 16, 16]          32,896
      BatchNorm2d-38          [-1, 128, 16, 16]             256
             ReLU-39          [-1, 128, 16, 16]               0
           Conv2d-40          [-1, 192, 16, 16]         221,376
      BatchNorm2d-41          [-1, 192, 16, 16]             384
             ReLU-42          [-1, 192, 16, 16]               0
           Conv2d-43           [-1, 32, 16, 16]           8,224
      BatchNorm2d-44           [-1, 32, 16, 16]              64
             ReLU-45           [-1, 32, 16, 16]               0
           Conv2d-46           [-1, 96, 16, 16]          27,744
      BatchNorm2d-47           [-1, 96, 16, 16]             192
             ReLU-48           [-1, 96, 16, 16]               0
           Conv2d-49           [-1, 96, 16, 16]          83,040
      BatchNorm2d-50           [-1, 96, 16, 16]             192
             ReLU-51           [-1, 96, 16, 16]               0
        MaxPool2d-52          [-1, 256, 16, 16]               0
           Conv2d-53           [-1, 64, 16, 16]          16,448
      BatchNorm2d-54           [-1, 64, 16, 16]             128
             ReLU-55           [-1, 64, 16, 16]               0
        Inception-56          [-1, 480, 16, 16]               0
        MaxPool2d-57            [-1, 480, 8, 8]               0
           Conv2d-58            [-1, 192, 8, 8]          92,352
      BatchNorm2d-59            [-1, 192, 8, 8]             384
             ReLU-60            [-1, 192, 8, 8]               0
           Conv2d-61             [-1, 96, 8, 8]          46,176
      BatchNorm2d-62             [-1, 96, 8, 8]             192
             ReLU-63             [-1, 96, 8, 8]               0
           Conv2d-64            [-1, 208, 8, 8]         179,920
      BatchNorm2d-65            [-1, 208, 8, 8]             416
             ReLU-66            [-1, 208, 8, 8]               0
           Conv2d-67             [-1, 16, 8, 8]           7,696
      BatchNorm2d-68             [-1, 16, 8, 8]              32
             ReLU-69             [-1, 16, 8, 8]               0
           Conv2d-70             [-1, 48, 8, 8]           6,960
      BatchNorm2d-71             [-1, 48, 8, 8]              96
             ReLU-72             [-1, 48, 8, 8]               0
           Conv2d-73             [-1, 48, 8, 8]          20,784
      BatchNorm2d-74             [-1, 48, 8, 8]              96
             ReLU-75             [-1, 48, 8, 8]               0
        MaxPool2d-76            [-1, 480, 8, 8]               0
           Conv2d-77             [-1, 64, 8, 8]          30,784
      BatchNorm2d-78             [-1, 64, 8, 8]             128
             ReLU-79             [-1, 64, 8, 8]               0
        Inception-80            [-1, 512, 8, 8]               0
           Conv2d-81            [-1, 160, 8, 8]          82,080
      BatchNorm2d-82            [-1, 160, 8, 8]             320
             ReLU-83            [-1, 160, 8, 8]               0
           Conv2d-84            [-1, 112, 8, 8]          57,456
      BatchNorm2d-85            [-1, 112, 8, 8]             224
             ReLU-86            [-1, 112, 8, 8]               0
           Conv2d-87            [-1, 224, 8, 8]         226,016
      BatchNorm2d-88            [-1, 224, 8, 8]             448
             ReLU-89            [-1, 224, 8, 8]               0
           Conv2d-90             [-1, 24, 8, 8]          12,312
      BatchNorm2d-91             [-1, 24, 8, 8]              48
             ReLU-92             [-1, 24, 8, 8]               0
           Conv2d-93             [-1, 64, 8, 8]          13,888
      BatchNorm2d-94             [-1, 64, 8, 8]             128
             ReLU-95             [-1, 64, 8, 8]               0
           Conv2d-96             [-1, 64, 8, 8]          36,928
      BatchNorm2d-97             [-1, 64, 8, 8]             128
             ReLU-98             [-1, 64, 8, 8]               0
        MaxPool2d-99            [-1, 512, 8, 8]               0
          Conv2d-100             [-1, 64, 8, 8]          32,832
     BatchNorm2d-101             [-1, 64, 8, 8]             128
            ReLU-102             [-1, 64, 8, 8]               0
       Inception-103            [-1, 512, 8, 8]               0
          Conv2d-104            [-1, 128, 8, 8]          65,664
     BatchNorm2d-105            [-1, 128, 8, 8]             256
            ReLU-106            [-1, 128, 8, 8]               0
          Conv2d-107            [-1, 128, 8, 8]          65,664
     BatchNorm2d-108            [-1, 128, 8, 8]             256
            ReLU-109            [-1, 128, 8, 8]               0
          Conv2d-110            [-1, 256, 8, 8]         295,168
     BatchNorm2d-111            [-1, 256, 8, 8]             512
            ReLU-112            [-1, 256, 8, 8]               0
          Conv2d-113             [-1, 24, 8, 8]          12,312
     BatchNorm2d-114             [-1, 24, 8, 8]              48
            ReLU-115             [-1, 24, 8, 8]               0
          Conv2d-116             [-1, 64, 8, 8]          13,888
     BatchNorm2d-117             [-1, 64, 8, 8]             128
            ReLU-118             [-1, 64, 8, 8]               0
          Conv2d-119             [-1, 64, 8, 8]          36,928
     BatchNorm2d-120             [-1, 64, 8, 8]             128
            ReLU-121             [-1, 64, 8, 8]               0
       MaxPool2d-122            [-1, 512, 8, 8]               0
          Conv2d-123             [-1, 64, 8, 8]          32,832
     BatchNorm2d-124             [-1, 64, 8, 8]             128
            ReLU-125             [-1, 64, 8, 8]               0
       Inception-126            [-1, 512, 8, 8]               0
          Conv2d-127            [-1, 112, 8, 8]          57,456
     BatchNorm2d-128            [-1, 112, 8, 8]             224
            ReLU-129            [-1, 112, 8, 8]               0
          Conv2d-130            [-1, 144, 8, 8]          73,872
     BatchNorm2d-131            [-1, 144, 8, 8]             288
            ReLU-132            [-1, 144, 8, 8]               0
          Conv2d-133            [-1, 288, 8, 8]         373,536
     BatchNorm2d-134            [-1, 288, 8, 8]             576
            ReLU-135            [-1, 288, 8, 8]               0
          Conv2d-136             [-1, 32, 8, 8]          16,416
     BatchNorm2d-137             [-1, 32, 8, 8]              64
            ReLU-138             [-1, 32, 8, 8]               0
          Conv2d-139             [-1, 64, 8, 8]          18,496
     BatchNorm2d-140             [-1, 64, 8, 8]             128
            ReLU-141             [-1, 64, 8, 8]               0
          Conv2d-142             [-1, 64, 8, 8]          36,928
     BatchNorm2d-143             [-1, 64, 8, 8]             128
            ReLU-144             [-1, 64, 8, 8]               0
       MaxPool2d-145            [-1, 512, 8, 8]               0
          Conv2d-146             [-1, 64, 8, 8]          32,832
     BatchNorm2d-147             [-1, 64, 8, 8]             128
            ReLU-148             [-1, 64, 8, 8]               0
       Inception-149            [-1, 528, 8, 8]               0
          Conv2d-150            [-1, 256, 8, 8]         135,424
     BatchNorm2d-151            [-1, 256, 8, 8]             512
            ReLU-152            [-1, 256, 8, 8]               0
          Conv2d-153            [-1, 160, 8, 8]          84,640
     BatchNorm2d-154            [-1, 160, 8, 8]             320
            ReLU-155            [-1, 160, 8, 8]               0
          Conv2d-156            [-1, 320, 8, 8]         461,120
     BatchNorm2d-157            [-1, 320, 8, 8]             640
            ReLU-158            [-1, 320, 8, 8]               0
          Conv2d-159             [-1, 32, 8, 8]          16,928
     BatchNorm2d-160             [-1, 32, 8, 8]              64
            ReLU-161             [-1, 32, 8, 8]               0
          Conv2d-162            [-1, 128, 8, 8]          36,992
     BatchNorm2d-163            [-1, 128, 8, 8]             256
            ReLU-164            [-1, 128, 8, 8]               0
          Conv2d-165            [-1, 128, 8, 8]         147,584
     BatchNorm2d-166            [-1, 128, 8, 8]             256
            ReLU-167            [-1, 128, 8, 8]               0
       MaxPool2d-168            [-1, 528, 8, 8]               0
          Conv2d-169            [-1, 128, 8, 8]          67,712
     BatchNorm2d-170            [-1, 128, 8, 8]             256
            ReLU-171            [-1, 128, 8, 8]               0
       Inception-172            [-1, 832, 8, 8]               0
       MaxPool2d-173            [-1, 832, 4, 4]               0
          Conv2d-174            [-1, 256, 4, 4]         213,248
     BatchNorm2d-175            [-1, 256, 4, 4]             512
            ReLU-176            [-1, 256, 4, 4]               0
          Conv2d-177            [-1, 160, 4, 4]         133,280
     BatchNorm2d-178            [-1, 160, 4, 4]             320
            ReLU-179            [-1, 160, 4, 4]               0
          Conv2d-180            [-1, 320, 4, 4]         461,120
     BatchNorm2d-181            [-1, 320, 4, 4]             640
            ReLU-182            [-1, 320, 4, 4]               0
          Conv2d-183             [-1, 32, 4, 4]          26,656
     BatchNorm2d-184             [-1, 32, 4, 4]              64
            ReLU-185             [-1, 32, 4, 4]               0
          Conv2d-186            [-1, 128, 4, 4]          36,992
     BatchNorm2d-187            [-1, 128, 4, 4]             256
            ReLU-188            [-1, 128, 4, 4]               0
          Conv2d-189            [-1, 128, 4, 4]         147,584
     BatchNorm2d-190            [-1, 128, 4, 4]             256
            ReLU-191            [-1, 128, 4, 4]               0
       MaxPool2d-192            [-1, 832, 4, 4]               0
          Conv2d-193            [-1, 128, 4, 4]         106,624
     BatchNorm2d-194            [-1, 128, 4, 4]             256
            ReLU-195            [-1, 128, 4, 4]               0
       Inception-196            [-1, 832, 4, 4]               0
          Conv2d-197            [-1, 384, 4, 4]         319,872
     BatchNorm2d-198            [-1, 384, 4, 4]             768
            ReLU-199            [-1, 384, 4, 4]               0
          Conv2d-200            [-1, 192, 4, 4]         159,936
     BatchNorm2d-201            [-1, 192, 4, 4]             384
            ReLU-202            [-1, 192, 4, 4]               0
          Conv2d-203            [-1, 384, 4, 4]         663,936
     BatchNorm2d-204            [-1, 384, 4, 4]             768
            ReLU-205            [-1, 384, 4, 4]               0
          Conv2d-206             [-1, 48, 4, 4]          39,984
     BatchNorm2d-207             [-1, 48, 4, 4]              96
            ReLU-208             [-1, 48, 4, 4]               0
          Conv2d-209            [-1, 128, 4, 4]          55,424
     BatchNorm2d-210            [-1, 128, 4, 4]             256
            ReLU-211            [-1, 128, 4, 4]               0
          Conv2d-212            [-1, 128, 4, 4]         147,584
     BatchNorm2d-213            [-1, 128, 4, 4]             256
            ReLU-214            [-1, 128, 4, 4]               0
       MaxPool2d-215            [-1, 832, 4, 4]               0
          Conv2d-216            [-1, 128, 4, 4]         106,624
     BatchNorm2d-217            [-1, 128, 4, 4]             256
            ReLU-218            [-1, 128, 4, 4]               0
       Inception-219           [-1, 1024, 4, 4]               0
AdaptiveAvgPool2d-220           [-1, 1024, 1, 1]               0
       Dropout2d-221           [-1, 1024, 1, 1]               0
          Linear-222                  [-1, 100]         102,500
================================================================
Total params: 6,402,564
Trainable params: 6,402,564
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 27.12
Params size (MB): 24.42
Estimated Total Size (MB): 51.56
----------------------------------------------------------------
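 

The parameter count can also be cross-checked without torchsummary (a minimal sketch):

n_params = sum(p.numel() for p in googlenet().parameters() if p.requires_grad)
print(f'{n_params:,}')   # 6,402,564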

 

Lightning Module

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim.lr_scheduler import OneCycleLR, CyclicLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau
from torch.optim.swa_utils import AveragedModel, update_bn
import torchvision
 
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, GPUStatsMonitor, EarlyStopping
from pytorch_lightning.metrics.functional import accuracy
#from pl_bolts.datamodules import CIFAR10DataModule
#from pl_bolts.transforms.dataset_normalizations import cifar10_normalization
pl.seed_everything(7);
batch_size = 50
 
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar100_normalization(),
])
 
test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar100_normalization(),
])
 
cifar100_dm = CIFAR100DataModule(
    batch_size=batch_size,
    num_workers=8,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)
class LitCifar100(pl.LightningModule):
    def __init__(self, lr=0.05, factor=0.8):
        super().__init__()
  
        self.save_hyperparameters()
        self.model = googlenet()
 
    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)
  
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = self(x)  # forward already applies log_softmax
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss
  
    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
  
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)
  
    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')
  
    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')
  
    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
 
        return {
          'optimizer': optimizer,
          'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=5, factor=self.hparams.factor, verbose=True, threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
          'monitor': 'val_acc'
        }
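
ReduceLROnPlateau monitored on val_acc is what produced the run reported here. The other schedulers imported above (OneCycleLR etc.) were not used; as a rough sketch, a OneCycleLR variant of configure_optimizers could look like the following (epochs=100 matches this run, and steps_per_epoch assumes the 40,000-sample train split with batch_size=50):

    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr,
                                    momentum=0.9, weight_decay=5e-4)
        scheduler = OneCycleLR(optimizer, max_lr=self.hparams.lr,
                               epochs=100, steps_per_epoch=40000 // batch_size)
        # OneCycleLR must step every batch, hence interval='step'
        return {
            'optimizer': optimizer,
            'lr_scheduler': {'scheduler': scheduler, 'interval': 'step'},
        }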

 

Training / Evaluation

%%time
 
model = LitCifar100(lr=0.05, factor=0.5)
model.datamodule = cifar100_dm
  
trainer = pl.Trainer(
    gpus=1,
    max_epochs=100,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='googlenet'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)
  
trainer.fit(model, cifar100_dm)
trainer.test(model, datamodule=cifar100_dm);
  | Name  | Type      | Params
------------------------------------
0 | model | GoogleNet | 6.4 M 
------------------------------------
6.4 M     Trainable params
0         Non-trainable params
6.4 M     Total params
25.610    Total estimated model params size (MB)
(...)
Epoch    27: reducing learning rate of group 0 to 2.5000e-02.
Epoch    34: reducing learning rate of group 0 to 1.2500e-02.
Epoch    41: reducing learning rate of group 0 to 6.2500e-03.
Epoch    49: reducing learning rate of group 0 to 3.1250e-03.
Epoch    58: reducing learning rate of group 0 to 1.5625e-03.
Epoch    77: reducing learning rate of group 0 to 7.8125e-04.
Epoch    98: reducing learning rate of group 0 to 3.9063e-04.
(...)
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.7184000015258789, 'test_loss': 1.179699182510376}
--------------------------------------------------------------------------------
CPU times: user 1h 52min 54s, sys: 26min 55s, total: 2h 19min 50s
Wall time: 2h 23min 33s
 

That's all.