PyTorch Lightning 1.1: research: CIFAR100 (RegNet)

Author: ClassCat Co., Ltd. Sales Information
Created: 02/25/2021 (1.1.x)

* This page reports the results of experiments carried out on CIFAR100, with reference to the following resources for CIFAR10:

* Feel free to link to this page; we would appreciate a short note to sales-info@classcat.com.

 


 

research: CIFAR100 (RegNet)

RegNetX_200MF

Specifications

  • Total params: 2,355,156 (2.4M)
  • Trainable params: 2,355,156
  • Non-trainable params: 0

Results

  • RegNetX_200MF
  • {'test_acc': 0.7339000105857849, 'test_loss': 1.152081847190857}
  • 100 epochs; Wall time: 2h 8min 30s
  • Tesla T4
  • ReduceLROnPlateau

 

RegNetX_400MF

Specifications

  • Total params: 4,813,988 (4.8M)
  • Trainable params: 4,813,988
  • Non-trainable params: 0

Results

  • RegNetX_400MF
  • {'test_acc': 0.732200026512146, 'test_loss': 1.1259433031082153}
  • 100 epochs; Wall time: 3h 19min 18s
  • Tesla T4
  • ReduceLROnPlateau

 

RegNetY_400MF

Specifications

  • Total params: 5,749,012 (5.7M)
  • Trainable params: 5,749,012
  • Non-trainable params: 0

Results

  • RegNetY_400MF
  • {'test_acc': 0.7128999829292297, 'test_loss': 1.2059353590011597}
  • 100 epochs; Wall time: 3h 47min 37s
  • Tesla T4
  • ReduceLROnPlateau
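
To compare the three runs side by side, the figures reported above can be collected in a small script. This is only a convenience sketch: the `RUNS` table and the helpers `seconds_per_epoch` and `best_run` are names introduced here, and the numbers are copied from the result lists above (accuracy rounded to four decimals). The wall times presumably also cover the final test pass, so the per-epoch values are slight overestimates.

```python
# Figures copied from the three result lists above; all names are illustrative.
RUNS = [
    # (model, total params, test accuracy, wall time as (h, min, s))
    ("RegNetX_200MF", 2_355_156, 0.7339, (2, 8, 30)),
    ("RegNetX_400MF", 4_813_988, 0.7322, (3, 19, 18)),
    ("RegNetY_400MF", 5_749_012, 0.7129, (3, 47, 37)),
]
EPOCHS = 100  # every run trained for 100 epochs


def seconds_per_epoch(wall_time, epochs=EPOCHS):
    """Average wall-clock seconds per epoch for an (h, min, s) tuple."""
    hours, minutes, seconds = wall_time
    return (hours * 3600 + minutes * 60 + seconds) / epochs


def best_run(runs=RUNS):
    """Return the name of the run with the highest test accuracy."""
    return max(runs, key=lambda run: run[2])[0]


if __name__ == "__main__":
    for name, params, acc, wall in RUNS:
        print(f"{name}: {params / 1e6:.1f}M params, "
              f"test_acc={acc:.4f}, {seconds_per_epoch(wall):.1f} s/epoch")
    print("best:", best_run())
```

In these runs the smallest model, RegNetX_200MF, reached the highest test accuracy and also needed the least time per epoch.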

 

CIFAR100 DataModule

from typing import Any, Callable, Optional, Sequence, Union
  
from pl_bolts.datamodules.vision_datamodule import VisionDataModule
#from pl_bolts.datasets import TrialCIFAR10
#from pl_bolts.transforms.dataset_normalizations import cifar10_normalization
from pl_bolts.utils import _TORCHVISION_AVAILABLE
from pl_bolts.utils.warnings import warn_missing_pkg
  
if _TORCHVISION_AVAILABLE:
    from torchvision import transforms
    #from torchvision import transforms as transform_lib
    from torchvision.datasets import CIFAR100
else:  # pragma: no cover
    warn_missing_pkg('torchvision')
    CIFAR100 = None


def cifar100_normalization():
    if not _TORCHVISION_AVAILABLE:  # pragma: no cover
        raise ModuleNotFoundError(
            'You want to use `torchvision` which is not installed yet, install it with `pip install torchvision`.'
        )
 
    normalize = transforms.Normalize(
        mean=[x / 255.0 for x in [129.3, 124.1, 112.4]],
        std=[x / 255.0 for x in [68.2, 65.4, 70.4]],
        # cifar10
        #mean=[x / 255.0 for x in [125.3, 123.0, 113.9]],
        #std=[x / 255.0 for x in [63.0, 62.1, 66.7]],
    )
    return normalize
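
As a quick illustration of what `cifar100_normalization()` does to a pixel: `transforms.Normalize` applies `(x - mean) / std` per channel to tensors that `ToTensor()` has already scaled to [0, 1]. The sketch below redoes that arithmetic in plain Python, without torchvision; `normalize_pixel` is a hypothetical helper introduced for this example only.

```python
# Per-channel statistics from cifar100_normalization(), given on the 0-255 scale.
MEAN_255 = [129.3, 124.1, 112.4]
STD_255 = [68.2, 65.4, 70.4]

# ToTensor() scales pixels to [0, 1], so the statistics are rescaled the same way.
MEAN = [m / 255.0 for m in MEAN_255]
STD = [s / 255.0 for s in STD_255]


def normalize_pixel(rgb):
    """Apply (x - mean) / std per channel to one RGB pixel with values in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, MEAN, STD)]


if __name__ == "__main__":
    print(normalize_pixel(MEAN))             # the mean colour maps to zeros
    print(normalize_pixel([1.0, 1.0, 1.0]))  # pure white
```

After normalization each channel is roughly zero-centred with unit spread over the dataset, which is the usual motivation for using dataset-specific statistics instead of the CIFAR10 values left in the comments.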


class CIFAR100DataModule(VisionDataModule):
    """
    Specs:
        - 100 classes
        - Each image is (3 x 32 x 32)
    Standard CIFAR100 train, val, test splits and transforms
    Transforms::
        cifar100_transforms = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[x / 255.0 for x in [129.3, 124.1, 112.4]],
                std=[x / 255.0 for x in [68.2, 65.4, 70.4]]
            )
        ])
    Example::
        dm = CIFAR100DataModule(PATH)
        model = LitModel()
        Trainer().fit(model, datamodule=dm)
    Or you can set your own transforms
    Example::
        dm.train_transforms = ...
        dm.test_transforms = ...
        dm.val_transforms  = ...
    """
    name = "cifar100"
    dataset_cls = CIFAR100
    dims = (3, 32, 32)
 
    def __init__(
        self,
        data_dir: Optional[str] = None,
        val_split: Union[int, float] = 0.2,
        num_workers: int = 16,
        normalize: bool = False,
        batch_size: int = 32,
        seed: int = 42,
        shuffle: bool = False,
        pin_memory: bool = False,
        drop_last: bool = False,
        *args: Any,
        **kwargs: Any,
    ) -> None:
        """
        Args:
            data_dir: Where to save/load the data
            val_split: Percent (float) or number (int) of samples to use for the validation split
            num_workers: How many workers to use for loading data
            normalize: If true applies image normalize
            batch_size: How many samples per batch to load
            seed: Random seed to be used for train/val/test splits
            shuffle: If true shuffles the train data every epoch
            pin_memory: If true, the data loader will copy Tensors into CUDA pinned memory before
                        returning them
            drop_last: If true drops the last incomplete batch
        """
        super().__init__(  # type: ignore[misc]
            data_dir=data_dir,
            val_split=val_split,
            num_workers=num_workers,
            normalize=normalize,
            batch_size=batch_size,
            seed=seed,
            shuffle=shuffle,
            pin_memory=pin_memory,
            drop_last=drop_last,
            *args,
            **kwargs,
        )
 
    @property
    def num_samples(self) -> int:
        train_len, _ = self._get_splits(len_dataset=50_000)
        return train_len
 
    @property
    def num_classes(self) -> int:
        """
        Return:
            100
        """
        return 100
 
    def default_transforms(self) -> Callable:
        if self.normalize:
            cf100_transforms = transforms.Compose([transforms.ToTensor(), cifar100_normalization()])
        else:
            cf100_transforms = transforms.Compose([transforms.ToTensor()])
 
        return cf100_transforms
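
With the default `val_split=0.2`, the 50,000 CIFAR100 training images are divided into a training part and a validation part, and `num_samples` reports the size of the training part. The sketch below shows the arithmetic this implies. It is an assumption about how `_get_splits` treats integer and float values, based on the docstring above ("Percent (float) or number (int)"), not a copy of the pl_bolts implementation; `split_lengths` is a hypothetical name.

```python
from typing import Tuple, Union


def split_lengths(len_dataset: int, val_split: Union[int, float]) -> Tuple[int, int]:
    """Return (train_len, val_len) for a fraction (float) or a count (int)."""
    if isinstance(val_split, int):
        val_len = val_split                     # absolute number of samples
    elif isinstance(val_split, float):
        val_len = int(val_split * len_dataset)  # fraction of the dataset
    else:
        raise ValueError(f"Unsupported type {type(val_split)}")
    if not 0 <= val_len <= len_dataset:
        raise ValueError("val_split is out of range for this dataset")
    return len_dataset - val_len, val_len


if __name__ == "__main__":
    print(split_lengths(50_000, 0.2))    # fraction of the 50,000 training images
    print(split_lengths(50_000, 5_000))  # fixed-size validation set
```

Under this assumption, the runs on this page train on 40,000 images and validate on 10,000, with the separate CIFAR100 test set used only for the final test pass.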

 

Model

import torch
import torch.nn as nn
import torch.nn.functional as F


class SE(nn.Module):
    '''Squeeze-and-Excitation block.'''
 
    def __init__(self, in_planes, se_planes):
        super(SE, self).__init__()
        self.se1 = nn.Conv2d(in_planes, se_planes, kernel_size=1, bias=True)
        self.se2 = nn.Conv2d(se_planes, in_planes, kernel_size=1, bias=True)
 
    def forward(self, x):
        out = F.adaptive_avg_pool2d(x, (1, 1))
        out = F.relu(self.se1(out))
        out = self.se2(out).sigmoid()
        out = x * out
        return out
 
 
class Block(nn.Module):
    def __init__(self, w_in, w_out, stride, group_width, bottleneck_ratio, se_ratio):
        super(Block, self).__init__()
        # 1x1
        w_b = int(round(w_out * bottleneck_ratio))
        self.conv1 = nn.Conv2d(w_in, w_b, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(w_b)
        # 3x3
        num_groups = w_b // group_width
        self.conv2 = nn.Conv2d(w_b, w_b, kernel_size=3,
                               stride=stride, padding=1, groups=num_groups, bias=False)
        self.bn2 = nn.BatchNorm2d(w_b)
        # se
        self.with_se = se_ratio > 0
        if self.with_se:
            w_se = int(round(w_in * se_ratio))
            self.se = SE(w_b, w_se)
        # 1x1
        self.conv3 = nn.Conv2d(w_b, w_out, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(w_out)
 
        self.shortcut = nn.Sequential()
        if stride != 1 or w_in != w_out:
            self.shortcut = nn.Sequential(
                nn.Conv2d(w_in, w_out,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(w_out)
            )
 
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        if self.with_se:
            out = self.se(out)
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
 
 
class RegNet(nn.Module):
    def __init__(self, cfg, num_classes=100):
        super(RegNet, self).__init__()
        self.cfg = cfg
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.layer1 = self._make_layer(0)
        self.layer2 = self._make_layer(1)
        self.layer3 = self._make_layer(2)
        self.layer4 = self._make_layer(3)
        self.linear = nn.Linear(self.cfg['widths'][-1], num_classes)
 
    def _make_layer(self, idx):
        depth = self.cfg['depths'][idx]
        width = self.cfg['widths'][idx]
        stride = self.cfg['strides'][idx]
        group_width = self.cfg['group_width']
        bottleneck_ratio = self.cfg['bottleneck_ratio']
        se_ratio = self.cfg['se_ratio']
 
        layers = []
        for i in range(depth):
            s = stride if i == 0 else 1
            layers.append(Block(self.in_planes, width,
                                s, group_width, bottleneck_ratio, se_ratio))
            self.in_planes = width
        return nn.Sequential(*layers)
 
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = F.adaptive_avg_pool2d(out, (1, 1))
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
 
 
def RegNetX_200MF():
    cfg = {
        'depths': [1, 1, 4, 7],
        'widths': [24, 56, 152, 368],
        'strides': [1, 1, 2, 2],
        'group_width': 8,
        'bottleneck_ratio': 1,
        'se_ratio': 0,
    }
    return RegNet(cfg)
 
 
def RegNetX_400MF():
    cfg = {
        'depths': [1, 2, 7, 12],
        'widths': [32, 64, 160, 384],
        'strides': [1, 1, 2, 2],
        'group_width': 16,
        'bottleneck_ratio': 1,
        'se_ratio': 0,
    }
    return RegNet(cfg)
 
 
def RegNetY_400MF():
    cfg = {
        'depths': [1, 2, 7, 12],
        'widths': [32, 64, 160, 384],
        'strides': [1, 1, 2, 2],
        'group_width': 16,
        'bottleneck_ratio': 1,
        'se_ratio': 0.25,
    }
    return RegNet(cfg)


net = RegNetX_200MF()
print(net)
y = net(torch.randn(1, 3, 32, 32))
print(y.size())
RegNet(
  (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): Block(
      (conv1): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=3, bias=False)
      (bn2): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (layer2): Sequential(
    (0): Block(
      (conv1): Conv2d(24, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(56, 56, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=7, bias=False)
      (bn2): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(56, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(24, 56, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (layer3): Sequential(
    (0): Block(
      (conv1): Conv2d(56, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=19, bias=False)
      (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(56, 152, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Block(
      (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False)
      (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (2): Block(
      (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False)
      (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (3): Block(
      (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False)
      (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer4): Sequential(
    (0): Block(
      (conv1): Conv2d(152, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(152, 368, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (2): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (3): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (4): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (5): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (6): Block(
      (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False)
      (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=368, out_features=100, bias=True)
)
torch.Size([1, 100])

from torchsummary import summary
  
summary(RegNetX_200MF().to('cuda'), (3, 32, 32))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1           [-1, 64, 32, 32]           1,728
       BatchNorm2d-2           [-1, 64, 32, 32]             128
            Conv2d-3           [-1, 24, 32, 32]           1,536
       BatchNorm2d-4           [-1, 24, 32, 32]              48
            Conv2d-5           [-1, 24, 32, 32]           1,728
       BatchNorm2d-6           [-1, 24, 32, 32]              48
            Conv2d-7           [-1, 24, 32, 32]             576
       BatchNorm2d-8           [-1, 24, 32, 32]              48
            Conv2d-9           [-1, 24, 32, 32]           1,536
      BatchNorm2d-10           [-1, 24, 32, 32]              48
            Block-11           [-1, 24, 32, 32]               0
           Conv2d-12           [-1, 56, 32, 32]           1,344
      BatchNorm2d-13           [-1, 56, 32, 32]             112
           Conv2d-14           [-1, 56, 32, 32]           4,032
      BatchNorm2d-15           [-1, 56, 32, 32]             112
           Conv2d-16           [-1, 56, 32, 32]           3,136
      BatchNorm2d-17           [-1, 56, 32, 32]             112
           Conv2d-18           [-1, 56, 32, 32]           1,344
      BatchNorm2d-19           [-1, 56, 32, 32]             112
            Block-20           [-1, 56, 32, 32]               0
           Conv2d-21          [-1, 152, 32, 32]           8,512
      BatchNorm2d-22          [-1, 152, 32, 32]             304
           Conv2d-23          [-1, 152, 16, 16]          10,944
      BatchNorm2d-24          [-1, 152, 16, 16]             304
           Conv2d-25          [-1, 152, 16, 16]          23,104
      BatchNorm2d-26          [-1, 152, 16, 16]             304
           Conv2d-27          [-1, 152, 16, 16]           8,512
      BatchNorm2d-28          [-1, 152, 16, 16]             304
            Block-29          [-1, 152, 16, 16]               0
           Conv2d-30          [-1, 152, 16, 16]          23,104
      BatchNorm2d-31          [-1, 152, 16, 16]             304
           Conv2d-32          [-1, 152, 16, 16]          10,944
      BatchNorm2d-33          [-1, 152, 16, 16]             304
           Conv2d-34          [-1, 152, 16, 16]          23,104
      BatchNorm2d-35          [-1, 152, 16, 16]             304
            Block-36          [-1, 152, 16, 16]               0
           Conv2d-37          [-1, 152, 16, 16]          23,104
      BatchNorm2d-38          [-1, 152, 16, 16]             304
           Conv2d-39          [-1, 152, 16, 16]          10,944
      BatchNorm2d-40          [-1, 152, 16, 16]             304
           Conv2d-41          [-1, 152, 16, 16]          23,104
      BatchNorm2d-42          [-1, 152, 16, 16]             304
            Block-43          [-1, 152, 16, 16]               0
           Conv2d-44          [-1, 152, 16, 16]          23,104
      BatchNorm2d-45          [-1, 152, 16, 16]             304
           Conv2d-46          [-1, 152, 16, 16]          10,944
      BatchNorm2d-47          [-1, 152, 16, 16]             304
           Conv2d-48          [-1, 152, 16, 16]          23,104
      BatchNorm2d-49          [-1, 152, 16, 16]             304
            Block-50          [-1, 152, 16, 16]               0
           Conv2d-51          [-1, 368, 16, 16]          55,936
      BatchNorm2d-52          [-1, 368, 16, 16]             736
           Conv2d-53            [-1, 368, 8, 8]          26,496
      BatchNorm2d-54            [-1, 368, 8, 8]             736
           Conv2d-55            [-1, 368, 8, 8]         135,424
      BatchNorm2d-56            [-1, 368, 8, 8]             736
           Conv2d-57            [-1, 368, 8, 8]          55,936
      BatchNorm2d-58            [-1, 368, 8, 8]             736
            Block-59            [-1, 368, 8, 8]               0
           Conv2d-60            [-1, 368, 8, 8]         135,424
      BatchNorm2d-61            [-1, 368, 8, 8]             736
           Conv2d-62            [-1, 368, 8, 8]          26,496
      BatchNorm2d-63            [-1, 368, 8, 8]             736
           Conv2d-64            [-1, 368, 8, 8]         135,424
      BatchNorm2d-65            [-1, 368, 8, 8]             736
            Block-66            [-1, 368, 8, 8]               0
           Conv2d-67            [-1, 368, 8, 8]         135,424
      BatchNorm2d-68            [-1, 368, 8, 8]             736
           Conv2d-69            [-1, 368, 8, 8]          26,496
      BatchNorm2d-70            [-1, 368, 8, 8]             736
           Conv2d-71            [-1, 368, 8, 8]         135,424
      BatchNorm2d-72            [-1, 368, 8, 8]             736
            Block-73            [-1, 368, 8, 8]               0
           Conv2d-74            [-1, 368, 8, 8]         135,424
      BatchNorm2d-75            [-1, 368, 8, 8]             736
           Conv2d-76            [-1, 368, 8, 8]          26,496
      BatchNorm2d-77            [-1, 368, 8, 8]             736
           Conv2d-78            [-1, 368, 8, 8]         135,424
      BatchNorm2d-79            [-1, 368, 8, 8]             736
            Block-80            [-1, 368, 8, 8]               0
           Conv2d-81            [-1, 368, 8, 8]         135,424
      BatchNorm2d-82            [-1, 368, 8, 8]             736
           Conv2d-83            [-1, 368, 8, 8]          26,496
      BatchNorm2d-84            [-1, 368, 8, 8]             736
           Conv2d-85            [-1, 368, 8, 8]         135,424
      BatchNorm2d-86            [-1, 368, 8, 8]             736
            Block-87            [-1, 368, 8, 8]               0
           Conv2d-88            [-1, 368, 8, 8]         135,424
      BatchNorm2d-89            [-1, 368, 8, 8]             736
           Conv2d-90            [-1, 368, 8, 8]          26,496
      BatchNorm2d-91            [-1, 368, 8, 8]             736
           Conv2d-92            [-1, 368, 8, 8]         135,424
      BatchNorm2d-93            [-1, 368, 8, 8]             736
            Block-94            [-1, 368, 8, 8]               0
           Conv2d-95            [-1, 368, 8, 8]         135,424
      BatchNorm2d-96            [-1, 368, 8, 8]             736
           Conv2d-97            [-1, 368, 8, 8]          26,496
      BatchNorm2d-98            [-1, 368, 8, 8]             736
           Conv2d-99            [-1, 368, 8, 8]         135,424
     BatchNorm2d-100            [-1, 368, 8, 8]             736
           Block-101            [-1, 368, 8, 8]               0
          Linear-102                  [-1, 100]          36,900
================================================================
Total params: 2,355,156
Trainable params: 2,355,156
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 27.56
Params size (MB): 8.98
Estimated Total Size (MB): 36.55
----------------------------------------------------------------
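
The parameter counts in the summary above can be reproduced by hand, which also makes the effect of the grouped 3x3 convolution visible: with `groups = w_b // group_width`, each output channel only sees `group_width` input channels. The following is a minimal sketch that mirrors the layer structure of `Block` and `RegNet` in plain Python, without torch; the helper names (`conv_params`, `bn_params`, `block_params`, `regnet_params`) are introduced here for illustration.

```python
# Convolutions in Block/RegNet use bias=False; the SE convolutions use bias=True.

def conv_params(c_in, c_out, kernel, groups=1, bias=False):
    """Parameters of a (grouped) 2D convolution."""
    return c_out * (c_in // groups) * kernel * kernel + (c_out if bias else 0)


def bn_params(channels):
    """BatchNorm2d: one scale and one shift per channel."""
    return 2 * channels


def block_params(w_in, w_out, stride, group_width, bottleneck_ratio, se_ratio):
    """Parameters of one Block, following the same branches as Block.__init__."""
    w_b = int(round(w_out * bottleneck_ratio))
    groups = w_b // group_width
    total = conv_params(w_in, w_b, 1) + bn_params(w_b)            # 1x1
    total += conv_params(w_b, w_b, 3, groups) + bn_params(w_b)    # grouped 3x3
    if se_ratio > 0:                                              # squeeze-and-excitation
        w_se = int(round(w_in * se_ratio))
        total += conv_params(w_b, w_se, 1, bias=True)
        total += conv_params(w_se, w_b, 1, bias=True)
    total += conv_params(w_b, w_out, 1) + bn_params(w_out)        # 1x1
    if stride != 1 or w_in != w_out:                              # projection shortcut
        total += conv_params(w_in, w_out, 1) + bn_params(w_out)
    return total


def regnet_params(cfg, num_classes=100):
    """Total parameters of RegNet(cfg): stem, four stages, linear head."""
    total = conv_params(3, 64, 3) + bn_params(64)                 # stem conv + bn
    w_in = 64
    for depth, width, stride in zip(cfg['depths'], cfg['widths'], cfg['strides']):
        for i in range(depth):
            total += block_params(w_in, width, stride if i == 0 else 1,
                                  cfg['group_width'], cfg['bottleneck_ratio'],
                                  cfg['se_ratio'])
            w_in = width
    return total + w_in * num_classes + num_classes               # linear weights + bias


X_200MF = {'depths': [1, 1, 4, 7], 'widths': [24, 56, 152, 368],
           'strides': [1, 1, 2, 2], 'group_width': 8,
           'bottleneck_ratio': 1, 'se_ratio': 0}

if __name__ == "__main__":
    print(conv_params(24, 24, 3, groups=3))  # grouped 3x3 conv of layer1 (Conv2d-5)
    print(regnet_params(X_200MF))            # whole RegNetX_200MF
```

For the `RegNetX_200MF` configuration this gives 2,355,156, the total reported by torchsummary. With the two 400MF configurations from the model code it gives 4,813,988 and 5,749,012, which agrees with the specifications listed at the top of the page.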

 

Lightning module

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim.lr_scheduler import OneCycleLR, CyclicLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau
from torch.optim.swa_utils import AveragedModel, update_bn
import torchvision
  
import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, GPUStatsMonitor, EarlyStopping
from pytorch_lightning.metrics.functional import accuracy
#from pl_bolts.datamodules import CIFAR10DataModule
#from pl_bolts.transforms.dataset_normalizations import cifar10_normalization

pl.seed_everything(7);

batch_size = 50
  
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar100_normalization(),
])
  
test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar100_normalization(),
])
  
cifar100_dm = CIFAR100DataModule(
    batch_size=batch_size,
    num_workers=8,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)


class LitCifar100(pl.LightningModule):
    def __init__(self, lr=0.05, factor=0.8):
        super().__init__()
   
        self.save_hyperparameters()
        self.model = RegNetX_200MF()
  
    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)
   
    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss
   
    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
   
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)
   
    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')
   
    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')
   
    def configure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
  
        return {
          'optimizer': optimizer,
          'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=5, factor=self.hparams.factor, verbose=True, threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
          'monitor': 'val_acc'
        }
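
`training_step` and `evaluate` apply `log_softmax` and then `nll_loss`, which together amount to the usual cross-entropy loss on the raw network outputs. The sketch below checks this for a single sample in plain Python rather than with torch; `log_softmax`, `nll` and `cross_entropy` here are small stand-ins written for this example, not the `torch.nn.functional` functions.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]


def cross_entropy(scores, target):
    """Cross-entropy computed directly from softmax probabilities."""
    exps = [math.exp(s) for s in scores]
    return -math.log(exps[target] / sum(exps))


if __name__ == "__main__":
    scores, target = [2.0, -1.0, 0.5], 0  # made-up scores for three classes
    print(nll(log_softmax(scores), target), cross_entropy(scores, target))
```

One consequence for reading the module: the variable named `logits` actually holds log-probabilities. This does not affect `torch.argmax`, since `log_softmax` preserves the ordering of the scores.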

 

Training / evaluation

%%time
  
model = LitCifar100(lr=0.05, factor=0.5)
model.datamodule = cifar100_dm
   
trainer = pl.Trainer(
    gpus=1,
    max_epochs=100,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='regnetx_200mf'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)
   
trainer.fit(model, cifar100_dm)
trainer.test(model, datamodule=cifar100_dm);
  | Name  | Type   | Params
---------------------------------
0 | model | RegNet | 2.4 M 
---------------------------------
2.4 M     Trainable params
0         Non-trainable params
2.4 M     Total params
9.421     Total estimated model params size (MB)
(...)
Epoch    33: reducing learning rate of group 0 to 2.5000e-02.
Epoch    40: reducing learning rate of group 0 to 1.2500e-02.
Epoch    47: reducing learning rate of group 0 to 6.2500e-03.
Epoch    56: reducing learning rate of group 0 to 3.1250e-03.
Epoch    72: reducing learning rate of group 0 to 1.5625e-03.
Epoch    79: reducing learning rate of group 0 to 7.8125e-04.
Epoch    95: reducing learning rate of group 0 to 3.9063e-04.
(...)
-------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.7339000105857849, 'test_loss': 1.152081847190857}
--------------------------------------------------------------------------------
CPU times: user 1h 46min 25s, sys: 18min 45s, total: 2h 5min 10s
Wall time: 2h 8min 30s
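
The learning-rate reductions in the log above come from `ReduceLROnPlateau` monitoring `val_acc` in `'max'` mode. As a rough mental model, the scheduler counts epochs without improvement and multiplies the learning rate by `factor` once that count exceeds `patience`. The class below is a simplified plain-Python sketch of that rule under the settings used here (absolute threshold, cooldown, `min_lr`); `PlateauSketch` is written for illustration and is not the PyTorch implementation, and the accuracy history is made up.

```python
class PlateauSketch:
    """Simplified 'max'-mode plateau rule with an absolute threshold."""

    def __init__(self, lr, factor=0.5, patience=5, threshold=1e-4,
                 cooldown=1, min_lr=1e-5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.threshold, self.cooldown, self.min_lr = threshold, cooldown, min_lr
        self.best = float('-inf')
        self.bad_epochs = 0
        self.cooldown_left = 0

    def step(self, metric):
        """Feed one epoch's monitored metric and return the (possibly reduced) lr."""
        if metric > self.best + self.threshold:  # improvement in 'max' mode
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.cooldown_left > 0:               # bad epochs are ignored while cooling down
            self.cooldown_left -= 1
            self.bad_epochs = 0
        if self.bad_epochs > self.patience:      # plateau detected: reduce lr
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.cooldown_left = self.cooldown
            self.bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauSketch(lr=0.05, factor=0.5)
    history = [0.10, 0.20, 0.30] + [0.30] * 6    # three gains, then a plateau
    print([sched.step(acc) for acc in history])
```

Starting from `lr=0.05` with `factor=0.5`, successive reductions give 0.025, 0.0125, 0.00625 and so on, which is the sequence printed in the log.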
 

End of report.