PyTorch Lightning 1.1: research : CIFAR100 (RegNet)
作成 : (株)クラスキャット セールスインフォメーション
作成日時 : 02/25/2021 (1.1.x)
* 本ページは以下の CIFAR10 用リソースを参考に CIFAR100 で遂行した実験結果のレポートです:
- notebooks : PyTorch Lightning CIFAR10 ~94% Baseline Tutorial
- Train CIFAR10 with PyTorch
- Pytorch-cifar100
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。
★ 無料セミナー実施中 ★ クラスキャット主催 人工知能 & ビジネス Web セミナー
人工知能とビジネスをテーマにウェビナー (WEB セミナー) を定期的に開催しています。スケジュールは弊社 公式 Web サイト でご確認頂けます。
- お住まいの地域に関係なく Web ブラウザからご参加頂けます。事前登録 が必要ですのでご注意ください。
- Windows PC のブラウザからご参加が可能です。スマートデバイスもご利用可能です。
クラスキャットは人工知能・テレワークに関する各種サービスを提供しております :
人工知能研究開発支援 | 人工知能研修サービス | テレワーク & オンライン授業を支援 |
PoC(概念実証)を失敗させないための支援 (本支援はセミナーに参加しアンケートに回答した方を対象としています。) |
◆ お問合せ : 本件に関するお問い合わせ先は下記までお願いいたします。
株式会社クラスキャット セールス・マーケティング本部 セールス・インフォメーション |
E-Mail:sales-info@classcat.com ; WebSite: https://www.classcat.com/ |
Facebook: https://www.facebook.com/ClassCatJP/ |
research: CIFAR100 (RegNet)
RegNetX_200MF
仕様
- Total params: 2,355,156 (2.4M)
- Trainable params: 2,355,156
- Non-trainable params: 0
結果
- RegNetX_200MF
- {‘test_acc’: 0.7339000105857849, ‘test_loss’: 1.152081847190857}
- 100 エポック ; Wall time: 2h 8min 30s
- Tesla T4
- ReduceLROnPlateau
RegNetX_400MF
仕様
- Total params: 4,813,988 (4.8M)
- Trainable params: 4,813,988
- Non-trainable params: 0
結果
- RegNetX_400MF
- {‘test_acc’: 0.732200026512146, ‘test_loss’: 1.1259433031082153}
- 100 エポック ; Wall time: Wall time: 3h 19min 18s
- Tesla T4
- ReduceLROnPlateau
RegNetY_400MF
仕様
- Total params: 5,749,012 (5.7M)
- Trainable params: 5,749,012
- Non-trainable params: 0
結果
- RegNetY_400MF
- {‘test_acc’: 0.7128999829292297, ‘test_loss’: 1.2059353590011597}
- 100 エポック ; Wall time: Wall time: 3h 47min 37s
- Tesla T4
- ReduceLROnPlateau
CIFAR 100 DM
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | from typing import Any , Callable , Optional, Sequence, Union from pl_bolts.datamodules.vision_datamodule import VisionDataModule #from pl_bolts.datasets import TrialCIFAR10 #from pl_bolts.transforms.dataset_normalizations import cifar10_normalization from pl_bolts.utils import _TORCHVISION_AVAILABLE from pl_bolts.utils.warnings import warn_missing_pkg if _TORCHVISION_AVAILABLE: from torchvision import transforms #from torchvision import transforms as transform_lib from torchvision.datasets import CIFAR100 else : # pragma: no cover warn_missing_pkg( 'torchvision' ) CIFAR100 = None |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | def cifar100_normalization(): if not _TORCHVISION_AVAILABLE: # pragma: no cover raise ModuleNotFoundError( 'You want to use `torchvision` which is not installed yet, install it with `pip install torchvision`.' ) normalize = transforms.Normalize( mean = [x / 255.0 for x in [ 129.3 , 124.1 , 112.4 ]], std = [x / 255.0 for x in [ 68.2 , 65.4 , 70.4 ]], # cifar10 #mean=[x / 255.0 for x in [125.3, 123.0, 113.9]], #std=[x / 255.0 for x in [63.0, 62.1, 66.7]], ) return normalize |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 | class CIFAR100DataModule(VisionDataModule): """ Plot-of-a-Subset-of-Images-from-the-CIFAR-10-Dataset.png :width: 400 :alt: CIFAR-10 Specs: - 10 classes (1 per class) - Each image is (3 x 32 x 32) Standard CIFAR10, train, val, test splits and transforms Transforms:: mnist_transforms = transform_lib.Compose([ transform_lib.ToTensor(), transforms.Normalize( mean=[x / 255.0 for x in [125.3, 123.0, 113.9]], std=[x / 255.0 for x in [63.0, 62.1, 66.7]] ) ]) Example:: from pl_bolts.datamodules import CIFAR10DataModule dm = CIFAR10DataModule(PATH) model = LitModel() Trainer().fit(model, datamodule=dm) Or you can set your own transforms Example:: dm.train_transforms = ... dm.test_transforms = ... dm.val_transforms = ... """ name = "cifar100" dataset_cls = CIFAR100 dims = ( 3 , 32 , 32 ) def __init__( self , data_dir: Optional[ str ] = None , val_split: Union[ int , float ] = 0.2 , num_workers: int = 16 , normalize: bool = False , batch_size: int = 32 , seed: int = 42 , shuffle: bool = False , pin_memory: bool = False , drop_last: bool = False , * args: Any , * * kwargs: Any , ) - > None : """ Args: data_dir: Where to save/load the data val_split: Percent (float) or number (int) of samples to use for the validation split num_workers: How many workers to use for loading data normalize: If true applies image normalize batch_size: How many samples per batch to load seed: Random seed to be used for train/val/test splits shuffle: If true shuffles the train data every epoch pin_memory: If true, the data loader will copy Tensors into CUDA pinned memory before returning them drop_last: If true drops the last incomplete batch """ super ().__init__( # type: ignore[misc] data_dir = data_dir, val_split = val_split, num_workers = num_workers, normalize = normalize, batch_size = batch_size, seed = seed, shuffle = shuffle, pin_memory = pin_memory, drop_last = drop_last, * args, * * kwargs, ) @property def num_samples( self ) - > int : train_len, _ = self ._get_splits(len_dataset = 50_000 ) return train_len @property def num_classes( self ) - > int : """ Return: 10 """ return 100 def default_transforms( self ) - > Callable : if self .normalize: cf100_transforms = transforms.Compose([transform_lib.ToTensor(), cifar100_normalization()]) else : cf100_transforms = transforms.Compose([transform_lib.ToTensor()]) return cf100_transforms |
モデル
1 2 3 | import torch import torch.nn as nn import torch.nn.functional as F |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 | class SE(nn.Module): '''Squeeze-and-Excitation block.''' def __init__( self , in_planes, se_planes): super (SE, self ).__init__() self .se1 = nn.Conv2d(in_planes, se_planes, kernel_size = 1 , bias = True ) self .se2 = nn.Conv2d(se_planes, in_planes, kernel_size = 1 , bias = True ) def forward( self , x): out = F.adaptive_avg_pool2d(x, ( 1 , 1 )) out = F.relu( self .se1(out)) out = self .se2(out).sigmoid() out = x * out return out class Block(nn.Module): def __init__( self , w_in, w_out, stride, group_width, bottleneck_ratio, se_ratio): super (Block, self ).__init__() # 1x1 w_b = int ( round (w_out * bottleneck_ratio)) self .conv1 = nn.Conv2d(w_in, w_b, kernel_size = 1 , bias = False ) self .bn1 = nn.BatchNorm2d(w_b) # 3x3 num_groups = w_b / / group_width self .conv2 = nn.Conv2d(w_b, w_b, kernel_size = 3 , stride = stride, padding = 1 , groups = num_groups, bias = False ) self .bn2 = nn.BatchNorm2d(w_b) # se self .with_se = se_ratio > 0 if self .with_se: w_se = int ( round (w_in * se_ratio)) self .se = SE(w_b, w_se) # 1x1 self .conv3 = nn.Conv2d(w_b, w_out, kernel_size = 1 , bias = False ) self .bn3 = nn.BatchNorm2d(w_out) self .shortcut = nn.Sequential() if stride ! = 1 or w_in ! = w_out: self .shortcut = nn.Sequential( nn.Conv2d(w_in, w_out, kernel_size = 1 , stride = stride, bias = False ), nn.BatchNorm2d(w_out) ) def forward( self , x): out = F.relu( self .bn1( self .conv1(x))) out = F.relu( self .bn2( self .conv2(out))) if self .with_se: out = self .se(out) out = self .bn3( self .conv3(out)) out + = self .shortcut(x) out = F.relu(out) return out class RegNet(nn.Module): def __init__( self , cfg, num_classes = 100 ): super (RegNet, self ).__init__() self .cfg = cfg self .in_planes = 64 self .conv1 = nn.Conv2d( 3 , 64 , kernel_size = 3 , stride = 1 , padding = 1 , bias = False ) self .bn1 = nn.BatchNorm2d( 64 ) self .layer1 = self ._make_layer( 0 ) self .layer2 = self ._make_layer( 1 ) self .layer3 = self ._make_layer( 2 ) self .layer4 = self ._make_layer( 3 ) self .linear = nn.Linear( self .cfg[ 'widths' ][ - 1 ], num_classes) def _make_layer( self , idx): depth = self .cfg[ 'depths' ][idx] width = self .cfg[ 'widths' ][idx] stride = self .cfg[ 'strides' ][idx] group_width = self .cfg[ 'group_width' ] bottleneck_ratio = self .cfg[ 'bottleneck_ratio' ] se_ratio = self .cfg[ 'se_ratio' ] layers = [] for i in range (depth): s = stride if i = = 0 else 1 layers.append(Block( self .in_planes, width, s, group_width, bottleneck_ratio, se_ratio)) self .in_planes = width return nn.Sequential( * layers) def forward( self , x): out = F.relu( self .bn1( self .conv1(x))) out = self .layer1(out) out = self .layer2(out) out = self .layer3(out) out = self .layer4(out) out = F.adaptive_avg_pool2d(out, ( 1 , 1 )) out = out.view(out.size( 0 ), - 1 ) out = self .linear(out) return out def RegNetX_200MF(): cfg = { 'depths' : [ 1 , 1 , 4 , 7 ], 'widths' : [ 24 , 56 , 152 , 368 ], 'strides' : [ 1 , 1 , 2 , 2 ], 'group_width' : 8 , 'bottleneck_ratio' : 1 , 'se_ratio' : 0 , } return RegNet(cfg) def RegNetX_400MF(): cfg = { 'depths' : [ 1 , 2 , 7 , 12 ], 'widths' : [ 32 , 64 , 160 , 384 ], 'strides' : [ 1 , 1 , 2 , 2 ], 'group_width' : 16 , 'bottleneck_ratio' : 1 , 'se_ratio' : 0 , } return RegNet(cfg) def RegNetY_400MF(): cfg = { 'depths' : [ 1 , 2 , 7 , 12 ], 'widths' : [ 32 , 64 , 160 , 384 ], 'strides' : [ 1 , 1 , 2 , 2 ], 'group_width' : 16 , 'bottleneck_ratio' : 1 , 'se_ratio' : 0.25 , } return RegNet(cfg) |
1 2 3 4 | net = RegNetX_200MF() print (net) y = net(torch.randn( 1 , 3 , 32 , 32 )) print (y.size()) |
RegNet( (conv1): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (layer1): Sequential( (0): Block( (conv1): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(24, 24, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=3, bias=False) (bn2): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(24, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(64, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (layer2): Sequential( (0): Block( (conv1): Conv2d(24, 56, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(56, 56, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=7, bias=False) (bn2): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(56, 56, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(24, 56, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(56, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (layer3): Sequential( (0): Block( (conv1): Conv2d(56, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=19, bias=False) (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(56, 152, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Block( (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False) (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (2): Block( (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False) (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (3): Block( (conv1): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(152, 152, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=19, bias=False) (bn2): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(152, 152, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(152, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) ) (layer4): Sequential( (0): Block( (conv1): Conv2d(152, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(152, 368, kernel_size=(1, 1), stride=(2, 2), bias=False) (1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (2): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (3): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (4): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (5): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (6): Block( (conv1): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(368, 368, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=46, bias=False) (bn2): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(368, 368, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(368, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) ) (linear): Linear(in_features=368, out_features=100, bias=True) ) torch.Size([1, 100])
1 2 3 | from torchsummary import summary summary(RegNetX_200MF().to( 'cuda' ), ( 3 , 32 , 32 )) |
---------------------------------------------------------------- Layer (type) Output Shape Param # ================================================================ Conv2d-1 [-1, 64, 32, 32] 1,728 BatchNorm2d-2 [-1, 64, 32, 32] 128 Conv2d-3 [-1, 24, 32, 32] 1,536 BatchNorm2d-4 [-1, 24, 32, 32] 48 Conv2d-5 [-1, 24, 32, 32] 1,728 BatchNorm2d-6 [-1, 24, 32, 32] 48 Conv2d-7 [-1, 24, 32, 32] 576 BatchNorm2d-8 [-1, 24, 32, 32] 48 Conv2d-9 [-1, 24, 32, 32] 1,536 BatchNorm2d-10 [-1, 24, 32, 32] 48 Block-11 [-1, 24, 32, 32] 0 Conv2d-12 [-1, 56, 32, 32] 1,344 BatchNorm2d-13 [-1, 56, 32, 32] 112 Conv2d-14 [-1, 56, 32, 32] 4,032 BatchNorm2d-15 [-1, 56, 32, 32] 112 Conv2d-16 [-1, 56, 32, 32] 3,136 BatchNorm2d-17 [-1, 56, 32, 32] 112 Conv2d-18 [-1, 56, 32, 32] 1,344 BatchNorm2d-19 [-1, 56, 32, 32] 112 Block-20 [-1, 56, 32, 32] 0 Conv2d-21 [-1, 152, 32, 32] 8,512 BatchNorm2d-22 [-1, 152, 32, 32] 304 Conv2d-23 [-1, 152, 16, 16] 10,944 BatchNorm2d-24 [-1, 152, 16, 16] 304 Conv2d-25 [-1, 152, 16, 16] 23,104 BatchNorm2d-26 [-1, 152, 16, 16] 304 Conv2d-27 [-1, 152, 16, 16] 8,512 BatchNorm2d-28 [-1, 152, 16, 16] 304 Block-29 [-1, 152, 16, 16] 0 Conv2d-30 [-1, 152, 16, 16] 23,104 BatchNorm2d-31 [-1, 152, 16, 16] 304 Conv2d-32 [-1, 152, 16, 16] 10,944 BatchNorm2d-33 [-1, 152, 16, 16] 304 Conv2d-34 [-1, 152, 16, 16] 23,104 BatchNorm2d-35 [-1, 152, 16, 16] 304 Block-36 [-1, 152, 16, 16] 0 Conv2d-37 [-1, 152, 16, 16] 23,104 BatchNorm2d-38 [-1, 152, 16, 16] 304 Conv2d-39 [-1, 152, 16, 16] 10,944 BatchNorm2d-40 [-1, 152, 16, 16] 304 Conv2d-41 [-1, 152, 16, 16] 23,104 BatchNorm2d-42 [-1, 152, 16, 16] 304 Block-43 [-1, 152, 16, 16] 0 Conv2d-44 [-1, 152, 16, 16] 23,104 BatchNorm2d-45 [-1, 152, 16, 16] 304 Conv2d-46 [-1, 152, 16, 16] 10,944 BatchNorm2d-47 [-1, 152, 16, 16] 304 Conv2d-48 [-1, 152, 16, 16] 23,104 BatchNorm2d-49 [-1, 152, 16, 16] 304 Block-50 [-1, 152, 16, 16] 0 Conv2d-51 [-1, 368, 16, 16] 55,936 BatchNorm2d-52 [-1, 368, 16, 16] 736 Conv2d-53 [-1, 368, 8, 8] 26,496 BatchNorm2d-54 [-1, 368, 8, 8] 736 Conv2d-55 [-1, 368, 8, 8] 135,424 BatchNorm2d-56 [-1, 368, 8, 8] 736 Conv2d-57 [-1, 368, 8, 8] 55,936 BatchNorm2d-58 [-1, 368, 8, 8] 736 Block-59 [-1, 368, 8, 8] 0 Conv2d-60 [-1, 368, 8, 8] 135,424 BatchNorm2d-61 [-1, 368, 8, 8] 736 Conv2d-62 [-1, 368, 8, 8] 26,496 BatchNorm2d-63 [-1, 368, 8, 8] 736 Conv2d-64 [-1, 368, 8, 8] 135,424 BatchNorm2d-65 [-1, 368, 8, 8] 736 Block-66 [-1, 368, 8, 8] 0 Conv2d-67 [-1, 368, 8, 8] 135,424 BatchNorm2d-68 [-1, 368, 8, 8] 736 Conv2d-69 [-1, 368, 8, 8] 26,496 BatchNorm2d-70 [-1, 368, 8, 8] 736 Conv2d-71 [-1, 368, 8, 8] 135,424 BatchNorm2d-72 [-1, 368, 8, 8] 736 Block-73 [-1, 368, 8, 8] 0 Conv2d-74 [-1, 368, 8, 8] 135,424 BatchNorm2d-75 [-1, 368, 8, 8] 736 Conv2d-76 [-1, 368, 8, 8] 26,496 BatchNorm2d-77 [-1, 368, 8, 8] 736 Conv2d-78 [-1, 368, 8, 8] 135,424 BatchNorm2d-79 [-1, 368, 8, 8] 736 Block-80 [-1, 368, 8, 8] 0 Conv2d-81 [-1, 368, 8, 8] 135,424 BatchNorm2d-82 [-1, 368, 8, 8] 736 Conv2d-83 [-1, 368, 8, 8] 26,496 BatchNorm2d-84 [-1, 368, 8, 8] 736 Conv2d-85 [-1, 368, 8, 8] 135,424 BatchNorm2d-86 [-1, 368, 8, 8] 736 Block-87 [-1, 368, 8, 8] 0 Conv2d-88 [-1, 368, 8, 8] 135,424 BatchNorm2d-89 [-1, 368, 8, 8] 736 Conv2d-90 [-1, 368, 8, 8] 26,496 BatchNorm2d-91 [-1, 368, 8, 8] 736 Conv2d-92 [-1, 368, 8, 8] 135,424 BatchNorm2d-93 [-1, 368, 8, 8] 736 Block-94 [-1, 368, 8, 8] 0 Conv2d-95 [-1, 368, 8, 8] 135,424 BatchNorm2d-96 [-1, 368, 8, 8] 736 Conv2d-97 [-1, 368, 8, 8] 26,496 BatchNorm2d-98 [-1, 368, 8, 8] 736 Conv2d-99 [-1, 368, 8, 8] 135,424 BatchNorm2d-100 [-1, 368, 8, 8] 736 Block-101 [-1, 368, 8, 8] 0 Linear-102 [-1, 100] 36,900 ================================================================ Total params: 2,355,156 Trainable params: 2,355,156 Non-trainable params: 0 ---------------------------------------------------------------- Input size (MB): 0.01 Forward/backward pass size (MB): 27.56 Params size (MB): 8.98 Estimated Total Size (MB): 36.55 ----------------------------------------------------------------
Lightning モジュール
1 2 3 4 5 6 7 8 9 10 11 12 | import torch import torch.nn as nn import torch.nn.functional as F from torch.optim.lr_scheduler import OneCycleLR, CyclicLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau from torch.optim.swa_utils import AveragedModel, update_bn import torchvision import pytorch_lightning as pl from pytorch_lightning.callbacks import LearningRateMonitor, GPUStatsMonitor, EarlyStopping from pytorch_lightning.metrics.functional import accuracy #from pl_bolts.datamodules import CIFAR10DataModule #from pl_bolts.transforms.dataset_normalizations import cifar10_normalization |
1 | pl.seed_everything( 7 ); |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | batch_size = 50 train_transforms = torchvision.transforms.Compose([ torchvision.transforms.RandomCrop( 32 , padding = 4 ), torchvision.transforms.RandomHorizontalFlip(), torchvision.transforms.ToTensor(), cifar100_normalization(), ]) test_transforms = torchvision.transforms.Compose([ torchvision.transforms.ToTensor(), cifar100_normalization(), ]) cifar100_dm = CIFAR100DataModule( batch_size = batch_size, num_workers = 8 , train_transforms = train_transforms, test_transforms = test_transforms, val_transforms = test_transforms, ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | class LitCifar100(pl.LightningModule): def __init__( self , lr = 0.05 , factor = 0.8 ): super ().__init__() self .save_hyperparameters() self .model = RegNetX_200MF() def forward( self , x): out = self .model(x) return F.log_softmax(out, dim = 1 ) def training_step( self , batch, batch_idx): x, y = batch logits = F.log_softmax( self .model(x), dim = 1 ) loss = F.nll_loss(logits, y) self .log( 'train_loss' , loss) return loss def evaluate( self , batch, stage = None ): x, y = batch logits = self (x) loss = F.nll_loss(logits, y) preds = torch.argmax(logits, dim = 1 ) acc = accuracy(preds, y) if stage: self .log(f '{stage}_loss' , loss, prog_bar = True ) self .log(f '{stage}_acc' , acc, prog_bar = True ) def validation_step( self , batch, batch_idx): self .evaluate(batch, 'val' ) def test_step( self , batch, batch_idx): self .evaluate(batch, 'test' ) def configure_optimizers( self ): optimizer = torch.optim.SGD( self .parameters(), lr = self .hparams.lr, momentum = 0.9 , weight_decay = 5e - 4 ) return { 'optimizer' : optimizer, 'lr_scheduler' : ReduceLROnPlateau(optimizer, 'max' , patience = 5 , factor = self .hparams.factor, verbose = True , threshold = 0.0001 , threshold_mode = 'abs' , cooldown = 1 , min_lr = 1e - 5 ), 'monitor' : 'val_acc' } |
訓練 / 評価
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | % % time model = LitCifar100(lr = 0.05 , factor = 0.5 ) model.datamodule = cifar100_dm trainer = pl.Trainer( gpus = 1 , max_epochs = 100 , progress_bar_refresh_rate = 100 , logger = pl.loggers.TensorBoardLogger( 'tblogs/' , name = 'regnetx_200mf' ), callbacks = [LearningRateMonitor(logging_interval = 'step' )], ) trainer.fit(model, cifar100_dm) trainer.test(model, datamodule = cifar100_dm); |
| Name | Type | Params --------------------------------- 0 | model | RegNet | 2.4 M --------------------------------- 2.4 M Trainable params 0 Non-trainable params 2.4 M Total params 9.421 Total estimated model params size (MB) (...) Epoch 33: reducing learning rate of group 0 to 2.5000e-02. Epoch 40: reducing learning rate of group 0 to 1.2500e-02. Epoch 47: reducing learning rate of group 0 to 6.2500e-03. Epoch 56: reducing learning rate of group 0 to 3.1250e-03. Epoch 72: reducing learning rate of group 0 to 1.5625e-03. Epoch 79: reducing learning rate of group 0 to 7.8125e-04. Epoch 95: reducing learning rate of group 0 to 3.9063e-04. (...) ------------------------------------------------------------------------------- DATALOADER:0 TEST RESULTS {'test_acc': 0.7339000105857849, 'test_loss': 1.152081847190857} -------------------------------------------------------------------------------- CPU times: user 1h 46min 25s, sys: 18min 45s, total: 2h 5min 10s Wall time: 2h 8min 30s
以上