PyTorch Lightning 1.1: research: CIFAR10 (MobileNet)
Author: ClassCat Co., Ltd. (Sales Information)
Date: 02/20/2021 (1.1.x)
* This page is a report of experiments carried out with reference to the following resources:
* Feel free to link to this page, but we would appreciate a note to sales-info@classcat.com.
research: CIFAR10 (MobileNet)
Results

50 epochs: OneCycleLR
- MobileNetV2 – {'test_acc': 0.8402000069618225, 'test_loss': 0.4621441066265106} – Wall time: 56min 25s

50 epochs: CyclicLR
- MobileNetV2 – {'test_acc': 0.906499981880188, 'test_loss': 0.3456864356994629} – Wall time: 51min 54s

150 epochs: ReduceLROnPlateau
- MobileNet – {'test_acc': 0.8924999833106995, 'test_loss': 0.4093822240829468} – Wall time: 2h 2min 19s (Tesla K80)
- MobileNetV2 – {'test_acc': 0.9147999882698059, 'test_loss': 0.3481239378452301} – Wall time: 4h 39min 26s (Tesla K80)

A small sketch comparing the two step-based schedules (OneCycleLR and CyclicLR) follows this list.
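The gap between the two 50-epoch runs comes down to the learning-rate schedule. As an illustration (not part of the original notebook), the sketch below steps OneCycleLR and CyclicLR on a throwaway optimizer with the same settings used in the training code further down (max_lr=0.1, pct_start=0.1 for OneCycleLR; base_lr=0.0001, max_lr=0.1, step_size_up of two epochs' worth of batches, triangular2 for CyclicLR) and plots the per-step learning rate; matplotlib is assumed for plotting.

import torch
from torch.optim.lr_scheduler import OneCycleLR, CyclicLR
import matplotlib.pyplot as plt

epochs, steps_per_epoch = 50, 45000 // 32   # 45,000 training images, batch size 32

def lr_curve(make_sched):
    # Dummy parameter/optimizer: we only record the LR, no training happens.
    opt = torch.optim.SGD([torch.nn.Parameter(torch.zeros(1))], lr=0.05, momentum=0.9)
    sched = make_sched(opt)
    lrs = []
    for _ in range(epochs * steps_per_epoch):
        lrs.append(sched.get_last_lr()[0])
        opt.step()
        sched.step()
    return lrs

one_cycle = lr_curve(lambda o: OneCycleLR(o, max_lr=0.1, pct_start=0.1,
                                          epochs=epochs, steps_per_epoch=steps_per_epoch))
cyclic = lr_curve(lambda o: CyclicLR(o, base_lr=0.0001, max_lr=0.1,
                                     step_size_up=steps_per_epoch * 2, mode="triangular2"))

plt.plot(one_cycle, label="OneCycleLR")
plt.plot(cyclic, label="CyclicLR (triangular2)")
plt.xlabel("step"); plt.ylabel("learning rate"); plt.legend(); plt.show()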
Code
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride

        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1,
                               groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)

        self.shortcut = nn.Sequential()
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.shortcut(x) if self.stride == 1 else out
        return out


class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1,  16, 1, 1),
           (6,  24, 2, 1),  # NOTE: change stride 2 -> 1 for CIFAR10
           (6,  32, 3, 2),
           (6,  64, 4, 2),
           (6,  96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        # NOTE: change conv1 stride 2 -> 1 for CIFAR10
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            strides = [stride] + [1] * (num_blocks - 1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        # NOTE: change pooling kernel_size 7 -> 4 for CIFAR10
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
net = MobileNetV2()
print(net)

x = torch.randn(2, 3, 32, 32)
y = net(x)
print(y.size())
MobileNetV2( (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False) (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (layers): Sequential( (0): Block( (conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False) (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (1): Block( (conv1): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False) (bn2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(16, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (2): Block( (conv1): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False) (bn2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (3): Block( (conv1): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False) (bn2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (4): Block( (conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False) (bn2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (5): Block( (conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False) (bn2): BatchNorm2d(192, eps=1e-05, 
momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (6): Block( (conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False) (bn2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (7): Block( (conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False) (bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (8): Block( (conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False) (bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (9): Block( (conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False) (bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (10): Block( (conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False) (bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(64, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) (11): Block( (conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False) (bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), 
bias=False) (bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (12): Block( (conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False) (bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (13): Block( (conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False) (bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (14): Block( (conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False) (bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (15): Block( (conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False) (bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential() ) (16): Block( (conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False) (bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (conv3): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (shortcut): Sequential( (0): Conv2d(160, 320, kernel_size=(1, 1), stride=(1, 1), bias=False) (1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) ) ) ) (conv2): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False) (bn2): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (linear): Linear(in_features=1280, out_features=10, bias=True) ) torch.Size([2, 10])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
from torchsummary import summary

summary(MobileNetV2().to('cuda'), (3, 32, 32))
---------------------------------------------------------------- Layer (type) Output Shape Param # ================================================================ Conv2d-1 [-1, 32, 32, 32] 864 BatchNorm2d-2 [-1, 32, 32, 32] 64 Conv2d-3 [-1, 32, 32, 32] 1,024 BatchNorm2d-4 [-1, 32, 32, 32] 64 Conv2d-5 [-1, 32, 32, 32] 288 BatchNorm2d-6 [-1, 32, 32, 32] 64 Conv2d-7 [-1, 16, 32, 32] 512 BatchNorm2d-8 [-1, 16, 32, 32] 32 Conv2d-9 [-1, 16, 32, 32] 512 BatchNorm2d-10 [-1, 16, 32, 32] 32 Block-11 [-1, 16, 32, 32] 0 Conv2d-12 [-1, 96, 32, 32] 1,536 BatchNorm2d-13 [-1, 96, 32, 32] 192 Conv2d-14 [-1, 96, 32, 32] 864 BatchNorm2d-15 [-1, 96, 32, 32] 192 Conv2d-16 [-1, 24, 32, 32] 2,304 BatchNorm2d-17 [-1, 24, 32, 32] 48 Conv2d-18 [-1, 24, 32, 32] 384 BatchNorm2d-19 [-1, 24, 32, 32] 48 Block-20 [-1, 24, 32, 32] 0 Conv2d-21 [-1, 144, 32, 32] 3,456 BatchNorm2d-22 [-1, 144, 32, 32] 288 Conv2d-23 [-1, 144, 32, 32] 1,296 BatchNorm2d-24 [-1, 144, 32, 32] 288 Conv2d-25 [-1, 24, 32, 32] 3,456 BatchNorm2d-26 [-1, 24, 32, 32] 48 Block-27 [-1, 24, 32, 32] 0 Conv2d-28 [-1, 144, 32, 32] 3,456 BatchNorm2d-29 [-1, 144, 32, 32] 288 Conv2d-30 [-1, 144, 16, 16] 1,296 BatchNorm2d-31 [-1, 144, 16, 16] 288 Conv2d-32 [-1, 32, 16, 16] 4,608 BatchNorm2d-33 [-1, 32, 16, 16] 64 Block-34 [-1, 32, 16, 16] 0 Conv2d-35 [-1, 192, 16, 16] 6,144 BatchNorm2d-36 [-1, 192, 16, 16] 384 Conv2d-37 [-1, 192, 16, 16] 1,728 BatchNorm2d-38 [-1, 192, 16, 16] 384 Conv2d-39 [-1, 32, 16, 16] 6,144 BatchNorm2d-40 [-1, 32, 16, 16] 64 Block-41 [-1, 32, 16, 16] 0 Conv2d-42 [-1, 192, 16, 16] 6,144 BatchNorm2d-43 [-1, 192, 16, 16] 384 Conv2d-44 [-1, 192, 16, 16] 1,728 BatchNorm2d-45 [-1, 192, 16, 16] 384 Conv2d-46 [-1, 32, 16, 16] 6,144 BatchNorm2d-47 [-1, 32, 16, 16] 64 Block-48 [-1, 32, 16, 16] 0 Conv2d-49 [-1, 192, 16, 16] 6,144 BatchNorm2d-50 [-1, 192, 16, 16] 384 Conv2d-51 [-1, 192, 8, 8] 1,728 BatchNorm2d-52 [-1, 192, 8, 8] 384 Conv2d-53 [-1, 64, 8, 8] 12,288 BatchNorm2d-54 [-1, 64, 8, 8] 128 Block-55 [-1, 64, 8, 8] 0 Conv2d-56 [-1, 384, 8, 8] 24,576 BatchNorm2d-57 [-1, 384, 8, 8] 768 Conv2d-58 [-1, 384, 8, 8] 3,456 BatchNorm2d-59 [-1, 384, 8, 8] 768 Conv2d-60 [-1, 64, 8, 8] 24,576 BatchNorm2d-61 [-1, 64, 8, 8] 128 Block-62 [-1, 64, 8, 8] 0 Conv2d-63 [-1, 384, 8, 8] 24,576 BatchNorm2d-64 [-1, 384, 8, 8] 768 Conv2d-65 [-1, 384, 8, 8] 3,456 BatchNorm2d-66 [-1, 384, 8, 8] 768 Conv2d-67 [-1, 64, 8, 8] 24,576 BatchNorm2d-68 [-1, 64, 8, 8] 128 Block-69 [-1, 64, 8, 8] 0 Conv2d-70 [-1, 384, 8, 8] 24,576 BatchNorm2d-71 [-1, 384, 8, 8] 768 Conv2d-72 [-1, 384, 8, 8] 3,456 BatchNorm2d-73 [-1, 384, 8, 8] 768 Conv2d-74 [-1, 64, 8, 8] 24,576 BatchNorm2d-75 [-1, 64, 8, 8] 128 Block-76 [-1, 64, 8, 8] 0 Conv2d-77 [-1, 384, 8, 8] 24,576 BatchNorm2d-78 [-1, 384, 8, 8] 768 Conv2d-79 [-1, 384, 8, 8] 3,456 BatchNorm2d-80 [-1, 384, 8, 8] 768 Conv2d-81 [-1, 96, 8, 8] 36,864 BatchNorm2d-82 [-1, 96, 8, 8] 192 Conv2d-83 [-1, 96, 8, 8] 6,144 BatchNorm2d-84 [-1, 96, 8, 8] 192 Block-85 [-1, 96, 8, 8] 0 Conv2d-86 [-1, 576, 8, 8] 55,296 BatchNorm2d-87 [-1, 576, 8, 8] 1,152 Conv2d-88 [-1, 576, 8, 8] 5,184 BatchNorm2d-89 [-1, 576, 8, 8] 1,152 Conv2d-90 [-1, 96, 8, 8] 55,296 BatchNorm2d-91 [-1, 96, 8, 8] 192 Block-92 [-1, 96, 8, 8] 0 Conv2d-93 [-1, 576, 8, 8] 55,296 BatchNorm2d-94 [-1, 576, 8, 8] 1,152 Conv2d-95 [-1, 576, 8, 8] 5,184 BatchNorm2d-96 [-1, 576, 8, 8] 1,152 Conv2d-97 [-1, 96, 8, 8] 55,296 BatchNorm2d-98 [-1, 96, 8, 8] 192 Block-99 [-1, 96, 8, 8] 0 Conv2d-100 [-1, 576, 8, 8] 55,296 BatchNorm2d-101 [-1, 576, 8, 8] 1,152 Conv2d-102 [-1, 576, 4, 4] 5,184 BatchNorm2d-103 
[-1, 576, 4, 4] 1,152 Conv2d-104 [-1, 160, 4, 4] 92,160 BatchNorm2d-105 [-1, 160, 4, 4] 320 Block-106 [-1, 160, 4, 4] 0 Conv2d-107 [-1, 960, 4, 4] 153,600 BatchNorm2d-108 [-1, 960, 4, 4] 1,920 Conv2d-109 [-1, 960, 4, 4] 8,640 BatchNorm2d-110 [-1, 960, 4, 4] 1,920 Conv2d-111 [-1, 160, 4, 4] 153,600 BatchNorm2d-112 [-1, 160, 4, 4] 320 Block-113 [-1, 160, 4, 4] 0 Conv2d-114 [-1, 960, 4, 4] 153,600 BatchNorm2d-115 [-1, 960, 4, 4] 1,920 Conv2d-116 [-1, 960, 4, 4] 8,640 BatchNorm2d-117 [-1, 960, 4, 4] 1,920 Conv2d-118 [-1, 160, 4, 4] 153,600 BatchNorm2d-119 [-1, 160, 4, 4] 320 Block-120 [-1, 160, 4, 4] 0 Conv2d-121 [-1, 960, 4, 4] 153,600 BatchNorm2d-122 [-1, 960, 4, 4] 1,920 Conv2d-123 [-1, 960, 4, 4] 8,640 BatchNorm2d-124 [-1, 960, 4, 4] 1,920 Conv2d-125 [-1, 320, 4, 4] 307,200 BatchNorm2d-126 [-1, 320, 4, 4] 640 Conv2d-127 [-1, 320, 4, 4] 51,200 BatchNorm2d-128 [-1, 320, 4, 4] 640 Block-129 [-1, 320, 4, 4] 0 Conv2d-130 [-1, 1280, 4, 4] 409,600 BatchNorm2d-131 [-1, 1280, 4, 4] 2,560 Linear-132 [-1, 10] 12,810 ================================================================ Total params: 2,296,922 Trainable params: 2,296,922 Non-trainable params: 0 ---------------------------------------------------------------- Input size (MB): 0.01 Forward/backward pass size (MB): 27.37 Params size (MB): 8.76 Estimated Total Size (MB): 36.14 ----------------------------------------------------------------
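The "Total params" figure reported by torchsummary can be cross-checked directly from the model, without the library; a minimal check (not in the original notebook):

# Sum the element counts of all registered parameters; this should match
# the torchsummary total above.
net = MobileNetV2()
n_params = sum(p.numel() for p in net.parameters())
print(f"{n_params:,} parameters")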
! pip install pytorch-lightning pytorch-lightning-bolts -qU
import torch
import torch.nn as nn
import torch.nn.functional as F
# ReduceLROnPlateau is used by the 150-epoch runs below.
from torch.optim.lr_scheduler import (
    OneCycleLR, CyclicLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau)
from torch.optim.swa_utils import AveragedModel, update_bn

import torchvision

import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, GPUStatsMonitor, EarlyStopping
from pytorch_lightning.metrics.functional import accuracy

from pl_bolts.datamodules import CIFAR10DataModule
from pl_bolts.transforms.dataset_normalizations import cifar10_normalization
pl.seed_everything(7);
Global seed set to 7
batch_size = 32

train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

cifar10_dm = CIFAR10DataModule(
    batch_size=batch_size,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)
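CIFAR10DataModule takes care of downloading CIFAR-10 and splitting the 50,000 training images into 45,000 for training and 5,000 for validation, which is where the steps_per_epoch = 45000 // batch_size used in the schedulers below comes from. For reference only, a roughly equivalent setup with plain torchvision is sketched here; it is a stand-in that reuses the transforms and batch_size defined above, the split size mirrors the datamodule's default, and for simplicity the validation split reuses train_transforms (unlike the datamodule).

from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10

# 50,000 training images split 45,000 / 5,000, mirroring CIFAR10DataModule's default.
full_train = CIFAR10('./data', train=True, download=True, transform=train_transforms)
train_set, val_set = random_split(full_train, [45000, 5000],
                                  generator=torch.Generator().manual_seed(7))
test_set = CIFAR10('./data', train=False, download=True, transform=test_transforms)

train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=batch_size, num_workers=2)
test_loader = DataLoader(test_set, batch_size=batch_size, num_workers=2)

# These loaders could be passed as trainer.fit(model, train_loader, val_loader)
# instead of handing the trainer a datamodule.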
OneCycleLR
class LitCifar10MobileNetV2(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    # Disabled alternative (the 'x' prefix keeps Lightning from picking it up): SGD + CyclicLR.
    def xconfigure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9)
        #optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        scheduler_dict = {
            'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=5, mode="triangular2"),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}

    def configure_optimizers(self):
        print("###")
        print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        # CIFAR10DataModule uses a 45,000 / 5,000 train/val split.
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.1,
                                    epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10MobileNetV2(lr=0.05)
model.datamodule = cifar10_dm

lr_monitor = LearningRateMonitor(logging_interval='step')
gpu_stats = GPUStatsMonitor()
#early_stopping = EarlyStopping(monitor='val_loss', patience=3)

trainer = pl.Trainer(
    gpus=1,
    max_epochs=50,
    #auto_scale_batch_size=True,
    #auto_lr_find=True,
    progress_bar_refresh_rate=50,
    logger=pl.loggers.TensorBoardLogger('lightning_logs/', name='mobilenet2'),
    callbacks=[lr_monitor, gpu_stats],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True TPU available: None, using: 0 TPU cores | Name | Type | Params -------------------------------------- 0 | model | MobileNetV2 | 2.3 M -------------------------------------- 2.3 M Trainable params 0 Non-trainable params 2.3 M Total params 9.188 Total estimated model params size (MB) ... -------------------------------------------------------------------------------- DATALOADER:0 TEST RESULTS {'test_acc': 0.8402000069618225, 'test_loss': 0.4621441066265106} -------------------------------------------------------------------------------- CPU times: user 47min 20s, sys: 4min 17s, total: 51min 37s Wall time: 56min 25s
# Start tensorboard.
%reload_ext tensorboard
%tensorboard --logdir lightning_logs/
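Lightning's default ModelCheckpoint also saves a checkpoint under the logger's directory, so a finished run can be re-evaluated later without retraining. A minimal sketch; the checkpoint path below is hypothetical (the actual file name depends on the run's version folder and the epoch at which it was saved):

# Hypothetical path: the real file sits under
# lightning_logs/mobilenet2/version_X/checkpoints/.
ckpt_path = 'lightning_logs/mobilenet2/version_0/checkpoints/epoch=49.ckpt'

model = LitCifar10MobileNetV2.load_from_checkpoint(ckpt_path)
trainer = pl.Trainer(gpus=1, progress_bar_refresh_rate=50)
trainer.test(model, datamodule=cifar10_dm)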
CyclicLR
class LitCifar10MobileNetV2(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        #print("###")
        #print(self.hparams)
        #optimizer = torch.optim.RMSprop(self.parameters(), lr=self.hparams.lr, momentum=0.9)
        #optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)

        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            #'scheduler': OneCycleLR(optimizer, max_lr=0.05, pct_start=0.25, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.0001, max_lr=0.05, step_size_up=steps_per_epoch*2, mode="triangular2"),
            # step_size_up is two epochs' worth of batches, so one full triangular cycle spans 4 epochs.
            'scheduler': CyclicLR(optimizer, base_lr=0.0001, max_lr=0.1,
                                  step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10MobileNetV2(lr=0.05)
model.datamodule = cifar10_dm

lr_monitor = LearningRateMonitor(logging_interval='step')
#gpu_stats = GPUStatsMonitor()
#early_stopping = EarlyStopping(monitor='val_loss', patience=3)

trainer = pl.Trainer(
    gpus=1,
    max_epochs=50,
    #auto_scale_batch_size=True,
    #auto_lr_find=True,
    progress_bar_refresh_rate=50,
    logger=pl.loggers.TensorBoardLogger('lightning_logs/', name='mobilenet2'),
    callbacks=[lr_monitor],
    # deterministic=True
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True TPU available: None, using: 0 TPU cores | Name | Type | Params -------------------------------------- 0 | model | MobileNetV2 | 2.3 M -------------------------------------- 2.3 M Trainable params 0 Non-trainable params 2.3 M Total params 9.188 Total estimated model params size (MB) ... -------------------------------------------------------------------------------- DATALOADER:0 TEST RESULTS {'test_acc': 0.906499981880188, 'test_loss': 0.3456864356994629} -------------------------------------------------------------------------------- CPU times: user 46min 57s, sys: 2min 25s, total: 49min 23s Wall time: 51min 54s
ReduceLROnPlateau

batch_size = 50

train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

cifar10_dm = CIFAR10DataModule(
    batch_size=batch_size,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)
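The 150-epoch runs use two models, MobileNet (v1) and MobileNetV2, but only MobileNetV2 is defined on this page; the MobileNet class that LitCifar10 instantiates below is missing. The sketch here is a CIFAR-10-style depthwise-separable MobileNet written in the same spirit as the MobileNetV2 code above (and roughly matching the 3.2 M parameters shown in the training log); the exact definition behind the reported result is not shown here, so treat this as a stand-in. It reuses the nn / F imports from the top of the page, and the block class is named DepthwiseBlock to avoid clashing with the Block already defined for MobileNetV2.

class DepthwiseBlock(nn.Module):
    '''Depthwise 3x3 conv followed by pointwise 1x1 conv.'''
    def __init__(self, in_planes, out_planes, stride=1):
        super(DepthwiseBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, in_planes, kernel_size=3, stride=stride,
                               padding=1, groups=in_planes, bias=False)
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        return out


class MobileNet(nn.Module):
    # An int entry is the output width with stride 1; a (width, 2) tuple downsamples.
    cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2),
           512, 512, 512, 512, 512, (1024, 2), 1024]

    def __init__(self, num_classes=10):
        super(MobileNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.linear = nn.Linear(1024, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for x in self.cfg:
            out_planes = x if isinstance(x, int) else x[0]
            stride = 1 if isinstance(x, int) else x[1]
            layers.append(DepthwiseBlock(in_planes, out_planes, stride))
            in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.avg_pool2d(out, 2)    # a 32x32 input has been reduced to 2x2 here
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out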
class LitCifar10(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNet()   # MobileNet (v1); not defined on this page -- see the sketch above

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        if False:
            optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr, weight_decay=0, eps=1e-3)
        else:
            optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        return {
            'optimizer': optimizer,
            # Shrink the LR by 20% whenever val_acc has not improved for 4 epochs.
            'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=4, factor=0.8, verbose=True,
                                              threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
            'monitor': 'val_acc'
        }

    # Disabled alternative (the 'x' prefix keeps Lightning from picking it up): SGD + OneCycleLR.
    def xconfigure_optimizers(self):
        #print("###")
        #print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.2,
                                    epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch, mode="exp_range", gamma=0.85),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10(lr=0.05)
model.datamodule = cifar10_dm

trainer = pl.Trainer(
    gpus=1,
    max_epochs=150,
    auto_scale_batch_size=True,
    auto_lr_find=True,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='mobilenet'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True TPU available: None, using: 0 TPU cores Files already downloaded and verified Files already downloaded and verified | Name | Type | Params ------------------------------------ 0 | model | MobileNet | 3.2 M ------------------------------------ 3.2 M Trainable params 0 Non-trainable params 3.2 M Total params 12.869 Total estimated model params size (MB) (...) Epoch 11: reducing learning rate of group 0 to 4.0000e-02. Epoch 19: reducing learning rate of group 0 to 3.2000e-02. Epoch 29: reducing learning rate of group 0 to 2.5600e-02. Epoch 37: reducing learning rate of group 0 to 2.0480e-02. Epoch 51: reducing learning rate of group 0 to 1.6384e-02. Epoch 58: reducing learning rate of group 0 to 1.3107e-02. Epoch 69: reducing learning rate of group 0 to 1.0486e-02. Epoch 79: reducing learning rate of group 0 to 8.3886e-03. Epoch 88: reducing learning rate of group 0 to 6.7109e-03. Epoch 95: reducing learning rate of group 0 to 5.3687e-03. Epoch 102: reducing learning rate of group 0 to 4.2950e-03. Epoch 117: reducing learning rate of group 0 to 3.4360e-03. Epoch 124: reducing learning rate of group 0 to 2.7488e-03. Epoch 132: reducing learning rate of group 0 to 2.1990e-03. Epoch 138: reducing learning rate of group 0 to 1.7592e-03. Epoch 144: reducing learning rate of group 0 to 1.4074e-03. -------------------------------------------------------------------------------- DATALOADER:0 TEST RESULTS {'test_acc': 0.8924999833106995, 'test_loss': 0.4093822240829468} -------------------------------------------------------------------------------- CPU times: user 1h 45min 54s, sys: 14min 37s, total: 2h 31s Wall time: 2h 2min 19s
class LitCifar10(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        if False:
            optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr, weight_decay=0, eps=1e-3)
        else:
            optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        return {
            'optimizer': optimizer,
            'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=4, factor=0.8, verbose=True,
                                              threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
            'monitor': 'val_acc'
        }

    def xconfigure_optimizers(self):
        #print("###")
        #print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.2,
                                    epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch, mode="exp_range", gamma=0.85),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10(lr=0.05)
model.datamodule = cifar10_dm

trainer = pl.Trainer(
    gpus=1,
    max_epochs=150,
    auto_scale_batch_size=True,
    auto_lr_find=True,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='mobilenet2'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True TPU available: None, using: 0 TPU cores Files already downloaded and verified Files already downloaded and verified | Name | Type | Params -------------------------------------- 0 | model | MobileNetV2 | 2.3 M -------------------------------------- 2.3 M Trainable params 0 Non-trainable params 2.3 M Total params 9.188 Total estimated model params size (MB) (...) Epoch 16: reducing learning rate of group 0 to 4.0000e-02. Epoch 29: reducing learning rate of group 0 to 3.2000e-02. Epoch 39: reducing learning rate of group 0 to 2.5600e-02. Epoch 45: reducing learning rate of group 0 to 2.0480e-02. Epoch 51: reducing learning rate of group 0 to 1.6384e-02. Epoch 60: reducing learning rate of group 0 to 1.3107e-02. Epoch 69: reducing learning rate of group 0 to 1.0486e-02. Epoch 79: reducing learning rate of group 0 to 8.3886e-03. Epoch 85: reducing learning rate of group 0 to 6.7109e-03. Epoch 96: reducing learning rate of group 0 to 5.3687e-03. Epoch 102: reducing learning rate of group 0 to 4.2950e-03. Epoch 113: reducing learning rate of group 0 to 3.4360e-03. Epoch 119: reducing learning rate of group 0 to 2.7488e-03. Epoch 128: reducing learning rate of group 0 to 2.1990e-03. Epoch 135: reducing learning rate of group 0 to 1.7592e-03. Epoch 143: reducing learning rate of group 0 to 1.4074e-03. Epoch 149: reducing learning rate of group 0 to 1.1259e-03. (...) -------------------------------------------------------------------------------- DATALOADER:0 TEST RESULTS {'test_acc': 0.9147999882698059, 'test_loss': 0.3481239378452301} -------------------------------------------------------------------------------- CPU times: user 3h 55min 40s, sys: 41min 56s, total: 4h 37min 36s Wall time: 4h 39min 26s
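The "reducing learning rate" lines in this log are simply successive multiplications of the initial lr=0.05 by factor=0.8 (min_lr=1e-5 is never reached over the 17 reductions). The sequence can be reproduced with a one-liner:

# 17 reductions: 0.05 * 0.8, 0.05 * 0.8**2, ... down to 0.05 * 0.8**17.
lrs = [0.05 * 0.8 ** k for k in range(1, 18)]
print(', '.join(f'{lr:.4e}' for lr in lrs))
# 4.0000e-02, 3.2000e-02, 2.5600e-02, ..., 1.4074e-03, 1.1259e-03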
That is all.