PyTorch Lightning 1.1: research : CIFAR10 (MobileNet)
Created by: ClassCat Co., Ltd. Sales Information
Created on: 02/20/2021 (1.1.x)
* This page reports the results of experiments carried out with reference to the following resources:
* Feel free to link to this page, but we would appreciate a note to sales-info@classcat.com.
research: CIFAR10 (MobileNet)
Results
50 epochs : OneCycleLR
- MobileNetV2 – {'test_acc': 0.8402000069618225, 'test_loss': 0.4621441066265106} – Wall time: 56min 25s
50 epochs : CyclicLR
- MobileNetV2 – {'test_acc': 0.906499981880188, 'test_loss': 0.3456864356994629} – Wall time: 51min 54s
150 epochs : ReduceLROnPlateau
- MobileNet – {'test_acc': 0.8924999833106995, 'test_loss': 0.4093822240829468} – Wall time: 2h 2min 19s (Tesla K80)
- MobileNetV2 – {'test_acc': 0.9147999882698059, 'test_loss': 0.3481239378452301} – Wall time: 4h 39min 26s (Tesla K80)
Code
import torch
import torch.nn as nn
import torch.nn.functional as F
class Block(nn.Module):
    '''expand + depthwise + pointwise'''
    def __init__(self, in_planes, out_planes, expansion, stride):
        super(Block, self).__init__()
        self.stride = stride

        planes = expansion * in_planes
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, groups=planes, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn3 = nn.BatchNorm2d(out_planes)

        self.shortcut = nn.Sequential()
        if stride == 1 and in_planes != out_planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_planes),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out = out + self.shortcut(x) if self.stride == 1 else out
        return out

class MobileNetV2(nn.Module):
    # (expansion, out_planes, num_blocks, stride)
    cfg = [(1,  16, 1, 1),
           (6,  24, 2, 1),  # NOTE: change stride 2 -> 1 for CIFAR10
           (6,  32, 3, 2),
           (6,  64, 4, 2),
           (6,  96, 3, 1),
           (6, 160, 3, 2),
           (6, 320, 1, 1)]

    def __init__(self, num_classes=10):
        super(MobileNetV2, self).__init__()
        # NOTE: change conv1 stride 2 -> 1 for CIFAR10
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.conv2 = nn.Conv2d(320, 1280, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(1280)
        self.linear = nn.Linear(1280, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for expansion, out_planes, num_blocks, stride in self.cfg:
            strides = [stride] + [1]*(num_blocks-1)
            for stride in strides:
                layers.append(Block(in_planes, out_planes, expansion, stride))
                in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.relu(self.bn2(self.conv2(out)))
        # NOTE: change pooling kernel_size 7 -> 4 for CIFAR10
        out = F.avg_pool2d(out, 4)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
net = MobileNetV2()
print(net)

x = torch.randn(2, 3, 32, 32)
y = net(x)
print(y.size())
MobileNetV2(
(conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layers): Sequential(
(0): Block(
(conv1): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
(bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Block(
(conv1): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
(bn2): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(16, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): Block(
(conv1): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False)
(bn2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(3): Block(
(conv1): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
(bn2): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(4): Block(
(conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
(bn2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(5): Block(
(conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
(bn2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(6): Block(
(conv1): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
(bn2): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(7): Block(
(conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
(bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(8): Block(
(conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
(bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(9): Block(
(conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
(bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(10): Block(
(conv1): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
(bn2): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(64, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(11): Block(
(conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(12): Block(
(conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(13): Block(
(conv1): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False)
(bn2): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(14): Block(
(conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(15): Block(
(conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential()
)
(16): Block(
(conv1): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
(bn2): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(shortcut): Sequential(
(0): Conv2d(160, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv2): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn2): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(linear): Linear(in_features=1280, out_features=10, bias=True)
)
torch.Size([2, 10])
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device
device(type='cuda')
from torchsummary import summary
summary(MobileNetV2().to('cuda'), (3, 32, 32))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 32, 32, 32] 864
BatchNorm2d-2 [-1, 32, 32, 32] 64
Conv2d-3 [-1, 32, 32, 32] 1,024
BatchNorm2d-4 [-1, 32, 32, 32] 64
Conv2d-5 [-1, 32, 32, 32] 288
BatchNorm2d-6 [-1, 32, 32, 32] 64
Conv2d-7 [-1, 16, 32, 32] 512
BatchNorm2d-8 [-1, 16, 32, 32] 32
Conv2d-9 [-1, 16, 32, 32] 512
BatchNorm2d-10 [-1, 16, 32, 32] 32
Block-11 [-1, 16, 32, 32] 0
Conv2d-12 [-1, 96, 32, 32] 1,536
BatchNorm2d-13 [-1, 96, 32, 32] 192
Conv2d-14 [-1, 96, 32, 32] 864
BatchNorm2d-15 [-1, 96, 32, 32] 192
Conv2d-16 [-1, 24, 32, 32] 2,304
BatchNorm2d-17 [-1, 24, 32, 32] 48
Conv2d-18 [-1, 24, 32, 32] 384
BatchNorm2d-19 [-1, 24, 32, 32] 48
Block-20 [-1, 24, 32, 32] 0
Conv2d-21 [-1, 144, 32, 32] 3,456
BatchNorm2d-22 [-1, 144, 32, 32] 288
Conv2d-23 [-1, 144, 32, 32] 1,296
BatchNorm2d-24 [-1, 144, 32, 32] 288
Conv2d-25 [-1, 24, 32, 32] 3,456
BatchNorm2d-26 [-1, 24, 32, 32] 48
Block-27 [-1, 24, 32, 32] 0
Conv2d-28 [-1, 144, 32, 32] 3,456
BatchNorm2d-29 [-1, 144, 32, 32] 288
Conv2d-30 [-1, 144, 16, 16] 1,296
BatchNorm2d-31 [-1, 144, 16, 16] 288
Conv2d-32 [-1, 32, 16, 16] 4,608
BatchNorm2d-33 [-1, 32, 16, 16] 64
Block-34 [-1, 32, 16, 16] 0
Conv2d-35 [-1, 192, 16, 16] 6,144
BatchNorm2d-36 [-1, 192, 16, 16] 384
Conv2d-37 [-1, 192, 16, 16] 1,728
BatchNorm2d-38 [-1, 192, 16, 16] 384
Conv2d-39 [-1, 32, 16, 16] 6,144
BatchNorm2d-40 [-1, 32, 16, 16] 64
Block-41 [-1, 32, 16, 16] 0
Conv2d-42 [-1, 192, 16, 16] 6,144
BatchNorm2d-43 [-1, 192, 16, 16] 384
Conv2d-44 [-1, 192, 16, 16] 1,728
BatchNorm2d-45 [-1, 192, 16, 16] 384
Conv2d-46 [-1, 32, 16, 16] 6,144
BatchNorm2d-47 [-1, 32, 16, 16] 64
Block-48 [-1, 32, 16, 16] 0
Conv2d-49 [-1, 192, 16, 16] 6,144
BatchNorm2d-50 [-1, 192, 16, 16] 384
Conv2d-51 [-1, 192, 8, 8] 1,728
BatchNorm2d-52 [-1, 192, 8, 8] 384
Conv2d-53 [-1, 64, 8, 8] 12,288
BatchNorm2d-54 [-1, 64, 8, 8] 128
Block-55 [-1, 64, 8, 8] 0
Conv2d-56 [-1, 384, 8, 8] 24,576
BatchNorm2d-57 [-1, 384, 8, 8] 768
Conv2d-58 [-1, 384, 8, 8] 3,456
BatchNorm2d-59 [-1, 384, 8, 8] 768
Conv2d-60 [-1, 64, 8, 8] 24,576
BatchNorm2d-61 [-1, 64, 8, 8] 128
Block-62 [-1, 64, 8, 8] 0
Conv2d-63 [-1, 384, 8, 8] 24,576
BatchNorm2d-64 [-1, 384, 8, 8] 768
Conv2d-65 [-1, 384, 8, 8] 3,456
BatchNorm2d-66 [-1, 384, 8, 8] 768
Conv2d-67 [-1, 64, 8, 8] 24,576
BatchNorm2d-68 [-1, 64, 8, 8] 128
Block-69 [-1, 64, 8, 8] 0
Conv2d-70 [-1, 384, 8, 8] 24,576
BatchNorm2d-71 [-1, 384, 8, 8] 768
Conv2d-72 [-1, 384, 8, 8] 3,456
BatchNorm2d-73 [-1, 384, 8, 8] 768
Conv2d-74 [-1, 64, 8, 8] 24,576
BatchNorm2d-75 [-1, 64, 8, 8] 128
Block-76 [-1, 64, 8, 8] 0
Conv2d-77 [-1, 384, 8, 8] 24,576
BatchNorm2d-78 [-1, 384, 8, 8] 768
Conv2d-79 [-1, 384, 8, 8] 3,456
BatchNorm2d-80 [-1, 384, 8, 8] 768
Conv2d-81 [-1, 96, 8, 8] 36,864
BatchNorm2d-82 [-1, 96, 8, 8] 192
Conv2d-83 [-1, 96, 8, 8] 6,144
BatchNorm2d-84 [-1, 96, 8, 8] 192
Block-85 [-1, 96, 8, 8] 0
Conv2d-86 [-1, 576, 8, 8] 55,296
BatchNorm2d-87 [-1, 576, 8, 8] 1,152
Conv2d-88 [-1, 576, 8, 8] 5,184
BatchNorm2d-89 [-1, 576, 8, 8] 1,152
Conv2d-90 [-1, 96, 8, 8] 55,296
BatchNorm2d-91 [-1, 96, 8, 8] 192
Block-92 [-1, 96, 8, 8] 0
Conv2d-93 [-1, 576, 8, 8] 55,296
BatchNorm2d-94 [-1, 576, 8, 8] 1,152
Conv2d-95 [-1, 576, 8, 8] 5,184
BatchNorm2d-96 [-1, 576, 8, 8] 1,152
Conv2d-97 [-1, 96, 8, 8] 55,296
BatchNorm2d-98 [-1, 96, 8, 8] 192
Block-99 [-1, 96, 8, 8] 0
Conv2d-100 [-1, 576, 8, 8] 55,296
BatchNorm2d-101 [-1, 576, 8, 8] 1,152
Conv2d-102 [-1, 576, 4, 4] 5,184
BatchNorm2d-103 [-1, 576, 4, 4] 1,152
Conv2d-104 [-1, 160, 4, 4] 92,160
BatchNorm2d-105 [-1, 160, 4, 4] 320
Block-106 [-1, 160, 4, 4] 0
Conv2d-107 [-1, 960, 4, 4] 153,600
BatchNorm2d-108 [-1, 960, 4, 4] 1,920
Conv2d-109 [-1, 960, 4, 4] 8,640
BatchNorm2d-110 [-1, 960, 4, 4] 1,920
Conv2d-111 [-1, 160, 4, 4] 153,600
BatchNorm2d-112 [-1, 160, 4, 4] 320
Block-113 [-1, 160, 4, 4] 0
Conv2d-114 [-1, 960, 4, 4] 153,600
BatchNorm2d-115 [-1, 960, 4, 4] 1,920
Conv2d-116 [-1, 960, 4, 4] 8,640
BatchNorm2d-117 [-1, 960, 4, 4] 1,920
Conv2d-118 [-1, 160, 4, 4] 153,600
BatchNorm2d-119 [-1, 160, 4, 4] 320
Block-120 [-1, 160, 4, 4] 0
Conv2d-121 [-1, 960, 4, 4] 153,600
BatchNorm2d-122 [-1, 960, 4, 4] 1,920
Conv2d-123 [-1, 960, 4, 4] 8,640
BatchNorm2d-124 [-1, 960, 4, 4] 1,920
Conv2d-125 [-1, 320, 4, 4] 307,200
BatchNorm2d-126 [-1, 320, 4, 4] 640
Conv2d-127 [-1, 320, 4, 4] 51,200
BatchNorm2d-128 [-1, 320, 4, 4] 640
Block-129 [-1, 320, 4, 4] 0
Conv2d-130 [-1, 1280, 4, 4] 409,600
BatchNorm2d-131 [-1, 1280, 4, 4] 2,560
Linear-132 [-1, 10] 12,810
================================================================
Total params: 2,296,922
Trainable params: 2,296,922
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.01
Forward/backward pass size (MB): 27.37
Params size (MB): 8.76
Estimated Total Size (MB): 36.14
----------------------------------------------------------------
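As a quick cross-check of the torchsummary table, the same parameter count can be computed directly from the module:
# Should match "Total params: 2,296,922" reported above.
net = MobileNetV2()
print(sum(p.numel() for p in net.parameters()))                       # total params
print(sum(p.numel() for p in net.parameters() if p.requires_grad))    # trainable params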
! pip install pytorch-lightning pytorch-lightning-bolts -qU
import torch
import torch.nn as nn
import torch.nn.functional as F
# ReduceLROnPlateau is added here because it is used by the 150-epoch runs below.
from torch.optim.lr_scheduler import OneCycleLR, CyclicLR, ExponentialLR, CosineAnnealingLR, ReduceLROnPlateau
from torch.optim.swa_utils import AveragedModel, update_bn

import torchvision

import pytorch_lightning as pl
from pytorch_lightning.callbacks import LearningRateMonitor, GPUStatsMonitor, EarlyStopping
from pytorch_lightning.metrics.functional import accuracy

from pl_bolts.datamodules import CIFAR10DataModule
from pl_bolts.transforms.dataset_normalizations import cifar10_normalization
pl.seed_everything(7);
Global seed set to 7
batch_size = 32
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

cifar10_dm = CIFAR10DataModule(
    batch_size=batch_size,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)
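Both 50-epoch runs below step their scheduler once per optimizer step and compute steps_per_epoch as 45000 // batch_size. This assumes the CIFAR10DataModule holds back 5,000 of the 50,000 CIFAR-10 training images for validation (an assumption about its val_split setting at the time). A quick sanity check, reusing the names above:
# Assumed 45,000/5,000 train/val split behind the schedulers' steps_per_epoch.
steps_per_epoch = 45000 // batch_size
print(steps_per_epoch)  # 1406 with batch_size = 32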
OneCycleLR
class LitCifar10MobileNetV2(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    # Disabled alternative (the 'x' prefix keeps Lightning from calling it);
    # rename it to configure_optimizers to train with CyclicLR instead.
    def xconfigure_optimizers(self):
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9)
        #optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        scheduler_dict = {
            'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=5, mode="triangular2"),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}

    def configure_optimizers(self):
        print("###")
        print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.1, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.25, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10MobileNetV2(lr=0.05)
model.datamodule = cifar10_dm

lr_monitor = LearningRateMonitor(logging_interval='step')
gpu_stats = GPUStatsMonitor()
#early_stopping = EarlyStopping(monitor='val_loss', patience=3)

trainer = pl.Trainer(
    gpus=1,
    max_epochs=50,
    #auto_scale_batch_size=True,
    #auto_lr_find=True,
    progress_bar_refresh_rate=50,
    logger=pl.loggers.TensorBoardLogger('lightning_logs/', name='mobilenet2'),
    callbacks=[lr_monitor, gpu_stats],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
| Name | Type | Params
--------------------------------------
0 | model | MobileNetV2 | 2.3 M
--------------------------------------
2.3 M Trainable params
0 Non-trainable params
2.3 M Total params
9.188 Total estimated model params size (MB)
...
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.8402000069618225, 'test_loss': 0.4621441066265106}
--------------------------------------------------------------------------------
CPU times: user 47min 20s, sys: 4min 17s, total: 51min 37s
Wall time: 56min 25s
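For reference, with batch_size = 32 the OneCycleLR schedule above spans 50 * 1406 = 70,300 optimizer steps, and pct_start=0.1 puts roughly the first 7,030 of them in the warm-up ramp:
steps_per_epoch = 45000 // 32            # 1406
total_steps = 50 * steps_per_epoch       # 70300
warmup_steps = int(0.1 * total_steps)    # 7030 steps spent increasing the LR
print(total_steps, warmup_steps)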
# Start tensorboard.
%reload_ext tensorboard
%tensorboard --logdir lightning_logs/




CyclicLR
class LitCifar10MobileNetV2(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        #print("###")
        #print(self.hparams)
        #optimizer = torch.optim.RMSprop(self.parameters(), lr=self.hparams.lr, momentum=0.9)
        #optimizer = torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            #'scheduler': OneCycleLR(optimizer, max_lr=0.05, pct_start=0.25, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.0001, max_lr=0.05, step_size_up=steps_per_epoch*2, mode="triangular2"),
            'scheduler': CyclicLR(optimizer, base_lr=0.0001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10MobileNetV2(lr=0.05)
model.datamodule = cifar10_dm

lr_monitor = LearningRateMonitor(logging_interval='step')
#gpu_stats = GPUStatsMonitor()
#early_stopping = EarlyStopping(monitor='val_loss', patience=3)

trainer = pl.Trainer(
    gpus=1,
    max_epochs=50,
    #auto_scale_batch_size=True,
    #auto_lr_find=True,
    progress_bar_refresh_rate=50,
    logger=pl.loggers.TensorBoardLogger('lightning_logs/', name='mobilenet2'),
    callbacks=[lr_monitor],
    #deterministic=True,
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
| Name | Type | Params
--------------------------------------
0 | model | MobileNetV2 | 2.3 M
--------------------------------------
2.3 M Trainable params
0 Non-trainable params
2.3 M Total params
9.188 Total estimated model params size (MB)
...
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.906499981880188, 'test_loss': 0.3456864356994629}
--------------------------------------------------------------------------------
CPU times: user 46min 57s, sys: 2min 25s, total: 49min 23s
Wall time: 51min 54s
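The learning-rate curve that LearningRateMonitor writes to TensorBoard can also be reproduced offline. A minimal sketch with a dummy one-parameter optimizer and the same CyclicLR settings as the run above, reusing the imports already made:
# Simulate 50 epochs of per-step CyclicLR updates without training anything.
dummy_opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.05, momentum=0.9)
steps_per_epoch = 45000 // 32
sched = CyclicLR(dummy_opt, base_lr=0.0001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2")
lrs = []
for _ in range(50 * steps_per_epoch):
    dummy_opt.step()
    sched.step()
    lrs.append(sched.get_last_lr()[0])
print(f"min lr = {min(lrs):.6f}, max lr = {max(lrs):.6f}")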


ReduceLROnPlateau
batch_size = 50

train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomCrop(32, padding=4),
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    cifar10_normalization(),
])

cifar10_dm = CIFAR10DataModule(
    batch_size=batch_size,
    train_transforms=train_transforms,
    test_transforms=test_transforms,
    val_transforms=test_transforms,
)
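The 150-epoch MobileNet run below instantiates a MobileNet (v1) class that is not reproduced in this report. The sketch that follows is an assumption: it mirrors the widely used CIFAR-10 MobileNet implementation built from depthwise separable convolutions (about 3.2M parameters, consistent with the parameter count in the training log) and reuses the torch / nn / F imports above. The building block is named SeparableBlock here only to avoid clashing with the inverted-residual Block defined earlier.
class SeparableBlock(nn.Module):
    '''Depthwise 3x3 conv followed by pointwise 1x1 conv (MobileNet v1 block).'''
    def __init__(self, in_planes, out_planes, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, in_planes, kernel_size=3, stride=stride, padding=1, groups=in_planes, bias=False)
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        return out

class MobileNet(nn.Module):
    # out_planes per block; a tuple marks a stride-2 block
    cfg = [64, (128, 2), 128, (256, 2), 256, (512, 2), 512, 512, 512, 512, 512, (1024, 2), 1024]

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(32)
        self.layers = self._make_layers(in_planes=32)
        self.linear = nn.Linear(1024, num_classes)

    def _make_layers(self, in_planes):
        layers = []
        for x in self.cfg:
            out_planes = x if isinstance(x, int) else x[0]
            stride = 1 if isinstance(x, int) else x[1]
            layers.append(SeparableBlock(in_planes, out_planes, stride))
            in_planes = out_planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layers(out)
        out = F.avg_pool2d(out, 2)   # 2x2 feature map on 32x32 CIFAR inputs
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out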
class LitCifar10(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNet()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        if False:
            optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr, weight_decay=0, eps=1e-3)
        else:
            optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        return {
            'optimizer': optimizer,
            'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=4, factor=0.8, verbose=True, threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
            'monitor': 'val_acc'
        }

    # Disabled alternative (the 'x' prefix keeps Lightning from calling it);
    # rename it to configure_optimizers to train with OneCycleLR instead.
    def xconfigure_optimizers(self):
        #print("###")
        #print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.2, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch, mode="exp_range", gamma=0.85),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10(lr=0.05)
model.datamodule = cifar10_dm

trainer = pl.Trainer(
    gpus=1,
    max_epochs=150,
    auto_scale_batch_size=True,
    auto_lr_find=True,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='mobilenet'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
Files already downloaded and verified
Files already downloaded and verified
| Name | Type | Params
------------------------------------
0 | model | MobileNet | 3.2 M
------------------------------------
3.2 M Trainable params
0 Non-trainable params
3.2 M Total params
12.869 Total estimated model params size (MB)
(...)
Epoch 11: reducing learning rate of group 0 to 4.0000e-02.
Epoch 19: reducing learning rate of group 0 to 3.2000e-02.
Epoch 29: reducing learning rate of group 0 to 2.5600e-02.
Epoch 37: reducing learning rate of group 0 to 2.0480e-02.
Epoch 51: reducing learning rate of group 0 to 1.6384e-02.
Epoch 58: reducing learning rate of group 0 to 1.3107e-02.
Epoch 69: reducing learning rate of group 0 to 1.0486e-02.
Epoch 79: reducing learning rate of group 0 to 8.3886e-03.
Epoch 88: reducing learning rate of group 0 to 6.7109e-03.
Epoch 95: reducing learning rate of group 0 to 5.3687e-03.
Epoch 102: reducing learning rate of group 0 to 4.2950e-03.
Epoch 117: reducing learning rate of group 0 to 3.4360e-03.
Epoch 124: reducing learning rate of group 0 to 2.7488e-03.
Epoch 132: reducing learning rate of group 0 to 2.1990e-03.
Epoch 138: reducing learning rate of group 0 to 1.7592e-03.
Epoch 144: reducing learning rate of group 0 to 1.4074e-03.
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.8924999833106995, 'test_loss': 0.4093822240829468}
--------------------------------------------------------------------------------
CPU times: user 1h 45min 54s, sys: 14min 37s, total: 2h 31s
Wall time: 2h 2min 19s
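The learning-rate values in the log above follow directly from factor=0.8: each reduction multiplies the current learning rate by 0.8, starting from lr=0.05.
# 0.05 * 0.8**k for the first few reductions: [0.04, 0.032, 0.0256, 0.02048, 0.016384]
print([round(0.05 * 0.8**k, 6) for k in range(1, 6)])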
class LitCifar10(pl.LightningModule):
    def __init__(self, lr=0.05):
        super().__init__()
        self.save_hyperparameters()
        self.model = MobileNetV2()

    def forward(self, x):
        out = self.model(x)
        return F.log_softmax(out, dim=1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        logits = F.log_softmax(self.model(x), dim=1)
        loss = F.nll_loss(logits, y)
        self.log('train_loss', loss)
        return loss

    def evaluate(self, batch, stage=None):
        x, y = batch
        logits = self(x)
        loss = F.nll_loss(logits, y)
        preds = torch.argmax(logits, dim=1)
        acc = accuracy(preds, y)
        if stage:
            self.log(f'{stage}_loss', loss, prog_bar=True)
            self.log(f'{stage}_acc', acc, prog_bar=True)

    def validation_step(self, batch, batch_idx):
        self.evaluate(batch, 'val')

    def test_step(self, batch, batch_idx):
        self.evaluate(batch, 'test')

    def configure_optimizers(self):
        if False:
            optimizer = torch.optim.Adam(self.parameters(), lr=self.hparams.lr, weight_decay=0, eps=1e-3)
        else:
            optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        return {
            'optimizer': optimizer,
            'lr_scheduler': ReduceLROnPlateau(optimizer, 'max', patience=4, factor=0.8, verbose=True, threshold=0.0001, threshold_mode='abs', cooldown=1, min_lr=1e-5),
            'monitor': 'val_acc'
        }

    # Disabled alternative (the 'x' prefix keeps Lightning from calling it);
    # rename it to configure_optimizers to train with OneCycleLR instead.
    def xconfigure_optimizers(self):
        #print("###")
        #print(self.hparams)
        optimizer = torch.optim.SGD(self.parameters(), lr=self.hparams.lr, momentum=0.9, weight_decay=5e-4)
        steps_per_epoch = 45000 // batch_size
        scheduler_dict = {
            #'scheduler': ExponentialLR(optimizer, gamma=0.1),
            #'interval': 'epoch',
            'scheduler': OneCycleLR(optimizer, max_lr=0.1, pct_start=0.2, epochs=self.trainer.max_epochs, steps_per_epoch=steps_per_epoch),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch*2, mode="triangular2"),
            #'scheduler': CyclicLR(optimizer, base_lr=0.001, max_lr=0.1, step_size_up=steps_per_epoch, mode="exp_range", gamma=0.85),
            #'scheduler': CosineAnnealingLR(optimizer, T_max=200),
            'interval': 'step',
        }
        return {'optimizer': optimizer, 'lr_scheduler': scheduler_dict}
%%time
model = LitCifar10(lr=0.05)
model.datamodule = cifar10_dm

trainer = pl.Trainer(
    gpus=1,
    max_epochs=150,
    auto_scale_batch_size=True,
    auto_lr_find=True,
    progress_bar_refresh_rate=100,
    logger=pl.loggers.TensorBoardLogger('tblogs/', name='mobilenet2'),
    callbacks=[LearningRateMonitor(logging_interval='step')],
)

trainer.fit(model, cifar10_dm)
trainer.test(model, datamodule=cifar10_dm);
GPU available: True, used: True
TPU available: None, using: 0 TPU cores
Files already downloaded and verified
Files already downloaded and verified
| Name | Type | Params
--------------------------------------
0 | model | MobileNetV2 | 2.3 M
--------------------------------------
2.3 M Trainable params
0 Non-trainable params
2.3 M Total params
9.188 Total estimated model params size (MB)
(...)
Epoch 16: reducing learning rate of group 0 to 4.0000e-02.
Epoch 29: reducing learning rate of group 0 to 3.2000e-02.
Epoch 39: reducing learning rate of group 0 to 2.5600e-02.
Epoch 45: reducing learning rate of group 0 to 2.0480e-02.
Epoch 51: reducing learning rate of group 0 to 1.6384e-02.
Epoch 60: reducing learning rate of group 0 to 1.3107e-02.
Epoch 69: reducing learning rate of group 0 to 1.0486e-02.
Epoch 79: reducing learning rate of group 0 to 8.3886e-03.
Epoch 85: reducing learning rate of group 0 to 6.7109e-03.
Epoch 96: reducing learning rate of group 0 to 5.3687e-03.
Epoch 102: reducing learning rate of group 0 to 4.2950e-03.
Epoch 113: reducing learning rate of group 0 to 3.4360e-03.
Epoch 119: reducing learning rate of group 0 to 2.7488e-03.
Epoch 128: reducing learning rate of group 0 to 2.1990e-03.
Epoch 135: reducing learning rate of group 0 to 1.7592e-03.
Epoch 143: reducing learning rate of group 0 to 1.4074e-03.
Epoch 149: reducing learning rate of group 0 to 1.1259e-03.
(...)
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'test_acc': 0.9147999882698059, 'test_loss': 0.3481239378452301}
--------------------------------------------------------------------------------
CPU times: user 3h 55min 40s, sys: 41min 56s, total: 4h 37min 36s
Wall time: 4h 39min 26s



That's all.