HuggingFace Diffusers 0.12 : ノートブック : Stable Diffusion の画像-to-画像変換パイプライン (翻訳/解説)

翻訳 : (株)クラスキャットセールスインフォメーション
作成日時 : 02/23/2023 (v0.12.1)

* 本ページは、HuggingFace Diffusers の以下のドキュメントを翻訳した上で適宜、補足説明したものです：

Image2Image Pipeline for Stable Diffusion using 🧨 Diffusers

* サンプルコードの動作確認はしておりますが、必要な場合には適宜、追加改変しています。
* ご自由にリンクを張って頂いてかまいませんが、sales-info@classcat.com までご一報いただけると嬉しいです。

クラスキャット人工知能研究開発支援サービス

◆ クラスキャットは人工知能・テレワークに関する各種サービスを提供しています。お気軽にご相談ください :

人工知能研究開発支援
1. 人工知能研修サービス(経営者層向けオンサイト研修)
2. テクニカルコンサルティングサービス
3. 実証実験(プロトタイプ構築)
4. アプリケーションへの実装
人工知能研修サービス
PoC(概念実証)を失敗させないための支援

◆ 人工知能とビジネスをテーマに WEB セミナーを定期的に開催しています。スケジュール。

お住まいの地域に関係なく Web ブラウザからご参加頂けます。事前登録 が必要ですのでご注意ください。

◆ お問合せ : 本件に関するお問い合わせ先は下記までお願いいたします。

株式会社クラスキャット セールス・マーケティング本部セールス・インフォメーション
sales-info@classcat.com ; Web: www.classcat.com ; ClassCatJP

HuggingFace Diffusers 0.12 : ノートブック : Stable Diffusion の画像-to-画像変換パイプライン

このノートブックは、🤗 Hugging Face 🧨 Diffusers ライブラリを使用して Stable Diffusion モデルによるテキスト誘導画像-to-画像生成に対するカスタム diffusers パイプラインを作成する方法を示します。

Stable Diffusion モデルへの一般的なイントロダクションについてはこちらを参照してください。

!nvidia-smi

Thu Sep  8 18:58:28 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    27W / 300W |      0MiB / 16160MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

!pip install diffusers==0.11.1 transformers ftfy accelerate

🤗 Hugging Face ハブのプライベート & gated モデルを使用するには、ログインが必要です。(このノートブックの CompVis/stable-diffusion-v1-4 のような) パブリック・チェックポイントだけを使用している場合にはこのステップはスキップできます。

from huggingface_hub import notebook_login

notebook_login()

Image2Image パイプライン

import inspect
import warnings
from typing import List, Optional, Union

import torch
from tqdm.auto import tqdm

from diffusers import StableDiffusionImg2ImgPipeline

Load the pipeline

device = "cuda"
model_path = "CompVis/stable-diffusion-v1-4"

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
)
pipe = pipe.to(device)

初期画像をダウンロードしてそれを前処理し、それをパイプラインに渡すことができます。

import requests
from io import BytesIO
from PIL import Image

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_img = Image.open(BytesIO(response.content)).convert("RGB")
init_img = init_img.resize((768, 512))
init_img

プロンプトを定義してパイプラインを実行します。

prompt = "A fantasy landscape, trending on artstation"

ここで、strength は 0.0 と 1.0 間の値で、入力画像に追加されるノイズの総量を制御します。1.0 に近づく値は多くのバリエーションを可能にしますが、意味的に入力と一貫していない画像も生成します。

generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_img, strength=0.75, guidance_scale=7.5, generator=generator).images[0]

image

image = pipe(prompt=prompt, image=init_img, strength=0.5, guidance_scale=7.5, generator=generator).images[0]

image

ご覧のように、strength に対して低い値を使用するとき、生成画像は元の init_image に近くなります。

Now using LMSDiscreteScheduler

from diffusers import LMSDiscreteScheduler

lms = LMSDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.scheduler = lms

generator = torch.Generator(device=device).manual_seed(1024)
image = pipe(prompt=prompt, image=init_img, strength=0.75, guidance_scale=7.5, generator=generator).images[0]

image

以上

月	火	水	木	金	土	日
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28