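MXNet Architecture : MXNet System Architecture (translation and commentary)
Translation : ClassCat Co., Ltd. Sales Information
Date : 02/21/2017
* This page is a translation, with supplementary explanation where appropriate, of “Architecture : MXNet System Architecture” from the official MXNet site: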
http://mxnet.io/architecture/overview.html
* The images on this page are quoted from GitHub.
MXNet System Architecture
The figure shows the major modules and components of MXNet and how they interact.
The modules are:
- Runtime Dependency Engine: Schedules and executes the operations according to their read/write dependency.
- Storage Allocator: Efficiently allocates and recycles memory blocks for GPU and CPU processors.
- Resource Manager: Manages global resources, such as the random number generator and temporary space.
- NDArray: Dynamic, asynchronous n-dimensional arrays, which provide flexible imperative programs for MXNet.
- Symbolic Execution: Static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
- Operator: Operators that define static forward and gradient calculation (backprop).
- SimpleOp: Operators that extend NDArray operators and symbolic operators in a unified fashion.
- Symbol Construction: Symbolic construction, which provides a way to construct a computation graph (net configuration).
- KVStore: Key-value store interface for easy parameter synchronization.
- Data Loading(IO): Efficient distributed data loading and augmentation.
MXNet System Components
Execution Engine
You can use MXNet’s engine not only for deep learning, but for any domain-specific problem. It’s designed to solve a general problem: execute a bunch of functions following their dependencies. Execution of any two functions with dependencies should be serialized. To boost performance, functions with no dependencies can be executed in parallel. For a general discussion of this topic, see the Note on Dependency Engine.
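The scheduling rule described above can be illustrated with a toy sketch. This is not MXNet's engine or its API: the function `schedule` and the `(name, reads, writes)` tuple format are hypothetical names chosen for illustration, and grouping operations into sequential "waves" is a simplification of a real asynchronous engine. It only shows how read/write dependencies decide which functions must be serialized and which could run in parallel.

```python
def schedule(ops):
    """Group operations into waves that respect read/write dependencies.

    `ops` is a list of (name, reads, writes) tuples in program order.
    Operations in the same wave share no dependency and could run in
    parallel; the waves themselves run one after another.
    (Hypothetical helper for illustration, not part of MXNet.)
    """
    last_write = {}  # variable -> wave of the most recent write
    last_read = {}   # variable -> latest wave that read the variable
    waves = []
    for name, reads, writes in ops:
        wave = 0
        for var in reads:
            # Read-after-write: wait for the latest writer of each input.
            if var in last_write:
                wave = max(wave, last_write[var] + 1)
        for var in writes:
            # Write-after-write and write-after-read: wait for earlier users.
            if var in last_write:
                wave = max(wave, last_write[var] + 1)
            if var in last_read:
                wave = max(wave, last_read[var] + 1)
        for var in reads:
            last_read[var] = max(last_read.get(var, -1), wave)
        for var in writes:
            last_write[var] = wave
        while len(waves) <= wave:
            waves.append([])
        waves[wave].append(name)
    return waves


ops = [
    ("A = 1", [], ["A"]),
    ("B = 2", [], ["B"]),
    ("C = A + B", ["A", "B"], ["C"]),
    ("D = A * 2", ["A"], ["D"]),
    ("A = 5", [], ["A"]),
]
waves = schedule(ops)
for i, wave in enumerate(waves):
    print(i, wave)
```

In this example the two initial assignments are independent and share a wave, the computations of C and D both read A and can run together once A is written, and the final write to A has to wait until both readers have finished.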
Interface
KVStore: Multiple devices and multiple machines
MXNet uses a two-level parameter server for data synchronization.
- On the first layer, data are synchronized over multiple devices within a single worker machine. A device could be a GPU card, a CPU, or another computational unit. We often use the sequential consistency model, also known as BSP, on this level.
- On the second layer, data are synchronized over multiple workers by way of servers. We can use either a sequential consistency model for guaranteed convergence or a (partial)-asynchronous model for better system performance.
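The two levels can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not MXNet's KVStore implementation: the function names `aggregate_worker` and `sync_update` are hypothetical, and treating synchronization as a gradient sum followed by one SGD step with learning rate `lr` is an assumption made for the example. It shows the sequential-consistency case only, where the server waits for every worker before updating.

```python
def aggregate_worker(device_grads):
    """Level 1: sum the gradients of all devices on one worker machine.

    `device_grads` is a list with one gradient vector (list of floats)
    per device. (Hypothetical helper, not an MXNet function.)
    """
    return [sum(vals) for vals in zip(*device_grads)]


def sync_update(weights, workers, lr):
    """Level 2, synchronous: combine all workers, then update once.

    `workers` holds, for each worker, the list of per-device gradients.
    The SGD update rule here is an assumption for illustration.
    """
    worker_grads = [aggregate_worker(devs) for devs in workers]
    total = [sum(vals) for vals in zip(*worker_grads)]
    return [w - lr * g for w, g in zip(weights, total)]


workers = [
    [[1.0, 2.0], [3.0, 4.0]],  # worker 0 with two devices
    [[0.0, 1.0], [1.0, 1.0]],  # worker 1 with two devices
]
new_weights = sync_update([10.0, 10.0], workers, lr=0.5)
print(new_weights)
```

An asynchronous variant would instead apply each worker's aggregated gradient as soon as it arrives, which trades the convergence guarantee for throughput as described above.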
KVStore
MXNet implements the two-level parameter server in the class KVStore. We currently provide the following three types. In the table below, b is the batch size.
kvstore type   #devices   #workers   #ex per device   #ex per update   max delay
dist_sync      k          n          b / k            b × n            0
dist_async     k          n          b / k            b                inf
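As a worked example of the quantities listed for dist_sync and dist_async, the sketch below plugs in concrete numbers. The function name `examples_per_update` is hypothetical, and the values b = 128, k = 4, n = 8 are made up for illustration. Since each of the k devices processes b / k examples, b is read here as the batch size of a single worker.

```python
def examples_per_update(kvstore_type, b, k, n):
    """Return (examples per device, examples per update).

    b: batch size of one worker, k: devices per worker, n: workers.
    Only the two types shown for this example are handled.
    (Hypothetical helper, not part of MXNet.)
    """
    per_device = b / k
    if kvstore_type == "dist_sync":
        # All n workers contribute to every update.
        per_update = b * n
    elif kvstore_type == "dist_async":
        # Each update uses the batch of a single worker.
        per_update = b
    else:
        raise ValueError("unsupported kvstore type: " + kvstore_type)
    return per_device, per_update


sync_result = examples_per_update("dist_sync", b=128, k=4, n=8)
async_result = examples_per_update("dist_async", b=128, k=4, n=8)
print(sync_result, async_result)
```

With these numbers each device handles 32 examples in both modes, while one update covers 1024 examples under dist_sync and 128 under dist_async.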