MXNet Architecture: MXNet System Architecture (translation and commentary)
Translation: ClassCat Co., Ltd. Sales Information
Date: 02/21/2017

* This page is a translation of “Architecture : MXNet System Architecture” from the official MXNet site, with supplementary explanations added where appropriate:
http://mxnet.io/architecture/overview.html
* The images on this page are taken from GitHub.

MXNet System Architecture

The figure shows the major modules and components of MXNet and how they interact.
The modules are:

Runtime Dependency Engine: Schedules and executes the operations according to their read/write dependency.
Storage Allocator: Efficiently allocates and recycles memory blocks for GPU and CPU processors.
Resource Manager: Manages global resources, such as the random number generator and temporary space.
NDArray: Dynamic, asynchronous n-dimensional arrays, which provide flexible imperative programs for MXNet.
Symbolic Execution: Static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
Operator: Operators that define static forward and gradient calculation (backprop).
SimpleOp: Operators that extend NDArray operators and symbolic operators in a unified fashion.
Symbol Construction: Symbolic construction, which provides a way to construct a computation graph (net configuration).
KVStore: Key-value store interface for simple parameter synchronization.
Data Loading(IO): Efficient distributed data loading and augmentation.

MXNet System Components

Execution Engine

You can use MXNet’s engine not only for deep learning, but for any domain-specific problem. It’s designed to solve a general problem: execute a bunch of functions following their dependencies. Execution of any two functions with dependencies should be serialized. To boost performance, functions with no dependencies can be executed in parallel. For a general discussion of this topic, see the Note on Dependency Engine.
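The scheduling rule described above can be illustrated with a small sketch. This is not MXNet's engine or its API: `ToyEngine`, `push`, `schedule`, and `run` are hypothetical names, and the code runs in a single thread. It only shows how declared read/write sets decide which functions must be serialized and which could run in parallel.

```python
from typing import Callable, Iterable, List, Set, Tuple


class ToyEngine:
    """Hypothetical, single-threaded illustration of dependency scheduling."""

    def __init__(self) -> None:
        # Each entry: (function, variables it reads, variables it writes).
        self._ops: List[Tuple[Callable[[], None], Set[str], Set[str]]] = []

    def push(self, fn: Callable[[], None],
             reads: Iterable[str] = (), writes: Iterable[str] = ()) -> None:
        self._ops.append((fn, set(reads), set(writes)))

    def schedule(self) -> List[List[int]]:
        """Group op indices into waves; ops in one wave do not depend on each other."""
        level_of: List[int] = []
        for i, (_, reads, writes) in enumerate(self._ops):
            level = 0
            for j in range(i):
                _, reads_j, writes_j = self._ops[j]
                # Conflict if an earlier op wrote what we read or write,
                # or read what we are about to write.
                if (writes_j & reads) or (reads_j & writes) or (writes_j & writes):
                    level = max(level, level_of[j] + 1)
            level_of.append(level)
        waves: List[List[int]] = [[] for _ in range(max(level_of, default=-1) + 1)]
        for i, level in enumerate(level_of):
            waves[level].append(i)
        return waves

    def run(self) -> List[List[int]]:
        waves = self.schedule()
        for wave in waves:
            for i in wave:  # a real engine could execute these in parallel
                self._ops[i][0]()
        return waves


data = {"a": 0, "b": 0, "c": 0}
engine = ToyEngine()
engine.push(lambda: data.update(a=1), writes=["a"])          # op 0
engine.push(lambda: data.update(b=2), writes=["b"])          # op 1
engine.push(lambda: data.update(c=data["a"] + data["b"]),    # op 2
            reads=["a", "b"], writes=["c"])
waves = engine.run()
```

Here ops 0 and 1 touch different variables, so they land in the same wave, while op 2 reads both results and is placed in a later wave.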

Interface

KVStore: Multi-devices and multi-machines

MXNet uses a two-level parameter server for data synchronization.

On the first layer, data are synchronized over multiple devices within a single worker machine. A device could be a GPU card, a CPU, or another computational unit. We often use the sequential consistency model, also known as BSP, on this level.
On the second layer, data are synchronized over multiple workers by way of servers. We can use either a sequential consistency model for guaranteed convergence or a (partially) asynchronous model for better system performance.
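As a rough illustration of the two layers, the sketch below aggregates gradients first across the devices of each worker and then across workers before applying one update. It assumes the synchronous case and a plain gradient-descent step, neither of which is specified by the text above; `sum_vectors` and `sync_update` are hypothetical helper names, not MXNet functions.

```python
from typing import List

Vector = List[float]


def sum_vectors(vectors: List[Vector]) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(values) for values in zip(*vectors)]


def sync_update(weights: Vector,
                grads_per_worker: List[List[Vector]],
                lr: float) -> Vector:
    # Level 1: each worker aggregates the gradients of its own devices.
    worker_grads = [sum_vectors(device_grads) for device_grads in grads_per_worker]
    # Level 2: the server aggregates across workers and applies one update
    # (plain gradient descent is an assumption made for this sketch).
    total = sum_vectors(worker_grads)
    return [w - lr * g for w, g in zip(weights, total)]


# Two workers with two devices each; every inner list is one device's gradient.
grads = [
    [[1.0, 0.0], [1.0, 2.0]],  # worker 0
    [[0.0, 1.0], [2.0, 1.0]],  # worker 1
]
new_weights = sync_update([1.0, 1.0], grads, lr=0.25)
```

An asynchronous variant would instead apply each worker's aggregated gradient as soon as it arrives, without waiting for the other workers.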

KVStore

MXNet implements the two-level parameter server in the class KVStore. We currently provide the following three types, given the batch size b:

kvstore type | #devices | #workers | #examples per device | #examples per update | max delay
dist_sync    | k        | n        | b / k                | b × n                | 0
dist_async   | k        | n        | b / k                | b                    | inf
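The arithmetic in the table can be checked with a small sketch. The function names are hypothetical and cover only the two types listed above; b is the batch size, k the number of devices per worker, and n the number of workers.

```python
def examples_per_device(batch_size: int, num_devices: int) -> float:
    """b / k: each device of a worker processes an equal share of the batch."""
    return batch_size / num_devices


def examples_per_update(kvstore_type: str, batch_size: int, num_workers: int) -> int:
    """Number of examples that contribute to a single parameter update."""
    if kvstore_type == "dist_sync":
        # All n workers are aggregated before the update: b × n.
        return batch_size * num_workers
    if kvstore_type == "dist_async":
        # Each worker's batch triggers its own update: b.
        return batch_size
    raise ValueError(f"type not covered by this sketch: {kvstore_type}")


# Example values chosen for illustration: b = 128, k = 4, n = 2.
per_device = examples_per_device(128, 4)
per_update_sync = examples_per_update("dist_sync", 128, 2)
per_update_async = examples_per_update("dist_async", 128, 2)
```

With these values each device handles 32 examples, a dist_sync update covers 256 examples, and a dist_async update covers 128.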

End.
