MXNet Architecture: MXNet System Architecture

MXNet Architecture: MXNet System Architecture (translation and commentary)
Translation: ClassCat Co., Ltd. Sales Information
Date: 02/21/2017

* This page is a translation of “Architecture : MXNet System Architecture” from the official MXNet site, with supplementary explanations added where appropriate:
    http://mxnet.io/architecture/overview.html
* The images on this page are quoted from GitHub.

 

MXNet System Architecture

 
The figure shows the major modules and components of MXNet and how they interact.
The modules are:

  • Runtime Dependency Engine: Schedules and executes the operations according to their read/write dependency.
  • Storage Allocator: Efficiently allocates and recycles memory blocks for GPU and CPU processors.
  • Resource Manager: Manages global resources, such as the random number generator and temporary space.
  • NDArray: Dynamic, asynchronous n-dimensional arrays, which provide flexible imperative programs for MXNet.
  • Symbolic Execution: Static symbolic graph executor, which provides efficient symbolic graph execution and optimization.
  • Operator: Operators that define static forward and gradient calculation (backprop).
  • SimpleOp: Operators that extend NDArray operators and symbolic operators in a unified fashion.
  • Symbol Construction: Symbolic construction, which provides a way to construct a computation graph (net configuration).
  • KVStore: Key-value store interface for simple parameter synchronization.
  • Data Loading(IO): Efficient distributed data loading and augmentation.
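
As a rough illustration of the difference between the imperative style associated with NDArray and the build-then-run style associated with Symbol Construction and Symbolic Execution, here is a minimal pure-Python sketch. It does not use MXNet; the names `Var` and `Op` are hypothetical and exist only for this example.

```python
# Pure-Python illustration only; no MXNet API is called here.

# Imperative style: each statement computes its value immediately.
a = 2.0
b = 3.0
imperative_result = (a + b) * b


# Symbolic style: first describe the computation graph, then execute it.
class Var:
    """Hypothetical placeholder node, bound to a value at execution time."""

    def __init__(self, name):
        self.name = name

    def eval(self, env):
        return env[self.name]

    def __add__(self, other):
        return Op(lambda x, y: x + y, self, other)

    def __mul__(self, other):
        return Op(lambda x, y: x * y, self, other)


class Op(Var):
    """Hypothetical binary operation node; inherits the operator overloads."""

    def __init__(self, fn, lhs, rhs):
        self.fn, self.lhs, self.rhs = fn, lhs, rhs

    def eval(self, env):
        return self.fn(self.lhs.eval(env), self.rhs.eval(env))


# Building the graph performs no arithmetic yet.
graph = (Var("a") + Var("b")) * Var("b")
# Execution happens only when values are bound to the placeholders.
symbolic_result = graph.eval({"a": 2.0, "b": 3.0})
```

Because the whole graph is known before it runs, an executor has the chance to optimize it as a unit, which is the benefit the Symbolic Execution bullet describes.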

 

MXNet System Components

Execution Engine

You can use MXNet’s engine not only for deep learning, but for any domain-specific problem. It’s designed to solve a general problem: execute a bunch of functions following their dependencies. Execution of any two functions with dependencies should be serialized. To boost performance, functions with no dependencies can be executed in parallel. For a general discussion of this topic, see the Note on Dependency Engine.
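
The scheduling rule described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not MXNet's engine or its interface: the class `ToyEngine`, its `push` and `wait_all` methods, and the `reads`/`writes` parameters are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class ToyEngine:
    """Hypothetical scheduler: orders functions by the variables they read and write."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_write = {}  # variable name -> future of its latest writer
        self._readers = {}     # variable name -> futures reading since that write

    def push(self, fn, reads=(), writes=()):
        deps = []
        for var in reads:
            # A read must wait for the latest write to the same variable.
            if var in self._last_write:
                deps.append(self._last_write[var])
        for var in writes:
            # A write must wait for the previous write and all reads since it.
            if var in self._last_write:
                deps.append(self._last_write[var])
            deps.extend(self._readers.get(var, []))

        def run():
            wait(deps)  # block until every dependency has finished
            return fn()

        future = self._pool.submit(run)
        for var in reads:
            self._readers.setdefault(var, []).append(future)
        for var in writes:
            self._last_write[var] = future
            self._readers[var] = []
        return future

    def wait_all(self):
        self._pool.shutdown(wait=True)


data = {"a": 0, "b": 0, "c": 0}
engine = ToyEngine()
engine.push(lambda: data.update(a=1), writes=["a"])
engine.push(lambda: data.update(b=2), writes=["b"])  # independent of the write to "a"
engine.push(lambda: data.update(c=data["a"] + data["b"]),
            reads=["a", "b"], writes=["c"])          # must run after both writes
engine.wait_all()
```

The first two functions touch different variables, so they may run in parallel; the third reads both and is therefore serialized after them.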

Interface

 

KVStore: Multiple Devices and Multiple Machines

MXNet uses a two-level parameter server for data synchronization.

  • On the first layer, data are synchronized over multiple devices within a single worker machine. A device could be a GPU card, a CPU, or another computational unit. We often use the sequential consistency model, also known as BSP, on this level.
  • On the second layer, data are synchronized over multiple workers by way of servers. We can use either a sequential consistency model for guaranteed convergence or a (partial)-asynchronous model for better system performance.
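
The two levels can be pictured with a small pure-Python sketch. It assumes gradients are plain lists of floats and that the second level is synchronous; the function names `aggregate_devices` and `sync_update` and the learning rate `lr` are hypothetical and not part of MXNet.

```python
def aggregate_devices(device_grads):
    """Level 1: sum the gradients from every device inside one worker."""
    return [sum(vals) for vals in zip(*device_grads)]


def sync_update(weights, worker_grads, lr=0.1):
    """Level 2 (synchronous): the server collects all workers, sums, then updates once."""
    total = [sum(vals) for vals in zip(*worker_grads)]
    return [w - lr * g for w, g in zip(weights, total)]


# Two workers with two devices each; each device holds a gradient for two parameters.
worker0 = aggregate_devices([[1.0, 2.0], [3.0, 4.0]])
worker1 = aggregate_devices([[0.5, 0.5], [0.5, 0.5]])
weights = sync_update([10.0, 10.0], [worker0, worker1], lr=1.0)
```

An asynchronous second level would instead apply each worker's aggregated gradient as soon as it arrives, without waiting for the other workers.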

    KVStore

    MXNet implements the two-level parameter server in the class KVStore. We currently provide the following three types, described here for a given batch size b:

    kvstore type   #devices   #workers   #ex per device   #ex per update   max delay
    dist_sync      k          n          b / k            b × n            0
    dist_async     k          n          b / k            b                inf
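
    As a worked example of the table's formulas, the sketch below plugs in sample values. The numbers b = 128, k = 4, n = 2 and the helper names are assumptions made for illustration only.

```python
def examples_per_device(b, k):
    """Each of the k devices in a worker handles b / k examples."""
    return b // k


def examples_per_update(kvstore_type, b, n):
    """Number of examples that contribute to one parameter update."""
    if kvstore_type == "dist_sync":
        return b * n   # all n workers are combined into a single update
    if kvstore_type == "dist_async":
        return b       # each worker's batch triggers its own update
    raise ValueError("unknown kvstore type: " + kvstore_type)


b, k, n = 128, 4, 2    # assumed sample values
per_device = examples_per_device(b, k)
sync_per_update = examples_per_update("dist_sync", b, n)
async_per_update = examples_per_update("dist_async", b, n)
```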
     
