λDNN is a cost-efficient function resource provisioning framework that minimizes the monetary cost while guaranteeing the performance of distributed DNN (DDNN) training workloads on serverless platforms.

icloud-ecnu/lambdadnn

λDNN: Achieving Predictable Distributed DNN Training with Serverless Architectures

Overview of λDNN

The λDNN framework runs on AWS Lambda and comprises two modules: a training performance predictor and a function resource provisioner. To guarantee the objective DDNN training time, the resource provisioner identifies a cost-efficient serverless function resource provisioning plan. Once the plan is determined, the function allocator sets up the chosen number of functions, each with an appropriate amount of memory.

Modeling DDNN Training Performance In Serverless Platforms

In general, a DNN model requires a number of iterations (denoted by k) to converge to an objective training loss value. Accordingly, the DDNN training time T is calculated by summing the loading time, the computation time, and the communication time.

The loading time is calculated as

Given n provisioned functions, the computation time t_comp of model gradients is defined as

The data communication time is calculated as
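
The way these three components combine can be illustrated with a minimal Python sketch. It assumes that data is loaded once and that computation and communication are paid on each of the k iterations; that composition, the class name `TrainingTimeEstimate`, and the example numbers are assumptions made here for illustration and are not taken from the paper's formulas.

```python
from dataclasses import dataclass


@dataclass
class TrainingTimeEstimate:
    """Hypothetical container for the three components of training time."""

    t_load: float  # one-off data loading time in seconds (assumed paid once)
    t_comp: float  # gradient computation time per iteration in seconds
    t_comm: float  # communication time per iteration in seconds

    def total(self, k: int) -> float:
        """Return the training time T for k iterations.

        Assumed composition: T = t_load + k * (t_comp + t_comm).
        """
        if k < 0:
            raise ValueError("k must be non-negative")
        return self.t_load + k * (self.t_comp + self.t_comm)


# Made-up example values, not measurements.
estimate = TrainingTimeEstimate(t_load=12.0, t_comp=0.5, t_comm=0.25)
total_seconds = estimate.total(k=1000)
print(f"Estimated training time: {total_seconds} s")
```

In practice each component would itself depend on the number of provisioned functions n and on the memory allocated to each function, which is what makes the provisioning plan matter.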
The objective is to minimize the monetary cost of the provisioned function resources while guaranteeing that DDNN training completes within the objective training time. The optimization problem is formally defined as
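
To make the shape of this problem concrete, the sketch below runs an exhaustive search over candidate plans: it keeps only the plans whose predicted training time meets the target and returns the cheapest one. It is not the provisioning algorithm used by λDNN. The function `cheapest_plan`, the stand-in `toy_predictor`, the per-GB-second price, and the candidate memory sizes are all hypothetical placeholders, and the cost model (functions × memory × duration × unit price) is a simplification.

```python
from typing import Callable, Iterable, Optional, Tuple

# Assumed unit price per GB-second; replace with the provider's current rate.
PRICE_PER_GB_SECOND = 0.0000166667

Plan = Tuple[int, int, float]  # (number of functions, memory in MB, cost)


def cheapest_plan(
    predict_time: Callable[[int, int], float],
    target_time: float,
    function_counts: Iterable[int],
    memory_sizes_mb: Iterable[int],
    price: float = PRICE_PER_GB_SECOND,
) -> Optional[Plan]:
    """Return the cheapest plan that meets target_time, or None if none does."""
    memory_options = list(memory_sizes_mb)  # reused for every function count
    best: Optional[Plan] = None
    for n in function_counts:
        for mem in memory_options:
            t = predict_time(n, mem)
            if t > target_time:
                continue  # violates the training-time constraint
            # Simplified cost: every function is billed for the whole run.
            cost = n * (mem / 1024.0) * t * price
            if best is None or cost < best[2]:
                best = (n, mem, cost)
    return best


def toy_predictor(n: int, mem_mb: int) -> float:
    """Made-up stand-in for the training performance predictor."""
    loading = 10.0
    computation = 3600.0 * 1024.0 / (n * mem_mb)  # shrinks with more resources
    communication = 2.0 * n                       # grows with more functions
    return loading + computation + communication


plan = cheapest_plan(
    toy_predictor,
    target_time=600.0,
    function_counts=range(1, 17),
    memory_sizes_mb=[512, 1024, 2048, 3008],
)
print(plan)
```

Exhaustive search is only practical here because the candidate grid is small; it is meant to illustrate the objective and the constraint, not an efficient way to solve the problem.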

Publication

Fei Xu, Yiling Qin, Li Chen, Zhi Zhou, Fangming Liu, “λDNN: Achieving Predictable Distributed DNN Training with Serverless Architectures,” IEEE Transactions on Computers, 2022, 71(2): 450-463. DOI:10.1109/TC.2021.3054656.

@article{xu2021lambdadnn,
  title={{$\lambda$DNN}: Achieving Predictable Distributed {DNN} Training with Serverless Architectures},
  author={Xu, Fei and Qin, Yiling and Chen, Li and Zhou, Zhi and Liu, Fangming},
  journal={IEEE Transactions on Computers},
  volume={71},
  number={2},
  pages={450--463},
  year={2022},
  publisher={IEEE}
}
