
Summary of GSoC 2019 (Run GPU Sharing Workloads with Kubernetes + Kubeflow)

Student: Jianbo Ma (majb2114@zju.edu.cn)

Mentors: Harry Zhang (@resouer), Kai Zhang (@wsxiaozhang), Jian He (@jian-he)


Being able to participate in GSoC has been a fortunate experience for me. Over the past three months, I have improved my engineering skills with the help of my mentors, and I am very grateful for that. Now that GSoC 2019 is nearly over, this is my summary of the work done during this period.

Project description

GPUSharing is an open-source project that enables GPU sharing by leveraging Kubernetes scheduling and Device Plugin extensibility.
Arena is a command-line interface that lets data scientists run and monitor machine learning training jobs and check their results in an easy way. On the backend, it is based on Kubernetes, Helm, and Kubeflow, but data scientists need very little knowledge of Kubernetes to use it. Its goal is to make data scientists feel as if they were working on a single machine while actually having the power of a GPU cluster behind them.


My work consists of two stages:

  • Integrate arena with GPUSharing in the tensorflow-serving scenario.
  • Integrate Nvidia MPS as the option for isolation.

Stage 1: Integrate arena with GPUSharing in the tensorflow-serving scenario


  • Finish an end-to-end tf-serving task using GPUShare.
  • Check the GPUMemory resource of the Kubernetes cluster.
  • Finish a user guide for tf-serving with GPUShare.


1. per_process_gpu_memory_fraction

per_process_gpu_memory_fraction is the fraction of the GPU memory space that each process is allowed to occupy. The value is between 0.0 and 1.0 (with 0.0 as the default).
If it is 1.0, the server allocates all of the GPU memory at startup.
If it is 0.0, TensorFlow automatically selects a value.

For example, if we want the serving job to occupy half of the GPU resources, we can set per_process_gpu_memory_fraction to 0.5.
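As a quick worked example (with made-up numbers, not from a real cluster): for a card with 16 GiB of memory and a serving job that should get half of it, the fraction comes out to 0.5.

```python
# Hypothetical numbers for illustration only.
total_gib = 16      # total memory of the GPU card
required_gib = 8    # memory the serving job should occupy

fraction = required_gib / total_gib
print(fraction)  # 0.5
```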

2. The design process.

Goal: after a user submits a serving task, we need to calculate the correct per_process_gpu_memory_fraction and pass it as a parameter of the serving task.

per_process_gpu_memory_fraction = (required GPUMemory) / (total GPUMemory of the allocated GPU card)

  • The GPU memory the serving task requires is transformed into spec.container.resource.limits.aliyun.com/gpu-mem.
  • After the GPUShare scheduler extender and device plugin process the pod, environment variables are generated.
  • The required GPUMemory equals ALIYUN_COM_GPU_MEM_CONTAINER; the total GPUMemory of the GPU card equals ALIYUN_COM_GPU_MEM_DEV.
  • per_process_gpu_memory_fraction = $ALIYUN_COM_GPU_MEM_CONTAINER / $ALIYUN_COM_GPU_MEM_DEV
  • In the GPUShare situation, convert per_process_gpu_memory_fraction in the task.
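The conversion above can be sketched as a small script run inside the container. This is a minimal illustration, assuming the GPUShare device plugin has injected ALIYUN_COM_GPU_MEM_CONTAINER and ALIYUN_COM_GPU_MEM_DEV (in the same unit) into the environment; the helper function and entrypoint shape are hypothetical.

```python
import os

def gpu_memory_fraction(container_mem: float, device_mem: float) -> float:
    """Compute per_process_gpu_memory_fraction from GPUShare values.

    container_mem: GPU memory allocated to this container
    device_mem:    total GPU memory of the allocated card (same unit)
    """
    if device_mem <= 0:
        return 0.0  # 0.0 lets TensorFlow choose a value itself
    return container_mem / device_mem

# Values injected by the GPUShare device plugin (names from the
# design notes above); default to 0 when not running under GPUShare.
container = float(os.environ.get("ALIYUN_COM_GPU_MEM_CONTAINER", "0"))
device = float(os.environ.get("ALIYUN_COM_GPU_MEM_DEV", "0"))

fraction = gpu_memory_fraction(container, device)

# The container entrypoint would then pass the value on to the
# serving process, e.g. as --per_process_gpu_memory_fraction=<fraction>.
print(f"per_process_gpu_memory_fraction={fraction}")
```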

3. The design diagram.


Stage 2: Integrate Nvidia MPS as the option for isolation


  • Investigate how to use MPS.
  • Test the capabilities of MPS.
  • Integrate MPS with GPUShare and simplify user operations.

Design and result


User guide and integration

To do

Test whether GPU threads are controlled by MPS.

