Most of my research focuses on improving computer system infrastructure for deep learning applications. More specifically, I'm currently working on making GPUs easier to share at fine granularity, which enables better scheduling and improves resource utilization. For a complete list of my past experience, please refer to my CV (last updated: Nov. 2018).
The following is a complete list of my publications and posters.
AI for Society - A Michigan AI Symposium ( MichiganAI'18 )
GPU computing is becoming increasingly popular with the proliferation of deep learning (DL) applications. However, unlike traditional resources such as the CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. Consequently, implementing common policies such as time sharing and preemption is expensive. Worse, when a DL application cannot fully use a GPU's resources, the GPU cannot be efficiently shared among multiple applications, leading to GPU underutilization.
We present Salus, which enables two GPU sharing primitives, fast job switching and memory sharing, to achieve fine-grained GPU sharing among multiple DL applications. Salus implements an efficient, consolidated execution service that exposes the GPU to different DL applications and enforces fine-grained sharing by performing iteration scheduling and addressing the associated memory management issues. We show that these primitives can then be used to implement flexible sharing policies such as fairness, prioritization, and packing for various use cases. Our integration of Salus with TensorFlow and evaluation on popular DL jobs show that Salus can improve the average completion time of DL training jobs by 3.19×, GPU utilization for hyper-parameter tuning by 2.38×, and GPU utilization of DL inference applications by 42× over not sharing the GPU and 7× over NVIDIA MPS, with small overhead.
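To make "iteration scheduling" with pluggable policies concrete, here is a toy simulation. It is a minimal sketch under stated assumptions, not Salus's implementation or API: a single GPU switches between jobs only at iteration boundaries, each job has a fixed per-iteration time, and the names `Job`, `pick_fair`, `pick_priority`, and `schedule` are hypothetical. Swapping the policy function is enough to change the sharing behavior.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job model: fixed time per iteration, a number of iterations left.
    name: str
    iter_time: float        # seconds per training iteration (assumed constant)
    iterations_left: int
    priority: int = 0       # higher value runs first under the priority policy
    service: float = 0.0    # GPU time this job has received so far


def pick_fair(ready):
    # Fairness: run the job that has received the least GPU time so far.
    return min(ready, key=lambda j: j.service)


def pick_priority(ready):
    # Prioritization: always run the highest-priority ready job.
    return max(ready, key=lambda j: j.priority)


def schedule(jobs, policy):
    """Simulate one GPU that may switch jobs only between iterations.

    Returns (completion time per job, order in which iterations ran).
    """
    clock = 0.0
    finished = {}
    trace = []
    ready = [j for j in jobs if j.iterations_left > 0]
    while ready:
        job = policy(ready)
        # A whole iteration runs without preemption; switching happens between iterations.
        clock += job.iter_time
        job.service += job.iter_time
        job.iterations_left -= 1
        trace.append(job.name)
        if job.iterations_left == 0:
            finished[job.name] = clock
            ready = [j for j in ready if j is not job]
    return finished, trace
```

For example, with two jobs of two one-second iterations each, where job B has higher priority, `pick_fair` interleaves them (A, B, A, B) while `pick_priority` runs B to completion first (B, B, A, A). Because switching only happens at iteration boundaries, no mid-iteration state needs to be saved, which is what keeps switching cheap in this sketch.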
SysML Conference 2018 ( SysML'18 )
In this paper, we present Salus, a framework-independent runtime that enables fine-grained sharing of a single GPU among multiple memory-intensive CNN applications. Salus implements an efficient, consolidated execution service that exposes the GPU to different CNN applications and enforces fine-grained sharing by performing low-level memory management, managing GPU task scheduling, and addressing associated issues such as deadlock prevention and GPU-to-host memory paging. Not only can Salus enable multiple CNN jobs to share a single GPU, it can also enforce sharing policies that provide fairness and prioritization. Our integration of Salus with TensorFlow shows that it can improve GPU utilization by up to 20×.
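One simple way to rule out memory deadlock when several jobs share a GPU is an admission check. The sketch below is hypothetical and rests on explicit assumptions: each job's memory splits into a persistent part (for example, model weights held for the job's lifetime) and an ephemeral part (per-iteration scratch memory freed after each iteration), and only one job's iteration runs at a time. The names `MemProfile` and `can_admit` are illustrative, not part of Salus or TensorFlow.

```python
from dataclasses import dataclass


@dataclass
class MemProfile:
    # Hypothetical per-job memory profile (assumed split, in megabytes).
    name: str
    persistent_mb: int   # held for the job's whole lifetime, e.g. model weights
    ephemeral_mb: int    # peak scratch memory of one iteration, freed afterwards


def can_admit(running, candidate, capacity_mb):
    """Admit a job only if all persistent memory plus the largest single
    ephemeral footprint fits on the GPU.

    Assuming iterations run one at a time, at most one job's ephemeral memory
    is live at once, so any scheduled iteration can always get the memory it
    needs and never waits on memory held by a blocked job.
    """
    jobs = running + [candidate]
    persistent = sum(j.persistent_mb for j in jobs)
    peak_ephemeral = max(j.ephemeral_mb for j in jobs)
    return persistent + peak_ephemeral <= capacity_mb
```

On a 16000 MB GPU already running a job with 2000 MB persistent and 6000 MB ephemeral memory, a candidate needing 3000 MB persistent and 8000 MB ephemeral is admitted (5000 + 8000 ≤ 16000), while one needing 9000 MB persistent is rejected (11000 + 8000 > 16000). A real runtime would also need a fallback such as paging to host memory when a job does not fit, which this sketch leaves out.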
The 16th Workshop on Hot Topics in Operating Systems ( HotOS'17 )
In recent years, deep learning has pervaded many areas of computing due to the confluence of an explosive growth of large-scale computing capabilities, availability of datasets, and advances in learning techniques. While this rapid growth has resulted in diverse deep learning frameworks, it has also led to inefficiencies for both the users and developers of these frameworks. Specifically, adopting useful techniques across frameworks – both to perform learning tasks and to optimize performance – involves significant repetitions and reinventions.
In this paper, we observe that despite their diverse origins, many of these frameworks share architectural similarities. We argue that by introducing a common representation of learning tasks and a hardware abstraction model to capture compute heterogeneity, we might be able to relieve machine learning researchers from dealing with low-level systems issues and systems researchers from being tied to any specific framework. We expect this decoupling to accelerate progress in both domains.