I am a Senior Deep Learning Computer Architect at NVIDIA. My work and research interests include developing the SW stack and optimizing GPU architecture performance for deep learning acceleration. Before joining NVIDIA, I worked at SK hynix as a HW engineer, where I made major contributions to projects on phase-change memory design and memory system performance optimization. I received my Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin, where I worked with Professor Mattan Erez. My dissertation, Efficient Deep Neural Network Model Training by Reducing Memory and Compute Demands, covers SW, HW, and algorithm co-design for efficient deep neural network model training.
Education
Ph.D. in Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX
12/2019
M.S. in Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX
05/2018
B.S. in Electrical Engineering, Hanyang University, Republic of Korea
02/2008
Research and Work Experience
NVIDIA (Santa Clara, CA)
Sr. Deep Learning Computer Architect @Deep learning architecture group
01/2019 - Current
GPU computer architecture SW stack implementation and optimization for deep learning workload acceleration
Machine learning acceleration (algorithm, SW implementation, workload scheduling, and HW optimization)
High performance and energy efficient memory system design
(Micro)architecture-level fault injection modeling and application fault tolerance analysis
Microsoft (Redmond, WA)
Research intern @AI & advanced architecture group
05/2019 - 08/2019
Deep learning model performance analysis, accelerator architecture design space exploration
NVIDIA (Santa Clara, CA)
Deep learning architecture intern @Deep learning architecture group
05/2018 - 08/2018
Deep learning workload analysis, GPU deep learning kernel analysis for fast network model training
NVIDIA Research (Austin, TX)
Research intern @Architecture group
05/2017 - 08/2017
Deep learning workload analysis, GPU memory system modeling and optimization for CNN model training
Hewlett Packard Labs (Palo Alto, CA)
Research intern @Platform architecture group
05/2016 - 08/2016
Persistent memory system architecture, memory-centric computing, DRAM cache simulator design
SK hynix (Icheon, Republic of Korea)
DRAM design & performance evaluation engineer @DRAM product planning and enabling team
04/2012 - 07/2015
DDR4-Extension features development and evaluation to improve memory system parallelism
Next generation DRAM features evaluation and proposal
Major contribution to DDR4 & DDR4-Extension JEDEC standardization as the representative of SK hynix
PCRAM architecture design and memory core functionality optimization
PCRAM data interface and physical layout design
Wafer-level PCRAM functionality analysis and SLC and MLC cell transition characteristic evaluation
Publication
Reducing Activation Recomputation in Large Transformer Models, Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, and Bryan Catanzaro, arXiv preprint, May 2022. [Paper]
BranchNet: A Convolutional Neural Network to Predict Hard-To-Predict Branches, Siavash Zangeneh, Stephen Pruett, Sangkug Lym, and Yale N. Patt, IEEE/ACM 53rd International Symposium on Microarchitecture (MICRO), Oct 2020. [Paper]
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training, Sangkug Lym and Mattan Erez, arXiv preprint, Mar 2020. [Paper]
Near Data Acceleration with Concurrent Host Access, Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, and Mattan Erez, ACM/IEEE 47th International Symposium on Computer Architecture (ISCA), Jun 2020. [Paper]
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration (best paper finalist), Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, and Mattan Erez, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov 2019. [Paper][Slides][Code]
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse, Sangkug Lym, Armand Behroozi, Wei Wen, Ge Li, Yongkee Kwon, and Mattan Erez, The 2nd Conference on Machine Learning and Systems (MLSys), Apr 2019. [Paper][Slides][Code][Video]
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis, Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, and Mattan Erez, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Mar 2019. [Paper][Slides]
Evaluating and Accelerating High-Fidelity Error Injection for HPC, Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, Michael B. Sullivan, and Mattan Erez, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov 2018. [Paper]
Hamartia: A Fast and Accurate Error Injection Framework, Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, Michael B. Sullivan, and Mattan Erez, SELSE, Apr 2018 (invited to the best SELSE papers session at DSN 2018). [Paper][Code]
ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism, Sangkug Lym, Heonjae Ha, Yongkee Kwon, Chun-Kai Chang, Jungrae Kim, and Mattan Erez, IEEE 24th International Symposium on High-Performance Computer Architecture (HPCA), Feb 2018. [Paper][Slides]
DUO: Exposing On-chip Redundancy to Rank-Level ECC for High Reliability, Seong-Lyong Gong, Jungrae Kim, Sangkug Lym, Michael Sullivan, Howard David, and Mattan Erez, IEEE 24th International Symposium on High-Performance Computer Architecture (HPCA), Feb 2018. [Paper]
All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory, Jungrae Kim, Michael Sullivan, Sangkug Lym, and Mattan Erez, ACM/IEEE 43rd International Symposium on Computer Architecture (ISCA), Jun 2016. [Paper]
Invited Talks
"PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration", SC'19, Denver, CO
"Mini-batch Serialization: CNN Training with Inter-layer Data Reuse", SysML'19, Stanford, CA
"DeLTA:GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis", ISPASS'19, Madison, Wisconsin
"ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism", HPCA'18, Vienna, Austria
"JEDEC DDR4 Workshop: DDR4 Operation Specifics", 2013 DDR4 Workshop, Santa Clara, CA [Video]