I am a Senior Deep Learning Computer Architect at NVIDIA. My work and research interests include developing the SW stack and optimizing GPU architecture performance for deep learning acceleration. Before joining NVIDIA, I worked at SK hynix as a HW engineer, where I made major contributions to projects on phase-change memory design and memory system performance optimization. I received my Ph.D. in Electrical and Computer Engineering from The University of Texas at Austin, where I worked with Professor Mattan Erez. My dissertation, Efficient Deep Neural Network Model Training by Reducing Memory and Compute Demands, covers SW, HW, and algorithm co-design for efficient deep neural network model training.
Education
Ph.D. in Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX
12/2019
M.S. in Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX
05/2018
B.S. in Electrical Engineering, Hanyang University, Republic of Korea
02/2008
Research and Work Experience
NVIDIA (Santa Clara, CA)
Sr. Deep Learning Computer Architect @Deep learning architecture group
01/2019 - Current
GPU computer architecture SW stack implementation and optimization for deep learning workload acceleration
Machine learning acceleration (algorithm, SW implementation, workload scheduling, and HW optimization)
High performance and energy efficient memory system design
(Micro)architecture-level fault injection modeling and application fault tolerance analysis
Microsoft (Redmond, WA)
Research intern @AI & advanced architecture group
05/2019 - 08/2019
Deep learning model performance analysis, accelerator architecture design space exploration
NVIDIA (Santa Clara, CA)
Deep learning architecture intern @Deep learning architecture group
05/2018 - 08/2018
Deep learning workload analysis, GPU deep learning kernel analysis for fast network model training
NVIDIA Research (Austin, TX)
Research intern @Architecture group
05/2017 - 08/2017
Deep learning workload analysis, GPU memory system modeling and optimization for CNN model training
Hewlett Packard Labs (Palo Alto, CA)
Research intern @Platform architecture group
05/2016 - 08/2016
Persistent memory system architecture, memory-centric computing, DRAM cache simulator design
SK hynix (Icheon, Republic of Korea)
DRAM design & performance evaluation engineer @DRAM product planning and enabling team
04/2012 - 07/2015
DDR4-Extension features development and evaluation to improve memory system parallelism
Next generation DRAM features evaluation and proposal
Major contribution to DDR4 & DDR4-Extension JEDEC standardization as the representative of SK hynix
PCRAM architecture design and memory core functionality optimization
PCRAM data interface and physical layout design
Wafer-level PCRAM functionality analysis and SLC and MLC cell transition characteristic evaluation
Publication
Reducing Activation Recomputation in Large Transformer Models, Vijay Korthikanti, Jared Casper, Sangkug Lym, Lawrence McAfee, Michael Andersch, Mohammad Shoeybi, and Bryan Catanzaro, arXiv preprint, May 2022. [Paper]
BranchNet: A Convolutional Neural Network to Predict Hard-To-Predict Branches, Siavash Zangeneh, Stephen Pruett, Sangkug Lym, and Yale N. Patt, IEEE/ACM 53rd International Symposium on Microarchitecture (MICRO), Oct 2020. [Paper]
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training, Sangkug Lym and Mattan Erez, arXiv preprint, Mar 2020. [Paper]
Near Data Acceleration with Concurrent Host Access, Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, and Mattan Erez, ACM/IEEE 47th International Symposium on Computer Architecture (ISCA), Jun 2020. [Paper]
PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration (best paper finalist), Sangkug Lym, Esha Choukse, Siavash Zangeneh, Wei Wen, Sujay Sanghavi, and Mattan Erez, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov 2019. [Paper][Slides][Code]
Mini-batch Serialization: CNN Training with Inter-layer Data Reuse, Sangkug Lym, Armand Behroozi, Wei Wen, Ge Li, Yongkee Kwon, and Mattan Erez, The 2nd Conference on Machine Learning and Systems (MLSys), Apr 2019. [Paper][Slides][Code][Video]
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis, Sangkug Lym, Donghyuk Lee, Mike O'Connor, Niladrish Chatterjee, and Mattan Erez, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Mar 2019. [Paper][Slides]
Evaluating and Accelerating High-Fidelity Error Injection for HPC, Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, Michael B. Sullivan, and Mattan Erez, International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Nov 2018. [Paper]
Hamartia: A Fast and Accurate Error Injection Framework, Chun-Kai Chang, Sangkug Lym, Nicholas Kelly, Michael B. Sullivan, and Mattan Erez, SELSE, Apr 2018 (invited to the best SELSE papers session at DSN 2018). [Paper][Code]
ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism, Sangkug Lym, Heonjae Ha, Yongkee Kwon, Chun-Kai Chang, Jungrae Kim, and Mattan Erez, IEEE 24th International Symposium on High-Performance Computer Architecture (HPCA), Feb 2018. [Paper][Slides]
DUO: Exposing On-chip Redundancy to Rank-Level ECC for High Reliability, Seong-Lyong Gong, Jungrae Kim, Sangkug Lym, Michael Sullivan, Howard David, and Mattan Erez, IEEE 24th International Symposium on High-Performance Computer Architecture (HPCA), Feb 2018. [Paper]
All-Inclusive ECC: Thorough End-to-End Protection for Reliable Computer Memory, Jungrae Kim, Michael Sullivan, Sangkug Lym, and Mattan Erez, ACM/IEEE 43rd International Symposium on Computer Architecture (ISCA), Jun 2016. [Paper]
Invited Talks
"PruneTrain: Fast Neural Network Training by Dynamic Sparse Model Reconfiguration", SC'19, Denver, CO
"Mini-batch Serialization: CNN Training with Inter-layer Data Reuse", SysML'19, Stanford, CA
"DeLTA:GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis", ISPASS'19, Madison, Wisconsin
"ERUCA: Efficient DRAM Resource Utilization and Resource Conflict Avoidance for Memory System Parallelism", HPCA'18, Vienna, Austria
"JEDEC DDR4 Workshop: DDR4 Operation Specifics", 2013 DDR4 Workshop, Santa Clara, CA [Video]