Sehoon Kim

Ph.D. Student in BAIR at UC Berkeley
Contact: sehoonkim at berkeley dot edu
CV: link (latest update: Sep. 2022)
Advisor: Prof. Kurt Keutzer
Research Interests: Efficient AI, HW-SW Codesign, AI Systems

About


I am a 3rd-year Ph.D. student in Berkeley AI Research (BAIR) at EECS, UC Berkeley. I work on a wide range of full-stack approaches for efficient AI and deep learning under the supervision of Prof. Kurt Keutzer. More specifically, my research interests lie in:

  • Efficient algorithms and model compression techniques for low-cost deep learning inference at the edge
  • Efficient model architecture design and hardware-aware neural architecture search (NAS)
  • Hardware-software co-design and co-optimization
  • AI systems for efficient model training and inference

Before joining UC Berkeley, I was an undergraduate majoring in Electrical and Computer Engineering (ECE) at Seoul National University, where I ranked 1st in the entire class of 2020 (overall GPA: 4.29/4.30; major GPA: 4.30/4.30). During my undergraduate years, I was honored to work with Prof. Jangwoo Kim on computer architecture, and with Prof. Byung-Gon Chun on deep learning software systems.

Selected Publications


See the Publications section below for the full list.

Big Little Transformer Decoder [code]

Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
Preprint, 2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [code] [NeMo official]

Sehoon Kim*, Amir Gholami*, Albert Shaw†, Nicholas Lee†, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
Conference on Neural Information Processing Systems (NeurIPS), 2022

A Fast Post-Training Pruning Framework for Transformers [code]

Woosuk Kwon*, Sehoon Kim*, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
Conference on Neural Information Processing Systems (NeurIPS), 2022

Learned Token Pruning for Transformers [code]

Sehoon Kim*, Sheng Shen*, David Thorsley*, Amir Gholami*, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
Conference on Knowledge Discovery and Data Mining (KDD), 2022

Integer-only Zero-shot Quantization for Efficient Speech Recognition [code]

Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

I-BERT: Integer-only BERT Quantization [code] [HF official]

Sehoon Kim*, Amir Gholami*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
International Conference on Machine Learning (ICML, Long talk), 2021

Publications


Full Stack Optimization of Transformer Inference: a Survey

Sehoon Kim*, Coleman Hooper*, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami
Preprint, 2023

Big Little Transformer Decoder [code]

Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
Preprint, 2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [code] [NeMo official]

Sehoon Kim*, Amir Gholami*, Albert Shaw†, Nicholas Lee†, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
Conference on Neural Information Processing Systems (NeurIPS), 2022

A Fast Post-Training Pruning Framework for Transformers [code]

Woosuk Kwon*, Sehoon Kim*, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
Conference on Neural Information Processing Systems (NeurIPS), 2022

Learned Token Pruning for Transformers [code]

Sehoon Kim*, Sheng Shen*, David Thorsley*, Amir Gholami*, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
Conference on Knowledge Discovery and Data Mining (KDD), 2022

Integer-only Zero-shot Quantization for Efficient Speech Recognition [code]

Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Hessian-Aware Pruning and Optimal Neural Implant [code]

Shixing Yu*, Zhewei Yao*, Amir Gholami*, Zhen Dong*, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer
Winter Conference on Applications of Computer Vision (WACV), 2022

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence, 2021

WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model

Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Ce Zhang, Byung-Gon Chun, Markus Weimer, Matteo Interlandi
International Conference on Very Large Data Bases (VLDB), 2021

Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs

Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun
Conference on Neural Information Processing Systems (NeurIPS), 2021

I-BERT: Integer-only BERT Quantization [code] [HF official]

Sehoon Kim*, Amir Gholami*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
International Conference on Machine Learning (ICML, Long talk), 2021

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms

Jingyi Xu, Sehoon Kim, Borivoje Nikolic, Yakun Sophia Shao
International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021