
Sehoon Kim
Ph.D. Student in
BAIR
at UC Berkeley
Contact: sehoonkim at berkeley dot edu
CV: link (latest update: Sep. 2022)
Advisor: Prof. Kurt Keutzer
Research Interests:
Efficient AI, HW-SW Codesign, AI Systems
About
I am a 3rd-year Ph.D. student at Berkeley AI Research (BAIR) in the EECS department at UC Berkeley. I work on a wide range of full-stack approaches to efficient AI and deep learning under the supervision of Prof. Kurt Keutzer. More specifically, my research interests lie in:
- Efficient algorithms and model compression techniques for low-cost deep learning inference at the edge
- Efficient model architecture design and hardware-aware neural architecture search (NAS)
- Hardware-software co-design and co-optimization
- AI systems for efficient model training and inference
Before joining UC Berkeley, I was an undergraduate majoring in Electrical and Computer Engineering (ECE) at Seoul National University, where I ranked 1st in the entire class of 2020 (overall GPA: 4.29/4.30, major GPA: 4.30/4.30). During my undergraduate years, I was honored to work with Prof. Jangwoo Kim on computer architecture and with Prof. Byung-Gon Chun on deep learning software systems.
Selected Publications

Big Little Transformer Decoder [code]
Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
Preprint, 2023

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [code] [NeMo official]
Sehoon Kim*, Amir Gholami*, Albert Shaw†, Nicholas Lee†,
Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
Conference on Neural Information Processing Systems (NeurIPS), 2022

A Fast Post-Training Pruning Framework for Transformers [code]
Woosuk Kwon*, Sehoon Kim*, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
Conference on Neural Information Processing Systems (NeurIPS), 2022

Learned Token Pruning for Transformers [code]
Sehoon Kim*, Sheng Shen*, David Thorsley*, Amir Gholami*, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
Conference on Knowledge Discovery and Data Mining (KDD), 2022

Integer-only Zero-shot Quantization for Efficient Speech Recognition [code]
Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Anirudda Nrusimha, Bohan Zhai,
Tianren Gao, Michael W. Mahoney, Kurt Keutzer
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

I-BERT: Integer-only BERT Quantization [code] [HF official]
Sehoon Kim*, Amir Gholami*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
International Conference on Machine Learning (ICML, Long talk), 2021
Publications
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim*, Coleman Hooper*, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami
Preprint, 2023
Big Little Transformer Decoder [code]
Sehoon Kim, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
Preprint, 2023
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition [code] [NeMo official]
Sehoon Kim*, Amir Gholami*, Albert Shaw†, Nicholas Lee†,
Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
Conference on Neural Information Processing Systems (NeurIPS), 2022
A Fast Post-Training Pruning Framework for Transformers [code]
Woosuk Kwon*, Sehoon Kim*, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
Conference on Neural Information Processing Systems (NeurIPS), 2022
Learned Token Pruning for Transformers [code]
Sehoon Kim*, Sheng Shen*, David Thorsley*, Amir Gholami*, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
Conference on Knowledge Discovery and Data Mining (KDD), 2022
Integer-only Zero-shot Quantization for Efficient Speech Recognition [code]
Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Anirudda Nrusimha, Bohan Zhai,
Tianren Gao, Michael W. Mahoney, Kurt Keutzer
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Hessian-Aware Pruning and Optimal Neural Implant [code]
Shixing Yu*, Zhewei Yao*, Amir Gholami*, Zhen Dong*, Sehoon Kim, Michael W. Mahoney, Kurt Keutzer
Winter Conference on Applications of Computer Vision (WACV), 2022
A Survey of Quantization Methods for Efficient Neural Network Inference
Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
Book Chapter: Low-Power Computer Vision: Improving the Efficiency of Artificial Intelligence, 2021
WindTunnel: Towards Differentiable ML Pipelines Beyond a Single Model
Gyeong-In Yu, Saeed Amizadeh, Sehoon Kim, Artidoro Pagnoni, Ce Zhang, Byung-Gon Chun, Markus Weimer, Matteo Interlandi
International Conference on Very Large Data Bases (VLDB), 2021
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
Taebum Kim, Eunji Jeong, Geon-Woo Kim, Yunmo Koo, Sehoon Kim, Gyeong-In Yu, Byung-Gon Chun
Conference on Neural Information Processing Systems (NeurIPS), 2021
I-BERT: Integer-only BERT Quantization [code] [HF official]
Sehoon Kim*, Amir Gholami*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
International Conference on Machine Learning (ICML, Long talk), 2021
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms
Jingyi Xu, Sehoon Kim, Borivoje Nikolic, Yakun Sophia Shao
International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021