Research
My research interests span a wide range of deep generative models (autoregressive, flow, GAN, diffusion,
etc.) applied to sequential data. Currently, I am building multi-modal large language models
with a focus on audio.
During my Ph.D., I focused on advancing generative modeling of time-domain waveform data (speech and audio).
I am also broadly interested in speech and audio applications, including text-to-speech, voice conversion, music generation, neural audio codecs, and audio language models.
Representative papers are highlighted.
|
|
Edit-A-Video: Single Video Editing with Object-Aware Consistency
Chaehun Shin*,
Heeseung Kim*,
Che Hyun Lee,
Sang-gil Lee,
Sungroh Yoon
Asian Conference on Machine Learning (ACML), Best Paper Award, 2023
project page /
arXiv
Edit-A-Video is a diffusion-based one-shot video editing model that addresses background inconsistency with a novel sparse-causal mask blending method.
|
|
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee,
Wei Ping,
Boris Ginsburg,
Bryan Catanzaro,
Sungroh Yoon
International Conference on Learning Representations (ICLR), 2023
project page /
arXiv /
code /
demo
BigVGAN is a universal audio synthesizer that achieves unprecedented zero-shot performance across a wide range of unseen recording environments, using an anti-aliased periodic nonlinearity and large-scale training.
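The periodic nonlinearity at the heart of the generator is the snake activation, x + (1/α)·sin²(αx). The minimal NumPy sketch below illustrates the activation alone; the anti-aliased (low-pass filtered) up/downsampling around it and the learned per-channel α of the full model are omitted, and the fixed alpha value here is purely illustrative.

```python
import numpy as np

def snake(x, alpha=1.0):
    """Snake periodic activation: x + (1/alpha) * sin^2(alpha * x).

    Illustrative only: BigVGAN learns alpha per channel and wraps this
    nonlinearity with anti-aliasing filters, both omitted here.
    """
    return x + (1.0 / alpha) * np.sin(alpha * x) ** 2

# Toy usage on a short 1-D signal.
signal = np.linspace(-np.pi, np.pi, 8)
print(snake(signal, alpha=2.0))
```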
|
|
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive
Prior
Sang-gil Lee,
Heeseung Kim,
Chaehun Shin,
Xu Tan,
Chang Liu,
Qi Meng,
Tao Qin,
Wei Chen,
Sungroh Yoon,
Tie-Yan Liu
International Conference on Learning Representations (ICLR), 2022
project page /
arXiv /
code /
poster
PriorGrad presents an efficient method for constructing a data-dependent non-standard Gaussian prior for
training and sampling from diffusion models applied to speech synthesis.
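In rough terms (a paraphrase of the paper's formulation), the diffusion process starts from and returns to a data-dependent Gaussian N(μ_c, Σ_c) instead of N(0, I), with the diagonal statistics computed from the conditioning c (e.g., frame-level statistics of the mel-spectrogram), and the noise-prediction loss becomes covariance-weighted:

$$
\mathcal{L} \;=\; \mathbb{E}_{x_0,\,\boldsymbol{\epsilon},\,t}\!\left[\,\left\lVert \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(x_t, c, t)\right\rVert^{2}_{\boldsymbol{\Sigma}_c^{-1}}\,\right],
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_c).
$$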
|
|
NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity
Sang-gil Lee,
Sungwon Kim,
Sungroh Yoon
Neural Information Processing Systems (NeurIPS), 2020
arXiv /
code /
poster
NanoFlow uses a single neural network for multiple transformation stages in normalizing flows, which
provides efficient parameter compression for flow-based generative models.
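A conceptual sketch of the parameter-sharing idea, not the paper's actual architecture (NanoFlow is built on a WaveNet-like density estimator for waveforms): a single network serves every flow step, with a small learned per-step embedding injected so each step can still realize a distinct transformation. All sizes below (dim, hidden, embedding width) are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SharedFlowSteps(nn.Module):
    """NanoFlow-style parameter sharing sketch: one estimator, many flow steps."""

    def __init__(self, dim=64, hidden=256, num_steps=6, emb_dim=16):
        super().__init__()
        # Single shared network reused by every flow step.
        self.shared_net = nn.Sequential(
            nn.Linear(dim + emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim),  # predicts shift and log-scale
        )
        # Tiny per-step embedding lets the shared network behave differently per step.
        self.step_embed = nn.Embedding(num_steps, emb_dim)

    def forward(self, x, step):
        emb = self.step_embed(torch.tensor(step)).expand(x.size(0), -1)
        shift, log_scale = self.shared_net(torch.cat([x, emb], dim=-1)).chunk(2, dim=-1)
        return shift, log_scale

# Parameter count stays nearly flat as the number of flow steps grows.
model = SharedFlowSteps()
shift, log_scale = model(torch.randn(4, 64), step=3)
```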
|
|
FloWaveNet: A Generative Flow for Raw Audio
Sungwon Kim,
Sang-gil Lee,
Jongyoon Song,
Jaehyeon Kim,
Sungroh Yoon
International Conference on Machine Learning (ICML), 2019
arXiv /
code /
demo /
poster
FloWaveNet is one of the first flow-based generative models for fast and parallel synthesis of audio waveforms, enabling a likelihood-based neural vocoder without any auxiliary loss.
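For context, the single training objective behind FloWaveNet (and likelihood-based flows in general) is the exact change-of-variables log-likelihood; the standard form is shown below rather than any paper-specific derivation:

$$
\log p_\theta(x) \;=\; \log p_Z\!\big(f_\theta(x)\big) \;+\; \log\left|\det \frac{\partial f_\theta(x)}{\partial x}\right|,
$$

where f_θ maps the raw waveform x to a simple base distribution p_Z (e.g., a standard Gaussian) through invertible coupling layers, so the model is trained by maximizing this likelihood directly, without distillation or auxiliary spectral losses.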
|
|
One-Shot Learning for Text-to-SQL Generation
Dongjun Lee,
Jaesik Yoon,
Jongyoon Song,
Sang-gil Lee,
Sungroh Yoon
arXiv preprint, 2019
arXiv
A template-based one-shot text-to-SQL generation model built on a Candidate Search Network and a Pointer Network.
|
|
Polyphonic Music Generation with Sequence Generative Adversarial Networks
Sang-gil Lee,
Uiwon Hwang,
Seonwoo Min,
Sungroh Yoon
arXiv preprint, 2017
arXiv /
code
This work investigates an efficient musical word representation from polyphonic MIDI data for SeqGAN, simultaneously capturing chords and melodies with dynamic timings.
|
|
An Efficient Approach to Boosting Performance of Deep Spiking Network Training
Seongsik Park,
Sang-gil Lee,
Hyunha Nam,
Sungroh Yoon
Neural Information Processing Systems (NIPS) Workshop on Computing with Spikes, 2016
arXiv
This work investigates various initialization and backward control schemes for the membrane potential when training deep spiking networks.
|
|
Research Scientist @ NVIDIA
Jan 2024 - Present
In the Applied Deep Learning Research team, I am working on building multi-modal large language models with a focus on audio.
Sep 2021 - Jan 2022
As a research intern, I worked on improving neural vocoders for high-quality speech and audio synthesis, advised by
Wei Ping and
Boris Ginsburg.
|
|
Senior Research Engineer @ Qualcomm AI Research
Feb 2023 - Jan 2024
I developed a framework for Text-to-Speech (TTS) research and development, optimized for deployment on edge devices.
|
|
Research Intern @ Microsoft Research Asia
Dec 2020 - May 2021
I worked on diffusion-based generative models for speech synthesis, advised by
Xu Tan,
Chang Liu,
Qi Meng, and
Tao Qin.
Dec 2018 - Feb 2019
I worked on the Antigen Map
Project,
where I applied sequence models to predict antigens from genetic sequences, advised by
Bin Shao.
|
|
Research Intern @ Kakao Corporation
Jul 2019 - Sep 2019
I worked on improving speech synthesis and voice conversion models, advised by
Jaehyeon Kim and Jaekyong Bae.
|
|
Ph.D. at Seoul National University
Electrical and Computer Engineering
Sep 2016 - Feb 2023
Dissertation: Deep Generative Model for Waveform Synthesis
Integrated M.S./Ph.D. Program. Advisor: Sungroh Yoon.
Dual B.S. at Seoul National University
Electrical and Computer Engineering / Applied Biology and Chemistry
Mar 2010 - Aug 2016
Cum Laude
|
Projects
During my time at DSAIL, I collaborated with Seoul National University Hospital on a computer-aided diagnosis project for liver cancer.
The project yielded a high-performance medical object detection model that helps reduce radiologists' errors in the early detection of liver disease.
|
|
Robust End-to-End Focal Liver Lesion Detection Using Unregistered Multiphase Computed
Tomography Images
Sang-gil Lee*,
Eunji Kim*,
Jae Seok Bae*,
Jung Hoon Kim,
Sungroh Yoon
IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI), 2021
arXiv /
code
GSSD++ robustly detects liver lesions from unregistered multi-phase CT images using attention-guided multi-phase alignment with deformable convolutions.
|
|
Liver Lesion Detection from Weakly-Labeled Multi-phase CT Volumes with a Grouped Single Shot
MultiBox Detector
Sang-gil Lee,
Jae Seok Bae,
Hyunjae Kim,
Jung Hoon Kim,
Sungroh Yoon
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI),
2018
arXiv /
code
GSSD pioneers focal liver lesion detection from multi-phase CT images, reflecting the real-world clinical practice of radiologists.
|
Invited Talks, Honors, and Awards
|
- Invited Talk "Deep Generative Model for Speech and Audio", Soongsil
University, 2023
- Invited Talk "Towards Universal Neural Waveform Synthesis", Naver, 2022
- Invited Talk "On Neural Waveform Synthesis", Supertone, 2022
- Invited Talk "Prior Enhancement for Deep Generative Models", Hyundai
AIRS,
2022
- Student Conference Scholarship, Google, 2022
- Invited Talk "Neural Speech Synthesis: a 2021 Landscape", NVIDIA,
2021
- Graduate Student of the Year, DSAIL, Seoul National University, 2019
- Best Paper Award, Hyundai AIR Lab (currently AIRS), 2019
- Stars of Tomorrow (Excellent Intern), Microsoft Research Asia,
2019
- Invited Talk "RNN Plus Alpha: Is RNN the False Prophet?", Naver CLOVA,
2018
- Cum Laude, Seoul National University, 2016
- Academic Performance Scholarship, Seoul National University, 2010 -
2016
- Academic Scholarship (fully funded), SBS Foundation, 2010 -
2016
|
I am a PC hardware enthusiast, always eager to learn about computers in my free time.
As a hobbyist DJ, I enjoy house music. My mixes are on YouTube.
|
Last update: Jan 2024. Template borrowed from here.