Hi, I’m Minkyu Kim 👋

I am an AI Researcher at AITRICS. My research is in multimodal learning, covering vision–language and speech systems. I am broadly interested in making these systems efficient and reliable in deployment, from model compression and pruning to evaluating how they behave across diverse users, languages, and real-world conditions.

Before joining AITRICS, I received my M.S. in Artificial Intelligence from POSTECH (August 2025), advised by Prof. Jaeho Lee at the Efficient Learning Lab, where I worked on language-guided multimodal learning, including text-guided image compression. Before that, I worked with Prof. Tae-Hyun Oh on audio captioning. I earned my B.S. in Electrical and Electronics Engineering from Chung-Ang University.

Research Interests

I work on multimodal learning, with a focus on building vision–language and speech systems that are both efficient and reliable.

Vision–language and audio–language modeling (text-guided image compression, audio captioning)
Efficient models: neural compression and LLM depth pruning
Robustness and evaluation of multimodal systems under real-world distribution shift, including linguistic and demographic variation

News 📰

2026.06 “When Multiple Scripts Matter: Evaluating ASR in Clinical Settings” accepted to Interspeech 2026.
2025.11 Joined AITRICS as an AI Researcher.
2025.08 Received M.S. in Artificial Intelligence from POSTECH.

Publications 📜

(* denotes equal contribution)

Preprints / Under Review

Minkyu Kim, Juhwan Choi, YoungBin Kim, “Not Safe for All: Auditing the Dialect Penalty in Text-to-Image Safety Pipelines”, 2026

Minkyu Kim*, Vincent-Daniel Yun*, Youngrae Kim*, Suin Cho, Woosang Lim, Sunwoo Lee, “Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning”, 2026

Published

Jean Seo*, Minkyu Kim*, Jeonguk Lee, Jisoo Jung, Wooseok Han, Eunho Yang, “When Multiple Scripts Matter: Evaluating ASR in Clinical Settings”, Interspeech, 2026

Hagyeong Lee*, Minkyu Kim*, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee, “Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity”, ICML, 2024

Minkyu Kim*, Kim Sung-Bin*, Tae-Hyun Oh, “Prefix Tuning for Automated Audio Captioning”, ICASSP (Oral), 2023

Services 💼

(Peer Review) IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
(Peer Review) AdaptFM Workshop @ ICML 2026