I am an AI Researcher at AITRICS. My research is in multimodal learning, spanning vision–language and speech systems. I am currently interested in how these systems behave under linguistic variation: for example, how dialect affects text-to-image safety filters, and how speech recognition degrades under real-world distribution shift.
Before AITRICS, I received my M.S. in Artificial Intelligence from POSTECH (August 2025), advised by Prof. Jaeho Lee at the Efficient Learning Lab, where I worked on language-guided multimodal learning, including text-guided image compression. I previously worked with Prof. Tae-Hyun Oh on audio captioning. I earned my B.S. in Electrical and Electronics Engineering from Chung-Ang University.
I work on multimodal learning, with a focus on how vision–language and speech systems behave across different user groups and conditions.
(* denotes equal contribution)
Minkyu Kim, Juhwan Choi, YoungBin Kim, “Not Safe for All: Auditing the Dialect Penalty in Text-to-Image Safety Pipelines”, 2026
Minkyu Kim*, Vincent-Daniel Yun*, Youngrae Kim*, Suin Cho, Woosang Lim, Sunwoo Lee, “Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning”, 2026
Jean Seo*, Minkyu Kim*, Jeonguk Lee, Jisoo Jung, Wooseok Han, Eunho Yang, “When Multiple Scripts Matter: Evaluating ASR in Clinical Settings”, Interspeech, 2026
Hagyeong Lee*, Minkyu Kim*, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee, “Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity”, ICML, 2024
Minkyu Kim*, Kim Sung-Bin*, Tae-Hyun Oh, “Prefix Tuning for Automated Audio Captioning”, ICASSP (Oral), 2023