
Patrick Amadeus Irawan

PhD Student
MBZUAI
patrick.irawan@mbzuai.ac.ae


About Me

Research Interests

I am interested in building multimodal systems that sensibly leverage their perceptual and semantic inputs to make grounded decisions and responses, rather than relying on shortcuts from a single modality.
  1. Modality Utilization.
    I study how multimodal models decide what to attend to, and why they often rely on dominant signals (e.g., language) instead of fully using their other inputs. This over-reliance leads to shortcut learning, hallucination, and weak grounding.
    • Synthetic-VQA-NLE: a synthetic pipeline for generating explainable VQA data, used to evaluate the VQA reasoning of the SOTA VLMs of the time
    • SeeingCulture: shows how a lack of domain awareness harms visual grounding and segmentation ability
    • ConfusedTourists: evaluates how semantically aligned perturbations of context trigger biased visual grounding
    • CountingTricks: exposes failures in the straightforward perceptual counting of VLMs and shows how attention-balancing RL post-training may help mitigate them
  2. Post-Training & Cross-Modal Alignment.
    To address these issues, I work on post-training methods that improve how models use and connect modalities, including distillation and dynamic supervision signals (action-conditioned signals, RL). The goal is to recover missing abilities and strengthen cross-modal alignment.
    • LinguDistill: uses cross-modal distillation to recover degraded language abilities in VLMs
    • Synthetic-VQA-NLE: also supports more grounded supervision signals
    • Ongoing: world model evaluation for plan–action consistency, and memory-based VLMs to improve long-term grounding
  3. Large-Scale Evaluation.
    I am also keen on critically designing faithful large-scale evaluation setups to understand how models behave under distribution shifts, missing modalities, or limited resources, with a focus on reliability and robustness at scale.
    • WorldCuisines: benchmarks domain-specific cultural and multilingual reasoning in VQA
    • SEACrowd: builds a large-scale multimodal dataset hub for underrepresented languages, covering 4+ modality mixtures
    • ProxyLM: efficient multilingual performance prediction system leveraging data and language features
    • DataRubrics: studies data quality and the faithful creation of benchmarks

Updates

Publications

2026
  1. LinguDistill: Recovering Linguistic Ability in Vision Language Models via Selective Cross-Modal Distillation (Preprint)
    Patrick Amadeus Irawan, Elang Fuadi, Satendra Kumar, Alham Fikri Aji, Yova Kementchedjhieva
    Preprint, 2026.
    Proposes a selective distillation strategy to recover linguistic competence in VLMs without giving up multimodal capability.

  2. Vision Language Models are Confused Tourists (CVPR 2026 Findings)
    Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji
    Computer Vision and Pattern Recognition Conference (CVPR), 2026 Findings.
    Studies how VLMs misread culturally conflicting visual situations, exposing grounding failures that are invisible to standard benchmarks.

  3. M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG (CVPR 2026)
    David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata
    Computer Vision and Pattern Recognition Conference (CVPR), 2026.
    Evaluates whether multimodal retrieval actually helps multilingual and multicultural question answering at scale, and where it fails.

2025
  1. Seeing Culture: A Benchmark for Visual Reasoning and Grounding (EMNLP 2025)
    Burak Satar, Zhixin Ma, Patrick Amadeus Irawan, Wilfried A. Mulyawan, Jing Jiang, Ee-Peng Lim, Chong-Wah Ngo
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
    Builds a benchmark for culture-sensitive visual reasoning and grounding, pushing evaluation beyond object recognition into contextual interpretation.

  2. Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations (MRL @ EMNLP 2025)
    Patrick Amadeus Irawan, Ryandito Diandaru, Belati Jagad Bintang Syuhada, Randy Zakya Suchrady, Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya
    Multilingual Representation Learning Workshop at EMNLP, 2025.
    Introduces entropy-based crosslingual representations that treat language modeling uncertainty as an end-to-end learnable signal.

  3. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines (NAACL 2025)
    Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, and others
    North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
    Co-leads a benchmark that tests multilingual and multicultural VQA through food, culture, and visual context rather than English-centric priors.

  4. ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models (NAACL 2025)
    David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee
    North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
    Predicts multilingual model performance with cheaper proxy models, reducing evaluation cost when exploring large design spaces.

  5. Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models (COLING 2025)
    Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti
    International Conference on Computational Linguistics (COLING), 2025.
    Develops a more efficient pipeline for generating VQA explanations with VLMs, improving synthetic supervision for grounded reasoning.

  6. Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability (Preprint)
    Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, and others
    Preprint, 2025.
    Proposes an automated scorecard for dataset quality and accountability, making data auditing more systematic and comparable.

2024
  1. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages (EMNLP 2024)
    Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V Miranda, Jennifer Santoso, Elyanah Aco, ..., Patrick Amadeus Irawan, and others
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
    Contributes to a multilingual multimodal data hub and benchmark suite centered on Southeast Asian languages, expanding evaluation beyond high-resource settings.

  2. Leveraging IoT and Machine Learning for Efficient Rice Stock Monitoring and Prediction (APSIPA ASC 2024)
    Nana Sutisna, Aditya Prawira Nugroho, Christopher Jeffrey, Patrick Amadeus Irawan, Rizky Ramadhana, Ronggur Mahendra, Michael Jonathan, Infall Syafalni, Trio Adiono
    Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2024.
    Applies machine learning and sensor systems to real-world stock monitoring, showing the engineering side of my research background.

Experience & Service

Selected Experience

Reviewing