
Patrick Amadeus Irawan

PhD Student
MBZUAI
patrick.irawan@mbzuai.ac.ae


About Me

Research Interests

I am interested in creating multimodal systems that truly perceive all inputs to make grounded judgments and responses, rather than relying on modality-specific shortcuts or shallow priors.
  1. Modality Utilization. I study how multimodal models allocate attention and capacity across modalities, and why they often under-utilize informative signals in favor of dominant ones. My work focuses on when and why models fail to fully exploit available modalities, leading to sparse utilization, shortcut learning, hallucination, or over-reliance on language. This direction is reflected in the Synthetic-VQA-NLE framework, which enables the generation of explainable and sound synthetic VQA data; SeeingCulture, which diagnoses the lack of domain awareness that distorts visual grounding; and ConfusedTourists, which shows how contextual perturbations can trigger biased behavior in grounding systems.

  2. Post-Training on Action-Conditioned Supervision Signals. To mitigate the above problems, I work on both training and non-training adaptations that improve cross-modal grounding and utilization. My current interests include post-training signals that recover or sharpen modality-specific abilities, distillation, and supervision schemes that could eventually support action-conditioned multimodal systems in which perception, language, and decision-making are tightly coupled. This direction is reflected most directly in LinguDistill, a cross-modal distillation attempt to recover the language-centric ability of VLMs, and also connects back to Synthetic-VQA-NLE as an earlier step toward better grounded supervision. As you read this webpage, I am also actively leading research on world-model evaluation that assesses plan-action cross-modal consistency, and I am involved in research on memory-based VLMs that aims to strengthen modality grounding and prevent modality-specific forgetting.

  3. Large-Scale Evaluation. I also investigate model robustness under varying resource conditions and modality availability. This involves designing evaluation protocols and infrastructure that probe modality reliance, cross-modal consistency, and broad inclusivity at scale. This direction is reflected in the award-winning WorldCuisines; SEACrowd, both as a paper and through active involvement in its organization; ProxyLM; and DataRubrics, which together study scalable benchmarking, performance prediction, and data quality assessment.

Updates

Publications

2026
  1. LinguDistill: Recovering Linguistic Ability in Vision Language Models via Selective Cross-Modal Distillation
    Patrick Amadeus Irawan, Elang Fuadi, Satendra Kumar, Alham Fikri Aji, Yova Kementchedjhieva
    Preprint, 2026.
    Proposes a selective distillation strategy to recover linguistic competence in VLMs without giving up multimodal capability.

  2. Vision Language Models are Confused Tourists
    Patrick Amadeus Irawan, Ikhlasul Akmal Hanif, Muhammad Dehan Al Kautsar, Genta Indra Winata, Fajri Koto, Alham Fikri Aji
    Computer Vision and Pattern Recognition Conference (CVPR), 2026 Findings.
    Studies how VLMs misread culturally conflicting visual situations, exposing grounding failures that are invisible to standard benchmarks.

  3. M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
    David Anugraha, Patrick Amadeus Irawan, Anshul Singh, En-Shiun Annie Lee, Genta Indra Winata
    Computer Vision and Pattern Recognition Conference (CVPR), 2026.
    Evaluates whether multimodal retrieval actually helps multilingual and multicultural question answering at scale, and where it fails.

2025
  1. Seeing Culture: A Benchmark for Visual Reasoning and Grounding
    Burak Satar, Zhixin Ma, Patrick Amadeus Irawan, Wilfried A. Mulyawan, Jing Jiang, Ee-Peng Lim, Chong-Wah Ngo
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025.
    Builds a benchmark for culture-sensitive visual reasoning and grounding, pushing evaluation beyond object recognition into contextual interpretation.

  2. Entropy2Vec: Crosslingual Language Modeling Entropy as End-to-End Learnable Language Representations
    Patrick Amadeus Irawan, Ryandito Diandaru, Belati Jagad Bintang Syuhada, Randy Zakya Suchrady, Alham Fikri Aji, Genta Indra Winata, Fajri Koto, Samuel Cahyawijaya
    Multilingual Representation Learning Workshop at EMNLP, 2025.
    Introduces entropy-based crosslingual representations that treat language modeling uncertainty as an end-to-end learnable signal.

  3. WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
    Genta Indra Winata, Frederikus Hudi, Patrick Amadeus Irawan, David Anugraha, Rifki Afina Putri, Yutong Wang, Adam Nohejl, Ubaidillah Ariq Prathama, Nedjma Ousidhoum, and others
    North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
    Co-leads a benchmark that tests multilingual and multicultural VQA through food, culture, and visual context rather than English-centric priors.

  4. ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models
    David Anugraha, Genta Indra Winata, Chenyue Li, Patrick Amadeus Irawan, En-Shiun Annie Lee
    North American Chapter of the Association for Computational Linguistics (NAACL), 2025.
    Predicts multilingual model performance with cheaper proxy models, reducing evaluation cost when exploring large design spaces.

  5. Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
    Patrick Amadeus Irawan, Genta Indra Winata, Samuel Cahyawijaya, Ayu Purwarianti
    International Conference on Computational Linguistics (COLING), 2025.
    Develops a more efficient pipeline for generating VQA explanations with VLMs, improving synthetic supervision for grounded reasoning.

  6. Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
    Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, and others
    Preprint, 2025.
    Proposes an automated scorecard for dataset quality and accountability, making data auditing more systematic and comparable.

2024
  1. SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages
    Holy Lovenia, Rahmad Mahendra, Salsabil Maulana Akbar, Lester James V Miranda, Jennifer Santoso, Elyanah Aco, ..., Patrick Amadeus Irawan, and others
    Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024.
    Contributes to a multilingual multimodal data hub and benchmark suite centered on Southeast Asian languages, expanding evaluation beyond high-resource settings.

  2. Leveraging IoT and Machine Learning for Efficient Rice Stock Monitoring and Prediction
    Nana Sutisna, Aditya Prawira Nugroho, Christopher Jeffrey, Patrick Amadeus Irawan, Rizky Ramadhana, Ronggur Mahendra, Michael Jonathan, Infall Syafalni, Trio Adiono
    Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2024.
    Applies machine learning and IoT sensor systems to real-world rice stock monitoring and prediction, reflecting the engineering side of my research background.

Experience & Service

Selected Experience

Reviewing