About Me

A virtual but warm greeting from Ziqiao, since you've just come across this page! As many of my friends find it hard to pronounce my name (马子乔, pronounced dzuh-chyow in Mandarin), it's absolutely fine to just call me Martin instead.

Quick Pointers

CV (2024) / Research Overview / CSE595 NLP / Google Scholar / Semantic Scholar

For fun...

Bio / Profile Photo

Ziqiao (Martin) Ma is a 4th-year Ph.D. candidate in Computer Science and Engineering at the University of Michigan, advised by Professor Joyce Chai. His work has been supported in part by the Weinberg Cognitive Science Fellowship. He is also a part-time researcher at Adobe Research; previously, he worked with Amazon AGI. He received an Outstanding Paper Award at ACL 2023 and an Amazon Alexa Prize Award. He taught Natural Language Processing and won an Outstanding Graduate Student Instructor Award. He co-organized Bi-Align @ ICLR 2025 and SpLU-RoboNLP @ ACL 2024, and co-instructed the tutorial on Learning Language through Grounding @ NAACL 2025.

My Research (TL;DR)

The three constant themes of my research are Language, Interaction, and Embodiment, approached from a scalable and cognitive angle.

My ultimate goal is to enable Mechanistic Alignment & Grounding for Interactive Cognition (aka MAGIC). I keep a more dynamic and spontaneous record of my thoughts in my Research Blueprint.

Selected Awards/Recognitions

Selected Fellowships/Scholarships

Updates


News

Archived news...
  • [May. 2023] I started my internship with Amazon Alexa AI (Amazon AGI)!
  • [Sep. 2022] I will serve as the Poster/Demo Session Chair for Michigan AI Symposium 2022.
  • [Aug. 2022] I will be the Graduate Student Instructor for EECS 595 (NLP) in Fall 2022 at Michigan.
  • [Mar. 2021] I will join the family of the Michigan AI as a Ph.D. student this fall. Go Blue!
  • [Dec. 2020] I will be the Instructional Aide for EECS 492 (Intro. AI) in Winter 2021 at Michigan.

Paper Alerts

Archived news...
  • [Apr. 2023] One paper to appear in IJCAI 2023, see you in Macau :)
  • [Oct. 2022] Two papers to appear in EMNLP 2022, and I will serve as an on-site volunteer in Abu Dhabi :)

Seminar Talks

Previous talks...
  • [20240712] Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations @ Deep Learning: Classics and Trends (DLCT). Host: Rosanne Liu
  • [20240705] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ Data Science Group, KAIST.
  • [20240627] Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations @ CoCoDev Seminar. Host: Abdellah Fourtassi
  • [20240529] Language Grounding to the Visual World and Human Interactions: How Far Are We from Embodied Dialogue Agents @ University of Maryland. Host: Furong Huang

Experiences

Education

Industry

Current Teaching

Guest Lectures

Academic Services

The 1st Workshop on Bidirectional Human-AI Alignment (Bi-Align @ ICLR 2025)

Co-organizer

[Homepage/CFP][OpenReview]

The 4th International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP @ ACL 2024)

Co-organizer

[Homepage/CFP][OpenReview][Proceedings]

The 5th Michigan AI Symposium: AI & Accessibility (2022)

Poster/Demo Session Chair

[Homepage/CFP]

Publications

Show by... ( Recent Selection / Cognitive AI Selection / Publication Year / All Research Topics )

Research Topics: Language Grounding to Visual Perception / Language Grounding to Human Interaction / Language-Guided Multimodal Generation / Embodied/Situated Language Intelligence / Alignment and (Inter)active Learning / Teaching & Community Services

* indicates equal contributions; § indicates correspondence and mentoring.

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang*, Fengyuan Hu*, Jayjun Lee*, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma§

The 1st Pluralistic Alignment Workshop @ NeurIPS 2024

Web / Paper

TL;DR...
  • We introduce COMFORT, a protocol to evaluate spatial reasoning in VLMs across multilingual and ambiguous frames of reference (FoR);
  • VLMs exhibit poor robustness and consistency, lack the flexibility to accommodate multiple FoRs, and fail to adhere to language-specific or culture-specific conventions in cross-lingual tests.
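For a concrete sense of the protocol, here is a minimal, hypothetical sketch of a COMFORT-style probe: the same spatial question is posed under each candidate frame of reference, and the answers are checked for consistency. The `vlm` callable, the frame list, and the prompt wording are all illustrative stand-ins, not the paper's actual code.

```python
# A toy COMFORT-style frame-of-reference probe (illustrative only).
# `vlm` is any callable mapping (image, prompt) -> a "yes"/"no" string.

FRAMES = ["camera", "addressee", "reference object"]  # candidate FoRs

def probe_frames(vlm, image: str, target: str, anchor: str) -> dict:
    """Pose the same spatial question under each frame of reference."""
    answers = {}
    for frame in FRAMES:
        prompt = (f"Adopt the perspective of the {frame}. "
                  f"Is the {target} to the left of the {anchor}? "
                  f"Answer yes or no.")
        answers[frame] = vlm(image, prompt).strip().lower()
    # A robust model should flip its answers coherently across frames,
    # rather than giving one fixed (or random) answer for all of them.
    return answers
```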
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

Preprint 2024

Paper

TL;DR...
  • We present a conceptual framework of Bidirectional Human-AI Alignment to organize the literature from a human-centered perspective;
  • We survey studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans;
  • We survey studies of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally.
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models
Yue Zhang*, Ziqiao Ma*, Jialu Li*, Yanyuan Qiao*, Zun Wang*, Joyce Chai, Qi Wu, Mohit Bansal, Parisa Kordjamshidi

TMLR 2024 (Survey Certificate)

Paper

TL;DR...
  • We provide a top-down review on using foundation models to address VLN challenges;
  • The roles of foundation models are divided into the world model, the human model, and the VLN agent.
Multi-Object Hallucination in Vision-Language Models
Xuweiyi Chen*, Ziqiao Ma*, Xuejun Zhang*, Sihan Xu, Shengyi Qian, Jianing Yang, David Fouhey, Joyce Chai

NeurIPS 2024 / The 3rd Workshop on Advances in Language and Vision Research (ALVR) @ ACL 2024

Web / Paper

TL;DR...
  • We introduce ROPE, an evaluation protocol for hallucination across multiple objects using visual referring prompts;
  • VLMs hallucinate more with multiple objects, are influenced by object class distribution, and exhibit behavior driven by data-specific and intrinsic factors.
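As a rough illustration of the idea (not the paper's code), a ROPE-style probe queries several visually referred objects in a single prompt and scores per-object accuracy; the prompt format and the `vlm` interface below are assumptions.

```python
from typing import Callable, Sequence

def multi_object_probe(vlm: Callable[[str, str], str],
                       image: str,
                       referred: Sequence[str]) -> float:
    """Ask about all referred objects at once; errors ~ hallucinations.

    `referred` holds the ground-truth classes of the marked objects, in order.
    """
    n = len(referred)
    prompt = (f"The image has {n} marked objects, labeled 1 to {n}. "
              f"Name the class of each labeled object, one per line.")
    preds = vlm(image, prompt).lower().splitlines()
    hits = sum(truth.lower() in pred for truth, pred in zip(referred, preds))
    return hits / n
```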
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations
Ziqiao Ma*, Zekun Wang*, Joyce Chai

The 1st Workshop on LLMs and Cognition (LLMCog) @ ICML 2024 (Oral)

Paper / Code

TL;DR...
  • We introduce a trial-and-demonstration (TnD) learning framework that incorporates three components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages;
  • TnD accelerates word representation learning for student models with equal or smaller numbers of parameters, and both trials and demonstrations matter;
  • We further show that the teacher's choice of words influences students' word-specific learning efficiency, and a practice-makes-perfect effect is evidenced by a strong correlation between the frequency of words in trials and their respective learning curves (see the sketch below).
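A minimal sketch of one TnD-style update, assuming Hugging-Face-style student/teacher language models; the `competence_reward` scorer and the exact loss combination are illustrative assumptions, not the paper's implementation.

```python
import torch

def tnd_step(student, teacher, prompt_ids, optimizer, competence_reward):
    # 1) Student trial: the student produces an utterance on its own.
    trial = student.generate(prompt_ids)

    # 2) Reward conditioned on language competence at the current
    #    developmental stage (hypothetical scorer returning a float).
    reward = competence_reward(trial)

    # 3) Teacher demonstration: the student imitates the teacher's
    #    utterance via a standard LM loss.
    demo = teacher.generate(prompt_ids)
    imitation_loss = student(input_ids=demo, labels=demo).loss

    # Reinforce rewarded trials and imitate demonstrations.
    trial_logprob = -student(input_ids=trial, labels=trial).loss
    loss = -reward * trial_logprob + imitation_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```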
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences
Yidong Huang, Jacob Sansom, Ziqiao Ma§, Felix Gervits, Joyce Chai

IROS 2024 / The 1st Vision and Language for Autonomous Driving and Robotics Workshop (VLADR) @ CVPR 2024

Web / Paper

TL;DR...
  • We introduce DriVLMe, a video-language-model-based agent to facilitate natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate;
  • We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue.
GroundHog: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang, Ziqiao Ma, Xiaofeng Gao, Suhaila Shakiah, Qiaozi Gao, Joyce Chai

CVPR 2024

Web / Paper

TL;DR...
  • We introduce GroundHog, a multimodal large language model grounded in holistic segmentation, using a masked feature extractor and unified grounding masks for fine-grained visual understanding.
  • Trained on the curated M3G2 dataset, GroundHog excels at language grounding tasks, reduces object hallucination, and offers improved diagnosability on complex visual inputs.
Inversion-Free Image Editing with Language-Guided Diffusion Models
Sihan Xu*, Yidong Huang*, Jiayi Pan, Ziqiao Ma§, Joyce Chai

CVPR 2024

Web / Paper / GitHub

TL;DR...
  • We derive Denoising Diffusion Consistent Model (DDCM), showing that when the initial sample is known, a special variance schedule reduces the denoising step to the same form as the multi-step consistency sampling;
  • DDCM yields an inversion-free strategy for image editing, with no explicit inversion needed during sampling;
  • We further unify attention control mechanisms into an inference-time algorithm for text-guided editing that takes less than 3 seconds per edit (see the sketch below).
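The gist of the inversion-free strategy can be sketched as follows: because the source image is known, each step re-noises it directly rather than inverting the sampler, and the edit accumulates the difference between target- and source-conditioned predictions. The `eps_model` interface and the update rule here are simplified assumptions, not the exact DDCM algorithm.

```python
import torch

def inversion_free_edit(x0_src, eps_model, src_prompt, tgt_prompt,
                        alphas_cumprod, steps):
    """Toy inversion-free editing loop (simplified, illustrative)."""
    x = x0_src.clone()
    for t in reversed(steps):
        a_t = alphas_cumprod[t]
        # Re-noise the *known* source image: no inversion pass needed.
        noise = torch.randn_like(x0_src)
        x_t = a_t.sqrt() * x0_src + (1 - a_t).sqrt() * noise
        # Predict the clean image under both prompts (standard DDPM identity).
        eps_src = eps_model(x_t, t, src_prompt)
        eps_tgt = eps_model(x_t, t, tgt_prompt)
        x0_hat_src = (x_t - (1 - a_t).sqrt() * eps_src) / a_t.sqrt()
        x0_hat_tgt = (x_t - (1 - a_t).sqrt() * eps_tgt) / a_t.sqrt()
        # Accumulate the semantic edit direction.
        x = x + (x0_hat_tgt - x0_hat_src)
    return x
```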
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation
Sihan Xu*, Ziqiao Ma*, Yidong Huang, Honglak Lee, Joyce Chai

NeurIPS 2023

Web / Paper / GitHub

TL;DR...
  • We introduce CycleNet, a model derived from theoretical analysis that incorporates cycle consistency (and a self-supervision loss) into diffusion models to regularize image manipulation (see the sketch below);
  • CycleNet is robust even with very limited training data (around 2k) and requires minimal computational resources (a single GPU) to train.
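In spirit, the regularizer combines a cycle-consistency term with a self-supervised identity term over full conditional translation passes; `translate` below is a hypothetical stand-in for one text-guided denoising pass, and the L1 losses are illustrative choices.

```python
import torch
import torch.nn.functional as F

def cycle_regularizer(x_src, translate, src_prompt, tgt_prompt):
    """Cycle-consistency + self-supervised identity terms (sketch)."""
    x_tgt = translate(x_src, tgt_prompt)   # forward edit: source -> target
    x_rec = translate(x_tgt, src_prompt)   # backward edit should recover x_src
    x_idt = translate(x_src, src_prompt)   # same-domain edit ~ identity
    return F.l1_loss(x_rec, x_src) + F.l1_loss(x_idt, x_src)
```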
Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
Ziqiao Ma, Jacob Sansom, Run Peng, Joyce Chai

EMNLP 2023 (Findings)

Paper / GitHub

TL;DR...
  • We taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM;
  • We conduct pilot studies toward a holistic and situated evaluation of ToM, breaking ToM into individual components and treating LLMs as agents physically situated in environments and socially situated in interactions with humans.
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue
Cristian-Paul Bara, Ziqiao Ma, Yingzhuo Yu, Julie Shah, Joyce Chai

IJCAI 2023

Paper / GitHub

TL;DR...
  • We study collaborative plan acquisition in human-AI tasks, where agents predict missing task knowledge for themselves and their partners using perceptual and dialogue history;
  • We show that predicting a partner's missing knowledge, coupled with explicit modeling of dialogue moves and mental states, leads to better collaboration.
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma*, Jiayi Pan*, Joyce Chai

ACL 2023 (🏆 Outstanding Paper Award)

Paper / GitHub / Poster

TL;DR...
  • We introduce OctoBERT, a visually grounded language model designed to acquire grounding ability during pre-training and enable fast mapping of new words through few-shot learning without explicit grounding supervision;
  • Visual grounding accelerates grounded word representation learning;
  • Imageability aligns positively with human intuition and prediction metrics, while concreteness shows opposite correlations, pointing to the need for language learning agents to acquire word meanings through physical interactions!
NLP Reproducibility For All: Understanding Experiences of Beginners
Shane Storks, Keunwoo Peter Yu, Ziqiao Ma, Joyce Chai

ACL 2023 (Theme Track)

Paper / GitHub / Poster

TL;DR...
  • We studied 93 NLP students replicating recent NLP papers;
  • Programming skills and paper comprehension had limited impact on effort, while accessible documentation, coding practices, and data availability were critical.
SEAGULL: An Embodied Agent for Instruction Following through Situated Dialog
Yichi Zhang, Jianing Yang, Keunwoo Peter Yu, Yinpei Dai, Shane Storks, Yuwei Bao, Jiayi Pan, Nikhil Devraj, Ziqiao Ma, Joyce Chai

Alexa Prize SimBot Challenge 2023 (🏆 1st Place)

Paper

TL;DR...
  • SEAGULL is the winning solution to the Alexa Prize SimBot Challenge;
  • SEAGULL features a modular system combining neural and symbolic components for language understanding, vision, state tracking, and policy execution.
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents
Ziqiao Ma, Benjamin VanDerPloeg*, Cristian-Paul Bara*, Yidong Huang*, Eui-In Kim, Felix Gervits, Matthew Marge, Joyce Chai

EMNLP 2022 (Findings)

Paper / GitHub / Data

TL;DR...
  • We present DOROTHIE, a simulation platform for studying sensorimotor-grounded dialogue in dynamic autonomous driving scenarios;
  • We create the Situated Dialogue Navigation (SDN), a continuous outdoor VLN benchmark with real human dialogue.
DANLI: Deliberative Agent for Following Natural Language Instructions
Yichi Zhang, Jianing Yang, Shane Storks, Jiayi Pan, Nikhil Devraj, Ziqiao Ma, Keunwoo Peter Yu, Yuwei Bao, Joyce Chai

EMNLP 2022 (Oral)

Paper / GitHub

TL;DR...
  • We propose DANLI, a neuro-symbolic deliberative agent for embodied AI that proactively reasons and plans using neural and symbolic representations;
  • DANLI outperforms reactive baselines by over 70% on the TEACh benchmark.
Partition-Based Active Learning for Graph Neural Networks
Jiaqi Ma*, Ziqiao Ma*, Joyce Chai, Qiaozhu Mei

TMLR 2023 (Survey Certificate) / The 8th International Workshop on Deep Learning on Graphs (DLG) @ KDD 2022 (Oral)

Paper / GitHub / Poster

TL;DR...
  • We introduce GraphPart, a partition-based active learning method for GNNs that selects representative nodes from graph partitions for querying;
  • GraphPart is motivated by classification error analysis under smoothness assumptions;
  • GraphPart outperforms existing active learning methods across benchmarks and budget constraints and reduces the accuracy disparity compared to random training node selection across most datasets.
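Conceptually, the selection step partitions the graph and queries one representative node per partition. The sketch below uses networkx's greedy modularity communities and within-partition degree as stand-ins for the paper's actual partitioning and representativeness criteria.

```python
import networkx as nx

def select_queries(G: nx.Graph, budget: int) -> list:
    """Pick one representative node per partition to label (sketch)."""
    # Modularity-based communities as a proxy for the paper's partitioning.
    parts = list(nx.community.greedy_modularity_communities(G))[:budget]
    # Highest-degree node as a proxy representativeness criterion.
    return [max(part, key=G.degree) for part in parts]

# Usage: queries = select_queries(nx.karate_club_graph(), budget=4)
```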

If you like my figures here, I highly recommend you also visit SiX's homepage.

Misc

Fun Facts

Game Design (More)

I seriously considered a career in game design, and although I ultimately chose a different path, it provided excellent preparation for my work in embodied AI research, which often involves intensive programming with simulators.

Here are some of the projects we worked on:

Contracts
Zekai Fan, Shiyu Qu, Juan Rivera Plata, Yihao Huang, Ziqiao Martin Ma

[trailer][itch.io][indidb][tigsource]

  • A turn-based tactics video game.

Mentoring

I understand that access to research opportunities can be hard, particularly for beginners and the underrepresented. If there is a match in research interests, I am happy to collaborate with undergraduate and master's students when I have the bandwidth. Please check my Mentoring FAQ and fill out our application form to indicate that you want to collaborate with me.
I've been fortunate to have (co-)mentored and collaborated with these amazingly talented young researchers:

Random Tours

Chat?

If you would like to have a random (virtual) coffee chat with me, please visit my calendly page. I am happy to talk if you want to share your stress or just want to chat about life in general (when I have time), but be sure to check out the On-Campus Mental Health Resources @ Michigan.

Get In Touch

You are welcome to drop me a message :)

  • Phone

    xxx-xxx-xxxx
  • marstin0607
  • ziqiao_ma
  • Address

    Bob and Betty Beyster Building 4909,
    2260 Hayward Street,
    Ann Arbor, MI 48109.