About

I am a Member of Technical Staff at Thinking Machines Lab, and a Member of Less Technical Stuff at ACL Mentorship. I did my Ph.D. at the University of Michigan, advised by Joyce Chai.

Research Interests

I am currently interested in (i) predictive evaluation of scalable learning systems; (ii) continual multimodal learning with minimal and natural supervision, e.g., self-supervised learning, and learning and inference with no turn-taking or train-test boundary; and (iii) learning from natural cross-modal correspondence.

Recent Research

  • Interaction Models for Human-AI Collaboration

    Interaction models handle interaction natively rather than through external scaffolding. They let people collaborate with AI the way we naturally collaborate with each other: continuously taking in audio, video, and text, and thinking, responding, and acting in real time. We train an interaction model with a multi-stream, micro-turn design.
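
    A minimal sketch of one way a multi-stream, micro-turn design could interleave inputs; the names and layout here are my illustration, not the model's actual design. Each stream is chunked into short micro-turns, then merged into a single time-ordered token sequence with stream tags.

    ```python
    # Hypothetical sketch: interleave per-stream micro-turns by wall-clock time.
    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class MicroTurn:
        t_start: float                       # wall-clock start of this chunk
        stream: str = field(compare=False)   # "audio", "video", or "text"
        tokens: list = field(compare=False)  # tokenized content of the chunk

    def interleave(*streams):
        """Merge time-sorted per-stream micro-turns into one tagged sequence."""
        sequence = []
        for turn in heapq.merge(*streams):   # each stream is already time-sorted
            sequence.append(f"<{turn.stream}>")  # stream tag before each chunk
            sequence.extend(turn.tokens)
        return sequence

    audio = [MicroTurn(0.0, "audio", ["hi"]), MicroTurn(0.4, "audio", ["there"])]
    text = [MicroTurn(0.2, "text", ["hello"])]
    print(interleave(audio, text))
    # ['<audio>', 'hi', '<text>', 'hello', '<audio>', 'there']
    ```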

  • Elastic Test-Time Training For Space and Time

    Large-Chunk Elastic TTT (LaCET) reframes test-time training as continual learning and introduces a Fisher-weighted elastic consolidation term, so that fast weights keep adapting across chunks without unconstrained drift.
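
    A minimal sketch of a Fisher-weighted elastic consolidation penalty in the EWC style; the exact weighting and update schedule here are assumptions for illustration, not LaCET's precise formulation.

    ```python
    import torch

    def elastic_consolidation_loss(fast_params, anchor_params, fisher, lam=1.0):
        """EWC-style penalty: (lam / 2) * sum_i F_i * (theta_i - anchor_i)^2.

        All three arguments are dicts keyed by parameter name; `fisher`
        holds diagonal Fisher estimates accumulated over earlier chunks.
        """
        loss = torch.zeros(())
        for name, p in fast_params.items():
            loss = loss + (fisher[name] * (p - anchor_params[name]).pow(2)).sum()
        return 0.5 * lam * loss

    # Per chunk: total = task_loss(chunk) + elastic_consolidation_loss(...),
    # then refresh the diagonal Fisher from squared task-loss gradients.
    ```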

    4D-LRM pretrains general space-time representations that reconstruct an object from a few views at some times to any view at any time. 4D-LRM adopts a clean and minimal Transformer design and unifies space and time by predicting 4D Gaussian primitives directly from multi-view tokens.
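
    As a toy illustration of the prediction target, here is a head mapping per-token Transformer features to flattened 4D Gaussian parameters; the 16-value layout below is an assumed simplification, not 4D-LRM's actual parameterization.

    ```python
    import torch.nn as nn

    class GaussianHead(nn.Module):
        """Maps per-token features to one 4D Gaussian primitive per token.

        Assumed layout (16 values): xyzt center (4), per-axis scale (4),
        rotation quaternion (4), opacity (1), RGB color (3).
        """
        def __init__(self, d_model=768, n_params=16):
            super().__init__()
            self.proj = nn.Linear(d_model, n_params)

        def forward(self, tokens):      # tokens: (batch, n_tokens, d_model)
            return self.proj(tokens)    # (batch, n_tokens, 16)
    ```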

  • Omnimodality with Embedding Autoregression

    NEPA implements minimal latent autoregression with a next-embedding prediction loss to learn broad, generalizable models for diverse downstream vision problems. It requires no offline encoders and lets autoregression operate directly on the model's native embeddings.
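
    A minimal sketch of what a next-embedding prediction objective can look like; the cosine regression and interfaces here are my assumptions for illustration.

    ```python
    import torch.nn.functional as F

    def next_embedding_loss(model, embeddings):
        """Regress embedding t+1 from embeddings up to t, in latent space.

        embeddings: (batch, seq, dim), the model's own native embeddings.
        model: any causal sequence model mapping (batch, seq, dim) -> same.
        """
        pred = model(embeddings[:, :-1])     # predictions for positions 1..seq-1
        target = embeddings[:, 1:].detach()  # the actual next embeddings
        # Cosine regression; plain MSE would be another reasonable choice.
        return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
    ```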

  • Grounded Vision Language Models

    VEGGIE is an instructional video generative model for concept grounding and editing, trained with a diffusion loss only. VEGGIE shows emergent zero-shot multimodal instruction following and in-context video editing.

    GroundHog is a generative vision language model grounded in segmentation. It proposes segmentation masks of regions with discernible semantic content, and recognizes entities while generating language.

    OctoBERT is an object-centric encoder-based vision language model, designed to acquire grounding ability during pre-training and transfer to new words through few-shot learning without explicit grounding supervision.

  • Behavioral Evaluation and Mechanistic Interpretation

    VLM-Lens is a toolkit designed to support the extraction of intermediate outputs from any layer during the forward pass of open-source VLMs.
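
    The general mechanism is standard PyTorch forward hooks; the sketch below illustrates the idea generically and is not VLM-Lens's actual interface.

    ```python
    import torch

    def capture_layer_outputs(model, layer_names):
        """Register hooks that cache each named layer's forward output."""
        cache, handles = {}, []
        modules = dict(model.named_modules())
        for name in layer_names:
            def hook(module, inputs, output, name=name):
                cache[name] = output.detach() if torch.is_tensor(output) else output
            handles.append(modules[name].register_forward_hook(hook))
        return cache, handles

    # Usage: run one forward pass, read `cache`, then remove the handles.
    #   cache, handles = capture_layer_outputs(vlm, ["vision_tower.layers.11"])
    #   vlm(**inputs); feats = cache["vision_tower.layers.11"]
    #   for h in handles: h.remove()
    ```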

    Check out our line of behavioral evaluations and analyses of (V)LMs, e.g., spatial representation, objectness, pragmatic generation, world modeling, and mental state modeling.

    Check out our line of work advancing the mechanistic interpretability of (V)LMs, e.g., cross-modal grounding and conceptual metonymy.