Research Statement (Quite long, please read...?)

As a muggle (?), my ultimate goal is to enable Mechanistic Alignment & Grounding for Interactive Cognition (aka MAGIC). The three recurring themes of my research are Language, Interaction, and Embodiment, approached from a scalable and cognitive angle. I elaborate on each below:

Language Grounding and Alignment. We develop our language systems from natural supervision. Our language develops through sensorimotor and sociolinguistic experiences in the physical world (semantic/static grounding) and through interactions with others (communicative/dynamic grounding). Through this grounded language learning, we acquire lexical semantics and syntactic structures, and we then use language pragmatically in everyday communication. To me, grounding is about mapping a language system to something external, whether another language, perception, or shared beliefs; I (mostly) agree with Freda Shi's view on the definition of grounding. Alignment is closely related to grounding but slightly different from it. I see alignment as two-fold: in-vocabulary alignment (intent/value/preference/safety alignment) and out-of-vocabulary alignment (i.e., expanding the action space to multimodal/multilingual/code tokens). My colleagues and I have a tutorial on grounding and have written about alignment. I will find some time to write down my thoughts on the difference between alignment and grounding (TODO list +1), but in short: I think grounding is a property of our language representations, whereas alignment also covers how we generate outputs from those representations.

Grounding and alignment: connecting language to everything non-linguistic.
  • Grounding language to the physical world: Understanding and generating language that is grounded in sensorimotor experiences and physical situations. There is more to explore beyond 2D grounding, e.g., video, 3D, and generative world models.
  • Grounding language to human interactions: (Co-)situated human-AI interaction in shared environments where agents hold disparate mental states, and collaboration toward common ground.
  • Alignment in post-training and at inference time: Human-like planning and reasoning that is deliberate, (inter)active, lifelong, and steerable, built upon pre-trained systems.
  • Applications of frontier models to situated/embodied agents and to content generation.


Mechanistic (Mis)alignment. In my view, the goal of cognitive science is to understand the underlying mechanisms that give rise to intelligence. I regard humans and machines as fundamentally distinct intelligent systems, and I believe there will come a point where human-like learning no longer offers meaningful insights for superhuman AI models. My ultimate research question centers on what I call "mechanistic (mis)alignment": investigating which factors drive the cognitive behaviors that humans and machines share, and which mechanistic differences account for the behaviors in which they diverge. I always remind myself to be epistemologically rigorous and to avoid both anthropomorphizing AI models and overclaiming. I (mostly) agree with Alex Warstadt and Sam Bowman's view in "What Artificial Neural Networks Can Tell Us About Human Language Acquisition."

Scalable representations (learned in a data-driven yet sufficiently efficient way) as computational abstractions of cognition.
  • Scaling laws and developmental psychology: Exploring the developmental trajectories of data-driven models and the cognitive capabilities that emerge over the course of development.
  • Efficient learning with minimal supervision: Learning that is data-efficient, spans multiple modalities, and works on (semi-)structured data.
  • Cross-cultural and cross-lingual conventions of cognition. (Languages are dying! Under-represented languages are dear to my heart, but I plan (and try hard) not to do (too much) research on this topic before I finish my PhD, lol.)
The Connections

This is how I perceive the connections between the pieces.

Vision and Physical Embodiment

Interaction with Humans and Other Agents

Acknowledgement: Thanks to Jiayuan Mao for this amazing template!

Get In Touch

You are welcome to drop me a message :)

  • Phone

    xxx-xxx-xxxx
  • marstin0607
  • ziqiao_ma
  • Address

    Bob and Betty Beyster Building 4909,
    2260 Hayward Street,
    Ann Arbor, MI 48109.