
Jiafei Duan
PhD Student

Contact me at

duanj1 [at] cs.washington.edu

CV/Resume

About Me

I teach robots to perceive, reason, and act. As we move robots out of industrial “cages” and into people’s homes and daily lives, we must build generalist robotic models that can understand the world, reason in a human-centric way, and carry out meaningful real-world tasks.

Jiafei Duan is a fourth and final-year PhD student in Robotics and AI at the Paul G. Allen School of Computer Science & Engineering, University of Washington, co-advised by Dieter Fox and Ranjay Krishna. His research centers on robot learning, embodied AI, and building large-scale robotics foundation models. His work has received Best Paper, Spotlight, and Oral recognitions at venues including ICLR, Ubiquitous Robots, and RSS, and has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire.
Jiafei is also a Graduate Student Researcher at the Allen Institute for AI (AI2) and has previously worked as a Research Scientist Intern at NVIDIA. He earned his B.Eng. in Electrical and Electronic Engineering with Highest Distinction from Nanyang Technological University (NTU), Singapore.

* [Announcement]: I am seeking motivated undergraduate and master’s students for research opportunities at UW or AI2 in the upcoming academic year. Sign up here

*I am actively seeking faculty or postdoctoral positions in robotics foundation models and robot learning.

Publications

“MolmoAct: Action Reasoning Models that can Reason in Space”

ArXiv 2025

Jason Lee*, Jiafei Duan*, Haoquan Fang*, Yuquan Deng, Shuo Liu, Boyang Li, Bohan Fang, Jieyu Zhang, Yi Ru Wang, Sangho Lee, Winson Han, Wilbert Pumacay, Angelica Wu, Rose Hendrix, Karen Farley, Eli VanderBilt, Ali Farhadi, Dieter Fox, Ranjay Krishna

|Blogpost|Paper|Code|Dataset|Media Article|Checkpoints|

“FailSafe: Reasoning and Recovery from Failures in Vision-Language-Action Models”

ArXiv 2025

Zijun Lin, Jiafei Duan, Haoquan Fang, Dieter Fox, Ranjay Krishna, Cheston Tan, Bihan Wen

|Paper|Code|Project Page|

“RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation”

ArXiv 2025

Yi Ru Wang, Carter Ung, Grant Tannert, Jiafei Duan, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, Wilbert Pumacay, Yuquan Deng, Ranjay Krishna, Dieter Fox, Siddhartha Srinivasa

|Project Page|Paper|Code|

“PointArena: Probing Multimodal Grounding Through Language-Guided Pointing”

ArXiv 2025

Long Cheng, Jiafei Duan, Yi Ru Wang, Haoquan Fang, Boyang Li, Yushan Huang, Elvis Wang, Ainaz Eftekhar, Jason Lee, Wentao Yuan, Rose Hendrix, Noah A. Smith, Fei Xia, Dieter Fox, Ranjay Krishna

|Project Page|Paper|Code|Dataset|

“The One RING: a Robotic Indoor Navigation Generalist”

ArXiv 2025

Ainaz Eftekhar, Rose Hendrix, Luca Weihs, Jiafei Duan, Ege Caglar, Jordi Salvador, Alvaro Herrasti, Winson Han, Eli VanderBilt, Aniruddha Kembhavi, Ali Farhadi, Ranjay Krishna, Kiana Ehsani, Kuo-Hao Zeng

|Project Page|Paper|Code|

“From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies”

ArXiv 2025

Som Sagar, Jiafei Duan, Sreevishakh Vasudevan, Yifan Zhou, Heni Ben Amor, Dieter Fox, Ransalu Senanayake

|Project Page|Paper|Code|

“GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation”

CoRL 2025

Abhay Deshpande, Yuquan Deng, Arijit Ray, Jordi Salvador, Winson Han, Jiafei Duan, Kuo-Hao Zeng, Yuke Zhu, Ranjay Krishna, Rose Hendrix

|Project Page|Paper|Code|Dataset|

“SAT: Spatial Aptitude Training for Multimodal Language Models”

COLM 2025

Arijit Ray, Jiafei Duan, Reuben Tan, Dina Bashkirova, Rose Hendrix, Kiana Ehsani, Aniruddha Kembhavi, Bryan A. Plummer, Ranjay Krishna*, Kuo-Hao Zeng*, Kate Saenko*

|Project Page|Paper|Dataset|

“SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation”

ICML 2025

RememberRL Workshop @ CoRL 2025, Best Paper

Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox*, Ranjay Krishna*, Jiafei Duan*

|Project Page|Paper|X Post|Code|

“AHA: A Vision-Language Model for Detecting and Reasoning over Failures in Robotic Manipulation”

ICLR 2025

Jiafei Duan, Wilbert Pumacay, Nishanth Kumar, Yi Ru Wang, Shulin Tian, Wentao Yuan, Ranjay Krishna, Dieter Fox, Ajay Mandlekar*, Yijie Guo*

|Project Page|Paper|X Post|Code|

“Manipulate-Anything: Automating Real-World Robots using Vision-Language Models”

CoRL 2024

Jiafei Duan*, Wentao Yuan*, Wilbert Pumacay, Yi Ru Wang, Kiana Ehsani, Dieter Fox, Ranjay Krishna

|Project Page|Paper|Code|

“RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics”

CoRL 2024

Wentao Yuan, Jiafei Duan, Valts Blukis, Wilbert Pumacay, Ranjay Krishna, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox

|Project Page|Paper|Demo|Checkpoint|Code|

“EVE: Enabling Anyone to Train Robots using Augmented Reality”

UIST 2024

Jun Wang, Chun-Cheng Chang*, Jiafei Duan*, Dieter Fox, Ranjay Krishna

|Paper|Project|

“Octopi: Object Property Reasoning with Large Tactile-Language Models”

RSS 2024, Oral

Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, Harold Soh

|Project Page|Code|Paper|

“THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation”

RSS 2024, Oral

Wilbert Pumacay*, Ishika Singh*, Jiafei Duan*, Ranjay Krishna, Jesse Thomason, Dieter Fox

|Paper|Project Page|Code|Real-world setup|

“Selective Visual Representations Improve Convergence and Generalization for Embodied-AI”

ICLR 2024, Spotlight

Ainaz Eftekhar*, Kuo-Hao Zeng*, Jiafei Duan, Ali Farhadi, Ani Kembhavi, Ranjay Krishna

|Paper|Project Page|Code|

“NEWTON: Are Language Models Capable of Physical Reasoning?”

EMNLP 2023

Yi Ru Wang, Jiafei Duan, Dieter Fox, Siddhartha Srinivasa

|Paper|Project Page|Code|Dataset|

“AR2-D2: Training a Robot Without a Robot”

CoRL 2023

Jiafei Duan, Yi Ru Wang, Mohit Shridhar, Dieter Fox, Ranjay Krishna

|Paper|Project Page|My Talk|Dieter Fox’s Talk|Code|

“Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation”

Ubiquitous Robots 2023, Best Paper Award

Jenny Zhang, Samson Yu, Jiafei Duan, Cheston Tan

|Paper|Project Page|Code|

“A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories”

CogSci 2023

Arijit Dasgupta, Jiafei Duan, Marcelo Ang, Yi Lin, Su-Hua Wang, Renée Baillargeon, Cheston Tan

|Paper|Code|Dataset|

“A Survey on Machine Learning Approaches for Modelling Intuitive Physics”

IJCAI 2022, Oral

Jiafei Duan*, Arijit Dasgupta*, Jason Fischer, Cheston Tan

|Paper|Project Page|Video|

“PIP: Physical Interaction Prediction via Mental Simulation with Span Selection”

ECCV 2022

Jiafei Duan*, Samson Yu*, Soujanya Poria, Bihan Wen, Cheston Tan

|Paper|Project Page|Code|

“A Survey of Embodied AI: From Simulators to Research Tasks.”

IEEE Transactions on Emerging Topics in Computational Intelligence

Jiafei Duan, Samson Yu, Tan Hui Li, Hongyuan Zhu, Cheston Tan

|Paper|CIS Journal Featured Publication|

“ActioNet: An Interactive End-to-End Platform for Task-Based Data Collection and Augmentation in 3D Environment.”

ICIP 2020

Jiafei Duan, Samson Yu, Tan Hui Li, Cheston Tan

|Paper|Code|Video|Project Page|

Invited Talks

  • Mila Robot Learning Seminar: Towards robotics foundation that can reason (REAL Lab)
  • UT Dallas: Towards robotics foundation that can reason (Yu Xiang)
  • UT Austin: Towards robotics foundation that can reason (Yuke Zhu)
  • Workshop on Generalizable Priors for Robot Manipulation @ CoRL 2025: Grounded Reasoning from Vision-Language Models for Robotics Manipulation (Keynote speaker)
  • Meta FAIR Robotics Group: Towards robotics foundation that can reason (Host: Homanga Bharadhwaj)
  • Johns Hopkins University: Towards robotics foundation that can reason (Host: Tianmin Shu & Peter Kazanzides (LCSR))
  • Georgia Tech: Towards robotics foundation that can reason (Host: GT Institute for Robotics and Intelligent Machines)
  • Boston University: Towards robotics foundation that can reason
  • TRI LBM Group: Towards robotics foundation that can reason (Host: Jose Barreiros)
  • Cohere Lab: Towards robotics foundation that can reason (Host: Surya Guthikonda)
  • David Hsu NUS Group: Towards robotics foundation that can reason (Host: Yiqing Xu)
  • NTU EEE: Towards robotics foundation that can reason (Host: Wen Bihan)
  • Stanford PAIR Group: Towards a unified multimodal large language model for robotics (Host: Wenlong Huang)
  • Franka Robotics Headquarters: Towards a unified multimodal large language model for robotics (Host: Sven Parusel (VP of Franka))
  • CMU RCHI Group: Grounded Embodied Intelligence: Grounding Reasoning from Multimodal Language Models into Robotics Manipulation (Host: Zackory Erickson)
  • RoboPapers: SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation (Host: Michael Cho and Chris Paxton)
  • Amazon Lab126: Towards Democratizing Robot Learning for All (Host: Yuyin Sun)
  • The AI Talks: Towards Democratizing Robot Learning for All (Host: AI talks organizer)
  • Allen School Colloquium: Democratizing Robot Learning for All (Host: GRAIL Lab)
  • NUS CLeAR Lab: Benchmarking Robot Learning for Manipulation (Host: Harold Soh)
  • AAAI Summer Symposium: AR2-D2: Training a Robot without a Robot (Host: Workshop organizer)

Academic & Workshop Service

Reviewer for CVPR, ICCV, ECCV, NeurIPS, ICLR, ICML, ICRA, IROS, CogSci, RA-L, IEEE Transactions on Automation Science and Engineering, Pattern Recognition, and the CVPR Workshop on 3D Vision and Robotics

Fun things I do besides robotics

Manipulation – I practice and perform magic professionally. [Performance]

Navigation – I love to travel and see the world. [Youtube Vlog]