Yao Wan 萬瑤

Assoc. Prof.@HUST

1037 Luoyu Road, Wuhan, Hubei, China

wanyao@hust.edu.cn

News

June, 2025 One paper has been accepted by ICSE 2026. Congratulations to Zhaoyang!
May, 2025 Four papers have been accepted by ACL 2025. Congratulations to Geliang, Xiuxuan, Yang and Mingmeng!
May, 2025 Two papers have been accepted by KDD 2025. Congratulations to Yi and Shu!
May, 2025 Three papers have been accepted by ICML 2025. Congratulations to Gen, Chenlong and Hailong!
Jan., 2025 Three papers have been accepted by ICLR 2025. Congratulations to Dongping and Siyuan!
Jan., 2025 Two papers have been accepted by WWW 2025. Congratulations to Yi!
May, 2024 Two papers have been accepted by ACL 2024 (Findings). Congratulations to Zhangqian and Yihe!
May, 2024 One paper has been accepted by ICML 2024. Congratulations to Dongping!
Mar., 2024 Three papers have been accepted by NAACL 2024. Congratulations to Wenting and Dongping!
Mar., 2024 Our paper titled "Graph Neural Networks for Vulnerability Detection - A Counterfactual Explanation" has been accepted by ISSTA 2024. Congratulations to Zhaoyang!
Jan., 2024 Our paper titled "Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study" has been accepted by SIGMOD 2024. Congratulations to Yang!
Jan., 2024 Our paper titled "IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion" has been accepted by FSE 2024. Congratulations to Bolun!
Jan., 2024 Our paper titled "NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries" has been accepted by EACL 2024. Congratulations to Wei!
June 14, 2022 Our paper titled "You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search" has been accepted by ESEC/FSE 2022. Congratulations to Shijie!
Feb 24, 2022 Our two papers on source code summarization and generation have been accepted by ACL 2022. Congratulations to Juncai and Xin!
Jan 25, 2022 Our paper titled "NaturalCC: An Open-Source Toolkit for Code Intelligence" has been accepted by ICSE 2022 Demo Track.
Nov 28, 2020 Our toolkit NaturalCC has been released on GitHub and can be accessed via the Homepage.

About Me

I am currently an Associate Professor with the College of Computer Science and Technology at Huazhong University of Science and Technology (HUST), Wuhan, China. Prior to that, I received my Ph.D. degree from Zhejiang University in 2019, under the supervision of Prof. Jian Wu and Prof. Zhou Zhao. I visited the Shenzhen Research Institute, Chinese University of Hong Kong, China (working with Prof. Zibin Zheng) in 2014, the University of Technology Sydney, Australia (working with Prof. Guandong Xu) in 2016, and the University of Illinois Chicago, USA (working with Prof. Philip S. Yu) in 2018. At HUST, I lead the ONE Lab, dedicated to empowering machines to interact with the physical world through a unified natural language interface: Language + X, where X can be code, vision, tables, etc.

(I am looking for highly motivated undergraduate students with a strong passion to work with me. If interested, please drop me a message by email.)

Research Highlights

NaturalCC is an advanced sequence modeling toolkit designed to empower researchers and developers in training custom models for a myriad of software engineering tasks, including but not limited to code summarization, code generation, code search, and type inference. Our vision is to seamlessly connect the realms of programming language and natural language, leveraging cutting-edge machine learning techniques. arXiv Code Homepage

Selected Publications (Full List)

Language + Code

Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Zhaoyang Chu, Yao Wan*, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
ICSE 2026. The 48th IEEE/ACM International Conference on Software Engineering
PDF CCF-A

Dataflow-Guided Neuro-Symbolic Language Models for Type Inference
Gen Li, Yao Wan*, Hongyu Zhang, Zhou Zhao, Wenbin Jiang, Xuanhua Shi, Hai Jin, Zheng Wang
ICML 2025. The Forty-second International Conference on Machine Learning
PDF CCF-A

CodeSync: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
Chenlong Wang, Zhaoyang Chu, Zhengxiang Cheng, Xuyi Yang, Kaiyue Qiu, Yao Wan*, Zhou Zhao, Xuanhua Shi, Hai Jin, Dongping Chen
ICML 2025. The Forty-second International Conference on Machine Learning
PDF CCF-A

Can Large Language Models Understand Intermediate Representations?
Hailong Jiang, Jianfeng Zhu, Yao Wan*, Bo Fang, Hongyu Zhang, Ruoming Jin, Qiang Guan
ICML 2025. The Forty-second International Conference on Machine Learning
PDF CCF-A

How to Select Pre-Trained Code Models for Reuse? A Learning Perspective
Zhangqian Bi, Yao Wan*, Zhaoyang Chu, Yufei Hu, Junyi Zhang, Hongyu Zhang, Guandong Xu and Hai Jin
SANER 2025. 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering
PDF CCF-B IEEE TCSE Distinguished Paper Award

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Yao Wan, Yang He, Zhangqian Bi, Jianguo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip Yu
ACM Computing Surveys 2024.
PDF arXiv

You See What I Want You to See: Poisoning Vulnerabilities in Neural Code Search
Yao Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Dezhong Yao, Hai Jin, and Lichao Sun
ESEC/FSE 2022. The 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering.
PDF CCF-A

What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code
Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu and Hai Jin
ICSE 2022. The 44th ACM/IEEE International Conference on Software Engineering
PDF CCF-A

Multi-Modal Attention Network Learning for Semantic Source Code Retrieval
Yao Wan, Jingdong Shu, Yulei Sui, Guandong Xu, Zhou Zhao, Jian Wu, Philip S. Yu
ASE 2019. The 34th ACM/IEEE International Conference on Automated Software Engineering
PDF Code CCF-A

Improving Automatic Source Code Summarization via Deep Reinforcement Learning
Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. Yu
ASE 2018. The 33rd ACM/IEEE International Conference on Automated Software Engineering
PDF Code CCF-A

Language + UI

LaTCoder: Converting Webpage Design to Code with Layout-as-Thought
Yi Gui, Zhen Li, Zhongyi Zhang, Guohao Wang, Tianpeng Lv, Gaoyang Jiang, Yi Liu, Dongping Chen, Yao Wan*, Hongyu Zhang, Wenbin Jiang, Xuanhua Shi, Hai Jin
KDD 2025. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PDF CCF-A

GUI-World: A GUI-oriented Dataset for Multimodal LLM-based Agents
Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Huichi Zhou, Qihui Zhang, Zhigang He, Yilin Bai, Chujie Gao, Liuyi Chen, Yiqiang Li, Chenlong Wang, Yue Yu, Tianshuo Zhou, Zhen Li, Yi Gui, Yao Wan*, Pan Zhou, Jianfeng Gao, Lichao Sun
ICLR 2025. The Thirteenth International Conference on Learning Representations
PDF Top AI Conference

UICopilot: Automating UI Synthesis via Hierarchical Code Generation from Webpage Designs
Yi Gui, Zhen Li, Zhongyi Zhang, Yao Wan*, Dongping Chen, Hongyu Zhang, Yi Su, Bohua Chen, Xing Zhou, Wenbin Jiang, Xiangliang Zhang
WWW 2025 (Oral). The Web Conference 2025
PDF CCF-A

WebCode2M: A Real-World Dataset for Code Generation from Webpage Designs
Yi Gui, Zhen Li, Yao Wan*, Yemin Shi, Hongyu Zhang, Yi Su, Bohua Chen, Dongping Chen, Siyuan Wu, Xing Zhou, Wenbin Jiang, Hai Jin, Xiangliang Zhang
WWW 2025 (Oral). The Web Conference 2025
PDF CCF-A

Language + Table

nvAgent: Automated Data Visualization from Natural Language via Collaborative Agent Workflow
Geliang Ouyang, Jingyao Chen, Zhihe Nie, Yi Gui, Yao Wan*, Hongyu Zhang, Dongping Chen
ACL 2025. The 63rd Annual Meeting of the Association for Computational Linguistics
PDF CCF-A

Sign2Vis: Automated Data Visualization from Sign Language
Yao Wan, Yang Wu, Zhen Li, Guobiao Zhang, Hongyu Zhang, Zhou Zhao, Hai Jin, April Wang
ACL 2025 (Findings). The 63rd Annual Meeting of the Association for Computational Linguistics
PDF CCF-A

Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study
Yang Wu#, Yao Wan#*, Hongyu Zhang, Yulei Sui, Wucai Wei, Wei Zhao, Guandong Xu, Hai Jin
SIGMOD 2024. The 2024 ACM SIGMOD International Conference on Management of Data
PDF CCF-A

Language + Vision

Judge Anything: MLLM as a Judge Across Any Modality
Shu Pu, Yaochen Wang, Dongping Chen, Yuhang Chen, Guohao Wang, Qi Qin, Zhongyi Zhang, Zhiyuan Zhang, Zetong Zhou, Shuang Gong, Yi Gui, Yao Wan*, Philip S. Yu
KDD 2025. The 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PDF CCF-A

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment
Dongping Chen, Ruoxi Chen, Shu Pu, Zhaoyi Liu, Yanru Wu, Caixi Chen, Benlin Liu, Yue Huang, Yao Wan, Pan Zhou, Ranjay Krishna
ICLR 2025 (Spotlight). The Thirteenth International Conference on Learning Representations
PDF Top AI Conference

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan*, Pan Zhou, Lichao Sun
ICML 2024 (Oral). The Forty-first International Conference on Machine Learning
PDF arXiv CCF-A

Large Language Models

DataGen: Unified Synthetic Dataset Generation via Large Language Models
Yue Huang, Siyuan Wu, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan*, Tianyi Zhou, Chaowei Xiao, Jianfeng Gao, Xiangliang Zhang, Lichao Sun
ICLR 2025. The Thirteenth International Conference on Learning Representations
PDF Top AI Conference

HonestLLM: Toward an Honest and Helpful Large Language Model
Chujie Gao, Siyuan Wu, Yue Huang, Dongping Chen, Qihui Zhang, Zhengyan Fu, Yao Wan*, Lichao Sun, Xiangliang Zhang
NeurIPS 2024. The 38th Annual Conference on Neural Information Processing Systems
PDF arXiv CCF-A