About Me

I am currently a Lecturer (a.k.a. Assistant Professor in the USA) with the CGCL/SCTS/BDTS Lab, School of Computer Science and Technology, Huazhong University of Science and Technology (HUST), Wuhan, China. Before that, I received my Ph.D. degree from Zhejiang University in December 2019 under the supervision of Prof. Jian Wu and Prof. Zhou Zhao. I also work closely with Prof. Yulei Sui from the University of Technology Sydney. My research interests focus on the synergy between Artificial Intelligence (AI) and Software Engineering (a.k.a. ASE), especially natural language processing, programming language analysis, data mining, and machine learning.

During my Ph.D. studies, I was fortunate to have three wonderful visiting experiences hosted by three distinguished professors who gave me much support, and I was happy to meet many wonderful friends and collaborators along the way. I visited the Shenzhen Research Institute of the Chinese University of Hong Kong, China (working with Prof. Zibin Zheng) in 2014, the University of Technology Sydney, Australia (working with Prof. Guandong Xu) in 2016, and the University of Illinois at Chicago, USA (working with Prof. Philip S. Yu) in 2018.

(I am looking for highly motivated graduate and undergraduate students to work with me. If you are interested, please send me an email.)

Research Highlights

NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models for many software engineering tasks, e.g., code summarization, code generation, code retrieval, and code clone detection. Our vision is to bridge the gap between programming languages and natural language through machine learning techniques. [arXiv'20, ASE'18, ASE'19, TSE'20, ACL'21, EMNLP'21, TOSEM'21]

SCSMiner is a system for mining social coding sites (e.g., GitHub), which integrate social networking and distributed version control in a unified platform to facilitate collaborative development around the world. It can be applied to software developer recruitment for IT corporations. [WWWJ'18, Neurocomputing'18]

Selected Publications

NaturalCC: A Toolkit to Naturalize the Source Code Corpus
Yao Wan, Yang He, Jian-Guo Zhang, Yulei Sui, Hai Jin, Guandong Xu, Caiming Xiong, Philip S. Yu
arXiv 2020.
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training
Zehao Lin, Guodun Li, Jingfeng Zhang, Yue Deng, Xiangji Zeng, Yin Zhang, Yao Wan
TOSEM 2021. ACM Transactions on Software Engineering and Methodology
Fix-Filter-Fix: Intuitively Connect Any Models for Effective Multi-task Bug Fixing
Haiwen Hong, Jingfeng Zhang, Yin Zhang, Yao Wan, Yulei Sui
EMNLP 2021. The 2021 Conference on Empirical Methods in Natural Language Processing
Disentangled Code Representation Learning for Multiple Programming Languages
Jingfeng Zhang, Haiwen Hong, Yin Zhang, Yao Wan, Ye Liu, Yulei Sui
Findings of ACL 2021. The 59th Annual Meeting of the Association for Computational Linguistics
KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning
Ye Liu, Yao Wan, Lifang He, Hao Peng, Philip S. Yu
AAAI 2021. The 35th AAAI Conference on Artificial Intelligence
Multi-Modal Attention Network Learning for Semantic Source Code Retrieval
Yao Wan, Jingdong Shu, Yulei Sui, Guandong Xu, Zhou Zhao, Jian Wu, Philip S. Yu
ASE 2019. The 34th ACM/IEEE International Conference on Automated Software Engineering
Improving Automatic Source Code Summarization via Deep Reinforcement Learning
Yao Wan, Zhou Zhao, Min Yang, Guandong Xu, Haochao Ying, Jian Wu, Philip S. Yu
ASE 2018. The 33rd ACM/IEEE International Conference on Automated Software Engineering

Professional Services

  Conference PC Member/Reviewer
  • ACL: 2022, 2021; EMNLP: 2021; AAAI: 2022, 2021; IJCAI: 2021; SIGKDD: 2022; WSDM: 2022; COLING: 2020; NLPCC: 2020; BESC: 2021, 2020
  Journal Reviewer
  • TSE: 2021; TKDE: 2021; WWWJ: 2017-2021; TRel: 2020