Publications

Selected papers and preprints.

* indicates equal contribution. First/co-first-author papers are highlighted.

SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle

Hao Guan, Lingyue Fu, Shao Zhang, Yaoming Zhu, Kangning Zhang, Lin Qiu, Xunliang Cai, Xuezhi Cao, Weiwen Liu, Weinan Zhang, and Yong Yu

arXiv preprint, 2026

SWE-Cycle evaluates code agents across environment reconstruction, implementation, test generation, and a full-cycle issue-resolution task.

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, and Zongyuan Ge

arXiv preprint, 2026

MuSEAgent distills interaction histories into stateful decision experiences and retrieves them through complementary search strategies for multimodal reasoning.

Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

Kounianhua Du, Jianxing Liu, Kangning Zhang, Wenxiang Jiao, Yuan Lu, Jiarui Jin, Weiwen Liu, Yong Yu, and Weinan Zhang

arXiv preprint, 2025

Fints performs inference-time personalization by selecting fine-grained, instance-tailored steering signals for dynamic user preferences and sparse personalization data.

A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

Congmin Zheng, Jiachen Zhu, Zhuoying Ou, Yuxiang Chen, Kangning Zhang, Rong Shan, Zeyu Zheng, Mengyue Yang, Jianghao Lin, Yong Yu, and Weinan Zhang

arXiv preprint, 2025

This survey reviews process reward models across process data construction, reward modeling, test-time scaling, and reinforcement learning for large language models.

An Automatic Graph Construction Framework based on Large Language Models for Recommendation

Rong Shan, Jianghao Lin, Chenxu Zhu, Bo Chen, Menghui Zhu, Kangning Zhang, Jieming Zhu, Ruiming Tang, Yong Yu, and Weinan Zhang

ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2025

This framework uses large language models to automate graph construction for recommendation, improving the graphs used by GNN-based recommenders.

ClickPrompt: CTR Models are Strong Prompt Generators for Adapting Language Models to CTR Prediction

Jianghao Lin, Bo Chen, Hangyu Wang, Yunjia Xi, Yanru Qu, Xinyi Dai, Kangning Zhang, Ruiming Tang, Yong Yu, and Weinan Zhang

The ACM Web Conference (WWW), 2024

ClickPrompt adapts language models to CTR prediction by using CTR models as prompt generators, combining semantic and collaborative signals.

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, and Yong Yu

arXiv preprint, 2023

CodeApex is a bilingual benchmark for evaluating large language models on programming comprehension, code generation, and code correction.