Yuxin Zhang

Yuxin Zhang (张玉鑫) is a direct Ph.D. student at the School of Information, Renmin University of China (中国人民大学信息学院), advised by Professor Ju Fan (范举). He is affiliated with the Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education.

He received his bachelor’s degree from Renmin University of China in 2024, majoring in Finance and in Data Science & Big Data Technology. During his undergraduate studies, he worked on multimodal knowledge graphs and classical Chinese poetry retrieval.

research

He is dedicated to developing intelligent systems for data-intensive reasoning and analysis. His research interests include data agents, data analysis, Text-to-SQL, and large language models.

overview

His recent work explores how language models and code agents can interact with databases, large file-system environments, and real analytical workflows. Representative works include Reward-SQL for process-supervised Text-to-SQL reasoning, CODA-BENCH for evaluating code agents on data-intensive tasks, and surveys on NL2SQL with large language models.

View: Publications and Contact.

latest news

Mar 2026: Finished a research internship at Alibaba Cloud, working on Text-to-SQL reasoning optimization.

2026: Our paper CODA-BENCH was accepted by ICML 2026.

2026: Our paper Reward-SQL was accepted by ACM SIGMOD 2026.

2025: Our NL2SQL survey was published in IEEE Transactions on Knowledge and Data Engineering.

Sep 2024: Started my direct Ph.D. study at Renmin University of China.

selected publications

ICML

CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

Yuxin Zhang, Ju Fan, Meihao Fan, Shaolei Zhang, Xiaoyong Du

ICML 2026

Introduces a benchmark for evaluating code agents in data-intensive environments, where agents must discover relevant files, write executable analysis code, and solve realistic analytical tasks.

SIGMOD

Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li

ACM SIGMOD 2026 (accepted, to appear) / arXiv:2505.04671, 2025

Introduces Chain-of-CTEs reasoning with process reward models and GRPO reinforcement learning, achieving 68.9% execution accuracy on BIRD, a state-of-the-art result among 7B models.

TKDE

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo

IEEE TKDE 2025, Vol. 37, No. 10, pp. 5735-5754 / arXiv:2408.05109, 2024

A systematic survey of NL2SQL in the LLM era, covering prompting, fine-tuning, benchmarks, evaluation, and open challenges.

PVLDB

Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL

Ju Fan, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du, Nan Tang

PVLDB 2024, Vol. 17, No. 11, pp. 2750-2763

Presents ZeroNL2SQL, combining PLM-based schema alignment with LLM-based complex reasoning for zero-shot Text-to-SQL.

EMNLP

A Multi-Modal Knowledge Graph for Classical Chinese Poetry

Yingjie Li, Yuxin Zhang, Bin Wu, Ji-Rong Wen, Rui Song, Tao Bai

Findings of EMNLP 2022, pp. 2318-2326

Builds a multimodal knowledge graph for classical Chinese poetry and supports poem-image cross-modal retrieval.
