Yuxin Zhang (张玉鑫) is a direct-entry Ph.D. student at the School of Information, Renmin University of China (中国人民大学信息学院), advised by Professor Ju Fan (范举). He is affiliated with the Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education.

He received his bachelor’s degree from Renmin University of China in 2024, majoring in Finance and in Data Science & Big Data Technology. During his undergraduate studies, he worked on multimodal knowledge graphs and classical Chinese poetry retrieval.

research

He develops intelligent systems for data-intensive reasoning and analysis. His research interests include data agents, data analysis, Text-to-SQL, and large language models.

overview

His recent work explores how language models and code agents can interact with databases, large file-system environments, and real analytical workflows. Representative works include Reward-SQL for process-supervised Text-to-SQL reasoning, CODA-BENCH for evaluating code agents on data-intensive tasks, and a survey of NL2SQL with large language models.

latest news

Finished a research internship at Alibaba Cloud, working on Text-to-SQL reasoning optimization.

Our paper CODA-BENCH was accepted by ICML 2026.

Our paper Reward-SQL was accepted by ACM SIGMOD 2026.

Our NL2SQL survey was published in IEEE Transactions on Knowledge and Data Engineering.

Started my direct-entry Ph.D. studies at Renmin University of China.

selected publications

  1. ICML

    CODA-BENCH: Can Code Agents Handle Data-Intensive Tasks?

    Yuxin Zhang, Ju Fan, Meihao Fan, Shaolei Zhang, Xiaoyong Du

    ICML 2026

    Introduces a benchmark for evaluating code agents in data-intensive environments, where agents must discover relevant files, write executable analysis code, and solve realistic analytical tasks.

  2. SIGMOD

    Reward-SQL: Boosting Text-to-SQL via Stepwise Reasoning and Process-Supervised Rewards

    Yuxin Zhang, Meihao Fan, Ju Fan, Mingyang Yi, Yuyu Luo, Jian Tan, Guoliang Li

    ACM SIGMOD 2026 (accepted, to appear) / arXiv:2505.04671, 2025

    Introduces Chain-of-CTEs reasoning, combined with process reward models and GRPO reinforcement learning, achieving 68.9% execution accuracy on BIRD, a state-of-the-art result among 7B-parameter models.

  3. TKDE

    A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

    Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo

    IEEE TKDE 2025, Vol. 37, No. 10, pp. 5735-5754 / arXiv:2408.05109, 2024

    A systematic survey of NL2SQL in the LLM era, covering prompting, fine-tuning, benchmarks, evaluation, and open challenges.

  4. PVLDB

    Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL

    Ju Fan, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du, Nan Tang

    PVLDB 2024, Vol. 17, No. 11, pp. 2750-2763

    Presents ZeroNL2SQL, combining PLM-based schema alignment with LLM-based complex reasoning for zero-shot Text-to-SQL.

  5. EMNLP

    A Multi-Modal Knowledge Graph for Classical Chinese Poetry

    Yingjie Li, Yuxin Zhang, Bin Wu, Ji-Rong Wen, Rui Song, Tao Bai

    Findings of EMNLP 2022, pp. 2318-2326

    Builds a multimodal knowledge graph for classical Chinese poetry and supports poem-image cross-modal retrieval.