Yifan Song 宋一帆
I am a final-year PhD candidate at Peking University, advised by Prof. Sujian Li. My research focuses on LLM-based agents: how to make language models plan, act, and learn from interaction with the real world.
Since December 2024, I have been a research intern at Xiaomi's LLM Core team, where I am a core contributor to the MiMo series of models, including MiMo-7B, MiMo-VL, MiMo-V2-Flash, and MiMo-V2-Pro.
Find me on Google Scholar, GitHub, and X/Twitter.
Selected Publications
- RestGPT: Connecting Large Language Models with Real-World RESTful APIs · arXiv 2023 · Cited by 201
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents · ACL 2024 · Cited by 190
- Calibrating Factual Knowledge in Pretrained Language Models · EMNLP 2022 Findings · Cited by 178
- R1-V: Reinforcing Super Generalization Ability in Vision-Language Models · 2025 · Cited by 143
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism · NAACL 2025 · Cited by 142
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wisE Training · ICLR 2024 · Cited by 116
- MiMo: Unlocking the Reasoning Potential of Language Model — From Pretraining to Posttraining · arXiv 2025 · Cited by 57
- MiMo-V2-Flash Technical Report · arXiv 2026 · Cited by 18
- MiMo-VL Technical Report · arXiv 2025 · Cited by 15
- VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding? · COLM 2024 · Cited by 91
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement · EMNLP 2024 · Cited by 70
- AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories · EMNLP 2024 Findings · Cited by 40
Full list on Google Scholar.