About Me
I am Feng Xiang, a master's student in Computer Science at Wuhan University and a Research Intern at Alibaba Group.
My research focuses on multimodal large language models and agentic reinforcement learning (including search and memory). Beyond building stronger model reasoning, I am particularly interested in trustworthy reasoning: enabling models to "know what they know and know what they don't" and to proactively gather clearly sourced evidence that forms verifiable reasoning chains. I believe this is essential for deploying LLM/MLLM agents in real-world, high-stakes scenarios. I am also exploring unified multimodal models that incorporate action modalities.
🤝 I am always looking for research collaborations. If you are interested in sharing GPU resources or discussing ideas, feel free to reach out!
Experience
Alibaba Group
Research Intern
Focusing on trustworthy reasoning for document intelligence.
2026.01 - Present
Wuhan University
M.S. in Computer Science and Technology, School of Computer Science
Weighted average: 92.71/100; rank: 16/207 (top 8%)
2024.09 - Present
Lanzhou University
B.S. in Computer Science and Technology, School of Information Science and Engineering
Weighted average: 88.44/100; rank: 7/113 (top 6%)
2020.09 - 2024.06
News
Publications
-
DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding
arXiv preprint arXiv:2605.08888, 2026
A benchmark that evaluates whether MLLMs can produce trustworthy, verifiable reasoning traces over long, visually rich documents via a four-stage evaluation protocol.
-
Omni-I2C: A Holistic Benchmark for High-Fidelity Image-to-Code Generation
ACL 2026
A comprehensive benchmark of 1,080 samples across 5 code types and 45 figure types for evaluating LMMs on converting complex digital graphics into executable code.
-
AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs
International Conference on Learning Representations (ICLR), 2026
The first comprehensive dataset suite for anesthesiology reasoning, covering a benchmark, training data (CPT/SFT/RLVR), and the Morpheus baseline reasoning models.
-
REX-RAG: Reasoning Exploration with Policy Correction in Retrieval-Augmented Generation
arXiv preprint arXiv:2508.08149, 2025
Addresses dead-end exploration in RL-trained RAG agents through mixed sampling with exploratory prompts and a policy correction mechanism to reduce distribution shift.
-
Adaptive Decoding via Hierarchical Neural Information Gradients in Mouse Visual Tasks
arXiv preprint arXiv:2510.09451, 2025
Proposes a hierarchical neural-information gradient framework to decode visual task representations from mouse brain activity across cortical regions.
-
Decoding Mouse Visual Tasks via Hierarchical Neural-Information Gradients
Mathematics 14(1), 31, 2025
Studies hierarchical information gradients across the mouse visual cortex to understand how neural data flows support visual task decoding.
-
Orthogonal-moment-based Attraction Measurement with Ocular Hints in Video-watching Task
IEEE Transactions on Computational Social Systems 10(3), 900-909, 2023
Combines orthogonal moments with eye-tracking signals to measure viewer attraction levels during video-watching tasks.