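This article collects important papers on "Coding Agent / LLM-based agent". Last updated: October 2025.
Contents
Work related to code agents mainly covers the following areas.
These papers laid the foundation for applying large models to code generation and code understanding.
Many current agents are built on the in-context learning ability of LLMs, which can be regarded as a form of context engineering.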
⭐ Language Models are Few-Shot Learners (2020.05,OpenAI)
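The sketch below shows in-context learning as popularized by GPT-3. It assembles a few-shot prompt from an instruction, k input/output demonstrations, and an open slot for the new query. The `Demo` class, the `Input:`/`Output:` labels and the function name are illustrative assumptions, not a format prescribed by the paper. No model is called. The point is only that the task is specified through the context, without any weight updates.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One in-context demonstration: an input and its expected output."""
    question: str
    answer: str


def build_few_shot_prompt(instruction: str, demos: list[Demo], query: str) -> str:
    """Concatenate an instruction, k demonstrations and the new query.

    The model's weights are never updated; the task is conveyed entirely
    by the text placed in the context window.
    """
    parts = [instruction.strip(), ""]
    for d in demos:
        parts.append(f"Input: {d.question}")
        parts.append(f"Output: {d.answer}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


if __name__ == "__main__":
    demos = [
        Demo("reverse 'abc'", "'cba'"),
        Demo("reverse 'hello'", "'olleh'"),
    ]
    print(build_few_shot_prompt("Reverse the given string.", demos, "reverse 'agent'"))
```

With an empty `demos` list, the same function produces a zero-shot prompt. The number and choice of demonstrations is exactly the "context" that later context-engineering work tries to optimize.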
These papers propose concrete architectures and methods for building code agents that can carry out complex tasks.
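Many of the code-agent designs in this list (for example AgentCoder, Self-Edit and the self-repair studies) share a common loop: generate code, execute tests, and feed any failure back to the model. The minimal sketch below shows that loop under stated assumptions:

- `fake_model` is a hypothetical stub standing in for an LLM call.
- Tests are plain `(args, expected)` pairs.
- `exec` runs the candidate in-process with no sandboxing, which a real agent should not do.

```python
from typing import Callable, Optional

# Hypothetical model interface: (task, feedback) -> candidate source code.
Model = Callable[[str, Optional[str]], str]


def run_tests(source: str, tests: list[tuple[tuple, object]], fn_name: str) -> Optional[str]:
    """Execute candidate code; return an error message, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # untrusted code: sandbox this in real use
        fn = namespace[fn_name]
        for args, expected in tests:
            got = fn(*args)
            if got != expected:
                return f"{fn_name}{args} returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def generate_and_repair(model: Model, task: str, tests, fn_name: str,
                        max_rounds: int = 3) -> tuple[Optional[str], int]:
    """Generate code, run the tests, and feed failures back until they pass."""
    feedback = None
    for round_no in range(1, max_rounds + 1):
        source = model(task, feedback)
        feedback = run_tests(source, tests, fn_name)
        if feedback is None:
            return source, round_no
    return None, max_rounds


def fake_model(task: str, feedback: Optional[str]) -> str:
    # Stub: the first attempt has an off-by-one bug; after feedback it is fixed.
    if feedback is None:
        return "def add(a, b):\n    return a + b + 1\n"
    return "def add(a, b):\n    return a + b\n"


if __name__ == "__main__":
    code, rounds = generate_and_repair(
        fake_model, "add two ints", [((1, 2), 3), ((0, 0), 0)], "add")
    print(f"passed after {rounds} round(s)")
    print(code)
```

The papers differ mainly in what fills the feedback slot. Some use raw test output, others use model-written tests, static analysis or a separate critic agent. The loop shape stays the same.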
These papers apply different prompting methods and procedures to improve the reasoning ability of base LLMs.
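Chain-of-thought prompting is a representative example: the model is asked to write intermediate steps before giving its answer. The sketch below combines two published ideas:

- The zero-shot trigger phrase "Let's think step by step." (Kojima et al., 2022).
- A self-consistency style majority vote over several sampled reasoning chains (Wang et al., 2022).

The answer-extraction heuristic (take the last number in the text) is a simplifying assumption. The sample completions are made up for illustration and are not model outputs.

```python
import re
from collections import Counter
from typing import Optional

COT_TRIGGER = "Let's think step by step."  # zero-shot CoT trigger phrase


def zero_shot_cot_prompt(question: str) -> str:
    """Append the trigger so the model writes its reasoning before the answer."""
    return f"Q: {question}\nA: {COT_TRIGGER}"


def extract_answer(completion: str) -> Optional[str]:
    """Heuristic: treat the last number in a reasoning chain as its final answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def self_consistency(completions: list[str]) -> Optional[str]:
    """Majority vote over the final answers of several sampled reasoning chains."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    print(zero_shot_cot_prompt("Tom has 3 apples and buys 4 more. How many now?"))
    samples = [  # invented completions, as if sampled at temperature > 0
        "3 apples plus 4 apples is 7. The answer is 7.",
        "3 + 4 = 8, so 8.",
        "Start with 3, add 4, total 7.",
    ]
    print(self_consistency(samples))
```

Voting over several chains tolerates an occasional faulty chain (the second sample above). The cost is several model calls per question.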
Theoretical explanations:
Others:
For HumanEval and MBPP, see above.
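Results on HumanEval and MBPP are usually reported as pass@k. The Codex paper that introduced HumanEval ("Evaluating Large Language Models Trained on Code", Chen et al., 2021) defines an unbiased estimator for it. For each problem, it draws n ≥ k samples, counts the c samples that pass the unit tests, and computes 1 - C(n-c, k) / C(n, k). The sketch below uses the numerically stable product form of the same quantity. The function name and the sample counts in the usage lines are illustrative.

```python
from math import prod


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n: samples generated for the problem
    c: samples that passed the unit tests
    k: evaluation budget (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    # C(n-c, k) / C(n, k) == prod_{i=n-c+1}^{n} (1 - k/i); avoids huge binomials.
    return 1.0 - prod(1.0 - k / i for i in range(n - c + 1, n + 1))


if __name__ == "__main__":
    # Invented (n, c) counts for three problems; the benchmark score is the mean.
    results = [(10, 3), (10, 0), (10, 10)]
    for k in (1, 5):
        score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
        print(f"pass@{k} = {score:.3f}")
```

Estimating pass@k from n > k samples gives lower variance than literally drawing only k samples per problem. This is the reason the paper recommends the estimator.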
Trends
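Other platforms (partial list, for comparison only):
Other papers:
Related to context and prompts:
The following is only a preliminary list; entries will be added or removed later according to category and each paper's influence.
This section only tracks each paper's citation count, to make later follow-up easier.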
Multi-Agent
Wireless Multi-Agent Generative AI: From Connected Intelligence to Collective Intelligence (2023.07,-)
Neural Amortized Inference for Nested Multi-agent Reasoning (2023.08,-)
CGMI: Configurable General Multi-Agent Interaction Framework (2023.08,-)
GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems (2023.08,-)
MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework (2023.08,—)
GameGPT: Multi-agent Collaborative Framework for Game Development (2023.10,—)
AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation (2023.12,-)
CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents (2024.02,—)
MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution (2024.03,—)
Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization (2024.04,—)
LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead (2024.04,-)
CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving (2024.04,-)
MARE: Multi-Agents Collaboration Framework for Requirements Engineering (2024.05,—)
AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing (2024.09,—)
Self-Evolving Multi-Agent Collaboration Networks for Software Development (2024.10,—)
MAGE: A Multi-Agent Engine for Automated RTL Code Generation (2024.12,—)
Achilles Heel of Distributed Multi-Agent Systems (2025.04,-)
Others
TrustAgent: Towards Safe and Trustworthy LLM-based Agents (2024.02,-)
CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges (2024.01,-)
AlphaEvolve: A coding agent for scientific and algorithmic discovery (2025.06,-)
Takedown: How It’s Done in Modern Coding Agent Exploits (2025.09,-)
The following supplementary entries come from https://github.com/JiaruQian/awesome-llm-based-agent4code
Self-planning Code Generation with Large Language Models (2023.03,—)
ToolCoder: Teach Code Generation Models to use API search tools (2023.05,—)
Self-Edit: Fault-Aware Code Editor for Code Generation (2023.05,—)
Is Self-Repair a Silver Bullet for Code Generation? (2023.06,—)
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (2023.07,—)
CodePlan: Repository-level Coding using LLMs and Planning (2023.09,—)
L2MAC: Large Language Model Automatic Computer for Extensive Code Generation (2023.10,—)
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models (2023.10,—)
Lemur: Harmonizing Natural Language and Code for Language Agents (2023.10,—)
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules (2023.10,—)
A Self-Iteration Code Generation Method Based on Large Language Models
Knowledge-Aware Code Generation with Large Language Models (2024.01,—)
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair (2024.03,—)
AnalogCoder: Analog Circuit Design via Training-Free Code Generation (2024.05,—)
Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search (2024.05,—)
CodeNav: Beyond tool-use to using real-world codebases with LLM agents (2024.06,—)
Planning In Natural Language Improves LLM Search For Code Generation (2024.09,—)
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement (2024.10,—)
CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models (2024.11,—)
ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation (2024.11,—)
Tree-of-Code: A Tree-Structured Exploring Framework for End-to-End Code Generation and Execution in Complex Task Handling (2024.12,—)
Optimizing Code Runtime Performance through Context-Aware Retrieval-Augmented Generation (2025.01,—)
PatchPilot: A Stable and Cost-Efficient Agentic Patching Framework (2025.02,—)
AIDE: AI-Driven Exploration in the Space of Code (2025.02,—)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal (2025.03,—)
CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision (2025.03,—)
CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation (2025.04,—)
Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents (2025.05,—)
cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree (2025.06,—)
Self-collaboration Code Generation via ChatGPT (2023.04,—)
ChatDev: Communicative Agents for Software Development (2023.07,—)
CleanAgent: Automating Data Standardization with LLM-based Agents (2024.03,—)
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents (2024.03,—)
AutoCodeRover: Autonomous Program Improvement (2024.04,—)
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (2024.06,—)
Agentless: Demystifying LLM-based Software Engineering Agents (2024.07,—)
VerilogCoder: Autonomous Verilog Coding Agents with Graph-based Planning and AST-based Waveform Tracing Tool (2024.08,—)
A Pair Programming Framework for Code Generation via Multi-Plan Exploration and Feedback-Driven Refinement (2024.09,—)
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale (2024.09,—)
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks (2025.01,—)
Cogito, ergo sum: A Neurobiologically-Inspired Cognition-Memory-Growth System for Code Generation (2025.01,—)
SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering (2025.02,—)
Adversarial Reasoning for Repair Based on Inferred Program Intent (2025.05,—)
SEW: Self-Evolving Agentic Workflows for Automated Code Generation (2025.05,—)
Measuring Coding Challenge Competence With APPS (2021.05,-)
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories (2024.03,-)
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories (2024.05,-)
The following supplementary entries come from: https://github.com/Paitesanshi/LLM-Agent-Survey
A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level (2021.12,-)
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning (2022.05,-)
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents (2022.07,-)
Inner Monologue: Embodied Reasoning through Planning with Language Models (2022.07,Google)
Social Simulacra: Creating Populated Prototypes for Social Computing Systems (2022.08,-)
Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies (2022.08,-)
Out of One, Many: Using Language Models to Simulate Human Samples (2022.09,-)
Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction (2022.09,-)
LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models (2022.12,-)
OpenAGI: When LLM Meets Domain Experts (2023.04,-)
Emergent autonomous scientific research capabilities of large language models (2023.04,-)
ChemCrow: Augmenting large-language models with chemistry tools (2023.04,-)
API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs (2023.04,-)
Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (2023.04,-)
Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback (2023.04,-)
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency (2023.04,-)
SCM: Enhancing Large Language Model with Self-Controlled Memory Framework (2023.04,-)
Industrial Engineering with Large Language Models: A case study of ChatGPT’s performance on Oil & Gas problems (2023.04,-)
Towards autonomous system: flexible modular production system enhanced with large language model agents (2023.04,-)
RET-LLM: Towards a General Read-Write Memory for Large Language Models (2023.05,-)
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark (2023.05,-)
Playing repeated games with Large Language Models (2023.05,-)
Training Socially Aligned Language Models on Simulated Social Interactions (2023.05,-)
Mindstorms in Natural Language-Based Societies of Mind (2023.05,-)
Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory (2023.05,-)
Decision-Oriented Dialogue for Human-AI Collaboration (2023.05,-)
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) (2023.05,-)
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach (2023.06,-)
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory (2023.06,-)
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning (2023.07,-)
The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents (2023.07,-)
Towards A Unified Agent with Foundation Models (2023.07,-)
Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks (2023.07,-)
Tachikuma: Understading Complex Interactions with Multi-Character and Novel Objects by Large Language Models (2023.07,-)
WebArena: A Realistic Web Environment for Building Autonomous Agents (2023.07,-)
S³: Social-network Simulation System with Large Language Model-Empowered Agents (2023.07,-)
Understanding the Benefits and Challenges of Using Large Language Model-based Conversational Agents for Mental Well-being Support (2023.07,-)
Dialogue Shaping: Empowering Agents through NPC Interaction (2023.07,-)
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs (2023.07,-)
RAH! RecSys-Assistant-Human: A Human-Centered Recommendation Framework with LLM Agents (2023.08,-)
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA (2023.08,-)
ProAgent: Building Proactive Cooperative Agents with Large Language Models (2023.08,-)
RecMind: Large Language Model Powered Agent For Recommendation (2023.08,-)
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations (2023.08,-)
MindAgent: Emergent Gaming Interaction (2023.09,-)
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (2023.09,-)
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models (2023.11,-)
Memory Augmented Large Language Models are Computationally Universal (2023.01,-)
Blind Judgement: Agent-Based Supreme Court Modelling With GPT (2023.01,-)
Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? (2023.01,-)
ChatGPT and Software Testing Education: Promises & Perils (2023.02,-)
ViperGPT: Visual Inference via Python Execution for Reasoning (2023.03,-)
Language Models can Solve Computer Tasks (2023.03,-)
Can Large Language Models Transform Computational Social Science? (2023.04,-)
Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction (2023.05,-)
Introspective Tips: Large Language Model for In-Context Decision Making (2023.05,-)
Clembench: Using Game Play to Evaluate Chat-Optimized Language Models as Conversational Agents (2023.05,-)
CodeHelp: Using Large Language Models with Guardrails for Scalable Support in Programming Classes (2023.08,-)
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents (2024.10,-)
Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs (2023.05,-)
Towards Autonomous Testing Agents via Conversational Large Language Models (2023.06,-)
Large Language Models Are Semi-Parametric Reinforcement Learning Agents (2023.06,-)
Embodied Task Planning with Large Language Models (2023.07,-)
Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics (2023.07,-)
Flows: Building Blocks of Reasoning and Collaborating AI (2023.08,-)
ChatMOF: An Autonomous AI System for Predicting and Generating Metal-Organic Frameworks (2023.08,-)
Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench (2023.08,-)
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool (2023.08,-)
The following supplementary entries come from: https://github.com/xinzhel/LLM-Agent-Survey
TALM: Tool Augmented Language Models (2022.05,-)
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change (2022.06,-)
ART: Automatic multi-step reasoning and tool-use for large language models (2023.03,-)
Large Language Model Guided Tree-of-Thought (2023.05,-)
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings (2023.05,-)
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (2023.05,-)
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning (2023.05,-)
On the Planning Abilities of Large Language Models : A Critical Investigation (2023.05,-)
On the Tool Manipulation Capability of Open-source Large Language Models (2023.05,-)
AdaPlanner: Adaptive Planning from Feedback with Language Models (2023.05,-)
Large Language Models as Tool Makers (2023.05,-)
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution (2023.07,-)
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning (2023.08,-)
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models (2023.08,-)
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning (2023.08,-)
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning (2023.09,-)
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training (2023.09,-)
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use (2023.10,-)
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts (2023.10,-)
Learning From Mistakes Makes LLM Better Reasoner (2023.10,-)
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems (2023.11,-)
TaskBench: Benchmarking Large Language Models for Task Automation (2023.11,-)
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models (2024.04,-)
A Survey on the Memory Mechanism of Large Language Model based Agents (2024.04,-)
LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback (2024.06,-)
Can Language Models Serve as Text-Based World Simulators? (2024.06,-)
LLM-A*: Large Language Model Enhanced Incremental Heuristic Search on Path Planning (2024.06,-)
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents (2024.08,The AGI Company (MultiOn))
Making Large Language Models into World Models with Precondition and Effect Knowledge (2024.09,-)
The following supplementary entries come from: https://github.com/zjunlp/LLMAgentPapers
Language Model Cascades (2022.07,-)
Mind’s Eye: Grounded Language Model Reasoning through Simulation (2022.10,-)
Don’t Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments (2022.12,-)
Do Embodied Agents Dream of Pixelated Sheep: Embodied Decision Making using Language Guided World Modelling (2023.01,-)
Collaborating with language models for embodied reasoning (2023.02,-)
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents (2023.02,-)
Evaluating Large Language Models in Theory of Mind Tasks (2023.02,-)
PaLM-E: An Embodied Multimodal Language Model (2023.03,-)
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (2023.03,-)
Chat with the Environment: Interactive Multimodal Perception Using Large Language Models (2023.03,-)
ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time (2023.04,-)
Learning to Reason and Memorize with Self-Notes (2023.05,-)
The Role of Summarization in Generative Agents: A Preliminary Perspective (2023.05,-)
Unlimiformer: Long-Range Transformers with Unlimited Length Input (2023.05,-)
Plan, Eliminate, and Track – Language Models are Good Teachers for Embodied Agents (2023.05,-)
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models (2023.05,-)
Knowledge-enhanced Agents for Interactive Text Games (2023.05,-)
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance (2023.05,-)
TidyBot: Personalized Robot Assistance with Large Language Models (2023.05,-)
Small Models are Valuable Plug-ins for Large Language Models (2023.05,-)
Adapting Language Models to Compress Contexts (2023.05,-)
Reasoning with Language Model is Planning with World Model (2023.05,-)
Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration (2023.05,-)
Gorilla: Large Language Model Connected with Massive APIs (2023.05,-)
Landmark Attention: Random-Access Infinite Context Length for Transformers (2023.05,-)
Role-Play with Large Language Models (2023.05,-)
Randomized Positional Encodings Boost Length Generalization of Transformers (2023.05,-)
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks (2023.05,-)
Monotonic Location Attention for Length Generalization (2023.05,-)
Epidemic Modeling with Generative Agents (2023.07,-)
Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration (2023.07,-)
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback (2023.07,-)
Do LLMs Possess a Personality? Making the MBTI Test an Amazing Evaluation for Large Language Models (2023.07,-)
InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent (2023.08,-)
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization (2023.08,-)
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education (2023.08,-)
AgentBench: Evaluating LLMs as Agents (2023.08,-)
AgentSims: An Open-Source Sandbox for Large Language Model Evaluation (2023.08,-)
Gentopia: A Collaborative Platform for Tool-Augmented LLMs (2023.08,-)
Is There Any Social Principle for LLM-Based Agents? (2023.08,-)
Taken out of context: On measuring situational awareness in LLMs (2023.09,-)
Self-driven Grounding: Large Language Model Agents with Automatical Language-aligned Skill Learning (2023.09,-)
Cognitive Architectures for Language Agents (2023.09,-)
Agents: An Open-source Framework for Autonomous Language Agents (2023.09,-)
Identifying the Risks of LM Agents with an LM-Emulated Sandbox (2023.09,-)
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving (2023.09,-)
Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration (2023.09,-)
Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View (2023.10,-)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents (2024.01,-)
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives (2024.01,-)
Agent AI: Surveying the Horizons of Multimodal Interaction (2024.01,-)
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning (2024.01,-)
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security (2024.01,-)
EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction (2024.01,-)
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents (2024.01,-)
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents (2024.01,-)
TravelPlanner: A Benchmark for Real-World Planning with Language Agents (2024.02,-)
Can Large Language Model Agents Simulate Human Trust Behavior? (2024.02,-)
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments (2024.04,-)
Agent Planning with World Knowledge Model (2024.05,-)
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models (2024.05,-)
Agentic Skill Discovery (2024.05,-)
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models (2024.05,-)
Devil’s Advocate: Anticipatory Reflection for LLM Agents (2024.05,-)
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models (2024.05,-)
Faithful Logical Reasoning via Symbolic Chain-of-Thought (2024.05,-)
Can Graph Learning Improve Planning in LLM-based Agents? (2024.05,-)
AgentSquare: Automatic LLM Agent Search in Modular Design Space (2024.10,-)
Benchmarking Agentic Workflow Generation (2024.10,-)
Reinforcement Learning for Long-Horizon Interactive LLM Agents (2025.02,-)
STeCa: Step-level Trajectory Calibration for LLM Agent Learning (2025.02,-)
LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey (2025.05,-)
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning (2025.05,-)
Group-in-Group Policy Optimization for LLM Agent Training (2025.05,-)
SPA-RL: Reinforcing LLM Agents via Stepwise Progress Attribution (2025.05,-)
CitySim: Modeling Urban Behaviors and City Dynamics with Large-Scale LLM-Driven Agent Simulation (2025.06,-)
CoLT5: Faster Long-Range Transformers with Conditional Computation (2023.03,-)
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action (2023.03,-)
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs (2023.03,-)
Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks (2023.03,-)
CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society (2023.03,-)
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models (2023.04,-)
Emergent and Predictable Memorization in Large Language Models (2023.04,-)
WizardLM: Empowering large pre-trained language models to follow complex instructions (2023.04,-)
ChatLLM Network: More brains, More intelligence (2023.04,-)
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models (2023.04,-)
User Behavior Simulation with Large Language Model based Agents (2023.06,-)
Mind2Web: Towards a Generalist Agent for the Web (2023.06,-)
RestGPT: Connecting Large Language Models with Real-World RESTful APIs (2023.06,-)
Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow (2023.06,-)
Inferring the Goals of Communicating Agents from Actions and Instructions (2023.06,-)
Personality Traits in Large Language Models (2023.07,-)
Building Cooperative Embodied Agents Modularly with Large Language Models (2023.07,-)
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models (2023.07,-)
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration (2023.10,-)
OpenAgents: An Open Platform for Language Agents in the Wild (2023.10,-)
Agent Lumos: Unified and Modular Training for Open-Source Language Agents (2023.11,-)
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models (2023.11,-)
An Embodied Generalist Agent in 3D World (2023.11,-)
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator (2023.12,-)
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (2023.12,-)
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update (2023.12,-)
T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step (2023.12,-)
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub (2023.12,-)
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models (2024.06,-)
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models (2024.06,-)
TextGrad: Automatic Differentiation via Text (2024.06,-)
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models (2024.06,-)
GenoTEX: An LLM Agent Benchmark for Automated Gene Expression Data Analysis (2024.06,-)
Symbolic Learning Enables Self-Evolving Agents (2024.06,-)
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents (2024.07,-)
Tulip Agent – Enabling LLM-Based Agents to Solve Tasks Using Large Tool Libraries (2024.07,-)
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations (2024.08,-)
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs (2024.09,-)
Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback (2023.05,-)
MemoryBank: Enhancing Large Language Models with Long-Term Memory (2023.05,-)
Language Models Meet World Models: Embodied Experiences Enhance Language Models (2023.05,-)
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing (2023.05,-)
Making Language Models Better Tool Learners with Execution Feedback (2023.05,-)
LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities (2023.05,-)
Interactive Natural Language Processing (2023.05,-)
RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text (2023.05,-)
ExpertPrompting: Instructing Large Language Models to be Distinguished Experts (2023.05,-)
An Interactive Agent Foundation Model (2024.02,-)
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement (2024.02,-)
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models (2024.02,-)
Empowering Large Language Model Agents through Action Learning (2024.02,-)
Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization (2024.02,-)
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents (2024.03,-)
Emergence of Social Norms in Generative Agent Societies: Principles and Architecture (2024.03,-)
SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents (2024.03,-)
AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents (2024.03,-)
A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond (2024.03,-)
Enhancing Trust in LLM-Based AI Automation Agents: New Considerations and Future Challenges (2023.08,-)
LLM As DBA (2023.08,-)
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents (2023.08,-)
Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering (2023.08,-)
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness (2023.08,-)
How susceptible are LLMs to Logical Fallacies? (2023.08,-)
ExpeL: LLM Agents Are Experiential Learners (2023.08,-)
AI Agents vs. Agentic AI: A Conceptual Taxonomy (2025.05,-)
AI Agents That Matter (2024.07,Princeton University)
AI Agentic Programming: A Survey of Techniques (2025.08,-)
Beyond Browsing: API-Based Web Agents (2024.10,-)
Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation (2025.10,-)
Agent Workflow Memory (2024.09,-)
Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve (2025.05,-)
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications (2025.05,-)
CoAct-1: Computer-using Agents with Coding as Actions (2025.08,-)
Paper2Agent: Reimagining Research Papers As Interactive AI Agents (2025.09,-)
Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs (2025.09,-)
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters (2022.12,-)
What Makes Good In-Context Examples for GPT-3? (2021.01,-)
Parallel Context Windows for Large Language Models (2022.12,-)
Position Engineering: Boosting Large Language Models through Positional Information Manipulation (2024.04,-)
Recurrent Context Compression: Efficiently Expanding the Context Window of LLM (2024.03,-)
Long Context Tuning: Extending LLM Context Window Beyond Training Limits (2023.10,-)
Prompt Engineering or Context Engineering? A Survey of Methods to Improve In-Context Learning (2024.07,-)
Context Optimization for In-Context Learning (2024.08,-)
Adaptive Context Selection for Large Language Models (2024.09,-)
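"What Makes Good In-Context Examples for GPT-3?" proposes retrieving the demonstrations that are semantically closest to the test input instead of picking them at random. The minimal sketch below ranks a pool of candidate (input, output) pairs by cosine similarity to the query. A bag-of-words vector stands in for the sentence encoder used in that work; this is an assumption made to keep the example dependency-free. `select_examples` is a hypothetical helper name.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(query: str, pool: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Return the k (input, output) pairs whose inputs are most similar to the query."""
    q = bow(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        ("sort a list of numbers", "sorted(xs)"),
        ("read a json file", "json.load(f)"),
        ("sort a dict by value", "sorted(d.items(), key=lambda kv: kv[1])"),
    ]
    for inp, out in select_examples("sort a list of strings", pool, 2):
        print(inp, "->", out)
```

The selected pairs would then be placed in the prompt as demonstrations. Swapping `bow` for a learned embedding is the only change needed to move closer to the retrieval setup described in the paper.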
Published: 2025-10-09 18:00:00