Notes on "A Survey on Large Language Model based Autonomous Agents"

These are my notes on the August 2023 paper "A Survey on Large Language Model based Autonomous Agents".

GitHub: https://github.com/Paitesanshi/LLM-Agent-Survey

1 Introduction

Specifically, we organize our survey based on three aspects including the construction, application, and evaluation of LLM-based autonomous agents.

The survey is organized around the following three aspects.

construction
application
evaluation

For the agent construction, we present a unified framework composed of four components, that is, a profile module to represent agent attributes, a memory module to store historical information, a planning module to strategize future actions, and an action module to execute the planned decisions.

For agent construction, the paper proposes a unified framework made up of four components.

profile
memory
planning
action
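To make the four components concrete, here is a minimal sketch of how they could fit together. It is not code from the paper: the class names, the `planner` and `actor` callables, and the toy functions are all hypothetical, and a real agent would presumably call an LLM where the stubs are.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Profile:
    """Agent attributes (hypothetical fields, chosen for illustration)."""
    name: str
    role: str


@dataclass
class Agent:
    profile: Profile
    # Planning and action are injected so an LLM call could replace the stubs.
    planner: Callable[[Profile, List[str], str], List[str]]
    actor: Callable[[str], str]
    # Memory is just a list of past steps in this sketch.
    memory: List[str] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        results = []
        # Planning: turn the task into steps, given the profile and memory.
        for step in self.planner(self.profile, self.memory, task):
            # Action: execute one planned step.
            outcome = self.actor(step)
            # Memory: record what happened for later planning.
            self.memory.append(f"{step} -> {outcome}")
            results.append(outcome)
        return results


def toy_planner(profile: Profile, memory: List[str], task: str) -> List[str]:
    """Stub planner: always produces the same two steps."""
    return [f"{profile.role}: research {task}", f"{profile.role}: summarize {task}"]


def toy_actor(step: str) -> str:
    """Stub action: pretends the step succeeded."""
    return f"done({step})"


agent = Agent(Profile("Alice", "analyst"), toy_planner, toy_actor)
outputs = agent.run("LLM agents")
```

The point of the sketch is only the data flow: the profile and memory feed planning, planning feeds action, and the outcome of each action is written back to memory.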

2 LLM-based Autonomous Agent Construction

2.1 Agent Architecture Design

2.1.1 Profiling Module

The paper describes three ways to create a profile.

Handcrafted
Generated by an LLM
Derived from real-world datasets

2.1.2 Memory Module

For example, short-term memory corresponds to the context window, and long-term memory to vector storage.
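The split between the two kinds of memory can be sketched as follows. This is an illustration under loud assumptions: the `Memory` class is hypothetical, a fixed-size deque stands in for the context window, and a bag-of-words count with cosine similarity stands in for a real embedding model and vector store.

```python
import math
from collections import Counter, deque


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Memory:
    def __init__(self, window=3):
        # Short-term: only the most recent entries fit, like a context window.
        self.short_term = deque(maxlen=window)
        # Long-term: every entry is kept with its vector, like vector storage.
        self.long_term = []

    def add(self, text):
        self.short_term.append(text)
        self.long_term.append((embed(text), text))

    def recall(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.long_term, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


memory = Memory(window=2)
for note in [
    "the user likes green tea",
    "the meeting is on friday",
    "the report is due monday",
]:
    memory.add(note)

recent = list(memory.short_term)
recalled = memory.recall("what tea does the user like")
```

Here the oldest note has already dropped out of `short_term`, but it can still be retrieved from `long_term` by similarity search.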

2.1.3 Planning Module

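The paper organizes planning approaches as follows.

Planning without feedback
- Subgoal decomposition
- Multi-path thought
- External planner
Planning with feedback
- Environmental feedback
  - ReAct
- Human feedback
- Model feedback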
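Planning with environmental feedback can be pictured as a loop of thought, action, and observation, in the spirit of ReAct. The sketch below is a toy, not the ReAct implementation: `policy` is a hypothetical stub standing in for the LLM, and `environment` is a made-up number-guessing task chosen so that each observation visibly changes the next plan.

```python
def environment(guess, secret=37):
    """Toy environment: returns an observation for the agent's action."""
    if guess == secret:
        return "correct"
    return "higher" if guess < secret else "lower"


def policy(low, high):
    """Stub standing in for the LLM: emits a thought and an action."""
    guess = (low + high) // 2
    thought = f"The answer lies in [{low}, {high}], so try the midpoint."
    return thought, guess


def react_loop(low=1, high=100, max_steps=10):
    trace = []
    for _ in range(max_steps):
        thought, action = policy(low, high)
        observation = environment(action)
        trace.append((thought, action, observation))
        if observation == "correct":
            break
        # The observation feeds back into the next planning step.
        if observation == "higher":
            low = action + 1
        else:
            high = action - 1
    return trace


trace = react_loop()
actions = [action for _, action, _ in trace]
```

A planner without feedback would have to commit to all guesses up front; here each new guess depends on what the environment returned for the previous one.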

2.1.4 Action Module

I skipped this section.

3 LLM-based Autonomous Agent Application

This section summarizes the fields in which LLM-based agents are applied.

Broadly, there are three fields.

Social science
Natural science
Engineering

4 LLM-based Autonomous Agent Evaluation

4.1 Subjective Evaluation

LLM-based agents have a wide range of applications. However, in many scenarios, there lacks general metrics to evaluate the performance of agents. Some potential properties, like agent’s intelligence and user-friendliness, cannot be measured by quantitative metrics as well. Therefore, subjective evaluation is indispensable for current research.

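General metrics for evaluating agents are lacking, and some properties cannot be measured quantitatively, so the paper considers subjective evaluation indispensable for current research.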

Subjective evaluation refers to the testing of the capabilities of LLM-based agents by humans through various means such as interaction, scoring, and so on.

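Subjective evaluation means that humans test the agent by, for example, interacting with it or scoring its outputs.

The Turing test is one such method.

An LLM can also be used to perform the subjective evaluation.

Examples include EvaluatorGPT and ChatEval.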
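As a rough picture of what using an LLM for subjective evaluation might look like, the sketch below asks several judges for a 1-5 score and averages the results. It is a generic illustration and does not reproduce how EvaluatorGPT or ChatEval work: the prompt wording, the scoring scale, and the stub judges (plain functions in place of LLM calls) are all assumptions.

```python
from statistics import mean


def build_prompt(task, response):
    """Assemble a hypothetical scoring prompt for a judge model."""
    return (
        "You are an evaluator. Rate the response to the task on a 1-5 scale.\n"
        f"Task: {task}\nResponse: {response}\nScore:"
    )


def parse_score(reply):
    """Extract the first digit from 1 to 5 in the judge's reply, or None."""
    for ch in reply:
        if ch in "12345":
            return int(ch)
    return None


def evaluate(task, response, judges):
    """Average the scores of all judges that returned a parsable score."""
    prompt = build_prompt(task, response)
    scores = [parse_score(judge(prompt)) for judge in judges]
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None


def strict_judge(prompt):
    """Stub judge; a real one would send the prompt to an LLM."""
    return "Score: 3"


def lenient_judge(prompt):
    """Stub judge with a differently worded reply."""
    return "I would give this a 5."


score = evaluate(
    "Summarize the paper",
    "It surveys LLM-based autonomous agents.",
    [strict_judge, lenient_judge],
)
```

Parsing free-text replies is the fragile part of this kind of setup, which is why unparsable replies are dropped rather than guessed at.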

4.2 Objective Evaluation

There also appear to be various methods for objective evaluation.

6 Challenges

6.2 Generalized Human Alignment

LLMs are usually aligned to act on correct human values, but the paper notes that this may be inappropriate for simulation use cases.

6.5 Knowledge Boundary

For simulating humans, LLMs know too much.

This becomes a problem when simulating how someone makes decisions without that knowledge.

Thoughts

Organizing the components of an agent into the following four was easy to follow.

profile
memory
planning
action

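The paper states explicitly that subjective evaluation is indispensable, which makes it easy to cite as support for that position.

It is interesting that the challenges include the possibility that agents cannot simulate humans realistically because they hold overly correct ethical values or know too much.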
