Notes on reading "Voyager: An Open-Ended Embodied Agent with Large Language Models"

These are my notes from reading "Voyager: An Open-Ended Embodied Agent with Large Language Models".

Abstract

1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement.

I was especially interested in 2), saving behaviors as a skill library, and in the verification part of 3).

2 Method

2.2 Skill Library

Adding a new skill. Each time GPT-4 generates and verifies a new skill, we add it to the skill library, represented by a vector database.

According to the paper, once a skill has been generated and verified, it is saved to a vector database.
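To make the store-then-retrieve idea concrete for myself, here is a minimal sketch. It is not Voyager's implementation: the `SkillLibrary` class, the toy bag-of-words `embed` function, the skill names, and the assumption that each skill is indexed by an embedding of a text description are all mine. A real setup would use a learned embedding model and an actual vector database.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a learned text embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SkillLibrary:
    """Hypothetical in-memory stand-in for the vector database."""

    def __init__(self):
        self.entries = []  # (description embedding, skill name, skill source code)

    def add(self, name, description, code):
        """Store a skill; call this only after the skill has been verified."""
        self.entries.append((embed(description), name, code))

    def retrieve(self, query, k=2):
        """Return the names of the k skills whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [name for _, name, _ in ranked[:k]]


library = SkillLibrary()
# The code strings are placeholders; the skill names are made up for illustration.
library.add("craftWoodenPickaxe", "craft wooden pickaxe from planks and sticks", "/* ... */")
library.add("mineIronOre", "mine iron ore with stone pickaxe", "/* ... */")
library.add("killZombie", "kill zombie with sword", "/* ... */")

top = library.retrieve("craft stone pickaxe", k=2)
print(top)
```

The point of the design is that retrieval is by meaning of the task, not by exact name, so a new task such as crafting a stone pickaxe can pull in previously learned pickaxe-related skills.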

2.3 Iterative Prompting Mechanism

The paper describes three kinds of feedback.

Feedback from the environment (Minecraft)
Execution errors
Verification of task success

Instead of manually coding success checkers for each new task proposed by the automatic curriculum, we instantiate another GPT-4 agent for self-verification.

According to the paper, verification is handed to a separate GPT-4 instance.
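The three kinds of feedback and the separate verifier can be sketched as a loop like the one below. This is only my reading of the mechanism, not the paper's code: `iterative_prompting`, `Feedback`, the `max_rounds` value, and the `fake_*` functions (which stand in for GPT-4 calls and for running a program in Minecraft) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    environment: str = ""      # what the game reported after running the program
    execution_error: str = ""  # error raised while running the program
    critique: str = ""         # comment from the self-verification agent


def iterative_prompting(task, generate, execute, verify, max_rounds=4):
    """Regenerate a program until the verifier accepts it or rounds run out."""
    feedback = Feedback()
    for round_index in range(1, max_rounds + 1):
        code = generate(task, feedback)
        environment, error = execute(code)
        # Only ask the verifier when the program ran without an error.
        success, critique = (False, "") if error else verify(task, environment)
        if success:
            return code, round_index
        feedback = Feedback(environment, error, critique)
    return None, max_rounds


# Deterministic stand-ins so the sketch runs without an LLM or a game server.
def fake_generate(task, feedback):
    if feedback.execution_error:
        return "v2"  # pretend the model fixed the runtime error
    if feedback.critique:
        return "v3"  # pretend the model followed the verifier's critique
    return "v1"


def fake_execute(code):
    outcomes = {
        "v1": ("", "ReferenceError: craftPlank is not defined"),
        "v2": ("Missing 2 sticks", ""),
        "v3": ("Inventory: wooden_pickaxe x1", ""),
    }
    return outcomes[code]


def fake_verify(task, environment):
    success = "wooden_pickaxe" in environment
    return success, "" if success else "Craft sticks first"


result = iterative_prompting("Craft a wooden pickaxe", fake_generate, fake_execute, fake_verify)
print(result)
```

In this toy run the first program fails with an execution error, the second runs but is rejected by the verifier, and the third is accepted, so each of the three feedback channels is used once.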

3 Experiments

3.2 Baselines

The baselines used for comparison are ReAct, Reflexion, and AutoGPT.

therefore we have to re-interpret them to be executable in MineDojo and compatible with our experimental setting:

Does this mean the baselines were reimplemented so that they can play in MineDojo? Is MineDojo something like an API for Minecraft?

https://github.com/MineDojo/MineDojo

Voyager itself reportedly uses Mineflayer. Mineflayer is a JavaScript API for writing Minecraft bots, and it can also be used from Python.

https://github.com/PrismarineJS/mineflayer

3.5 Multimodal Feedback from Humans

According to the paper, the GPT-4 available at the time did not have vision.

We demonstrate that given human feedback, Voyager is able to construct complex 3D structures in Minecraft, such as a Nether Portal and a house (Fig. 10). There are two ways to integrate human feedback:

With feedback from humans, Voyager was reportedly able to build 3D structures.