Long-Horizon RPG Gameplay
This track challenges agents to complete a full Pokémon role-playing game (Pokémon Emerald) as quickly and efficiently as possible, navigating a massive, partially observable world with hundreds of NPCs and thousands of possible actions.
Long-horizon planning, efficient exploration, and strategic resource management are critical. Agents must balance immediate objectives with long-term strategic goals, making decisions that span thousands of timesteps while adapting to the unpredictable nature of RPG gameplay.
The speedrunning format stresses sequential decision-making: reaching fast completion times in a complex, open-world environment requires both sophisticated planning and careful use of in-game resources.
The agent architecture is a real-time loop with modular components for perception (game frame recognition), planning and memory (long-term vs. short-term goals, knowledge storage), and control (emulator action execution).
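The perception, planning and memory, and control split described above can be sketched roughly as follows. This is a minimal illustration, not the starter kit's code: `FakeEmulator`, `Memory`, `perceive`, `plan`, and `run_agent` are all hypothetical names, the emulator is a stand-in that only records button presses, and the planner is a placeholder rule where a real agent would call a model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    """Hypothetical memory store: one long-term goal, a queue of short-term goals."""
    long_term_goal: str
    short_term_goals: List[str] = field(default_factory=list)
    knowledge: List[str] = field(default_factory=list)


class FakeEmulator:
    """Stand-in for a real emulator binding (assumption); records buttons pressed."""

    def __init__(self) -> None:
        self.pressed: List[str] = []

    def get_frame(self) -> bytes:
        return bytes(4)  # placeholder pixel data

    def press(self, button: str) -> None:
        self.pressed.append(button)


def perceive(frame: bytes) -> str:
    """Perception: turn a raw frame into a text observation (toy version)."""
    return f"frame:{len(frame)}B"


def plan(observation: str, memory: Memory) -> str:
    """Planning and memory: store the observation, then pick a button.

    Placeholder rule: press A while short-term goals remain, otherwise B.
    """
    memory.knowledge.append(observation)
    if memory.short_term_goals:
        memory.short_term_goals.pop(0)
        return "A"
    return "B"


def run_agent(emulator: FakeEmulator, memory: Memory, max_steps: int) -> int:
    """Control loop: perceive, plan, act, once per step. Returns steps taken."""
    for _ in range(max_steps):
        observation = perceive(emulator.get_frame())
        button = plan(observation, memory)
        emulator.press(button)
    return max_steps


if __name__ == "__main__":
    emu = FakeEmulator()
    mem = Memory("defeat Roxanne", ["leave house", "reach Route 101"])
    run_agent(emu, mem, max_steps=3)
    print(emu.pressed)
```

Keeping the three stages behind separate functions means any one of them (for example, the planner) can be replaced without touching the loop itself.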
We maintain a curated list of research papers on Pokémon AI, covering competitive battling, RPG gameplay, reinforcement learning, and LLM agents.
We welcome submissions from the research community. To add your paper:
Add an entry to README.md following the existing format.
To appear on the speedrun leaderboard, include a video recording of your agent playing Pokémon Emerald in your PR. We accept runs through any portion of the game, from the first gym all the way to full completion.
The benchmark is designed to scale with agent capability. Our NeurIPS 2025 competition scoped evaluation to the first gym (Roxanne), but we encourage submissions that go further. If your agent can reach the second gym, the third, or complete the entire game, submit it.
Rankings are determined by two raw performance metrics: number of actions and time.
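One way to keep track of these two metrics during a run is a small counter like the one below. It is only a sketch: `RunMetrics` is a hypothetical helper, and it assumes "time" means wall-clock time, which the text here does not specify. It makes no assumption about how the two metrics are combined into a ranking.

```python
import time
from dataclasses import dataclass, field


@dataclass
class RunMetrics:
    """Hypothetical per-run tracker for action count and elapsed wall-clock time."""
    actions: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def record_action(self) -> None:
        # Call once for every button press sent to the emulator.
        self.actions += 1

    def elapsed_seconds(self) -> float:
        # Monotonic clock, so system clock adjustments do not affect the result.
        return time.monotonic() - self.started_at

    def summary(self) -> dict:
        return {"actions": self.actions, "seconds": round(self.elapsed_seconds(), 2)}


if __name__ == "__main__":
    metrics = RunMetrics()
    for _ in range(3):
        metrics.record_action()
    print(metrics.summary())
```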
While we provide a starter kit with an LLM-scaffolded approach, we encourage submissions using diverse methods: tool-augmented systems, reinforcement learning, purely text-based reasoning, hybrid architectures, and other innovative techniques.
Teams document their methodology across five dimensions: