OpenAI
July 4, 2018
Learning Montezuma's Revenge from a single demonstration
Overview
We've trained an agent to achieve a high score of 74,500 on Montezuma's Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstration, and learns from them by optimizing the game score using PPO, the same reinforcement learning algorithm that underpins OpenAI Five.
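The idea of starting episodes from states taken out of the demonstration can be sketched roughly as follows. This is a minimal illustration, not OpenAI's implementation: the excerpt does not say how the states are chosen, so the rule used here (begin near the end of the demonstration and move the start point earlier once the agent succeeds often enough) is an assumption. The names `DemoStartCurriculum`, `success_threshold`, `step_back`, `run_episode`, and `train` are all hypothetical, and the PPO update itself is left as a comment.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class DemoStartCurriculum:
    """Picks the demonstration state that each training episode starts from.

    Hypothetical helper: the selection rule below is an assumption made for
    illustration, not something stated in the excerpt.
    """

    demo_states: Sequence[object]      # saved emulator states along the demo
    success_threshold: float = 0.2     # assumed: success rate needed to move back
    step_back: int = 1                 # assumed: how far to move the start point
    start_index: int = field(init=False)

    def __post_init__(self) -> None:
        if not self.demo_states:
            raise ValueError("demonstration must contain at least one state")
        # Begin at the last demonstration state, where little play remains.
        self.start_index = len(self.demo_states) - 1

    def sample_start(self) -> object:
        """Return the state the next episode should be restored to."""
        return self.demo_states[self.start_index]

    def update(self, success_rate: float) -> None:
        """Move the start point earlier once the agent succeeds often enough."""
        if success_rate >= self.success_threshold:
            self.start_index = max(0, self.start_index - self.step_back)


def train(
    curriculum: DemoStartCurriculum,
    run_episode: Callable[[object], bool],
    iterations: int,
    episodes_per_iteration: int = 8,
) -> int:
    """Run a toy training loop and return the final start index.

    `run_episode` is a placeholder that plays one game from the given state
    and reports whether the agent did well enough to count as a success.
    """
    for _ in range(iterations):
        start = curriculum.sample_start()
        results = [run_episode(start) for _ in range(episodes_per_iteration)]
        # A real agent would run a PPO update on the collected rollouts here,
        # optimizing the game score.
        curriculum.update(sum(results) / len(results))
    return curriculum.start_index
```

In this sketch, an agent that always succeeds walks the start point back to the first demonstration state, while one that never succeeds keeps practicing from the same late state. A real setup would also need an emulator that can save and restore its state.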