OpenAI
September 19, 2019
General AI

Fine-tuning GPT-2 from human preferences

Overview

We've fine-tuned the 774M-parameter GPT-2 language model using human feedback on several tasks. The resulting models successfully matched the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferred sentences copied wholesale from the input (we'd only asked them to ensure accuracy), so our models learned to copy. Summarization required 60k human labels; simpler tasks that continue text in various styles required only 5k.
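
The excerpt does not say how human labels are turned into a training signal. As a rough illustration only, one common way to learn from comparisons is to train a reward model so that the candidate a labeler picked receives the highest score, using a softmax cross-entropy loss. The sketch below assumes that setup; the function name `preference_loss`, the candidate scores, and the number of candidates are hypothetical and are not taken from the article.

```python
import math


def preference_loss(scores, chosen):
    """Softmax cross-entropy loss for a single human comparison.

    scores: hypothetical reward-model scores, one float per candidate text.
    chosen: index of the candidate the labeler preferred.
    Both the scoring model and this loss are assumptions for illustration.
    """
    if not 0 <= chosen < len(scores):
        raise IndexError("chosen must index into scores")
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    # Negative log-probability that the chosen candidate wins the softmax.
    return log_z - scores[chosen]


if __name__ == "__main__":
    # Four made-up candidate scores; the labeler preferred candidate 0.
    print(preference_loss([2.0, 0.5, -1.0, 0.0], chosen=0))
```

The loss shrinks as the preferred candidate's score rises relative to the others, so minimizing it over many labeled comparisons pushes the reward model toward the labelers' choices, whatever those choices happen to reward.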

Originally published by OpenAI