Back to News Hub
🤖OpenAI
May 31, 2023
AI Safety

Improving mathematical reasoning with process supervision

Overview

We've trained a model to achieve a new state-of-the-art in mathematical problem solving by rewarding each correct step of reasoning ("process supervision") instead of simply rewarding the correct final answer ("outcome supervision"). In addition to boosting performance relative to outcome supervision, process supervision also has an important alignment benefit: it directly trains the model to produce a chain-of-thought that is endorsed by humans.

Read the full story at OpenAI

This publisher only syndicates a short excerpt by RSS. The full article — with all the detail, quotes, and context — lives on their site.

Open original article

Continue Learning

Originally published by OpenAI
Read the original

Comments

Sign in to join the conversation