Leitura rápida
Comece pela explicação mais curta e útil antes de aprofundar.
A class of LLMs trained with reinforcement learning to generate step-by-step internal chain-of-thought before producing a final answer, enabling stronger performance on complex math, coding, and logic tasks. Pioneered by OpenAI's o1 (September 2024) and followed by o3, DeepSeek-R1, and Claude's extended thinking mode. Unlike standard LLMs that answer directly, reasoning models produce a variable-length internal CoT, allowing controllable compute at inference time.