Quick read
Start with the shortest, most useful explanation before going deeper.
Synthetic data is artificially generated training data produced by LLMs or other AI models and used to augment or replace human-annotated datasets. Techniques include prompt-based generation, retrieval-augmented pipelines, and iterative self-refinement. It cuts costs from $5-20 per human-annotated preference data point to under $0.01 per synthetic sample, and it became central to post-training pipelines in 2024-2025.
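To put the cost figures in perspective, here is a minimal Python sketch of the arithmetic. It uses only the per-sample prices quoted above and treats $0.01 as an upper bound on the synthetic cost. The dataset size of 100,000 samples and the helper names `dataset_cost` and `min_savings_factor` are assumptions chosen for illustration, not figures or functions from any particular pipeline.

```python
# Per-sample prices quoted in the text.
HUMAN_COST_LOW = 5.00      # USD, low end for a human preference data point
HUMAN_COST_HIGH = 20.00    # USD, high end for a human preference data point
SYNTHETIC_COST_MAX = 0.01  # USD, upper bound for a synthetic sample


def dataset_cost(n_samples: int, unit_cost: float) -> float:
    """Total cost in USD of n_samples at a flat per-sample price."""
    return n_samples * unit_cost


def min_savings_factor(human_unit_cost: float,
                       synthetic_unit_cost: float = SYNTHETIC_COST_MAX) -> float:
    """Lower bound on how many times cheaper synthetic data is per sample."""
    return human_unit_cost / synthetic_unit_cost


if __name__ == "__main__":
    n = 100_000  # hypothetical dataset size, for illustration only

    human_low = dataset_cost(n, HUMAN_COST_LOW)
    human_high = dataset_cost(n, HUMAN_COST_HIGH)
    synthetic_max = dataset_cost(n, SYNTHETIC_COST_MAX)

    print(f"Human annotation: ${human_low:,.0f} to ${human_high:,.0f}")
    print(f"Synthetic generation: under ${synthetic_max:,.0f}")
    print(
        "Per-sample savings: at least "
        f"{min_savings_factor(HUMAN_COST_LOW):,.0f}x to "
        f"{min_savings_factor(HUMAN_COST_HIGH):,.0f}x"
    )
```

Under these assumptions, a 100,000-sample preference set would cost $500,000 to $2,000,000 with human annotators and under $1,000 when generated synthetically, a per-sample reduction of at least 500x to 2,000x. The sketch ignores quality filtering and verification, which add cost in practice.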