What It Is

Training a language model to write functional programs from natural language specifications — typically a docstring or problem description — and evaluating correctness by running unit tests.

Why It Matters

Code is uniquely verifiable: unlike natural language tasks, correctness is binary and automatic (run the tests, get pass/fail). This makes code generation an unusually rigorous testbed for LLM capability, and enables scalable evaluation via pass@k metrics rather than human judgment.
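The "run the tests, get pass/fail" loop can be sketched in a few lines of Python. This is an illustrative toy (the function names and the use of `exec` are assumptions, not any particular benchmark's harness; real harnesses sandbox execution and add timeouts):

```python
# Toy sketch of binary correctness checking: execute a candidate
# solution, then run bare-assert unit tests against it.
# `check_candidate` is a hypothetical helper, not a real library API.

def check_candidate(candidate_src: str, test_src: str) -> bool:
    """Return True iff every assertion in `test_src` passes for `candidate_src`."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        exec(test_src, namespace)       # assertions raise AssertionError on failure
        return True
    except Exception:
        return False

candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(check_candidate(candidate, tests))  # True: all asserts pass
```

The outcome is strictly boolean, which is what makes the metric automatic and scalable.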

How It Works

A pretrained language model is fine-tuned on a large corpus of code (e.g., GitHub repositories). At inference time, the model receives a function signature and docstring as a prompt and generates the function body. Multiple samples are drawn at varying temperatures; correctness is determined by executing the generated code against a hidden test suite. The key metric is pass@k: the probability that at least one of k sampled completions passes all tests.
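In practice, pass@k is computed with the standard unbiased combinatorial estimator: draw n samples per problem, count the c that pass, and estimate the chance that a random subset of k contains at least one passing sample. A minimal sketch:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem:
    n = total samples drawn, c = samples that passed all tests,
    k = budget. Probability that at least one of k samples
    (chosen without replacement from the n) is correct:
    1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k failures exist, so some subset must pass
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# If 3 of 10 samples pass, pass@1 is simply 3/10.
print(pass_at_k(10, 3, 1))  # 0.3 (up to floating point)
```

Averaging this quantity over all problems in the benchmark gives the reported pass@k score.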

Key Sources