ML Wiki

Tag: safety

1 item with this tag.

  • Apr 10, 2026

    Training language models to follow instructions with human feedback (InstructGPT)

    • Tags: source, alignment, rlhf, llm, fine-tuning, safety