ML Wiki

Tag: safety

4 items with this tag.

  • Apr 17, 2026

    Constitutional AI (CAI)

    • concept
    • alignment
    • safety
  • Apr 17, 2026

    Harmlessness (AI alignment)

    • concept
    • alignment
    • safety
  • Apr 17, 2026

    Constitutional AI: Teaching Models to Self-Correct

    • papers
    • alignment
    • rlhf
    • safety
  • Apr 10, 2026

    Training language models to follow instructions with human feedback (InstructGPT)

    • source
    • alignment
    • rlhf
    • llm
    • fine-tuning
    • safety