ML Wiki
Tag: safety
4 items with this tag.
Apr 17, 2026
Constitutional AI (CAI)
concept
alignment
safety
Apr 17, 2026
Harmlessness (AI alignment)
concept
alignment
safety
Apr 17, 2026
Constitutional AI: Teaching Models to Self-Correct
papers
alignment
rlhf
safety
Apr 10, 2026
Training language models to follow instructions with human feedback (InstructGPT)
source
alignment
rlhf
llm
fine-tuning
safety