ML Wiki

Tag: alignment

8 items with this tag.

  • Apr 10, 2026: Alignment (AI)
    Tags: concept, alignment, training
  • Apr 10, 2026: PPO (Proximal Policy Optimization)
    Tags: concept, reinforcement-learning, training, alignment
  • Apr 10, 2026: Reward Model
    Tags: concept, alignment, training, rlhf
  • Apr 10, 2026: Training language models to follow instructions with human feedback (InstructGPT)
    Tags: source, alignment, rlhf, llm, fine-tuning, safety
  • Apr 04, 2026: DPO (Direct Preference Optimization)
    Tags: concept, alignment, training
  • Apr 04, 2026: RLHF (Reinforcement Learning from Human Feedback)
    Tags: concept, alignment, training
  • Apr 04, 2026: SFT (Supervised Fine-Tuning)
    Tags: concept, training, alignment
  • Apr 04, 2026: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
    Tags: source, alignment, preference-learning, reward-modeling, policy-optimization, human-feedback