ML Wiki

Tag: mechanistic-interpretability

2 items with this tag.

  • May 03, 2026

    Probing (Neural Network Interpretability)

    • concept
    • interpretability
    • mechanistic-interpretability
  • May 03, 2026

    Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task (Othello-GPT)

    • source
    • mechanistic-interpretability
    • probing
    • emergent-behavior
    • transformer
    • in-context-learning