Google releases Gemma Scope 2, the largest open-source interpretability toolkit to date, covering all Gemma 3 model sizes from 270M to 27B parameters.
- Built using sparse autoencoders (SAEs) and transcoders to reveal internal model states and decision-making processes
- Training involved storing roughly 110 petabytes of data; the released tools comprise over 1 trillion parameters in total
- Includes skip-transcoders and cross-layer transcoders for deciphering multi-step computations across model layers
- Uses the Matryoshka training technique to improve concept detection and fix flaws found in the original Gemma Scope
- Provides chat-tuned model analysis tools targeting jailbreaks, refusal mechanisms, and chain-of-thought faithfulness
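To make the SAE idea above concrete, here is a minimal toy sketch of what a sparse autoencoder does: it maps a model's activation vector into a much wider, mostly-zero feature vector and then reconstructs the original activation. All dimensions, weights, and the `l1_coeff` value are hypothetical illustrations, not Gemma Scope's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; real Gemma Scope SAEs are far larger).
d_model, d_sae = 16, 64            # activation width, dictionary size
W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into sparse features, then reconstruct it."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU keeps features nonnegative and sparse
    x_hat = f @ W_dec + b_dec                # linear decode back to model space
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    f, x_hat = sae_forward(x)
    return np.mean((x - x_hat) ** 2) + l1_coeff * np.abs(f).sum()

x = rng.normal(size=d_model)       # stand-in for a residual-stream activation
f, x_hat = sae_forward(x)
print(f.shape, x_hat.shape)        # feature vector is wider than the activation
```

In real use the SAE is trained on billions of stored activations (hence the petabyte-scale storage mentioned above), and each learned feature can then be inspected as a candidate interpretable concept. A transcoder follows the same pattern but reconstructs the *output* of a model component from its input rather than reconstructing the same activation.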
This summary was automatically generated by AI based on the original article and may not be fully accurate.