
Microsoft
Machine Learning•2026-01-28
Diagnosing instability in production-scale agent reinforcement learning
This post identifies a late-phase instability mechanism in production-scale reinforcement learning for tool-using agents, caused by tool-conditioned variance amplification.