Endigest AI Core Summary
Spotify's ads team describes how they re-architected their serving stack to replace the Two-Tower model with more expressive neural networks capable of deep feature interactions.
• Two-Tower models are efficient but cannot use interaction features, target attention, or early feature crossing between user and item representations
• Features for the O(1M) high-value candidates are embedded directly in the model file as PyTorch registered buffers, eliminating network I/O and host-to-GPU transfer overhead
• Business logic (utility calculation, diversity rules, top-k selection) was moved inside the PyTorch model, reducing GPU-to-CPU data transfer from O(100K) to O(1K) documents
• GPU inference latency was reduced from 4000 ms p90 to 20 ms via multi-stream CUDA, worker-to-core alignment, Triton kernel fusion, and BF16 precision
• The retrieval data flow was restructured to first return only IDs and bids in a column-wise format, deferring the heavy metadata fetch until after ranking has reduced the candidate set
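The registered-buffer and in-model top-k ideas from the bullets above can be illustrated with a minimal PyTorch sketch. It is not Spotify's implementation: the class name `InModelTopK`, the tiny scorer network, the toy utility (bid times predicted click probability), and all sizes are assumptions chosen for illustration.

```python
import torch
from torch import nn


class InModelTopK(nn.Module):
    """Scores every embedded candidate and returns only the top-k rows."""

    def __init__(self, candidate_features: torch.Tensor, bids: torch.Tensor, k: int):
        super().__init__()
        # Registered buffers are saved in the state_dict and follow .to(device),
        # so candidate features ship with the model instead of arriving per request.
        self.register_buffer("candidate_features", candidate_features)
        self.register_buffer("bids", bids)
        self.k = k
        dim = candidate_features.shape[1]
        # Hypothetical scorer over crossed user/item features.
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, user: torch.Tensor):
        # Early feature crossing: pair the user vector with every candidate row.
        n = self.candidate_features.shape[0]
        user_rows = user.unsqueeze(0).expand(n, -1)
        crossed = torch.cat([user_rows, self.candidate_features * user_rows], dim=1)
        p_click = torch.sigmoid(self.scorer(crossed)).squeeze(1)
        # Toy utility (assumption): bid times predicted click probability.
        utility = self.bids * p_click
        # Top-k runs inside the model, so only k rows are returned to the caller.
        top_utility, top_idx = torch.topk(utility, self.k)
        return top_idx, top_utility


torch.manual_seed(0)
model = InModelTopK(torch.randn(1000, 8), torch.rand(1000), k=10).eval()
with torch.no_grad():
    ids, utilities = model(torch.randn(8))
```

Because the candidate tensors are buffers rather than request inputs, a call only needs the user vector, and the caller receives `k` indices and utilities instead of scores for all 1000 candidates.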
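The deferred metadata fetch described in the last bullet can be sketched in plain Python. Everything here is hypothetical: the `CandidateColumns` type, the in-memory `METADATA` store, and ranking by bid alone are stand-ins for the real retrieval service and model.

```python
from dataclasses import dataclass


@dataclass
class CandidateColumns:
    """Column-wise retrieval result: parallel lists instead of one object per ad."""
    ids: list
    bids: list


# Hypothetical metadata store; in practice this would be a remote lookup.
METADATA = {i: {"title": f"ad-{i}"} for i in range(1, 7)}


def retrieve() -> CandidateColumns:
    # Stage 1: return only the lightweight columns needed for ranking.
    return CandidateColumns(ids=[1, 2, 3, 4, 5, 6],
                            bids=[0.5, 2.0, 1.2, 0.1, 3.0, 0.9])


def rank(candidates: CandidateColumns, k: int) -> list:
    # Stage 2: placeholder ranking by bid; a real system would call the model.
    order = sorted(range(len(candidates.ids)),
                   key=lambda i: candidates.bids[i], reverse=True)
    return [candidates.ids[i] for i in order[:k]]


def fetch_metadata(ids: list) -> dict:
    # Stage 3: fetch heavy metadata only for the candidates that survived ranking.
    return {i: METADATA[i] for i in ids}


candidates = retrieve()
winners = rank(candidates, k=2)
details = fetch_metadata(winners)
```

The metadata lookup touches two records instead of six; at production scale the same ordering keeps the heavy fetch proportional to the ranked output rather than to the retrieved candidate set.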
This summary was automatically generated by AI based on the original article and may not be fully accurate.