Explore real-world engineering experiences from top tech companies.
Pinterest introduced a GPU-served two-tower model using MMOE-DCN architecture for lightweight ads engagement prediction.
This post introduces an agent skill that enables coding agents (Claude and Codex) to write production-ready CUDA kernels for Hugging Face's diffusers and transformers libraries.
This article explores low-bit inference techniques that make large AI models faster and more cost-efficient to serve in production.
This post from Lyft explains how they validate and diagnose Doubly Robust (AIPW) models used for causal inference when A/B testing is not feasible.
The Transformers.js v4 preview is now available on npm, bringing a new WebGPU runtime, a build system overhaul, and expanded model support.
This post demonstrates finetuning FunctionGemma with Tunix, a JAX-based LLM post-training library, on Google TPUs.
Spotify's ads team describes how they re-architected their serving stack to replace the Two-Tower model with more expressive neural networks capable of deep feature interactions.
Pinterest's Ads team developed transformer-based behavioral sequence models to improve ad candidate generation using users' offsite activity history.
This post identifies a late-phase instability mechanism in production-scale reinforcement learning for tool-using agents, caused by tool-conditioned variance amplification.
Meta introduces the User True Interest Survey (UTIS) model to improve Facebook Reels recommendations by incorporating direct user feedback beyond traditional engagement signals.
Pinterest introduces PinLanding, a production pipeline that uses multimodal AI to automatically generate shopping collections from billions of catalog items.
This post provides a practical guide to debugging JAX workloads on Cloud TPUs, covering essential tools and their relationships in distributed environments.