This post explains Ulysses Sequence Parallelism (SP), a technique from Snowflake AI Research for training LLMs on sequences of up to millions of tokens by sharding each sequence across GPUs and using all-to-all communication so every GPU computes full-sequence attention for a subset of heads.
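A minimal PyTorch sketch of the all-to-all swap at the heart of Ulysses SP; the `ulysses_attention` helper and tensor shapes are illustrative assumptions, not Snowflake's implementation:

```python
# Sketch of the Ulysses pattern: each of P ranks starts with a sequence shard
# (seq_len/P tokens, all H heads), swaps via all-to-all to hold the full
# sequence for H/P heads, attends locally, then swaps back.
# Assumes torch.distributed is initialized with P processes.
import torch
import torch.distributed as dist
import torch.nn.functional as F

def ulysses_attention(q, k, v, group=None):
    """q, k, v: [seq_len/P, num_heads, head_dim] on each rank."""
    P = dist.get_world_size(group)

    def seq_to_head(x):
        # Split heads into P blocks, exchange, then stitch sequence shards
        # together: [s/P, H, d] -> [s, H/P, d].
        chunks = list(x.chunk(P, dim=1))
        out = [torch.empty_like(c) for c in chunks]
        dist.all_to_all(out, chunks, group=group)
        return torch.cat(out, dim=0)

    def head_to_seq(x):
        # Inverse exchange: [s, H/P, d] -> [s/P, H, d].
        chunks = list(x.chunk(P, dim=0))
        out = [torch.empty_like(c) for c in chunks]
        dist.all_to_all(out, chunks, group=group)
        return torch.cat(out, dim=1)

    q, k, v = seq_to_head(q), seq_to_head(k), seq_to_head(v)
    # Full-sequence attention over this rank's subset of heads.
    o = F.scaled_dot_product_attention(
        q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1)
    ).transpose(0, 1)
    return head_to_seq(o)  # back to the sequence-sharded layout
```

Because each all-to-all moves only activations, the per-GPU memory for attention scales with seq_len/P, which is what makes million-token sequences feasible.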
TensorFlow 2.21 introduces major updates to the LiteRT stack and operator support for lower-precision data types.
Modular Diffusers introduces a composable block-based approach to building diffusion pipelines, replacing monolithic pipeline classes with reusable, swappable components.
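To make the block idea concrete, here is a generic composable-pipeline sketch in plain Python; the class names (`Block`, `PipelineState`, `SequentialPipeline`) are hypothetical stand-ins for the pattern, not the actual diffusers API:

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    """Shared state that blocks read from and write to."""
    values: dict = field(default_factory=dict)

class Block:
    def __call__(self, state: PipelineState) -> PipelineState:
        raise NotImplementedError

class EncodePrompt(Block):
    def __call__(self, state):
        state.values["embeds"] = f"embeds({state.values['prompt']})"
        return state

class Denoise(Block):
    def __call__(self, state):
        state.values["latents"] = f"denoised({state.values['embeds']})"
        return state

class Decode(Block):
    def __call__(self, state):
        state.values["image"] = f"decoded({state.values['latents']})"
        return state

class SequentialPipeline(Block):
    """Chains blocks; any block can be swapped without touching the rest."""
    def __init__(self, *blocks):
        self.blocks = blocks
    def __call__(self, state):
        for block in self.blocks:
            state = block(state)
        return state

pipe = SequentialPipeline(EncodePrompt(), Denoise(), Decode())
out = pipe(PipelineState({"prompt": "a red fox"}))
print(out.values["image"])
```

The point of the design is that a custom denoising loop or decoder is a drop-in replacement for one block, rather than a subclass of a monolithic pipeline.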
Pinterest unified three separate ads engagement models for Home Feed, Search, and Related Pins into one shared architecture.
This post describes a 24-hour speedrun for training a text-to-image diffusion model using 32 H200 GPUs and a ~$1500 compute budget.
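(For scale: 32 GPUs × 24 h = 768 H200-hours, so the ~$1500 budget works out to roughly $2 per GPU-hour, assuming the run used the full day.)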
Pinterest investigates the discrepancy between online and offline metrics for the L1 (lightweight ranking) conversion-rate (CVR) models in its ads funnel.
TabPFN, by Prior Labs, brings the pretrain-once, in-context-learning paradigm of LLMs to tabular data, removing the need for traditional ML preprocessing and per-task training.
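A usage sketch of TabPFN's scikit-learn-style interface; the dataset is a placeholder and exact constructor arguments may vary by `tabpfn` version:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "fit" stores the training set as context for the pretrained transformer;
# prediction is a forward pass, with no per-task gradient training.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```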
Meta open-sources RCCLX, an enhanced GPU communication library for AMD platforms that significantly improves AI training and inference performance.
Airbnb recaps its 2025 academic research at KDD, CIKM, and EMNLP covering ML, NLP, and recommendation systems.
Netflix introduces MediaFM, an in-house tri-modal (audio, video, text) foundation model for deep media content understanding at scale.
This post explains how to fine-tune small LLMs for free using Unsloth and Hugging Face Jobs, with support for coding agents like Claude Code and Codex.
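A minimal Unsloth fine-tuning sketch; the model name, dataset, and hyperparameters below are placeholders, trl/Unsloth argument names vary by version, and the Hugging Face Jobs submission step from the post is omitted:

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load a small base model with 4-bit quantized weights (QLoRA-style).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B-Instruct",  # placeholder small model
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Placeholder instruction dataset, flattened to a single "text" field.
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:1%]")
dataset = dataset.map(
    lambda ex: {
        "text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
    }
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()
```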
Deployment of custom Amazon Nova models on Amazon SageMaker Inference is now generally available for production-grade inference.