Explore real-world engineering experiences from top tech companies.
Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
This post covers Databricks' Predictive Optimization (PO) in Unity Catalog, which became the default platform behavior in 2025 for autonomous lakehouse table maintenance.
This post explores how data science work on Figma's billing infrastructure differs from traditional product analytics, offering five lessons on expanding impact in complex, correctness-driven domains.
This post describes Pinterest's Auto Memory Retries feature for Apache Spark, which automatically retries OOM-failed tasks on larger executors to reduce failures and resource waste.
The Hydra team, maintainers of the pg_duckdb extension, is joining Supabase to advance Postgres-native analytics capabilities.
Pinterest describes its next-generation database ingestion framework, built on change data capture (CDC), Kafka, Flink, Spark, and Iceberg, which replaces legacy batch-based pipelines.
This article describes the architecture, optimization, and evolution of Lyft's Feature Store, a core ML infrastructure platform serving 60+ use cases across the rideshare stack.
Cloudflare announces support for GROUP BY, SUM, and other aggregation queries in R2 SQL, its serverless analytics query engine over R2 Data Catalog.
Grab built 'Scenarios' in its Customer Data Platform (CDP) to enable real-time personalization that goes beyond daily batch updates.
This post introduces iceberg-js, a minimal JavaScript client for the Apache Iceberg REST Catalog API targeting JavaScript and TypeScript developers.
Amazon S3 Tables announces Intelligent-Tiering storage and cross-region replication support for Apache Iceberg tables.
This post introduces Coban, Grab's platform for monitoring the data quality of Kafka streams in real time, using user-defined data contracts with syntactic and semantic test rules.
This article presents an ETL design document template used at Square to improve data quality, team consistency, and documentation practices.