Explore real-world engineering experiences from top tech companies.
Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Spark Declarative Pipelines (SDP) extends declarative data processing in Apache Spark from individual queries to entire pipelines, reducing the operational burden on data engineering teams.
This post announces the general availability of Zerobus Ingest, a real-time data ingestion product built for organizations operating at scale.
Databricks Genie now supports enterprise OAuth, which lets teams embed its natural-language data analytics in Microsoft Teams and custom web apps.
This post covers Databricks' Predictive Optimization (PO) in Unity Catalog, which in 2025 became the default platform behavior for autonomous maintenance of lakehouse tables.
This post explores how data science work on Figma's Billing infrastructure differs from traditional product analytics and offers five lessons on expanding impact in complex, correctness-driven domains.
This post describes Pinterest's Auto Memory Retries feature for Apache Spark, which automatically retries tasks that fail with out-of-memory (OOM) errors on larger executors, reducing both job failures and resource waste.
The Hydra team, maintainers of the pg_duckdb extension, is joining Supabase to advance Postgres-native analytics capabilities.
Pinterest describes its next-generation database ingestion framework, built on change data capture (CDC), Kafka, Flink, Spark, and Iceberg, which replaces its legacy batch-based pipelines.
This article describes the architecture, optimization, and evolution of Lyft's Feature Store, a core ML infrastructure platform serving 60+ use cases across the rideshare stack.
Cloudflare announces support for GROUP BY, SUM, and other aggregation queries in R2 SQL, its serverless analytics query engine over R2 Data Catalog.
Grab built 'Scenarios' into its customer data platform (CDP) to enable real-time personalization beyond daily batch updates.
This post introduces iceberg-js, a minimal JavaScript client for the Apache Iceberg REST Catalog API targeting JavaScript and TypeScript developers.