How Instacart Scaled Through the Pandemic

Good morning! Welcome back to this week's edition of Full Stack Express, your go-to newsletter for web development, software architecture, and system design.

Headlines

  • Microsoft's AI Team Exposes 38TB of Private Data on GitHub

  • Google Unveils Major Bard Update: Now with Extensions for Maps, YouTube, Flights, and Hotels

  • Vercel Releases v0.dev: A Natural Language-Powered Web UI Generator for React

  • Remix Rolls Out Version 2 of Its Full Stack Web Framework

  • Introducing Nue: The Newest, Fastest Thing in Frontend Development

Featured Deep Dives

  • How Instacart Scaled Through the Pandemic Managing 80,000+ Retail Locations and Predicting Real-Time Item Availability

  • Figma’s Performance Testing Journey from a Single MacBook to a Dual-System Infrastructure

Quick Bytes

  • How Instagram Scaled to 14 Million Users with Just 3 Engineers

  • Understanding Backpressure in Software Systems

  • Bard’s Latest AI Updates and Improvements

  • Runtime Comparison Between Node, Deno, and Bun

  • AWS's IPv4 Estate Now Worth $4.5 Billion

Community Spotlight

Kiesel, Theatre.js, nanoGPT, and More

Tip of the Week

Boost Your React App's Performance with Lazy Loading and Suspense

Meme of the Week

JavaScript Frameworks: The Never-Ending Story

HOW INSTACART SCALED THROUGH THE PANDEMIC MANAGING 80k+ RETAIL LOCATIONS

The Challenge

Instacart, an online grocery delivery service, faced a complex challenge in predicting real-time item availability across 80,000+ retail locations, especially during the pandemic.

The company needed to ensure that its machine learning (ML) models could scale to predict the availability of millions of items while maintaining low latency and high consistency.

The Turning Point

The pandemic led to a surge in customer demand and fluctuating in-store inventories.

Instacart needed to evolve its Real-Time Availability (RTA) infrastructure to keep pace with these changes and maintain customer trust.

Objectives and Requirements

  1. Low Latency: Fast and bulk fetching of availability scores at the retrieval stage.

  2. High Consistency: Consistent availability information across all user interfaces.

  3. Scalability: Ability to handle predictions for hundreds of millions of items.

  4. Experimentation: A framework to test multiple ML models efficiently.

The Solution

Instacart implemented two methods for ingesting ML-generated scores into their database (DB) storage:

  1. Full Sync: The ML Availability Scoring Service updates a Snowflake table multiple times a day. DB ingestion workers then read this table and update the stored availability scores.

  2. Lazy Score Refresh: Scores are updated on-demand when an item appears in search results and exceeds the allowable age.

Full Sync and Lazy Score Refresh Architecture
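The lazy-refresh idea above can be sketched in a few lines of JavaScript. This is an illustrative approximation only, not Instacart's actual code: the field names (`scoredAtMs`, `availabilityScore`) and the allowable age are invented for the example.

```javascript
// Hypothetical sketch of lazy score refresh: when an item surfaces in
// search results, re-score it only if its cached score has exceeded the
// allowable age. Names and values are illustrative.
const MAX_SCORE_AGE_MS = 60 * 60 * 1000; // assumed allowable age: 1 hour

function needsRefresh(item, nowMs) {
  return nowMs - item.scoredAtMs > MAX_SCORE_AGE_MS;
}

function refreshStaleScores(searchResults, scoreFn, nowMs = Date.now()) {
  return searchResults.map((item) =>
    needsRefresh(item, nowMs)
      ? { ...item, availabilityScore: scoreFn(item), scoredAtMs: nowMs }
      : item
  );
}
```

Because only items that actually appear in results (and are stale) get re-scored, the steady-state ingestion load stays far below a full re-sync of every item.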

To foster faster experimentation, Instacart developed a Multi-Model Experimentation Framework with three key components:

  1. DB Column per Model: Each model's score is stored in a dedicated DB column.

  2. Model-Column Mapping: A service-level configuration system maps the model version to its corresponding unique column.

  3. Experiment-Column Mapping: A/B experiments are easily conducted with a unique feature flag associated with each column.

Multi-Model Experimentation Framework
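A minimal sketch of how model-column and experiment-column mappings might fit together, assuming a feature-flag system that reports which flags are enabled for a user. The model names, column names, and flag names here are all invented for illustration:

```javascript
// Hypothetical service-level configuration: each model version maps to a
// dedicated DB column, and each column has an associated experiment flag.
const MODEL_COLUMNS = {
  availability_v3: "score_col_a",
  availability_v4: "score_col_b",
};

const EXPERIMENT_FLAGS = {
  score_col_a: "exp_availability_v3",
  score_col_b: "exp_availability_v4",
};

// Resolve which score column to read for a user based on the experiment
// flags enabled for them; fall back to the current production column.
function columnForUser(enabledFlags, defaultColumn = "score_col_a") {
  for (const [column, flag] of Object.entries(EXPERIMENT_FLAGS)) {
    if (enabledFlags.has(flag)) return column;
  }
  return defaultColumn;
}
```

With this shape, launching a new model is mostly configuration: add a column, register the mapping, and flip a flag for the experiment cohort.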

To manage the complexity of different thresholds for various segments, Instacart introduced the Deltas Framework, which allows for the application of fixed deltas to base thresholds, computed at runtime.

Deltas Framework
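The Deltas idea can be sketched as a base threshold plus fixed, per-segment adjustments summed at runtime. The segment names and numeric values below are assumptions for illustration, not Instacart's real configuration:

```javascript
// Hypothetical Deltas sketch: a base availability threshold with fixed
// per-segment deltas applied at request time. Values are illustrative.
const BASE_THRESHOLD = 0.5;

const SEGMENT_DELTAS = {
  newUser: -0.05,      // e.g. surface more items to new users
  strictRetailer: 0.1, // e.g. be more conservative for some retailers
};

function effectiveThreshold(segments) {
  return segments.reduce(
    (threshold, segment) => threshold + (SEGMENT_DELTAS[segment] ?? 0),
    BASE_THRESHOLD
  );
}

function isAvailable(score, segments) {
  return score >= effectiveThreshold(segments);
}
```

Because the deltas are composed at runtime, tuning a single segment means changing one number rather than maintaining a separate threshold for every combination of segments.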

Real-world Impact

  1. Scalability: The lazy score refresh reduced the ingestion load by two-thirds.

  2. Experimentation: A 6X increase in experiments run using the new framework.

  3. Customer Trust: Improved "good found rate," crucial for customer retention.

Key Takeaways

  1. Scalability vs. Consistency: Balancing these two can be challenging but is crucial for maintaining customer trust.

  2. Data Ingestion Strategies: Combining full sync and lazy refresh can optimize both latency and consistency.

  3. Modular Experimentation: A well-designed experimentation framework can significantly reduce engineering work and speed up ML testing.

  4. Threshold Management: Dynamic thresholding can be a powerful tool for handling complex, multi-segment optimization problems.

Instacart's innovative approach to RTA infrastructure demonstrates how engineering, machine learning, and product design can come together to solve complex, real-world challenges at scale.

FIGMA’S PERFORMANCE TESTING JOURNEY FROM A SINGLE MACBOOK TO A DUAL-SYSTEM INFRASTRUCTURE

The Challenge

In 2018, Figma's entire in-house performance testing system ran on a single MacBook, looping through a series of key test scenarios.

Stress Test with 5,000 Comment Pins

Fast forward to 2023, and the landscape has changed dramatically. Figma's codebase has grown in complexity, with new features, products, and a team distributed globally.

The single MacBook approach was no longer sustainable, especially with the team expanding to over 400 engineers and managers.

The Turning Point

The MacBook that had been running tests overheated in October 2020.

Attempts to replicate the tests on another laptop failed, signaling the need for a more scalable, sophisticated approach to performance testing.

Early Stress Tests Created in FigJam

Objectives and Requirements

Figma aimed to build a system that could:

  1. Test every proposed code change in the main monorepo.

  2. Complete performance guardrail checks in under 10 minutes.

  3. Provide detailed performance metrics and comparisons.

The Solution

Figma deployed two systems connected by the same Continuous Integration (CI) system:

  1. Cloud-based System: Runs in GPU-enabled virtual machines (VMs) for every pull request. Due to the noise levels in VMs, a 20% pass margin was set to catch only the most significant regressions.

  2. Hardware System: Runs on an array of test laptops, including older machines, allowing for custom test scenarios and more precise performance metrics.

Both systems shared features like detailed HTML reports and CPU profiles for each run.
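A guardrail check with a pass margin like the one described above could look something like this. The metric names and baseline numbers are made up for the example; only the 20% margin comes from the article:

```javascript
// Rough sketch of a CI performance guardrail: a change fails only if a
// metric regresses by more than the pass margin versus the baseline,
// so that VM noise doesn't produce spurious failures.
const PASS_MARGIN = 0.2; // 20%, per the cloud-based system

function checkGuardrails(baseline, candidate, margin = PASS_MARGIN) {
  const failures = [];
  for (const [metric, baseMs] of Object.entries(baseline)) {
    if (candidate[metric] > baseMs * (1 + margin)) {
      failures.push(metric);
    }
  }
  return { passed: failures.length === 0, failures };
}
```

The trade-off is explicit: a wide margin absorbs VM noise but only catches the most significant regressions, which is why the hardware system exists for more precise measurements.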

Real-world Impact

The new system went live in October 2022 and has been instrumental in identifying performance bottlenecks and regressions early in the development cycle.

It has also empowered engineers to collaborate on performance-sensitive code across teams and time zones.

Key Takeaways

  1. Start Lean, Plan for Scale: While starting with a minimal setup is good, always have a plan for scaling your testing infrastructure.

  2. Proactive vs. Reactive: Shifting from a reactive to a proactive approach in performance testing can save time and improve product quality.

  3. Parallelization and CI: Utilizing parallel runs in CI can significantly speed up the testing process.

  4. Hardware Matters: For graphics-heavy applications, real hardware testing can provide insights that VMs cannot.

  5. Continuous Monitoring: Automated tests and detailed reporting can catch regressions before they impact users, maintaining development velocity.

By embracing a scalable, dual-system approach, Figma has set itself up for sustainable growth, ensuring that performance doesn't become a bottleneck as the product evolves.

BYTE-SIZED TOPICS

INTERESTING PRODUCTS, TOOLS & PACKAGES

TIP OF THE WEEK

React’s lazy() and Suspense allow you to perform code splitting, which can significantly improve your app's performance by reducing the initial bundle size.

import React, { Suspense, lazy } from 'react';

const HeavyComponent = lazy(() => import('./HeavyComponent'));

function App() {
  return (
    <div>
      <main>
        <Suspense fallback={<div>Loading...</div>}>
          <HeavyComponent />
        </Suspense>
      </main>
    </div>
  );
}

export default App;

In this example, HeavyComponent will only be loaded when it's about to be rendered, reducing the initial load time for your application.

The fallback prop in Suspense allows you to display a loading indicator or some other placeholder while the component is being loaded.

This approach is especially useful for components that are not immediately visible, like modals or tabs that the user might never click, or for splitting routes in a single-page application.

MEME OF THE WEEK
