Pinterest's New Asynchronous Computing Platform

🌙 Hello world ☀️ 

As we approach the end of 2023, a wealth of survey data is being released. Here are my favorites so far:

In this week’s email:

  • Architecture: Pinterest’s architecture transition from PinLater to Pacer.

  • Node.js: Improving Node.js loader performance.

  • JavaScript: The await event horizon in JavaScript.

  • CSS: A full-width slider case study.

  • Productivity: Four simple programming habits to transform productivity.

Optimization hinders evolution.

Alan Perlis



Some of the most notable features on the platform include Pins, Boards, image thumbnail generation, follows, and platform-wide notifications.

How does Pinterest serve so many users at such a large scale?

These core features were handled by an asynchronous job execution system called PinLater, which was one of Pinterest’s most mission-critical systems.

PinLater has been open-sourced (and archived) and the codebase can be found here.

Old Architecture: PinLater

Looking at the architecture, we can see PinLater is composed of three main components:

  1. PinLater Thrift service for scheduling and managing job submissions.

  2. Backend data store to save the jobs, with payloads and metadata.

  3. Workers in worker pools to process the stored jobs and to track the execution status of these jobs.

Pinterest’s PinLater Architecture

As traffic increased and more features were added to the platform, the Pinterest team found flaws with the PinLater system.

The first problem was lock contention in the data store. Each queue has one table in every data store shard, and each dequeue request scans all the shards for available jobs, so multiple threads end up competing to read from the same tables.
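To make the access pattern concrete, here is a minimal TypeScript sketch of a dequeue that scans every shard. It is an in-memory model, not PinLater's actual code: the `Shard` type, the `scanDequeue` function, and the `notifications` queue name are all assumptions made for illustration.

```typescript
// Hypothetical in-memory model of the layout described above:
// every shard holds one table per queue.
type Job = { id: number };
type Shard = Map<string, Job[]>; // queue name -> that queue's table on this shard

// PinLater-style dequeue: visit the queue's table on every shard.
// In a real data store each visit would be a locking read.
function scanDequeue(shards: Shard[], queue: string, limit: number) {
  const jobs: Job[] = [];
  let tablesTouched = 0;
  for (const shard of shards) {
    tablesTouched++;
    const table = shard.get(queue) ?? [];
    jobs.push(...table.splice(0, limit - jobs.length));
  }
  return { jobs, tablesTouched };
}

const makeShard = (ids: number[]): Shard =>
  new Map([["notifications", ids.map((id) => ({ id }))]]);

const shards = [makeShard([1]), makeShard([]), makeShard([2])];

// Two service threads dequeue back to back and hit the same three tables.
const first = scanDequeue(shards, "notifications", 10);
const second = scanDequeue(shards, "notifications", 10);
console.log(first.jobs.length, first.tablesTouched); // 2 3
console.log(second.jobs.length, second.tablesTouched); // 0 3
```

Every caller touches every table, whether or not it holds jobs. In this model, adding more callers multiplies the reads against the same tables rather than spreading them out.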

How does this impact the system on a larger scale?

As traffic increases, the Thrift service naturally scales up to account for the increased load.

Scaling horizontally would ideally distribute the load across multiple instances, but in this case, lock contention actually degrades the performance of PinLater.

This not only negates the benefit of scaling out, but also reduces the throughput of the platform.

Some other issues include mixing jobs from multiple job queues, and sharing the same Thrift services for different team functions.

Job queues may require different instance types, which makes performance tuning extremely challenging.

Also, different functions could have different reliability requirements, so a shared service is not ideal.

These issues became the driving need for a new architecture, Pacer.

New Architecture: Pacer

With a focus on performance, the Pinterest team introduced new components and mechanisms for storing, accessing, and isolating job data and queues.

These components include:

  1. Broker service to pull jobs from the backend data store.

  2. Helix and Zookeeper to manage job queues for the broker service.

  3. Dedicated worker pools for each queue.

The combination of Helix and Zookeeper ensures that each partition of a job queue is assigned to a dedicated broker host. As a result, there is no competition over the same job data.

Furthermore, since the dequeue broker service is in charge of pulling jobs from the data store, it can prefetch jobs and cache them in local memory buffers.

Serving dequeue requests from memory reduces latency and also decouples the dequeue and enqueue processes completely.
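The prefetch-and-buffer idea can be sketched in a few lines of TypeScript. This is a simplified, synchronous model under stated assumptions: `PartitionStore`, `Broker`, and the batch size are hypothetical names and values, and a production broker would presumably refill its buffer asynchronously.

```typescript
type Job = { id: number };

// Hypothetical stand-in for one queue partition in the data store.
class PartitionStore {
  reads = 0; // number of round trips to the store
  private jobs: Job[];
  constructor(jobs: Job[]) {
    this.jobs = [...jobs];
  }
  fetchBatch(size: number): Job[] {
    this.reads++;
    return this.jobs.splice(0, size);
  }
}

// A broker that is the only reader of its partition, so nothing
// else competes for the same rows.
class Broker {
  private buffer: Job[] = [];
  private store: PartitionStore;
  private batchSize: number;
  constructor(store: PartitionStore, batchSize: number) {
    this.store = store;
    this.batchSize = batchSize;
  }
  // Serve from memory; go to the store only when the buffer is empty.
  dequeue(): Job | undefined {
    if (this.buffer.length === 0) {
      this.buffer = this.store.fetchBatch(this.batchSize);
    }
    return this.buffer.shift();
  }
}

const store = new PartitionStore([1, 2, 3, 4, 5].map((id) => ({ id })));
const broker = new Broker(store, 3);
const served: number[] = [];
for (let i = 0; i < 5; i++) {
  const job = broker.dequeue();
  if (job) served.push(job.id);
}
console.log(served.length, store.reads); // 5 2
```

Five dequeues cost two reads against the store, and workers never touch the store directly.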

Key Improvements

With these new changes to the architecture, the Pinterest team has seen the following improvements:

  1. Lock contention is completely resolved in the data store.

  2. Significant improvement in hardware utilization.

  3. Improved performance and scalability, since each queue's jobs execute independently in their own environment with custom configurations.

In a side-by-side comparison with PinLater, it’s clear that Pacer is built for scalability and resilience.

Enqueue and Dequeue Decoupled From Thrift Service

In PinLater, lots of resources were wasted due to redundant partitions that were created in the data store shards. This led to extra calls, and more than 50% of the calls returned empty results.

The original system was also limited, as some functionalities simply couldn’t be supported. For example, job execution order could only be guaranteed locally. There was no global order.

In order to manage such a large number of partitions optimally, a cluster management framework was needed to properly partition, replicate, and distribute resources hosted on a cluster of nodes.

Helix: Cluster Management Framework

Helix monitors for various events, such as configuration changes and state changes for broker hosts.

In doing so, the Helix controller is able to calculate and manage resources, while also communicating with the broker cluster to bring it to the ideal state.

Each broker host reports its liveness to Zookeeper and is notified when its assigned tasks change.
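As a rough illustration of what "bringing the cluster to the ideal state" means, the sketch below recomputes partition ownership from the list of live brokers. It is not the Helix API and not Helix's actual rebalancing algorithm: `assignPartitions` is a hypothetical round-robin helper, and the partition and broker names are made up.

```typescript
// Hypothetical helper: give every partition exactly one live broker.
function assignPartitions(
  partitions: string[],
  liveBrokers: string[],
): Map<string, string> {
  const assignment = new Map<string, string>();
  if (liveBrokers.length === 0) return assignment; // nothing can be assigned
  const brokers = [...liveBrokers].sort(); // stable order across recomputations
  partitions.forEach((partition, i) => {
    assignment.set(partition, brokers[i % brokers.length]);
  });
  return assignment;
}

const partitions = ["q-0", "q-1", "q-2", "q-3"];

// Both brokers report as live.
const before = assignPartitions(partitions, ["broker-a", "broker-b"]);
// broker-a stops reporting liveness, so the assignment is recomputed.
const after = assignPartitions(partitions, ["broker-b"]);

console.log(before.get("q-0"), after.get("q-0")); // broker-a broker-b
```

Each partition always has a single owner, which is the property that removes competition over the same job data.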



Big picture: The author discusses improvements in Node.js loader performance, focusing on how Node.js differentiates between ECMAScript (ES) and CommonJS modules for loading. The choice depends on factors like file extensions and package.json file configurations.

What you’ll learn: You will learn about the inner workings of the Node.js loader, specifically how it decides whether to use the ES or CommonJS module loader based on file extensions and package.json contents.
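The basic decision can be sketched as a small TypeScript function. This is a simplification, not the loader's real code: `resolveFormat` is a hypothetical name, the caller is assumed to have already read the `type` field from the nearest package.json, and other cases (such as `.json` files or command-line flags) are ignored.

```typescript
type ModuleFormat = "module" | "commonjs";

// Decide which loader a file would go to, given its name and the
// "type" field of the nearest package.json (undefined if absent).
function resolveFormat(
  filename: string,
  packageType?: ModuleFormat,
): ModuleFormat {
  if (filename.endsWith(".mjs")) return "module"; // always an ES module
  if (filename.endsWith(".cjs")) return "commonjs"; // always CommonJS
  // Plain .js files follow package.json, defaulting to CommonJS.
  return packageType === "module" ? "module" : "commonjs";
}

console.log(resolveFormat("server.mjs")); // module
console.log(resolveFormat("legacy.cjs", "module")); // commonjs
console.log(resolveFormat("index.js", "module")); // module
console.log(resolveFormat("index.js")); // commonjs
```

The `.js` case is the expensive one, because answering it means locating and reading a package.json file before the module can be loaded.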

Why this matters: The optimizations discussed are particularly relevant for developers looking to enhance the performance of their Node.js applications, offering practical knowledge on improving load times and reducing operational overhead.


Big picture: The author discusses the practical implications and challenges of async functions in JavaScript, where once execution crosses the await event horizon, it becomes impossible to forcibly escape, potentially leading to issues like resource leaks.

What you’ll learn: You'll learn about the inherent risks in async functions, specifically how they can lead to resource leaks if a Promise never settles or takes too long to do so. This is demonstrated through the example of a lock in an async function that may never get released.
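Here is a minimal TypeScript sketch of that lock scenario. The `Lock` class and `criticalSection` function are hypothetical, and the `withTimeout` helper shows one common mitigation, which is not necessarily the approach the article recommends.

```typescript
// Hypothetical lock: just a flag, enough to show the leak.
class Lock {
  held = false;
  acquire(): void {
    if (this.held) throw new Error("lock already held");
    this.held = true;
  }
  release(): void {
    this.held = false;
  }
}

async function criticalSection(
  lock: Lock,
  task: () => Promise<void>,
): Promise<void> {
  lock.acquire();
  try {
    await task(); // if this never settles, execution never returns here
  } finally {
    lock.release(); // never reached while the await is still pending
  }
}

// Mitigation sketch: reject if the promise takes longer than `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const never = () => new Promise<void>(() => {}); // a promise that never settles

// Past the event horizon: nothing outside can make the function release.
const leaked = new Lock();
void criticalSection(leaked, never);

// With a bounded wait, the finally block runs and the lock is released.
const guarded = new Lock();
criticalSection(guarded, () => withTimeout(never(), 50)).catch(() => {
  console.log("released after timeout:", !guarded.held); // logs true
});
```

The timeout only lets the waiting function move on. The underlying operation is not cancelled, so any work it started may still be running.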

Why this matters: This concept is significant for developing robust and efficient JavaScript applications, as it highlights the need for careful structuring and management of asynchronous code to prevent potential issues like resource leaks and unresponsive behaviors.


Big picture: The case study focuses on implementing a full-width slider using CSS scroll snapping, integrated seamlessly with a global page layout. It addresses the challenge of aligning the slider's content with other page elements while maintaining consistent padding and alignment.

What you’ll learn: You will learn practical techniques for using CSS variables and scroll snapping properties to align a full-width slider with the rest of a webpage's layout. The case study provides step-by-step guidance on achieving a responsive and visually coherent design.

Why this matters: This tutorial demonstrates advanced CSS techniques for creating responsive, aligned, and visually appealing web components. Understanding these methods is crucial for designing modern, user-friendly websites.


Big picture: The article presents four simple yet effective habits that can significantly enhance the productivity of software engineers. These habits focus on optimizing everyday workflow and decision-making processes in software development.

What you’ll learn: You will learn about practical strategies such as strategically leaving work slightly unfinished, utilizing keyboard shortcuts, maintaining an accessible list of commands and links, and the importance of saying "no" to manage workload effectively.

Why this matters: For software engineers, adopting these habits is crucial for improving efficiency and managing complex tasks in a demanding environment. The article provides actionable tips that can be easily integrated into daily routines for better productivity and work-life balance.

Balanced Binary Tree

Missed the solution to the latest coding challenge?

This question is asked by Adobe and Microsoft. Learn how to solve this problem here.

JS Weekly Pulse

📢 OpenAI SDK for Deno: Improvements to module loading via HTTPS, import TypeScript, and availability of GPT-4 Turbo, Assistants API, DALL-E 3 and more.

📢 Vue 2: Reaches End of Life on December 31, 2023, as the Vue team focuses on Vue 3, meaning Vue 2 will no longer receive updates or fixes but will still be available on existing distribution channels.

📜 React Summit: The talk focuses on reducing architectural complexity by structuring code into self-contained modules.

📜 shadcn/ui: A deep dive into the architecture of this unique JavaScript library.

🚀 SvelteKit 2.0: An incremental release that supports Vite 5, brings various improvements and the much-requested shallow routing feature, and sets the stage for the anticipated Svelte 5 release in 2024.

🚀 Remix 2.4.0: Introduces the Client Data RFC with new APIs, enhances support for Vite, and includes various improvements like shallow routing, strict route exports, and optimizations for both server and client builds.

🚀 date-fns 3.0.0: Includes 100% TypeScript support, smaller build size, string arguments support, Date class extensions support, ESM support for Node.js, named exports for all functions, and no more IE support.

Around The Web

 Interesting: JavaScript continued to dominate web development, including in WebAssembly, while generative AI began integrating into web frameworks.

 Learn: The 2023 edition of a Node.js best practices repository includes updated content, new libraries, and a demonstration application, Practica.js, to showcase these practices.

 Algorithms: A deep dive into Canva’s Shape Assist, a machine learning tool in their Draw tool, which transforms shaky scribbles into sleek vector graphics in the browser.

 LLMs: 2023 marked a surge in public interest and debate around Large Language Models (LLMs), with a focus on open-source models for their benefits in research reproducibility, community involvement, and environmental impact.

 Advice: The overuse of abstraction layers in technology and software development.

Tools and Packages

📦️ Oxc: A collection of high-performance JavaScript tools written in Rust.

📦️ Byte Base: Database DevOps and CI/CD solution designed to streamline the database development lifecycle and facilitate collaboration between developers and DBAs.

📦️ Page Spy Web: Remote debugging tool for web projects, designed to function like Chrome DevTools.

RIP AI: 1941-2023…it was a good run 🤣 
