Simple and efficient rules for making a “good enough” decision that reaches a locally optimal solution.
Frontend System Design Heuristics
Understand the problem and Requirements
- What does the system do?
- What data are involved in the system?
- Who are the actors in the system?
- What does a happy path look like? (Functional requirements; some dimensions: display order/sort, expiry time, randomness.)
- What are the attributes of the system's capabilities? (Non-functional requirements, e.g. performance/latency, live updates, usability, accessibility, multi-device support, consistency, availability, peak throughput.)
Example: Design Twitter
- The system allows users to post new tweets, to follow other users, and to like and comment on other users' tweets. Users can see the latest tweets from the users they follow on their timeline.
- Tweets: containing text, video, images, links. Users: name, images, bio.
- Actors: Users — followers and followees.
- Happy Path: User A successfully publishes a tweet. User B sees the first 20 tweets from the users they follow. User B also sees A’s new tweet after a refresh.
- Publish tweets: reliability, no data loss. View timeline: performant, accessible, on mobile devices too. Infinite Scroll: performant. New tweet update: small delay (inconsistency) ok.
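To make the data and the happy path concrete, here is a minimal sketch of the two entities and the timeline read. The field names (`avatarUrl`, `following`, `mediaUrls`, `createdAt`) and the `timelineFor` helper are hypothetical and chosen for illustration only; they are not Twitter's real schema or API.

```typescript
// Hypothetical shapes for illustration; not Twitter's real schema.
interface User {
  id: string;
  name: string;
  bio: string;
  avatarUrl: string;
  following: string[]; // IDs of the users this user follows
}

interface Tweet {
  id: string;
  authorId: string;
  text: string;        // may contain links
  mediaUrls: string[]; // images or videos
  createdAt: number;   // Unix timestamp in milliseconds
}

// Return the newest `pageSize` tweets written by users the viewer follows.
function timelineFor(viewer: User, tweets: Tweet[], pageSize = 20): Tweet[] {
  const followed = new Set(viewer.following);
  return tweets
    .filter((t) => followed.has(t.authorId))   // filter() copies, so the input is not mutated
    .sort((a, b) => b.createdAt - a.createdAt) // newest first
    .slice(0, pageSize);
}
```

In a real system this filtering would happen on the server or in a precomputed timeline; the function only pins down what "first 20 tweets from the users they follow" means.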
Establish scope / out of scope
- What situations / conditions are we ignoring?
- What assumptions are we making?
- We are ignoring other functionality such as user management, bookmarking tweets, search, and messages.
- We assume Daily Active Users = X and Monthly Active Users = Y. The ideal API response time is under 100 ms.
High Level System Design
- Which capabilities are Read Heavy? (Caching & Denormalising)
- Which capabilities are Write Heavy? (NoSQL & Sharding)
- What does a system diagram look like? (client, server, DB, cache, ...)
- What are some of the risky areas?
- Far more people view tweets than create them, so this is a Read Heavy system.
- An efficient DB for storing new tweets and supporting heavy reads
- File storage for photos and videos
Further Design Considerations
Server side Caching (Read Tweets)
- How much data do we store per user?
- How many users' data do we store?
- Cache Eviction Strategy?
- Cache Invalidation Strategy?
- 80/20 rule?
- Most reads hit the most popular tweets from a few celebrities. Following the 80/20 rule, we can cache only those tweets, which takes most of the daily read volume off the DB.
- When the cache is full and we want to admit a newer or hotter tweet, we can evict an existing entry using an LFU (least frequently used) or LRU (least recently used) strategy.
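As an illustration of LRU eviction, here is a minimal in-memory sketch. The `LruCache` class is hypothetical and relies on the fact that a JavaScript `Map` iterates keys in insertion order; a production tweet cache would more likely live in a dedicated cache server than in application memory.

```typescript
// Minimal LRU cache sketch: the first key in the Map is the least recently used.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    if (capacity < 1) throw new Error("capacity must be at least 1");
    this.capacity = capacity;
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Cache is full: evict the least recently used entry.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }
}
```

An LFU variant would track a hit counter per entry and evict the entry with the lowest count instead of the oldest one.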
Client side Performance Consideration
- PRPL Pattern: Push/ Preload, Render initial route asap (SSR), Pre-cache remaining routes (Service Worker), Lazy Load other routes and non-critical assets.
- Metrics (Google RAIL model): Response: respond to user input in under 100 ms. Animation: aim to produce each frame in under 10 ms, and animate CSS properties that only need compositing, such as transform and opacity, so the browser can skip layout and repaint. Idle: maximise idle time to perform deferred work, e.g. loading images below the fold. Load: lazy load and code split.
- Prefetch and Preload: prefetch the links in the user's timeline that are in the viewport, and preload static assets (some are referenced only inside CSS/JS, so tell the browser to load them early). Load only the CSS and JS required for the critical path and load the rest asynchronously.
- Code splitting / lazy loading (skeleton): Webpack can split code into smaller chunks, e.g. vendor and main scripts. Lazy loading means loading those chunks on demand, e.g. by route or after a user action. Images: lazy load the images below the fold or outside the viewport.
- Caching: browser HTTP caching. We want to cache immutable content with a long max-age, e.g. main.[chunk].js (until the app is rebuilt, it can be treated as immutable content) and other static assets. Note: index.html should not be cached long-term; otherwise the browser may never realise that a new version is available.
- CDN: when a user fetches data for the first time, the request does not need to travel all the way to the origin server if the data is available in the CDN. Traditionally we store static assets such as images and videos in the CDN; JAMstack takes it one step further and deploys the app itself to CDNs instead of always serving it from servers.
- Service Workers and Cache API: cache network responses so the app can respond quickly. A service worker runs separately from the main browser thread. It can cache requests and receive push notifications when the app is not active. It also enables PWAs. Web Storage / IndexedDB can be used to store application data offline.
- PWA: works offline or on a poor network connection, and improves user experience and adoption rates on mobile.
- Polling: the client repeatedly requests data from the server at a regular interval. This can waste network requests when the server has no new data.
- Long-Polling: the server holds the request open and responds when data is available, so the client does not need to poll periodically. Less wasted overhead, but there is still a timeout period, after which the client must reconnect.
- WebSockets: bi-directional, persistent connection over TCP. Good for real-time chat messaging.
- Server-Sent Events: uni-directional (server to client), persistent, long-lived connection. Good for push notifications.
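The HTTP caching bullet in the list above can be sketched as a small rule for choosing a `Cache-Control` header per file. This is a minimal sketch under assumptions: the `cacheControlFor` helper and the hashed file-name pattern (such as `main.3f9a1c2b.js`) are hypothetical, and a real build or server setup may name files differently.

```typescript
// Assumed naming scheme: a hex content hash before the extension, e.g. main.3f9a1c2b.js
const HASHED_ASSET = /\.[0-9a-f]{8,}\.(js|css|png|jpg|svg|woff2)$/;

// Pick a Cache-Control header value for a static file path.
function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The content under this URL never changes, so cache it for a year.
    return "public, max-age=31536000, immutable";
  }
  // Entry points such as index.html: may be stored, but must be revalidated
  // with the server before each use, so new releases are picked up.
  return "no-cache";
}
```

With this split, a new deployment only changes index.html and the hashed file names it references, so browsers pick up the new version while still reusing every unchanged asset.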
Distribute data in order to read and write efficiently
- Shard on User ID
- Shard on Tweet ID
- Shard on Tweet creation time
- Shard on Tweet ID and Tweet creation time
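Whichever key is chosen, the routing step usually reduces to mapping that key to a shard number. Below is a minimal sketch assuming simple hash-modulo placement; the `fnv1a` and `shardFor` helpers are hypothetical, and the hash is an FNV-1a-style function over UTF-16 code units picked only because it is short and deterministic.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (illustrative choice).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map a shard key (user ID, tweet ID, ...) to a shard index in [0, shardCount).
function shardFor(key: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  return fnv1a(key) % shardCount;
}
```

Passing the user ID as the key keeps all of one user's tweets on the same shard, which makes per-user reads cheap but can create a hot shard for a celebrity. Passing the tweet ID spreads load more evenly, at the cost of querying many shards to build a timeline. Plain modulo placement also moves most keys when `shardCount` changes, which is why consistent hashing is often considered instead.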
Accessibility
- Valid HTML
- Keyboard navigation
- Screen reader
- Color contrast
- 200% zoom
Security
- Understand common attacks such as XSS and CSRF