How to Build a Real-Time Collaboration Tool Like Figma or Miro

Real-time collaboration is one of the most architecturally complex features you can build for a web app. CRDTs changed the game, but the implementation details still trip up most teams.

Nate Laquis

Founder & CEO

CRDTs vs Operational Transform: Why CRDTs Won

For two decades, Operational Transform (OT) was the standard for real-time collaboration. Google Docs uses OT. It works by transforming concurrent operations against each other so they converge to the same state on every client. The problem? Practical OT systems rely on a central server to put operations in a single order, which makes that server a bottleneck and a single point of failure, and makes OT a poor fit for peer-to-peer or offline-first architectures.

Conflict-free Replicated Data Types (CRDTs) solve this differently. Instead of transforming operations, CRDTs use data structures whose merge rules are mathematically guaranteed to converge once every replica has received the same set of operations, regardless of the order in which they arrive. No central server is required for ordering. Clients can diverge (even go offline) and merge back cleanly. This is why so many collaboration tools built since 2020 use CRDTs or CRDT-inspired designs.
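To make "converges regardless of order" concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is a teaching toy, not how Yjs or Automerge represent documents, and the names (`GCounter`, `increment`, `merge`, `total`) are illustrative.

```typescript
// Toy grow-only counter (G-Counter): one slot per replica, merged with max().
type GCounter = Map<string, number>;

function increment(counter: GCounter, replicaId: string, by = 1): GCounter {
  const next = new Map(counter);
  next.set(replicaId, (next.get(replicaId) ?? 0) + by);
  return next;
}

// merge() is commutative, associative, and idempotent, so replicas end up
// in the same state no matter the order (or how often) they exchange state.
function merge(a: GCounter, b: GCounter): GCounter {
  const out = new Map(a);
  b.forEach((n, id) => out.set(id, Math.max(out.get(id) ?? 0, n)));
  return out;
}

function total(counter: GCounter): number {
  let sum = 0;
  counter.forEach((n) => (sum += n));
  return sum;
}

// Two replicas diverge while "offline", then merge in either order.
const alice = increment(increment(new Map(), "alice"), "alice"); // alice: 2
const bob = increment(new Map(), "bob", 3); // bob: 3
const mergedAB = merge(alice, bob);
const mergedBA = merge(bob, alice);
console.log(total(mergedAB), total(mergedBA)); // 5 5
```

Real text CRDTs are far more involved, but they rest on the same property: the merge function gives the same answer whichever way the updates arrive.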

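Figma's multiplayer engine is CRDT-inspired rather than a textbook CRDT: it borrows ideas such as last-writer-wins properties but keeps a central server as the authority. Notion and Linear have reportedly adopted CRDT techniques for parts of their products, Liveblocks offers CRDT-backed storage, and PartyKit ships a Yjs integration. If you are starting a collaboration tool today, CRDTs are the strongest default choice.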

The tradeoff is complexity. CRDTs are harder to understand and implement from scratch than OT. A naive CRDT implementation can consume significantly more memory than OT because it stores metadata for conflict resolution. But the open-source ecosystem has matured enough that you rarely need to build CRDTs from scratch. Libraries like Yjs and Automerge handle the hard parts.

Choosing Your CRDT Library: Yjs, Automerge, and Diamond Types

Three libraries dominate the CRDT space in 2026, and each has a distinct philosophy.

Yjs

Yjs is the most widely adopted CRDT library for web applications. It is fast, memory-efficient, and has the largest ecosystem of integrations: Tiptap for rich text, BlockSuite for Notion-style block editors, y-prosemirror for ProseMirror, and y-codemirror for code editing. Yjs uses a custom CRDT algorithm (YATA) optimized for text editing, and in published benchmarks it has outperformed earlier academic CRDT implementations by one to two orders of magnitude. If you are building a document or text-based collaboration tool, start with Yjs.

Automerge

Automerge takes a more data-oriented approach. While Yjs is optimized for text, Automerge excels at structured data: JSON documents, nested objects, arrays, and counters. The Automerge 2.0 rewrite (in Rust, compiled to Wasm) dramatically improved performance. Automerge is the better choice if your collaboration model is structured data (like a shared kanban board, spreadsheet, or diagram) rather than free-form text.

Diamond Types

The newest contender, Diamond Types, is a Rust-based CRDT implementation that claims 5-10x better performance than Yjs for text editing. It is still maturing, but if raw performance is critical (think collaborative code editors handling files with tens of thousands of lines), Diamond Types is worth evaluating.

For most collaboration tools, Yjs is the safe default. Its ecosystem, documentation, and community support are unmatched. You can always swap the underlying CRDT engine later if performance demands it, as long as you abstract the CRDT layer behind a clean interface in your architecture.
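One way to keep that swap possible is to have application code depend on a small interface of your own rather than on a specific library. The sketch below is an assumption about what such a seam could look like: `CollabStore`, `LwwStore`, and `renameBoard` are hypothetical names, and the last-writer-wins map is only a stand-in for a real Yjs or Automerge adapter.

```typescript
// Hypothetical engine-agnostic seam; not an API from Yjs or Automerge.
interface CollabStore<Update> {
  set(key: string, value: string): Update; // local change, returns the update to broadcast
  get(key: string): string | undefined;
  applyRemote(update: Update): void; // merge an update received from a peer
}

// Stand-in engine: last-writer-wins map ordered by (Lamport clock, replica id).
type LwwUpdate = { key: string; value: string; clock: number; replica: string };

class LwwStore implements CollabStore<LwwUpdate> {
  private entries = new Map<string, LwwUpdate>();
  private clock = 0;
  private readonly replica: string;

  constructor(replica: string) {
    this.replica = replica;
  }

  set(key: string, value: string): LwwUpdate {
    const update: LwwUpdate = { key, value, clock: ++this.clock, replica: this.replica };
    this.applyRemote(update);
    return update;
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  applyRemote(u: LwwUpdate): void {
    this.clock = Math.max(this.clock, u.clock);
    const cur = this.entries.get(u.key);
    // Higher clock wins; ties are broken by replica id so every peer agrees.
    const wins =
      !cur || u.clock > cur.clock || (u.clock === cur.clock && u.replica > cur.replica);
    if (wins) this.entries.set(u.key, u);
  }
}

// Application code sees only the interface, so the engine can change later.
function renameBoard<U>(store: CollabStore<U>, title: string): U {
  return store.set("title", title);
}

const laptop = new LwwStore("laptop");
const phone = new LwwStore("phone");
const fromLaptop = renameBoard(laptop, "Roadmap Q3");
const fromPhone = renameBoard(phone, "Roadmap (draft)");
laptop.applyRemote(fromPhone);
phone.applyRemote(fromLaptop);
console.log(laptop.get("title") === phone.get("title")); // true
```

A Yjs-backed implementation of the same interface would wrap a shared map and its binary updates; the point is that `renameBoard` would not need to change.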

Managed Services: Liveblocks, PartyKit, and Tiptap Collab

You do not have to manage your own WebSocket infrastructure. Managed collaboration services handle the networking, conflict resolution, and persistence layers so you can focus on your product.

Liveblocks

Liveblocks is the most complete managed collaboration platform. It provides presence (live cursors, who's online), storage (CRDT-backed shared data), and comments/notifications out of the box. At the time of writing, pricing starts at $0 for small projects and scales to $99+/month for production use. Liveblocks integrates with Yjs, so you get the best of both worlds: the Yjs CRDT engine with Liveblocks' managed infrastructure. If you want to ship collaboration features in weeks instead of months, Liveblocks is the fastest path.

PartyKit

PartyKit runs on Cloudflare's edge network, giving you globally distributed WebSocket servers with minimal latency. It is lower-level than Liveblocks. You write the server logic (including CRDT sync), and PartyKit handles deployment, scaling, and edge distribution. Think of it as a collaboration-aware serverless platform. PartyKit is ideal if you need custom sync logic or want more control over the server-side behavior.

Tiptap Collab

If your collaboration feature is specifically rich text editing (documents, comments, content creation), Tiptap Collab provides a managed Yjs backend paired with the Tiptap editor. It handles document storage, user awareness, and version history for text documents. Pricing is per-connection, starting at around $0.01 per connection-hour.

The build-vs-buy decision here is straightforward. If collaboration is your core product (you are building a Figma competitor), build your own infrastructure for maximum control. If collaboration is a feature within a larger product (adding real-time editing to a project management tool), use a managed service and focus your engineering on your core value proposition. For more on building real-time features, check our dedicated guide.

WebSocket Infrastructure at Scale

Whether you use a managed service or build your own, understanding WebSocket infrastructure is critical for a collaboration tool.

Connection Management

Each active user maintains a persistent WebSocket connection. A document with 20 concurrent editors has 20 connections. If you have 1,000 active documents at peak, each averaging 20 editors, that is 20,000 concurrent WebSocket connections. Nginx or Caddy as a reverse proxy handles WebSocket upgrades natively. Budget your server capacity based on peak concurrent connections, not total users.

Scaling with Redis Pub/Sub

A single server process can handle roughly 10,000 to 50,000 WebSocket connections depending on message frequency. Beyond that, you need horizontal scaling. The standard pattern uses Redis Pub/Sub as a message broker between server instances. When a user edits on Server A, the update is published to Redis, and every other instance subscribed to that document's channel (Server B, for example) broadcasts it to its own connected clients. Pub/Sub only relays messages, so you still have to decide where the authoritative document state lives: on one server via sticky sessions (all clients editing the same document connect to the same server) or in a shared document state store.
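The fan-out pattern can be sketched without any infrastructure. In the example below, `ToyBroker` is an in-memory stand-in for Redis Pub/Sub so the code runs anywhere, sockets are modeled as plain callbacks, and the `doc:<id>` channel naming is an assumption rather than a convention from any library.

```typescript
type Handler = (message: string) => void;

// In-memory stand-in for Redis PUBLISH / SUBSCRIBE.
class ToyBroker {
  private channels = new Map<string, Set<Handler>>();

  subscribe(channel: string, handler: Handler): void {
    const handlers = this.channels.get(channel) ?? new Set<Handler>();
    handlers.add(handler);
    this.channels.set(channel, handlers);
  }

  publish(channel: string, message: string): void {
    this.channels.get(channel)?.forEach((handler) => handler(message));
  }
}

class RelayServer {
  // docId -> sockets (modeled as callbacks) connected to this instance
  private sockets = new Map<string, Handler[]>();
  private readonly id: string;
  private readonly broker: ToyBroker;

  constructor(id: string, broker: ToyBroker) {
    this.id = id;
    this.broker = broker;
  }

  connect(docId: string, send: Handler): void {
    if (!this.sockets.has(docId)) {
      this.sockets.set(docId, []);
      // First local client for this document: listen for other instances' updates.
      this.broker.subscribe(`doc:${docId}`, (raw) => {
        const { from, update } = JSON.parse(raw) as { from: string; update: string };
        if (from !== this.id) this.broadcastLocal(docId, update);
      });
    }
    this.sockets.get(docId)!.push(send);
  }

  // Called when a locally connected client sends an edit.
  handleEdit(docId: string, update: string, sender?: Handler): void {
    this.broadcastLocal(docId, update, sender);
    this.broker.publish(`doc:${docId}`, JSON.stringify({ from: this.id, update }));
  }

  private broadcastLocal(docId: string, update: string, except?: Handler): void {
    for (const send of this.sockets.get(docId) ?? []) {
      if (send !== except) send(update);
    }
  }
}

const broker = new ToyBroker();
const serverA = new RelayServer("A", broker);
const serverB = new RelayServer("B", broker);
const inboxA: string[] = [];
const inboxB: string[] = [];
const socketA: Handler = (m) => { inboxA.push(m); };
const socketB: Handler = (m) => { inboxB.push(m); };
serverA.connect("doc-1", socketA);
serverB.connect("doc-1", socketB);
serverA.handleEdit("doc-1", "update-1", socketA);
console.log(inboxA, inboxB); // [] [ 'update-1' ]
```

Tagging each message with the originating instance keeps Server A from re-broadcasting its own update when the broker echoes it back.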

Sticky Sessions vs Shared State

Sticky sessions are simpler: use a load balancer to route all connections for a given document to the same server. This keeps the document state in memory on one server. The downside is failover: if that server goes down, all users on that document disconnect and must reconnect to a new server that needs to rebuild the document state from persistence.

Shared state (storing CRDT state in Redis or a database and syncing on every operation) provides better fault tolerance but adds latency. For most collaboration tools, sticky sessions with graceful failover are the right balance of simplicity and reliability.
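Sticky routing with graceful failover needs a rule that maps a document to a server and moves as few documents as possible when a server drops out. A minimal sketch, assuming rendezvous (highest-random-weight) hashing and an FNV-1a hash; the function names and server ids are hypothetical, and a real deployment would usually configure this in the load balancer instead.

```typescript
// 32-bit FNV-1a string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Rendezvous hashing: score every healthy server for this document and
// pick the highest score. Removing a server only moves the documents
// that were assigned to it; everything else stays where it was.
function pickServer(docId: string, healthyServers: string[]): string {
  let best: string | undefined;
  let bestScore = -1;
  for (const server of healthyServers) {
    const score = fnv1a(`${server}|${docId}`);
    if (score > bestScore) {
      best = server;
      bestScore = score;
    }
  }
  if (best === undefined) throw new Error("no healthy servers");
  return best;
}

const fleet = ["ws-1", "ws-2", "ws-3"];
const home = pickServer("doc-42", fleet);
console.log(`doc-42 is pinned to ${home}`);
```

When the chosen server fails, calling `pickServer` again with the remaining servers gives every client of that document the same new home, where the state is rebuilt from persistence.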

Infrastructure Costs

WebSocket servers are memory-bound, not CPU-bound. Each connection consumes roughly 10-50 KB of memory for the socket state plus the in-memory document state. For a collaboration tool with 5,000 concurrent users across 500 documents, budget $200 to $500/month in compute costs. Add $50 to $150/month for Redis. These costs scale linearly with concurrent users.
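A quick back-of-the-envelope calculation shows why memory, not CPU, is the number to watch. The per-connection range comes from the figures above; the 512 KB average in-memory document size is purely an assumption for illustration, so substitute your own measurements.

```typescript
interface LoadProfile {
  concurrentUsers: number;
  activeDocs: number;
  perConnectionKB: number; // socket state per connection
  perDocKB: number; // in-memory document state (assumed value)
}

function estimateMemoryMB(p: LoadProfile): number {
  const socketKB = p.concurrentUsers * p.perConnectionKB;
  const docKB = p.activeDocs * p.perDocKB;
  return (socketKB + docKB) / 1024;
}

const lowEstimate = estimateMemoryMB({
  concurrentUsers: 5000,
  activeDocs: 500,
  perConnectionKB: 10,
  perDocKB: 512,
});
const highEstimate = estimateMemoryMB({
  concurrentUsers: 5000,
  activeDocs: 500,
  perConnectionKB: 50,
  perDocKB: 512,
});
console.log(Math.round(lowEstimate), Math.round(highEstimate)); // 299 494
```

Under these assumptions the whole workload fits in roughly 300 to 500 MB before runtime overhead, which is why the remaining cost is mostly redundancy and headroom rather than raw capacity.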

Cursor Presence and Awareness Features

Live cursors, selection highlights, and user avatars are what make collaboration feel real-time. Without them, users have no sense that others are working alongside them.

Cursor broadcasting: Each client sends its cursor position (or selection range) to the server at a throttled rate, typically 15 to 30 times per second. The server broadcasts these positions to all other clients in the same document. This is lightweight data (just x/y coordinates or text offsets), but the frequency means it generates the majority of your WebSocket traffic.

Throttling and interpolation: Sending cursor positions at 60fps wastes bandwidth. Throttle to 15-20fps and use client-side interpolation to smooth the movement between received positions. This gives the illusion of real-time movement while cutting cursor traffic by roughly 67-75% compared with 60fps.

User avatars and colors: Assign each user a consistent color (from a predefined palette of 8-12 distinct colors) and show their avatar next to their cursor. Liveblocks and Yjs both provide awareness protocols that handle user metadata broadcasting out of the box.

Idle and away states: If a user has not moved their cursor for 60 seconds, fade their cursor to indicate they are idle. After 5 minutes, remove it entirely and update the "who's here" indicator. This prevents ghost cursors from cluttering the workspace.

Viewport awareness: In canvas-based tools (like Figma or Miro), show minimap indicators of where each user is viewing. This helps users navigate to each other and reduces the chance of duplicate work. In document tools, show a sidebar indicator of which section each user is currently editing.
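The throttling and interpolation steps above can be sketched in a few lines. This is a simplified version under stated assumptions: `createCursorThrottle` and `lerpCursor` are hypothetical helper names, the clock is injectable so the example is deterministic, and positions that arrive too soon are simply dropped (a production version would also flush the final resting position).

```typescript
interface CursorPoint {
  x: number;
  y: number;
}

// Returns a function that forwards at most `maxPerSecond` positions per second.
function createCursorThrottle(
  send: (p: CursorPoint) => void,
  maxPerSecond: number,
  now: () => number = Date.now
): (p: CursorPoint) => boolean {
  const minGapMs = 1000 / maxPerSecond;
  let lastSentAt = -Infinity;
  return (p: CursorPoint): boolean => {
    const t = now();
    if (t - lastSentAt < minGapMs) return false; // too soon: drop this position
    lastSentAt = t;
    send(p);
    return true;
  };
}

// Receiver side: linear interpolation between the last two received positions.
// `t` is the fraction of the way from `from` to `to`, clamped to [0, 1].
function lerpCursor(from: CursorPoint, to: CursorPoint, t: number): CursorPoint {
  const k = Math.min(1, Math.max(0, t));
  return { x: from.x + (to.x - from.x) * k, y: from.y + (to.y - from.y) * k };
}

// Simulate one second of mouse movement sampled every 10 ms, throttled to 20/s.
let fakeNow = 0;
const sentPositions: CursorPoint[] = [];
const sendThrottled = createCursorThrottle((p) => { sentPositions.push(p); }, 20, () => fakeNow);
for (fakeNow = 0; fakeNow < 1000; fakeNow += 10) {
  sendThrottled({ x: fakeNow, y: 0 });
}
console.log(sentPositions.length); // 20 of the 100 sampled positions are sent
console.log(lerpCursor({ x: 0, y: 0 }, { x: 10, y: 20 }, 0.5)); // { x: 5, y: 10 }
```

On the receiving client, `lerpCursor` would be called on every animation frame with `t` advancing from 0 to 1 over one throttle interval.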

These features are relatively simple to implement (1-2 weeks of engineering) but have an outsized impact on the collaboration experience. Users consistently rate live cursors as the feature that makes collaboration tools feel "alive."
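Two of the smaller pieces, consistent colors and idle states, come down to pure functions that every client can evaluate identically. A minimal sketch, assuming an eight-color palette and the 60-second and 5-minute thresholds mentioned above; `colorForUser` and `presenceState` are hypothetical names, and the hex values are arbitrary.

```typescript
const CURSOR_PALETTE = [
  "#e6194b", "#3cb44b", "#4363d8", "#f58231",
  "#911eb4", "#42d4f4", "#f032e6", "#9a6324",
];

// Hash the user id so the same user gets the same color on every client,
// without any coordination through the server.
function colorForUser(userId: string): string {
  let hash = 0;
  for (let i = 0; i < userId.length; i++) {
    hash = (hash * 31 + userId.charCodeAt(i)) >>> 0;
  }
  return CURSOR_PALETTE[hash % CURSOR_PALETTE.length];
}

type PresenceState = "active" | "idle" | "away";

// idle: fade the cursor; away: remove it and update the "who's here" list.
function presenceState(lastMovedAtMs: number, nowMs: number): PresenceState {
  const quietMs = nowMs - lastMovedAtMs;
  if (quietMs >= 5 * 60 * 1000) return "away";
  if (quietMs >= 60 * 1000) return "idle";
  return "active";
}

console.log(colorForUser("user-123") === colorForUser("user-123")); // true
console.log(presenceState(0, 90 * 1000)); // idle
```

With more simultaneous users than palette entries, two users will share a color, so pairing the color with an avatar or name label still matters.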

Version History and Undo/Redo at Scale

Version history in a collaborative environment is fundamentally different from single-user undo/redo. When five people are editing simultaneously, pressing Ctrl+Z should undo your changes, not someone else's.

Per-User Undo Stacks

Yjs supports per-user undo out of the box through its UndoManager. It tracks changes by transaction origin, so it can record which operations came from the local user and reverse only those when undo is triggered, preserving all other users' changes. This is one of the major advantages of using a CRDT library rather than building your own conflict resolution.
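To show what "undo mine, keep yours" means in practice, here is a toy model of per-user undo stacks over a shared key/value document. It is deliberately not the Yjs UndoManager or its algorithm: `SharedBoard` and `EditRecord` are hypothetical names, and the rule of skipping an edit that someone else has since overwritten is one simple policy among several.

```typescript
interface EditRecord {
  key: string;
  before: string | undefined; // value prior to this edit
  after: string; // value this edit wrote
}

class SharedBoard {
  private values = new Map<string, string>();
  private undoStacks = new Map<string, EditRecord[]>(); // one stack per user

  set(userId: string, key: string, value: string): void {
    const record: EditRecord = { key, before: this.values.get(key), after: value };
    this.values.set(key, value);
    const stack = this.undoStacks.get(userId) ?? [];
    stack.push(record);
    this.undoStacks.set(userId, stack);
  }

  get(key: string): string | undefined {
    return this.values.get(key);
  }

  // Reverts the caller's most recent edit that is still in effect.
  // Edits that another user has overwritten since are skipped, not clobbered.
  undo(userId: string): boolean {
    const stack = this.undoStacks.get(userId) ?? [];
    while (stack.length > 0) {
      const record = stack.pop()!;
      if (this.values.get(record.key) !== record.after) continue;
      if (record.before === undefined) this.values.delete(record.key);
      else this.values.set(record.key, record.before);
      return true;
    }
    return false;
  }
}

const board = new SharedBoard();
board.set("alice", "title", "Draft");
board.set("bob", "color", "red");
board.set("alice", "title", "Final");
board.undo("alice"); // reverts Alice's last edit only
console.log(board.get("title"), board.get("color")); // Draft red
```

A CRDT library has to solve the same problem for interleaved text edits, which is considerably harder and a good reason not to hand-roll it.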

Snapshot-Based Version History

For a "version history" feature (like Google Docs' version history), you need periodic snapshots of the document state. The naive approach is storing the full document state at regular intervals (every 5 minutes, or on every save). For small documents this works fine. For large documents or canvases, full snapshots become expensive in storage.

A more efficient approach: store the initial state plus a log of all CRDT operations. To reconstruct any point in time, replay operations up to that timestamp. Yjs supports this through its update encoding. Store encoded updates in an append-only log (PostgreSQL, S3, or a dedicated event store) and replay them to reconstruct any historical state. This is storage-efficient but compute-intensive for reconstruction.

The Hybrid Approach

Combine both: store full snapshots at regular intervals (every hour or every 100 operations) and operation logs between snapshots. To reconstruct a specific point in time, load the nearest snapshot and replay operations from there. This balances storage cost against reconstruction speed. Budget 2-3 weeks of engineering for a robust version history system. If you are building something that needs to scale to many users, this architecture becomes essential.
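The hybrid reconstruction logic is easier to see in code. The sketch below is a simplified model, not Yjs's storage format: operations are plain key/value writes standing in for encoded CRDT updates, everything lives in memory, and `VersionStore` with its `snapshotEvery` parameter is a hypothetical name.

```typescript
interface LoggedOp {
  seq: number;
  key: string;
  value: string;
}

interface DocSnapshot {
  seq: number; // last operation included in this snapshot
  state: Map<string, string>;
}

class VersionStore {
  private ops: LoggedOp[] = []; // append-only log
  private snapshots: DocSnapshot[] = []; // ascending by seq
  private current = new Map<string, string>();
  private readonly snapshotEvery: number;

  constructor(snapshotEvery = 100) {
    this.snapshotEvery = snapshotEvery;
  }

  append(key: string, value: string): number {
    const seq = this.ops.length + 1;
    this.ops.push({ seq, key, value });
    this.current.set(key, value);
    if (seq % this.snapshotEvery === 0) {
      this.snapshots.push({ seq, state: new Map(this.current) });
    }
    return seq;
  }

  // Rebuild the document as of operation `seq`: load the nearest snapshot
  // at or before `seq`, then replay only the operations after it.
  stateAt(seq: number): { state: Map<string, string>; replayed: number } {
    let base: DocSnapshot | undefined;
    for (const snap of this.snapshots) {
      if (snap.seq <= seq) base = snap;
    }
    const state = new Map<string, string>(base?.state);
    const tail = this.ops.slice(base ? base.seq : 0, seq);
    for (const op of tail) state.set(op.key, op.value);
    return { state, replayed: tail.length };
  }
}

const versions = new VersionStore(100);
for (let i = 1; i <= 250; i++) versions.append(`k${i % 3}`, `v${i}`);
const at205 = versions.stateAt(205);
console.log(at205.replayed, at205.state.get("k1")); // 5 v205
```

Without the snapshot at operation 200, the same query would replay all 205 operations; the snapshot interval is the knob that trades storage for reconstruction time.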

Canvas vs Document Collaboration: Architectural Differences

The CRDT model and rendering approach differ significantly depending on whether you are building a document editor or a canvas tool.

Document Collaboration (Notion, Google Docs Style)

Documents are structured as trees of blocks (paragraphs, headings, lists, embeds). Yjs's Y.XmlFragment or Y.Array maps naturally to this structure. Rich text editing uses ProseMirror or Tiptap with y-prosemirror for CRDT binding. The rendering is handled by the browser's DOM engine, so performance is rarely a bottleneck. The main challenge is handling block-level operations: moving blocks between sections, nesting, and cross-block selections.

Canvas Collaboration (Figma, Miro Style)

Canvas tools render to HTML Canvas or WebGL, bypassing the DOM entirely. This gives you 60fps rendering for thousands of objects but means you own the entire rendering pipeline. Each shape, line, and text element is an object in your CRDT data structure, typically stored in a Y.Map. The rendering engine reads the CRDT state and draws every frame.

Figma compiles its rendering engine to WebAssembly for near-native performance. If you are building a canvas tool, plan to invest heavily in the rendering layer. Libraries like Konva.js or PixiJS provide a starting point, but production canvas tools inevitably need custom rendering for performance.

Canvas tools also need spatial indexing (quadtrees or R-trees) to efficiently determine which objects are visible in the current viewport and which object the user is clicking on. This is standard game engine architecture applied to productivity software.
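As a rough illustration of spatial indexing, here is a compact quadtree over axis-aligned bounding boxes. It is a sketch under simple assumptions: all objects fall inside the root bounds, a node splits once it holds four items, and depth is capped at eight. `QuadNode`, `Box`, and `CanvasItem` are hypothetical names, and a production tool would also need removal and updates when objects move.

```typescript
interface Box { x: number; y: number; w: number; h: number; }
interface CanvasItem { id: string; box: Box; }

function intersects(a: Box, b: Box): boolean {
  return a.x < b.x + b.w && b.x < a.x + a.w && a.y < b.y + b.h && b.y < a.y + a.h;
}

function contains(outer: Box, inner: Box): boolean {
  return inner.x >= outer.x && inner.y >= outer.y &&
    inner.x + inner.w <= outer.x + outer.w && inner.y + inner.h <= outer.y + outer.h;
}

class QuadNode {
  private items: CanvasItem[] = [];
  private children: QuadNode[] = [];
  private readonly bounds: Box;
  private readonly depth: number;

  constructor(bounds: Box, depth = 0) {
    this.bounds = bounds;
    this.depth = depth;
  }

  insert(item: CanvasItem): void {
    if (this.children.length === 0 && this.items.length >= 4 && this.depth < 8) {
      this.split();
    }
    // Push the item down only if a single child fully contains it.
    for (const child of this.children) {
      if (contains(child.bounds, item.box)) {
        child.insert(item);
        return;
      }
    }
    this.items.push(item); // leaf, or item straddles a child boundary
  }

  private split(): void {
    const { x, y, w, h } = this.bounds;
    const hw = w / 2;
    const hh = h / 2;
    const d = this.depth + 1;
    this.children = [
      new QuadNode({ x, y, w: hw, h: hh }, d),
      new QuadNode({ x: x + hw, y, w: hw, h: hh }, d),
      new QuadNode({ x, y: y + hh, w: hw, h: hh }, d),
      new QuadNode({ x: x + hw, y: y + hh, w: hw, h: hh }, d),
    ];
    const existing = this.items;
    this.items = [];
    for (const item of existing) this.insert(item); // redistribute
  }

  // Collect every item whose box overlaps the viewport, skipping whole
  // subtrees that lie outside it.
  query(viewport: Box, out: CanvasItem[] = []): CanvasItem[] {
    if (!intersects(this.bounds, viewport)) return out;
    for (const item of this.items) {
      if (intersects(item.box, viewport)) out.push(item);
    }
    for (const child of this.children) child.query(viewport, out);
    return out;
  }
}

// 100 small shapes on a 1000 x 1000 canvas, one every 100 px.
const tree = new QuadNode({ x: 0, y: 0, w: 1000, h: 1000 });
for (let i = 0; i < 10; i++) {
  for (let j = 0; j < 10; j++) {
    tree.insert({ id: `${i}-${j}`, box: { x: i * 100 + 10, y: j * 100 + 10, w: 20, h: 20 } });
  }
}
const visible = tree.query({ x: 0, y: 0, w: 200, h: 200 }).map((item) => item.id).sort();
console.log(visible); // [ '0-0', '0-1', '1-0', '1-1' ]
```

Hit testing for clicks is the same query with a tiny box around the pointer, followed by an exact shape test on the few candidates returned.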

Hybrid Approaches

Some tools (like Notion or Coda) combine both: document-style block editing with embedded interactive widgets, tables, and mini-canvases. This hybrid approach uses DOM rendering for text content and Canvas for interactive embeds. It is architecturally complex but creates the most versatile collaboration experience.

We have built real-time collaboration features for SaaS products across document editing, canvas tools, and hybrid applications. Book a free strategy call to discuss your collaboration architecture and get a timeline estimate for your specific use case.
