Why Whiteboard Apps Are Uniquely Hard to Build
A whiteboard app looks deceptively simple. Users draw shapes, type text, drag objects around a canvas. Under the hood, you are solving at least four hard problems simultaneously: real-time conflict resolution across distributed clients, performant canvas rendering at 60fps with thousands of objects, spatial indexing for hit detection and viewport culling, and a networking layer that keeps latency low enough that drawing feels instant.
Most teams underestimate the scope. A basic prototype with shapes and freehand drawing takes 2 to 4 weeks. A production whiteboard with collaboration, persistence, version history, and export takes 4 to 8 months of dedicated engineering. Miro has over 400 engineers. Figma's rendering engine alone took years to build. You are not going to replicate that with a small team, but you do not need to. The key is understanding which pieces to build yourself and which to buy.
The good news: the ecosystem in 2026 is mature enough that you can assemble a production-quality whiteboard from open-source libraries and managed services. tldraw provides a complete whiteboard UI and engine. Yjs and Automerge handle CRDT-based sync. Liveblocks and PartyKit manage WebSocket infrastructure. The remaining work is integration, customization, and the domain-specific features that differentiate your product. This guide walks you through each layer of the stack, with real costs, timelines, and tradeoffs.
Choosing Your Canvas Rendering Engine
The rendering engine is the foundation of your whiteboard. Every element on the board, from sticky notes to freehand strokes to embedded images, needs to be drawn, transformed, and interacted with at 60 frames per second. You have three main options in 2026, and the right choice depends on how custom your whiteboard needs to be.
tldraw: The Fastest Path to a Working Whiteboard
tldraw is a whiteboard engine, with source available on GitHub, built by Steve Ruiz. It provides a complete whiteboard experience out of the box: freehand drawing, shapes, arrows, text, sticky notes, image embeds, selection, grouping, and a polished UI. The rendering is built on top of HTML and SVG (not raw Canvas), which makes customization easier for web developers. tldraw's API lets you define custom shapes, override rendering behavior, and hook into user interactions. If your product is "a whiteboard plus domain-specific features" (think whiteboard for product design reviews, architecture diagramming, or classroom teaching), tldraw saves you 3 to 6 months of foundational work. Check the license before you commit: recent versions of the tldraw SDK ship under tldraw's own license rather than MIT, and production use carries conditions. tldraw also offers its own sync package for multiplayer, or you can wire it up to your own Yjs backend.
Konva.js and Fabric.js: Canvas Libraries for Custom UIs
If you need a canvas-based app that is not strictly a whiteboard (maybe it is a floor planner, a circuit designer, or a mind-mapping tool), Konva.js and Fabric.js give you lower-level building blocks. Konva uses HTML Canvas with a scene graph abstraction: you create nodes (rectangles, circles, text, images), add them to layers, and Konva handles rendering, hit detection, and transforms. Fabric.js takes a similar approach with more built-in support for object serialization and SVG import/export. Both libraries support React wrappers (react-konva, fabricjs-react), making them straightforward to integrate into a modern frontend. Expect 6 to 10 weeks to build basic whiteboard functionality on top of these libraries, compared to tldraw's out-of-the-box solution.
Raw Canvas / WebGL: Maximum Performance, Maximum Effort
Figma's renderer is written in C++, compiled to WebAssembly, and draws through WebGL. This keeps frame times low even with tens of thousands of objects. If your whiteboard will routinely have 5,000+ objects on screen or requires advanced rendering features like GPU-accelerated blur, shadows, or 3D transforms, raw Canvas or WebGL is the path. But be honest about whether you need this. Most whiteboards top out at a few hundred objects per board. Konva or tldraw handles that easily. Going raw Canvas or WebGL adds 3 to 6 months of rendering engine development before you even start on collaboration features.
Our recommendation for most teams: start with tldraw. If your use case outgrows it, you will know exactly where the bottlenecks are and can migrate specific rendering paths to Canvas or WebGL incrementally.
CRDT Architecture for Spatial Data
Text-based CRDTs are well-understood. Yjs's Y.Text and Y.XmlFragment handle document collaboration beautifully. Whiteboard data is different. You are syncing a collection of objects with positions, dimensions, z-order, styling properties, and parent-child relationships. The CRDT data model you choose determines how conflicts resolve, how undo works, and how much bandwidth your sync layer consumes.
Modeling Whiteboard Objects in Yjs
The standard approach uses a Y.Map at the root level, where each key is an object ID and each value is a Y.Map containing that object's properties. When a user drags a rectangle from (100, 100) to (200, 150), the client updates the x and y properties on that object's Y.Map. Yjs handles the merge if two users move the same object simultaneously: conflicts resolve on a per-property basis, and when two clients write the same key concurrently, Yjs deterministically picks one write so that every client converges on the same value. The winner is not necessarily the write that happened last by wall-clock time. This means if User A changes the x coordinate and User B changes the fill color at the same time, both changes are preserved. If both change x, one wins. In practice, this is acceptable because simultaneous edits to the exact same property of the exact same object are rare in whiteboard apps.
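To make the per-property behavior concrete, here is a minimal, self-contained sketch. It is a simplified model of the merge semantics, not Yjs's implementation: the names PropWrite, pickWinner, and mergeObject are hypothetical, and the "higher clock, then higher client ID" rule is an assumption chosen for illustration (Yjs uses its own internal ordering to pick a winner).

```typescript
// Simplified model of per-property conflict resolution (NOT Yjs internals).
// Each property stores its value plus the metadata used to pick a winner.
type PropWrite = { value: unknown; clock: number; clientId: number };
type ObjectState = Record<string, PropWrite>;

// Higher logical clock wins; ties are broken by clientId so that every
// replica picks the same write regardless of arrival order.
function pickWinner(a: PropWrite, b: PropWrite): PropWrite {
  if (a.clock !== b.clock) return a.clock > b.clock ? a : b;
  return a.clientId > b.clientId ? a : b;
}

// Merge two replicas of the same whiteboard object, property by property.
function mergeObject(local: ObjectState, remote: ObjectState): ObjectState {
  const merged: ObjectState = { ...local };
  for (const [key, write] of Object.entries(remote)) {
    const existing = merged[key];
    merged[key] = existing ? pickWinner(existing, write) : write;
  }
  return merged;
}

// Both users start from the same rectangle.
const base: ObjectState = {
  x: { value: 100, clock: 1, clientId: 1 },
  y: { value: 100, clock: 1, clientId: 1 },
  fill: { value: "red", clock: 1, clientId: 1 },
};

// User A (client 1) drags the rectangle; User B (client 2) recolors it
// and, at the same moment, also changes x.
const userA: ObjectState = {
  ...base,
  x: { value: 200, clock: 2, clientId: 1 },
  y: { value: 150, clock: 2, clientId: 1 },
};
const userB: ObjectState = {
  ...base,
  x: { value: 250, clock: 2, clientId: 2 },
  fill: { value: "blue", clock: 2, clientId: 2 },
};

// Each side merges the other's state; both end up identical.
const onA = mergeObject(userA, userB);
const onB = mergeObject(userB, userA);
console.log(onA["x"]?.value, onA["y"]?.value, onA["fill"]?.value);
console.log(onB["x"]?.value, onB["y"]?.value, onB["fill"]?.value);
```

In this model, A's y change and B's fill change both survive because they touch different properties, while the concurrent writes to x collapse to a single value that is the same on both replicas.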
Freehand Strokes: The Bandwidth Challenge
Freehand drawing generates hundreds of points per second. Each point has x, y, and pressure values. Syncing every point as it is drawn creates a flood of CRDT operations. The solution is batching: buffer points locally, flush the buffer every 50 to 100ms, and sync each flush as a single CRDT update. On the receiving end, render the incoming batch with curve interpolation to smooth out the chunked updates. Yjs's Y.Array works for stroke point storage, but be careful with very long strokes: a 2,000-point stroke stored as a Y.Array adds 2,000 elements that every client must store and load. Consider simplifying completed strokes using the Ramer-Douglas-Peucker algorithm, which can reduce point count by 60 to 80% without visible quality loss.
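The batching and simplification steps can be sketched as follows. This is a minimal illustration under stated assumptions: StrokeBatcher, simplifyStroke, and the send callback are hypothetical names, timestamps are passed in explicitly so the code runs without timers, and the simplification considers only x and y (pressure is carried along but ignored when measuring distance).

```typescript
type Point = { x: number; y: number; pressure: number };

// Buffers points and hands them to `send` at most once per `intervalMs`.
class StrokeBatcher {
  private buffer: Point[] = [];
  private lastFlush = 0;
  private readonly intervalMs: number;
  private readonly send: (batch: Point[]) => void;

  constructor(intervalMs: number, send: (batch: Point[]) => void) {
    this.intervalMs = intervalMs;
    this.send = send;
  }

  add(point: Point, now: number): void {
    this.buffer.push(point);
    if (now - this.lastFlush >= this.intervalMs) this.flush(now);
  }

  // Call once more on pointer release to send whatever is left.
  flush(now: number): void {
    if (this.buffer.length > 0) {
      this.send(this.buffer);
      this.buffer = [];
    }
    this.lastFlush = now;
  }
}

// Perpendicular distance from p to the line through a and b.
function distanceToLine(p: Point, a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const length = Math.hypot(dx, dy);
  if (length === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * p.x - dx * p.y + b.x * a.y - b.y * a.x) / length;
}

// Ramer-Douglas-Peucker: keep the endpoints, find the point farthest from
// the line between them, and recurse on both halves if it exceeds epsilon.
function simplifyStroke(points: Point[], epsilon: number): Point[] {
  if (points.length < 3) return points.slice();
  const first = points[0]!;
  const last = points[points.length - 1]!;
  let maxDistance = 0;
  let splitIndex = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = distanceToLine(points[i]!, first, last);
    if (d > maxDistance) {
      maxDistance = d;
      splitIndex = i;
    }
  }
  if (maxDistance <= epsilon) return [first, last];
  const left = simplifyStroke(points.slice(0, splitIndex + 1), epsilon);
  const right = simplifyStroke(points.slice(splitIndex), epsilon);
  // The split point ends `left` and starts `right`; keep it only once.
  return left.slice(0, -1).concat(right);
}
```

A reasonable pattern is to stream batches while the stroke is in progress and replace the stored points with the simplified version once the pointer is released. The epsilon value is in canvas units and is worth tuning against real strokes.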
Z-Order and Layer Management
Z-order (which objects render on top of which) is notoriously tricky in a collaborative setting. If User A moves object X to the front and User B moves object Y to the front at the same time, the final z-order needs to be deterministic. The simplest approach uses fractional indexing: instead of integer z-indices (1, 2, 3), use strings that sort lexicographically ("a0", "a1", "a2"). To insert between two objects, generate a string that sorts between their indices. Libraries like fractional-indexing on npm handle the string generation. This approach avoids renumbering existing objects and merges cleanly across concurrent edits. One caveat: two clients inserting into the same gap at the same time can generate identical keys, so break ties with a stable secondary sort key such as the object ID.
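Here is a minimal sketch of the idea, written from scratch rather than taken from the fractional-indexing package. The function keyBetween, the base-36 alphabet, and the Layered type are assumptions for illustration; a production library also handles key length growth and jitter more carefully.

```typescript
// Digits in ascending character-code order, so string comparison matches digit order.
const DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz";

// Returns a key that sorts strictly between a and b.
// a === "" means "before everything"; b === null means "after everything".
// Keys are treated like fractional digits, so they must not end in "0".
function keyBetween(a: string, b: string | null): string {
  if (b !== null && a >= b) throw new Error(`"${a}" must sort before "${b}"`);
  if (a.endsWith("0") || (b !== null && b.endsWith("0"))) {
    throw new Error("keys must not end in 0");
  }
  if (b !== null) {
    // Skip the shared prefix, treating a as if it were padded with zeros.
    let n = 0;
    while ((a.charAt(n) || "0") === b.charAt(n)) n++;
    if (n > 0) return b.slice(0, n) + keyBetween(a.slice(n), b.slice(n));
  }
  const digitA = a ? DIGITS.indexOf(a.charAt(0)) : 0;
  const digitB = b !== null ? DIGITS.indexOf(b.charAt(0)) : DIGITS.length;
  if (digitB - digitA > 1) {
    // There is room for a single digit between the two first digits.
    return DIGITS.charAt(Math.round(0.5 * (digitA + digitB)));
  }
  // First digits are adjacent: either b's first digit alone fits, or we
  // keep a's first digit and find a key after the rest of a.
  if (b !== null && b.length > 1) return b.slice(0, 1);
  return DIGITS.charAt(digitA) + keyBetween(a.slice(1), null);
}

type Layered = { id: string; z: string };

// Sort by fractional key, then by object ID so identical keys
// (from concurrent inserts into the same gap) still order deterministically.
function compareLayers(p: Layered, q: Layered): number {
  if (p.z !== q.z) return p.z < q.z ? -1 : 1;
  return p.id < q.id ? -1 : p.id > q.id ? 1 : 0;
}

const back = keyBetween("", null);
const front = keyBetween(back, null);
// Two users insert between the same neighbors at the same time.
const fromUserA = keyBetween(back, front);
const fromUserB = keyBetween(back, front);

const layers: Layered[] = [
  { id: "front-shape", z: front },
  { id: "note-b", z: fromUserB },
  { id: "back-shape", z: back },
  { id: "note-a", z: fromUserA },
];
layers.sort(compareLayers);
console.log(layers.map((l) => `${l.id}:${l.z}`).join(" "));
```

Because the two concurrent inserts see the same neighbors, they produce the same key, and the ID tie-break is what keeps the final order identical on every client.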
For a deeper comparison of CRDT libraries and how they handle these patterns, see our breakdown of Yjs vs Automerge vs Liveblocks.
Multiplayer Presence: Cursors, Selections, and Viewport Awareness
Presence features are what transform a shared canvas into a collaborative experience. Without live cursors, users feel like they are editing alone. Without viewport awareness, two people accidentally work on the same area without knowing it. These features are relatively cheap to build (1 to 2 weeks) but have a massive impact on user experience.
Live Cursor Broadcasting
Each client sends its cursor position in canvas coordinates (not screen coordinates) at 15 to 20 frames per second. The server relays these positions to all other clients viewing the same board. On the receiving end, render each remote cursor with the user's assigned color and name label. Apply client-side interpolation between received positions to smooth the movement. This creates the illusion of fluid cursor tracking even at 15fps update rates. The data payload is tiny: a JSON object with userId, x, y, and a timestamp. At 20fps with 10 concurrent users, the server receives roughly 200 messages per second and, because each message is relayed to the other 9 clients, sends roughly 1,800 per second. Both figures are well within the capacity of a single WebSocket server.
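The sending and receiving sides can be sketched as below. This is an illustration under assumptions, not a prescribed API: the Camera convention (x and y are the canvas coordinates shown at the screen's top-left corner), makeCursorSender, and interpolateCursor are hypothetical names, and the message shape mirrors the payload described above.

```typescript
type Camera = { x: number; y: number; zoom: number };
type CursorMessage = { userId: string; x: number; y: number; timestamp: number };

// Convert a pointer position in screen pixels to canvas coordinates so
// that every client, whatever its pan and zoom, agrees on the location.
function screenToCanvas(screenX: number, screenY: number, camera: Camera): { x: number; y: number } {
  return { x: camera.x + screenX / camera.zoom, y: camera.y + screenY / camera.zoom };
}

// Wraps `send` so that at most `maxPerSecond` messages go out.
// Returns a function that reports whether the message was sent.
function makeCursorSender(maxPerSecond: number, send: (msg: CursorMessage) => void) {
  const minGapMs = 1000 / maxPerSecond;
  let lastSent = -Infinity;
  return (msg: CursorMessage): boolean => {
    if (msg.timestamp - lastSent < minGapMs) return false; // too soon, drop it
    lastSent = msg.timestamp;
    send(msg);
    return true;
  };
}

// Linear interpolation between the two most recent positions of a remote
// cursor. `renderTime` is typically "now minus a small delay" so that
// there is always a newer sample to move toward.
function interpolateCursor(prev: CursorMessage, next: CursorMessage, renderTime: number): { x: number; y: number } {
  const span = next.timestamp - prev.timestamp;
  const t = span <= 0 ? 1 : Math.min(1, Math.max(0, (renderTime - prev.timestamp) / span));
  return { x: prev.x + (next.x - prev.x) * t, y: prev.y + (next.y - prev.y) * t };
}
```

One design note: a plain throttle like this can drop the final position when the pointer stops, so it is worth sending one last message when movement ends.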
Selection and Locking
When a user selects an object, broadcast the selection to all clients so they can display a colored border indicating "User X is editing this." For a basic implementation, treat this as advisory locking: show the selection indicator but allow other users to edit the same object. True locking (preventing edits while someone else has an object selected) feels restrictive in practice and frustrates users. Let the CRDT handle concurrent edits to the same object, and use the visual indicator to give users enough awareness to avoid conflicts naturally.
Minimap and Follow Mode
Whiteboards have infinite canvases, so users can be viewing completely different areas. A minimap showing colored dots for each user's viewport position helps teams stay oriented. "Follow mode" (clicking a user's avatar to lock your viewport to their position) is a killer feature for presentations and guided walkthroughs. Both features use the same data: each client broadcasts its viewport bounds (top, left, width, height in canvas coordinates) alongside cursor data. The overhead is negligible since viewport changes happen far less frequently than cursor movements.
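Follow mode reduces to a small calculation once viewport bounds are being broadcast. The sketch below assumes a hypothetical Camera convention (x and y are the canvas coordinates at the screen's top-left corner, zoom is screen pixels per canvas unit) and a hypothetical cameraToFollow helper. It fits the leader's whole viewport inside the follower's screen and centers it.

```typescript
// Viewport bounds in canvas coordinates, as broadcast by each client.
type ViewportBounds = { left: number; top: number; width: number; height: number };
type Camera = { x: number; y: number; zoom: number };

// Compute the follower's camera so the leader's entire viewport is
// visible and centered, even if the two screens have different shapes.
function cameraToFollow(leader: ViewportBounds, screenWidth: number, screenHeight: number): Camera {
  // Use the smaller scale so nothing the leader sees is cut off.
  const zoom = Math.min(screenWidth / leader.width, screenHeight / leader.height);
  const visibleWidth = screenWidth / zoom;
  const visibleHeight = screenHeight / zoom;
  const centerX = leader.left + leader.width / 2;
  const centerY = leader.top + leader.height / 2;
  return { x: centerX - visibleWidth / 2, y: centerY - visibleHeight / 2, zoom };
}

// Example: the leader looks at an 800x600 region; the follower has a
// square 400x400 screen, so extra canvas is shown above and below.
const followerCamera = cameraToFollow({ left: 100, top: 200, width: 800, height: 600 }, 400, 400);
console.log(followerCamera);
```

Animating toward the computed camera rather than snapping to it tends to feel better when the leader pans quickly, though that is a presentation choice rather than part of the calculation.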
Liveblocks provides presence APIs that handle cursor broadcasting, user metadata, and room management out of the box. If you are building on Yjs directly, the y-protocols awareness module provides a similar (though lower-level) abstraction. Either way, do not skip presence features. They are the difference between "shared canvas" and "collaborative whiteboard."
Backend Architecture and Infrastructure Costs
The backend of a whiteboard app has three core responsibilities: WebSocket relay for real-time sync, persistent storage for board data, and media handling for uploaded images and assets. Here is how to architect each layer and what it costs.
WebSocket Layer
For a self-hosted setup, a Node.js server running the Yjs WebSocket provider (y-websocket) handles sync between clients. A single Node.js process can sustain 5,000 to 10,000 concurrent WebSocket connections depending on message frequency. For a whiteboard app with active drawing, budget 2,000 to 5,000 connections per process due to the higher message throughput. Run multiple processes behind a load balancer that routes by board ID, so all users on the same board reach the same process. Conventional per-client sticky sessions are not enough on their own, because they do not guarantee that collaborators land together. On AWS, a t3.medium instance ($30/month) handles roughly 3,000 concurrent whiteboard users. Add a second instance for redundancy and you are at $60/month in compute for a reasonably busy product.
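Routing every user of a board to the same process can be as simple as hashing the board ID. The sketch below is one possible approach, not a description of any particular load balancer: hashString, pickProcess, and the process names are hypothetical, and the hash is FNV-1a applied to UTF-16 code units for brevity.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function hashString(s: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < s.length; i++) {
    hash ^= s.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// Every connection for the same board ID maps to the same process,
// as long as the process list is identical on every router.
function pickProcess(boardId: string, processes: string[]): string {
  if (processes.length === 0) throw new Error("no WebSocket processes available");
  return processes[hashString(boardId) % processes.length]!;
}

const wsProcesses = ["ws-1", "ws-2", "ws-3"]; // hypothetical process names
console.log(pickProcess("board-42", wsProcesses));
```

Simple modulo routing reassigns most boards whenever the process list changes. If processes are added or removed often, consistent hashing or an explicit board-to-process lookup table is a gentler alternative.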
Persistent Storage
CRDT state needs to be persisted so boards survive server restarts. Two approaches work well. First, snapshot the Yjs document to a binary blob and store it in PostgreSQL or S3. Yjs provides Y.encodeStateAsUpdate() for efficient binary serialization. A typical whiteboard with 500 objects produces a binary snapshot of 50 to 200 KB. For PostgreSQL, store snapshots in a BYTEA column and load them when a board is opened. Second, use an append-only log of Yjs updates for incremental persistence. This is more complex but enables version history and point-in-time recovery. Budget $20 to $50/month for a managed PostgreSQL instance (RDS db.t3.micro or Supabase free/pro tier) for boards with moderate usage.
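The two persistence approaches combine naturally: keep one snapshot plus a log of updates received since that snapshot. The in-memory sketch below only illustrates the bookkeeping. InMemoryBoardStore and its methods are hypothetical, the updates are treated as opaque bytes (in practice they would come from Yjs, for example from Y.encodeStateAsUpdate()), and a real implementation would store the same data in PostgreSQL or S3.

```typescript
type Update = Uint8Array; // opaque binary update, e.g. produced by Yjs
type BoardRecord = { snapshot: Update | null; log: Update[] };

class InMemoryBoardStore {
  private boards = new Map<string, BoardRecord>();

  private record(boardId: string): BoardRecord {
    let rec = this.boards.get(boardId);
    if (!rec) {
      rec = { snapshot: null, log: [] };
      this.boards.set(boardId, rec);
    }
    return rec;
  }

  // Append an incremental update; returns the log length after the append.
  appendUpdate(boardId: string, update: Update): number {
    const rec = this.record(boardId);
    rec.log.push(update);
    return rec.log.length;
  }

  // Everything needed to rebuild the board, in the order to apply it:
  // the snapshot first (if any), then the logged updates.
  load(boardId: string): Update[] {
    const rec = this.record(boardId);
    return rec.snapshot ? [rec.snapshot, ...rec.log] : [...rec.log];
  }

  // Replace the old snapshot and the first `coveredCount` log entries with
  // a new snapshot that already contains them. Entries appended while the
  // snapshot was being computed stay in the log.
  compact(boardId: string, snapshot: Update, coveredCount: number): void {
    const rec = this.record(boardId);
    rec.snapshot = snapshot;
    rec.log = rec.log.slice(coveredCount);
  }
}
```

Passing coveredCount explicitly is a deliberate choice: it avoids losing updates that arrive between reading the log and writing the new snapshot. If you need version history, archive the removed log entries instead of discarding them.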
Media and Asset Storage
Users will upload images, PDFs, and screenshots to the whiteboard. Store these in S3 or Cloudflare R2. Use presigned URLs so clients upload directly to object storage without routing through your server. R2 has no egress fees, making it the better choice for media-heavy whiteboards. Budget $5 to $20/month for storage depending on usage patterns.
Total Infrastructure Cost
For a whiteboard app supporting 1,000 to 5,000 monthly active users with typical usage patterns, expect $100 to $250/month in infrastructure. This includes compute (2 WebSocket servers), database (managed PostgreSQL), object storage (R2 or S3), and a CDN (Cloudflare free tier or CloudFront). If you use Liveblocks instead of self-hosting, their Pro plan at $99/month covers up to 10,000 monthly active users with managed WebSocket infrastructure (confirm these figures against their current pricing page, since plans change), which is often cheaper than running your own once you factor in engineering time for DevOps. For a broader overview of real-time infrastructure patterns, our guide covers the full decision framework.
Essential Features and Build Timeline
Knowing what to build first is as important as knowing how to build it. Here is a phased roadmap based on what we have seen work for teams shipping whiteboard products.
Phase 1: Core Whiteboard (4 to 6 Weeks)
Build the single-player whiteboard first. Implement shape creation (rectangles, ellipses, arrows, lines), freehand drawing, text labels, object selection and transformation (move, resize, rotate), and infinite canvas with pan and zoom. Use tldraw or Konva.js as the rendering foundation. Wire up basic persistence to save and load boards. Do not add collaboration yet. Getting the single-player experience right is critical because every collaboration bug will be harder to diagnose if the local rendering is not solid.
Phase 2: Real-Time Collaboration (3 to 4 Weeks)
Integrate Yjs or Liveblocks for CRDT-based sync. Map your whiteboard data model to CRDT types. Add the WebSocket transport layer (or use Liveblocks's managed infrastructure). Implement live cursors, selection awareness, and user presence indicators. Test extensively with 3 to 5 concurrent users. The most common bugs at this stage are z-order conflicts, ghost objects that appear on one client but not another, and undo behavior that reverts other users' changes. Budget extra time for testing.
Phase 3: Polish and Productivity (3 to 4 Weeks)
Add the features that turn a demo into a product: sticky notes with rich text editing, image uploads with drag-and-drop, copy/paste (including cross-board paste), keyboard shortcuts, export to PNG/SVG/PDF, and board-level permissions (view-only, can-edit, owner). Implement templates so users start with useful layouts instead of a blank canvas.
Phase 4: Scale and Differentiate (Ongoing)
Version history, comments and threads, integrations (Slack, Jira, Notion), AI features (auto-layout, shape recognition, summarization), and performance optimization for large boards. This is where your product diverges from generic whiteboard tools and starts serving your specific market. The total timeline from zero to a shipped v1 with collaboration is 10 to 14 weeks for a team of 2 to 3 frontend engineers with real-time experience. If collaboration tooling is new to your team, add 3 to 4 weeks for the learning curve.
Performance Optimization for Large Boards
A whiteboard with 50 objects performs fine with almost any approach. A board with 5,000 objects will expose every inefficiency in your rendering and sync pipeline. Here are the optimizations that matter most.
Viewport Culling
Only render objects that are visible in the current viewport. Maintain a spatial index (quadtree or R-tree) of all objects on the board. On every pan or zoom event, query the spatial index for objects intersecting the viewport bounds and render only those. This alone can improve frame rates by 10x on large boards. tldraw implements viewport culling by default. If you are building on Konva or raw Canvas, you need to implement this yourself. The js-quadtree or rbush npm packages provide efficient spatial indexing.
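To show the shape of the technique, here is a minimal sketch that uses a uniform grid instead of a quadtree or R-tree. The grid is a deliberately simpler stand-in chosen to keep the example short; GridIndex, its methods, and the cell size are hypothetical, and a library such as rbush would replace this class in practice.

```typescript
type Bounds = { x: number; y: number; width: number; height: number };

function intersects(a: Bounds, b: Bounds): boolean {
  return a.x < b.x + b.width && a.x + a.width > b.x &&
         a.y < b.y + b.height && a.y + a.height > b.y;
}

// Uniform grid: each object is registered in every cell its bounds touch.
class GridIndex {
  private cells = new Map<string, Set<string>>();
  private bounds = new Map<string, Bounds>();
  private readonly cellSize: number;

  constructor(cellSize: number) {
    this.cellSize = cellSize;
  }

  private cellKeys(b: Bounds): string[] {
    const keys: string[] = [];
    const x0 = Math.floor(b.x / this.cellSize);
    const x1 = Math.floor((b.x + b.width) / this.cellSize);
    const y0 = Math.floor(b.y / this.cellSize);
    const y1 = Math.floor((b.y + b.height) / this.cellSize);
    for (let cx = x0; cx <= x1; cx++) {
      for (let cy = y0; cy <= y1; cy++) keys.push(`${cx},${cy}`);
    }
    return keys;
  }

  // Insert or move an object.
  insert(id: string, b: Bounds): void {
    this.remove(id);
    this.bounds.set(id, b);
    for (const key of this.cellKeys(b)) {
      let cell = this.cells.get(key);
      if (!cell) {
        cell = new Set<string>();
        this.cells.set(key, cell);
      }
      cell.add(id);
    }
  }

  remove(id: string): void {
    const b = this.bounds.get(id);
    if (!b) return;
    for (const key of this.cellKeys(b)) this.cells.get(key)?.delete(id);
    this.bounds.delete(id);
  }

  // IDs of objects whose bounds intersect the viewport.
  query(viewport: Bounds): string[] {
    const found = new Set<string>();
    for (const key of this.cellKeys(viewport)) {
      const cell = this.cells.get(key);
      if (!cell) continue;
      cell.forEach((id) => {
        const b = this.bounds.get(id);
        if (b && intersects(b, viewport)) found.add(id);
      });
    }
    return Array.from(found);
  }
}
```

On each pan or zoom, call query with the viewport bounds in canvas coordinates and render only the returned IDs. A grid degrades when the user zooms far out, because the viewport then spans many cells, which is one reason tree-based indexes are the usual production choice.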
Rendering Layers and Caching
Separate your canvas into layers: a static layer for objects that have not changed recently, and a dynamic layer for objects currently being interacted with. Render the static layer to an offscreen canvas and composite it as a single image on each frame. This avoids re-rendering hundreds of unchanged objects every frame. When an object is modified, move it to the dynamic layer, render it independently, and merge it back into the static layer when the interaction ends. This technique is a form of layer caching. It is related to, but distinct from, dirty rectangle rendering, which redraws only the screen regions that changed. Most performant canvas applications use one or both.
CRDT Sync Optimization
Large boards generate large CRDT documents. A board with 5,000 objects and a long edit history can produce a Yjs document of 5 to 10 MB. Loading this on initial connection delays the time-to-interactive. Two strategies help. First, compress Yjs updates with zlib or brotli before sending over WebSocket. Board data with repetitive property names and values often compresses well, with size reductions of 60 to 80% being typical, but measure on your own boards because the Yjs binary encoding is already compact. Second, periodically compact what you persist. If you store an append-only log of incremental updates, replay the log into a Y.Doc, call Y.encodeStateAsUpdate(), and replace the log with that single snapshot. This reduces stored size and load time but loses granular version history, so do it only after saving a full history snapshot for recovery.
Lazy Loading for Off-Screen Content
For very large boards (10,000+ objects), consider loading board data in spatial chunks. When a user pans to a new area, fetch the objects in that region from the server. This adds complexity to your sync architecture (you need to track which chunks are loaded on each client) but makes initial load time depend on the size of the visible region rather than the size of the whole board. Figma has described a related approach for large files, loading pages on demand rather than all at once.
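The client-side bookkeeping for chunked loading can be sketched as follows. The names chunksForViewport and ChunkLoader, the "column,row" key format, and the 512-unit chunk size are all assumptions for illustration; the actual fetch and the server-side partitioning are left out.

```typescript
type Viewport = { left: number; top: number; width: number; height: number };

// Keys of every fixed-size chunk the viewport overlaps, as "column,row".
function chunksForViewport(v: Viewport, chunkSize: number): string[] {
  const keys: string[] = [];
  const x0 = Math.floor(v.left / chunkSize);
  const x1 = Math.floor((v.left + v.width) / chunkSize);
  const y0 = Math.floor(v.top / chunkSize);
  const y1 = Math.floor((v.top + v.height) / chunkSize);
  for (let cy = y0; cy <= y1; cy++) {
    for (let cx = x0; cx <= x1; cx++) keys.push(`${cx},${cy}`);
  }
  return keys;
}

// Tracks which chunks this client has already requested.
class ChunkLoader {
  private loaded = new Set<string>();
  private readonly chunkSize: number;

  constructor(chunkSize: number) {
    this.chunkSize = chunkSize;
  }

  // Returns the chunks that still need to be fetched for this viewport
  // and marks them as requested so they are not fetched twice.
  missingChunks(v: Viewport): string[] {
    const missing = chunksForViewport(v, this.chunkSize).filter((key) => !this.loaded.has(key));
    for (const key of missing) this.loaded.add(key);
    return missing;
  }
}

const loader = new ChunkLoader(512);
const initialChunks = loader.missingChunks({ left: 0, top: 0, width: 1000, height: 500 });
const afterPan = loader.missingChunks({ left: 600, top: 0, width: 1000, height: 500 });
console.log(initialChunks, afterPan);
```

In practice it helps to request a margin of chunks around the viewport so content is already present when the user pans, and to decide how objects that straddle chunk boundaries are assigned.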
Performance work is iterative. Profile your actual boards with real user data before optimizing. The bottleneck might surprise you: often it is not rendering speed but CRDT merge time, garbage collection pauses from large object graphs, or SVG text measurement that slows things down.
Shipping Your Whiteboard App: Next Steps
Building a real-time collaboration whiteboard app is one of the most technically demanding projects in web development. It touches rendering, networking, distributed systems, and user experience design simultaneously. But the ecosystem has never been more accessible. tldraw, Yjs, Liveblocks, and modern browsers give you building blocks that would have required a team of 20 just five years ago.
Here is the honest assessment. If your team has experience with canvas rendering and real-time systems, you can ship a solid v1 in 3 to 4 months. If this is your first real-time product, budget 5 to 6 months and plan to throw away your first WebSocket implementation. The CRDT layer will work beautifully (Yjs is battle-tested), but the integration between your rendering engine, your sync layer, and your persistence layer will need iteration.
The biggest risk is not technical. It is scope. Whiteboard apps invite feature creep because users compare you to Miro and Figma from day one. Define your niche early: is this a whiteboard for agile retrospectives, for UX design, for education, for engineering diagrams? Build the features that serve that niche and resist the urge to become a general-purpose canvas tool until you have product-market fit.
We have helped teams build real-time collaboration products across whiteboarding, document editing, and canvas-based design tools. If you are planning a whiteboard app and want to validate your architecture, get a realistic timeline, or bring in engineers who have shipped this kind of product before, book a free strategy call and we will walk through your specific use case.