---
title: "How to Build an Encrypted Real-Time Messaging App From Scratch"
author: "Nate Laquis"
author_role: "Founder & CEO"
date: "2026-04-24"
category: "How to Build"
tags:
  - build encrypted messaging app
  - end-to-end encryption messaging
  - Signal Protocol implementation
  - real-time chat architecture
  - secure messaging development
excerpt: "Most teams bolt encryption onto a chat app as an afterthought and end up with a system that leaks metadata, breaks on group messages, or collapses under key rotation. If you want to build an encrypted messaging app that actually protects users, encryption has to be the foundation, not a feature flag."
reading_time: "14 min read"
canonical_url: "https://kanopylabs.com/blog/how-to-build-an-encrypted-messaging-app"
---

# How to Build an Encrypted Real-Time Messaging App From Scratch

## Why End-to-End Encryption Is the Only Serious Option

Transport-layer encryption (TLS) protects messages between a client and your server. That is table stakes. It does not protect messages on the server itself, which means any breach, subpoena, or rogue admin exposes every conversation in plaintext. If you are building a messaging app where privacy is a real requirement and not a marketing checkbox, you need end-to-end encryption (E2EE). Only the sender and recipient hold the keys. Your server relays ciphertext it cannot read.

Signal set the standard here. WhatsApp, Google Messages, and Facebook Messenger all adopted the Signal Protocol for E2EE. The protocol is open source, battle-tested, and peer-reviewed by cryptographers. Unless you have a dedicated cryptography team and a very specific reason to roll your own, the Signal Protocol is where you start. Attempting to invent your own encryption scheme is the fastest way to ship something that looks secure but is not.

E2EE introduces genuine engineering constraints. You cannot run server-side search over encrypted messages. You cannot moderate content automatically. Push notifications cannot include message previews unless you build a separate encrypted notification channel. Every feature you take for granted in a plaintext chat app, from link previews to message reactions, needs to be rethought when the server cannot see the content. These are solvable problems, but they shape your entire architecture from day one.

![Digital security concept showing encrypted data protection for messaging applications](https://images.unsplash.com/photo-1563986768609-322da13575f2?w=800&q=80)

The good news: libraries like **libsignal** (the reference implementation maintained by the Signal Foundation) handle the cryptographic heavy lifting. You do not need to implement AES-256 or Curve25519 yourself. What you do need is a deep understanding of how key exchange, session management, and message ordering work so that your application layer integrates correctly with the protocol. A misplaced assumption about key states can silently break encryption for thousands of users.

## Signal Protocol Deep Dive: X3DH and Double Ratchet

The Signal Protocol combines two mechanisms: X3DH (Extended Triple Diffie-Hellman) for initial key agreement and the Double Ratchet Algorithm for ongoing message encryption. Understanding both is non-negotiable if you plan to build an encrypted messaging app that survives real-world conditions like dropped connections, delayed messages, and device changes.

### X3DH Key Agreement

X3DH solves a specific problem: how do two users establish a shared secret when one of them is offline? In a traditional Diffie-Hellman exchange, both parties must be online simultaneously. X3DH avoids this by having each user upload a bundle of prekeys to the server. When Alice wants to message Bob for the first time, she downloads Bob's prekey bundle and performs a series of Diffie-Hellman computations to derive a shared secret, all without Bob being online.

A prekey bundle contains Bob's public keys: his long-term identity key (a Curve25519 key that stays fixed for the life of the installation), a signed prekey (rotated periodically, typically every 1 to 2 weeks) along with its signature under the identity key, and one-time prekeys (Bob uploads a batch, and each is handed out exactly once, then discarded). The one-time prekeys strengthen forward secrecy for the initial message. If they run out, the protocol falls back to using just the signed prekey, which is slightly less secure but still functional. Your server needs to track one-time prekey inventory per device and prompt clients to upload fresh batches when supply runs low. A good target is keeping 100 one-time prekeys available per device.
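The server-side bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not libsignal's API: the class names, `fetch_bundle`, and the `LOW_WATER_MARK` threshold are assumptions made for the example, and the keys are opaque byte strings.

```python
from dataclasses import dataclass, field
from typing import Optional

LOW_WATER_MARK = 20  # assumed threshold for asking a device to upload more keys


@dataclass
class DevicePrekeys:
    """Public key material one device has uploaded to the server."""
    identity_key: bytes
    signed_prekey: bytes
    signed_prekey_signature: bytes
    one_time_prekeys: list[bytes] = field(default_factory=list)


@dataclass
class PrekeyBundle:
    """What the server hands to a sender starting a new session."""
    identity_key: bytes
    signed_prekey: bytes
    signed_prekey_signature: bytes
    one_time_prekey: Optional[bytes]  # None when the supply is exhausted


def fetch_bundle(store: DevicePrekeys) -> tuple[PrekeyBundle, bool]:
    """Return a bundle and a flag saying whether the device should refill.

    A one-time prekey is removed from the store as it is handed out, so it
    can never be served twice. With none left, the bundle carries only the
    signed prekey, which is the fallback case.
    """
    one_time = store.one_time_prekeys.pop(0) if store.one_time_prekeys else None
    bundle = PrekeyBundle(
        store.identity_key,
        store.signed_prekey,
        store.signed_prekey_signature,
        one_time,
    )
    needs_refill = len(store.one_time_prekeys) < LOW_WATER_MARK
    return bundle, needs_refill
```

In a real deployment the pop would need to be an atomic database operation so two concurrent senders never receive the same one-time prekey.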

### The Double Ratchet Algorithm

Once X3DH establishes the initial shared secret, the Double Ratchet takes over for all subsequent messages. It combines two ratcheting mechanisms. The symmetric-key ratchet derives a new message key for every single message using a KDF (Key Derivation Function) chain. Even if an attacker compromises one message key, they cannot derive past or future keys. The Diffie-Hellman ratchet performs a new DH exchange every time the conversation direction changes (Alice sends, then Bob replies). This creates a new root key, which resets the symmetric chain entirely.
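To make the symmetric-key ratchet concrete, here is a minimal sketch of a KDF chain built from HMAC-SHA256, using the single-byte constants suggested in the Double Ratchet specification. The starting chain key is a made-up demo value, not a real X3DH output, and production code should use libsignal rather than this.

```python
import hashlib
import hmac


def kdf_chain_step(chain_key: bytes) -> tuple[bytes, bytes]:
    """Advance the chain once and return (next_chain_key, message_key).

    Both outputs are derived from the current chain key with different
    constants, so knowing a message key does not reveal the chain key.
    """
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return next_chain_key, message_key


# Demo value only; in practice the chain is seeded by the DH ratchet.
chain_key = hashlib.sha256(b"demo shared secret, not a real X3DH output").digest()

message_keys = []
for _ in range(3):
    chain_key, message_key = kdf_chain_step(chain_key)
    message_keys.append(message_key)  # one fresh key per message
```

Because HMAC is one-way, an attacker holding the current chain key cannot run the chain backward to recover earlier message keys. Recovering from a compromised chain key is the job of the DH ratchet, which this sketch leaves out.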

The result is both forward secrecy (compromising current keys does not expose past messages) and future secrecy, also called post-compromise security or "break-in recovery" (if an attacker gains temporary access to key material, they lose access once a new DH ratchet step occurs). Few protocols offered both properties when the Double Ratchet was introduced, which is a large part of why Signal's approach became the industry default.

Implementation detail that trips up many teams: message ordering. The Double Ratchet handles out-of-order messages by maintaining a window of skipped message keys. If message #5 arrives before message #4, the client stores the derived key for #4 so it can decrypt it when it eventually arrives. You need to decide how large this window should be. Too small and you drop messages on flaky networks. Too large and you consume excessive memory. Signal's reference implementation uses a window of 2000, which is a reasonable starting point.
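The skipped-key window can be sketched as follows. This is an illustrative receiving chain only, assuming the HMAC-SHA256 chain step from the Double Ratchet specification; the class and method names are hypothetical, and the DH ratchet and actual decryption are omitted.

```python
import hashlib
import hmac

MAX_SKIP = 2000  # window size discussed above


class ReceivingChain:
    """Derives message keys by index and caches keys for skipped messages."""

    def __init__(self, chain_key: bytes, max_skip: int = MAX_SKIP):
        self.chain_key = chain_key
        self.next_index = 0  # index of the next key the chain will produce
        self.skipped: dict[int, bytes] = {}
        self.max_skip = max_skip

    def _step(self) -> bytes:
        message_key = hmac.new(self.chain_key, b"\x01", hashlib.sha256).digest()
        self.chain_key = hmac.new(self.chain_key, b"\x02", hashlib.sha256).digest()
        self.next_index += 1
        return message_key

    def key_for(self, index: int) -> bytes:
        """Return the key for message `index`, storing any keys skipped over."""
        if index < self.next_index:
            # Late arrival: the key is either cached or was already consumed.
            if index not in self.skipped:
                raise ValueError("message key already used or never stored")
            return self.skipped.pop(index)
        if (index - self.next_index) + len(self.skipped) > self.max_skip:
            raise ValueError("too many skipped messages")
        while self.next_index < index:
            skipped_index = self.next_index
            self.skipped[skipped_index] = self._step()
        return self._step()
```

Popping a cached key when it is used keeps memory bounded and ensures a replayed message cannot be decrypted twice.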

## Real-Time Transport: WebSockets, MQTT, and Connection Management

Encryption handles the "what" of secure messaging. Real-time transport handles the "how." You need a persistent, bidirectional connection between each client and your server so that messages arrive instantly rather than on the next polling interval. The two dominant choices are WebSockets and MQTT, and the right pick depends on your platform targets and scale requirements.

### WebSockets for Web and Mobile

WebSockets give you a full-duplex channel over a single TCP connection, established through an HTTP upgrade handshake. Every modern browser supports them. React Native, Flutter, and native iOS/Android SDKs all have mature WebSocket libraries. For most encrypted messaging apps targeting under 500,000 concurrent connections, WebSockets are the straightforward choice. You establish the connection on app launch, send and receive encrypted payloads as binary frames, and handle reconnection when the network drops.

On the server side, you need a WebSocket gateway that authenticates connections, routes messages to the correct recipient's socket, and queues messages for offline users. Node.js with the **ws** library handles this well for moderate scale. For higher concurrency, Elixir with Phoenix Channels or Go with **gorilla/websocket** are significantly more efficient per server instance. The Phoenix team's published benchmark reached 2 million concurrent WebSocket connections on a single large server with proper tuning, which is relevant when you are estimating infrastructure costs.

### MQTT for IoT and Constrained Networks

MQTT is a lightweight publish/subscribe protocol designed for low-bandwidth, high-latency networks. Facebook Messenger adopted MQTT for mobile early on because it is designed to use bandwidth and battery sparingly. If your app targets emerging markets with unreliable 2G/3G networks, or if you are building for IoT devices, MQTT deserves serious consideration. Brokers like **EMQX** and **VerneMQ** scale to millions of connections and support TLS natively.

Regardless of transport, you need robust reconnection logic. Mobile networks are unreliable. Users move between Wi-Fi and cellular. Your client must detect disconnections within seconds (using ping/pong frames for WebSockets or keepalive packets for MQTT), re-establish the connection, re-authenticate, and request any messages that arrived during the gap. This "catch-up" mechanism is critical: your server must store encrypted messages for offline recipients and deliver them in order upon reconnection. Most teams implement a message queue per user, backed by Redis or a similar in-memory store with disk persistence, that flushes when the client reconnects and acknowledges receipt.
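For the reconnection side, a common pattern is exponential backoff with jitter, so thousands of clients do not reconnect in lockstep after an outage. The sketch below is one reasonable policy, not a requirement of WebSockets or MQTT; the function name and default values are assumptions.

```python
import random
from typing import Callable, Iterator


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 6,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays in seconds.

    The ceiling doubles on every attempt up to `cap`, and each delay is a
    random fraction of that ceiling ("full jitter").
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


# Example: the waits a client would use before each retry.
delays = list(backoff_delays())
```

After a successful reconnect the client would re-authenticate and ask the server for everything newer than the last message it acknowledged.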

![Global network visualization representing real-time encrypted data transmission across connected devices](https://images.unsplash.com/photo-1451187580459-43490279c0fa?w=800&q=80)

If you have already built real-time features before, much of this will be familiar. For a deeper walkthrough of connection management patterns, presence detection, and scaling strategies, check out our [complete guide to real-time features](/blog/real-time-features-guide).

## Message Storage, Metadata Protection, and Disappearing Messages

With E2EE, your server stores ciphertext. It cannot read message content. But metadata, specifically who messaged whom, when, how often, and from which IP address, is just as sensitive as content in many threat models. Former NSA and CIA director Michael Hayden once said, "We kill people based on metadata." If your users care enough about privacy to want E2EE, you should take metadata protection seriously too.

### Server-Side Storage

Encrypted messages need to persist on the server only until the recipient downloads them. Once a client acknowledges receipt, the server should delete the ciphertext. This is the "store and forward" model. For multi-device support (a user logged in on their phone and laptop simultaneously), you need to store a separate encrypted copy per device, since each device has its own session keys. A message sent to a user with 3 devices generates 3 ciphertext blobs. Storage costs scale linearly with device count per user.

For the message queue itself, PostgreSQL with a simple table (message_id, recipient_device_id, ciphertext, timestamp, delivered) works for apps under 100,000 active users. Above that, consider a dedicated message broker like Apache Kafka or Amazon SQS for the queuing layer, with a separate data store for undelivered messages. Redis is fast but risky for persistence if a node crashes before flushing to disk. If you go the Redis route, enable AOF (Append Only File) persistence and replicate across at least two nodes.
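As a rough sketch of that table and the store-and-forward flow, the example below uses SQLite from Python's standard library as a stand-in for PostgreSQL. The column names follow the list above; the function names are hypothetical.

```python
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS queued_messages (
    message_id          INTEGER PRIMARY KEY AUTOINCREMENT,
    recipient_device_id TEXT    NOT NULL,
    ciphertext          BLOB    NOT NULL,
    timestamp           REAL    NOT NULL,
    delivered           INTEGER NOT NULL DEFAULT 0
)
"""


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def enqueue(db: sqlite3.Connection, device_id: str, ciphertext: bytes) -> int:
    """Store one ciphertext blob for one recipient device."""
    cur = db.execute(
        "INSERT INTO queued_messages (recipient_device_id, ciphertext, timestamp) "
        "VALUES (?, ?, ?)",
        (device_id, ciphertext, time.time()),
    )
    db.commit()
    return cur.lastrowid


def pending(db: sqlite3.Connection, device_id: str) -> list[tuple[int, bytes]]:
    """Undelivered messages for a device, oldest first."""
    return db.execute(
        "SELECT message_id, ciphertext FROM queued_messages "
        "WHERE recipient_device_id = ? AND delivered = 0 ORDER BY message_id",
        (device_id,),
    ).fetchall()


def acknowledge(db: sqlite3.Connection, device_id: str, message_id: int) -> None:
    """Mark a message delivered once the client confirms receipt."""
    db.execute(
        "UPDATE queued_messages SET delivered = 1 "
        "WHERE message_id = ? AND recipient_device_id = ?",
        (message_id, device_id),
    )
    db.commit()


def purge_delivered(db: sqlite3.Connection) -> int:
    """Delete acknowledged ciphertext; returns the number of rows removed."""
    cur = db.execute("DELETE FROM queued_messages WHERE delivered = 1")
    db.commit()
    return cur.rowcount
```

Splitting acknowledgement from deletion is a design choice for this sketch. Deleting the row directly in `acknowledge` is equally valid and keeps less data around.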

### Metadata Protection

Minimizing metadata means limiting what your server can observe. Practical steps include: padding messages to a uniform size so the server cannot infer content type (text vs. image) from payload length, using fixed-interval message batching so traffic analysis cannot determine exact send times, rotating sender IP addresses through a VPN or Tor integration for the most sensitive use cases, and separating the authentication service from the message relay service so no single server component knows both the user identity and their conversation partners.
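Of the steps above, size padding is the easiest to illustrate. The sketch below pads plaintext up to the next bucket size before it is encrypted; the bucket sizes and the 4-byte length prefix are assumptions for the example, not a standard.

```python
import struct

BUCKETS = (256, 1024, 4096, 16384)  # assumed bucket sizes in bytes


def pad(plaintext: bytes) -> bytes:
    """Prefix the real length, then zero-pad up to the next bucket size.

    Call this before encryption so every ciphertext in a bucket has the
    same length on the wire.
    """
    body = struct.pack(">I", len(plaintext)) + plaintext
    for size in BUCKETS:
        if len(body) <= size:
            return body + b"\x00" * (size - len(body))
    raise ValueError("payload too large to pad inline; send it as an attachment")


def unpad(padded: bytes) -> bytes:
    """Recover the original plaintext after decryption."""
    if len(padded) < 4:
        raise ValueError("padded payload is too short")
    (length,) = struct.unpack(">I", padded[:4])
    if length > len(padded) - 4:
        raise ValueError("length prefix exceeds payload size")
    return padded[4:4 + length]
```

Fewer, larger buckets leak less about message length but cost more bandwidth, so the bucket list is a tuning decision.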

Signal goes further with "sealed sender," a mechanism where even the server does not learn who sent a given message. The sender encrypts the message envelope (including their own identity) with the recipient's identity key. The server routes the message based on the recipient identifier but cannot see the sender. This is an advanced feature, and it requires careful implementation to prevent spam (since the server cannot rate-limit by sender), but it is worth considering for high-security applications.

### Disappearing Messages

Disappearing messages are a client-side feature, not a cryptographic one. The sender specifies a timer (5 seconds, 1 hour, 1 week), and the recipient's client deletes the message after the timer expires. This is enforced by the app, not the protocol. A determined user can always screenshot or photograph the screen. But disappearing messages reduce the surface area for data exposure if a device is lost or seized. Implementation is straightforward: store the expiration timestamp in the message payload (inside the encrypted content, so the server cannot see it), and run a periodic cleanup job on the client that deletes expired messages from the local database.
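A minimal sketch of that client-side cleanup job, using SQLite as the local database. The table layout and function names are assumptions; `expires_at` is the timestamp read from inside the decrypted payload.

```python
import sqlite3
import time
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "body TEXT NOT NULL, "
        "expires_at REAL)"  # NULL means the message never expires
    )
    return db


def store_message(db: sqlite3.Connection, body: str,
                  expires_at: Optional[float]) -> None:
    """Save a decrypted message along with its expiration timestamp."""
    db.execute(
        "INSERT INTO messages (body, expires_at) VALUES (?, ?)",
        (body, expires_at),
    )
    db.commit()


def delete_expired(db: sqlite3.Connection, now: Optional[float] = None) -> int:
    """Delete messages whose timer has run out; returns how many were removed."""
    now = time.time() if now is None else now
    cur = db.execute(
        "DELETE FROM messages WHERE expires_at IS NOT NULL AND expires_at <= ?",
        (now,),
    )
    db.commit()
    return cur.rowcount
```

Run `delete_expired` on app launch as well as on a timer, since the app may not have been running when a message expired.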

## Group Messaging and Encrypted Media Sharing

Group messaging is where E2EE gets significantly more complex. In a 1-on-1 conversation, you manage one session with one ratchet state. In a group of 50 people, a naive approach would require the sender to encrypt the message 49 times, once for each recipient's session. This works for small groups but does not scale to hundreds or thousands of members.

### Sender Keys for Group Messaging

Signal's approach for groups uses "Sender Keys." Each group member generates a sender key and distributes it to every other member via their existing pairwise E2EE channels. When Alice sends a message to the group, she encrypts it once with her sender key, and every group member who has her sender key can decrypt it. This is dramatically more efficient: O(1) encryption per message instead of O(n).

The tradeoff is that sender keys only ratchet forward. Each message advances a symmetric hash chain, but there is no Diffie-Hellman ratchet step to heal the session the way the Double Ratchet does. If a member's sender key is compromised, all future messages from that member in the group are readable until the key is rotated. Signal mitigates this by rotating sender keys when a member is removed from the group or when a member's device list changes. Your implementation needs to handle key rotation events gracefully, which means re-distributing new sender keys to all remaining members through their pairwise sessions.
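The rotation bookkeeping on member removal might look like the sketch below. It tracks only state and distribution targets. The class and its methods are hypothetical, and the actual delivery of the new key over pairwise sessions is left out.

```python
import secrets


class GroupSenderState:
    """One member's view of their own sender key for a single group."""

    def __init__(self, me: str, members: list[str]):
        self.me = me
        self.members = set(members) | {me}
        self.sender_key = secrets.token_bytes(32)
        self.key_id = 0  # bumped on every rotation so receivers can tell keys apart

    def distribution_targets(self) -> list[str]:
        """Everyone who must receive the current sender key over a pairwise session."""
        return sorted(self.members - {self.me})

    def remove_member(self, member: str) -> list[str]:
        """Drop a member and rotate, so the removed member cannot read new messages.

        Returns the members who need the fresh key.
        """
        self.members.discard(member)
        self.sender_key = secrets.token_bytes(32)
        self.key_id += 1
        return self.distribution_targets()
```

Every remaining member performs the same rotation for their own sender key, so one removal in a group of n triggers on the order of n squared pairwise distribution messages in total.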

An alternative approach, used by Matrix/Element, is the Megolm protocol. Megolm uses a hash ratchet for group sessions that only moves forward, so it does not recover from a compromised session key until the session is replaced. Each sender maintains an outbound Megolm session, shares the session key with group members via Olm (Matrix's 1-on-1 protocol, similar to Signal's), and ratchets forward with each message. New members receive the session key at its current ratchet position, so they cannot decrypt messages sent before they joined, which is a desirable property. Megolm is well-documented. The original **libolm** library provides a C/C++ implementation with bindings for JavaScript, Python, and other languages, though the Matrix project has since deprecated it in favor of the Rust implementation **vodozemac**, so check the current recommendation before committing to either.

### Encrypted Media: Photos, Videos, and Files

You cannot encrypt a 50 MB video the same way you encrypt a text message. The payload is too large to send through your message relay. The standard approach is: generate a random AES-256 key, encrypt the media file with that key, upload the encrypted blob to a storage service (S3, GCS, or your own CDN), and send the AES key plus the download URL inside the E2EE message. The recipient downloads the encrypted blob and decrypts it locally.

This pattern keeps your media storage server ignorant of the content. It also lets you use standard CDN caching and edge distribution for encrypted blobs without compromising security. The file is useless without the key, and the key travels only through the encrypted message channel. For images specifically, you should generate an encrypted thumbnail alongside the full-resolution image so the chat UI can display previews without downloading the entire file. Keep thumbnails under 10 KB for fast rendering on slow connections.

One detail many teams miss: you need to verify file integrity after download. Include a SHA-256 hash of the encrypted blob in the message payload. The recipient computes the hash of the downloaded file and compares. If they do not match, the file was tampered with in transit or storage, and the client should refuse to decrypt it. This is an integrity check rather than authenticated encryption in the strict sense, so pair it with an authenticated mode such as AES-256-GCM or an HMAC over the ciphertext. It is easy to overlook when you are focused on the E2EE layer.
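The hash check can be sketched with the standard library alone. In this example the "encrypted blob" is random bytes standing in for real ciphertext, since Python's stdlib has no AES. The pointer fields, function names, and URL are all made up for illustration.

```python
import hashlib
import hmac
import secrets


def make_attachment_pointer(encrypted_blob: bytes, url: str, key: bytes) -> dict:
    """Build the record that travels inside the E2EE message."""
    return {
        "url": url,
        "key": key.hex(),  # media key; only ever sent inside the encrypted message
        "sha256": hashlib.sha256(encrypted_blob).hexdigest(),
        "size": len(encrypted_blob),
    }


def verify_download(blob: bytes, pointer: dict) -> bool:
    """True only if the downloaded bytes match what the sender uploaded."""
    actual = hashlib.sha256(blob).hexdigest()
    return len(blob) == pointer["size"] and hmac.compare_digest(
        actual, pointer["sha256"]
    )


# Stand-in values for the demo.
media_key = secrets.token_bytes(32)
blob = secrets.token_bytes(1024)
pointer = make_attachment_pointer(blob, "https://cdn.example.invalid/abc123", media_key)
ok = verify_download(blob, pointer)
tampered_ok = verify_download(blob[:-1] + bytes([blob[-1] ^ 1]), pointer)
```

The client should call `verify_download` before attempting decryption and discard the file if it returns `False`.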

![Developer writing encryption and messaging application code on a dark-themed code editor](https://images.unsplash.com/photo-1555949963-ff9fe0c870eb?w=800&q=80)

## Push Notifications, Tech Stack, and Multi-Device Support

Push notifications are a privacy minefield in an E2EE app. Both Apple's APNs and Google's FCM require you to send notification payloads through their servers. If you include the message content or sender name in the push payload, Apple and Google can read it. Every E2EE messaging app handles this differently, and none of the solutions are perfect.

### Push Notification Strategies

The simplest approach is a "content-free" notification: send a silent push that simply tells the app "you have a new message." The app wakes up, connects to your server, downloads and decrypts the message, and displays a local notification with the actual content. This works reliably on Android but is restricted on iOS, where Apple limits background execution time and can throttle silent pushes. Signal uses this approach on Android and supplements it with a foreground service for reliability.

For iOS, you can use Apple's Notification Service Extension (NSE), a small process that runs when a push arrives and can modify the notification content before it is displayed. The flow: your server sends a push containing an encrypted payload (encrypted with a notification-specific key). The NSE decrypts it and replaces the notification content with the plaintext. The plaintext never passes through Apple's servers. Both WhatsApp and Signal rely on an NSE on iOS, though the details of what each puts in the push differ. The NSE has tight memory limits (around 24 MB) and a short execution window (about 30 seconds), so you need to keep the decryption logic lean.
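The two APNs payload shapes might look like the sketch below. The `content-available` and `mutable-content` keys are standard APNs fields; the `enc` field name and the placeholder alert text are assumptions for this example, and the encrypted bytes would come from your own notification encryption.

```python
import base64
import json

APNS_MAX_BYTES = 4096  # APNs size limit for a regular notification payload


def _check_size(payload: dict) -> dict:
    if len(json.dumps(payload).encode("utf-8")) > APNS_MAX_BYTES:
        raise ValueError("payload exceeds the APNs size limit")
    return payload


def silent_push() -> dict:
    """Content-free wake-up: no sender, no text, nothing to leak."""
    return _check_size({"aps": {"content-available": 1}})


def nse_push(encrypted_payload: bytes) -> dict:
    """Push that the Notification Service Extension decrypts on the device.

    The alert text is a generic placeholder shown only if the extension
    fails or runs out of time.
    """
    return _check_size({
        "aps": {
            "alert": {"title": "New message", "body": "You have a new message"},
            "mutable-content": 1,
        },
        "enc": base64.b64encode(encrypted_payload).decode("ascii"),
    })
```

The size check matters in practice: base64 inflates the ciphertext by about a third, so long messages need a fetch-from-server fallback instead of riding in the push.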

### Choosing Your Tech Stack

You have two realistic paths for building an encrypted messaging app. The first is to build on top of the **Matrix protocol** using the Element SDK. Matrix is an open, federated protocol for real-time communication with E2EE baked in via Olm and Megolm. The Element SDK (available for Web, iOS, and Android) gives you a full-featured chat client that you can customize. This path is faster, typically 3 to 4 months to a production app, but you are constrained by Matrix's architecture and federation model. If you need a standalone app without federation, you will spend time disabling features you do not need.

The second path is building from scratch using **libsignal** (available in Rust with bindings for Java, Swift, and TypeScript) for encryption, your own WebSocket or MQTT server for transport, and a custom backend for user management and message relay. This gives you full control over the architecture but takes 6 to 9 months for a competent team of 3 to 4 engineers. If you are building something that needs to feel like a differentiated product rather than a Matrix client with a custom skin, this is the path. For context on how [authentication flows work in security-focused apps](/blog/how-to-build-secure-authentication), that guide covers token management and session security in depth.

### Multi-Device Support

Supporting multiple devices per user (phone, tablet, desktop) is one of the hardest parts of an E2EE messaging app. Each device has its own key material and its own sessions with every contact. (In Matrix, every device has its own identity keys. In Signal, linked devices share the account's identity key but hold their own prekeys and sessions.) When Alice sends a message to Bob, who has 3 devices, Alice's client must encrypt the message separately for each of Bob's devices. Alice also needs to encrypt a copy for each of her own other devices so they can display the sent message.
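The fan-out logic itself is simple once per-device sessions exist. In the sketch below, `encrypt` is a caller-supplied callback standing in for a real per-device session, and all names are hypothetical.

```python
from typing import Callable

# (device_id, plaintext) -> ciphertext, backed by that device's session
Encrypt = Callable[[str, bytes], bytes]


def fan_out(
    plaintext: bytes,
    sending_device: str,
    sender_devices: list[str],
    recipient_devices: list[str],
    encrypt: Encrypt,
) -> dict[str, bytes]:
    """Produce one ciphertext per target device.

    Targets are all of the recipient's devices plus the sender's other
    devices, so the sent message also appears on them.
    """
    targets = list(recipient_devices) + [
        d for d in sender_devices if d != sending_device
    ]
    return {device_id: encrypt(device_id, plaintext) for device_id in targets}


def fake_encrypt(device_id: str, plaintext: bytes) -> bytes:
    """Placeholder that only tags the bytes; this is NOT encryption."""
    return device_id.encode("utf-8") + b":" + plaintext[::-1]


envelopes = fan_out(
    b"hello",
    sending_device="alice-phone",
    sender_devices=["alice-phone", "alice-laptop"],
    recipient_devices=["bob-phone", "bob-tablet", "bob-desktop"],
    encrypt=fake_encrypt,
)
```

With Bob's 3 devices and Alice's 1 other device, the example yields 4 envelopes, which matches the storage math in the server-side storage section.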

Device management requires a server endpoint where clients register new devices and query the device list for any user. When a new device is added, existing sessions do not automatically extend to it. The new device must establish new sessions (via X3DH) with every contact. Some apps surface key changes to the user (Signal shows a "safety number changed" warning when a contact's identity key changes) to prevent man-in-the-middle attacks where a server operator adds a rogue device to intercept messages.

For message history on new devices, you have two options: no history (the new device only sees messages received after setup) or encrypted backup transfer (the existing device exports the message database, encrypts it with a key derived from a QR code or PIN, and transfers it to the new device via a local connection or cloud storage). Signal historically took the first approach for linked devices. WhatsApp offers encrypted cloud backups. The backup approach is more user-friendly but introduces additional attack surface if the backup encryption key is weak.
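Because a PIN or short passphrase has little entropy, the backup key should come from a deliberately slow key derivation function. The sketch below uses PBKDF2 from Python's standard library as one option. The iteration count is an assumption to tune for your target devices, and a memory-hard function such as scrypt or Argon2 would be a stronger choice where available.

```python
import hashlib
import secrets

DEFAULT_ITERATIONS = 600_000  # assumed work factor; tune for your target devices


def new_salt() -> bytes:
    """Random salt stored alongside the backup; it does not need to be secret."""
    return secrets.token_bytes(16)


def derive_backup_key(
    passphrase: str, salt: bytes, iterations: int = DEFAULT_ITERATIONS
) -> bytes:
    """Stretch a PIN or passphrase into a 32-byte backup encryption key."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )
```

A 6-digit PIN has only a million possibilities, so no amount of stretching makes it safe against an offline attacker who holds the backup. A key carried in a QR code can be a full 256 bits of randomness and avoids the problem entirely.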

## Costs, Timeline, and Getting Started

Building an encrypted messaging app is not a side project. Here is a realistic breakdown of what you are looking at in terms of budget, team, and timeline, based on apps we have helped teams ship.

### Development Costs

An MVP with 1-on-1 E2EE messaging, basic group chat, media sharing, and push notifications on iOS and Android will run **$150,000 to $300,000** with a team of 3 to 4 senior engineers over 5 to 7 months. Using the Matrix/Element SDK as a foundation can cut this to **$80,000 to $150,000** and 3 to 4 months, but you sacrifice control over the protocol layer and are locked into Matrix's federation model unless you invest effort in stripping it out.

If you are building a Signal-level product with sealed sender, disappearing messages, encrypted backups, video calling, and a desktop client, expect **$500,000 to $1,000,000+** and 12 to 18 months. That is not an exaggeration. Signal's team has been working on their app for over a decade, and they are backed by a nonprofit with substantial funding.

### Infrastructure Costs

Monthly infrastructure for 100,000 monthly active users typically runs $2,000 to $5,000. That covers WebSocket/MQTT servers (2 to 4 instances on AWS or GCP), a PostgreSQL or ScyllaDB cluster for message queuing, S3 or GCS for encrypted media storage, and a Redis cluster for presence and session state. The largest cost driver is media storage and CDN bandwidth. Encryption itself adds only a few bytes per file, but the server cannot compress, transcode, or deduplicate ciphertext, so budget for roughly 20% more storage and bandwidth than a comparable unencrypted app.

Push notifications via APNs and FCM are free at moderate scale. TURN servers for voice/video calling (if you add that feature later) cost $500 to $2,000/month depending on usage. For a detailed breakdown of [building real-time video and calling features](/blog/how-to-build-a-video-calling-app), we covered the full WebRTC architecture separately.

### Key Vendor and Library Choices

- **libsignal** (Rust, Java, Swift, TypeScript): the gold standard for Signal Protocol implementation. Maintained by the Signal Foundation. Use this unless you have a compelling reason not to.

- **Matrix SDK / Element SDK** (Web, iOS, Android): full-featured E2EE chat framework. Best for teams that want to ship fast and can accept the Matrix protocol's tradeoffs.

- **Firebase Cloud Messaging + APNs**: push notification delivery. Free tier handles millions of notifications per month.

- **Ably or Pusher**: managed WebSocket infrastructure if you do not want to operate your own connection servers. Adds $200 to $2,000/month depending on connection count.

- **AWS KMS or HashiCorp Vault**: server-side key management for signing keys and server certificates. Do not store cryptographic keys in environment variables or config files.

### Where to Start

Start with the encryption layer, not the UI. Get X3DH key exchange and Double Ratchet messaging working between two test clients on a local server before you write a single line of frontend code. If the crypto foundation is wrong, everything built on top of it is compromised. Use libsignal's test vectors to verify your integration. Run the protocol against known-answer tests before moving to network transport.

From there, layer on WebSocket transport, then message persistence, then group messaging, then media sharing. Each layer depends on the one below it. Trying to parallelize crypto work and UI work too early leads to integration headaches when the message format changes (and it will change multiple times during development).

If you are serious about building an encrypted messaging app and want help architecting the system correctly from day one, we have done this before. [Book a free strategy call](/get-started) and we will walk through your requirements, threat model, and the fastest path to a production-ready product.

---

*Originally published on [Kanopy Labs](https://kanopylabs.com/blog/how-to-build-an-encrypted-messaging-app)*
