Why SMS Still Matters for AI in 2027
Every year someone declares SMS dead. Every year the data proves them wrong. Over 5 billion people worldwide can send and receive text messages. Roughly 3.5 billion of them do not have reliable mobile internet access. In rural healthcare clinics, remote agricultural regions, and field operations across dozens of industries, SMS remains the only digital channel that consistently works.
If you are building an AI assistant and your only interface is a web app or a mobile app, you are ignoring the majority of the planet. More practically, you are ignoring high-value use cases where connectivity is unreliable: community health workers in sub-Saharan Africa tracking patient symptoms, farmers in Southeast Asia checking crop disease recommendations, utility field crews reporting equipment status from mountainous terrain with no cell data coverage.
SMS has properties that make it uniquely suited for AI assistants in these environments. Messages are store-and-forward, meaning they get delivered even when connectivity is intermittent. Every phone supports SMS, from a $15 feature phone to the latest iPhone. There is no app to download, no login to remember, no software updates to manage. Your user sends a text, and your AI responds with a text. That simplicity is a feature, not a limitation.
The challenge is that SMS was designed for humans sending short messages to other humans, not for stateful AI conversations. Character limits, session management, message threading, and compliance regulations all create real engineering problems. This guide walks through every one of them.
Choosing Your SMS Gateway: Twilio, MessageBird, and Vonage
Your SMS gateway is the bridge between your AI backend and the cellular network. This choice affects your cost structure, geographic reach, compliance tooling, and development speed. Three providers dominate in 2027, and each has clear strengths.
Twilio
Twilio is the default for most development teams, and for good reason. Their API documentation is exceptional. Their webhook model makes it straightforward to receive inbound SMS and respond programmatically. Pricing runs around $0.0079 per outbound SMS in the US, with inbound messages at $0.0075. Twilio handles 10DLC registration directly in their console, and they support shortcodes, toll-free numbers, and international sending across 180+ countries.
The main drawback is cost at scale. Once you are sending millions of messages per month, Twilio's per-message pricing adds up quickly, and their volume discounts require significant commitment.
MessageBird (now Bird)
MessageBird rebranded to Bird and expanded beyond SMS into a broader communications platform. Their strength is European and Asian market coverage. If your SMS AI assistant targets users in the EU, Southeast Asia, or Africa, Bird often has better local routes and lower latency. Their API is clean, though the documentation is not as thorough as Twilio's. Pricing is competitive, often 10 to 15% cheaper per message for non-US destinations.
Vonage (formerly Nexmo)
Vonage sits between Twilio and Bird. Their Messages API supports SMS, MMS, WhatsApp, and Viber through a unified interface, which is valuable if you plan to expand beyond SMS later. Pricing is comparable to Twilio. Their 10DLC support has improved significantly in the past year, though the registration process is still slightly clunkier than Twilio's.
Which One to Pick
For US-focused projects, start with Twilio. The developer experience saves you days of integration time, and the ecosystem (tutorials, Stack Overflow answers, community libraries) is unmatched. For international deployments, evaluate Bird and Vonage alongside Twilio by sending test messages to your target countries and comparing delivery rates. Some carriers in Africa and South Asia have significantly different delivery performance across providers.
Regardless of provider, your architecture should abstract the SMS gateway behind an interface. Wrap the provider SDK in your own service class so you can swap providers without touching your AI logic. You will thank yourself later when pricing changes or delivery issues force a migration.
Session Management for Stateless SMS Conversations
Here is the core engineering challenge that separates SMS AI assistants from web-based AI chatbots: SMS is stateless. There is no persistent WebSocket connection, no session cookie, no authentication token. Each inbound message arrives as an independent HTTP webhook with a phone number and a text body. That is it.
Your AI assistant needs conversation context to be useful. If a farmer texts "what should I spray?" you need to know they mentioned corn blight three messages ago. Building this session layer is the first real architecture decision you face.
Phone Number as Session Key
The simplest approach is to use the sender's phone number as a session identifier. When a message arrives, look up the phone number in your session store, retrieve the conversation history, append the new message, send it all to the LLM, and store the response. This works for single-user scenarios where one phone number equals one user.
Session Expiration
Unlike web sessions where the user explicitly logs out, SMS conversations just stop. You need a timeout policy. For most use cases, expire sessions after 30 minutes of inactivity. For healthcare and field operations, consider longer windows of 2 to 4 hours, since users might step away and return to the same issue. When a session expires, the next message starts a fresh conversation. Let the user know: "Starting a new conversation. Text HISTORY to see your previous sessions."
Session Store Options
Redis is the obvious choice for session storage. It handles key-value lookups with TTL (time-to-live) expiration natively, runs fast enough to stay within SMS response time expectations, and can persist to disk for durability. For smaller deployments, DynamoDB with TTL works well and requires zero infrastructure management. Avoid storing sessions in your primary PostgreSQL database. The write pattern (every single message updates the session) creates unnecessary load on a database better suited for transactional data.
Conversation Threading
SMS does not have native threading like Slack or iMessage. Every message in a conversation appears as a flat list. To help users navigate multi-topic conversations, implement keyword-based thread management. Let users text "NEW" to start a fresh topic, "BACK" to return to a previous topic, or a topic number to resume a specific thread. Store each thread separately in your session store, with the active thread ID tracked at the session level.
Keep thread management simple. Your users are on SMS because they want simplicity. If your threading system requires a user manual, you have over-engineered it.
LLM Integration with SMS Character Limits
A standard SMS message is 160 characters (GSM-7 encoding) or 70 characters for Unicode (non-Latin scripts like Arabic, Chinese, or emoji-heavy text). Modern carriers support concatenated SMS, which splits longer messages into multiple segments and reassembles them on the receiving phone. Most carriers support up to 10 segments (1,600 characters for GSM-7), though some cap at 6.
LLMs, by default, are verbose. Ask Claude or GPT-4 a question and you will get 200 to 500 words back. That is fine for a chat widget. Over SMS, it translates to 5 to 10 message segments, each billed separately, and a wall of text that is painful to read on a small screen.
Constraining LLM Output
Your system prompt needs to enforce brevity aggressively. Something like: "You are an SMS assistant. Respond in 300 characters or fewer. Use short sentences. No bullet points. No headers. If the answer requires more detail, give the key point first and ask if the user wants more." Test this with your specific LLM. Claude tends to respect character limits more reliably than GPT-4, but both need explicit instruction and occasional reminder prompts.
Multi-Part Response Strategy
For complex answers that genuinely need more than 300 characters, implement a pagination pattern. Send the first part with "(1/3)" appended, followed by the second and third parts in sequence. Add a 1 to 2 second delay between segments so they arrive in order. Most carriers deliver concatenated messages correctly, but rapid-fire individual messages can arrive out of sequence.
Handling Inbound Long Messages
Users sometimes send long messages too, especially when describing symptoms, equipment problems, or field conditions. Your SMS gateway handles concatenation on the inbound side automatically. Twilio, Bird, and Vonage all reassemble multi-segment inbound messages into a single webhook payload. You get the full text regardless of how many segments it required.
Language and Encoding
If your users text in Spanish, Hindi, Arabic, or Mandarin, your character budget drops dramatically due to Unicode encoding. Plan for 70 characters per segment instead of 160. This makes brevity even more critical. Consider using transliteration (Latin-script approximations) for languages where users are comfortable with it. Many Hindi speakers, for example, routinely text in Romanized Hindi, which uses GSM-7 encoding and gives you the full 160-character budget.
10DLC Compliance, TCPA, and Shortcodes vs Long Codes
Compliance is not optional, and getting it wrong can result in your messages being blocked by carriers or your business facing six-figure fines. Here is what you need to know.
10DLC Registration
10DLC (10-Digit Long Code) is the standard for application-to-person (A2P) messaging in the United States. Since 2023, all US carriers require 10DLC registration for businesses sending SMS via local phone numbers. The registration process involves two steps: brand registration (verifying your business identity) and campaign registration (describing your messaging use case). Twilio, Vonage, and Bird all handle this through their platforms. Expect the approval process to take 1 to 3 weeks.
Without 10DLC registration, your messages will be throttled to roughly 1 message per second per number, and carrier filtering will block a significant percentage of them. With registration, throughput jumps to 15 to 75 messages per second depending on your trust score.
TCPA Compliance
The Telephone Consumer Protection Act (TCPA) requires explicit consent before sending SMS messages to US phone numbers. For an AI assistant, this means the user must initiate the conversation or explicitly opt in before you send any message. Implement clear opt-in flows: the user texts a keyword (like "START" or "HELP") to your number, and you respond with a confirmation and instructions for opting out. Always honor "STOP" messages immediately and confirm the opt-out.
Store consent records with timestamps. If a user claims they never opted in, you need proof. Log every opt-in and opt-out event with the phone number, timestamp, and the message that triggered it.
Shortcodes vs Long Codes vs Toll-Free
Shortcodes (5 to 6 digit numbers like 12345) offer the highest throughput (hundreds of messages per second) and best deliverability. They cost $1,000 to $1,500 per month to lease, plus a $2,000 to $5,000 setup fee, and approval takes 8 to 12 weeks. Use shortcodes if you are sending more than 100,000 messages per month or need guaranteed delivery for critical communications like healthcare alerts.
Long codes (standard 10-digit numbers) with 10DLC registration are the sweet spot for most AI assistant projects. They cost $1 to $2 per month per number, support adequate throughput for conversational use cases, and are ready in 1 to 3 weeks.
Toll-free numbers (800, 888, etc.) sit in between. They offer better throughput than long codes (around 30 messages per second after verification), cost $2 to $3 per month, and verification takes about 1 week. A solid choice for medium-volume applications.
International Compliance
Outside the US, regulations vary dramatically. The EU's GDPR adds data protection requirements on top of consent rules. India's TRAI regulations require entity registration and template approval for every message. Brazil, Nigeria, and the Philippines all have their own frameworks. If you are deploying internationally, budget 2 to 4 weeks per country for compliance research and registration.
MMS, Media Handling, and Rich Responses
SMS is text-only, but MMS (Multimedia Messaging Service) lets you send and receive images, audio, and short video clips. For an AI assistant, this opens up powerful use cases that pure text cannot handle.
Inbound Media Processing
When a user sends a photo via MMS, your gateway provides a URL to the media file in the webhook payload. Download it, store it in S3 or GCS, and pass it to your AI pipeline. Common use cases include a farmer photographing a diseased crop leaf for identification, a field technician snapping a photo of a serial number plate, or a patient sending an image of a wound or rash for triage guidance.
Use a vision-capable LLM (Claude's vision API or GPT-4V) to analyze inbound images. The workflow is: receive MMS webhook, download media, send the image plus conversation context to the vision model, and return a text-based SMS response. The user sends a photo, your AI analyzes it, and responds with actionable text advice.
Outbound Media
Sending images via MMS costs more than plain SMS (roughly $0.02 to $0.04 per message on Twilio vs $0.008 for SMS). Use MMS judiciously. Good use cases include sending a diagram showing proper equipment assembly, a map image with directions to a service point, or a chart summarizing a patient's health trend over time. Bad use cases include sending your company logo with every response or decorative images that add no information.
MMS Limitations
MMS support varies by carrier and country. In the US and Canada, MMS works reliably across all major carriers. In many other countries, MMS is unreliable or unsupported entirely. For international deployments, default to text-only responses and use MMS only when you have confirmed carrier support in the target region. If your users need rich media and MMS is unavailable, consider sending a shortened URL to a lightweight mobile web page as a fallback.
Use Cases: Healthcare, Agriculture, and Field Operations
The strongest SMS AI assistant use cases share three characteristics: the users are in low-connectivity environments, the information is time-sensitive, and the interaction pattern is short question-and-answer exchanges.
Healthcare Reminders and Triage
Community health workers (CHWs) in rural clinics use SMS AI assistants to check drug interaction information, report patient symptoms for remote physician review, and receive protocol-based triage guidance. A CHW texts "patient female 34 fever 3 days cough blood" and receives a structured triage recommendation with urgency level and next steps. Medication adherence programs use scheduled SMS to remind patients to take medications, with the AI handling responses like "I ran out" or "having side effects" by escalating appropriately.
For healthcare SMS, you must be explicit about limitations. The system prompt should include clear statements that the AI does not provide medical diagnoses and that users should seek in-person care for emergencies. Store all conversations for audit purposes and ensure your infrastructure meets HIPAA requirements if operating in the US.
Agriculture and Crop Advisory
Smallholder farmers in East Africa, South Asia, and Latin America use SMS-based advisory services to get planting recommendations, pest identification help, weather-based advice, and market pricing information. Integrating your AI with local weather APIs and crop databases lets a farmer text "when to plant maize in Nakuru" and receive a response tailored to their specific region, elevation, and current weather patterns.
These services have proven ROI. Studies from Kenya's iCow and India's Kisan Call Center show that farmers with access to SMS-based advisory services increase yields by 10 to 30%. An AI layer makes these services scalable without requiring armies of call center agents.
Field Worker Operations
Utility companies, oil and gas operators, and logistics firms deploy field workers to locations with unreliable data connectivity. SMS AI assistants let workers report equipment status, request part numbers, check maintenance procedures, and log safety incidents. The AI can cross-reference a reported issue against maintenance manuals and provide troubleshooting steps immediately, rather than requiring the worker to call dispatch and wait on hold.
For field operations, integrate your SMS assistant with your multi-agent AI system so that reported issues automatically create tickets in your work order system, notify supervisors, and update asset records.
Offline-First Design Patterns
Building for low-connectivity is different from building for no-connectivity. Your architecture needs to handle both gracefully.
Message Queuing
SMS messages sent to your number while your server is down get queued by the carrier and your SMS gateway. Twilio, for example, retries webhook delivery for up to 24 hours. Design your system to handle burst processing: when your server comes back online, it may receive hundreds of queued messages simultaneously. Use a message queue (SQS, RabbitMQ, or BullMQ) between your webhook endpoint and your AI processing layer to absorb these bursts without overwhelming your LLM API.
Response Caching
Many SMS AI assistants serve communities where users ask similar questions repeatedly. "What is the current maize price in Kampala?" or "What are the side effects of metformin?" Cache responses for common queries using a semantic similarity check. When an inbound message is semantically similar (cosine similarity above 0.92) to a recently answered question, return the cached response instead of making a fresh LLM call. This reduces cost, improves response time, and keeps the system working even if your LLM provider has an outage.
Graceful Degradation
When your LLM API is unavailable, your system should not go silent. Implement a fallback chain: first try your primary LLM, then a secondary provider, then cached responses, and finally a static response like "Our system is temporarily unavailable. Please try again in a few minutes. For emergencies, call [number]." The user should always get some response. Silence erodes trust faster than an honest "try again later."
Local Processing
For deployments in regions with extremely unreliable internet on the server side, consider running a smaller LLM locally. Models like Llama 3 8B or Mistral 7B can run on modest hardware (a single GPU server or even a high-end CPU). The response quality will not match Claude or GPT-4, but for structured, domain-specific queries with a well-tuned system prompt and RAG pipeline, a local model can handle 70 to 80% of requests without any internet dependency.
Cost Per Message Analysis and Optimization
Understanding your true cost per AI-powered SMS interaction is essential for pricing your service and projecting margins. Here is the breakdown.
Direct Messaging Costs
On Twilio with a 10DLC long code in the US: $0.0079 outbound plus $0.0075 inbound per segment. A typical AI conversation involves 3 to 5 exchanges (user sends, AI responds, user follows up, AI responds, user confirms). That is roughly 8 to 10 message segments total, costing $0.06 to $0.08 in pure messaging fees. Internationally, costs vary widely. Sending to Kenya costs about $0.04 per segment. India runs $0.02 to $0.03. European countries range from $0.05 to $0.10.
LLM API Costs
Each AI response requires an LLM API call. With conversation context, a typical SMS interaction sends 500 to 1,500 input tokens and receives 50 to 150 output tokens (remember, you are constraining output length). Using Claude Sonnet at current pricing, that is roughly $0.002 to $0.005 per response. For a 5-exchange conversation, LLM costs total $0.01 to $0.025. If you use response caching, this drops further since cached responses cost zero in LLM fees.
Infrastructure Costs
A modest setup (a single application server, Redis for sessions, PostgreSQL for logging, S3 for media) runs $100 to $300 per month on AWS or GCP. At 10,000 conversations per month, that is $0.01 to $0.03 per conversation in infrastructure costs.
Total Cost Per Conversation
Adding it all up for a US-based deployment: $0.06 to $0.08 (SMS) plus $0.01 to $0.025 (LLM) plus $0.01 to $0.03 (infrastructure) equals $0.08 to $0.135 per conversation. Compare that to a human agent handling the same interaction at $5 to $15 per session, and the economics are compelling. Even at low volumes, an SMS AI assistant costs 50 to 100x less per interaction than human support.
Optimization Strategies
- Response caching: Eliminates LLM costs for repeated queries. In agricultural and healthcare use cases, 30 to 50% of queries are cacheable.
- Shorter responses: Every character saved reduces SMS segment count. Cutting average response length from 400 to 250 characters can eliminate one segment per response.
- Smaller models for simple queries: Route straightforward questions (operating hours, basic pricing, standard protocols) to Claude Haiku or GPT-4o-mini at 10 to 20% of the cost of the full model.
- Number pooling: For high-volume outbound messaging, distribute across multiple numbers to improve throughput without upgrading to shortcodes.
For voice AI applications, per-interaction costs are 5 to 10x higher than SMS due to telephony and speech-to-text costs. SMS is the most cost-effective AI delivery channel that exists.
Architecture, Tech Stack, and Timeline
Here is a production-ready architecture for an SMS AI assistant, along with realistic timelines and costs for each tier.
Recommended Tech Stack
- SMS Gateway: Twilio Programmable Messaging (or Bird/Vonage for international)
- Backend: Python with FastAPI or Node.js with Express. FastAPI is preferred if you are doing any ML or NLP processing alongside LLM calls.
- LLM: Claude Sonnet for primary responses, Claude Haiku for simple query routing
- Session Store: Redis with 30-minute TTL
- Database: PostgreSQL for conversation logs, user profiles, compliance records
- Queue: AWS SQS or BullMQ for inbound message processing
- Cache: Semantic response cache using pgvector or Pinecone
- Monitoring: Helicone or LangSmith for LLM observability, Datadog or Grafana for infrastructure
Project Tiers
MVP (3 to 5 weeks, $15K to $30K): Single-topic SMS AI assistant with Twilio integration, basic session management, LLM-powered responses constrained to SMS length, 10DLC registration, and opt-in/opt-out handling. Suitable for a pilot deployment with a few hundred users.
Production (6 to 10 weeks, $30K to $70K): Multi-topic support with conversation threading, MMS image processing, response caching, graceful degradation, multi-language support, analytics dashboard, and integration with one backend system (CRM, EHR, work order platform). Ready for thousands of active users.
Enterprise (10 to 16 weeks, $70K to $150K): Multi-country deployment with international carrier integrations, shortcode provisioning, HIPAA or GDPR compliance infrastructure, local LLM fallback, advanced analytics with A/B testing on response strategies, multi-agent orchestration, and white-label capabilities. Built for tens of thousands of users across multiple regions.
Getting Started
Start with the MVP. Pick one use case, one country, and one user group. Get 50 real users texting your AI assistant within 4 weeks. Watch the conversation logs. See where the AI fails. Fix those failure modes. Then expand to the next use case or region. The teams that succeed with SMS AI build iteratively, not ambitiously.
We have built SMS-based AI systems for healthcare providers, agricultural services, and field operations teams. If you are considering an SMS AI assistant for your users, Book a free strategy call and we will walk through the architecture, compliance requirements, and cost model for your specific use case.
Need help building this?
Our team has launched 50+ products for startups and ambitious brands. Let's talk about your project.