Why Apple Intelligence Is a Big Deal for App Developers
Apple Intelligence is not just another AI feature announcement. It is the first time Apple has shipped a complete, developer-facing framework for running language models directly on the device. That distinction matters. When you use the Foundation Models framework in your app, the model runs on the Apple Neural Engine. No network call. No API key. No per-token billing. The user's data never leaves their phone.
For years, mobile developers who wanted AI features had two options: send data to a cloud API like OpenAI or Anthropic and deal with latency, cost, and privacy concerns, or attempt to run a smaller open-source model on-device using Core ML and hope it was good enough. Apple Intelligence collapses that trade-off. You get a capable language model, fine-tuned for common app tasks, running locally with system-level optimization that third-party solutions cannot match.
The practical implications are significant. A fitness app can generate personalized workout summaries without sending health data to a server. A journaling app can offer writing suggestions that stay completely private. A customer support tool can draft responses in milliseconds, not seconds. And because the model is integrated at the system level, your app benefits from the same optimizations Apple applies to its own features like Mail summaries and notification digests.
This guide covers what you actually need to know to integrate the Apple Intelligence SDK into your iOS app. Not the marketing pitch from WWDC. The real APIs, device requirements, performance characteristics, and gotchas we have encountered shipping apps with these capabilities.
The Foundation Models Framework: Your Primary Interface
The Foundation Models framework is the core of Apple Intelligence for developers. Introduced alongside iOS 26, it gives you a Swift-native API for running language model inference directly on the Apple Neural Engine. If you have used OpenAI's SDK or Anthropic's API, the patterns will feel familiar, but with important differences that reflect the on-device execution model.
Basic Text Generation
At its simplest, you create a LanguageModelSession, provide a prompt, and get back a response. The API is async/await native, which means it integrates cleanly with SwiftUI's task modifiers and structured concurrency patterns. A basic call looks like creating a session, calling session.respond(to: "Your prompt here"), and reading the response text. The model begins generating tokens immediately with no cold start penalty after the first invocation, because Apple keeps the model warm in memory when Apple Intelligence is active.
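As a minimal sketch of that flow, assuming the iOS 26 SDK (the helper name `summarize` and the prompt wording are illustrative, not part of the framework):

```swift
import FoundationModels

// Hypothetical helper: wraps one prompt/response round trip.
@available(iOS 26.0, macOS 26.0, *)
func summarize(_ text: String) async throws -> String {
    // A session keeps the conversation transcript, so create one per task
    // or per conversation rather than sharing a single global instance.
    let session = LanguageModelSession()

    let prompt: String = "Summarize the following text in two sentences:\n\n\(text)"
    let response = try await session.respond(to: prompt)

    // `content` holds the generated text for a plain string response.
    return response.content
}
```

From SwiftUI you would typically call a helper like this inside a `.task` modifier or a `Task { }` started by a button, and handle the thrown error by falling back to the non-AI path.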
Response times are impressive for on-device inference. In our testing on iPhone 16 Pro, simple generation tasks (summarizing a paragraph, extracting key points, classifying sentiment) complete in 200 to 800 milliseconds. Longer generation tasks, like drafting a multi-paragraph response, stream tokens at roughly 30 tokens per second. That is slower than a cloud API on a fast connection, but the consistency is better. No network variability, no cold starts, no rate limits.
Structured Output with @Generable
One of the most developer-friendly features is the @Generable macro. You annotate a Swift struct with @Generable, and the framework will generate instances of that struct from natural language input. This is Apple's answer to JSON mode in cloud APIs, but it is more tightly integrated. Generation is guided by your struct's property names and types, and by the natural language descriptions you attach with the @Guide macro. You can also use @Guide on individual properties to constrain the output, providing enum-like value lists or descriptive hints that steer the model.
For example, if you have a struct representing a recipe with properties for title, ingredients, cook time, and difficulty level, you annotate it with @Generable, add @Guide annotations to constrain difficulty to "easy," "medium," or "hard," and then call session.respond(to: prompt, generating: Recipe.self). You get back a typed Swift object, not a JSON string you need to parse and validate. This eliminates an entire category of bugs that plague cloud-based LLM integrations.
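The recipe example might look roughly like this. It is a sketch under assumptions: the property names, the guide descriptions, and the `makeRecipe` helper are ours, and the guide syntax should be checked against the SDK version you build with.

```swift
import FoundationModels

@available(iOS 26.0, macOS 26.0, *)
@Generable
struct Recipe {
    @Guide(description: "A short, descriptive recipe title")
    var title: String

    @Guide(description: "Ingredients with quantities, one per entry")
    var ingredients: [String]

    @Guide(description: "Total cook time in minutes")
    var cookTimeMinutes: Int

    // Constrains the generated string to one of three values.
    @Guide(.anyOf(["easy", "medium", "hard"]))
    var difficulty: String
}

// Hypothetical helper: turns a free-form request into a typed Recipe.
@available(iOS 26.0, macOS 26.0, *)
func makeRecipe(from request: String) async throws -> Recipe {
    let session = LanguageModelSession()
    let response = try await session.respond(to: request, generating: Recipe.self)
    return response.content // already a Recipe, no JSON parsing step
}
```

A `@Generable` enum would be an equally reasonable way to model difficulty; the string guide is used here only because it mirrors the description above most directly.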
Streaming Responses
For longer outputs, you will want to stream. The framework provides a streamResponse method on LanguageModelSession that returns an AsyncSequence of partial results. Each partial result contains the text generated so far, so you can update your UI progressively. This is critical for user experience. Nobody wants to stare at a loading spinner for three seconds waiting for a complete response. Streaming makes the interaction feel responsive even when the full generation takes time.
One important detail: streaming on-device behaves differently than streaming from a cloud API. With cloud APIs, you receive tokens as they are generated on the server. With Foundation Models, the framework batches tokens and delivers them in small chunks for UI thread efficiency. The result looks the same to users, but your code should not assume one-token-at-a-time granularity.
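A streaming loop can be sketched as follows. This assumes the release iOS 26 SDK, where each stream element is a snapshot exposing the accumulated text through `content`; the element shape changed during the betas, so treat that as an assumption. The `streamDraft` name and the `onUpdate` callback are illustrative.

```swift
import FoundationModels

// Hypothetical helper: streams a response and reports the text so far.
@available(iOS 26.0, macOS 26.0, *)
func streamDraft(
    for prompt: String,
    onUpdate: @MainActor @Sendable (String) -> Void
) async throws {
    let session = LanguageModelSession()
    let stream = session.streamResponse(to: prompt)

    for try await snapshot in stream {
        // Each snapshot carries everything generated so far, so replace
        // the displayed text instead of appending to it.
        await onUpdate(snapshot.content)
    }
}
```

Because every update is a full snapshot, the code works the same whether the framework delivers one token or a larger chunk at a time.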
Siri Integration with App Intents
Apple Intelligence transforms Siri from a voice command parser into something much closer to an actual assistant, and your app can be part of that. The App Intents framework lets you expose your app's functionality to Siri in a way that Apple Intelligence can reason about. When a user says "summarize my last workout in FitTrack," Siri uses Apple Intelligence to understand the request, identifies that your app's intent can fulfill it, and executes the action.
This is not the old SiriKit approach where you had to match rigid domain templates. App Intents are flexible. You define what your app can do, what parameters each action takes, and what it returns. Apple Intelligence handles the natural language understanding. Your app just needs to implement the intent and return a result.
Building an App Intent
An App Intent is a Swift struct conforming to the AppIntent protocol. You give it a title, optional description, and define its parameters using the @Parameter property wrapper. The perform() method contains your business logic. When Siri triggers the intent, your perform method runs and returns an IntentResult that Siri presents to the user.
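Here is a small sketch of such an intent. `SummarizeWorkoutIntent` and `WorkoutStore` are hypothetical names, and the store is a stub that stands in for your own data layer and returns placeholder text.

```swift
import AppIntents

// Hypothetical stand-in for the app's data layer (placeholder output).
struct WorkoutStore {
    func summary(forWorkoutNamed name: String) async throws -> String {
        "Summary for \(name) goes here."
    }
}

struct SummarizeWorkoutIntent: AppIntent {
    static let title: LocalizedStringResource = "Summarize Workout"
    static let description: IntentDescription? =
        IntentDescription("Summarizes one of your recent workouts.")

    // Siri can ask the user for this value if it is missing.
    @Parameter(title: "Workout Name")
    var workoutName: String

    // Business logic runs here when Siri, Shortcuts, or Spotlight triggers the intent.
    func perform() async throws -> some IntentResult & ProvidesDialog {
        let summary = try await WorkoutStore().summary(forWorkoutNamed: workoutName)
        return .result(dialog: "\(summary)")
    }
}
```

Keeping the intent this thin, with the real work delegated to the data layer, is what makes it easy to compose with other intents later.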
The key to good Siri integration is making your intents discoverable and composable. Apple Intelligence works best when it can chain multiple intents together. If your app has separate intents for "get recent workouts," "summarize workout," and "share summary," Siri can compose them: "Summarize my last run and send it to my trainer." Each intent stays simple and focused, and the intelligence layer handles orchestration.
Spotlight and Semantic Search
App Intents also feed into Spotlight, which Apple Intelligence has supercharged with semantic search. When your app donates content to Spotlight using CSSearchableItem with rich attributes, Apple Intelligence can surface that content in response to natural language queries. A user searching for "that Italian recipe I saved last month" can find it even if the title is "Cacio e Pepe" because the semantic layer understands the relationship. For apps with significant content, like note-taking, recipe, or bookmark apps, this integration alone can justify the development investment.
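Donating an item with rich attributes can be sketched like this. `SavedRecipe`, its fields, and the "recipes" domain identifier are assumptions made for illustration; the Core Spotlight calls are the standard indexing API.

```swift
import CoreSpotlight
import UniformTypeIdentifiers

// Hypothetical model type for a saved recipe.
struct SavedRecipe {
    let id: String
    let title: String
    let summary: String
    let cuisine: String
    let savedAt: Date
}

func donateToSpotlight(_ recipe: SavedRecipe) async throws {
    let attributes = CSSearchableItemAttributeSet(contentType: .content)
    attributes.title = recipe.title                  // e.g. "Cacio e Pepe"
    attributes.contentDescription = recipe.summary   // free text the index can match against
    attributes.keywords = [recipe.cuisine, "recipe"] // e.g. "Italian"
    attributes.contentCreationDate = recipe.savedAt  // supports "last month" style queries

    let item = CSSearchableItem(
        uniqueIdentifier: recipe.id,
        domainIdentifier: "recipes",
        attributeSet: attributes
    )
    try await CSSearchableIndex.default().indexSearchableItems([item])
}
```

The richer the description, keywords, and dates you supply, the more the search layer has to work with when the query does not match the title.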
We have found that apps with well-structured App Intents see measurably higher engagement through Siri and Spotlight. Users discover features they did not know existed simply because Siri can now surface them contextually. For a deeper look at the trade-offs between on-device and cloud-based AI approaches, see our comparison of on-device AI versus cloud AI.
Writing Tools API: System-Wide Text Intelligence
The Writing Tools API is one of the most underappreciated parts of Apple Intelligence for developers. If your app has any text input fields, you get basic Writing Tools support for free. Users can select text and access Proofread, Rewrite, and Summarize from the system context menu. But the real power comes when you adopt a writing tools coordinator (UIWritingToolsCoordinator on iOS, NSWritingToolsCoordinator on macOS) and build custom behaviors.
Default Behavior and Custom Adoption
Any standard UITextView or NSTextView automatically supports Writing Tools. Users long-press or right-click selected text and see options to proofread, rewrite in different tones (friendly, professional, concise), or summarize. If you are using standard text components, you do not need to write a single line of code. This is Apple's "it just works" philosophy applied to AI.
For custom text editors or rich content views, you attach a UIWritingToolsCoordinator (NSWritingToolsCoordinator on macOS) to your view and implement its delegate. The delegate gives you callbacks when the user invokes a writing tool, lets you provide the source text (which might differ from what is displayed, as in a Markdown editor), and receive the transformed result. You control how the result is applied to your content model. This is essential for apps with custom text rendering, attributed strings, or non-standard editing surfaces.
Building Custom Writing Tool Experiences
Beyond the system defaults, you can register custom writing tool actions that are specific to your app's domain. A legal document app might offer "Simplify Legal Language" as a writing tool. An email client might offer "Make More Diplomatic." You define these using the Writing Tools API and they appear alongside Apple's built-in options in the context menu.
The implementation uses the Foundation Models framework under the hood, so your custom writing tools run on-device with the same privacy guarantees. You provide a prompt template, define input and output expectations, and the framework handles model invocation and result delivery. The user experience feels native because it is native. There is no web view, no loading screen pointing to a remote server, no "AI powered by" badge. It just looks like part of the operating system.
For apps that handle sensitive text, like healthcare, legal, or financial apps, this is a compelling feature. You can offer AI-powered writing assistance with a straight face in a compliance review because no data leaves the device. That is a conversation-ending advantage over cloud-based alternatives.
On-Device Processing vs. Private Cloud Compute
Not everything runs on the device. Apple Intelligence uses a tiered architecture. Simple tasks (summaries, classification, short generation) run entirely on the Apple Neural Engine. More complex tasks, like processing very long documents or performing multi-step reasoning, can be routed to Private Cloud Compute (PCC), Apple's server-side AI infrastructure.
Here is what developers need to understand about this split:
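- You do not control routing. Apple decides whether a system feature runs on-device or on PCC based on the complexity of the task and the device's capabilities. Note that Apple documents the Foundation Models framework's system model as an on-device model, so check the current documentation before assuming your own LanguageModelSession calls can be routed to PCC.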
- PCC maintains privacy guarantees. Apple designed PCC so that data is processed in a secure enclave, never stored, never logged, and never accessible to Apple employees. Independent audits and publicly inspectable server images back this up. It is not the same as sending data to a third-party API.
- Latency differs. On-device inference starts generating tokens in under 100 milliseconds. PCC requests add network round-trip time, typically 200 to 500 milliseconds before the first token. Your UI should handle both scenarios gracefully.
- Availability differs. On-device processing works in airplane mode. PCC requires a network connection. If you are building features for users who might be offline (field workers, travelers, rural areas), design your UX to degrade gracefully when PCC is unavailable.
For most app developers, the practical takeaway is simple: use the Foundation Models framework, design your prompts to be concise, and trust Apple's routing. If you need guaranteed on-device execution (for regulatory or compliance reasons), check SystemLanguageModel.default.availability to verify that the on-device model is available before sending your request. If you want a broader perspective on when on-device AI makes sense versus cloud alternatives, our on-device AI vs. cloud AI guide covers the decision framework in detail.
One thing worth noting: Private Cloud Compute is only available for requests made through Apple's frameworks. You cannot use PCC as a general-purpose cloud inference endpoint. It is exclusively a fallback for Apple Intelligence tasks that exceed on-device capacity.
Device Requirements and Implementation Constraints
Apple Intelligence is not available on every device, and this has real implications for how you architect your app. Here are the hard requirements as of iOS 26:
- iPhone: iPhone 16 and later (A18 chip minimum). iPhone 15 Pro and 15 Pro Max (A17 Pro) are also supported. Standard iPhone 15 and earlier are not.
- iPad: M1 chip or later, plus iPad mini with A17 Pro. This excludes many iPads still in active use, including recent base-model iPads with older A-series chips.
- Mac: M1 or later. Intel Macs are completely unsupported.
- RAM: 8 GB minimum. This is the real bottleneck. The on-device model requires significant memory, and Apple reserves capacity for the model even when your app is not actively using it.
What this means in practice: a substantial portion of your user base probably cannot use Apple Intelligence features. As of late 2028, roughly 45 to 55 percent of active iPhones in the US support Apple Intelligence. Globally, the percentage is lower. You must build every Apple Intelligence feature with a fallback path for unsupported devices.
Checking Availability at Runtime
The framework provides clean availability checks. Use SystemLanguageModel.default.availability (or its isAvailable convenience) to verify that the device supports Apple Intelligence, that the user has it turned on, and that the model is ready to use. Do not assume availability based on device model alone. Users can disable Apple Intelligence in Settings, and enterprise-managed devices may have it restricted by MDM policy.
Your UI should adapt, not break. If Apple Intelligence is unavailable, hide AI-specific features or replace them with non-AI alternatives. Do not show a grayed-out "Summarize" button with a tooltip saying "Requires iPhone 16." That is a bad experience. Either the feature works or it does not appear.
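One way to turn the availability check into a UI decision is sketched below. The `AIFeatureState` enum and the function name are hypothetical; the availability cases are the ones the framework exposes in the iOS 26 SDK.

```swift
import FoundationModels

// Hypothetical app-level state used to decide what the UI shows.
enum AIFeatureState {
    case ready      // show the AI feature
    case preparing  // model assets still downloading; optionally show a hint
    case hidden     // unsupported device or Apple Intelligence turned off
}

@available(iOS 26.0, macOS 26.0, *)
func currentAIFeatureState() -> AIFeatureState {
    switch SystemLanguageModel.default.availability {
    case .available:
        return .ready
    case .unavailable(.modelNotReady):
        return .preparing
    default:
        // Covers ineligible devices, Apple Intelligence being disabled,
        // and any unavailability reasons added in later SDKs.
        return .hidden
    }
}
```

Mapping every unavailable reason except the temporary one to `hidden` matches the advice above: the feature either works or does not appear.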
Memory and Performance Considerations
On-device model inference consumes significant memory and compute resources. During active inference, expect your app's memory footprint to increase by 100 to 200 MB temporarily. The system manages this aggressively, so you will not see an out-of-memory crash, but your app may experience increased background termination rates if you are already memory-heavy.
Battery impact is measurable but manageable for typical usage patterns. A single summarization task consumes roughly the same energy as loading a complex web page. But if your app triggers dozens of inference requests per session, battery drain becomes noticeable. Profile your energy usage with Instruments and batch inference requests where possible.
Thermal throttling is the constraint most developers overlook. If the device is already warm (from gaming, GPS navigation, or heavy camera use), the Neural Engine may throttle, and inference times can double or triple. Always set reasonable timeouts and provide feedback to users when processing takes longer than expected.
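A timeout can be added with plain structured concurrency, independent of any Apple Intelligence API. The sketch below races the work against a sleeping task; `withTimeout` and `TimeoutError` are our own names, not framework types.

```swift
import Foundation

struct TimeoutError: Error {}

/// Runs `operation`, throwing `TimeoutError` if it takes longer than `seconds`.
func withTimeout<T: Sendable>(
    seconds: Double,
    operation: @escaping @Sendable () async throws -> T
) async throws -> T {
    try await withThrowingTaskGroup(of: T.self) { group in
        group.addTask { try await operation() }
        group.addTask {
            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
            throw TimeoutError()
        }
        // Whichever child finishes first decides the outcome; cancel the other.
        defer { group.cancelAll() }
        guard let first = try await group.next() else { throw TimeoutError() }
        return first
    }
}
```

You would wrap an inference call in it, for example `try await withTimeout(seconds: 10) { try await session.respond(to: prompt).content }`, and show a "taking longer than usual" message on `TimeoutError`. Cancellation is cooperative, so the helper only returns promptly if the wrapped work responds to cancellation.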
App Store Review and Responsible AI Practices
Apple has specific App Store Review Guidelines for apps that use Apple Intelligence features, and they are stricter than you might expect. Here are the key rules as of 2028:
- Transparency: If your app generates text, images, or other content using AI, you must make this clear to users. This applies to Foundation Models framework usage, not just third-party AI APIs. A small "Generated with AI" label or similar indicator is sufficient.
- User control: Users must be able to opt out of AI features without losing access to core app functionality. If your app is primarily an AI tool, this does not apply, but if you are adding AI features to an existing app, they must be optional.
- No misleading claims: Do not market AI-generated content as human-created. Do not imply that your app's AI capabilities exceed what Apple Intelligence actually provides. Apple reviewers will test your AI features and reject apps that overstate their capabilities.
- Content safety: Apple Intelligence has built-in content safety filters. Do not attempt to bypass them through prompt engineering or by processing model output to remove safety markers. Apple considers this a policy violation.
Prompt Engineering Best Practices
The on-device model is smaller and more focused than models like GPT-4o or Claude. Your prompts need to be more specific and structured than what you might use with a cloud API. Vague prompts produce vague results. Here is what works:
- Be explicit about format. Instead of "summarize this," say "summarize this in exactly three bullet points, each under 20 words."
- Provide context through system instructions. The LanguageModelSession initializer accepts instructions (the equivalent of a system prompt) that persist across the conversation. Use them to establish your app's domain and constraints.
- Use @Generable for structured tasks. If you need structured output, always use the @Generable macro instead of asking the model to generate JSON. The structured output path is more reliable and faster.
- Keep inputs concise. The on-device model has a smaller context window than cloud models. For long documents, chunk your input and process segments individually, then combine the results in your app logic.
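The instructions and chunking advice above can be combined in a sketch like the following. The chunker is plain Swift; the 4,000-character budget, the helper names, and the instruction wording are assumptions to tune for your own content, not framework limits.

```swift
import Foundation

/// Splits text into chunks of at most `maxCharacters`, preferring paragraph breaks.
func chunk(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0)
    var chunks: [String] = []
    var current = ""
    for paragraph in text.components(separatedBy: "\n\n") {
        var remaining = Substring(paragraph)
        // Hard-split any single paragraph that exceeds the budget on its own.
        while remaining.count > maxCharacters {
            if !current.isEmpty { chunks.append(current); current = "" }
            chunks.append(String(remaining.prefix(maxCharacters)))
            remaining = remaining.dropFirst(maxCharacters)
        }
        if remaining.isEmpty { continue }
        if current.isEmpty {
            current = String(remaining)
        } else if current.count + 2 + remaining.count <= maxCharacters {
            current += "\n\n" + String(remaining)
        } else {
            chunks.append(current)
            current = String(remaining)
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}

#if canImport(FoundationModels)
import FoundationModels

// Hypothetical helper: summarizes each chunk, then joins the partial summaries.
@available(iOS 26.0, macOS 26.0, *)
func summarizeLongDocument(_ text: String) async throws -> String {
    var partials: [String] = []
    for piece in chunk(text, maxCharacters: 4_000) {
        // A fresh session per chunk keeps earlier chunks out of the context window.
        let session = LanguageModelSession(
            instructions: "You summarize workout journal entries for a fitness app. Be factual and brief."
        )
        let prompt: String = "Summarize this in exactly three bullet points, each under 20 words:\n\n\(piece)"
        partials.append(try await session.respond(to: prompt).content)
    }
    return partials.joined(separator: "\n")
}
#endif
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to matter more for summary quality than filling every chunk to the limit.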
For a complete walkthrough of building an on-device AI mobile app from scratch, including architecture patterns that apply beyond Apple's ecosystem, check out our guide to building on-device AI mobile apps.
Getting Started Today
The best way to start is small. Pick one feature in your app where AI adds genuine value, not novelty, and implement it using Foundation Models. A text summarization feature, a smart compose suggestion, or an intent that makes your app accessible through Siri. Ship it, measure how users engage with it, and iterate. Do not try to "AI-ify" your entire app in one release. Users are skeptical of AI features that feel bolted on, and rightfully so.
If you are building an iOS app and want to integrate Apple Intelligence the right way, with proper fallbacks, clean architecture, and an eye toward App Store approval, we can help. We have been shipping apps with on-device AI capabilities since the first Apple Intelligence betas, and we know where the pitfalls are. Book a free strategy call and let us walk through your use case together.