On-Device AI for Apple Platforms
Guide for selecting, deploying, and optimizing on-device ML models. Covers Apple
Foundation Models, Core ML, MLX Swift, and llama.cpp.
Framework Selection Router
Use this decision tree to pick the right framework for your use case.
Apple Foundation Models
When to use: Text generation, summarization, entity extraction, structured
output, and short dialog on iOS 26+ / macOS 26+ devices with Apple Intelligence
enabled. Zero setup -- no API keys, no network, no model downloads.
Best for:
- Generating text or structured data with @Generable types
- Summarization, classification, content tagging
- Tool-augmented generation with the Tool protocol
- Apps that need guaranteed on-device privacy
Not suited for: Complex math, code generation, tasks that depend on factual
accuracy, or apps targeting pre-iOS 26 devices.
Core ML
When to use: Deploying custom trained models (vision, NLP, audio) across all
Apple platforms. Converting models from PyTorch, TensorFlow, or scikit-learn
with coremltools.
Best for:
- Image classification, object detection, segmentation
- Custom NLP classifiers, sentiment analysis models
- Audio/speech models via SoundAnalysis integration
- Any scenario needing Neural Engine optimization
- Models requiring quantization, palettization, or pruning
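As a minimal sketch of the Neural Engine point above, the snippet below loads a compiled model and lets Core ML schedule work across the CPU, GPU, and Neural Engine. The function name `loadModel` and the `compiledModelURL` parameter are illustrative, not part of the original guide; the URL is assumed to point at a compiled `.mlmodelc` bundle.

```swift
import CoreML

/// Loads a compiled Core ML model and lets the system pick the best
/// compute units for each layer. `compiledModelURL` is a placeholder for
/// wherever your app stores the compiled .mlmodelc bundle.
func loadModel(at compiledModelURL: URL) throws -> MLModel {
    let configuration = MLModelConfiguration()
    // .all allows CPU, GPU, and Neural Engine; .cpuAndNeuralEngine or
    // .cpuOnly can be used to narrow the choice when profiling.
    configuration.computeUnits = .all
    return try MLModel(contentsOf: compiledModelURL, configuration: configuration)
}
```

Setting `computeUnits` only expresses what Core ML may use; whether a given layer actually runs on the Neural Engine depends on the model and device.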
MLX Swift
When to use: Running specific open-source LLMs (Llama, Mistral, Qwen, Gemma)
on Apple Silicon with maximum throughput. Research and prototyping.
Best for:
- Highest sustained token generation on Apple Silicon
- Running Hugging Face models from mlx-community
- Research requiring automatic differentiation
- Fine-tuning workflows on Mac
llama.cpp
When to use: Cross-platform LLM inference using GGUF model format. Production
deployments needing broad device support.
Best for:
- GGUF quantized models (Q4_K_M, Q5_K_M, Q8_0)
- Cross-platform apps (iOS + Android + desktop)
- Maximum compatibility with open-source model ecosystem
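When choosing among the quantization levels listed above, a rough size estimate helps decide what fits on a target device. The sketch below is an approximation under stated assumptions: Q8_0 is 8.5 bits per weight by construction, while the Q4_K_M and Q5_K_M figures are rough averages that vary by model. The type and function names are illustrative.

```swift
import Foundation

/// Approximate bits per weight for common GGUF quantization types.
/// Q8_0 stores 32 eight-bit weights plus one 16-bit scale (8.5 bits each);
/// the K-quant values are assumed averages and differ between models.
enum GGUFQuant: Double {
    case q4KM = 4.8
    case q5KM = 5.7
    case q8_0 = 8.5
}

/// Estimates the weight file size in gigabytes (10^9 bytes).
/// This excludes the KV cache and runtime buffers, so real memory
/// use at inference time is higher.
func estimatedWeightGB(parameters: Double, quant: GGUFQuant) -> Double {
    parameters * quant.rawValue / 8 / 1_000_000_000
}

// Example: a 3-billion-parameter model at Q4_K_M.
let sizeGB = estimatedWeightGB(parameters: 3e9, quant: .q4KM)
print(String(format: "%.1f GB", sizeGB))
```

Treat the result as a lower bound for planning, and measure actual memory on the oldest device you intend to support.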
Quick Reference
| Scenario | Framework |
|---|---|
| Text generation, zero setup (iOS 26+) | Foundation Models |
| Structured output from on-device LLM | Foundation Models (@Generable) |
| Image classification, object detection | Core ML |
| Custom model from PyTorch/TensorFlow | Core ML + coremltools |
| Running specific open-source LLMs | MLX Swift or llama.cpp |
| Maximum throughput on Apple Silicon | MLX Swift |
| Cross-platform LLM inference | llama.cpp |
| OCR and text recognition | Vision framework |
| Sentiment analysis, NER, tokenization | Natural Language framework |
| Training custom classifiers on device | Create ML |
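The routing guidance above can be read as a small decision function. The sketch below is one possible encoding that covers only the four main frameworks. The `Requirements` fields, the order of the checks, and the fallback for older OS versions are assumptions made for illustration, not rules stated by the guide.

```swift
enum Framework: String {
    case foundationModels = "Foundation Models"
    case coreML = "Core ML"
    case mlxSwift = "MLX Swift"
    case llamaCpp = "llama.cpp"
}

/// Hypothetical summary of what an app needs from on-device ML.
struct Requirements {
    var usesCustomTrainedModel = false  // vision, audio, or NLP model you trained
    var needsCrossPlatform = false      // must also run on Android or desktop
    var needsSpecificOpenModel = false  // e.g. a particular Llama or Qwen checkpoint
    var canRequireOS26 = true           // every target device runs iOS 26+ / macOS 26+
}

/// Checks the most constraining requirement first.
func recommendFramework(for r: Requirements) -> Framework {
    if r.usesCustomTrainedModel { return .coreML }
    if r.needsCrossPlatform { return .llamaCpp }
    if r.needsSpecificOpenModel { return .mlxSwift }
    if r.canRequireOS26 { return .foundationModels }
    // Assumption: text generation on older OS versions falls back to a
    // bundled open model; MLX Swift would be an equally valid choice.
    return .llamaCpp
}

print(recommendFramework(for: Requirements()).rawValue)
```

Real projects often combine frameworks, for example Core ML for vision and Foundation Models for summarization, so treat the single return value as a starting point.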
Apple Foundation Models Overview
On-device language model optimized for Apple Silicon. Available on devices
supporting Apple Intelligence (iOS 26+, macOS 26+).
- Token budget covers input + output; check contextSize for the limit
- Check supportedLanguages for supported locales
- Guardrails always enforced, cannot be disabled
Availability Checking (Required)
Always check availability before creating a session, and handle every unavailable case gracefully. Never crash on unavailability.
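import FoundationModels

switch SystemLanguageModel.default.availability {
case .available:
    break // Ready: create a session and enable AI features
case .unavailable(.appleIntelligenceNotEnabled):
    break // Ask the user to turn on Apple Intelligence in Settings
case .unavailable(.modelNotReady):
    break // Model assets are not ready yet; retry later
case .unavailable(.deviceNotEligible):
    break // Hardware unsupported; hide or replace the feature
default:
    break // Any other or future reason: treat as unavailable
}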
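One way to act on each case is to map availability to an optional user-facing message, which keeps the check testable and in one place. This is a sketch: the function name and the message strings are placeholders, not text from Apple or from this guide.

```swift
import FoundationModels

/// Returns nil when the model is ready, otherwise a short explanation
/// that the UI can show. All strings here are illustrative placeholders.
func unavailabilityMessage(
    for availability: SystemLanguageModel.Availability = SystemLanguageModel.default.availability
) -> String? {
    switch availability {
    case .available:
        return nil
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is not ready yet. Please try again shortly."
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    default:
        // Covers any reason added in a later OS release.
        return "The on-device model is currently unavailable."
    }
}
```

A view can call `unavailabilityMessage()` on appearance and show either the feature or the returned text.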
Session Management
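// The three declarations below are alternatives; use one per session.

// Default session with no instructions
let session = LanguageModelSession()

// Session with instructions that steer every response
let session = LanguageModelSession {
    "You are a helpful cooking assistant."
}

// Session with tools (weatherTool and recipeTool are Tool instances defined elsewhere)
let session = LanguageModelSession(
    tools: [weatherTool, recipeTool]
) {
    "You are a helpful assistant with access to tools."
}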
Key rules:
- Sessions are stateful -- multi-turn conversations maintain context automatically
- One request at a time per session (check session.isResponding)
- Call session.prewarm() before user interaction for faster first response
- Save/restore transcripts: LanguageModelSession(model: model, tools: [], transcript: savedTranscript)
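The rules above might be combined in a small wrapper like the one below. It is a sketch under assumptions: `ChatController` and its method names are hypothetical, and returning nil for an overlapping request is just one possible policy (queuing or cancelling are others).

```swift
import FoundationModels

/// Hypothetical wrapper that owns one stateful session.
final class ChatController {
    private let session = LanguageModelSession(
        instructions: "You are a helpful cooking assistant."
    )

    /// Call when the input UI appears so the first reply starts faster.
    func prepare() {
        session.prewarm()
    }

    /// Sends a prompt unless a request is already in flight.
    /// Returns nil when the session is busy instead of starting a
    /// second, concurrent request.
    func send(_ prompt: String) async throws -> String? {
        guard !session.isResponding else { return nil }
        let response = try await session.respond(to: prompt)
        return response.content
    }
}
```

Because the session is stateful, reusing the same `ChatController` across turns keeps earlier messages in context.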
Structured Output with @Generable
The @Generable macro creates compile-time schemas for type-safe output:
@Generable
struct Recipe {
    @Guide(description: "The recipe name")
    var name: String

    @Guide(description: "Cooking steps", .count(3))
    var steps: [String]

    @Guide(description: "Prep time in minutes", .range(1...120))
    var prepTime: Int
}

let response = try await session.respond(
    to: "Suggest a quick pasta recipe",
    generating: Recipe.self
)
print(response.content.name)
@Guide Constraints
| Constraint | Purpose |
|---|---|
| description: | Natural language hint for generation |
| .anyOf([values]) | Restrict to enumerated string values |
| .count(n) | Fixed array length |
| .range(min...max) | Numeric range |
| .minimum(n) / .maximum(n) | One-sided numeric bound |
| .minimumCount(n) / .maximumCount(n) | Array length bounds |
| .constant(value) | Always returns this value |
| .pattern(regex) | String format enforcement |
| .element(guide) | Guide applied to each array element |
Properties generate in declaration order. Place foundational data before
dependent data for better results.
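To illustrate both the constraints and the ordering advice, here is a sketch of a tagging schema. The `ArticleTags` type and its fields are invented for this example. The summary is declared first so that the later fields are generated with it already in context.

```swift
import FoundationModels

/// Hypothetical schema for tagging an article.
@Generable
struct ArticleTags {
    // Declared first: later properties are generated after this one.
    @Guide(description: "One-sentence summary of the article")
    var summary: String

    // Restricts the output to one of three fixed strings.
    @Guide(description: "Overall tone", .anyOf(["positive", "neutral", "negative"]))
    var tone: String

    // Allows up to five tags rather than a fixed count.
    @Guide(description: "Topic tags, most relevant first", .maximumCount(5))
    var tags: [String]

    // One-sided bound: at least one minute.
    @Guide(description: "Estimated reading time in minutes", .minimum(1))
    var readingMinutes: Int
}
```

A session would produce this type with `respond(to:generating:)`, in the same way as the `Recipe` example.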
Streaming Structured Output
let stream = session.streamResponse(
    to: "Suggest a recipe",
    generating: Recipe.self
)
for try await snapshot in stream {
    // Properties of a partial snapshot are optional until generated
    if let name = snapshot.content.name { updateNameLabel(name) }
}
Tool Calling
struct WeatherTool: Tool {
    let name = "weather"
    let description = "Get current weather for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city name")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        // fetchWeather is an app-defined helper, not part of the framework
        let weather = try await fetchWeather(arguments.city)
        return weather.description
    }
}
Register tools at session creation. The model invokes them autonomously.
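Registration might look like the sketch below. To stay self-contained it uses a hypothetical `StubWeatherTool` that returns canned text instead of calling a weather service. The instruction string, the prompt, and the function name are also illustrative.

```swift
import FoundationModels

/// Hypothetical stand-in for a real weather tool; returns canned text.
struct StubWeatherTool: Tool {
    let name = "weather"
    let description = "Get current weather for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city name")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        "Light rain in \(arguments.city), 14°C"
    }
}

/// Creates a session with the tool registered and asks a question that
/// the model may choose to answer by calling it.
func askAboutWeather() async throws -> String {
    let session = LanguageModelSession(
        tools: [StubWeatherTool()],
        instructions: "Use the weather tool when asked about current conditions."
    )
    let response = try await session.respond(to: "Do I need an umbrella in Lisbon today?")
    return response.content
}
```

The app never calls the tool directly; whether and when it runs is the model's decision, so tool code should tolerate being skipped or called more than once.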
Error Handling
do {
let response = try await session.respond(to: prompt)
} catch let error as LanguageModelSession.GenerationError {
switch error {
case .guardrailViolation(let context):
case .exceededContextWindowSize(let context):
case .concurrentRequests(let context):
case .unsupportedLanguageOrLocale(let context):
case .unsupportedGuide(let context):
case .assetsUnavailable(let context):
case .refusal(let refusal, _):
case .rateLimited(let context)