Foundation Models: On-Device AI for Apple Platforms
When to Use This Skill
Use when:
- Implementing on-device AI features with Foundation Models
- Adding text summarization, classification, or extraction capabilities
- Creating structured output from LLM responses
- Building tool-calling patterns for external data integration
- Streaming generated content for better UX
- Debugging Foundation Models issues (context overflow, slow generation, wrong output)
- Deciding between Foundation Models vs server LLMs (ChatGPT, Claude, etc.)
Related Skills
- Use axiom-foundation-models-diag for systematic troubleshooting (context exceeded, guardrail violations, availability problems)
- Use axiom-foundation-models-ref for complete API reference with all WWDC code examples
Red Flags: Anti-Patterns That Will Fail
❌ Using for World Knowledge
Why it fails: The on-device model has 3 billion parameters and is optimized for summarization, extraction, and classification, NOT world knowledge or complex reasoning.
Example of wrong use:
let session = LanguageModelSession()
let response = try await session.respond(to: "What's the capital of France?")
Why: Model will hallucinate or give low-quality answers. It's trained for content generation, not encyclopedic knowledge.
Correct approach: Use server LLMs (ChatGPT, Claude) for world knowledge, or provide factual data through Tool calling.
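❌ Blocking Main Thread
Why it fails: session.respond() is async. A Button action is a synchronous closure, so awaiting inside it directly does not compile, and any workaround that blocks the main thread until generation finishes freezes the UI for seconds.
Example of wrong use:
Button("Generate") {
    let response = try await session.respond(to: prompt)
}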
Why: Generation takes 1-5 seconds. User sees frozen app, bad reviews follow.
Correct approach:
Button("Generate") {
    // Task bridges the synchronous button action into async code.
    Task {
        let response = try await session.respond(to: prompt)
    }
}
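The Task pattern above leaves out progress state and error handling. The following is a minimal sketch of one way to fill that in, not a definitive implementation. GenerateView, output, and isGenerating are hypothetical names introduced for illustration, and the code assumes a deployment target where the FoundationModels framework is available.

```swift
import SwiftUI
import FoundationModels

// Hypothetical view for illustration; the names are not from any Apple sample.
struct GenerateView: View {
    @State private var session = LanguageModelSession()
    @State private var output = ""
    @State private var isGenerating = false
    let prompt: String

    var body: some View {
        VStack {
            Text(output)
            Button("Generate") {
                isGenerating = true
                // Task bridges the synchronous button action into async code,
                // so the main thread stays free while the model generates.
                Task {
                    defer { isGenerating = false }
                    do {
                        let response = try await session.respond(to: prompt)
                        output = response.content
                    } catch {
                        output = "Generation failed: \(error.localizedDescription)"
                    }
                }
            }
            // Prevents a second request while one is still in flight.
            .disabled(isGenerating)
        }
    }
}
```

Disabling the button while a request is running is a design choice for this sketch: it avoids sending a second request to the same session before the first one finishes.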
❌ Manual JSON Parsing
Why it fails: Prompting for JSON and parsing with JSONDecoder leads to hallucinated keys, invalid JSON, no type safety.
Example of wrong use:
let prompt = "Generate a person with name and age as JSON"
let response = try await session.respond(to: prompt)
let data = response.content.data(using: .utf8)!
let person = try JSONDecoder().decode(Person.self, from: data)
Why: Model might output {firstName: "John"} when you expect {name: "John"}. Or invalid JSON entirely.
Correct approach:
@Generable
struct Person {
    let name: String
    let age: Int
}
let response = try await session.respond(
    to: "Generate a person",
    generating: Person.self
)
❌ Ignoring Availability Check
Why it fails: Foundation Models only runs on Apple Intelligence devices in supported regions. Without the check, the app crashes or shows errors on every other device.
Example of wrong use:
let session = LanguageModelSession()
Correct approach:
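switch SystemLanguageModel.default.availability {
case .available:
    // Safe to create a session and enable the feature.
    let session = LanguageModelSession()
case .unavailable(let reason):
    // Hide or disable the feature and tell the user why.
    print("Foundation Models unavailable: \(reason)")
}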
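The unavailable case carries a reason that can drive the fallback UI. The sketch below maps each reason to a user-facing message. The function name and the message strings are hypothetical; the reason cases used here (deviceNotEligible, appleIntelligenceNotEnabled, modelNotReady) are the ones I understand the framework to expose, so verify them against the SDK you build with.

```swift
import FoundationModels

/// Hypothetical helper: returns a message to show when the model cannot be
/// used, or nil when the model is ready.
func unavailabilityMessage(for availability: SystemLanguageModel.Availability) -> String? {
    switch availability {
    case .available:
        return nil
    case .unavailable(.deviceNotEligible):
        return "This device does not support Apple Intelligence."
    case .unavailable(.appleIntelligenceNotEnabled):
        return "Turn on Apple Intelligence in Settings to use this feature."
    case .unavailable(.modelNotReady):
        return "The model is still being prepared. Try again later."
    case .unavailable:
        // Catch-all for reasons this sketch does not know about.
        return "On-device AI is currently unavailable."
    }
}
```

A call site might look like `unavailabilityMessage(for: SystemLanguageModel.default.availability)`, with a non-nil result shown in place of the AI feature.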
❌ Single Huge Prompt
Why it fails: 4096 token context window (input + output). One massive prompt hits limit, gives poor results.
Example of wrong use:
let prompt = """
Generate a 7-day itinerary for Tokyo including hotels, restaurants,
activities for each day, transportation details, budget breakdown...
"""
Correct approach: Break into smaller tasks, use tools for external data, multi-turn conversation.
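One way to break the itinerary request into smaller tasks is to ask for one day at a time. This is a rough sketch under stated assumptions: generateItinerary, the instruction text, and the per-day prompt wording are all hypothetical, and a fresh session per day is only one possible design.

```swift
import FoundationModels

/// Hypothetical helper: generates an itinerary one day at a time instead of
/// sending a single huge prompt.
func generateItinerary(days: Int, city: String) async throws -> [String] {
    guard days >= 1 else { return [] }
    var plans: [String] = []
    for day in 1...days {
        // A fresh session per day keeps each request small relative to the
        // context window, at the cost of losing earlier days as context.
        let session = LanguageModelSession(
            instructions: "You are a travel planner. Keep each day plan brief."
        )
        let response = try await session.respond(
            to: "Plan day \(day) of a \(days)-day trip to \(city)."
        )
        plans.append(response.content)
    }
    return plans
}
```

Reusing a single session across days would preserve continuity between days, but the transcript then grows with every turn and counts against the same context window.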
❌ Not Handling Generation Errors
Why it fails: Three errors MUST be handled or your app will crash in production.
do {
    let response = try await session.respond(to: prompt)
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Context window is full: continue in a new session.
} catch LanguageModelSession.GenerationError.guardrailViolation {
    // Content was flagged by the safety guardrails: show a neutral message.
} catch LanguageModelSession.GenerationError.unsupportedLanguageOrLocale {
    // The input language is not supported: tell the user.
}
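The three catch clauses can also be centralized in one place. The sketch below is a hypothetical helper (userMessage and its strings are not part of the framework) that maps a thrown error to text for the UI, with fallbacks for any other error.

```swift
import FoundationModels

/// Hypothetical helper: maps a thrown error to a message suitable for the UI.
func userMessage(for error: Error) -> String {
    guard let generationError = error as? LanguageModelSession.GenerationError else {
        return "Something went wrong. Please try again."
    }
    switch generationError {
    case .exceededContextWindowSize:
        return "This conversation got too long. Please start a new one."
    case .guardrailViolation:
        return "That request can't be processed."
    case .unsupportedLanguageOrLocale:
        return "This language isn't supported yet."
    default:
        // Other generation errors are not covered by this sketch.
        return "Generation failed. Please try again."
    }
}
```

A call site would wrap session.respond(to:) in do/catch and pass the caught error to this function. The context overflow case usually also needs the caller to create a new session.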
Mandatory First Steps
Before writing any Foundation Models code, complete these steps:
1. Check Availability
See "Ignoring Availability Check" in Red Flags above for the required pattern. Foundation Models requires an Apple Intelligence-enabled device, a supported region, and user opt-in.
2. Identify Use Case
Ask yourself: What is my primary goal?
| Use Case | Foundation Models? | Alternative |
|---|---|---|
| Summarization | ✅ YES | |
| Extraction (key info from text) | ✅ YES | |
| Classification (categorize content) | ✅ YES | |
| Content tagging | ✅ YES (built-in adapter!) | |
| World knowledge | ❌ NO | ChatGPT, Claude, Gemini |
| Complex reasoning | ❌ NO | Server LLMs |
| Mathematical computation | ❌ NO | Calculator, symbolic math |
Critical: If your use case requires world knowledge or advanced reasoning, stop. Foundation Models is the wrong tool.
3. Design @Generable Schema
If you need structured output (not just plain text):
Bad approach: Prompt for "JSON" and parse manually
Good approach: Define @Generable type
@Generable
struct SearchSuggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}
Why: Constrained decoding guarantees structure. No parsing errors, no hallucinated keys.
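To show how such a schema is consumed, here is a minimal sketch that requests a SearchSuggestions value and reads its typed field. The function suggestSearches and the prompt wording are hypothetical; the respond(to:generating:) call is the same one used in the Person example earlier.

```swift
import FoundationModels

@Generable
struct SearchSuggestions {
    @Guide(description: "Suggested search terms", .count(4))
    var searchTerms: [String]
}

/// Hypothetical helper: asks the model for suggestions and returns them as
/// plain strings, with no manual JSON parsing.
func suggestSearches(for topic: String) async throws -> [String] {
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Suggest search terms related to: \(topic)",
        generating: SearchSuggestions.self
    )
    // response.content is already a SearchSuggestions value.
    return response.content.searchTerms
}
```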
4. Consider Tools for External Data
If your feature needs external information:
- Weather → WeatherKit tool
- Locations → MapKit tool
- Contacts → Contacts API tool
- Calendar → EventKit tool
Don't try to get this information from the model (it will hallucinate).
Do define Tool protocol implementations.
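A Tool implementation could look roughly like the sketch below. Treat it as an illustration under assumptions: GetWeatherTool, its argument name, and the hard-coded temperature are hypothetical, and the lookup is stubbed where a real tool would query WeatherKit. The return type of call(arguments:) has differed between SDK releases (early betas used a dedicated output type), so check the Tool protocol in the SDK you target.

```swift
import FoundationModels

/// Hypothetical tool for illustration. The data is stubbed; a real
/// implementation would fetch it from WeatherKit.
struct GetWeatherTool: Tool {
    let name = "getWeather"
    let description = "Retrieve the current temperature for a city"

    @Generable
    struct Arguments {
        @Guide(description: "The city to look up")
        var city: String
    }

    // Assumes an SDK where a tool may return a String directly.
    func call(arguments: Arguments) async throws -> String {
        let temperature = 21 // Stubbed value, not real weather data.
        return "\(arguments.city): \(temperature)°C"
    }
}

/// Creates a session that can call the tool when the model decides it needs to.
func makeWeatherSession() -> LanguageModelSession {
    LanguageModelSession(
        tools: [GetWeatherTool()],
        instructions: "Help the user with weather questions."
    )
}
```

The model decides when to invoke the tool; the app only registers it on the session and then prompts as usual.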
5. Plan Streaming for Long Generations
If generation takes >1 second, use streaming:
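let stream = session.streamResponse(
    to: prompt,
    generating: Itinerary.self
)
// Each element is a partially generated Itinerary; update the UI as fields arrive.
// Note: in newer SDK releases the element may be a snapshot whose `content` holds the partial value.
for try await partial in stream {
    self.itinerary = partial
}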
Why: Users see progress immediately, perceived latency drops dramatically.
Decision Tree
Need on-device AI?
│
├─ World knowledge/reasoning?
│  └─ ❌ NOT Foundation Models
│     → Use ChatGPT, Claude, Gemini, etc.
│     → Reason: 3B parameter model, not trained for encyclopedic knowledge
│
├─ Summarization?
│  └─ ✅ YES → Pattern 1 (Basic Session)
│     → Example: Summarize article, condense email
│     → Time: 10-15 minutes
│
├─ Structured extraction?
│  └─ ✅ YES → Pattern 2 (@Generable)
│     → Example: Extract name, date, amount from invoice
│     → Time: 15-20 minutes
│
├─ Content tagging?
│  └─ ✅ YES → Pattern 3 (contentTagging use case)
│     → Example: Tag article topics, extract entities
│     → Time: 10 minutes
│
├─ Need external data?
│  └─ ✅ YES → Pattern 4 (Tool calling)
│     → Example: Fetch weather, query contacts, get locations
│     → Time: 20-30 minutes
│
├─ Long generation?
│  └─ ✅ YES → Pattern 5 (Streaming)
│     → Example: Generate itinerary, create story
│     → Time: 15-20 minutes
│
└─ Dynamic schemas (runtime-defined structure)?
   └─ ✅ YES → Pattern 6 (DynamicGenerationSchema)
      → Example: Level creator, user-defined forms
      → Time: 30-40 minutes
Pattern 1: Basic Session
Use when: Simple text generation, summarization, or content analysis.
Core Concepts
LanguageModelSession:
- Stateful: retains transcript of all interactions
- Instructions vs prompts:
- Instructions (from developer): Define model's role, static guidance
- Prompts (from user): Dynamic input for generation
- Model trained to obey instructions over prompts (security feature)
Implementation
import FoundationModels
func respond(userInput: String) async throws