From swift-skills
Complete guide for Apple's on-device Foundation Models framework (iOS 26+). Use when implementing, debugging, or architecting with Foundation Models. Triggers on: 'Foundation Models', 'LanguageModelSession', '@Generable', '@Guide', 'on-device LLM', 'FM framework'. Covers API reference, anti-patterns, decision trees, diagnostics, Instruments triage, and production crisis defense.
Slash command: `/swift-skills:foundation-models`
Apple's on-device LLM. Its context window (input + output combined) is model-version-dependent and can differ across OS releases — query it at runtime with SystemLanguageModel.default.contextSize rather than hardcoding a number. (At the time of writing, Apple documents the base model at 4,096 tokens, but treat that as a current value, not a fixed contract.) Optimized for summarization, extraction, classification, and generation. No network, no cost, no data leaves device.
Request: "Add article summarization with streaming to my app."
1. Check availability:

```swift
guard case .available = SystemLanguageModel.default.availability else {
    showUnavailableMessage()
    return
}
```
2. Define output type:

```swift
import FoundationModels

@Generable
struct ArticleSummary {
    @Guide(description: "One-sentence summary of the article's main point")
    var headline: String

    // The description comes first; constraints follow as trailing guide arguments
    @Guide(description: "Key takeaways in order of importance", .count(2...5))
    var takeaways: [String]

    @Guide(description: "Reading complexity score", .range(1...10))
    var complexity: Int
}
```
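3. Stream with progressive UI:

```swift
// self.summary is declared as ArticleSummary.PartiallyGenerated? (e.g. @State)
let session = LanguageModelSession(instructions: "Summarize articles concisely and accurately")
let stream = session.streamResponse(
    to: Prompt { "Summarize this article:"; articleText },
    generating: ArticleSummary.self
)
for try await partial in stream {
    // Each snapshot's content is ArticleSummary.PartiallyGenerated; unfilled properties are nil
    withAnimation { self.summary = partial.content }
}
// Properties fill in declaration order: headline first, then takeaways, then complexity
```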
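While streaming, each property of the partially generated value stays nil until the model reaches it (and strings or arrays can still be incomplete), so rendering code has to tolerate gaps. A minimal pure-Swift sketch: `PartialSummary` is a hypothetical stand-in for the macro-generated `ArticleSummary.PartiallyGenerated`, and `displayLines(for:)` is an illustrative helper, not framework API.

```swift
// Hypothetical stand-in for ArticleSummary.PartiallyGenerated:
// each property stays nil until the model has started emitting it.
struct PartialSummary {
    var headline: String?
    var takeaways: [String]?
    var complexity: Int?
}

// Turns whatever has arrived so far into display lines, skipping missing fields.
func displayLines(for partial: PartialSummary) -> [String] {
    var lines: [String] = []
    if let headline = partial.headline {
        lines.append(headline)
    }
    // Takeaways may be missing or only partly filled mid-stream.
    for takeaway in partial.takeaways ?? [] {
        lines.append("- " + takeaway)
    }
    if let complexity = partial.complexity {
        lines.append("Complexity: \(complexity)/10")
    }
    return lines
}

// Early snapshot: only the headline has arrived.
print(displayLines(for: PartialSummary(headline: "Headline text")))
// ["Headline text"]
```

Because the view only reads optionals, the same code renders the first snapshot and the final one without special cases.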
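4. Handle errors:

```swift
do {
    // ... the streamResponse call from step 3 goes here
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Article too long: chunk it or truncate, then continue in a fresh session
    // (session must be declared with var for this reassignment)
    session = LanguageModelSession(instructions: originalInstructions)
}
```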
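When an article is too long, one way to chunk it is to pack whole paragraphs greedily under a size budget and hard-split any paragraph that is too long on its own. A minimal sketch, assuming character count as a rough proxy for tokens; `chunk(_:maxCharacters:)` and the budget value are illustrative, and in practice the budget would be derived from the context size minus instructions, schema, and reserved output.

```swift
import Foundation

// Splits text into chunks of at most maxCharacters, breaking on blank-line
// paragraph boundaries where possible. Characters stand in for tokens here.
func chunk(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0, "budget must be positive")
    var chunks: [String] = []
    var current = ""

    // Moves the chunk being built into the result list.
    func flush() {
        if !current.isEmpty {
            chunks.append(current)
            current = ""
        }
    }

    for paragraph in text.components(separatedBy: "\n\n") where !paragraph.isEmpty {
        // A single oversized paragraph is hard-split into fixed-size pieces.
        if paragraph.count > maxCharacters {
            flush()
            var rest = Substring(paragraph)
            while !rest.isEmpty {
                chunks.append(String(rest.prefix(maxCharacters)))
                rest = rest.dropFirst(maxCharacters)
            }
            continue
        }
        // 2 characters for the "\n\n" re-inserted between packed paragraphs.
        let separatorCost = current.isEmpty ? 0 : 2
        if current.count + separatorCost + paragraph.count > maxCharacters {
            flush()
        }
        current += (current.isEmpty ? "" : "\n\n") + paragraph
    }
    flush()
    return chunks
}

print(chunk("aaaa\n\nbbbb\n\ncccc", maxCharacters: 10).count)  // 2
```

Each chunk can then be sent as its own request so no single request has to fit the whole article.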
```swift
// WRONG: Manual JSON parsing; fragile, the model might produce malformed output
let json = try JSONSerialization.jsonObject(with: response.content.data(using: .utf8)!)

// RIGHT: @Generable with constrained decoding; the model cannot produce an invalid structure
let person = try await session.respond(to: "Generate a person", generating: Person.self).content
```

```swift
// WRONG: User waits for entire response
self.text = try await session.respond(to: prompt).content

// RIGHT: Streaming for progressive display
for try await partial in session.streamResponse(to: prompt) {
    withAnimation { self.text = partial.content }
}
```
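```swift
// WRONG: Endless multi-turn with no handling; once the transcript fills the
// context window, requests start throwing exceededContextWindowSize
// RIGHT: Catch and recover
do {
    let response = try await session.respond(to: prompt)
    // ... use response.content
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    session = LanguageModelSession(instructions: originalInstructions)
}
```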
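A fresh session loses all history; a condensed restart can keep some continuity by carrying over only the most recent turns that fit a budget. A sketch under assumptions: `Turn` and its token estimates are hypothetical (an app would derive them from the session transcript), and how the kept turns seed the new session is left to the app.

```swift
// Hypothetical record of one conversation turn with an estimated token cost.
struct Turn: Equatable {
    let text: String
    let estimatedTokens: Int
}

// Keeps the newest turns whose combined estimate fits in `budget`,
// returned in chronological order.
func recentTurns(_ turns: [Turn], fittingIn budget: Int) -> [Turn] {
    var kept: [Turn] = []
    var used = 0
    // Walk backwards from the newest turn; stop at the first one that won't fit.
    for turn in turns.reversed() {
        guard used + turn.estimatedTokens <= budget else { break }
        used += turn.estimatedTokens
        kept.append(turn)
    }
    return Array(kept.reversed())
}
```

Stopping at the first turn that doesn't fit (rather than skipping it and trying older ones) keeps the carried-over history contiguous.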
The context window is TOTAL — instructions + schema + transcript + new prompt + output all count against it. Read the actual limit from SystemLanguageModel.default.contextSize; don't assume a fixed number.
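Because everything shares one window, the room left for a new prompt is plain subtraction. A sketch with hypothetical names and token estimates: 4,096 is the base-model figure quoted earlier, not a constant to hardcode; in an app it would come from `SystemLanguageModel.default.contextSize`.

```swift
// Hypothetical budget helper; every input is an estimate in tokens.
struct ContextBudget {
    let contextSize: Int        // in an app: SystemLanguageModel.default.contextSize
    let instructions: Int
    let schema: Int             // @Generable schema also consumes tokens
    let transcript: Int         // earlier turns in a multi-turn session
    let reservedForOutput: Int  // room left for the model's answer

    // Tokens left for the new prompt, never negative.
    var availableForPrompt: Int {
        max(0, contextSize - instructions - schema - transcript - reservedForOutput)
    }
}

let budget = ContextBudget(contextSize: 4096, instructions: 100, schema: 200,
                           transcript: 1500, reservedForOutput: 500)
print(budget.availableForPrompt)  // 1796
```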
```swift
// WRONG: User input in instructions
let session = LanguageModelSession(instructions: "Summarize: \(userInput)")

// RIGHT: User input in prompt only
let session = LanguageModelSession(instructions: "You summarize text concisely")
let response = try await session.respond(to: userInput)
```
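Instructions are developer-controlled, and the model is trained to prioritize instructions over prompts, so untrusted user input placed in instructions can override the behavior you intended (prompt injection).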
When a tool tracks state across calls, make it a class, not a struct: the tool's call method is non-mutating, so a struct cannot persist changes to its stored properties from one call to the next.
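A minimal illustration of why a class fits stateful tools. It uses a hypothetical `CallableTool` protocol whose requirement is non-mutating and `Sendable`, mirroring (as we understand it) the relevant shape of the framework's `Tool`, so it runs without FoundationModels. The lock is one way to make the shared counter safe across concurrent calls.

```swift
import Foundation

// Hypothetical stand-in protocol: call is non-mutating, so a conforming
// struct could not write to its own stored properties inside it.
protocol CallableTool: Sendable {
    func call(_ query: String) -> String
}

// A class is a single shared instance, so state persists across calls.
final class LookupCounterTool: CallableTool, @unchecked Sendable {
    private let lock = NSLock()
    private var callCount = 0

    func call(_ query: String) -> String {
        lock.lock()
        defer { lock.unlock() }
        callCount += 1
        return "Result #\(callCount) for \(query)"
    }

    var totalCalls: Int {
        lock.lock()
        defer { lock.unlock() }
        return callCount
    }
}

let tool = LookupCounterTool()
_ = tool.call("first")
print(tool.call("second"))  // Result #2 for second
print(tool.totalCalls)      // 2
```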
@Generable and @Guide encode output structure at the decoding level. Don't repeat the schema in instructions — it wastes tokens (critical given the limited context window). Use instructions for tone and behavioral constraints only.
| Question | Choice |
|---|---|
| Privacy required / offline needed / avoid per-request cost? | FM |
| Summarization, extraction, or classification? | FM |
| World knowledge, complex reasoning, math, or translation? | Server API |
| Need more context than SystemLanguageModel.default.contextSize allows? | Server API |
Both can coexist in one app.
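The table can be read as a small decision function. A hypothetical sketch: `TaskNeeds`, `Backend`, and the rule ordering are assumptions, with privacy/offline treated as a hard constraint that wins over the other rows (an on-device task that is too long would then need chunking rather than a server call).

```swift
enum Backend: Equatable { case onDevice, server }

// Hypothetical description of what a task requires.
struct TaskNeeds {
    var mustStayOnDevice = false                // privacy, offline, or no per-request cost
    var needsWorldKnowledgeOrReasoning = false  // world knowledge, complex reasoning, math, translation
    var estimatedTokens = 0
}

func chooseBackend(for needs: TaskNeeds, contextSize: Int) -> Backend {
    // Hard constraint first: data must not leave the device.
    if needs.mustStayOnDevice { return .onDevice }
    if needs.needsWorldKnowledgeOrReasoning { return .server }
    if needs.estimatedTokens > contextSize { return .server }
    // Summarization, extraction, classification, and similar tasks.
    return .onDevice
}
```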
FM: private, offline, no network round-trip, no per-request cost, no API keys. Server API: world knowledge, complex reasoning, larger context, translation.
The context window is TOTAL (query it via SystemLanguageModel.default.contextSize). Keep instructions concise, use @Generable instead of describing format, chunk large inputs, monitor with tokenUsage(for:), catch exceededContextWindowSize for multi-turn.
Three unavailable states: .deviceNotEligible (permanent), .appleIntelligenceNotEnabled (needs user action), .modelNotReady (temporary, e.g. model assets still downloading). Skipping the check means generation requests fail on devices where the model can't run. Re-check when scenePhase becomes .active to pick up state changes.
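Each unavailable state calls for a different UI response. A hypothetical sketch that mirrors the three reasons in a local enum so the policy can be unit-tested without the framework; in an app you would switch on `SystemLanguageModel.default.availability` and also handle reasons added in future OS releases. The specific UI choices are one possible policy.

```swift
// Hypothetical mirror of the framework's unavailable reasons.
enum UnavailableReason { case deviceNotEligible, appleIntelligenceNotEnabled, modelNotReady }

enum FeatureUI: Equatable {
    case hideFeature      // permanent: this device will never run the model
    case askUserToEnable  // user action: point to Apple Intelligence settings
    case showPreparing    // temporary: re-check later, e.g. when scenePhase is .active
}

func uiResponse(for reason: UnavailableReason) -> FeatureUI {
    switch reason {
    case .deviceNotEligible: return .hideFeature
    case .appleIntelligenceNotEnabled: return .askUserToEnable
    case .modelNotReady: return .showPreparing
    }
}

print(uiResponse(for: .modelNotReady))  // showPreparing
```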
| Symptom | Error or check | Key Fix |
|---|---|---|
| Context too long | .exceededContextWindowSize | Fresh/condensed session |
| Content policy error | .guardrailViolation | Rephrase prompt, filter input |
| Language not supported | .unsupportedLanguageOrLocale | Fall back to server |
| Structured output fails | .decodingFailure | Verify nested @Generable, add @Guide |
| Too many requests | .rateLimited | Backoff, queue requests |
| Tool not called | Inspect session.transcript | Strengthen instructions and tool description |
| Slow response | Profile with Instruments | Pre-warm, reduce tokens, stream |
| Wrong output | Check @Guide constraints | Add descriptions, constrain range/count |
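The "backoff" fix for `.rateLimited` can be sketched as an exponential retry schedule. Assumptions: `RateLimited`, `backoffDelay`, and `retry` are hypothetical names, the 0.5 s base and 8 s cap are arbitrary, and the function is synchronous with an injected delay so it can be tested; an app would make it async, use `Task.sleep`, and match on `LanguageModelSession.GenerationError.rateLimited` instead.

```swift
import Foundation

// Hypothetical error standing in for GenerationError.rateLimited.
struct RateLimited: Error {}

// Exponential backoff: base * 2^attempt seconds, capped.
func backoffDelay(attempt: Int, base: Double = 0.5, cap: Double = 8) -> Double {
    min(cap, base * pow(2, Double(attempt)))
}

// Runs operation up to maxAttempts times, waiting between retryable failures.
func retry<T>(maxAttempts: Int,
              shouldRetry: (Error) -> Bool,
              delay: (Double) -> Void,
              _ operation: () throws -> T) throws -> T {
    precondition(maxAttempts >= 1, "need at least one attempt")
    for attempt in 0..<maxAttempts {
        do {
            return try operation()
        } catch {
            // Give up on the last attempt or on errors that aren't worth retrying.
            if attempt == maxAttempts - 1 || !shouldRetry(error) { throw error }
            delay(backoffDelay(attempt: attempt))
        }
    }
    preconditionFailure("unreachable: the loop always returns or throws")
}

print(backoffDelay(attempt: 3))  // 4.0
```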
Full triage procedures and production crisis playbook: references/diagnostics.md
npx claudepluginhub foxtrottwist/swift-skills --plugin swift-skills