Skill

foundation-models

Complete guide for Apple's on-device Foundation Models framework (iOS 26+). Use when implementing, debugging, or architecting with Foundation Models. Triggers on: 'Foundation Models', 'LanguageModelSession', '@Generable', '@Guide', 'on-device LLM', 'FM framework'. Covers API reference, anti-patterns, decision trees, diagnostics, Instruments triage, and production crisis defense.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/swift-skills:foundation-models

Supporting Files

references/api-reference.md
references/diagnostics.md

SKILL.md

185 lines · ~1.8k tokens

Stats

Language: Shell

Stars: 1

Maintenance: Excellent

Last Commit: Jun 16, 2026

Foundation Models

Apple's on-device LLM. Its context window (input + output combined) is model-version-dependent and can differ across OS releases — query it at runtime with SystemLanguageModel.default.contextSize rather than hardcoding a number. (At the time of writing, Apple documents the base model at 4,096 tokens, but treat that as a current value, not a fixed contract.) Optimized for summarization, extraction, classification, and generation. No network, no cost, no data leaves device.

Worked Example

Request: "Add article summarization with streaming to my app."

1. Check availability:

guard case .available = SystemLanguageModel.default.availability else {
    showUnavailableMessage()
    return
}

2. Define output type:

@Generable
struct ArticleSummary {
    @Guide(description: "One-sentence summary of the article's main point")
    var headline: String

    @Guide(.count(2...5), description: "Key takeaways in order of importance")
    var takeaways: [String]

    @Guide(.range(1...10), description: "Reading complexity score")
    var complexity: Int
}

3. Stream with progressive UI:

let originalInstructions = "Summarize articles concisely and accurately"
var session = LanguageModelSession(instructions: originalInstructions) // `var`: replaced on context overflow

let stream = session.streamResponse(
    to: Prompt { "Summarize this article:"; articleText },
    generating: ArticleSummary.self
)

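for try await partial in stream {
    // Each element is a snapshot; its `content` is the partially generated summary
    withAnimation { self.summary = partial.content }
}
// Properties fill in declaration order: headline first, then takeaways, then complexity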

4. Handle errors:

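do {
    // ... streaming loop from step 3 ...
} catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    // Article too long: chunk or truncate it, then retry in a fresh session.
    // `session` must be a `var`; `originalInstructions` is the string used at creation.
    session = LanguageModelSession(instructions: originalInstructions)
}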
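
The "chunk it" option can be sketched in plain Swift. This is a minimal illustration, not part of the framework: `chunkText` and `maxCharacters` are hypothetical names, and character count is used only as a rough stand-in for tokens, so the budget should be set conservatively.

```swift
import Foundation

/// Splits `text` into chunks of at most `maxCharacters`, preferring paragraph boundaries.
/// Assumption: character count is only a rough proxy for model tokens.
func chunkText(_ text: String, maxCharacters: Int) -> [String] {
    precondition(maxCharacters > 0, "maxCharacters must be positive")
    var chunks: [String] = []
    var current = ""

    for paragraph in text.components(separatedBy: "\n\n") where !paragraph.isEmpty {
        // Hard-split any single paragraph that is longer than the limit on its own.
        var remainder = Substring(paragraph)
        while remainder.count > maxCharacters {
            if !current.isEmpty {
                chunks.append(current)
                current = ""
            }
            chunks.append(String(remainder.prefix(maxCharacters)))
            remainder = remainder.dropFirst(maxCharacters)
        }

        let piece = String(remainder)
        if piece.isEmpty { continue }

        // The +2 accounts for the paragraph separator re-inserted when joining.
        if !current.isEmpty && current.count + 2 + piece.count > maxCharacters {
            chunks.append(current)
            current = piece
        } else {
            current = current.isEmpty ? piece : current + "\n\n" + piece
        }
    }

    if !current.isEmpty { chunks.append(current) }
    return chunks
}

let articleChunks = chunkText("aaaa\n\nbbbb\n\ncccc", maxCharacters: 10)
let longParagraphChunks = chunkText(String(repeating: "x", count: 25), maxCharacters: 10)
```

Each chunk can then be summarized in its own fresh session, with the per-chunk summaries combined in a final pass.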

Anti-Patterns

1: Manual JSON Parsing

// WRONG: Manual JSON parsing — fragile, model might produce malformed output
let json = try JSONSerialization.jsonObject(with: response.content.data(using: .utf8)!)

// RIGHT: @Generable with constrained decoding — model cannot produce invalid structure
let person = try await session.respond(to: "Generate a person", generating: Person.self).content

2: Blocking UI with Synchronous Generation

// WRONG: User waits for entire response
self.text = try await session.respond(to: prompt).content

// RIGHT: Streaming for progressive display
for try await partial in session.streamResponse(to: prompt) {
    withAnimation { self.text = partial.content }
}

3: Context Overflow from Unbounded Conversations

// WRONG: Endless multi-turn with no error handling; the request throws once the context window fills

// RIGHT: Catch and recover
catch LanguageModelSession.GenerationError.exceededContextWindowSize {
    session = LanguageModelSession(instructions: originalInstructions)
}

The context window is TOTAL — instructions + schema + transcript + new prompt + output all count against it. Read the actual limit from SystemLanguageModel.default.contextSize; don't assume a fixed number.
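
The arithmetic behind "TOTAL" can be sketched as a small budget check. `ContextBudget` and its fields are hypothetical helpers for illustration; the token counts are assumed to be supplied by the caller, and `contextSize` is assumed to be the value read from the framework at runtime.

```swift
/// Hypothetical helper: tracks how much of the context window is left
/// after every component that counts against it.
struct ContextBudget {
    let contextSize: Int        // assumed to come from the runtime query, not a constant
    let reservedForOutput: Int  // tokens held back for the model's response

    /// Tokens remaining after instructions, schema, transcript, and the new prompt.
    func remaining(instructions: Int, schema: Int, transcript: Int, prompt: Int) -> Int {
        contextSize - reservedForOutput - (instructions + schema + transcript + prompt)
    }

    /// True when the request is expected to fit without overflowing.
    func fits(instructions: Int, schema: Int, transcript: Int, prompt: Int) -> Bool {
        remaining(instructions: instructions, schema: schema,
                  transcript: transcript, prompt: prompt) >= 0
    }
}

// Illustrative numbers only; 4,096 is the currently documented base value.
let budget = ContextBudget(contextSize: 4096, reservedForOutput: 512)
let tokensLeft = budget.remaining(instructions: 100, schema: 200, transcript: 1000, prompt: 300)
let oversized = budget.fits(instructions: 100, schema: 200, transcript: 3000, prompt: 500)
```

A negative remainder is the signal to condense the transcript or start a new session before sending the prompt.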

4: User Input in Instructions (Prompt Injection)

// WRONG: User input in instructions
let session = LanguageModelSession(instructions: "Summarize: \(userInput)")

// RIGHT: User input in prompt only
let session = LanguageModelSession(instructions: "You summarize text concisely")
let response = try await session.respond(to: userInput)

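Instructions are developer-controlled, and the model is trained to prioritize them over prompts. User text interpolated into instructions inherits that priority, which is what makes injection possible.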

5: Struct for Stateful Tools

Use a class (not a struct) when a tool tracks state across calls. A struct is copied on assignment, so mutations made to one copy are lost between calls.
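
The difference is ordinary Swift value versus reference semantics. The sketch below uses stand-in types (`StructTool` and `ClassTool` are hypothetical and do not conform to the framework's tool protocol) to show where the state goes missing.

```swift
// Stand-in types for illustration only; neither is a real Foundation Models tool.
struct StructTool {
    var callCount = 0
    mutating func call() { callCount += 1 }
}

final class ClassTool {
    private(set) var callCount = 0
    func call() { callCount += 1 }
}

// Value semantics: whoever holds the tool holds a copy.
let originalStruct = StructTool()
var heldCopy = originalStruct
heldCopy.call()
// `originalStruct.callCount` is unchanged; only the copy was mutated.

// Reference semantics: every holder sees the same instance.
let sharedTool = ClassTool()
let heldReference = sharedTool
heldReference.call()
// `sharedTool.callCount` reflects the call made through `heldReference`.
```

With a class, state recorded during one call is still there when your code inspects the tool afterwards.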

6: Over-Complex Instructions Duplicating @Generable

@Generable and @Guide encode output structure at the decoding level. Don't repeat the schema in instructions — it wastes tokens (critical given the limited context window). Use instructions for tone and behavioral constraints only.

Decision Trees

Foundation Models vs Server API

Question	Choice
Privacy required / offline needed / avoid per-request cost?	FM
Summarization, extraction, or classification?	FM
World knowledge, complex reasoning, math, or translation?	Server API
Need more context than `SystemLanguageModel.default.contextSize` allows?	Server API

Both can coexist in one app.
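
The table can be read as a per-request routing rule. The sketch below is one possible encoding; `TaskProfile`, `Backend`, and `chooseBackend` are hypothetical names, and the precedence (a hard on-device requirement wins first) is an assumption of this example rather than something the table specifies.

```swift
enum Backend { case foundationModels, serverAPI }

/// Hypothetical description of a single request.
struct TaskProfile {
    var requiresOnDevice: Bool                       // privacy or offline requirement
    var needsWorldKnowledgeOrComplexReasoning: Bool  // includes math and translation
    var estimatedTokens: Int                         // rough size of input plus output
}

/// Assumed precedence: on-device requirement, then capability, then context size.
func chooseBackend(for task: TaskProfile, contextSize: Int) -> Backend {
    if task.requiresOnDevice { return .foundationModels }
    if task.needsWorldKnowledgeOrComplexReasoning { return .serverAPI }
    if task.estimatedTokens > contextSize { return .serverAPI }
    return .foundationModels
}

let summarizeNote = TaskProfile(requiresOnDevice: false,
                                needsWorldKnowledgeOrComplexReasoning: false,
                                estimatedTokens: 1200)
let translateBook = TaskProfile(requiresOnDevice: false,
                                needsWorldKnowledgeOrComplexReasoning: true,
                                estimatedTokens: 1200)
let summarizeLongReport = TaskProfile(requiresOnDevice: false,
                                      needsWorldKnowledgeOrComplexReasoning: false,
                                      estimatedTokens: 9000)
```

Routing per request rather than per app is what lets both backends coexist.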

Other Decisions

@Generable vs plain text: Use @Generable when you need structured data or type safety. Plain text when just displaying to user.
Tools vs prompt context: Tools for dynamic/real-time data, device data (contacts, calendar), external APIs. Prompt context for static, short data.
New session vs reuse: Reuse for same topic. New session when context fills, topic changes, or you need different instructions/tools.

Pressure Scenarios

"Use ChatGPT API Instead"

FM: private, offline, no network latency, no per-request cost, no API keys. Server API: world knowledge, complex reasoning, larger context, translation. Both can coexist; it is not either/or.

"One Big Prompt for Everything"

The context window is TOTAL (query it via SystemLanguageModel.default.contextSize). Keep instructions concise, use @Generable instead of describing format, chunk large inputs, monitor with tokenUsage(for:), catch exceededContextWindowSize for multi-turn.

"Skip Availability Checks"

Three unavailable states — .deviceNotEligible (permanent), .appleIntelligenceNotEnabled (user action), .modelNotReady (temporary). Not checking = crashes on unsupported devices. Check on scenePhase activation to catch state changes.
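
Each state calls for a different response in the UI. The sketch below mirrors the states with a local stand-in enum so it runs without the framework; `ModelAvailability`, `AvailabilityAction`, and `action(for:)` are hypothetical, and a real app would map the framework's availability value onto something like this.

```swift
/// Local stand-in for the availability states described above (not the framework type).
enum ModelAvailability {
    case available
    case deviceNotEligible            // permanent
    case appleIntelligenceNotEnabled  // needs user action
    case modelNotReady                // temporary
}

/// Hypothetical UI responses, one per state.
enum AvailabilityAction {
    case enableFeature
    case hideFeature
    case promptToEnableInSettings
    case retryLater
}

func action(for availability: ModelAvailability) -> AvailabilityAction {
    switch availability {
    case .available:
        return .enableFeature
    case .deviceNotEligible:
        return .hideFeature               // nothing the user can do on this device
    case .appleIntelligenceNotEnabled:
        return .promptToEnableInSettings  // the user can fix this
    case .modelNotReady:
        return .retryLater                // re-check later, e.g. on the next activation
    }
}

let onLaunch = action(for: .modelNotReady)
```

Re-running the mapping whenever the scene becomes active picks up changes such as the user enabling Apple Intelligence.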

Diagnostics Quick Reference

Symptom	Error	Key Fix
Context too long	`.exceededContextWindowSize`	Fresh/condensed session
Content policy error	`.guardrailViolation`	Rephrase prompt, filter input
Language not supported	`.unsupportedLanguageOrLocale`	Fall back to server
Structured output fails	`.decodingFailure`	Verify nested `@Generable`, add `@Guide`
Too many requests	`.rateLimited`	Backoff, queue requests
Tool not called	Inspect `session.transcript`	Strengthen instructions and tool description
Slow response	Profile with Instruments	Pre-warm, reduce tokens, stream
Wrong output	Check `@Guide` constraints	Add descriptions, constrain range/count

Full triage procedures and production crisis playbook: references/diagnostics.md
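
For the "Backoff, queue requests" fix in the rate-limit row, a capped exponential delay is one common approach. The helper below is a hypothetical sketch (`backoffDelay` and its default values are assumptions, not framework API); production code often adds random jitter, which is left out here to keep the output deterministic.

```swift
import Foundation

/// Capped exponential backoff: base * 2^attempt, never more than `cap` seconds.
/// `attempt` is zero-based; negative values are treated as zero.
func backoffDelay(attempt: Int, base: TimeInterval = 0.5, cap: TimeInterval = 8) -> TimeInterval {
    let exponent = Double(max(0, attempt))
    return min(cap, base * pow(2, exponent))
}

// Delays for the first six retries.
let retryDelays = (0..<6).map { backoffDelay(attempt: $0) }
```

Sleeping for the computed delay before each retry, and giving up after a fixed number of attempts, keeps a rate-limited feature from hammering the model.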

References

API Reference — Complete API with WWDC code examples
Diagnostics — Error triage, Instruments workflow, production crisis defense
WWDC Sessions: 286, 259, 301 (2025, historical anchors). Newer Foundation Models sessions ship with later releases (2026 / Xcode 27) — check the current WWDC catalog for updated guidance.
