Guide Spring AI usage including ChatClient, structured output, tool calling, RAG, advisors, chat memory, and MCP integration. Use when building AI-powered features with Spring Boot, integrating LLM providers, implementing retrieval-augmented generation, or configuring function/tool calling.
Slash command: /spring-skills:spring-ai
**Pattern:** Process
Framework for integrating AI models into Spring Boot applications. Targets Spring AI 1.0+ on Boot 3.4+, Java 17+.
Mental model: Spring AI mirrors Spring's HTTP client layering. ChatModel is the low-level provider wrapper (like RestTemplate). ChatClient is the fluent high-level API (like RestClient) — it adds advisors, tools, memory, and structured output. Always use ChatClient unless you need raw provider access.
Do NOT load for general machine learning, data science, or non-LLM AI. For event-driven patterns, use spring-events. For web endpoints exposing AI features, use spring-web.
Before adding AI to your application, ask:

| Need | Add to ChatClient |
|---|---|
| Just chat | Nothing; ChatClient alone is sufficient |
| Conversation history | MessageChatMemoryAdvisor + a ChatMemoryRepository |
| Domain knowledge (RAG) | QuestionAnswerAdvisor → graduate to RetrievalAugmentationAdvisor when retrieval quality drops |
| Model takes actions | Tools via @Tool or function beans |
| Multi-step reasoning | Agentic patterns (chain, routing, orchestrator-workers); see references/agentic-patterns.md |
| Content safety | SafeGuardAdvisor |
ChatClient is the primary API. Spring Boot auto-configures a prototype-scoped ChatClient.Builder, so each injection gets a fresh builder.

    ChatClient chatClient = ChatClient.builder(chatModel)
            .defaultSystem("You are a helpful assistant")
            .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
            .defaultTools(new DateTimeTools())
            .build();

    String answer = chatClient.prompt()
            .user("What day is tomorrow?")
            .call()
            .content();

- call() is synchronous: .content(), .entity(Class), .chatResponse()
- stream() returns Flux<String> or Flux<ChatResponse>. Requires spring-boot-starter-webflux even in servlet apps. Retry does not apply to streaming; handle failures at the application level.
- options(ChatOptions.builder().temperature(0.9).model("gpt-4o").build())
- With spring.ai.chat.client.enabled=false, create ChatClient instances manually, using @Qualifier on each ChatModel
- chatModel.mutate() to derive new instances with different base URLs, API keys, or options

Externalize prompts to resource files instead of hardcoding strings:

    var template = new PromptTemplate(new ClassPathResource("/prompts/analyze.st"));
    String rendered = template.render(Map.of("topic", "Spring Security", "depth", "expert"));

Uses StringTemplate syntax ({variable} delimiters). Use SystemPromptTemplate for system messages. Keep prompts in src/main/resources/prompts/ — they are version-controlled, reviewable, and reusable.
Convert LLM responses to Java types:

    record ActorFilms(String actor, List<String> movies) {}

    ActorFilms result = chatClient.prompt()
            .user("List 5 films with Tom Hanks")
            .call()
            .entity(ActorFilms.class);

- entity(Class): single object via BeanOutputConverter (JSON Schema DRAFT_2020_12)
- entity(ParameterizedTypeReference): for generic types like List<ActorFilms>
- AdvisorParams.ENABLE_NATIVE_STRUCTURED_OUTPUT: native structured output. Supported by: OpenAI (GPT-4o+), Anthropic (Claude 3.5+), Vertex AI Gemini, Mistral AI

Structured output is best-effort: models are not guaranteed to return valid JSON. Always validate the result. Prefer native structured output when the provider supports it.
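Since structured output is best-effort, one way to follow the "always validate" advice is to check the mapped record before using it. The following is a minimal sketch in plain Java, assuming the ActorFilms record from the example above. The StructuredOutputValidation class, the validate helper, and its rules (non-blank actor, at least one non-blank movie) are hypothetical and not part of Spring AI.

```java
import java.util.List;

// Hypothetical holder class; ActorFilms mirrors the record passed to entity(ActorFilms.class).
class StructuredOutputValidation {

    record ActorFilms(String actor, List<String> movies) {}

    // Hypothetical validation rules; adjust them to what the feature actually requires.
    static ActorFilms validate(ActorFilms result) {
        if (result == null) {
            throw new IllegalStateException("Model returned no parsable result");
        }
        if (result.actor() == null || result.actor().isBlank()) {
            throw new IllegalStateException("Missing actor name");
        }
        if (result.movies() == null || result.movies().isEmpty()) {
            throw new IllegalStateException("Missing movie list");
        }
        if (result.movies().stream().anyMatch(m -> m == null || m.isBlank())) {
            throw new IllegalStateException("Blank movie title");
        }
        return result;
    }

    public static void main(String[] args) {
        // In an application this value would come from the .entity(ActorFilms.class) call.
        ActorFilms ok = validate(new ActorFilms("Tom Hanks", List.of("Big", "Cast Away")));
        System.out.println(ok.movies().size() + " films accepted");
    }
}
```

Failing fast with an exception keeps malformed model output from reaching business logic. Whether to retry the call or surface an error afterwards is an application decision.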
Four approaches, ranked by preference:

    class DateTimeTools {

        @Tool(description = "Get current date and time in the user's timezone")
        String getCurrentDateTime() {
            return LocalDateTime.now().atZone(LocaleContextHolder.getTimeZone().toZoneId()).toString();
        }
    }

    chatClient.prompt("What time is it?").tools(new DateTimeTools()).call().content();

- @Bean @Description returning Function<I, O>, referenced via .toolNames("beanName")
- FunctionToolCallback.builder() for dynamic registration

Key rules:

- returnDirect = true: the tool result goes directly to the caller and bypasses model post-processing
- ToolContext: pass application context (tenant ID, user) that is not sent to the model
- Not supported as tool parameters: Optional, CompletableFuture, Mono, Flux, or functional types
- spring.ai.tools.throw-exception-on-error=true for fail-fast

Advisors are interceptors in the request/response pipeline (like servlet filters):

    Request  -> Advisor1(before) -> Advisor2(before) -> ... -> LLM
    Response <- Advisor1(after)  <- Advisor2(after)  <- ... <- LLM

Ordering via getOrder() — lower values execute first on requests, last on responses (stack behavior).
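The stack behavior can be illustrated without any Spring AI types. The sketch below is a plain-Java simulation, not the Spring AI advisor API: SimAdvisor and its order field are hypothetical stand-ins for an advisor and its getOrder() value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of advisor ordering; none of these types are Spring AI classes.
class AdvisorOrderSimulation {

    // Hypothetical stand-in for an advisor: a name plus its getOrder() value.
    record SimAdvisor(String name, int order) {}

    // Walks the chain recursively: "before" on the way in, "after" on the way out.
    static void invoke(List<SimAdvisor> sorted, int index, List<String> trace) {
        if (index == sorted.size()) {
            trace.add("LLM");
            return;
        }
        SimAdvisor advisor = sorted.get(index);
        trace.add(advisor.name() + "(before)");
        invoke(sorted, index + 1, trace);
        trace.add(advisor.name() + "(after)");
    }

    static List<String> run(List<SimAdvisor> advisors) {
        List<SimAdvisor> sorted = new ArrayList<>(advisors);
        sorted.sort(Comparator.comparingInt(SimAdvisor::order)); // lower order runs first
        List<String> trace = new ArrayList<>();
        invoke(sorted, 0, trace);
        return trace;
    }

    public static void main(String[] args) {
        List<String> trace = run(List.of(
                new SimAdvisor("logging", 10),
                new SimAdvisor("memory", 0)));
        // prints: [memory(before), logging(before), LLM, logging(after), memory(after)]
        System.out.println(trace);
    }
}
```

The advisor with the lowest order wraps all the others, so it sees the original request first and the final response last.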
Built-in advisors:
| Advisor | Purpose |
|---|---|
| MessageChatMemoryAdvisor | Adds conversation history as messages |
| PromptChatMemoryAdvisor | Appends history as text to system prompt |
| VectorStoreChatMemoryAdvisor | Semantic memory retrieval from vector store |
| QuestionAnswerAdvisor | Simple naive RAG |
| RetrievalAugmentationAdvisor | Modular RAG with composable stages |
| SafeGuardAdvisor | Content safety filtering |
| SimpleLoggerAdvisor | Debug logging of requests/responses |
| ReReadingAdvisor | RE2 technique for improved reasoning |
| ToolCallAdvisor | Advisor-controlled tool execution with observability |
For custom advisor creation and the RAG advisor pipeline, load references/rag-advisors.md.
MessageWindowChatMemory keeps the last N messages (preserves system messages):

    ChatMemory memory = MessageWindowChatMemory.builder().maxMessages(10).build();

    ChatClient chatClient = ChatClient.builder(chatModel)
            .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
            .build();

    chatClient.prompt().user("My name is James")
            .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, "session-123"))
            .call().content();

Storage backends: InMemoryChatMemoryRepository (default), JDBC (PostgreSQL, MySQL, etc.), Cassandra, Neo4j, MongoDB, CosmosDB.
Pitfall: MessageWindowChatMemory counts messages, not tokens. A few long messages can exceed context windows. Intermediate tool-call messages are NOT persisted to memory — conversation history may be incomplete.
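To make the message-count pitfall concrete, the following plain-Java sketch keeps a window of the last N messages and then estimates its size in tokens. It does not use Spring AI classes and ignores the special handling of system messages. The four-characters-per-token ratio is a rough assumption for English text, not a real tokenizer, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration; MessageWindowChatMemory itself is not used here.
class MessageWindowTokenCheck {

    // Keeps only the last maxMessages entries, like a message-count window.
    static List<String> window(List<String> history, int maxMessages) {
        int from = Math.max(0, history.size() - maxMessages);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    // Crude estimate (assumption): roughly 4 characters per token, rounded up.
    static int estimateTokens(List<String> messages) {
        int chars = 0;
        for (String message : messages) {
            chars += message.length();
        }
        return (chars + 3) / 4;
    }

    public static void main(String[] args) {
        List<String> history = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            history.add("x".repeat(8_000)); // twelve long messages, e.g. pasted documents
        }
        List<String> kept = window(history, 10);
        // prints: 10 messages, ~20000 tokens
        System.out.println(kept.size() + " messages, ~" + estimateTokens(kept) + " tokens");
    }
}
```

The window respects its limit of 10 messages, yet the estimated prompt size is far larger than a short chat would suggest. A similar check before each call, or token metrics from observations, can reveal this before the provider rejects the request.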
Spring AI retries failed model calls automatically. Defaults are aggressive — tune for production:

    # Default is 10 attempts, which is too many for production
    spring.ai.retry.max-attempts=3
    spring.ai.retry.backoff.initial-interval=2s
    spring.ai.retry.backoff.multiplier=5
    spring.ai.retry.backoff.max-interval=3m
    # 4xx errors are not retried by default
    spring.ai.retry.on-client-errors=false

Retry does NOT apply to the streaming API. There is no built-in circuit breaker; use Resilience4j if needed.
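To see why ten attempts is a lot, the total wait can be worked out by hand from the backoff settings. The sketch below assumes the usual exponential backoff rule, where each delay is the previous one times the multiplier, capped at max-interval, with one wait between consecutive attempts. That rule and the helper names are assumptions for illustration, so check them against the retry implementation in your Spring AI version.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Back-of-the-envelope calculation; it does not call any Spring AI or Spring Retry API.
class RetryBackoffSchedule {

    // Assumed rule: delay = min(previous * multiplier, max), one delay between attempts.
    static List<Duration> delays(int maxAttempts, Duration initial, long multiplier, Duration max) {
        List<Duration> result = new ArrayList<>();
        Duration next = initial;
        for (int retry = 1; retry < maxAttempts; retry++) {
            Duration capped = next.compareTo(max) > 0 ? max : next;
            result.add(capped);
            next = capped.multipliedBy(multiplier);
        }
        return result;
    }

    static Duration total(List<Duration> delays) {
        Duration sum = Duration.ZERO;
        for (Duration delay : delays) {
            sum = sum.plus(delay);
        }
        return sum;
    }

    public static void main(String[] args) {
        Duration initial = Duration.ofSeconds(2);
        Duration max = Duration.ofMinutes(3);
        System.out.println(total(delays(3, initial, 5, max)));  // prints PT12S
        System.out.println(total(delays(10, initial, 5, max))); // prints PT19M2S
    }
}
```

Under these assumptions, three attempts spend about 12 seconds waiting (2 s + 10 s), while ten attempts spend about 19 minutes (2 s + 10 s + 50 s, then six waits capped at 3 minutes), not counting the time of the calls themselves.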
Full Micrometer integration. Observations for chat calls, embedding calls, advisor execution, tool calls, and vector store operations.

    # CAUTION: exposes prompts in traces
    spring.ai.chat.observations.log-prompt=true
    # CAUTION: exposes responses in traces
    spring.ai.chat.observations.log-completion=true
    # Tool arguments and results; keep false in production
    spring.ai.tools.observations.include-content=false

All default to false for security. Span attributes include model name, temperature, token counts, and finish reasons. Use gen_ai.client.token.usage counter for cost monitoring.
- ChatModel directly instead of ChatClient. Loses advisors, memory, tools, and structured output. Use ChatClient for all application code.
- MessageWindowChatMemory counts messages, not tokens. A few large messages silently exceed the context window. Monitor token usage via observations.
- ChatClient as a singleton when it holds mutable conversation state. Inject ChatClient.Builder (prototype-scoped) and build per-request.
- stream() without spring-boot-starter-webflux on the classpath. Fails at runtime with a confusing error.
- spring.ai.model.chat=<provider>. Auto-configuration fails. Set the property or disable spring.ai.chat.client.enabled.

Load on demand for specific topics:
- references/rag-advisors.md: RAG pipeline architecture, vector store setup, document ETL, custom advisors, query transformers
- references/providers.md: Provider-specific configuration, model selection, MCP client/server setup, multimodal support, Docker Model Runner
- references/agentic-patterns.md: Chain, parallelization, routing, orchestrator-workers, evaluator-optimizer workflows
- references/version-guide.md: Version matrix, feature availability by version, breaking changes timeline, migration guides