From somnia-agents-skills
Deep-dive reference for the LLM Inference agent on Somnia — invoke a deterministic on-chain LLM (Qwen3-30B) from smart contracts. Covers the 4 functions (inferString, inferNumber, inferChat, inferToolsChat), MCP tool calling, on-chain tool yield/resume pattern, allowed-values constraints, and chain-of-thought. Use when building AI moderation, classification, summarization, sentiment scoring, or agentic DeFi bots that need an LLM to decide which on-chain calls to make.
Slash command: `/somnia-agents-skills:somnia-agents-llm-inference`
The LLM Inference agent (`llm-inference`) gives smart contracts access to an on-chain deterministic LLM — Qwen3-30B running with fixed seed and `temperature = 0`, so every validator independently produces **byte-identical** output. That's what makes consensus on AI results possible.
Read the master `somnia-agents` skill first for the request lifecycle, gas model, and callback pattern. This document only covers the agent-specific ABI and quirks.
| Field | Value |
|---|---|
| `agentId` | `12847293847561029384` |
| Per-agent price | 0.07 (whole tokens — SOMI on Mainnet, STT on Testnet) |
| Default consensus | Majority — deterministic, byte-identical outputs |
| Source of truth | references/agents.json |
| Function | Purpose |
|---|---|
| `inferString(prompt, system, chainOfThought, allowedValues)` | Single-turn string inference, optionally constrained to a fixed set of values |
| `inferNumber(prompt, system, minValue, maxValue, chainOfThought)` | Single-turn integer inference, clamped to `[minValue, maxValue]` |
| `inferChat(roles, messages, chainOfThought)` | Multi-turn chat with full message history |
| `inferToolsChat(roles, messages, mcpServerUrls, onchainTools, maxIterations, chainOfThought)` | Multi-turn chat with MCP tool calling (auto-executed) and on-chain tool calling (yielded back to caller as calldata) |
All four return their result via the standard request → callback flow. The full ABI is in references/agents.json under agents["llm-inference"].abi.
## inferString: constrained classification

Best for: content moderation, sentiment labels, intent classification, any string output from a closed set.

```solidity
function inferString(
    string prompt,
    string system,
    bool chainOfThought,
    string[] allowedValues
) returns (string response);
```
- `system`: system prompt; pass `""` if you don't need one.
- `allowedValues`: pass an empty array for unconstrained text. When non-empty, the result must be one of the listed strings. This is the safest pattern for on-chain logic that branches on the result.
- `chainOfThought`: when `true`, the model is allowed to reason internally (visible in the receipt) before producing the final answer. Increases latency and token cost; helpful for harder classification.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferString.selector,
    'Is this review positive or negative? "Absolutely loved it, best purchase ever!"',
    "You are a sentiment classifier. Reply with one word.",
    false,
    // _array is a helper (not shown here) that builds a string[] memory
    _array("positive", "negative", "neutral")
);
```

The response is decoded as a single string:

```solidity
string memory label = abi.decode(responses[0].result, (string));
```
## inferNumber: bounded integer inference

Best for: rating / scoring, count extraction, confidence values.

```solidity
function inferNumber(
    string prompt,
    string system,
    int256 minValue,
    int256 maxValue,
    bool chainOfThought
) returns (int256 response);
```

The agent extracts the first integer from the model's response and clamps it to `[minValue, maxValue]`. Set `minValue = maxValue = 0` to disable clamping.

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferNumber.selector,
    'Rate the sentiment of this review on a 1-10 scale: "..."',
    "You are a sentiment analyst. Reply with a single integer 1-10.",
    int256(1),
    int256(10),
    true // chain-of-thought helps with subjective scores
);
```

Decode as `int256`:

```solidity
int256 score = abi.decode(responses[0].result, (int256));
```
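To make the extract-and-clamp rule concrete, here is a minimal off-chain sketch of the behavior described above. It is not the agent's actual code: the regular expression, the `null` return when no integer is present, and the use of `number` instead of a 256-bit integer are all assumptions made for illustration.

```typescript
// Sketch of the documented rule: take the first integer in the model's text,
// then clamp it to [minValue, maxValue]. minValue = maxValue = 0 disables clamping.
// Assumptions: the regex below, and returning null when no integer is found
// (the agent's real behavior in that case is not documented here).
function extractAndClamp(text: string, minValue: number, maxValue: number): number | null {
  const match = /-?\d+/.exec(text);
  if (match === null) return null;
  const value = Number.parseInt(match[0], 10);
  if (minValue === 0 && maxValue === 0) return value; // clamping disabled
  if (value < minValue) return minValue;
  if (value > maxValue) return maxValue;
  return value;
}
```

A helper like this is mainly useful in tests, to predict what a contract will receive for a given raw model reply before spending a request on it.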
## inferChat: multi-turn conversation

Pass the full message history as parallel `roles[]` / `messages[]` arrays: same length, same order.

```solidity
function inferChat(
    string[] roles,    // "system" | "user" | "assistant"
    string[] messages,
    bool chainOfThought
) returns (string response);
```

```solidity
string[] memory roles = new string[](4);
string[] memory msgs  = new string[](4);
roles[0] = "system";    msgs[0] = "You are a helpful coding assistant.";
roles[1] = "user";      msgs[1] = "How do I reverse a string in JavaScript?";
roles[2] = "assistant"; msgs[2] = "str.split('').reverse().join('')";
roles[3] = "user";      msgs[3] = "Can you explain that step by step?";

bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferChat.selector, roles, msgs, false
);
```
Use this when the prompt naturally needs prior context (instructions earlier in the conversation, partial assistant outputs, few-shot examples).
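When the history is assembled off-chain, it is easy to let the two arrays drift out of sync. The sketch below keeps a single list of turns and derives the parallel arrays from it. The `ChatTurn` type and `toParallelArrays` name are hypothetical helpers, not part of the agent's API.

```typescript
// Hypothetical helper types: one object per turn, converted to the parallel
// roles[] / messages[] arrays that inferChat expects.
type ChatRole = "system" | "user" | "assistant";

interface ChatTurn {
  role: ChatRole;
  content: string;
}

function toParallelArrays(history: ChatTurn[]): { roles: string[]; messages: string[] } {
  const roles: string[] = [];
  const messages: string[] = [];
  for (const turn of history) {
    // Pushing to both arrays in the same iteration guarantees equal length and order.
    roles.push(turn.role);
    messages.push(turn.content);
  }
  return { roles, messages };
}

const example = toParallelArrays([
  { role: "system", content: "You are a helpful coding assistant." },
  { role: "user", content: "How do I reverse a string in JavaScript?" },
]);
```

The resulting `roles` and `messages` can be passed as the first two arguments when encoding an `inferChat` call.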
## inferToolsChat: tool calling (MCP + on-chain)

The most powerful and most subtle of the four. The LLM can call:

- MCP tools, served by the URLs in `mcpServerUrls`. Executed in situ by the agent: the LLM emits a tool call, the agent forwards it to the MCP server, feeds the result back to the LLM, and continues. The caller sees only the final answer.
- On-chain tools, declared in `onchainTools`. The agent does not execute these; instead, when the LLM wants to call one, the agent yields the calldata back to the caller. The caller executes the call (against any contract, not just the requester) and resumes the conversation by passing the tool result back.

```solidity
function inferToolsChat(
    string[] roles,
    string[] messages,
    string[] mcpServerUrls,
    OnchainTool[] onchainTools,
    uint256 maxIterations,
    bool chainOfThought
) returns (
    string finishReason,
    string response,
    string[] updatedRoles,
    string[] updatedMessages,
    string[] pendingToolCallIds,
    bytes[] pendingToolCalls
);

struct OnchainTool {
    string signature;   // e.g. "swap(address token, uint256 amount)"
    string description; // human-readable description for the LLM
}
```
Supported types in tool signatures: string, bool, address, uint*, int*, bytes, and arrays of these.
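A tool signature can be checked against that list before it is submitted. The sketch below is a rough pre-flight check, not the agent's own validation. It assumes that only the types named above are accepted, so sized byte types such as `bytes32`, fixed-size arrays, and tuples are reported as unsupported. `isSupportedType` and `unsupportedParams` are hypothetical names.

```typescript
// Assumption: only the types listed in the text are accepted. bytes32, fixed-size
// arrays (uint256[3]) and tuples are treated as unsupported because the text
// does not mention them.
const BASE_TYPE = /^(string|bool|address|bytes|u?int(\d+)?)$/;

function isSupportedType(type: string): boolean {
  let base = type;
  while (base.slice(-2) === "[]") base = base.slice(0, -2); // strip dynamic array suffixes
  const m = BASE_TYPE.exec(base);
  if (m === null) return false;
  if (m[2] === undefined) return true; // string, bool, address, bytes, uint, int
  const bits = Number(m[2]);
  return bits >= 8 && bits <= 256 && bits % 8 === 0; // valid Solidity integer widths
}

// Returns the parameter types in a signature that are not in the supported list.
function unsupportedParams(signature: string): string[] {
  const open = signature.indexOf("(");
  const close = signature.lastIndexOf(")");
  if (open < 1 || close < open) return [signature]; // not shaped like name(args)
  const inner = signature.slice(open + 1, close).trim();
  if (inner === "") return [];
  return inner
    .split(",")
    .map((param) => param.trim().split(/\s+/)[0]) // "address token" -> "address"
    .filter((type) => !isSupportedType(type));
}
```

For example, `unsupportedParams("swap(address token, uint256 amount)")` returns an empty array, while a signature containing `uint7` would list that type.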
### finishReason semantics

| Value | What happened | What `response` / `pending*` contain |
|---|---|---|
| `"stop"` | LLM finished, possibly after MCP tool calls (which were auto-executed). | `response`: final text. All other outputs empty. |
| `"tool_calls"` | LLM wants to call on-chain tool(s). | `response`: empty. `updatedRoles`/`updatedMessages`: full conversation incl. any MCP results. `pendingToolCallIds[i]` ↔ `pendingToolCalls[i]` are parallel arrays holding the calldata to execute. |
| `"max_iterations"` | Reached `maxIterations` LLM↔tool round-trips without finishing. | Treat as a soft failure: increase `maxIterations` or simplify the prompt. |
MCP tool flow (auto-executed by the agent):

```text
Caller ──inferToolsChat([..., mcpServerUrls=["http://weather:80/"], onchainTools=[]])──► Agent
│
Agent ─list_tools()─► MCP server ─tools─► Agent
Agent ─prompt + tools─► LLM
LLM ─tool_call: getWeather("Tokyo")─► Agent
Agent ─call_tool─► MCP server ─result─► Agent
Agent ─tool_result─► LLM
LLM ─final answer─► Agent
│
Caller ◄─finishReason="stop", response="Tokyo is 22°C and sunny", [], [], [], []
```

On-chain tool flow (yield to the caller, then resume):

```text
Caller ──inferToolsChat([..., onchainTools=[swap(address,uint256)]])──► Agent
│
Agent ─prompt + tool defs─► LLM
LLM ─tool_call: swap(0xA0b8..., 1000)─► Agent
│
Caller ◄─finishReason="tool_calls", "", state, [callId], [calldata for swap(...)]
│
Caller executes calldata against the DEX, captures result
│
Caller ──inferToolsChat([state ++ {role:"tool", content:'{"tool_call_id":callId,"content":"success"}'}], ...)──► Agent
│
Agent ─continued conversation─► LLM
LLM ─final answer─► Agent
│
Caller ◄─finishReason="stop", response="Swapped 1000 USDC successfully", [], [], [], []
```
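The `finishReason` values and the two flows above boil down to a three-way branch on the decoded result. The following sketch shows one way an off-chain driver might model that branch. The `ToolsChatResult` and `NextStep` types and the `nextStep` function are hypothetical, and calldata is assumed to arrive as hex strings after decoding.

```typescript
// Field names mirror the inferToolsChat return values. Representing calldata
// as hex strings is an assumption about the off-chain decoder.
interface ToolsChatResult {
  finishReason: string;
  response: string;
  updatedRoles: string[];
  updatedMessages: string[];
  pendingToolCallIds: string[];
  pendingToolCalls: string[];
}

type NextStep =
  | { kind: "done"; answer: string }
  | { kind: "execute"; calls: { id: string; calldata: string }[] }
  | { kind: "failed"; reason: string };

function nextStep(r: ToolsChatResult): NextStep {
  if (r.finishReason === "stop") {
    return { kind: "done", answer: r.response };
  }
  if (r.finishReason === "tool_calls") {
    // The two pending arrays are parallel; a length mismatch means a bad decode.
    if (r.pendingToolCallIds.length !== r.pendingToolCalls.length) {
      return { kind: "failed", reason: "pending arrays differ in length" };
    }
    const calls = r.pendingToolCallIds.map((id, i) => ({ id, calldata: r.pendingToolCalls[i] }));
    return { kind: "execute", calls };
  }
  if (r.finishReason === "max_iterations") {
    return { kind: "failed", reason: "max_iterations reached" };
  }
  return { kind: "failed", reason: "unknown finishReason: " + r.finishReason };
}
```

Treating an unrecognized `finishReason` as a failure keeps the driver from silently looping if the agent ever returns a value this sketch does not know about.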
When finishReason == "tool_calls":
1. Iterate over `pendingToolCalls[i]`: each is calldata (selector + ABI-encoded args).
2. Execute each against the target contract (`(bool ok, bytes memory result) = target.call(pendingToolCalls[i]);`).
3. Append one `(role: "tool", message: jsonOf({tool_call_id: pendingToolCallIds[i], content: resultString}))` per call.
4. Call `inferToolsChat` again with the extended `updatedRoles` + `updatedMessages`.

Repeat until `finishReason == "stop"`. Each round-trip is a new on-chain `createRequest` (with its own deposit + consensus cycle), so cap `maxIterations` and budget accordingly.
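The append step of the resume procedure above can be sketched off-chain as follows. `Conversation` and `appendToolResults` are hypothetical names. The JSON shape follows the `tool_call_id` / `content` format this document describes, and the result strings are assumed to have already been derived from each call's return data.

```typescript
interface Conversation {
  roles: string[];
  messages: string[];
}

// Extends the conversation returned by the agent with one "tool" message per
// executed call, without mutating the input arrays.
function appendToolResults(
  state: Conversation,
  pendingToolCallIds: string[],
  resultStrings: string[],
): Conversation {
  if (pendingToolCallIds.length !== resultStrings.length) {
    throw new Error("one result is required per pending tool call");
  }
  const roles = state.roles.slice();
  const messages = state.messages.slice();
  pendingToolCallIds.forEach((id, i) => {
    roles.push("tool");
    // JSON.stringify handles quoting and escaping inside the result string.
    messages.push(JSON.stringify({ tool_call_id: id, content: resultStrings[i] }));
  });
  return { roles, messages };
}
```

The returned `roles` and `messages` become the first two arguments of the next `inferToolsChat` request.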
```solidity
interface ILLMAgent {
    struct OnchainTool { string signature; string description; }

    function inferToolsChat(
        string[] calldata roles,
        string[] calldata messages,
        string[] calldata mcpServerUrls,
        OnchainTool[] calldata onchainTools,
        uint256 maxIterations,
        bool chainOfThought
    ) external returns (
        string memory finishReason,
        string memory response,
        string[] memory updatedRoles,
        string[] memory updatedMessages,
        string[] memory pendingToolCallIds,
        bytes[] memory pendingToolCalls
    );
}

contract AgenticSwapper is IAgentRequesterHandler {
    IAgentRequester public immutable platform;
    address public immutable dex;

    uint256 public constant LLM_AGENT_ID = 12847293847561029384;
    uint256 public constant SUBCOMMITTEE_SIZE = 3;
    uint256 public constant PRICE_PER_AGENT = 0.07 ether;

    // Tracks per-request state for resume
    mapping(uint256 => bytes) public requestState; // serialized roles+messages

    // ... (createRequest call with onchainTools = [swap(address,uint256)] omitted for brevity)

    function handleResponse(
        uint256 requestId,
        Response[] memory responses,
        ResponseStatus status,
        Request memory /* details */
    ) external override {
        require(msg.sender == address(platform), "Only platform");
        if (status != ResponseStatus.Success || responses.length == 0) return;

        (
            string memory finishReason,
            string memory response,
            string[] memory updatedRoles,
            string[] memory updatedMessages,
            string[] memory pendingToolCallIds,
            bytes[] memory pendingToolCalls
        ) = abi.decode(
            responses[0].result,
            (string, string, string[], string[], string[], bytes[])
        );

        if (_streq(finishReason, "stop")) {
            // Final answer is in `response`; nothing left to do.
            return;
        }

        if (_streq(finishReason, "tool_calls")) {
            // Execute each pending call and resume.
            for (uint256 i = 0; i < pendingToolCalls.length; i++) {
                (bool ok, bytes memory result) = dex.call(pendingToolCalls[i]);
                // Append (role: "tool", message: '{"tool_call_id": <pendingToolCallIds[i]>, "content": ...}')
                // to the conversation state, building the content from `ok` / `result`.
                // ...
            }
            // Submit a new createRequest with updated state. (Funding the chain of
            // requests is application-level; keep msg.value escrowed.)
        }
    }

    // String equality via hash comparison (Solidity has no native string ==).
    function _streq(string memory a, string memory b) internal pure returns (bool) {
        return keccak256(bytes(a)) == keccak256(bytes(b));
    }

    receive() external payable {}
}
```
The chain of inference requests (each with its own deposit, callback, and consensus) is what enables agentic behavior on-chain. Track total budget across the chain; each round-trip costs 0.07 × subSize per the LLM Inference price.
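As a rough budgeting aid, the per-round cost quoted above can be multiplied out across a chain of round-trips. The sketch below works in thousandths of a token to avoid floating-point error. It covers only the 0.07 per-agent price: any gas or other deposit components described in the master skill are deliberately left out, and the function names are hypothetical.

```typescript
// 0.07 tokens expressed in thousandths of a token, so the arithmetic stays exact.
const PRICE_PER_AGENT_MILLI = 70;

// Total per-agent fees for a chain of round-trips: 0.07 x subcommittee size x rounds.
function chainBudgetMilli(rounds: number, subcommitteeSize: number): number {
  return PRICE_PER_AGENT_MILLI * subcommitteeSize * rounds;
}

// Largest number of round-trips that fits in a given budget (in thousandths of a token).
function maxRoundsWithin(budgetMilli: number, subcommitteeSize: number): number {
  return Math.floor(budgetMilli / (PRICE_PER_AGENT_MILLI * subcommitteeSize));
}
```

With a subcommittee of 3, four round-trips cost `chainBudgetMilli(4, 3)` = 840 thousandths, i.e. 0.84 tokens, and a 1-token budget covers at most `maxRoundsWithin(1000, 3)` = 4 rounds.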
```typescript
import { encodeFunctionData, parseAbi } from 'viem';

const abi = parseAbi([
  'function inferString(string prompt, string system, bool chainOfThought, string[] allowedValues) returns (string)',
  'function inferNumber(string prompt, string system, int256 minValue, int256 maxValue, bool chainOfThought) returns (int256)',
  'function inferChat(string[] roles, string[] messages, bool chainOfThought) returns (string)',
  // inferToolsChat tuple is messy in parseAbi; use the JSON form from references/agents.json
]);

const payload = encodeFunctionData({
  abi,
  functionName: 'inferString',
  args: [
    'Classify: "Check out this amazing new product!"',
    'You are a content classifier. Reply with one word.',
    false,
    ['safe', 'unsafe', 'spam'],
  ],
});
```
For inferToolsChat, load the structured ABI from references/agents.json — agents["llm-inference"].abi — and pass it to encodeFunctionData directly.
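For orientation, the JSON form of the `inferToolsChat` entry would look roughly like the fragment below. It is transcribed from the Solidity interface shown in this document, not copied from `references/agents.json`, so treat that file as authoritative if the two differ. The `stateMutability` value is an assumption based on the interface declaring no `view` or `payable` modifier.

```typescript
// Hand-written from the ILLMAgent interface in this document; the file in
// references/agents.json remains the source of truth.
const inferToolsChatAbi = [
  {
    type: "function",
    name: "inferToolsChat",
    stateMutability: "nonpayable", // assumption: no view/payable modifier in the interface
    inputs: [
      { name: "roles", type: "string[]" },
      { name: "messages", type: "string[]" },
      { name: "mcpServerUrls", type: "string[]" },
      {
        name: "onchainTools",
        type: "tuple[]",
        components: [
          { name: "signature", type: "string" },
          { name: "description", type: "string" },
        ],
      },
      { name: "maxIterations", type: "uint256" },
      { name: "chainOfThought", type: "bool" },
    ],
    outputs: [
      { name: "finishReason", type: "string" },
      { name: "response", type: "string" },
      { name: "updatedRoles", type: "string[]" },
      { name: "updatedMessages", type: "string[]" },
      { name: "pendingToolCallIds", type: "string[]" },
      { name: "pendingToolCalls", type: "bytes[]" },
    ],
  },
] as const;
```

An array in this shape can be passed as the `abi` argument in place of the `parseAbi` result, with each on-chain tool supplied as a `{ signature, description }` object.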
Keep values that can differ between validators out of the prompt (`block.timestamp`, recent block hashes). Two validators executing milliseconds apart can see different values, breaking byte-identical outputs and Majority consensus.

### allowedValues is a strong contract

When you pass a non-empty `allowedValues` to `inferString`, the model is constrained, but the constraint is enforced post hoc by the agent, not via grammar-constrained decoding at the model level. Edge case: if the model produces text that doesn't match any allowed value, the response is `Failed`. Keep allowed values short and unambiguous (`"yes"` / `"no"` over `"definitely yes"` / `"absolutely not"`).
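The post-hoc check on `allowedValues` can be pictured with the sketch below. The matching rule used here (exact match after trimming whitespace) is an assumption, since the agent's real comparison rules are not spelled out in this document. `matchAllowedValue` is a hypothetical name, and `null` stands in for the `Failed` outcome.

```typescript
// Assumption: a reply matches only if, after trimming whitespace, it is exactly
// one of the allowed strings. null represents the Failed outcome.
function matchAllowedValue(modelOutput: string, allowedValues: string[]): string | null {
  if (allowedValues.length === 0) return modelOutput; // empty array: unconstrained
  const candidate = modelOutput.trim();
  return allowedValues.indexOf(candidate) !== -1 ? candidate : null;
}
```

Under this rule a reply such as `"Definitely positive."` fails against `["positive", "negative"]`, which illustrates why short, unambiguous values combined with a "reply with one word" system prompt are the safer design.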
### chainOfThought = true is more expensive

Chain-of-thought increases the number of tokens the model generates per request. The runner's reported `executionCost` will be higher (still capped at `perAgentBudget`). For batch / high-volume use, leave it off unless you've measured a quality gain.
When resuming after finishReason == "tool_calls", the tool result message is a JSON string:
```json
{"tool_call_id": "<callId>", "content": "<result string or stringified JSON>"}
```
Pass it as a plain string in the messages array with role "tool". Malformed JSON here is the most common reason the resume call fails.
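Because malformed JSON is called out as the most common resume failure, it can help to validate each tool message before submitting the request. The sketch below is a hypothetical pre-flight check. It assumes both fields must be strings, as in the format shown above.

```typescript
// Returns true only if the message parses as a JSON object whose
// tool_call_id and content fields are both strings.
function isValidToolResultMessage(message: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(message);
  } catch (err) {
    return false; // not valid JSON at all
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const record = parsed as { [key: string]: unknown };
  return typeof record["tool_call_id"] === "string" && typeof record["content"] === "string";
}
```

Note that a structured result must be stringified first: `content` holding a nested object is rejected by this check, matching the "stringified JSON" wording above.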
Each resume round sends the full conversation back through createRequest. Long agentic loops can hit gas limits on the dApp side and increase per-request cost on the agent side. Keep system prompts compact and prune old turns when possible.
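One simple pruning policy is to keep every system message plus the most recent turns. The sketch below illustrates that policy only. `pruneHistory` is a hypothetical helper, and the choice of how many turns to keep is left to the application.

```typescript
// Keeps all "system" messages and the last `keepLast` entries of the
// conversation, preserving the original order.
function pruneHistory(
  roles: string[],
  messages: string[],
  keepLast: number,
): { roles: string[]; messages: string[] } {
  if (roles.length !== messages.length) {
    throw new Error("roles and messages must have the same length");
  }
  const firstKeptIndex = Math.max(0, roles.length - keepLast);
  const keptRoles: string[] = [];
  const keptMessages: string[] = [];
  roles.forEach((role, i) => {
    if (role === "system" || i >= firstKeptIndex) {
      keptRoles.push(role);
      keptMessages.push(messages[i]);
    }
  });
  return { roles: keptRoles, messages: keptMessages };
}
```

Be careful not to cut between a tool call and its `tool` result while a resume is in flight; whether the model tolerates a result without its originating call is not something this document specifies.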
MCP servers must be reachable from the agent's sandbox — public HTTPS endpoints, not localhost or VPN-only. If the MCP server is down or slow, the LLM hangs on tool calls until either the agent's timeout or maxIterations.
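A quick string-level check can catch the obvious cases (plain HTTP, localhost, private IPv4 ranges) before a request is paid for. The sketch below is a rough heuristic under those assumptions. It cannot tell whether a public hostname is actually reachable from the agent's sandbox, and `mcpUrlProblem` is a hypothetical name.

```typescript
// Returns a description of the first obvious problem, or null if none is found.
// Heuristic only: it inspects the URL text and performs no network access.
function mcpUrlProblem(url: string): string | null {
  const m = /^([a-z][a-z0-9+.-]*):\/\/(\[[^\]]+\]|[^\/:?#]+)/i.exec(url);
  if (m === null) return "not an absolute URL";
  const scheme = m[1].toLowerCase();
  const host = m[2].toLowerCase();
  if (scheme !== "https") return "not HTTPS";
  if (host === "localhost" || host === "[::1]" || /^127\./.test(host)) {
    return "loopback host";
  }
  if (
    /^10\./.test(host) ||
    /^192\.168\./.test(host) ||
    /^172\.(1[6-9]|2\d|3[01])\./.test(host)
  ) {
    return "private IPv4 range";
  }
  return null;
}
```

Running this over `mcpServerUrls` before encoding the request is cheap insurance against a round-trip that can only time out.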
Failed? Check the receipt (see master skill). Common LLM-specific causes:

- Validators saw different inputs: compare the `prompt` in the `request_received` step across receipts.
- `allowedValues` miss: model output didn't match any allowed string. Loosen the values or drop the constraint.
- `max_iterations` reached: for `inferToolsChat`, the LLM kept emitting tool calls without converging. Increase `maxIterations` or simplify the tool surface.

Related skills:

- `somnia-agents`: request lifecycle, deposit math, callback pattern
- `somnia-agents-invoke`: interactive CLI to fire `inferString` / `inferNumber` calls without writing a contract
- `somnia-agents-llm-parse-website`: when you specifically want extraction from a webpage rather than free-form inference