
somnia-agents-llm-inference

Deep-dive reference for the LLM Inference agent on Somnia — invoke a deterministic on-chain LLM (Qwen3-30B) from smart contracts. Covers the 4 functions (inferString, inferNumber, inferChat, inferToolsChat), MCP tool calling, on-chain tool yield/resume pattern, allowed-values constraints, and chain-of-thought. Use when building AI moderation, classification, summarization, sentiment scoring, or agentic DeFi bots that need an LLM to decide which on-chain calls to make.

Slash command: `/somnia-agents-skills:somnia-agents-llm-inference`

Last commit: May 8, 2026

LLM Inference Agent

The LLM Inference agent (llm-inference) gives smart contracts access to an on-chain deterministic LLM — Qwen3-30B running with fixed seed and temperature = 0, so every validator independently produces byte-identical output. That's what makes consensus on AI results possible.

Read the master somnia-agents skill first for the request lifecycle, gas model, and callback pattern. This document only covers the agent-specific ABI and quirks.

Identity

| Field | Value |
| --- | --- |
| `agentId` | `12847293847561029384` |
| Per-agent price | `0.07` (whole tokens: SOMI on Mainnet, STT on Testnet) |
| Default consensus | Majority (deterministic, byte-identical outputs) |
| Source of truth | `references/agents.json` |

Methods

| Function | Purpose |
| --- | --- |
| `inferString(prompt, system, chainOfThought, allowedValues)` | Single-turn string inference, optionally constrained to a fixed set of values |
| `inferNumber(prompt, system, minValue, maxValue, chainOfThought)` | Single-turn integer inference, clamped to `[minValue, maxValue]` |
| `inferChat(roles, messages, chainOfThought)` | Multi-turn chat with full message history |
| `inferToolsChat(roles, messages, mcpServerUrls, onchainTools, maxIterations, chainOfThought)` | Multi-turn chat with MCP tool calling (auto-executed) and on-chain tool calling (yielded back to the caller as calldata) |

All four return their result via the standard request → callback flow. The full ABI is in `references/agents.json` under `agents["llm-inference"].abi`.

`inferString` — constrained classification

Best for: content moderation, sentiment labels, intent classification, any string output from a closed set.

```solidity
function inferString(
    string calldata   prompt,
    string calldata   system,
    bool              chainOfThought,
    string[] calldata allowedValues
) external returns (string memory response);
```

- `system`: system prompt; pass `""` if you don't need one.
- `allowedValues`: pass an empty array for unconstrained text. When non-empty, the response must be one of the listed strings (output that matches none of them makes the request `Failed`). This is the safest pattern for on-chain logic that branches on the result.
- `chainOfThought`: when `true`, the model reasons before producing the final answer, and that reasoning is visible in the receipt. It increases latency and token cost but helps with harder classification.

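```solidity
// Build the allowedValues array explicitly (Solidity has no array-literal helper for string[] memory).
string[] memory allowed = new string[](3);
allowed[0] = "positive";
allowed[1] = "negative";
allowed[2] = "neutral";

bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferString.selector,
    'Is this review positive, negative, or neutral? "Absolutely loved it, best purchase ever!"',
    "You are a sentiment classifier. Reply with one word.",
    false,   // chainOfThought
    allowed  // allowedValues
);
```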

The response is decoded as a single string:

```solidity
string memory label = abi.decode(responses[0].result, (string));
```

`inferNumber` — bounded integer inference

Best for: rating / scoring, count extraction, confidence values.

```solidity
function inferNumber(
    string calldata prompt,
    string calldata system,
    int256          minValue,
    int256          maxValue,
    bool            chainOfThought
) external returns (int256 response);
```

The agent extracts the first integer from the model's response and clamps it to [minValue, maxValue]. Set minValue = maxValue = 0 to disable clamping.
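To make that post-processing concrete, here is a small TypeScript model of what the paragraph above describes: take the first integer in the model's text, then clamp it. It is an illustration, not the agent's code; treating a leading `-` as part of the number and returning `null` when no integer is present are assumptions.

```typescript
// Illustrative model of the documented inferNumber post-processing.
// NOT the agent's implementation: the "-" handling and the null result
// for text without digits are assumptions.
function extractAndClamp(text: string, minValue: bigint, maxValue: bigint): bigint | null {
  const match = text.match(/-?\d+/); // first (optionally negative) integer
  if (match === null) return null;
  let value = BigInt(match[0]);
  // minValue == maxValue == 0 disables clamping, as described above.
  if (minValue === 0n && maxValue === 0n) return value;
  if (value < minValue) value = minValue;
  if (value > maxValue) value = maxValue;
  return value;
}

console.log(extractAndClamp("I'd rate it 12/10", 1n, 10n)); // prints 10n
```

Because `int256` can exceed `Number.MAX_SAFE_INTEGER`, the sketch uses `bigint` throughout, which is also how viem represents decoded `int256` values.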

```solidity
bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferNumber.selector,
    'Rate the sentiment of this review on a 1-10 scale: "..."',
    "You are a sentiment analyst. Reply with a single integer 1-10.",
    int256(1),
    int256(10),
    true  // chain-of-thought helps with subjective scores
);
```

Decode as int256:

```solidity
int256 score = abi.decode(responses[0].result, (int256));
```

`inferChat` — multi-turn conversation

Pass full message history as parallel roles[] / messages[] arrays — same length, same order.

```solidity
function inferChat(
    string[] calldata roles,        // "system" | "user" | "assistant"
    string[] calldata messages,
    bool              chainOfThought
) external returns (string memory response);
```

```solidity
string[] memory roles = new string[](4);
string[] memory msgs  = new string[](4);
roles[0] = "system";    msgs[0] = "You are a helpful coding assistant.";
roles[1] = "user";      msgs[1] = "How do I reverse a string in JavaScript?";
roles[2] = "assistant"; msgs[2] = "str.split('').reverse().join('')";
roles[3] = "user";      msgs[3] = "Can you explain that step by step?";

bytes memory payload = abi.encodeWithSelector(
    ILLMAgent.inferChat.selector, roles, msgs, false
);
```

Use this when the prompt naturally needs prior context (instructions earlier in the conversation, partial assistant outputs, few-shot examples).
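A hypothetical TypeScript helper (the names `ChatTurn` and `toParallelArrays` are illustrative) that builds the two parallel arrays from a single list of turns, so they cannot drift out of length or order. The resulting arrays are what you would ABI-encode for `inferChat`.

```typescript
// Hypothetical helper: one list of turns in, the parallel roles[] / messages[]
// arrays that inferChat expects out (same length, same order by construction).
type Role = "system" | "user" | "assistant";

interface ChatTurn {
  role: Role;
  content: string;
}

function toParallelArrays(turns: ChatTurn[]): { roles: string[]; messages: string[] } {
  const roles: string[] = [];
  const messages: string[] = [];
  for (const turn of turns) {
    roles.push(turn.role);
    messages.push(turn.content);
  }
  return { roles, messages };
}

const { roles, messages } = toParallelArrays([
  { role: "system", content: "You are a helpful coding assistant." },
  { role: "user", content: "How do I reverse a string in JavaScript?" },
  { role: "assistant", content: "str.split('').reverse().join('')" },
  { role: "user", content: "Can you explain that step by step?" },
]);
// roles    -> ["system", "user", "assistant", "user"]
// messages -> the four strings above, in the same order
```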

`inferToolsChat` — tool calling (MCP + on-chain)

The most powerful and most subtle of the four. The LLM can call:

- MCP tools: discovered automatically from the URLs you pass in `mcpServerUrls`. The agent executes them in-situ: the LLM emits a tool call, the agent forwards it to the MCP server, feeds the result back to the LLM, and continues. The caller sees only the final answer.
- On-chain tools: declared as Solidity function signatures in `onchainTools`. The agent does not execute these; instead, when the LLM wants to call one, the agent yields the calldata back to the caller. The caller executes the call (against any contract, not just the requester) and resumes the conversation by passing the tool result back.

```solidity
function inferToolsChat(
    string[] calldata      roles,
    string[] calldata      messages,
    string[] calldata      mcpServerUrls,
    OnchainTool[] calldata onchainTools,
    uint256                maxIterations,
    bool                   chainOfThought
) external returns (
    string memory   finishReason,
    string memory   response,
    string[] memory updatedRoles,
    string[] memory updatedMessages,
    string[] memory pendingToolCallIds,
    bytes[] memory  pendingToolCalls
);

struct OnchainTool {
    string signature;    // e.g. "swap(address token, uint256 amount)"
    string description;  // human-readable description for the LLM
}
```

Supported types in tool signatures: string, bool, address, uint*, int*, bytes, and arrays of these.
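Before deploying, it can help to lint tool signatures against the supported-type list. The TypeScript below is a hypothetical pre-flight check based only on that list; the agent's own parser may be stricter or looser. It conservatively flags fixed-size `bytesN` and tuple/struct parameters, since neither is listed.

```typescript
// Hypothetical lint for OnchainTool.signature, derived only from the documented
// supported types: string, bool, address, uint*, int*, bytes, and arrays of these.
function isSupportedType(paramType: string): boolean {
  // Strip array suffixes such as "[]" or "[3][]".
  const base = paramType.replace(/(\[\d*\])+$/, "");
  if (base === "string" || base === "bool" || base === "address" || base === "bytes") {
    return true;
  }
  const m = base.match(/^u?int(\d*)$/);
  if (m === null) return false;
  if (m[1] === "") return true; // bare "uint" / "int" are aliases for 256 bits
  const bits = Number(m[1]);
  return bits >= 8 && bits <= 256 && bits % 8 === 0;
}

// Returns a list of problems; an empty list means the signature looks usable.
function checkToolSignature(signature: string): string[] {
  const m = signature.match(/^([A-Za-z_][A-Za-z0-9_]*)\((.*)\)$/);
  if (m === null) return ["not of the form name(type name, ...)"];
  const params = m[2].trim() === "" ? [] : m[2].split(",");
  const problems: string[] = [];
  for (const param of params) {
    const paramType = param.trim().split(/\s+/)[0];
    if (!isSupportedType(paramType)) problems.push(`unsupported type: ${paramType}`);
  }
  return problems;
}

console.log(checkToolSignature("swap(address token, uint256 amount)")); // prints []
```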

`finishReason` semantics

| Value | What happened | What `response` / `pending*` contain |
| --- | --- | --- |
| `"stop"` | LLM finished, possibly after MCP tool calls (which were auto-executed). | `response`: final text. All other outputs empty. |
| `"tool_calls"` | LLM wants to call on-chain tool(s). | `response`: empty. `updatedRoles` / `updatedMessages`: full conversation, including any MCP results. `pendingToolCallIds[i]` ↔ `pendingToolCalls[i]`: parallel arrays of calldata to execute. |
| `"max_iterations"` | Reached `maxIterations` LLM↔tool round-trips without finishing. | Treat as a soft failure: increase `maxIterations` or simplify the prompt. |

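A caller-side dispatcher for the three documented values, sketched in TypeScript. `DecodedToolsChat` mirrors the `inferToolsChat` return tuple (calldata shown as hex strings); `Action` and `nextAction` are hypothetical names. Unknown values throw instead of being silently treated as success.

```typescript
// Mirrors the inferToolsChat return tuple after ABI decoding.
interface DecodedToolsChat {
  finishReason: string;
  response: string;
  updatedRoles: string[];
  updatedMessages: string[];
  pendingToolCallIds: string[];
  pendingToolCalls: string[]; // hex calldata, e.g. "0x..."
}

// Hypothetical caller-side decision type.
type Action =
  | { kind: "done"; answer: string }
  | { kind: "execute"; calls: { id: string; calldata: string }[] }
  | { kind: "soft_fail"; reason: string };

function nextAction(r: DecodedToolsChat): Action {
  switch (r.finishReason) {
    case "stop":
      return { kind: "done", answer: r.response };
    case "tool_calls":
      if (r.pendingToolCallIds.length !== r.pendingToolCalls.length) {
        throw new Error("pendingToolCallIds and pendingToolCalls must be parallel arrays");
      }
      return {
        kind: "execute",
        calls: r.pendingToolCalls.map((calldata, i) => ({ id: r.pendingToolCallIds[i], calldata })),
      };
    case "max_iterations":
      return { kind: "soft_fail", reason: "increase maxIterations or simplify the prompt" };
    default:
      throw new Error(`unexpected finishReason: ${r.finishReason}`);
  }
}
```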
MCP-only flow (auto-executed)

```text
Caller ──inferToolsChat([..., mcpServerUrls=["http://weather:80/"], onchainTools=[]])──► Agent
                                                          │
Agent ─list_tools()─► MCP server ─tools─► Agent
Agent ─prompt + tools─► LLM
LLM ─tool_call: getWeather("Tokyo")─► Agent
Agent ─call_tool─► MCP server ─result─► Agent
Agent ─tool_result─► LLM
LLM ─final answer─► Agent
                                                          │
Caller ◄─finishReason="stop", response="Tokyo is 22°C and sunny", [], [], [], []
```
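The MCP-only flow above and the on-chain yield/resume flow that follows are two exits from the same loop. This toy TypeScript model replaces the LLM and the MCP server with stub functions (all names are hypothetical) and shows only the control flow: MCP calls are executed and fed back, a call to a declared on-chain tool ends the request with `tool_calls`, and running out of iterations ends it with `max_iterations`. Counting each LLM call as one iteration, and allowing one tool call per step, are simplifying assumptions.

```typescript
// Toy model of the inferToolsChat loop; stubs stand in for the LLM and MCP server.
type LlmStep =
  | { type: "answer"; text: string }
  | { type: "tool_call"; id: string; tool: string; args: string };

interface PendingCall { id: string; tool: string; args: string; }

interface LoopResult {
  finishReason: "stop" | "tool_calls" | "max_iterations";
  response: string;
  pending: PendingCall[];
}

function runToolsChat(
  llm: (transcript: string[]) => LlmStep,
  mcpTools: Map<string, (args: string) => string>, // executed in-situ
  onchainTools: Set<string>,                         // yielded to the caller
  maxIterations: number,
): LoopResult {
  const transcript: string[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const step = llm(transcript);
    if (step.type === "answer") {
      return { finishReason: "stop", response: step.text, pending: [] };
    }
    const mcpTool = mcpTools.get(step.tool);
    if (mcpTool !== undefined) {
      // MCP tool: run it, feed the result back, keep looping.
      transcript.push(`${step.id}:${mcpTool(step.args)}`);
      continue;
    }
    if (onchainTools.has(step.tool)) {
      // On-chain tool: not executed here; the request ends and the call is yielded.
      return { finishReason: "tool_calls", response: "", pending: [{ id: step.id, tool: step.tool, args: step.args }] };
    }
    throw new Error(`LLM called an undeclared tool: ${step.tool}`);
  }
  return { finishReason: "max_iterations", response: "", pending: [] };
}

// Weather example from the diagram: one MCP call, then a final answer.
const weatherLlm = (transcript: string[]): LlmStep =>
  transcript.length === 0
    ? { type: "tool_call", id: "c1", tool: "getWeather", args: "Tokyo" }
    : { type: "answer", text: `Tokyo is ${transcript[0].split(":")[1]}` };

const mcp = new Map<string, (args: string) => string>([["getWeather", () => "22°C and sunny"]]);
const weather = runToolsChat(weatherLlm, mcp, new Set<string>(), 4);
// weather.finishReason === "stop", weather.response === "Tokyo is 22°C and sunny"
```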

On-chain tool yield/resume flow

```text
Caller ──inferToolsChat([..., onchainTools=[swap(address,uint256)]])──► Agent
                                                          │
Agent ─prompt + tool defs─► LLM
LLM ─tool_call: swap(0xA0b8..., 1000)─► Agent
                                                          │
Caller ◄─finishReason="tool_calls", "", state, [callId], [calldata for swap(...)]
                                                          │
Caller executes calldata against the DEX, captures result
                                                          │
Caller ──inferToolsChat([state ++ {role:"tool", content:'{"tool_call_id":callId,"content":"success"}'}], ...)──► Agent
                                                          │
Agent ─continued conversation─► LLM
LLM ─final answer─► Agent
                                                          │
Caller ◄─finishReason="stop", response="Swapped 1000 USDC successfully", [], [], [], []
```

Resume protocol

When `finishReason == "tool_calls"`:

1. Iterate over `pendingToolCalls[i]`; each entry is calldata (selector + ABI-encoded arguments).
2. Execute the call against the appropriate target contract: `(bool ok, bytes memory result) = target.call(pendingToolCalls[i]);`
3. Append the result to the conversation: one new message per call, with role `"tool"` and message `jsonOf({tool_call_id: pendingToolCallIds[i], content: resultString})`.
4. Call `inferToolsChat` again with the extended `updatedRoles` + `updatedMessages`.

Repeat until `finishReason == "stop"`. Each round-trip is a new on-chain `createRequest` with its own deposit and consensus cycle, so bound both `maxIterations` and the number of resume rounds, and budget the deposits accordingly.
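Step 3 is where hand-built strings usually go wrong. A minimal TypeScript sketch (the function name is hypothetical) that appends one role `"tool"` message per executed call and lets `JSON.stringify` handle quoting and escaping. It assumes the agent parses the message as JSON, so compact output without spaces is equivalent to the spaced form shown elsewhere in this document.

```typescript
// Hypothetical helper for resume step 3: extend the conversation with one
// role "tool" message per executed on-chain call.
function appendToolResults(
  updatedRoles: string[],
  updatedMessages: string[],
  pendingToolCallIds: string[],
  results: string[], // one result string per pending call, same order
): { roles: string[]; messages: string[] } {
  if (pendingToolCallIds.length !== results.length) {
    throw new Error("need exactly one result per pending tool call");
  }
  const roles = [...updatedRoles];
  const messages = [...updatedMessages];
  pendingToolCallIds.forEach((id, i) => {
    roles.push("tool");
    // JSON.stringify escapes quotes/backslashes inside the result for us.
    messages.push(JSON.stringify({ tool_call_id: id, content: results[i] }));
  });
  return { roles, messages };
}

const next = appendToolResults(
  ["system", "user"],
  ["You are a trading bot.", "Swap 1000 USDC for WETH."],
  ["call_1"],
  ["success"],
);
// next.messages[2] === '{"tool_call_id":"call_1","content":"success"}'
```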

Solidity sketch — agentic swap

```solidity
// IAgentRequester, IAgentRequesterHandler, Request, Response and ResponseStatus
// are defined in the master somnia-agents skill; import them from there.

interface ILLMAgent {
    struct OnchainTool { string signature; string description; }
    function inferToolsChat(
        string[] calldata roles,
        string[] calldata messages,
        string[] calldata mcpServerUrls,
        OnchainTool[] calldata onchainTools,
        uint256 maxIterations,
        bool chainOfThought
    ) external returns (
        string memory finishReason,
        string memory response,
        string[] memory updatedRoles,
        string[] memory updatedMessages,
        string[] memory pendingToolCallIds,
        bytes[] memory pendingToolCalls
    );
}

contract AgenticSwapper is IAgentRequesterHandler {
    IAgentRequester public immutable platform;
    address public immutable dex;
    uint256 public constant LLM_AGENT_ID = 12847293847561029384;
    uint256 public constant SUBCOMMITTEE_SIZE = 3;
    uint256 public constant PRICE_PER_AGENT = 0.07 ether;

    // Tracks per-request state for resume
    mapping(uint256 => bytes) public requestState; // serialized roles+messages

    // Immutable variables must be assigned in the constructor.
    constructor(IAgentRequester _platform, address _dex) {
        platform = _platform;
        dex = _dex;
    }

    // ... (createRequest call with onchainTools = [swap(address,uint256)] omitted for brevity)

    function handleResponse(
        uint256 requestId,
        Response[] memory responses,
        ResponseStatus status,
        Request memory /* details */
    ) external override {
        require(msg.sender == address(platform), "Only platform");
        if (status != ResponseStatus.Success || responses.length == 0) return;

        (
            string memory finishReason,
            string memory response,
            string[] memory updatedRoles,
            string[] memory updatedMessages,
            string[] memory pendingToolCallIds,
            bytes[] memory pendingToolCalls
        ) = abi.decode(
            responses[0].result,
            (string, string, string[], string[], string[], bytes[])
        );

        if (_streq(finishReason, "stop")) {
            // Final answer is in `response`: done.
            return;
        }

        if (_streq(finishReason, "tool_calls")) {
            // Execute each pending call and resume.
            for (uint256 i = 0; i < pendingToolCalls.length; i++) {
                // pendingToolCalls[i] is LLM-chosen calldata; consider checking its
                // selector against the declared onchainTools before forwarding it.
                (bool ok, bytes memory result) = dex.call(pendingToolCalls[i]);
                // Append role "tool" with '{"tool_call_id":"<pendingToolCallIds[i]>","content":"<ok/result as a string>"}'
                // to updatedRoles / updatedMessages.
                // ...
            }
            // Submit a new createRequest with the updated state. (Funding the chain of
            // requests is application-level: keep msg.value escrowed.)
        }
    }

    function _streq(string memory a, string memory b) internal pure returns (bool) {
        return keccak256(bytes(a)) == keccak256(bytes(b));
    }

    receive() external payable {}
}
```

The chain of inference requests (each with its own deposit, callback, and consensus cycle) is what enables agentic behavior on-chain. Track the total budget across the chain: each round-trip costs `0.07 × subcommittee size` at the LLM Inference price, i.e. 0.21 tokens per round with `SUBCOMMITTEE_SIZE = 3`.
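The deposit arithmetic, worked in wei with `bigint` so nothing is lost to floating point. It assumes 18 decimals for SOMI/STT (as the `0.07 ether` constant implies) and one deposit of `PRICE_PER_AGENT × subcommittee size` per round; any additional costs from the master skill's deposit math are not modeled.

```typescript
// Budget for a chain of inferToolsChat rounds, in wei (18 decimals assumed).
const WEI_PER_TOKEN = 10n ** 18n;
const PRICE_PER_AGENT = (7n * WEI_PER_TOKEN) / 100n; // 0.07 SOMI / STT

function chainBudget(rounds: bigint, subcommitteeSize: bigint): bigint {
  return rounds * PRICE_PER_AGENT * subcommitteeSize;
}

// Render wei as a decimal token amount without floating point.
function formatTokens(wei: bigint): string {
  const whole = wei / WEI_PER_TOKEN;
  const frac = (wei % WEI_PER_TOKEN).toString().padStart(18, "0").replace(/0+$/, "");
  return frac === "" ? whole.toString() : `${whole}.${frac}`;
}

console.log(formatTokens(chainBudget(1n, 3n))); // prints 0.21
console.log(formatTokens(chainBudget(4n, 3n))); // prints 0.84
```

With four rounds (initial request plus three resumes) and a subcommittee of 3, the contract should hold at least 0.84 tokens before starting the chain.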

TypeScript encoding

```typescript
import { encodeFunctionData, parseAbi } from 'viem';

const abi = parseAbi([
  'function inferString(string prompt, string system, bool chainOfThought, string[] allowedValues) returns (string)',
  'function inferNumber(string prompt, string system, int256 minValue, int256 maxValue, bool chainOfThought) returns (int256)',
  'function inferChat(string[] roles, string[] messages, bool chainOfThought) returns (string)',
  // inferToolsChat takes an OnchainTool[] struct array; rather than declaring the
  // struct here, use the JSON ABI from references/agents.json.
]);

const payload = encodeFunctionData({
  abi,
  functionName: 'inferString',
  args: [
    'Classify: "Check out this amazing new product!"',
    'You are a content classifier. Reply with one word.',
    false,
    ['safe', 'unsafe', 'spam'],
  ],
});
```

For inferToolsChat, load the structured ABI from references/agents.json — agents["llm-inference"].abi — and pass it to encodeFunctionData directly.

Pitfalls specific to llm-inference

Determinism is the whole point — preserve it

- Don't include block-dependent data in the prompt (block number, `block.timestamp`, recent block hashes). Two validators executing milliseconds apart can see different values, breaking byte-identical outputs and Majority consensus.
- Don't use MCP tools backed by URLs that return time-sensitive data; same problem.
- Don't introduce randomness into prompt construction (random salts, etc.).
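If prompts are assembled off-chain before being written into the request, one way to keep them byte-stable is a pure builder whose output depends only on its arguments. This TypeScript sketch is hypothetical; the normalization choices (Unicode NFC, `\n` line endings, trimmed edges) are assumptions, not agent requirements.

```typescript
// Hypothetical pure prompt builder: output depends only on the arguments,
// never on time, randomness, or chain state.
function canonicalize(text: string): string {
  return text
    .normalize("NFC")        // one Unicode form for accented characters
    .replace(/\r\n?/g, "\n") // one line-ending convention
    .trim();                 // no leading/trailing whitespace
}

function buildReviewPrompt(review: string): string {
  return `Rate the sentiment of this review on a 1-10 scale: "${canonicalize(review)}"`;
}

// "e" + combining accent vs. precomposed "é", plus stray whitespace and CRLF,
// yield the same prompt bytes.
console.log(buildReviewPrompt("  Cafe\u0301 was great\r\n") === buildReviewPrompt("Caf\u00e9 was great")); // prints true
```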

`allowedValues` is a strong contract

When you pass a non-empty allowedValues to inferString, the model is constrained — but the constraint is enforced post-hoc by the agent, not via grammar-constrained decoding at the model level. Edge case: if the model produces text that doesn't match any allowed value, the response is Failed. Keep allowed values short and unambiguous ("yes" / "no" over "definitely yes" / "absolutely not").
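A hypothetical TypeScript lint that applies this advice before a list of allowed values goes on-chain: it flags empty, duplicate (ignoring case), overlong, or whitespace-padded values, and values that appear as a whole word inside another (`"yes"` inside `"definitely yes"`). The two-word limit is an arbitrary default.

```typescript
// Hypothetical lint for inferString allowedValues; returns a list of problems.
function lintAllowedValues(values: string[], maxWords = 2): string[] {
  const problems: string[] = [];
  const seen: string[] = [];
  for (const value of values) {
    const key = value.trim().toLowerCase();
    if (key === "") {
      problems.push("empty value");
      continue;
    }
    if (value !== value.trim()) problems.push(`surrounding whitespace: "${value}"`);
    if (key.split(/\s+/).length > maxWords) problems.push(`too many words: "${value}"`);
    if (seen.includes(key)) {
      problems.push(`duplicate (ignoring case): "${value}"`);
      continue;
    }
    seen.push(key);
  }
  // A value that appears as a whole word inside another is ambiguous.
  for (const a of seen) {
    for (const b of seen) {
      if (a !== b && ` ${b} `.includes(` ${a} `)) problems.push(`"${a}" is contained in "${b}"`);
    }
  }
  return problems;
}

console.log(lintAllowedValues(["yes", "no"])); // prints []
```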

`chainOfThought = true` is more expensive

Chain-of-thought multiplies the number of tokens the model generates. The runner's reported `executionCost` will be higher (still capped at `perAgentBudget`). For batch or high-volume use, leave it off unless you've measured a quality gain.

Tool result formatting

When resuming after `finishReason == "tool_calls"`, the tool result message is a JSON string:

```json
{"tool_call_id": "<callId>", "content": "<result string or stringified JSON>"}
```

Pass it as a plain string in the `messages` array with role `"tool"`. Malformed JSON here is the most common reason the resume call fails.
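Because malformed JSON is the usual culprit, a cheap pre-submit check is worthwhile. The TypeScript sketch below (hypothetical name) verifies only the documented shape: valid JSON, the expected `tool_call_id`, and a `content` that is a string, which means structured results must be stringified first. It cannot tell you whether the agent accepts the content itself.

```typescript
// Hypothetical pre-submit check for a role "tool" message.
function isValidToolMessage(message: string, expectedId: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(message);
  } catch {
    return false; // malformed JSON
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return false;
  const obj = parsed as Record<string, unknown>;
  // content must be a string: stringify structured results before embedding them.
  return obj["tool_call_id"] === expectedId && typeof obj["content"] === "string";
}

console.log(isValidToolMessage('{"tool_call_id": "call_1", "content": "success"}', "call_1")); // prints true
```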

Conversation length cost

Each resume round sends the full conversation back through createRequest. Long agentic loops can hit gas limits on the dApp side and increase per-request cost on the agent side. Keep system prompts compact and prune old turns when possible.
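One possible pruning policy, sketched in TypeScript: keep the leading system message(s) and the last `keepLast` turns, and never let the kept tail begin with a role `"tool"` message whose originating turn was cut. The policy and names are hypothetical; whether the model still has enough context after pruning is something to test for your prompts.

```typescript
// Hypothetical pruning policy for the parallel roles[] / messages[] arrays.
function pruneConversation(
  roles: string[],
  messages: string[],
  keepLast: number,
): { roles: string[]; messages: string[] } {
  if (roles.length !== messages.length) throw new Error("roles/messages length mismatch");
  // Keep every leading "system" message.
  let head = 0;
  while (head < roles.length && roles[head] === "system") head++;
  // Start of the kept tail; skip leading tool results that would be orphaned.
  let start = Math.max(head, roles.length - keepLast);
  while (start < roles.length && roles[start] === "tool") start++;
  return {
    roles: [...roles.slice(0, head), ...roles.slice(start)],
    messages: [...messages.slice(0, head), ...messages.slice(start)],
  };
}

const pruned = pruneConversation(
  ["system", "user", "assistant", "user", "assistant", "tool", "user"],
  ["s", "u1", "a1", "u2", "a2", "t1", "u3"],
  3,
);
// pruned.roles    -> ["system", "assistant", "tool", "user"]
// pruned.messages -> ["s", "a2", "t1", "u3"]
```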

MCP server reachability

MCP servers must be reachable from the agent's sandbox — public HTTPS endpoints, not localhost or VPN-only. If the MCP server is down or slow, the LLM hangs on tool calls until either the agent's timeout or maxIterations.
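A rough TypeScript lint for `mcpServerUrls` (hypothetical name) that rejects non-HTTPS URLs and hosts that are obviously local: `localhost`, single-label names, IPv6 literals, and loopback or private IPv4 ranges. It cannot detect VPN-only DNS names, firewalls, or slow servers, so treat a `true` result as necessary, not sufficient.

```typescript
// Hypothetical lint: is this MCP URL plausibly reachable from the agent's sandbox?
function looksPubliclyReachable(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a URL at all
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname;
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  // Single-label names (e.g. "weather") and IPv6 literals like "[::1]" have no "."
  if (!host.includes(".")) return false;
  const ipv4 = host.match(/^(\d+)\.(\d+)\.\d+\.\d+$/);
  if (ipv4 !== null) {
    const a = Number(ipv4[1]);
    const b = Number(ipv4[2]);
    // Loopback and RFC 1918 private ranges.
    if (a === 10 || a === 127 || (a === 192 && b === 168) || (a === 172 && b >= 16 && b <= 31)) {
      return false;
    }
  }
  return true;
}

console.log(looksPubliclyReachable("https://mcp.example.com/")); // prints true
```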

Why is my response `Failed`?

Check the receipt (see master skill). Common LLM-specific causes:

- Non-deterministic prompt: different validators saw slightly different inputs (e.g. trailing whitespace, encoding), so outputs diverged and there was no Majority. Inspect the prompt in the `request_received` step across receipts.
- `allowedValues` miss: the model output didn't match any allowed string. Loosen the values or drop the constraint.
- `max_iterations` reached: for `inferToolsChat`, the LLM kept emitting tool calls without converging. Increase `maxIterations` or simplify the tool surface.

Cross-references

- `somnia-agents`: request lifecycle, deposit math, callback pattern
- `somnia-agents-invoke`: interactive CLI to fire `inferString` / `inferNumber` calls without writing a contract
- `somnia-agents-llm-parse-website`: when you specifically want extraction from a webpage rather than free-form inference
