Skill

ollama-integration

From llmstxt-generator

Guide for Ollama setup, model selection, and integration for llms.txt generation

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/llmstxt-generator:ollama-integration

SKILL.md

481 lines · ~2.7k tokens

Stats

Language: Python

Stars: 0

Maintenance: Good

Last Commit: Feb 23, 2026

Ollama Integration Guide

Ollama is a lightweight local LLM runtime that lets you run large language models on your machine without cloud dependencies. The llmstxt-generator plugin uses Ollama to intelligently generate llms.txt files.

What is Ollama?

Ollama is an open-source project that makes running LLMs locally simple and fast:

Free and open-source - No API costs or rate limits
Privacy-first - All processing happens on your machine
Simple CLI - Easy to install and manage models
Fast inference - Optimized for consumer hardware
API server - Can be used programmatically

Official Site: https://ollama.ai
GitHub: https://github.com/ollama/ollama

Installation

macOS

Option 1: Download App (Easiest)

Visit https://ollama.ai/download
Download macOS installer
Run the installer
Ollama runs as a background service

Option 2: Homebrew

brew install ollama
ollama serve  # Start the server

Linux

Option 1: Installation Script

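curl -fsSL https://ollama.ai/install.sh | sh
# On systemd-based distros the script typically registers and starts an ollama service;
# start the server manually only if it is not already running
ollama serve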

Option 2: Docker

docker run -d -v ollama:/root/.ollama -p 11434:11434 \
  --name ollama ollama/ollama:latest

Windows

Download: https://ollama.ai/download/windows
Run installer
Ollama starts automatically in the background

Verify Installation

# Check if Ollama is running
ollama list

# Should show available models, or empty list initially

If you see "Connection refused", Ollama isn't running:

# macOS/Linux
ollama serve

# Or check if service is running
ps aux | grep ollama
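
The same check can be scripted. The following is a minimal Python sketch, assuming the server listens on the default address http://localhost:11434; the function name ollama_is_running is hypothetical and not part of the plugin.

```python
import urllib.error
import urllib.request


def ollama_is_running(host: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `host`, False otherwise."""
    try:
        # The root endpoint replies with a short plain-text status when the server is up.
        with urllib.request.urlopen(host, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: treat as "not running".
        return False
```

Calling ollama_is_running() before any generation step gives a clearer failure than a raw "Connection refused" from a later request.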

Model Selection for llms.txt Generation

Different Ollama models have different characteristics. Here's a guide for generating llms.txt:

Recommended Models

llama2 (13B) - ⭐ BEST FOR LLMS.TXT

Size: 13B parameters, ~7GB disk
Speed: Fast (reasonable on consumer hardware)
Quality: Excellent - good structure and descriptions
Use Case: Default choice for llms.txt generation
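Memory: ~16GB RAM recommended (Ollama's README suggests at least 16GB for 13B models)

ollama pull llama2:13b  # the bare llama2 tag pulls the 7B variant (~3.8GB)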

Best for: Balanced generation quality and speed

mistral (7B) - Fast & Capable

Size: 7B parameters, ~5GB disk
Speed: Very fast, good for quick generation
Quality: Good - slightly less verbose than llama2
Memory: ~4GB RAM sufficient
Instruction Following: Excellent

ollama pull mistral

Best for: Quick generation on limited hardware

neural-chat (7B) - Conversational

Size: 7B parameters, ~5GB disk
Speed: Very fast
Quality: Excellent for conversational content
Memory: ~4GB RAM
Focus: Optimized for chat and instruction tasks

ollama pull neural-chat

Best for: Natural, conversational link descriptions

dolphin-mixtral (8x7B MoE) - Advanced

Size: 8x7B mixture-of-experts, ~32GB disk
Speed: Slower (requires more compute)
Quality: Excellent - comprehensive and accurate
Memory: ~20GB RAM needed (not recommended for most)

ollama pull dolphin-mixtral

Best for: Maximum quality when resources available

Not Recommended for llms.txt

Too Small:

tinyllama - Lower quality content for complex docs

Too Slow:

llama2:70b - Overkill for llms.txt (slow on consumer hardware)
openchat-3.5-orca - Slower than alternatives without quality benefit

Specialized:

codeup - Optimized for code, not documentation
magicoder - For coding tasks, not general content

Downloading Models

Each model is a one-time download that is stored locally:

# Download (first time only)
ollama pull llama2
# ~3.8GB download for the default 7B tag (llama2:13b is ~7GB); time depends on your connection

# Future runs use local copy (instant)
ollama list
# Shows: NAME              ID              SIZE      MODIFIED
#        llama2:latest     91ab59b18b92    3.8 GB    2 minutes ago

Managing Disk Space

# Remove a model to free space
ollama rm llama2

# Check disk usage
du -sh ~/.ollama/

# Show all models
ollama list

Starting Ollama Server

The llmstxt-generator plugin communicates with Ollama via API (default: http://localhost:11434).

Automatic (macOS with App)

Ollama starts automatically as a background service after installation.

# Verify it's running
ollama list

Manual Start (Linux/macOS)

# In terminal window 1
ollama serve

# In terminal window 2
ollama list

With Docker

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:latest

# Download a model
docker exec ollama ollama pull llama2

API Connection

The llmstxt-generator plugin connects to Ollama's API. Here's how it works:

Default Configuration

# Ollama API endpoint (default)
http://localhost:11434

# Generate request example (what the plugin does)
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "Create an llms.txt structure for FastHTML library",
    "stream": false
  }'
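
For programmatic use, the curl call above translates directly to Python's standard library. This is a sketch under assumptions, not the plugin's actual implementation: the helper names build_generate_request and generate are hypothetical, and only the endpoint and the "model", "prompt", "stream", and "options" fields come from Ollama's API.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama2", host="http://localhost:11434", options=None):
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if options:
        # Sampling and context settings such as num_ctx go under "options".
        payload["options"] = options
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": false the whole completion arrives in the "response" field.
    return body["response"]
```

Separating request construction from sending keeps the payload easy to inspect and test without a running server.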

Custom Configuration

If Ollama runs on a different host or port, configure it in .claude/llmstxt-generator.local.md:

# llmstxt-generator Configuration

## Ollama Settings
- host: http://192.168.1.100:11434
- default_model: llama2

Testing Connection

# Test if Ollama is accessible
curl http://localhost:11434/api/tags

# Should return:
# {"models":[{"name":"llama2:latest",...}]}
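
Once the /api/tags response has been parsed as JSON, checking for a particular model is a short lookup. The sketch below assumes the response shape shown above (a "models" list whose entries carry a "name"); the helper names are hypothetical.

```python
def installed_model_names(tags_response: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def has_model(tags_response: dict, model: str) -> bool:
    """Check for `model`, treating a bare name like 'llama2' as 'llama2:latest'."""
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in installed_model_names(tags_response)


# Hand-written sample data, not real server output.
sample = {"models": [{"name": "llama2:latest"}, {"name": "mistral:latest"}]}
print(installed_model_names(sample))  # ['llama2:latest', 'mistral:latest']
print(has_model(sample, "llama2"), has_model(sample, "neural-chat"))  # True False
```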

Model Performance Guide

Generation Speed

On typical hardware (MacBook Pro M1, 16GB RAM):

Model	First Token	Full llms.txt	Quality
mistral (7B)	~100ms	~2-3 seconds	Good
neural-chat (7B)	~100ms	~2-3 seconds	Excellent
llama2 (13B)	~200ms	~4-6 seconds	Excellent
dolphin-mixtral (8x7B)	~500ms	~15-20 seconds	Superior

Note: Speeds vary by hardware. GPU acceleration greatly improves performance.
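
To measure speed on your own hardware rather than relying on the table, a non-streaming /api/generate response includes the timing fields eval_count (tokens generated) and eval_duration (nanoseconds). A minimal sketch of turning them into throughput, with a hypothetical helper name and illustrative numbers:

```python
def tokens_per_second(response: dict) -> float:
    """Compute generation throughput from a non-streaming /api/generate response."""
    eval_count = response["eval_count"]           # tokens generated
    eval_duration_ns = response["eval_duration"]  # nanoseconds spent generating
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


# Illustrative numbers only, not a measurement.
sample = {"eval_count": 120, "eval_duration": 4_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # 30.0 tokens/s
```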

Output Quality Examples

For the prompt: "Create an llms.txt link description for an authentication guide":

mistral:

"Complete OAuth2 implementation with JWT tokens, refresh logic"

llama2:

"Comprehensive OAuth2, API key, and JWT authentication patterns with examples"

neural-chat:

"Step-by-step OAuth2, API keys, and JWT authentication with error handling"

dolphin-mixtral:

"Complete authentication guide: OAuth2 flows, API key management, JWT tokens, session handling, security best practices"

All are good; llama2 and neural-chat offer the best balance of quality and speed.

Troubleshooting

Error: "Cannot connect to Ollama at localhost:11434"

Cause: Ollama isn't running

Solution:

# Start Ollama
ollama serve

# Or check if it's running
lsof -i :11434

Error: "Model 'llama2' not found"

Cause: Model hasn't been downloaded yet

Solution:

# Download the model
ollama pull llama2

# List available models
ollama list

Error: "Out of memory"

Cause: Model is too large for available RAM

Solution:

Use a smaller model (mistral or neural-chat instead of llama2:70b)
Close other applications
Try setting "num_gpu": 0 under "options" in the API request to use CPU only (slower)

Generation is Very Slow

Cause: Model running on CPU instead of GPU, or model too large

Solution:

# Check if GPU is enabled (look for "metal" or "cuda" in the server log)
ollama serve

# Or, with a model loaded, check the PROCESSOR column
ollama ps

# If using CPU only:
# - Reduce context size
# - Use smaller model (mistral)
# - Add GPU support (check ollama.ai for hardware)

Model Downloaded But Not Appearing

Solution:

# Refresh model list
ollama list

# If still not showing, restart Ollama
pkill ollama
ollama serve

Advanced: Custom Model Files

For advanced users, you can use custom quantized models:

# Create Modelfile (custom model config)
cat > Modelfile << EOF
FROM ./model.gguf
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create custom model
ollama create mymodel -f Modelfile

# Use in generation
ollama run mymodel "prompt"
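
If you generate Modelfiles from a script, the same FROM and PARAMETER lines can be rendered from a dictionary. This is a small sketch with a hypothetical helper name; it covers only the two directives used above.

```python
def render_modelfile(base: str, parameters: dict) -> str:
    """Render a Modelfile with one FROM line and one PARAMETER line per setting."""
    lines = [f"FROM {base}"]
    for name, value in parameters.items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


print(render_modelfile("./model.gguf", {"temperature": 0.7, "top_p": 0.9}), end="")
```

Write the returned text to a file named Modelfile and pass it to ollama create as shown above.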

Performance Tips

1. Use GPU Acceleration

Ollama automatically uses GPU if available:

macOS: Metal acceleration (automatic on Apple Silicon)
Linux/Windows: NVIDIA CUDA (requires a recent NVIDIA driver)
Check: Look for "metal" or "cuda" in ollama serve output

2. Adjust Context Window

Smaller context = faster generation:

# Generate with smaller context (faster)
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "...",
    "options": {"num_ctx": 512}
  }'

3. Use Streaming for Feedback

For long-running generation, stream responses:

# Stream mode (get output as it generates)
curl http://localhost:11434/api/generate \
  -d '{
    "model": "llama2",
    "prompt": "...",
    "stream": true
  }'
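
In stream mode the server sends one JSON object per line, each carrying a fragment in "response" and a final object with "done": true. A minimal sketch of reassembling the text, using a hypothetical helper name and hand-written sample chunks:

```python
import json


def collect_stream(lines):
    """Join the text fragments from a line-delimited stream of /api/generate chunks."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # the final chunk marks the end of the stream
    return "".join(parts)


# Hand-written sample chunks, not captured server output.
sample_stream = [
    b'{"response": "# Fast", "done": false}\n',
    b'{"response": "HTML", "done": false}\n',
    b'{"response": "", "done": true}\n',
]
print(collect_stream(sample_stream))  # prints: # FastHTML
```

Because the function accepts any iterable of lines, the response object returned by urllib.request.urlopen can be passed in directly. To show progress, print each fragment inside the loop instead of collecting it.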

4. Pre-warm Model

The first inference is slower because the model has to be loaded into memory. Subsequent requests reuse the loaded model and are faster:

# Pre-warm to cache model in memory
ollama run llama2 "What is llms.txt?" > /dev/null
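
Pre-warming also works over the API: a request to /api/generate that names a model but has no prompt loads the model without generating text, and keep_alive controls how long it stays in memory afterwards (the default is 5 minutes). A sketch of the request body, with a hypothetical helper name and an arbitrary 10-minute value:

```python
import json


def build_preload_payload(model: str = "llama2", keep_alive: str = "10m") -> bytes:
    """Build the JSON body for a load-only request: a model name and no prompt."""
    return json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")


print(build_preload_payload().decode("utf-8"))
# {"model": "llama2", "keep_alive": "10m"}
```

POST the returned bytes to http://localhost:11434/api/generate before the first real generation request.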

Model Selection Decision Tree

Do you have GPU acceleration?
├─ Yes (GPU)
│  └─ Use: llama2 (best quality/speed balance)
│     └─ Or: dolphin-mixtral (maximum quality)
│
└─ No (CPU only)
   ├─ Generation takes >10 seconds?
   │  └─ Use: mistral (fast, good quality)
   │
   └─ Generation is fast enough?
      └─ Use: neural-chat (excellent for descriptions)
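
The tree above can be written as a small function. This sketch uses hypothetical parameter names; the 10-second threshold and the model names come from the tree itself.

```python
def choose_model(has_gpu, max_quality=False, cpu_seconds=None, slow_threshold=10.0):
    """Pick a model name following the decision tree.

    cpu_seconds is the observed generation time on CPU, if known.
    """
    if has_gpu:
        # GPU branch: llama2 by default, dolphin-mixtral for maximum quality.
        return "dolphin-mixtral" if max_quality else "llama2"
    if cpu_seconds is not None and cpu_seconds > slow_threshold:
        # CPU-only and too slow: fall back to the faster model.
        return "mistral"
    # CPU-only and fast enough (or not yet measured).
    return "neural-chat"


print(choose_model(has_gpu=True))                     # llama2
print(choose_model(has_gpu=False, cpu_seconds=25.0))  # mistral
```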

Configuration for llmstxt-generator

In .claude/llmstxt-generator.local.md:

# Ollama Settings

## Connection
- host: http://localhost:11434
- timeout: 60  # seconds

## Model Selection
- default_model: llama2
- allow_model_selection: true

## Generation Parameters
- temperature: 0.7        # 0-1: Lower = deterministic, Higher = creative
- top_p: 0.9             # Nucleus sampling (0-1)
- top_k: 40              # Diversity control
- num_ctx: 2048          # Context window size
- num_predict: 512       # Max tokens to generate
- repeat_penalty: 1.1    # Prevent repetition
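
How the plugin reads this file is not shown here. As an illustration only, the sketch below parses "- key: value" lines and collects the generation parameters into the "options" object that Ollama's API accepts; parse_settings, generation_options, and OPTION_KEYS are hypothetical names.

```python
OPTION_KEYS = ("temperature", "top_p", "top_k", "num_ctx", "num_predict", "repeat_penalty")


def _coerce(value: str):
    """Convert a setting to bool, int, or float where possible; otherwise keep the string."""
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_settings(text: str) -> dict:
    """Parse '- key: value' lines, ignoring headings and trailing '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # headings and blank lines
        key, _, value = line[2:].partition(":")
        value = value.split(" #", 1)[0].strip()  # drop any inline comment
        settings[key.strip()] = _coerce(value)
    return settings


def generation_options(settings: dict) -> dict:
    """Keep only the keys that belong in the API's "options" object."""
    return {key: settings[key] for key in OPTION_KEYS if key in settings}


SAMPLE = """\
## Connection
- host: http://localhost:11434
- timeout: 60  # seconds

## Generation Parameters
- temperature: 0.7        # 0-1: Lower = deterministic
- num_ctx: 2048          # Context window size
"""
print(generation_options(parse_settings(SAMPLE)))
# {'temperature': 0.7, 'num_ctx': 2048}
```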

Next Steps

Install Ollama - https://ollama.ai/download
Pull a model - ollama pull llama2
Generate llms.txt - Use /llmstxt:generate in Claude Code
Monitor performance - Adjust settings based on speed/quality

Resources

Ollama Official: https://ollama.ai
Model Library: https://ollama.ai/library
GitHub: https://github.com/ollama/ollama
Community: https://discord.com/invite/ollama
Model Benchmarks: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
