Skill

databases

PostgreSQL databases on Giratina - PubMed, Legal, StackOverflow, ArXiv. Use when querying research data, legal documents, programming solutions, or scientific papers.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/neely-brain-dump:databases

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bashmcp__postgres__query

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

**Server:** 192.168.1.100

SKILL.md

364 lines · ~2.2k tokens

Stats

LanguageJavaScript

Parent stars0

MaintenanceGood

Last CommitFeb 7, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

PostgreSQL Databases (Giratina)

Server: 192.168.1.100 PostgreSQL Version: 17.6 Location: /mnt/raid6/databases/postgresql/17/main Storage: 13TB RAID6 array (4.9TB available)

Database Summary

Database	Size	Records	Primary Use
pmc_fulltext	227 GB	4.5M articles, 255M citations	Medical/scientific research
legal_db	81 GB	9.2M documents	Legal document search
stackoverflow_complete	53 GB	18.5M solutions	Programming Q&A
arxiv_papers	16 GB	2.8M papers	Scientific preprints
TOTAL	377 GB	~340M records

pmc_fulltext (227 GB) - PubMed Central Research

Medical and scientific research articles with full citation network.

Stats

4,483,066 articles with full-text (68 GB raw text)
255,138,874 citations - massive citation network
88,560,660 author records
35 tables including network analysis, co-citation graphs
Query performance: 23ms for indexed queries

Key Tables

-- Articles with full text
SELECT id, title, abstract, full_text FROM articles LIMIT 10;

-- Citation network
SELECT * FROM citations WHERE source_pmid = '12345678';

-- Author search
SELECT * FROM authors WHERE name ILIKE '%Smith%';

Example Queries

-- Find articles by keyword in title
SELECT id, title, pub_date FROM articles
WHERE title ILIKE '%cancer%' AND pub_date > '2020-01-01'
LIMIT 100;

-- Get citation count for an article
SELECT COUNT(*) FROM citations WHERE target_pmid = '12345678';

-- Find co-authors
SELECT DISTINCT a2.name FROM authors a1
JOIN article_authors aa1 ON a1.id = aa1.author_id
JOIN article_authors aa2 ON aa1.article_id = aa2.article_id
JOIN authors a2 ON aa2.author_id = a2.id
WHERE a1.name = 'John Smith' AND a2.name != 'John Smith';

legal_db (81 GB) - Legal Documents

Legal documents from the "Pile of Law" dataset with excellent full-text search.

Stats

9,206,663 documents (124 GB raw → 11 GB stored, 91% compression)
70 GB of indexes (full-text search, trigrams, hash indexes)
Query performance: 1ms for simple lookups

Document Categories

Category	Count
Caselaw	5.9M
Miscellaneous	2.2M
Patents	543K
Contracts	353K

Key Tables

-- Main documents table
SELECT id, title, category, text FROM documents LIMIT 10;

-- Full-text search
SELECT id, title FROM documents
WHERE to_tsvector('english', text) @@ to_tsquery('contract & breach');

Example Queries

-- Search by category
SELECT id, title, LEFT(text, 500) FROM documents
WHERE category = 'caselaw'
LIMIT 100;

-- Full-text search with ranking
SELECT id, title, ts_rank(to_tsvector('english', text), query) AS rank
FROM documents, to_tsquery('intellectual & property') query
WHERE to_tsvector('english', text) @@ query
ORDER BY rank DESC
LIMIT 20;

-- Find patents
SELECT id, title FROM documents
WHERE category = 'patents' AND title ILIKE '%machine learning%';

stackoverflow_complete (53 GB) - Programming Solutions

Programming Q&A from Stack Overflow.

Stats

18,563,455 solutions
12,718,160 unique questions
12 programming languages tracked
Full-text search enabled on answers
Query performance: ~26 seconds for complex queries (needs optimization)

Top Languages

Language	Solutions
JavaScript	3.4M
Java	2.6M
Python	1.9M
C#	1.5M
PHP	1.2M

Key Tables

-- Questions
SELECT id, title, tags, score FROM questions LIMIT 10;

-- Answers/Solutions
SELECT id, question_id, body, score FROM answers LIMIT 10;

Example Queries

-- Find Python questions by keyword
SELECT id, title, score FROM questions
WHERE tags LIKE '%python%' AND title ILIKE '%pandas%'
ORDER BY score DESC
LIMIT 20;

-- Get top answers for a question
SELECT body, score FROM answers
WHERE question_id = 12345
ORDER BY score DESC;

-- Find high-scoring solutions
SELECT q.title, a.body, a.score
FROM answers a
JOIN questions q ON a.question_id = q.id
WHERE q.tags LIKE '%python%' AND a.score > 100
ORDER BY a.score DESC
LIMIT 10;

arxiv_papers (16 GB) - Research Papers

Scientific preprints from ArXiv.

Stats

2,770,235 papers
2.2 GB of abstracts with full-text search
Multiple full-text search indexes
Query performance: 23ms for category searches

Key Tables

-- Papers
SELECT id, title, abstract, categories, authors FROM papers LIMIT 10;

Example Queries

-- Search by category (cs.AI, physics, math, etc.)
SELECT id, title, authors FROM papers
WHERE categories LIKE '%cs.AI%'
ORDER BY id DESC
LIMIT 100;

-- Full-text search in abstracts
SELECT id, title FROM papers
WHERE to_tsvector('english', abstract) @@ to_tsquery('transformer & attention')
LIMIT 50;

-- Find papers by author
SELECT id, title, categories FROM papers
WHERE authors ILIKE '%Hinton%';

-- Recent ML papers
SELECT id, title, abstract FROM papers
WHERE categories LIKE '%cs.LG%'
ORDER BY id DESC
LIMIT 20;

Connection Details

User Credentials

Read-Only User (Recommended for Scripts)

Username: readonly
Password: ReadOnly2025
Permissions: SELECT only (cannot modify data)

Admin User (Local Only)

Username: postgres
Network access: Not configured (use locally on Giratina)

From Giratina (localhost)

psql -U postgres -d pmc_fulltext
psql -U postgres -d legal_db
psql -U postgres -d stackoverflow_complete
psql -U postgres -d arxiv_papers

From Other Nodes (192.168.1.x)

# Read-only access (recommended for all network connections)
PGPASSWORD='ReadOnly2025' psql -h 192.168.1.100 -U readonly -d pmc_fulltext

Connection String

postgresql://readonly:[email protected]:5432/DATABASE_NAME

Note: No special characters in password - exclamation marks cause md5 auth failures with psycopg2.

Code Examples

Python (psycopg2)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    port=5432,
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM articles")
print(cursor.fetchone())

Python (SQLAlchemy + Pandas)

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine(
    "postgresql://readonly:[email protected]:5432/legal_db"
)
df = pd.read_sql("SELECT * FROM documents LIMIT 1000", engine)

Node.js (pg)

const { Client } = require('pg');

const client = new Client({
  host: '192.168.1.100',
  port: 5432,
  database: 'stackoverflow_complete',
  user: 'readonly',
  password: 'ReadOnly2025'
});

await client.connect();
const res = await client.query('SELECT COUNT(*) FROM solutions');
console.log(res.rows);

Server-Side Cursors (Large Datasets)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)

# Named cursor = server-side cursor (streams results)
cursor = conn.cursor(name='fetch_large_data')
cursor.execute("SELECT * FROM citations WHERE cited_year > 2020")

# Fetch in batches to avoid memory issues
while True:
    rows = cursor.fetchmany(10000)  # 10k rows at a time
    if not rows:
        break
    for row in rows:
        process(row)

Network Access

Local Network (192.168.1.x) ✅

Direct connection works from any cluster node:

Silvally (192.168.1.52)
Hoopa (192.168.1.79)
Victini (192.168.1.115)
Talon (192.168.1.7)
Any laptop/desktop on the network

External Access

Use Cloudflare Tunnel:

cloudflared access tcp --hostname giratina-db.built-simple.ai --url localhost:5432 &
psql -h localhost -p 5432 -U readonly -d pmc_fulltext

Performance Tips

Operation	Speed	Notes
Simple SELECT by ID	~1ms	Using indexes
Category filter	~23ms	Indexed columns
Full-text search	Fast	GIN indexes enabled
Complex joins	Varies	Use EXPLAIN ANALYZE

Best Practices:

Use readonly user for analysis scripts
Fetch only needed columns (avoid SELECT *)
Filter on server side, not in Python
Use server-side cursors for large results
Use parallel processing for big jobs

Backup Files (on RAID6)

File	Size
legal_db.sql.gz	40 GB
legal_documents_20250828_095834.parquet.7z	78 GB
legal_master_duckdb.db.7z	39 GB

databases

Invocation

Tool Access

Context Preview

SKILL.md

databases

Invocation

Tool Access

Context Preview

SKILL.md

PostgreSQL Databases (Giratina)

Database Summary

pmc_fulltext (227 GB) - PubMed Central Research

Stats

Key Tables

Example Queries

legal_db (81 GB) - Legal Documents

Stats

Document Categories

Key Tables

Example Queries

stackoverflow_complete (53 GB) - Programming Solutions

Stats

Top Languages

Key Tables

Example Queries

arxiv_papers (16 GB) - Research Papers

Stats

Key Tables

Example Queries

Connection Details

User Credentials

From Giratina (localhost)

From Other Nodes (192.168.1.x)

Connection String

Code Examples

Python (psycopg2)

Python (SQLAlchemy + Pandas)

Node.js (pg)

Server-Side Cursors (Large Datasets)

Network Access

Local Network (192.168.1.x) ✅

External Access

Performance Tips

Backup Files (on RAID6)

Similar Skills

PostgreSQL Databases (Giratina)

Database Summary

pmc_fulltext (227 GB) - PubMed Central Research

Stats

Key Tables

Example Queries

legal_db (81 GB) - Legal Documents

Stats

Document Categories

Key Tables

Example Queries

stackoverflow_complete (53 GB) - Programming Solutions

Stats

Top Languages

Key Tables

Example Queries

arxiv_papers (16 GB) - Research Papers

Stats

Key Tables

Example Queries

Connection Details

User Credentials

From Giratina (localhost)

From Other Nodes (192.168.1.x)

Connection String

Code Examples

Python (psycopg2)

Python (SQLAlchemy + Pandas)

Node.js (pg)

Server-Side Cursors (Large Datasets)

Network Access

Local Network (192.168.1.x) ✅

External Access

Performance Tips