Searches and retrieves 3D biomolecular structures from RCSB PDB by text, sequence, or structural similarity, and downloads coordinates in PDB/mmCIF format with metadata.
RCSB PDB is the worldwide repository for 3D structural data of biological macromolecules. Search for structures, retrieve coordinates and metadata, perform sequence and structure similarity searches across 200,000+ experimentally determined structures and computed models.
This skill should be used when:
Find PDB entries using various search criteria:
Text Search: Search by protein name, keywords, or descriptions
from rcsbapi.search import TextQuery
query = TextQuery("hemoglobin")
results = list(query())
print(f"Found {len(results)} structures")
Attribute Search: Query specific properties (organism, resolution, method, etc.)
from rcsbapi.search import AttributeQuery
from rcsbapi.search import search_attributes as attrs
# Find human protein structures (idiomatic form — recommended)
query = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
results = list(query())
# OR explicit AttributeQuery with a dotted-path STRING (not the Attr object):
query = AttributeQuery(
    attribute="rcsb_entity_source_organism.scientific_name",
    operator="exact_match",
    value="Homo sapiens",
)
results = list(query())
Sequence Similarity: Find structures similar to a given sequence
from rcsbapi.search import SeqSimilarityQuery
query = SeqSimilarityQuery(
    value="MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHHYREQIKRVKDSEDVPMVLVGNKCDLPSRTVDTKQAQDLARSYGIPFIETSAKTRQGVDDAFYTLVREIRKHKEKMSKDGKKKKKKSKTKCVIM",
    evalue_cutoff=0.1,
    identity_cutoff=0.9,
    sequence_type="protein",
)
results = list(query())
Structure Similarity: Find structures with similar 3D geometry
from rcsbapi.search import StructSimilarityQuery
query = StructSimilarityQuery(
    structure_search_type="entry",
    entry_id="4HHB",  # Hemoglobin
)
results = list(query())
Combining Queries: Use logical operators to build complex searches
from rcsbapi.search import search_attributes as attrs
# High-resolution human proteins
query1 = attrs.rcsb_entity_source_organism.scientific_name == "Homo sapiens"
query2 = attrs.rcsb_entry_info.resolution_combined < 2.0
combined_query = query1 & query2 # AND operation
results = list(combined_query())
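To make the AND combination above more concrete, the sketch below builds the kind of JSON request body that the RCSB Search API accepts for two attribute conditions joined by a logical operator. It is an illustration only: the helper names `terminal` and `group` are hypothetical, the node layout reflects my understanding of the Search API v2 JSON schema, and the request that `rcsbapi` actually serializes may carry extra fields. Nothing is sent over the network.

```python
import json


def terminal(attribute, operator, value):
    """Build one attribute condition (hypothetical helper, not part of rcsbapi)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Join conditions with "and" or "or" (hypothetical helper)."""
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# Same two conditions as the combined query above: human source, resolution below 2.0
payload = {
    "query": group(
        "and",
        terminal("rcsb_entity_source_organism.scientific_name", "exact_match", "Homo sapiens"),
        terminal("rcsb_entry_info.resolution_combined", "less", 2.0),
    ),
    "return_type": "entry",
}

print(json.dumps(payload, indent=2))
```

Seeing the nested group/terminal shape can help when debugging why a combined query returns fewer hits than expected, since each `&` or `|` adds one more group level.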
Access detailed information about specific PDB entries:
Basic Entry Information:
from rcsbapi.data import DataQuery
# Get entry-level data
query = DataQuery(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=["struct.title", "exptl.method"],
)
data = query.exec() # in Python 3.14+/Jupyter: await query.exec()
entry = data["data"]["entries"][0]
print(entry["struct"]["title"])
print(entry["exptl"][0]["method"])
Polymer Entity Information:
from rcsbapi.data import DataQuery
# Get protein/nucleic acid information
query = DataQuery(
    input_type="polymer_entities",
    input_ids=["4HHB_1"],
    return_data_list=["entity_poly.pdbx_seq_one_letter_code"],
)
data = query.exec()
entity = data["data"]["polymer_entities"][0]
print(entity["entity_poly"]["pdbx_seq_one_letter_code"])
Building Queries (GraphQL under the hood):
from rcsbapi.data import DataQuery
# DataQuery builds the GraphQL query for you from input_type/input_ids/return_data_list;
# there is no separate fetch(query_type="graphql", ...) entry point.
query = DataQuery(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=[
        "struct.title",
        "exptl.method",
        "rcsb_entry_info.resolution_combined",
        "rcsb_entry_info.deposited_atom_count",
    ],
)
# Inspect the auto-generated GraphQL / open it in the editor:
print(query.get_editor_link())
data = query.exec()
Retrieve coordinate files in various formats:
Download Methods:
https://files.rcsb.org/download/{PDB_ID}.pdb
https://files.rcsb.org/download/{PDB_ID}.cif
https://files.rcsb.org/download/{PDB_ID}.pdb1 (for assembly 1)
Example Download:
import requests
pdb_id = "4HHB"
# Download PDB format
pdb_url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
response = requests.get(pdb_url, timeout=30)
response.raise_for_status()  # stop here rather than save an error page as a structure
with open(f"{pdb_id}.pdb", "w") as f:
    f.write(response.text)
# Download mmCIF format
cif_url = f"https://files.rcsb.org/download/{pdb_id}.cif"
response = requests.get(cif_url, timeout=30)
response.raise_for_status()
with open(f"{pdb_id}.cif", "w") as f:
    f.write(response.text)
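The three URL patterns listed above can be wrapped in a small helper so that callers pass a format name instead of assembling strings by hand. This is a minimal sketch: `build_download_url`, `DOWNLOAD_BASE`, and the format keys are hypothetical names, and the check that an ID is exactly four alphanumeric characters is a simplifying assumption about experimental entries.

```python
import re

DOWNLOAD_BASE = "https://files.rcsb.org/download"

# Format key -> file suffix, taken from the URL patterns listed above
SUFFIXES = {"pdb": ".pdb", "cif": ".cif", "assembly1": ".pdb1"}


def build_download_url(pdb_id, fmt="cif"):
    """Return the download URL for an entry (hypothetical helper)."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: experimental entry IDs are four alphanumeric characters
    if not re.fullmatch(r"[0-9A-Z]{4}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(SUFFIXES)}")
    return f"{DOWNLOAD_BASE}/{pdb_id}{SUFFIXES[fmt]}"


print(build_download_url("4hhb", "pdb"))
print(build_download_url("4HHB"))
print(build_download_url("4HHB", "assembly1"))
```

Validating the ID before building the URL keeps a typo from turning into a confusing HTTP error later in a batch download.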
Common operations with retrieved structures:
Parse and Analyze Coordinates: Use BioPython or other structural biology libraries to work with downloaded files:
from Bio.PDB import PDBParser
parser = PDBParser()
structure = parser.get_structure("protein", "4HHB.pdb")
# Iterate through atoms
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(atom.get_coord())
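For a quick check without BioPython, coordinates can also be read straight from the fixed-width columns of legacy PDB records (x, y, z occupy columns 31-38, 39-46, and 47-54 of ATOM and HETATM lines). The sketch below uses only the standard library; the function names are hypothetical, and the two sample atoms are synthetic values made up for the demonstration, not taken from any real entry.

```python
def parse_atom_coords(pdb_text):
    """Extract (x, y, z) tuples from ATOM/HETATM records of PDB-format text."""
    coords = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            # Fixed-width columns 31-38, 39-46, 47-54 (1-based) hold x, y, z
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def centroid(coords):
    """Arithmetic mean of a non-empty list of (x, y, z) tuples."""
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def format_atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Build a synthetic ATOM record with correct column alignment (demo only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


# Synthetic two-atom example; the coordinates are invented for illustration
sample_text = "\n".join([
    "HEADER    SYNTHETIC EXAMPLE",
    format_atom_line(1, "N", "ALA", "A", 1, 1.0, 2.0, 3.0),
    format_atom_line(2, "CA", "ALA", "A", 1, 3.0, 4.0, 5.0),
    "END",
])

sample_coords = parse_atom_coords(sample_text)
print(sample_coords)
print(centroid(sample_coords))
```

This approach ignores alternate locations, multiple models, and mmCIF files, so a full parser remains the better choice for real analysis.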
Extract Metadata:
from rcsbapi.data import DataQuery
# Get experimental details
query = DataQuery(
    input_type="entries",
    input_ids=["4HHB"],
    return_data_list=[
        "rcsb_entry_info.resolution_combined",
        "exptl.method",
        "rcsb_accession_info.deposit_date",
    ],
)
data = query.exec()["data"]["entries"][0]
resolution = data.get("rcsb_entry_info", {}).get("resolution_combined")
method = data.get("exptl", [{}])[0].get("method")
deposition_date = data.get("rcsb_accession_info", {}).get("deposit_date")
print(f"Resolution: {resolution} Å")
print(f"Method: {method}")
print(f"Deposited: {deposition_date}")
Process multiple structures efficiently:
from rcsbapi.data import DataQuery
pdb_ids = ["4HHB", "1MBN", "1GZX"] # Hemoglobin, myoglobin, etc.
# A single DataQuery can fetch all entries at once
query = DataQuery(
    input_type="entries",
    input_ids=pdb_ids,
    return_data_list=[
        "rcsb_id",
        "struct.title",
        "rcsb_entry_info.resolution_combined",
        # Source organism is recorded per polymer entity, not on the entry itself
        "polymer_entities.rcsb_entity_source_organism.scientific_name",
    ],
)
results = {}
for data in query.exec()["data"]["entries"]:
    pdb_id = data["rcsb_id"]
    # Use the organism of the first polymer entity, if one is recorded
    entities = data.get("polymer_entities") or [{}]
    organisms = entities[0].get("rcsb_entity_source_organism") or [{}]
    results[pdb_id] = {
        "title": data["struct"]["title"],
        "resolution": data.get("rcsb_entry_info", {}).get("resolution_combined"),
        "organism": organisms[0].get("scientific_name"),
    }
# Display results
for pdb_id, info in results.items():
    print(f"\n{pdb_id}: {info['title']}")
    print(f"  Resolution: {info['resolution']} Å")
    print(f"  Organism: {info['organism']}")
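Once a `results` dictionary like the one above exists, a common next step is to keep only entries at or below a resolution cutoff. The sketch below is one way to do that under stated assumptions: the function names are hypothetical, the `resolution` value may arrive as a single number, a list of numbers, or `None` (entries without a reported resolution), and the sample dictionary uses placeholder IDs and invented values rather than real query output.

```python
def best_resolution(value):
    """Normalize a resolution field to a single float, or None if absent."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    # Assumption: a list holds one value per experiment; take the finest one
    values = [float(v) for v in value if v is not None]
    return min(values) if values else None


def filter_by_resolution(results, cutoff):
    """Return (pdb_id, resolution) pairs at or below cutoff, best first."""
    kept = []
    for pdb_id, info in results.items():
        res = best_resolution(info.get("resolution"))
        if res is not None and res <= cutoff:
            kept.append((pdb_id, res))
    return sorted(kept, key=lambda item: item[1])


# Placeholder IDs and invented values, shaped like the batch results above
sample_results = {
    "AAAA": {"title": "placeholder A", "resolution": [1.5]},
    "BBBB": {"title": "placeholder B", "resolution": [2.8]},
    "CCCC": {"title": "placeholder C", "resolution": None},
    "DDDD": {"title": "placeholder D", "resolution": 1.9},
}

print(filter_by_resolution(sample_results, 2.0))
```

Entries with no resolution are dropped rather than treated as zero, which avoids ranking them ahead of genuinely high-resolution structures.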
Install the official RCSB PDB Python API client:
# Current recommended package
uv pip install rcsb-api
# For legacy code (deprecated, use rcsb-api instead)
uv pip install rcsbsearchapi
The rcsb-api package provides unified access to both Search and Data APIs through the rcsbapi.search and rcsbapi.data modules.
PDB ID: Unique 4-character identifier (e.g., "4HHB") for each structure entry. AlphaFold and ModelArchive entries start with "AF_" or "MA_" prefixes.
mmCIF/PDBx: The current standard file format, built from named key-value data items and tables. It replaces the legacy PDB format, whose fixed-width columns cannot represent very large structures.
Biological Assembly: The functional form of a macromolecule, which may contain multiple copies of chains from the asymmetric unit.
Resolution: Measure of detail in crystallographic structures (lower values = higher detail). Typical range: 1.5-3.5 Å for high-quality structures.
Entity: A unique molecular component in a structure (protein chain, DNA, ligand, etc.).
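The identifier conventions described under PDB ID above can be checked in code before a query is sent. This is a minimal sketch: `classify_structure_id` and its return labels are hypothetical, the four-alphanumeric-character rule is a simplification, and the model identifiers in the demo are made-up strings that only illustrate the prefixes.

```python
import re


def classify_structure_id(identifier):
    """Guess the kind of structure an identifier refers to (hypothetical helper)."""
    ident = identifier.strip().upper()
    if ident.startswith("AF_"):
        return "alphafold"
    if ident.startswith("MA_"):
        return "modelarchive"
    # Assumption: experimental entries use exactly four alphanumeric characters
    if re.fullmatch(r"[0-9A-Z]{4}", ident):
        return "experimental"
    return "unknown"


# "AF_EXAMPLE" and "MA_EXAMPLE" are invented strings, not real model IDs
for example in ["4HHB", "af_example", "MA_EXAMPLE", "hemoglobin"]:
    print(example, "->", classify_structure_id(example))
```

Routing on the identifier type up front makes it easier to skip resolution-based filters for computed models, which have no experimental resolution.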
This skill includes reference documentation in the references/ directory:
Comprehensive API documentation covering:
Use this reference when you need in-depth information about API capabilities, complex query construction, or detailed data schema information.