From gpcam
Runs exact Gaussian process inference on datasets from 10k to millions of points using Wendland kernels, Dask distributed computing, and sparse linear algebra.
Slash command: /gpcam:gp2scale-advanced
Design experiments with tens of thousands to millions of data points using gpCAM's gp2Scale mode for exact GP computation at scale.
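gp2Scale uses compactly supported Wendland kernels, Dask distributed computing, and sparse linear algebra.
import numpy as np
from distributed import Client
# x_data, y_data, and hps_bounds are assumed to be NumPy arrays defined earlier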
from gpcam import GPOptimizer
# Start a local Dask cluster
client = Client() # uses all available cores
client.wait_for_workers(4) # good practice: wait for workers before constructing
gpo = GPOptimizer(
    x_data=x_data,
    y_data=y_data,
    gp2Scale=True,
    dask_client=client,
    gp2Scale_batch_size=500,  # typical; tune up for large clusters
    init_hyperparameters=np.array([0.73, 0.0014]),  # signal var, length scale
)
gpo.train(hyperparameter_bounds=hps_bounds, max_iter=25, info=True)
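To get a feel for how gp2Scale_batch_size trades task count against task size, the sketch below counts covariance blocks for a given dataset size. It assumes the covariance matrix is tiled into batch_size × batch_size blocks and that only the upper triangle is computed because the matrix is symmetric. That tiling scheme and the helper name count_covariance_blocks are illustrative assumptions, not part of gpCAM's API.

```python
import math


def count_covariance_blocks(n_data, batch_size):
    """Count the tiles covering the upper triangle of an
    n_data x n_data covariance matrix (assumed tiling scheme)."""
    if n_data <= 0 or batch_size <= 0:
        raise ValueError("n_data and batch_size must be positive")
    n_batches = math.ceil(n_data / batch_size)
    # Diagonal tiles plus the tiles strictly above the diagonal.
    return n_batches * (n_batches + 1) // 2


for batch_size in (500, 1000, 5000):
    blocks = count_covariance_blocks(100_000, batch_size)
    print(f"batch_size={batch_size}: {blocks} blocks")
```

Under this assumption, larger batches mean fewer but heavier tasks, which may be one reason to raise the batch size on large clusters.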
When gp2Scale=True, the kernel MUST produce a sparse covariance matrix. If no custom kernel is provided, gpCAM automatically switches to an anisotropic Wendland kernel.
If providing a custom kernel, it must have compact support:
from gpcam.kernels import wendland_anisotropic
def my_gp2scale_kernel(x1, x2, hps):
    """Custom kernel with compact support for gp2Scale."""
    return wendland_anisotropic(x1, x2, hps)
hps[0] = signal variance
hps[1:D+1] = per-dimension length scales (also control support radius)
The length scales in the Wendland kernel also determine the support radius: two points farther apart than the length scale have exactly zero covariance. Shorter length scales therefore give a sparser covariance matrix.
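The effect of the support radius on sparsity can be sketched in plain Python. The example below uses the standard Wendland C² polynomial, (1 - r)⁴(4r + 1) for r < 1, which is an assumption made for illustration; gpCAM's wendland_anisotropic may use a different Wendland polynomial. The function names wendland_c2 and nonzero_fraction are hypothetical.

```python
import math


def wendland_c2(x1, x2, signal_variance, length_scales):
    """Compactly supported Wendland C^2 kernel for two points.

    Illustrative only: gpCAM's wendland_anisotropic may use a
    different Wendland polynomial.
    """
    # Distance scaled per dimension by that dimension's length scale.
    r = math.sqrt(sum(((a - b) / ell) ** 2
                      for a, b, ell in zip(x1, x2, length_scales)))
    if r >= 1.0:
        return 0.0  # outside the support radius: exactly zero
    return signal_variance * (1.0 - r) ** 4 * (4.0 * r + 1.0)


def nonzero_fraction(points, signal_variance, length_scales):
    """Fraction of covariance-matrix entries that are nonzero."""
    n = len(points)
    nonzero = sum(
        1
        for p in points
        for q in points
        if wendland_c2(p, q, signal_variance, length_scales) != 0.0
    )
    return nonzero / (n * n)


# 50 points on a 1-D integer grid, evaluated with a short and a long length scale.
grid = [(float(i),) for i in range(50)]
print(nonzero_fraction(grid, 1.0, (2.5,)))
print(nonzero_fraction(grid, 1.0, (10.5,)))
```

The shorter length scale leaves far fewer nonzero entries, which is the property the sparse solvers rely on.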
| Mode | Description |
|---|---|
| "Chol" | Sparse Cholesky; fastest for moderate sparsity |
| "sparseLU" | Sparse LU decomposition |
| "sparseCG" | Conjugate gradient (iterative) |
| "sparseMINRES" | MINRES (iterative) |
For expensive gp2Scale likelihoods, standard method="mcmc" training may be too slow. gpCAM exposes a block Metropolis-Hastings sampler you can drive directly against the GP's log-likelihood:
import numpy as np
from gpcam import gpMCMC, ProposalDistribution
def in_bounds(v, bounds):
    return not (any(v < bounds[:, 0]) or any(v > bounds[:, 1]))

def prior_function(theta, args):
    return 0.0 if in_bounds(theta, args["bounds"]) else -np.inf

def log_likelihood(hps, args):
    return gpo.log_likelihood(hyperparameters=hps)  # exposed on GPOptimizer

pd = ProposalDistribution([0, 1], init_prop_Sigma=np.identity(2) * 0.01)
mcmc = gpMCMC(log_likelihood, prior_function, [pd],
              args={"bounds": hps_bounds})
result = mcmc.run_mcmc(x0=np.array([1.0, 0.01]), n_updates=200, info=True)
gpo.set_hyperparameters(result["mean(x)"])
ProposalDistribution takes the list of hyperparameter indices in that block and an initial proposal covariance. Stack multiple ProposalDistributions for block-wise updates of high-dimensional hyperparameter vectors (deep kernels, etc.).
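To make the block-wise idea concrete, here is a minimal, self-contained Metropolis-Hastings sketch in plain Python. It does not use gpMCMC; the toy Gaussian log-likelihood, the block layout, and every function name in it are assumptions chosen for illustration. Each block proposes a Gaussian random-walk move for its own indices only, and a flat prior returns -inf outside the bounds so that out-of-bounds proposals are always rejected.

```python
import math
import random


def log_prior(theta, bounds):
    """Flat prior: 0 inside the box, -inf outside."""
    inside = all(lo <= t <= hi for t, (lo, hi) in zip(theta, bounds))
    return 0.0 if inside else -math.inf


def toy_log_likelihood(theta):
    """Stand-in for an expensive GP log-likelihood (peak at (1.0, 0.5))."""
    return -0.5 * (((theta[0] - 1.0) / 0.1) ** 2
                   + ((theta[1] - 0.5) / 0.1) ** 2)


def block_metropolis(log_like, bounds, x0, blocks, step, n_updates, seed=0):
    """Random-walk Metropolis-Hastings, updating one index block at a time."""
    rng = random.Random(seed)
    current = list(x0)
    current_logp = log_like(current) + log_prior(current, bounds)
    chain = [tuple(current)]
    for _ in range(n_updates):
        for block in blocks:
            proposal = list(current)
            for idx in block:
                proposal[idx] += rng.gauss(0.0, step)
            lp = log_prior(proposal, bounds)
            if lp == -math.inf:
                continue  # skip the likelihood call; the prior rejects
            proposal_logp = log_like(proposal) + lp
            # Symmetric proposal: accept with probability min(1, ratio).
            # 1 - random() lies in (0, 1], so the log is always defined.
            if math.log(1.0 - rng.random()) < proposal_logp - current_logp:
                current, current_logp = proposal, proposal_logp
        chain.append(tuple(current))
    return chain


bounds = [(0.0, 2.0), (0.0, 1.0)]
chain = block_metropolis(toy_log_likelihood, bounds, x0=[0.2, 0.2],
                         blocks=[[0], [1]], step=0.1, n_updates=5000)
kept = chain[1000:]  # discard burn-in
posterior_mean = [sum(s[i] for s in kept) / len(kept) for i in range(2)]
print(posterior_mean)
```

Updating one block at a time keeps each proposal low-dimensional, which tends to help acceptance rates when the full hyperparameter vector is large.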
from dask_jobqueue import SLURMCluster
from distributed import Client
cluster = SLURMCluster(
    cores=32,
    memory="64GB",
    walltime="01:00:00",
)
cluster.scale(jobs=4) # 4 nodes × 32 cores
client = Client(cluster)
from gpcam.gp_optimizer import gp2Scale_time_estimate
gp2Scale_time_estimate(n_workers=8, worker_speed=500, n_data=100000)
npx claudepluginhub lbl-camera/gpcam --plugin gpcam