From dak
Guides Jupyter notebook usage for data analysis, exploration, and visualization with BigQuery. Covers best practices for execution, library install, structuring notebooks, data cleaning, plotting, and %%bqsql magics.
Slash command: /dak:notebook-guidance
Before choosing to use a notebook, evaluate the task complexity using these heuristics.
Use a notebook if you meet at least one of these criteria:
Do NOT use a notebook ONLY if:
Golden Rule of Data Storytelling: If any analytical insight, trend, or comparison is involved, favor a notebook and a visualization. A notebook is the "standard" environment for our developer workflow; do not avoid it because of "overhead".
[!IMPORTANT]
Agent execution rules: Your behavior MUST depend on whether the notebook_execute_cell tool is available in your current context:
* If the notebook_execute_cell tool is available: You MUST follow the incremental GENERATE CELL -> EXECUTE CELL -> VALIDATE flow.
* If the notebook_execute_cell tool is NOT available: You MUST generate the complete notebook and request user execution.
- If the notebook_execute_cell tool is available: Follow the STEP BY STEP GENERATE CELL -> EXECUTE CELL -> VALIDATE OUTPUT flow. Generate ONE cell, execute it, then verify the output. If the output is data (e.g. a dataframe), you MUST inspect it to confirm the logic is correct before generating the next step. Batch generation of an entire notebook is strictly prohibited because error propagation in notebooks is expensive to fix.
- If the notebook_execute_cell tool is NOT available: Generate the complete notebook and request user execution.
- Use @skill:discovering-gcp-data-assets or BigQuery list tools to find the correct project.dataset.table before writing ANY code. If the table ID is missing, ask the user.
- Keep related cells together (e.g. a %%bqsql magic cell followed immediately by a Python visualization cell for those results). Use descriptive markdown cells to separate and document different logical sections.
Notebooks run in specific Kernels (execution backends). You MUST ensure the kernel’s Python environment contains the necessary libraries (bigframes, ipykernel, etc.).
Use @skill:managing-python-dependencies to verify whether a virtual environment exists. If not, create one. Ensure ipykernel is installed in that environment. Install any other relevant libraries.
[!IMPORTANT]
HARD STOP on kernel failure: If a cell execution returns "no active kernel" or any kernel-not-found error, you MUST stop immediately. Do NOT scaffold, generate, or insert any further cells. Inform the user which kernel is needed (e.g., PySpark / Dataproc Serverless) and wait for explicit confirmation that a kernel is active before proceeding with notebook execution.
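The incremental flow and the hard stop described above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the agent's actual implementation: `execute_cell` and `validate` are hypothetical callables standing in for the notebook_execute_cell tool and for output inspection, and the error substrings are guesses at how a kernel failure might be reported.

```python
from typing import Callable, Iterable, List

# Assumed substrings that signal a missing kernel; real messages may differ.
KERNEL_ERROR_MARKERS = ("no active kernel", "kernel not found")


class KernelUnavailable(RuntimeError):
    """Raised to force a hard stop until the user confirms a kernel is active."""


def run_incrementally(
    cells: Iterable[str],
    execute_cell: Callable[[str], str],    # hypothetical stand-in for the tool
    validate: Callable[[str, str], bool],  # hypothetical output check
) -> List[str]:
    """Execute cells one at a time, validating each output before continuing."""
    outputs: List[str] = []
    for source in cells:
        output = execute_cell(source)
        if any(marker in output.lower() for marker in KERNEL_ERROR_MARKERS):
            # Hard stop: no further cells are generated or executed.
            raise KernelUnavailable(output)
        if not validate(source, output):
            raise ValueError(f"Unexpected output for cell: {source!r}")
        outputs.append(output)
    return outputs
```

Passing `cells` as a lazy iterable (for example a generator) keeps the "generate ONE cell, execute it, then verify" ordering, because the next cell is only produced after the previous one has passed validation.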
Before installing any python libraries, you MUST use
@skill:managing-python-dependencies to detect how python dependencies are
managed in the project.
Since these are often ephemeral or managed by GCP:
- Before adding a %pip install cell, run %pip list or import <package> to confirm the package is not already present. Managed runtimes (Dataproc Serverless, Colab) pre-install many common packages. Only install what is confirmed missing.
- Use %pip install <package> in the first cell if a package is confirmed missing and it's the only way to modify the runtime.
When in doubt about the kernel type or preferred installation method, ask the user for clarification.
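One way to confirm which packages are missing before adding any %pip install cell is to ask the import system directly. The sketch below uses only the standard library; the function name `missing_packages` and the placeholder package name are illustrative, not part of any skill or tool named here.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(import_names: Iterable[str]) -> List[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in import_names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a made-up placeholder.
print(missing_packages(["json", "hypothetical_missing_pkg_xyz"]))
```

Note that this checks import names, which can differ from the distribution name passed to %pip install, so the result is a hint for what to install rather than a ready-made install list.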
Guidelines for performing exploratory data analysis, data cleaning, and visualization in notebooks.
The notebook should read like a story. While you have flexibility (e.g., multiple visualizations for one data cell, or data cells building on each other), aim for this general flow:
1. Title (Markdown cell, e.g. # Retention Analysis)
2. Section header (Markdown cell, e.g. ## Exploring User Retention)
3. Data (e.g. %%bqsql magics)
4. Validation (e.g. df.head() or assert sanity checks)
5. Visualization (e.g. df.plot())
Repeat steps 2-5 for each new sub-topic or insight. You can have multiple Data cells before a Visualization, or multiple Visualizations from one Data cell. The key is to keep them grouped logically and separated by Markdown headers.
Final Summary (Markdown Cell)
Next Steps: After the notebook has been successfully executed and verified, and the summary is complete, notify the user and propose next step suggestions.
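The story-like flow above can be sketched as a notebook skeleton built from plain dictionaries. This is an illustrative sketch that assumes the nbformat 4 JSON layout; the helper names, the `df_sample` dataframe name, and the table ID passed in are placeholders, and the cell contents are only stubs to be replaced with real analysis.

```python
import json


def markdown_cell(text: str) -> dict:
    """Build a Markdown cell in the (assumed) nbformat 4 layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source: str) -> dict:
    """Build an unexecuted code cell in the (assumed) nbformat 4 layout."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


def build_story_notebook(table_id: str) -> dict:
    """Assemble title, setup, section header, data, validation, plot, and summary cells."""
    cells = [
        markdown_cell("# Retention Analysis"),
        code_cell("import bigframes\nimport bigframes.pandas as bpd\n%load_ext bigframes"),
        markdown_cell("## Exploring User Retention"),
        code_cell(f"%%bqsql df_sample\nSELECT * FROM `{table_id}` LIMIT 10"),
        code_cell("df_sample.head()"),   # validation stub
        code_cell("df_sample.plot()"),   # visualization stub
        markdown_cell("## Final Summary"),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_story_notebook("project.dataset.table")
print(json.dumps(notebook, indent=1)[:200])
```

When the notebook_execute_cell tool is available, a skeleton like this would serve only as a plan, since cells are still meant to be generated and executed one at a time.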
Refer to the following resources for guidance on specific notebook topics:
Use BigFrames magics %%bqsql for BigQuery SQL queries. These cells support
native BigQuery SQL execution and data export to BigFrames dataframes.
[!IMPORTANT]
- Unless specified by the user, always use SQL for querying BigQuery.
- DO NOT use the standard BigQuery Python client library (google.cloud.bigquery) or pandas.read_gbq.
- Mandatory dataframe export: Always provide a dataframe name, e.g. %%bqsql <df_name>. This makes it easy to use results in follow-up Python cells.
- Verify that bigframes version 2.38.0 or above is installed in the notebook runtime environment. If it is missing, ask the user if they would like you to upgrade for them.
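The version requirement above can be checked from a Python cell without importing the library. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper names are illustrative, and the parser only compares the leading numeric components of each version segment, which is a simplification.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_BIGFRAMES = "2.38.0"  # minimum version stated in this guide


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '2.38.0' -> (2, 38, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_BIGFRAMES) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(distribution: str = "bigframes") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None
```

A cell could then call `installed_version()` and, if the result is `None` or fails `meets_minimum`, ask the user whether to upgrade rather than upgrading silently.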
Example %%bqsql magic usage:
# Initialize BigFrames and load %%bqsql magics
import bigframes
import bigframes.pandas as bpd
%load_ext bigframes
[!CAUTION]
Always use %load_ext bigframes exactly as shown. Do not load submodules: for example, %load_ext bigframes.magics or %load_ext bigframes.bigquery are not valid and must not be used.
[!IMPORTANT]
The bigframes library must be installed. Determine if bigframes needs to be installed by following @skill:managing-python-dependencies.
%%bqsql df_sample
SELECT * FROM `project.dataset.table` LIMIT 10
[!CAUTION]
- NO Python SDK for Queries: Do not switch to client.query(sql).to_dataframe() if SQL fails. Fix the SQL syntax instead.
- NO Mixing Logic: Do not put Python code in the same cell as %%bqsql magics.
Magic cells with %%bqsql <df_name> produce a BigQuery DataFrame. In
subsequent cells, you can use <df_name> directly.
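A follow-up Python cell can run a few sanity checks on the exported dataframe before any plotting. The sketch below is duck-typed and relies only on `df.columns` and `len(df)`; the function name and the column name in the usage note are hypothetical, and on a BigQuery DataFrame `len()` is assumed to be supported and may run a query to count rows.

```python
from typing import Iterable


def check_frame(df, required_columns: Iterable[str], min_rows: int = 1) -> None:
    """Assert that expected columns exist and that the frame is not too small."""
    present = list(df.columns)
    missing = [col for col in required_columns if col not in present]
    assert not missing, f"Missing columns: {missing}"
    row_count = len(df)
    assert row_count >= min_rows, f"Expected at least {min_rows} rows, got {row_count}"
```

In a notebook this might be called as `check_frame(df_sample, ["user_id"])` right after the %%bqsql cell, where `user_id` stands in for whatever column the query is expected to return.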
[!IMPORTANT]
You MUST use BigFrames for data exploration, manipulation, splitting etc. You MUST use BQML SQL or bigframes.ml for machine learning tasks. You MUST NOT use pandas or Scikit-learn.
- .to_pandas(): You MUST NOT use .to_pandas() to download the entire dataset into memory. There are some exceptions:
  - ... to_pandas()
  - ... to_pandas()
- read_gbq() for SQL: Do not write SQL queries and execute them with read_gbq(). Use BigFrames DataFrame/Series methods instead.
- ... bigframes.ml.
- ... (df.col.str.*, df.col.dt.*) over remote UDFs.
- ... Series.map() or DataFrame.apply().
- ... .dtypes after loading, and use display() with .head() or .peek().
- ... model.to_gbq(). To load a persisted model, use bpd.read_gbq_model().
Integration with machine learning workflows and best practices.
- Guide: Use @skill:ml-best-practices.
- MUST READ WHEN: The task involves machine learning, training a model, clustering, classification, regression, or time-series forecasting.
If any "MUST READ WHEN" condition is met, you MUST read the corresponding guide before proceeding.
npx claudepluginhub gemini-cli-extensions/data-agent-kit-starter-pack --plugin dak