From exasol
Exasol User Defined Functions (UDFs) and Script Language Containers (SLCs). Covers CREATE SCRIPT, SCALAR and SET functions, ExaIterator API, Python/Java/Lua/R scripts, BucketFS file access, GPU-accelerated UDFs, and building/deploying custom Script Language Containers with exaslct.
How this skill is triggered — by the user, by Claude, or both
Slash command
/exasol:exasol-udfsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Trigger when the user mentions **UDF**, **user defined function**, **CREATE SCRIPT**, **ExaIterator**, **SCALAR**, **SET EMITS**, **BucketFS**, **script language container**, **SLC**, **exaslct**, **custom packages**, **GPU UDF**, **ctx.emit**, **ctx.next**, **variadic script**, **dynamic parameters**, **EMITS(...)**, **default_output_columns**, or any UDF/SLC-related topic.
Trigger when the user mentions UDF, user defined function, CREATE SCRIPT, ExaIterator, SCALAR, SET EMITS, BucketFS, script language container, SLC, exaslct, custom packages, GPU UDF, ctx.emit, ctx.next, variadic script, dynamic parameters, EMITS(...), default_output_columns, or any UDF/SLC-related topic.
Use UDFs to extend SQL with custom logic that runs inside the Exasol cluster:
| SCALAR | SET | |
|---|---|---|
| Input | One row at a time | Group of rows (via GROUP BY) |
| Output | RETURNS <type> (single value) | EMITS (col1 TYPE, ...) (zero or more rows) |
| Row iteration | Not needed | ctx.next() loop required |
| SQL usage | SELECT udf(col) FROM t | SELECT udf(col) FROM t GROUP BY key |
| Use case | Per-row transforms | Aggregation, ML batch predict, multi-row emit |
| Language | Startup | Best For | Expandable via SLC? |
|---|---|---|---|
| Python 3 (3.10 or 3.12) | ~200ms | ML, data science, pandas, string processing | Yes |
| Java (11 or 17) | ~1s | Enterprise libs, type safety, Virtual Schema adapters | Yes |
| Lua 5.4 | <10ms | Low-latency transforms, row-level security | No (natively compiled into Exasol) |
| R (4.4) | ~200ms | Statistical modeling, R model deployment | Yes |
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT my_schema.clean_text(input VARCHAR(10000))
RETURNS VARCHAR(10000) AS
import re
def run(ctx):
if ctx.input is None:
return None
return re.sub(r'[^\w\s]', '', ctx.input).strip().lower()
/
SELECT clean_text(description) FROM products;
CREATE OR REPLACE PYTHON3 SET SCRIPT my_schema.top_n(
item VARCHAR(200), score DOUBLE, n INT
)
EMITS (item VARCHAR(200), score DOUBLE) AS
def run(ctx):
rows = []
limit = ctx.n
while True:
rows.append((ctx.item, ctx.score))
if not ctx.next():
break
rows.sort(key=lambda x: x[1], reverse=True)
for item, score in rows[:limit]:
ctx.emit(item, score)
/
SELECT top_n(product, revenue, 5) FROM sales GROUP BY category;
CREATE OR REPLACE JAVA SCALAR SCRIPT my_schema.hash_value(input VARCHAR(2000))
RETURNS VARCHAR(64) AS
import java.security.MessageDigest;
class HASH_VALUE {
static String run(ExaMetadata exa, ExaIterator ctx) throws Exception {
String input = ctx.getString("input");
if (input == null) return null;
MessageDigest md = MessageDigest.getInstance("SHA-256");
byte[] hash = md.digest(input.getBytes("UTF-8"));
StringBuilder hex = new StringBuilder();
for (byte b : hash) hex.append(String.format("%02x", b));
return hex.toString();
}
}
/
CREATE OR REPLACE JAVA SCALAR SCRIPT my_schema.custom(input VARCHAR(2000))
RETURNS VARCHAR(2000) AS
%scriptclass com.mycompany.MyProcessor;
%jar /buckets/bfsdefault/default/jars/my-lib.jar;
/
CREATE OR REPLACE LUA SCALAR SCRIPT my_schema.my_avg(a DOUBLE, b DOUBLE)
RETURNS DOUBLE AS
function run(ctx)
if ctx.a == nil or ctx.b == nil then return null end
return (ctx.a + ctx.b) / 2
end
/
CREATE OR REPLACE R SET SCRIPT my_schema.predict(
feature1 DOUBLE, feature2 DOUBLE
)
EMITS (prediction DOUBLE) AS
run <- function(ctx) {
model <- readRDS("/buckets/bfsdefault/default/models/model.rds")
repeat {
if (!ctx$next_row(1000)) break
df <- data.frame(f1 = ctx$feature1, f2 = ctx$feature2)
ctx$emit(predict(model, newdata = df))
}
}
/
Use ... to accept any number of input columns, output columns, or both.
CREATE OR REPLACE PYTHON3 SCALAR SCRIPT schema.to_json(...) RETURNS VARCHAR(2000000) AS
import simplejson
def run(ctx):
obj = {}
for i in range(0, exa.meta.input_column_count, 2):
obj[ctx[i]] = ctx[i+1] # caller passes: name, value, name, value, ...
return simplejson.dumps(obj)
/
SELECT to_json('fruit', fruit, 'price', price) FROM products;
ctx[i] — 0-based in Python/Java, 1-based in Lua/R0, 1, 2, ... — never the original column namesexa.meta.input_column_count — total number of input columnsexa.meta.input_columns[i].name / .sql_type — per-column metadataEMITS(...))Declare EMITS(...) in CREATE SCRIPT. At call time, columns must be provided one of two ways:
| Method | Where specified | Use when |
|---|---|---|
| EMITS in SELECT | Caller's SQL query | Output structure depends on data values |
default_output_columns() | Script body | Output structure derivable from input column count/types alone |
-- EMITS in SELECT (required when output depends on data content)
SELECT split_csv(line) EMITS (a VARCHAR(100), b VARCHAR(100), c VARCHAR(100)) FROM t;
# default_output_columns() — called before run(), no ctx/data access available
def default_output_columns():
parts = []
for i in range(exa.meta.input_column_count):
parts.append("c" + exa.meta.input_columns[i].name + " " + exa.meta.input_columns[i].sql_type)
return ",".join(parts)
If neither is provided, the query fails with:
The script has dynamic return arguments. Either specify the return arguments in the query via EMITS or implement the method default_output_columns in the UDF.
| Method/Property | SCALAR | SET | Description |
|---|---|---|---|
ctx.<column> | yes | yes | Access input column value |
return value | yes | no | Return single value (RETURNS) |
ctx.emit(v1, v2, ...) | no | yes | Emit output row (EMITS) |
ctx.emit(dataframe) | no | yes | Emit DataFrame as rows |
ctx.next() | no | yes | Advance to next row; returns False at end |
ctx.size() | no | yes | Number of rows in current group |
ctx.reset() | no | yes | Reset iterator to first row |
ctx.get_dataframe(num_rows, start_col) | no | yes | Get rows as pandas DataFrame |
Important: There is no emit_dataframe() method — use ctx.emit(dataframe) to emit a DataFrame.
| Method | Description |
|---|---|
ctx.getString("col") | Get string value |
ctx.getInteger("col") | Get integer value |
ctx.getDouble("col") | Get double value |
ctx.getBigDecimal("col") | Get decimal value |
ctx.getDate("col") | Get date value |
ctx.getTimestamp("col") | Get timestamp value |
ctx.next() | Advance to next row (SET only) |
ctx.emit(v1, v2, ...) | Emit output row (SET only) |
ctx.size() | Row count in group (SET only) |
ctx.reset() | Reset to first row (SET only) |
All languages can read files from BucketFS at /buckets/<service>/<bucket>/<path>:
# Python — load a pickled ML model
import pickle
with open('/buckets/bfsdefault/default/models/model.pkl', 'rb') as f:
model = pickle.load(f)
// Java — reference JARs via %jar directive
%jar /buckets/bfsdefault/default/jars/my-library.jar;
Performance tip: Load models/resources once (outside the row loop or in a module-level variable), not per-row.
Exasol supports GPU-accelerated UDFs via CUDA-enabled Script Language Containers:
template-Exasol-8-python-3.{10,12}-cuda-conda flavorsUDFs run inside Script Language Containers — Docker-based runtime environments. The default SLC includes standard libraries. When you need additional packages (e.g., scikit-learn, PyTorch, custom JARs), build a custom SLC.
-- Activate for current session
ALTER SESSION SET SCRIPT_LANGUAGES='PYTHON3=localzmq+protobuf:///<bfs-name>/<bucket>/<path>/<container>?lang=python#buckets/<bfs-name>/<bucket>/<path>/<container>/exaudf/exaudfclient_py3';
-- Activate system-wide (requires admin)
ALTER SYSTEM SET SCRIPT_LANGUAGES='...';
pip install exasol-script-languages-container-tool
npx claudepluginhub exasol-labs/exasol-agent-skills --plugin exasolReferences Databricks SQL (DBSQL) advanced features including SQL scripting, stored procedures, materialized views, pipe syntax, geospatial functions, AI functions like ai_query, and data modeling patterns.
Guides SAP HANA SQLScript development for procedures, table functions, exception handling, cursors, performance optimization, and AMDP integration.
Guides Snowflake development with SQL best practices, data pipelines (Dynamic Tables, Streams, Tasks, Snowpipe), Cortex AI functions/Agents, Snowpark Python, dbt integration, performance tuning, and security hardening.