Customize ClickHouse Assistant chat with a semantic layer

The ClickHouse Assistant chat agent can be customized to understand your specific business logic, data structures, and domain knowledge through AGENTS.md—a special saved query that acts as a semantic layer over the agent's system prompt.

By creating an AGENTS.md file, you can provide custom instructions that are injected at the start of every conversation to guide SQL query generation and data analysis based on your organization's unique requirements, calculations, and conventions.

How it works

When you save a query named "AGENTS.md" (case-sensitive) in the Cloud Console:

The ClickHouse Assistant chat agent automatically loads this file when a message is sent
The content is placed within a structured content tag and injected into the agent's system prompt
The instructions are applied to all ClickHouse Assistant chat conversations in that service

Creating AGENTS.md

Create the saved query

In the Cloud Console, create a new query
Name it exactly: "AGENTS.md" (case-sensitive)
Write your custom instructions in the query text editor (not actual SQL)
Save the query

Add your instructions

Structure your instructions using clear, actionable language. Include:

Business rules and calculations
Data structure guidance
Domain-specific terminology
Common query patterns
Performance optimization rules

Best practices

Treat context as a finite resource

Context is precious—every token depletes the agent's "attention budget." Like humans with limited working memory, language models experience performance degradation as context grows. This means finding the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome.

Find the right altitude

Strike a balance between two extremes:

Too specific: Hard coding brittle if-else logic that creates fragility and maintenance complexity
Too vague: High-level guidance that fails to give concrete signals or falsely assumes shared context

The optimal altitude is specific enough to guide behavior effectively, yet flexible enough for the model to apply strong heuristics. Start with a minimal prompt on the best model available, then add clear instructions based on observed failure modes.

Organize with structured sections

Use XML tags or Markdown headers to create distinct, scannable sections:

<background_information>
Context about your data and domain
</background_information>

<calculation_rules>
Specific formulas and business logic
</calculation_rules>

<tool_guidance>
How to use specific ClickHouse features
</tool_guidance>

Provide diverse, canonical examples

Examples are the "pictures worth a thousand words." Rather than stuffing every edge case into your prompt, curate a focused set of diverse examples that effectively portray expected behavior.

Keep it minimal yet complete

Include only frequently-needed instructions
Be concise—larger context degrades performance due to "context rot"
Remove outdated or rarely-used rules
Ensure sufficient information to guide desired behavior

Tip

Minimal doesn't necessarily mean short. You need enough detail to ensure the agent adheres to expected behavior, just avoid unnecessary verbosity.

Example: Calculated Metrics from raw data

Guide the agent when metrics require specific calculations rather than direct column access:

<metric_calculations>
IMPORTANT: "active_sessions" is NOT a column. It must be calculated.

To calculate active sessions:
COUNT(DISTINCT session_id || '|' || user_id) AS active_sessions

This counts unique combinations of session and user identifiers.

When the user asks for "active sessions" or "session count", always use this formula:
SELECT
    date,
    COUNT(DISTINCT session_id || '|' || user_id) AS active_sessions
FROM events
GROUP BY date;

</metric_calculations>

Example: Business logic rules

Define domain-specific calculations and categorizations:

<business_rules>
Revenue Calculation:
- Exclude refunded transactions: WHERE transaction_status != 'refunded'
- Apply regional tax rates using CASE expressions
- Use MRR for subscriptions:
  SUM(CASE
    WHEN billing_cycle = 'monthly' THEN amount
    WHEN billing_cycle = 'yearly' THEN amount / 12
    ELSE 0
  END) AS mrr

Traffic Source Classification:
Use CASE expression to categorize:
CASE
  WHEN traffic_source IN ('google', 'bing', 'organic') THEN 'Organic Search'
  WHEN traffic_source IN ('facebook', 'instagram', 'social') THEN 'Social Media'
  WHEN traffic_source = 'direct' THEN 'Direct'
  ELSE 'Other'
END AS source_category

Customer Segmentation:
- Enterprise: annual_contract_value >= 100000
- Mid-Market: annual_contract_value >= 10000 AND annual_contract_value < 100000
- SMB: annual_contract_value < 10000

Always include these categorizations when generating traffic or revenue reports.
</business_rules>

Example: Data structure quirks

Document unconventional data formats or legacy schema decisions:

<data_structure_notes>
The user_status column uses numeric codes, not strings:
- 1 = 'active'
- 2 = 'inactive'
- 3 = 'suspended'
- 99 = 'deleted'

When filtering or displaying user status, always use:
CASE user_status
  WHEN 1 THEN 'active'
  WHEN 2 THEN 'inactive'
  WHEN 3 THEN 'suspended'
  WHEN 99 THEN 'deleted'
END AS status_label

The product_metadata column contains JSON strings that must be parsed:
SELECT
    product_id,
    JSONExtractString(product_metadata, 'category') AS category,
    JSONExtractInt(product_metadata, 'inventory_count') AS inventory
FROM products;
</data_structure_notes>

Example: domain terminology

Map business terms to technical implementation:

<terminology>
When users refer to "conversions", they mean:
- For e-commerce: transactions WHERE transaction_type = 'purchase'
- For SaaS: subscriptions WHERE subscription_status = 'active' AND first_payment_date IS NOT NULL

"Churn" is calculated as:
COUNT(DISTINCT user_id) WHERE last_active_date < today() - INTERVAL 90 DAY
AND previous_subscription_status = 'active'

"DAU" (Daily Active Users) means:
COUNT(DISTINCT user_id) WHERE activity_date = today()

"Qualified leads" must meet ALL criteria:
- lead_score >= 70
- company_size >= 50
- budget_confirmed = true
- contact_role IN ('Director', 'VP', 'C-Level')
</terminology>

How it works​

Creating AGENTS.md​

Create the saved query​

Add your instructions​

Best practices​

Treat context as a finite resource​

Find the right altitude​

Organize with structured sections​

Provide diverse, canonical examples​

Keep it minimal yet complete​

Example: Calculated Metrics from raw data​

Example: Business logic rules​

Example: Data structure quirks​

Example: domain terminology​