Ingesting with Vector
Vector is a high-performance, vendor-neutral observability data pipeline. It is commonly used to collect, transform, and route logs and metrics from a wide range of sources, and is especially popular for log ingestion due to its flexibility and low resource footprint.
When using Vector with ClickStack, users are responsible for defining their own schemas. These schemas may follow OpenTelemetry conventions, but they can also be entirely custom, representing user-defined event structures. In practice, Vector ingestion is most commonly used for logs, where users want full control over parsing and enrichment before data is written to ClickHouse.
This guide focuses on onboarding data into ClickStack using Vector for both ClickStack Open Source and Managed ClickStack. For simplicity, it does not cover Vector sources or pipeline configuration in depth. Instead, it focuses on configuring the sink that writes data into ClickHouse and ensuring the resulting schema is compatible with ClickStack.
The only strict requirement for ClickStack, whether using the open-source or managed deployment, is that the data includes a timestamp column (or equivalent time field), which can be declared when configuring the data source in the ClickStack UI.
Sending data with Vector
- Managed ClickStack
- Open Source ClickStack
The following guide assumes you have already created a Managed ClickStack service and recorded your service credentials. If you haven't, follow the Getting Started guide for Managed ClickStack until prompted to configure Vector.
Create a database and table
Vector requires a table and schema to be defined prior to data ingestion.
First create a database. This can be done via the ClickHouse Cloud console.
In the example below, we use logs:
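For example:

```sql
CREATE DATABASE IF NOT EXISTS logs
```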
Create a table for your data. This should match the output schema of your data. The example below assumes a classic Nginx structure. Adjust accordingly to your data, adhering to schema best practices. We strongly recommend familiarizing yourself with the concept of Primary keys, selecting your primary key based on the guidelines outlined here.
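A sketch of such a table, assuming JSON-formatted Nginx access logs with the fields shown below. The column names and ordering key are illustrative; adjust them to your own data:

```sql
CREATE TABLE IF NOT EXISTS logs.nginx
(
    `time_local`      DateTime,  -- request timestamp, used as the ClickStack timestamp column
    `remote_addr`     String,
    `remote_user`     String,
    `request`         String,
    `status`          UInt16,
    `body_bytes_sent` UInt64,
    `http_referer`    String,
    `http_user_agent` String
)
ENGINE = MergeTree
-- Ordering key chosen for time-bounded queries filtered by status code
ORDER BY (status, time_local)
```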
The primary key above assumes typical access patterns in the ClickStack UI for Nginx logs, but may need to be adjusted depending on your workload in production environments.
Add ClickHouse sink to Vector configuration
Modify your Vector configuration to include the ClickHouse sink, updating the inputs field to receive events from your existing pipelines.
This configuration assumes that your upstream Vector pipeline has already prepared the data to match the target ClickHouse schema, meaning that fields are parsed, named correctly, and typed appropriately for insertion. See the Nginx example below for a complete illustration of parsing and normalizing raw log lines into a schema suitable for ClickStack.
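As a sketch, the sink block might look like the following. The `parse_logs` input name and the service endpoint are placeholders for your own pipeline and credentials:

```yaml
sinks:
  clickhouse:
    type: clickhouse
    # Replace with the IDs of your upstream sources/transforms
    inputs: ["parse_logs"]
    endpoint: "https://<your-service>.clickhouse.cloud:8443"
    database: "logs"
    table: "nginx"
    format: "json_each_row"
    auth:
      strategy: "basic"
      user: "default"
      password: "${CLICKHOUSE_PASSWORD}"
    # Drop event fields with no matching column instead of failing the insert
    skip_unknown_fields: true
```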
We recommend the default json_each_row format, which encodes each event as a single JSON object per row. This is the recommended format for ClickStack when ingesting JSON data, and should be preferred over alternatives such as JSON objects encoded as strings.
The ClickHouse sink also supports Arrow stream encoding (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing is not supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
We recommend reviewing the available sink configuration options in the Vector documentation:
The example above uses the default user for Managed ClickStack. For production deployments, we recommend creating a dedicated ingestion user with appropriate permissions and limits.
Navigate to the ClickStack UI
Navigate to your Managed ClickStack service and select "ClickStack" from the left-hand menu. If you've already completed the onboarding, this will launch the ClickStack UI in a new tab, and you will be automatically authenticated. If not, you can proceed through the onboarding and select "Launch ClickStack" once you've selected Vector as your input source.
Create a datasource
Create a logs data source. If no data sources exist, you will be prompted to create one on your first login. Otherwise, navigate to Team Settings and add a new data source.
The configuration above assumes an Nginx-style schema with a time_local column used as the timestamp. Where possible, this should be the timestamp column declared in the primary key. This column is mandatory.
We also recommend updating the Default SELECT to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
In the example above, a Body column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
For other possible options, see the configuration reference.
Create a database and table
Vector requires a table and schema to be defined prior to data ingestion.
First create a database. This can be done via the ClickHouse Web user interface at http://localhost:8123/play. Use the default username and password api:api.
In the example below, we use logs:
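For example:

```sql
CREATE DATABASE IF NOT EXISTS logs
```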
Create a table for your data. This should match the output schema of your data. The example below assumes a classic Nginx structure. Adjust accordingly to your data, adhering to schema best practices. We strongly recommend familiarizing yourself with the concept of Primary keys, selecting your primary key based on the guidelines outlined here.
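A sketch of such a table, assuming JSON-formatted Nginx access logs with the fields shown below. The column names and ordering key are illustrative; adjust them to your own data:

```sql
CREATE TABLE IF NOT EXISTS logs.nginx
(
    `time_local`      DateTime,  -- request timestamp, used as the ClickStack timestamp column
    `remote_addr`     String,
    `remote_user`     String,
    `request`         String,
    `status`          UInt16,
    `body_bytes_sent` UInt64,
    `http_referer`    String,
    `http_user_agent` String
)
ENGINE = MergeTree
-- Ordering key chosen for time-bounded queries filtered by status code
ORDER BY (status, time_local)
```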
The primary key above assumes typical access patterns in the ClickStack UI for Nginx logs, but may need to be adjusted depending on your workload in production environments.
Add ClickHouse sink to Vector configuration
Ingestion to ClickStack for Vector should occur directly to ClickHouse, bypassing the OTLP endpoint exposed by the collector.
Modify your Vector configuration to include the ClickHouse sink, updating the inputs field to receive events from your existing pipelines.
This configuration assumes that your upstream Vector pipeline has already prepared the data to match the target ClickHouse schema, meaning that fields are parsed, named correctly, and typed appropriately for insertion. See the Nginx example below for a complete illustration of parsing and normalizing raw log lines into a schema suitable for ClickStack.
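As a sketch, the sink block might look like the following. The `parse_logs` input name and the `logs.nginx` target are placeholders for your own pipeline; the endpoint assumes a local ClickStack deployment:

```yaml
sinks:
  clickhouse:
    type: clickhouse
    # Replace with the IDs of your upstream sources/transforms
    inputs: ["parse_logs"]
    endpoint: "http://localhost:8123"
    database: "logs"
    table: "nginx"
    format: "json_each_row"
    auth:
      strategy: "basic"
      user: "api"
      password: "api"
    # Drop event fields with no matching column instead of failing the insert
    skip_unknown_fields: true
```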
We recommend the default json_each_row format, which encodes each event as a single JSON object per row. This is the recommended format for ClickStack when ingesting JSON data, and should be preferred over alternatives such as JSON objects encoded as strings.
The ClickHouse sink also supports Arrow stream encoding (currently in beta). This can offer higher throughput but comes with important constraints: the database and table must be static, as the schema is fetched once at startup, and dynamic routing is not supported. For this reason, Arrow encoding is best suited for fixed, well-defined ingestion pipelines.
We recommend reviewing the available sink configuration options in the Vector documentation:
The example above uses the api user for ClickStack Open Source. For production deployments, we recommend creating a dedicated ingestion user with appropriate permissions and limits. The configuration also assumes that Vector is running on the same host as ClickStack; in production deployments this is unlikely to be the case, and we recommend sending data over the secure HTTPS port 8443.
Navigate to the ClickStack UI
Navigate to the ClickStack UI at http://localhost:8080. Create a user if you haven't completed the onboarding.
Create a datasource
Navigate to Team Settings and add a new data source.
The configuration above assumes an Nginx-style schema with a time_local column used as the timestamp. Where possible, this should be the timestamp column declared in the primary key. This column is mandatory.
We also recommend updating the Default SELECT to explicitly define which columns are returned in the logs view. If additional fields are available, such as service name, log level, or a body column, these can also be configured. The timestamp display column can also be overridden if it differs from the column used in the table's primary key and configured above.
In the example above, a Body column does not exist in the data. Instead, it is defined using a SQL expression that reconstructs an Nginx log line from the available fields.
For other possible options, see the configuration reference.
Example dataset with Vector
For a more complete example, we use an Nginx log file below.
- Managed ClickStack
- Open Source ClickStack
The following guide assumes you have already created a Managed ClickStack service and recorded your service credentials. If you haven't, follow the Getting Started guide for Managed ClickStack until prompted to configure Vector.
Installing Vector
Before proceeding, ensure that Vector is installed on the system where you plan to run your ingestion pipeline. Follow the official Vector installation guide to install a prebuilt binary or package appropriate for your environment:
Once installed, verify that the vector binary is available on your path before continuing with the configuration steps below.
This can be installed on the same instance as your ClickStack OTel collector.
Follow best practices for architecture and security when moving Vector to production.
Download the sample data
If you wish to experiment with a sample dataset, download the following Nginx sample.
This data has been collected from an Nginx instance configured to output logs in JSON format for easier parsing. For the Nginx configuration for these logs, see "Monitoring Nginx Logs with ClickStack".
Create a database and table
Vector requires a table and schema to be defined prior to data ingestion.
First create a database. This can be done via the ClickHouse Cloud console.
Create a database logs:
Create a table for your data.
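The statements from the earlier schema section can be reused; a sketch assuming the sample's JSON fields (adjust to your own data):

```sql
CREATE DATABASE IF NOT EXISTS logs;

CREATE TABLE IF NOT EXISTS logs.nginx
(
    `time_local`      DateTime,
    `remote_addr`     String,
    `remote_user`     String,
    `request`         String,
    `status`          UInt16,
    `body_bytes_sent` UInt64,
    `http_referer`    String,
    `http_user_agent` String
)
ENGINE = MergeTree
ORDER BY (status, time_local);
```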
The primary key above assumes typical access patterns in the ClickStack UI for Nginx logs, but may need to be adjusted depending on your workload in production environments.
Copy Vector configuration
Copy the Vector configuration into a file named nginx.yaml, setting the CLICKHOUSE_ENDPOINT and CLICKHOUSE_PASSWORD values.
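A sketch of a complete pipeline for the sample file. The file path, source and transform names, and timestamp handling are assumptions to adapt to your environment:

```yaml
# Directory where Vector persists state such as file read offsets
data_dir: "vector-data"

sources:
  nginx_file:
    type: file
    include:
      - "access.log"   # path to the downloaded sample (adjust as needed)
    read_from: beginning

transforms:
  parse_nginx:
    type: remap
    inputs: ["nginx_file"]
    source: |
      # The sample logs are JSON-encoded: parse each line into top-level fields
      . = parse_json!(string!(.message))
      .time_local = parse_timestamp!(.time_local, "%d/%b/%Y:%H:%M:%S %z")

sinks:
  clickhouse:
    type: clickhouse
    inputs: ["parse_nginx"]
    endpoint: "${CLICKHOUSE_ENDPOINT}"
    database: "logs"
    table: "nginx"
    format: "json_each_row"
    # Allow ClickHouse to parse the RFC 3339 timestamps Vector emits
    date_time_best_effort: true
    auth:
      strategy: "basic"
      user: "default"
      password: "${CLICKHOUSE_PASSWORD}"
    skip_unknown_fields: true
```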
The example above uses the default user for Managed ClickStack. For production deployments, we recommend creating a dedicated ingestion user with appropriate permissions and limits.
Start Vector
Start Vector with the following command, creating the data directory first to record file offsets.
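For example, assuming nginx.yaml sets its data_dir to a vector-data directory in the working directory:

```shell
mkdir -p vector-data
vector --config nginx.yaml
```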
Navigate to the ClickStack UI
Navigate to your Managed ClickStack service and select "ClickStack" from the left-hand menu. If you've already completed the onboarding, this will launch the ClickStack UI in a new tab, and you will be automatically authenticated. If not, you can proceed through the onboarding and select "Launch ClickStack" once you've selected Vector as your input source.
Create a datasource
Create a logs data source. If no data sources exist, you will be prompted to create one on first login. Otherwise, navigate to Team Settings and add a new data source.
The configuration assumes the Nginx schema with a time_local column used as the timestamp. This is the timestamp column declared in the primary key. This column is mandatory.
We have also specified the default select to be time_local, remote_addr, status, request, which defines which columns are returned in the logs view.
In the example above, a Body column does not exist in the data. Instead, it is defined as the SQL expression:
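One possible expression, assuming the schema columns above; the exact shape of the reconstructed line is illustrative:

```sql
concat(
    remote_addr, ' - ', remote_user,
    ' [', formatDateTime(time_local, '%d/%b/%Y:%H:%M:%S'), '] "',
    request, '" ', toString(status), ' ', toString(body_bytes_sent)
)
```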
This reconstructs the log line from the structured fields.
For other possible options, see the configuration reference.
Explore the data
Navigate to the search view for October 20th, 2025 to explore the data and begin using ClickStack.
The following guide assumes you have set up ClickStack Open Source with the Getting Started guide.
Installing Vector
Before proceeding, ensure that Vector is installed on the system where you plan to run your ingestion pipeline. Follow the official Vector installation guide to install a prebuilt binary or package appropriate for your environment:
Once installed, verify that the vector binary is available on your path before continuing with the configuration steps below.
This can be installed on the same instance as your ClickStack OTel collector.
Follow best practices for architecture and security when moving Vector to production.
Download the sample data
If you wish to experiment with a sample dataset, download the following Nginx sample.
This data has been collected from an Nginx instance configured to output logs in JSON format for easier parsing. For the Nginx configuration for these logs, see "Monitoring Nginx Logs with ClickStack".
Create a database and table
Vector requires a table and schema to be defined prior to data ingestion.
First create a database. This can be done via the ClickHouse Web user interface at http://localhost:8123/play. Use the default username and password api:api.
Create a database logs:
Create a table for your data.
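The statements from the earlier schema section can be reused; a sketch assuming the sample's JSON fields (adjust to your own data):

```sql
CREATE DATABASE IF NOT EXISTS logs;

CREATE TABLE IF NOT EXISTS logs.nginx
(
    `time_local`      DateTime,
    `remote_addr`     String,
    `remote_user`     String,
    `request`         String,
    `status`          UInt16,
    `body_bytes_sent` UInt64,
    `http_referer`    String,
    `http_user_agent` String
)
ENGINE = MergeTree
ORDER BY (status, time_local);
```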
The primary key above assumes typical access patterns in the ClickStack UI for Nginx logs, but may need to be adjusted depending on your workload in production environments.
Copy Vector configuration
Ingestion to ClickStack for Vector should occur directly to ClickHouse, bypassing the OTLP endpoint exposed by the collector.
Copy the Vector configuration into a file named nginx.yaml.
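A sketch of a complete pipeline for the sample file, assuming a local ClickStack deployment with the default api:api credentials. The file path, source and transform names, and timestamp handling are assumptions to adapt to your environment:

```yaml
# Directory where Vector persists state such as file read offsets
data_dir: "vector-data"

sources:
  nginx_file:
    type: file
    include:
      - "access.log"   # path to the downloaded sample (adjust as needed)
    read_from: beginning

transforms:
  parse_nginx:
    type: remap
    inputs: ["nginx_file"]
    source: |
      # The sample logs are JSON-encoded: parse each line into top-level fields
      . = parse_json!(string!(.message))
      .time_local = parse_timestamp!(.time_local, "%d/%b/%Y:%H:%M:%S %z")

sinks:
  clickhouse:
    type: clickhouse
    inputs: ["parse_nginx"]
    endpoint: "http://localhost:8123"
    database: "logs"
    table: "nginx"
    format: "json_each_row"
    # Allow ClickHouse to parse the RFC 3339 timestamps Vector emits
    date_time_best_effort: true
    auth:
      strategy: "basic"
      user: "api"
      password: "api"
    skip_unknown_fields: true
```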
The example above uses the api user for ClickStack Open Source. For production deployments, we recommend creating a dedicated ingestion user with appropriate permissions and limits. The configuration also assumes that Vector is running on the same host as ClickStack; in production deployments this is unlikely to be the case, and we recommend sending data over the secure HTTPS port 8443.
Create a datasource
Create a logs data source via Team Settings -> Sources.
The configuration assumes the Nginx schema with a time_local column used as the timestamp. This is the timestamp column declared in the primary key. This column is mandatory.
We have also specified the default select to be time_local, remote_addr, status, request, which defines which columns are returned in the logs view.
In the example above, a Body column does not exist in the data. Instead, it is defined as the SQL expression:
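One possible expression, assuming the schema columns above; the exact shape of the reconstructed line is illustrative:

```sql
concat(
    remote_addr, ' - ', remote_user,
    ' [', formatDateTime(time_local, '%d/%b/%Y:%H:%M:%S'), '] "',
    request, '" ', toString(status), ' ', toString(body_bytes_sent)
)
```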
This reconstructs the log line from the structured fields.
For other possible options, see the configuration reference.
Navigate to the ClickStack UI
Navigate to the ClickStack UI at http://localhost:8080. Create a user if you haven't completed the onboarding.
Explore the data
Navigate to the search view for October 20th, 2025 to explore the data and begin using ClickStack.