Langfuse
What is Langfuse?
Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications. It is part of the ClickHouse ecosystem and relies on ClickHouse at its core to provide a scalable, high-performance observability backend.
By leveraging ClickHouse's columnar storage and fast analytical capabilities, Langfuse can handle billions of traces and events with low latency, making it suitable for high-throughput production workloads.
Why Langfuse?
- Open source: Fully open source with a public API for custom integrations
- Production optimized: Designed with minimal performance overhead
- Best-in-class SDKs: Native SDKs for Python and JavaScript
- Framework support: Integrated with popular frameworks like OpenAI SDK, LangChain, and LlamaIndex (see the example after this list)
- Multi-modal: Support for tracing text, images and other modalities
- Full platform: Suite of tools for the complete LLM application development lifecycle
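To illustrate the framework support, here is a minimal sketch of the OpenAI drop-in integration from the Python SDK: importing the client from `langfuse.openai` instead of `openai` traces each completion automatically. The model name and prompt are placeholders, and credentials are assumed to come from the standard LANGFUSE_* and OPENAI_API_KEY environment variables.

```python
# Minimal sketch: use Langfuse's drop-in wrapper instead of `import openai`
# so every chat completion is traced automatically.
from langfuse.openai import openai

completion = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what Langfuse does in one sentence."}],
)
print(completion.choices[0].message.content)
```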
Deployment options
Langfuse offers flexible deployment options to meet different security and infrastructure needs.
Langfuse Cloud is a fully managed service powered by a managed ClickHouse cluster for optimal performance. It is SOC 2 Type II and ISO 27001 certified, GDPR compliant, and available in US (AWS us-west-2) and EU (AWS eu-west-1) data regions.
Self-hosted Langfuse is fully open-source (MIT license) and free to deploy on your own infrastructure using Docker or Kubernetes. You run your own ClickHouse instance (or use ClickHouse Cloud) to store observability data, ensuring complete control over your data.
Architecture
Langfuse depends only on open source components and can be deployed locally, on cloud infrastructure, or on-premises:
- ClickHouse: Stores high-volume observability data (traces, spans, generations, scores). It enables fast aggregation and analytics for dashboards.
- Postgres: Stores transactional data like user accounts, project configurations, and prompt definitions.
- Redis: Handles event queuing and caching.
- S3/Blob Storage: Stores large payloads and raw event data.
Features
Observability
Observability is essential for understanding and debugging LLM applications. Unlike traditional software, LLM applications involve complex, non-deterministic interactions that can be challenging to monitor and debug. Langfuse provides comprehensive tracing capabilities that help you understand exactly what's happening in your application.
📹 Want to learn more? Watch an end-to-end walkthrough of Langfuse Observability and how to integrate it with your application.
- Trace Details: Traces allow you to track every LLM call and other relevant logic in your app (see the sketch after this list).
- Sessions: Sessions allow you to track multi-step conversations or agentic workflows.
- Timeline: Debug latency issues by inspecting the timeline view.
- Users: Add your own userId to monitor costs and usage for each user. Optionally, create a deep link to this view in your systems.
- Agent Graphs: LLM agents can be visualized as a graph to illustrate the flow of complex agentic workflows.
- Dashboard: See quality, cost, and latency metrics in the dashboard to monitor your LLM application.
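To make these tracing concepts concrete, below is a minimal sketch using the Python SDK's @observe decorator (v2-style API; import paths may differ in newer SDK versions). The function, user ID, and session ID are illustrative placeholders.

```python
from langfuse.decorators import observe, langfuse_context

@observe()  # the outermost observed function creates a trace; nested calls become spans
def answer_question(question: str) -> str:
    # Attach identifiers so the trace shows up under Users and Sessions in the UI
    langfuse_context.update_current_trace(
        user_id="user-123",        # placeholder: your application's user ID
        session_id="session-abc",  # placeholder: groups multi-step conversations
    )
    # ... call your LLM / retrieval logic here; wrap sub-steps with @observe()
    # so they appear as spans on the trace timeline
    return "stubbed answer"

answer_question("What is Langfuse?")
```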
Prompt management
Prompt management is critical to building effective LLM applications. Langfuse provides tools to help you manage, version, and optimize your prompts throughout the development lifecycle.
📹 Want to learn more? Watch an end-to-end walkthrough of Langfuse Prompt Management and how to integrate it with your application.
- Create: Create a new prompt via UI, SDKs, or API.
- Version Control: Collaboratively version and edit prompts via UI, API, or SDKs.
- Deploy: Deploy prompts to production or any environment via labels, without any code changes (see the sketch after this list).
- Metrics: Compare latency, cost, and evaluation metrics across different versions of your prompts.
- Test in Playground: Instantly test your prompts in the playground.
- Link with Traces: Link prompts with traces to understand how they perform in the context of your LLM application.
- Track Changes: Track changes to your prompts to understand how they evolve over time.
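As an illustration of the create/deploy/fetch flow, here is a minimal sketch using the Python SDK (v2-style API). The prompt name "movie-critic", the {{movie}} variable, and the use of the "production" label are hypothetical examples.

```python
from langfuse import Langfuse

# Reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST from the environment
langfuse = Langfuse()

# Create a new prompt version and label it for production
langfuse.create_prompt(
    name="movie-critic",  # placeholder prompt name
    prompt="As a movie critic, rate {{movie}} on a scale of 1-10 and explain why.",
    labels=["production"],
)

# At runtime, fetch whichever version carries the "production" label and fill in variables
prompt = langfuse.get_prompt("movie-critic", label="production")
compiled = prompt.compile(movie="Dune: Part Two")
```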
Evaluation & datasets
Evaluation is crucial for ensuring the quality and reliability of your LLM applications. Langfuse provides flexible evaluation tools that adapt to your specific needs, whether you're testing in development or monitoring production performance.
📹 Want to learn more? Watch an end-to-end walkthrough of Langfuse Evaluation and how to use it to improve your LLM application.
- Analytics: Plot evaluation results in the Langfuse Dashboard.
- User Feedback: Collect feedback from your users. Feedback can be captured in the frontend via the Browser SDK, or server-side via the SDKs or API.
- LLM-as-a-Judge: Run fully managed LLM-as-a-judge evaluations on production or development traces. These can be applied to any step within your application for step-wise evaluations.
- Experiments: Evaluate prompts and models on datasets directly in the user interface. No custom code is needed.
- Annotation Queue: Baseline your evaluation workflow with human annotations via Annotation Queues.
- Custom Evals: Add custom evaluation results via the Python or JS SDK; numeric, boolean, and categorical values are supported (see the sketch below).
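For example, a custom score can be attached to an existing trace with a few lines of Python (v2-style API; the trace ID and score name below are placeholders):

```python
from langfuse import Langfuse

langfuse = Langfuse()

langfuse.score(
    trace_id="abc-123",        # placeholder: ID of the trace to score
    name="user-feedback",      # score name, e.g. from a thumbs-up/down widget
    value=1,                   # numeric, boolean, and categorical values are supported
    comment="Helpful answer",  # optional free-text comment
)
langfuse.flush()  # ensure the score is sent before the process exits
```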
Quickstarts
Get up and running with Langfuse in minutes. Choose the path that best fits your current needs: