Weights & Biases (W&B) is an AI developer platform founded in 2017 to help engineering teams overcome one of the most frustrating challenges in AI development: taking projects from impressive prototypes to reliable production systems.
At its core, W&B provides tools to track experiments, version datasets, and manage machine learning workflows, making AI development more efficient and reproducible. This matters because, as W&B co-founder and CEO Lukas Biewald says, “the real value in AI development lies in learning from your experiments.” Reliable, robust infrastructure helps teams preserve those learnings and build on them as projects scale.
To handle the huge data volumes that come with scaling AI projects for production, W&B relies on ClickHouse, a high-performance, low-latency columnar OLAP database that scales without compromising query speed. ClickHouse lets W&B’s customers process real-time data streams efficiently and keeps insights arriving quickly, even as datasets grow rapidly.
At a September 2024 meetup in San Francisco, Lukas shared more about the work W&B is doing to help AI companies move from demo to production, including ClickHouse’s role in managing this transition.
“We feel grateful to ClickHouse and their whole team,” Lukas says. “It has really deepened a lot of the most important things we can do for our customers.”
Barriers to production
As someone who has worked in machine learning for over 20 years, Lukas is encouraged by what he calls the “democratization of AI.” The emergence of new foundation models, he says, has made it easier for companies around the world, in all industries, to invest in AI capabilities and build applications that harness AI in “exciting and unexpected ways.”
However, as more teams develop AI models, there’s an expectation that they can seamlessly move these models into production to create real-world impact. The reality, as Lukas explains, is that “AI is so easy to demo, but so hard to get into production.” In other words, for all the excitement generated by AI prototypes, most teams find it difficult to transition these models into scalable and dependable systems that deliver practical value.
“In software, we joke about the first 90 percent and the second 90 percent,” Lukas says. “In AI, you might have ten 90 percents, and even then it never makes it into production.”
Part of the problem, he says, is a gap in developer tools for GenAI operations. Traditional software development benefits from established workflows and mature tools for managing code and processes: GitHub for version control, Jira for project management, Figma for design, and so on. AI developer tools have not yet reached the same level of maturity.
When Lukas started Weights & Biases, investors frequently asked him what makes software development different from AI development. “Software development is linear — you’re always moving forward, adding functionality and features,” he says. “AI, on the other hand, is experimental. Almost everything you try never sees the light of day. The workflow is iterative; it’s all about running experiments. When you make something better, something else always gets worse, and you have to decide how you feel about the trade-offs.”
Unlike software development, where code represents your intellectual property, Lukas says the primary value in AI lies in “the things you learn along the way.” But without adequate tools to track, version, and manage these learnings, much of that value is lost, leaving engineering teams struggling to move forward efficiently.
Scaling AI development
Lukas and his co-founders set out to address the experimental, often chaotic nature of AI development. With tools for tracking experiments, versioning datasets, and managing workflows, W&B brings order to the process, making it reproducible and scalable. This helps engineering teams iterate faster, build on past learnings, and ensure progress isn’t lost.
The W&B platform serves three main audiences: foundation model builders, ML engineers, and software developers. Foundation model builders (teams developing large-scale models such as OpenAI’s GPT or Meta’s Llama) need sophisticated experimentation and tracking tools, and W&B provides features designed for these complex workflows. ML engineers working on real-world applications such as autonomous vehicles and drug discovery use W&B to bridge the gap between research and production and to make sure their models are robust and scalable.
To support the growing number of software developers building GenAI applications, W&B recently launched Weave, an observability tool designed for developers who may be new to AI. Weave makes AI development more accessible by letting developers visualize LLM calls and better understand how data flows through their models. It also simplifies saving and versioning datasets, with tools for complex data scenarios such as privacy requirements or data that changes over time.
As Lukas explains, Weave allows for rigorous model evaluations across multiple metrics, helping developers assess performance and identify areas for improvement. In addition, Weave captures user feedback — “whether it’s an internal business user or an external customer” — and synthesizes it in one place so it can be quickly incorporated into the product.
Finally, Weave provides tools for comparing foundation models, so developers can see which model performs best in a given scenario and choose the one that best meets their needs.
The role of ClickHouse
ClickHouse plays an important role in supporting W&B’s infrastructure, allowing the platform to manage the immense data demands that come with AI experimentation at scale.
Lukas notes that with a product like Weave, “customers want fast, live results immediately on their data at small scale, but then they never shut it off, and it needs to scale to billions of records almost immediately.” ClickHouse’s columnar storage and low-latency capabilities make this possible, allowing W&B’s customers to transition from small-scale experiments to production environments without having to re-architect their infrastructure.
Lukas adds that ClickHouse’s flexibility is especially valuable in AI development, where the line between pre-production and production is often blurred. “When you do this process well, and you start with a lightweight prototype and then iterate, there’s not really a pre-production process — whatever you’re doing pre-production ends up in post-production,” he says.
He also highlights the support W&B and its customers have received from ClickHouse, noting that the two teams share a commitment to doing whatever it takes to help customers, including communicating directly through Slack channels.
“We’ve loved working with ClickHouse,” Lukas says. “The product is great, the architecture has been super easy to set up, and the low latency and scalability are amazing.”
Empowering AI developers
As foundation models continue to democratize AI, more and more software developers are moving into AI development, experimenting with new capabilities and building exciting applications. With Weave, and a helping hand from ClickHouse, W&B is supporting developers who may not have deep machine learning expertise but are eager to bring the power of AI into their applications and workflows.
By making it easier for engineering teams to track, version, and manage experiments, W&B is further lowering the barrier to AI development and accelerating GenAI adoption across industries. Expanding the platform to serve software developers reflects Lukas and the team’s commitment to supporting all developers, whether they’re AI specialists or software engineers, and helping them take their most innovative ideas to production with confidence.