
Why DNB's Cyber Defense Center team moved off Databricks notebooks and onto marimo

Across industry, teams are realizing that modern analytics workloads can run efficiently with the right combination of open-source engines and flexible interfaces. By choosing marimo as the interface layer, organizations can break free from vendor lock-in while gaining AI-native capabilities as well as the ability to push notebooks to production as apps or scripts.

In this case study, DNB staff engineer Kyrre Wahl KongsgĂ„rd explains how marimo was critical in enabling DNB’s Cyber Defense Center team to move off Databricks and onto an in-house data platform built from composable open-source components.

marimo is the world’s best programming environment for working with data, and it’s all free and open source. Just pip install marimo or uv add marimo to get started.

Threat detection and response

DNB is Norway’s largest financial services group, serving millions of customers across banking, insurance, asset management, and capital markets. For a financial institution of this scale, effective threat detection and response depends on the ability to store and process millions of security events daily.

A common approach has been to use SIEM platforms such as Splunk, Elastic, or Sentinel, and route all log data into a central location for analysis. However, the shift to cloud-native architectures and the widespread deployment of Endpoint Detection and Response (EDR) tools have driven a sharp increase in log volumes. These tools record nearly everything that happens on a system, including executed processes, network connections, file modifications, registry operations, and suspicious API calls. Cloud environments generate similar volumes, with logs capturing actions taken by identities, changes to virtual networks, audit trail activity, and events across containerized environments.

This has led many organizations to build a security data lake or lakehouse, where log data is stored in object storage such as Azure Blob Storage or Amazon S3 using an open table format like Delta or Iceberg, and queried through engines such as Databricks, Trino, or Snowflake. Centralizing data in this way makes sense in theory, but in practice it can be complex and expensive.

A practical middle ground is partial migration: keeping some data in Splunk or another SIEM for operational use, while copying or streaming other datasets into low-cost object storage for analytics and long-term retention.

At DNB, we introduced Azure Databricks as a complementary threat detection and response platform alongside Splunk. It supported streaming-based detections and interactive notebooks for investigation, threat hunting, and data analysis. Security logs from event sources were stored in Delta tables.

The detection workflow was designed so that each rule had an accompanying notebook describing the rule, summarizing the alert, and outlining the steps analysts should take. As analysts completed their work, they uploaded notebooks to a shared repository.

Composability

With Databricks we had built a streaming detection platform that allowed us to investigate alerts and incidents via interactive notebooks. However, we noticed that these investigation workloads were well suited to single-node clusters with high compute and memory settings, yet we remained tied to Databricks’ distributed architecture and pricing model. We also weren’t using many of the platform’s advanced features due to operational overhead.

During this time, DNB’s data platform team re-architected their platform, IPA, to use Snowflake and began providing managed services such as Neo4j that would be useful for cybersecurity use cases. A migration to IPA seemed like the natural path forward, but we had designed our storage solution around tiered Delta tables, a format incompatible with Snowflake.

Fortunately, Snowflake’s performance on Iceberg tables had improved substantially, narrowing the gap with native Snowflake tables. This meant we could migrate from Delta to Iceberg while maintaining the same cost-effective object storage model and build our security data platform around IPA’s infrastructure.

The backend strategy became clear: adopt Ibis as the query layer, allowing analysts to write code once and execute it across DuckDB, Spark, and Snowflake. The final challenge was the interface layer. We wanted to design toward a future where analysts could spin up notebooks on demand, connect to whichever backend suited the investigation (whether DuckDB for single-node performance, a GPU-accelerated engine like Theseus or Graphistry, or Splunk), and switch seamlessly between them without changing tools or workflows.
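As a rough sketch of what this looks like in practice, the snippet below builds a single Ibis expression that can run unchanged against different backends. The table and column names (security_events, event_type, host) are invented for illustration, and an in-memory table stands in for a real DuckDB or Snowflake connection.

```python
import ibis
from ibis import _

# In production this would be a backend connection, e.g.
#   con = ibis.duckdb.connect("security.duckdb")
#   con = ibis.snowflake.connect(user=..., account=..., database=...)
# Here we use an in-memory table so the sketch runs on its own.
events = ibis.memtable(
    {
        "host": ["ws-0142", "ws-0142", "srv-db-03"],
        "event_type": ["process_creation", "network_connection", "process_creation"],
    }
)

# One expression, written once; Ibis compiles it to whichever
# backend's SQL dialect the connection targets.
suspicious_hosts = (
    events.filter(_.event_type == "process_creation")
    .group_by("host")
    .aggregate(n_events=_.count())
    .order_by(ibis.desc("n_events"))
)

print(suspicious_hosts.execute())
```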

Figure 1: Multi-backend investigation workflow. The architecture demonstrates marimo’s role as a unified interface: alerts trigger notebook provisioning, analysts query across data sources, and investigations are exported as documentation.

The interface

With the backend strategy in place, the remaining question was which notebook interface could support this vision. When we came across marimo in early 2024, it addressed exactly what we needed: native support for multiple data sources and built-in integration with Ibis, allowing connections to Splunk, Databricks, and Snowflake from the same notebook environment. This flexibility aligned with our vision of a composable data system.

As we adopted marimo, it quickly became clear that it was more than a replacement for Databricks notebooks. Its built-in UI elements and reactive execution model allowed us to design self-contained incident response environments that combined analysis, visualization, and documentation in a single place. Analysts could run queries, explore results, and record findings side by side within the same workspace, keeping markdown notes in one column alongside visualizations or data views in another.
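The snippet below is a minimal sketch of that side-by-side layout using marimo's built-in elements; the alert data and note text are invented for illustration.

```python
import marimo as mo
import pandas as pd

# Toy alert data standing in for query results.
alerts = pd.DataFrame(
    {
        "host": ["ws-0142", "srv-db-03"],
        "rule": ["suspicious_powershell", "lateral_movement"],
        "severity": ["high", "critical"],
    }
)

notes = mo.md(
    """
    ### Analyst notes
    - Triage started 09:40 UTC
    - PowerShell activity traced back to a scheduled task
    """
)

# Markdown notes in one column, an interactive data view in the other.
mo.hstack([notes, mo.ui.table(alerts)])
```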

Beyond built-in interactivity, marimo’s anywidget support allowed us to extend notebooks with custom visualizations. For example, we embedded a process tree visualization, which had previously existed only in our legacy EDR tool, directly into the notebook with minimal effort. This created a unified, adaptable investigation environment that now forms the core interface of our composable security data platform.
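As a rough illustration of the anywidget pattern, the sketch below wires a toy widget into a marimo notebook. The ProcessTreeWidget class and its data are hypothetical, and the front end simply renders the tree as JSON rather than a real process-tree visualization.

```python
import anywidget
import traitlets
import marimo as mo


class ProcessTreeWidget(anywidget.AnyWidget):
    # Minimal front end: render the synced "tree" state as formatted JSON
    # and re-render whenever it changes.
    _esm = """
    function render({ model, el }) {
      const pre = document.createElement("pre");
      const draw = () => {
        pre.textContent = JSON.stringify(model.get("tree"), null, 2);
      };
      draw();
      model.on("change:tree", draw);
      el.appendChild(pre);
    }
    export default { render };
    """
    tree = traitlets.Dict().tag(sync=True)


# Wrapping the widget with mo.ui.anywidget makes it a reactive marimo element.
process_tree = mo.ui.anywidget(
    ProcessTreeWidget(tree={"explorer.exe": ["cmd.exe", "powershell.exe"]})
)
process_tree
```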

AI

marimo’s integrated AI features, such as the chat and agent sidebars, give it a significant advantage over legacy notebook solutions like Databricks and Jupyter. Unlike a generic chatbot, marimo’s AI assistance has access to the complete investigation state—markdown notes, code cells, in-memory variables, and schemas from connected data sources—enabling it to reason about the ongoing analysis.

Figure 2: Context-aware AI assistant. The assistant has access to the complete investigation state (markdown notes, code cells, in-memory variables, and schemas from connected data sources), enabling context-aware suggestions and query generation.

For example, the assistant can generate suggested queries in the correct syntax: SPL for Splunk or Ibis expressions for DuckDB and Snowflake. Analysts describe their investigation needs in natural language, and the assistant translates them into the appropriate query syntax. With marimo’s upcoming support for external tools via MCP (Model Context Protocol), the assistant will gain access to additional capabilities such as external APIs and enterprise tools.
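To make this concrete, here is the kind of output such a request might produce for a hypothetical question like "failed logins per host in the last 24 hours", expressed once in SPL and once as an Ibis expression. The index, table, and field names are invented for illustration.

```python
# Splunk (SPL), as a query string:
spl = "search index=auth action=failure earliest=-24h | stats count by host"

# The same intent as an Ibis expression (DuckDB, Snowflake, ...):
import ibis
from ibis import _

# Unbound table with a hypothetical schema; in practice this would come
# from a backend connection.
auth = ibis.table(
    {"host": "string", "action": "string", "event_time": "timestamp"},
    name="auth_events",
)

failed_per_host = (
    auth.filter(
        (_.action == "failure")
        & (_.event_time >= ibis.now() - ibis.interval(hours=24))
    )
    .group_by("host")
    .aggregate(count=_.count())
)
```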

Looking forward, marimo’s agent support extends these AI capabilities further through the Agent Client Protocol (ACP). Agents such as Claude Code or Gemini, once connected, can generate queries, visualizations, and markdown documentation cells; execute queries against Snowflake and Splunk; read documentation from Confluence; or interact with other custom tooling via MCP. While still experimental, this combination of agentic notebook generation and tool integration positions marimo as a natural interface for agentic incident response and threat hunting workflows.

Conclusion

By using marimo, Ibis, and Iceberg, we’ve been able to build toward a composable security data platform, decoupling the interface layer from backend query engines and gaining the flexibility to work with multiple backends seamlessly. Iceberg’s open table format enables cost-effective object storage while maintaining interoperability across different query engines.

This architecture has also eased our transition to IPA, DNB’s data platform, giving us access to capabilities that would be difficult to build and maintain independently. Snowflake handles large-scale queries across our security data lake, while Neo4j enables graph-based investigations to trace lateral movement and assess blast radius. GenAI tooling from OpenAI and AWS Bedrock is available when analysts need it. The IPA team manages the operational complexity of these systems, allowing our security analysts to focus on threat detection and response. When we need specialized processing, such as GPU-accelerated analysis, the IPA team can provision these resources without disrupting existing workflows.

marimo serves as both the unified interface to this architecture and an AI layer for security investigations. The combination of context-aware assistance and programmatic notebook generation transforms how analysts interact with security data, making investigations faster and more systematic.

Looking ahead, we’re working with DNB’s internal data analytics platform team to offer provisioning as a service. This would enable on-demand notebook environments that connect to the appropriate backends for each investigation, further reducing operational overhead while maintaining the flexibility that makes the composable approach valuable.