Why data engineers love marimo

Data engineering workflows benefit from tools that offer rich interactivity yet remain plain Python scripts that can be deployed as-is.

Data engineering involves building and maintaining complex data pipelines, which often requires iterative development and debugging. marimo gives engineers a notebook experience that supports this kind of iteration while keeping the code in ordinary Python files. Unlike traditional notebooks, which are typically stored as JSON, marimo notebooks are plain Python files under the hood, so they can be versioned with Git like any other source file. The same file can run as an interactive notebook during development and as a standard Python script for automation or scheduled batch workloads.

Interactive pipeline development

marimo uses reactive execution: a cell reruns automatically when the cells it depends on change. This is particularly useful for data pipeline development, where a change to one transformation should propagate through the rest of the workflow. When a data engineer modifies a data cleaning step or aggregation logic, downstream cells update automatically, giving immediate feedback on how the change affects the final output. This removes the need to manually re-run multiple cells in sequence and reduces the risk of working with stale intermediate results. Engineers can still choose to run cells manually when working with heavy workloads, and marimo offers caching mechanisms to prevent unnecessary computation when values haven’t changed.
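To make the idea of reactive execution concrete, here is a minimal sketch in plain Python. It is a toy model, not marimo's implementation: the `ReactiveNotebook` class and its `cell` decorator are hypothetical names invented for this illustration, and each cell declares the names it reads and defines by hand.

```python
from graphlib import TopologicalSorter


class ReactiveNotebook:
    """Toy model of reactive execution (hypothetical, not marimo's API)."""

    def __init__(self):
        self.cells = {}   # defined name -> (function, names it reads)
        self.values = {}  # defined name -> latest computed value

    def cell(self, defines, reads=()):
        """Register a function as the cell that defines `defines`."""
        def register(fn):
            self.cells[defines] = (fn, tuple(reads))
            return fn
        return register

    def _downstream(self, name):
        """Return `name` plus every cell that depends on it, directly or not."""
        affected = {name}
        changed = True
        while changed:
            changed = False
            for defined, (_, reads) in self.cells.items():
                if defined not in affected and affected.intersection(reads):
                    affected.add(defined)
                    changed = True
        return affected

    def run(self, name=None):
        """Run every cell, or only `name` and the cells downstream of it."""
        affected = set(self.cells) if name is None else self._downstream(name)
        graph = {d: set(reads) for d, (_, reads) in self.cells.items()}
        # static_order() yields dependencies before the cells that read them.
        order = [n for n in TopologicalSorter(graph).static_order() if n in affected]
        for n in order:
            fn, reads = self.cells[n]
            self.values[n] = fn(*(self.values[r] for r in reads))
        return order


nb = ReactiveNotebook()


@nb.cell(defines="raw")
def load():
    return [3, None, 5, 8]


@nb.cell(defines="clean", reads=["raw"])
def drop_missing(raw):
    return [x for x in raw if x is not None]


@nb.cell(defines="total", reads=["clean"])
def aggregate(clean):
    return sum(clean)


nb.run()
first_total = nb.values["total"]


# Edit the cleaning step: only it and the aggregation downstream rerun.
@nb.cell(defines="clean", reads=["raw"])
def fill_missing(raw):
    return [0 if x is None else x * 2 for x in raw]


rerun = nb.run("clean")
print(rerun, first_total, nb.values["total"])
```

In this sketch, editing the cleaning cell reruns only that cell and the aggregation that reads from it; the loading cell is left alone, which is the property that keeps intermediate results from going stale.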

Easy to share and collaborate

Data engineering projects often involve collaboration between team members with different technical backgrounds. marimo notebooks can run as web applications that let users interact with the outputs of all cells in a notebook, which makes it straightforward to share pipeline prototypes with stakeholders or other engineers. The notebooks can even be deployed via WebAssembly, eliminating the need for a Python backend in some cases. This flexibility helps bridge the gap between developing a pipeline and sharing its results with non-technical stakeholders.

Production-ready data access

marimo supports SQL cells as well as Python cells. When marimo connects to a data source, it can automatically detect table schemas, so you can expect full autocompletion in SQL cells as well as widgets that understand how to visualise the tables. These schemas also provide useful context for AI assistants when helping engineers write SQL statements. The combination of automatic schema detection and AI assistance can speed up the development of data queries, especially when working with unfamiliar datasets or complex relationships between tables.
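As a rough illustration of what schema detection involves, the sketch below reads table and column metadata from a database using Python's standard `sqlite3` module. It does not show how marimo does this internally; the `describe_tables` helper and the example tables are assumptions made up for this illustration.

```python
import sqlite3


def describe_tables(conn):
    """Map each user table to a list of (column name, declared type) pairs."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info returns one row per column:
        # (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from sqlite_master, so it is quoted inline.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical example tables, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

schema = describe_tables(conn)
for table, columns in schema.items():
    print(table, columns)
```

Metadata of this shape (table names, column names, declared types) is the kind of information an editor would need to offer column autocompletion, and it can be passed to an assistant as context when drafting a query.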
