Gleb, Alex, Erez and Simon here – we are building an open-source tool for comparing data within and across databases at any scale. The repo is at https://github.com/datafold/data-diff , and our home page is https://datafold.com/ .

As a company, Datafold builds tools for data engineers to automate the most tedious and error-prone tasks falling through the cracks of the modern data stack, such as data testing and lineage. We launched two years ago with a tool for regression-testing changes to ETL code https://news.ycombinator.com/item?id=24071955 . It compares the produced data before and after the code change and shows the impact on values, aggregate metrics, and downstream data applications.

While working with many customers on improving their data engineering experience, we kept hearing that they needed to diff their data across databases to validate data replication between systems.

There were 3 main use cases for such replication: (1) To perform analytics on transactional data i...