Scaling Up
As we progress with nationwide and force-specific hotspot determination, we've had to reassess some of the tooling we've been using, particularly for presenting and disseminating results, with a view to producing practical tools for practitioners and policymakers.
We've started to hit performance limitations with GeoPandas, so we're switching much of the CPU-intensive spatial code to an ephemeral DuckDB instance with the spatial extension. This is exceptionally fast: it can load 3 years of national crime data (~18m incidents, England & Wales) from over 1500 individual Parquet files and group the crimes into 200m hex cells (~1.5m cells), all in well under 10 seconds.
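For readers unfamiliar with hex binning, the grouping step can be sketched in a few lines of plain Python. This is an illustration only, not the DuckDB implementation: the function names (`hex_cell`, `hex_centre`, `count_by_cell`) are hypothetical, coordinates are assumed to be in a projected CRS measured in metres (such as British National Grid eastings and northings), the hexagons are assumed to be pointy-topped, and "200m" is assumed to mean the centre-to-vertex distance.

```python
import math
from collections import Counter


def _cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon."""
    s = -q - r  # third cube coordinate; q + r + s is always 0
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute whichever coordinate was rounded the furthest,
    # so that the q + r + s == 0 constraint still holds.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the centre-to-vertex distance, in the same units as x and y.
    """
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return _cube_round(q, r)


def hex_centre(q, r, size):
    """Centre coordinates of the hexagon with axial index (q, r)."""
    return size * math.sqrt(3) * (q + r / 2), size * 1.5 * r


def count_by_cell(points, size):
    """Count points per hex cell; `points` is an iterable of (x, y) pairs."""
    return Counter(hex_cell(x, y, size) for x, y in points)


if __name__ == "__main__":
    # Made-up coordinates in metres, purely for illustration.
    incidents = [(10, 10), (-20, 5), (346, 3), (0, 0)]
    for cell, n in sorted(count_by_cell(incidents, size=200).items()):
        print(cell, n)
```

The same arithmetic could in principle be expressed as a SQL `GROUP BY` over the computed cell index, which is the kind of bulk aggregation a columnar engine handles well.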
We've previously used Streamlit as a front-end (see here), but it isn't scaling well either: it's proving somewhat inflexible and has compatibility issues with DuckDB, so we're looking at alternatives. Later in the project we'll post a more definitive article recommending the best (Python) tooling, but in the meantime we're excited about the possibilities DuckDB enables and want to share some progress...
