Methodology

Scaling Up

As we progress with nationwide and force-specific hotspot determination, we have had to reassess some of the tooling we've been using, particularly for presenting and disseminating results, with a view to producing practical tools for practitioners and policymakers.

We've started to hit performance limitations with geopandas, so we are switching much of the CPU-intensive spatial code to an ephemeral duckdb instance with the spatial extension. This is exceptionally fast: it can load 3 years of national crime data (~18m incidents, England & Wales) from over 1500 individual parquet files and group the crimes into 200m hex cells (~1.5m cells), all in well under 10 seconds.
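The grouping step itself, independent of duckdb, can be sketched in plain Python. This is a minimal illustration of hex binning using axial coordinates and cube rounding, not the project's pipeline. It assumes points are already in a projected coordinate system measured in metres, that the hexagons are pointy-top, and that `size` is the centre-to-corner distance. The article does not say which dimension "200m" refers to, so the value below is only a placeholder, and the names `hex_cell` and `count_by_cell` are hypothetical.

```python
import math
from collections import Counter


def hex_cell(x: float, y: float, size: float) -> tuple[int, int]:
    """Return the axial (q, r) index of the pointy-top hexagon containing (x, y).

    `size` is the centre-to-corner distance, in the same units as x and y.
    """
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r  # third cube coordinate; q + r + s == 0

    # Round each cube coordinate, then repair the one with the largest
    # rounding error so that the three still sum to zero.
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def count_by_cell(points, size: float) -> Counter:
    """Count points per hex cell; `points` is an iterable of (x, y) pairs."""
    return Counter(hex_cell(x, y, size) for x, y in points)


# Hypothetical projected coordinates in metres.
points = [(0.0, 0.0), (10.0, 10.0), (350.0, -4.0)]
cell_counts = count_by_cell(points, size=200.0)
```

In a database engine the same arithmetic would typically be expressed as a grouped aggregate, which is where the speed-up described above comes from.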

We've previously used Streamlit as a front-end (see here), but it is also not scaling well: it is proving somewhat inflexible and has compatibility issues with duckdb, so we're looking at alternatives. Later in the project we'll post a more definitive article on recommendations for the best (python) tooling, but in the meantime we're excited about the possibilities duckdb enables and want to share some progress...

Measuring Crime Concentration (Part 2)

We demonstrated in part 1 that random data with no structural concentration will exhibit some concentration using the traditional measures. In this article we take a deeper dive into this claim.

Recall that our null hypothesis was that crimes are not concentrated: crime is no more or less likely to occur in any given spatial unit, and the chance of a crime occurring is not affected by previous events.

Motivating example - Seasonality

Public crime data in the UK contains precise (but obfuscated) location data, imprecise temporal information (the month of occurrence) and broad categorical information (around a dozen categories). Taking 3 years of data for incidents of antisocial behaviour (ASB) in West Yorkshire and aggregating them into spatial units (in this case LSOAs), we can plot a graph of (naive) Gini over time.

This graph shows some clear seasonality: concentration increases in the winter and decreases in the summer. A fairly obvious explanation would be that, since antisocial behaviour generally occurs outdoors, there are simply more outdoor gatherings in warmer weather and longer daylight hours, and therefore more opportunities for it to occur.

Figure: Gini seasonality
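The calculation behind a graph like this can be sketched in a few lines. This is a minimal sketch, assuming incidents arrive as (month, unit) pairs and that "naive" Gini means the standard Gini coefficient over raw per-unit counts, with crime-free units counted as zeros. The function names `gini` and `monthly_gini` and the unit codes are illustrative and are not taken from the project's code.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Naive Gini coefficient of non-negative counts.

    0 means every unit has the same count; values near 1 mean the
    total is held by very few units.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # with i = 1..n over the counts in ascending order.
    weighted = sum(i * x for i, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def monthly_gini(incidents, all_units):
    """Gini per month.

    incidents: iterable of (month, unit) pairs.
    all_units: every spatial unit in the study area, including
               those with no recorded incidents.
    """
    per_month = defaultdict(Counter)
    for month, unit in incidents:
        per_month[month][unit] += 1
    return {
        month: gini([counts[u] for u in all_units])
        for month, counts in sorted(per_month.items())
    }


# Hypothetical toy data: two months, two units.
incidents = [("2023-01", "A"), ("2023-01", "A"),
             ("2023-02", "A"), ("2023-02", "B")]
series = monthly_gini(incidents, all_units=["A", "B"])
```

Passing `all_units` explicitly is a deliberate choice in this sketch: units with no incidents in a given month still enter that month's calculation as zeros.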

However, more careful analysis of the data shows the apparent seasonality is an illusion. Here's why...

Measuring Crime Concentration (Part 1)

What constitutes concentration? In these articles we ask:

  • how do we count crimes? And how do we account and control for heterogeneity in our observations (units with different areas and populations)?
  • how do we decide if crime is concentrated? What measures are traditionally used?
  • if crime is purely random and thus isn't concentrated in any meaningful sense, will we still measure some concentration using traditional measures?
  • can we develop a "null hypothesis" statistical model to create a baseline measure, allowing us to differentiate between random and structural effects? What features must this model have to be realistic?
  • how do we develop this into a useful measure?
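The third question above can be explored with a small simulation. The sketch below assumes the simplest possible reading of "purely random": every crime is assigned independently to one of `n_units` equally likely units, ignoring the differences in area and population raised in the first question. It then applies one traditional style of measure, the smallest share of units that together account for half of all crime. The function names and parameter values are hypothetical.

```python
import random
from collections import Counter


def share_of_units_for_half(counts):
    """Smallest fraction of units that together hold at least 50% of all crimes."""
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one unit and one crime")
    running = 0
    # Walk through units from most to fewest crimes until half is covered.
    for k, c in enumerate(sorted(counts, reverse=True), start=1):
        running += c
        if 2 * running >= total:
            return k / len(counts)


def simulate_null(n_units, n_crimes, seed=0):
    """Assign each crime independently to a uniformly chosen unit."""
    rng = random.Random(seed)
    hits = Counter(rng.randrange(n_units) for _ in range(n_crimes))
    # Include units that received no crimes as zeros.
    return [hits[u] for u in range(n_units)]


counts = simulate_null(n_units=1000, n_crimes=2000, seed=1)
share = share_of_units_for_half(counts)
print(f"{share:.1%} of units hold half of the simulated crimes")
```

Random allocation still leaves some units with more crimes than others, so the printed share comes out below 50% even though the simulation contains no structural concentration at all.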