Explore regional data

The Explore module is your first look at a regional dataset — before you estimate a single model. This page is a case study: you have just been handed 520 Indian districts observed by DMSP-OLS satellite nighttime lights between 1996 and 2010 (from Mendez, Kabiraj & Li) and asked three questions an analyst always starts with: is development spatially clustered, where exactly, and how did the whole regional distribution move over time?

Every Explore function takes the panel and returns a small result object carrying a tidy .df plus an interactive Plotly figure (.fig), and most offer a plain-language .interpret(). Read this page top to bottom; the functions are ordered as a workflow: load the three inputs → map the level → encode the neighborhood → test and localize clustering → watch the distribution move in time and space.

Note

This is exploratory analysis: every reading below describes an association, never a cause. The Analyze module turns these patterns into estimates, and Learn explains the ideas behind them with simulations you control.

Stage 0 — Load the three inputs

geometrics separates geometry, data, and metadata: a geometry with only the entity ID (gdf), a long-form panel (df), and a data dictionary (df_dict). The bundled India case study ships all three; set_labels attaches the dictionary’s labels to every future figure and declares the (entity, time) coordinates once.

import warnings

warnings.filterwarnings("ignore")

import geometrics as gm

gdf, df, df_dict = gm.data.load_india()
df = gm.set_labels(df, df_dict, set_panel=True)  # labels + entity/time + roles, once
df.head(3)
statedist state district year ntl_rural ntl_urban ntl_total ntl_pc_1996 log_ntl_pc_1996 growth_ntl_pc_9610 ... latitude rural_share log_pop_density sc_share st_share work_share literacy_share higher_edu_share electricity_share log_paved_roads
0 Andhra PradeshAdilabad Andhra Pradesh Adilabad 1996 35532.074 8465.7354 43997.809 0.019194 -3.953148 0.039751 ... 19.25 0.948856 5.205684 0.188309 0.16231 0.439483 0.414134 0.043333 0.454379 4.301359
1 Andhra PradeshAdilabad Andhra Pradesh Adilabad 1999 51730.660 9121.8799 60852.539 0.019194 -3.953148 0.039751 ... 19.25 0.948856 5.205684 0.188309 0.16231 0.439483 0.414134 0.043333 0.454379 4.301359
2 Andhra PradeshAdilabad Andhra Pradesh Adilabad 2000 63759.672 10821.1310 74580.805 0.019194 -3.953148 0.039751 ... 19.25 0.948856 5.205684 0.188309 0.16231 0.439483 0.414134 0.043333 0.454379 4.301359

3 rows × 28 columns

The dictionary is data too — it documents every column and drives the labels on every figure:

df_dict.head(8)
   var_name     var_def                                            label                  type     role         can_be_na
0  statedist    Unique district identifier formed by concatena...  State-district ID      entity   NaN          False
1  state        Name of the Indian state or union territory th...  State                  factor   NaN          False
2  district     Name of the district under 1991-census boundar...  District               factor   entity_name  False
3  year         Observation year of the radiance-calibrated DM...  Year                   time     NaN          False
4  ntl_rural    Radiance-calibrated DMSP-OLS total nighttime l...  Rural NTL              numeric  NaN          False
5  ntl_urban    Radiance-calibrated DMSP-OLS total nighttime l...  Urban NTL              numeric  NaN          False
6  ntl_total    Radiance-calibrated DMSP-OLS total nighttime l...  Total NTL              numeric  NaN          False
7  ntl_pc_1996  Radiance-calibrated nighttime luminosity per c...  NTL per capita (1996)  numeric  NaN          False
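
To make the three-way separation concrete, here is a minimal sketch with toy pandas objects instead of the bundled India data. The column names mirror the dictionary shown above, but the values and every toy_ name are hypothetical, and the checks are only an illustration of what a dictionary-driven panel allows. They are not a description of how set_labels works internally.

```python
import pandas as pd

# Toy long-form panel: one row per (entity, time) pair. Values are made up.
toy_df = pd.DataFrame(
    {
        "statedist": ["A", "A", "B", "B"],
        "year": [1996, 2010, 1996, 2010],
        "ntl_total": [10.0, 14.0, 3.0, 9.0],
    }
)

# Toy data dictionary: one row per column of the panel.
toy_dict = pd.DataFrame(
    {
        "var_name": ["statedist", "year", "ntl_total"],
        "label": ["State-district ID", "Year", "Total NTL"],
        "type": ["entity", "time", "numeric"],
    }
)

# Read the (entity, time) coordinates from the dictionary, not from code.
entity_col = toy_dict.loc[toy_dict["type"] == "entity", "var_name"].iloc[0]
time_col = toy_dict.loc[toy_dict["type"] == "time", "var_name"].iloc[0]

# A well-formed panel has no duplicated (entity, time) pair.
has_duplicates = bool(toy_df.duplicated([entity_col, time_col]).any())

# Every panel column should be documented in the dictionary.
undocumented = sorted(set(toy_df.columns) - set(toy_dict["var_name"]))

# Labels for figures come from a simple lookup.
labels = dict(zip(toy_dict["var_name"], toy_dict["label"]))

print(entity_col, time_col, has_duplicates, undocumented, labels["ntl_total"])
```

Keeping the coordinates and labels in the dictionary means a new dataset only needs a new dictionary, not new plotting code.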

Stage 1 — See the map

explore_choropleth_map classifies with mapclassify (Fisher-Jenks by default, k classes) and draws one legend entry per class, so the legend is the classification:

gm.explore_choropleth_map(df, "ntl_total", gdf=gdf, period=2010).fig

Pass animate=True instead of a single period to play the whole 1996–2010 film, or switch scheme ("quantiles", "equalinterval", …) to see how much the story depends on the classification — gm.explain("choropleth_classification") explains why.
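
To see why the scheme matters, the sketch below classifies the same skewed toy values two ways using NumPy only. It deliberately swaps in quantile and equal-interval breaks, which are easy to compute by hand, and does not attempt Fisher-Jenks. The data and variable names are hypothetical.

```python
import numpy as np

# Toy right-skewed values standing in for district luminosity (made up).
rng = np.random.default_rng(0)
values = rng.lognormal(mean=0.0, sigma=1.0, size=500)
k = 5

# Quantile breaks: each class holds roughly the same number of districts.
quantile_breaks = np.quantile(values, np.linspace(0, 1, k + 1)[1:-1])

# Equal-interval breaks: each class spans the same range of values.
equal_breaks = np.linspace(values.min(), values.max(), k + 1)[1:-1]

# np.digitize assigns each value a class index 0..k-1 from the interior breaks.
quantile_classes = np.digitize(values, quantile_breaks)
equal_classes = np.digitize(values, equal_breaks)

quantile_counts = np.bincount(quantile_classes, minlength=k)
equal_counts = np.bincount(equal_classes, minlength=k)

print("quantiles:     ", quantile_counts)
print("equal interval:", equal_counts)
```

On skewed data, equal intervals push most districts into the lowest class while quantiles spread them evenly, so the two maps can tell visibly different stories from identical numbers.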

Stage 2 — Encode the neighborhood

Everything spatial starts with a weights matrix W — the formal answer to “who is whose neighbor?”. The paper uses 6 nearest neighbors; explore_connectivity_map draws the graph so you can inspect it before trusting it:

w = gm.make_weights(gdf, method="knn", k=6)
gm.explore_connectivity_map(gdf, w=w).fig

Contiguity is the common alternative (method="queen") — see Spatial dependence and LISA for how to choose, and Analyze for checking that results survive the choice.
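
As a rough picture of what a k-nearest-neighbor W contains, the following sketch builds one by hand from toy centroids with NumPy. The coordinates are random points, plain Euclidean distance is an assumption, and this is not presented as what make_weights does internally.

```python
import numpy as np

# Toy centroids: 30 random points in the unit square (hypothetical coordinates).
rng = np.random.default_rng(42)
coords = rng.random((30, 2))
n, k = len(coords), 6

# Pairwise Euclidean distances; a region is never its own neighbor.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)

# Indices of the k closest centroids for every region.
neighbors = np.argsort(dist, axis=1)[:, :k]

# Binary weights matrix: W[i, j] = 1 when j is one of i's k nearest neighbors.
W = np.zeros((n, n))
W[np.repeat(np.arange(n), k), neighbors.ravel()] = 1.0

# Row-standardize so each row sums to 1: W_row @ x is then a neighbor average.
W_row = W / W.sum(axis=1, keepdims=True)

# k-nearest-neighbor graphs are generally asymmetric: i can list j as a
# neighbor without j listing i back.
is_symmetric = bool(np.array_equal(W, W.T))
print(W.sum(axis=1)[:5], W_row.sum(axis=1)[:5], is_symmetric)
```

Every region gets exactly k neighbors under this rule, which is one practical reason to prefer it over contiguity when some units are islands or have very irregular borders.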

Stage 3 — Is development spatially clustered?

The Moran scatterplot plots each district’s standardized value against the average of its neighbors. Under row-standardized weights the slope of the fitted line is global Moran’s I, the workhorse test of spatial autocorrelation:

moran = gm.explore_moran_plot(df, "log_ntl_pc_1996", gdf=gdf, w=w, period=1996)
moran.fig
print(moran.interpret())
Global Moran's I for **log_ntl_pc_1996** in 1996 is 0.724, against an expectation of -0.00193 under spatial randomness — statistically significant at the 1% level (pseudo p = 0.001 from 999 permutations, under 6-nearest-neighbor (metric centroids), row-standardized, n=520).

The dependence is **positive**: similar values cluster in space — high values sit next to high values and low next to low, so the map shows contiguous patches rather than a random scatter.

448 of 520 regions (86%) fall in the clustering quadrants of the scatter (High-High or Low-Low); the rest are surrounded by neighbors unlike themselves. The slope of the fitted line equals Moran's I under row-standardized weights, so a steeper line means stronger clustering.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
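
Two numbers in the printed interpretation can be checked by hand: the expectation under spatial randomness is -1/(n-1), which for n = 520 is -1/519 ≈ -0.00193, and 448/520 ≈ 86%. The sketch below reproduces the mechanics on toy data with NumPy: the statistic, the slope identity, and a one-sided permutation pseudo p-value. The helper functions and the simulated variable are hypothetical, and the permutation rule here is an assumption that may differ in detail from the one geometrics uses.

```python
import numpy as np

def knn_row_standardized(coords, k):
    """Row-standardized k-nearest-neighbor weights from point coordinates."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :k]
    W = np.zeros((len(coords), len(coords)))
    W[np.repeat(np.arange(len(coords)), k), nearest.ravel()] = 1.0 / k
    return W

def morans_i(x, W):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(1)
coords = rng.random((200, 2))
W = knn_row_standardized(coords, k=6)

# Toy variable with a smooth west-east trend plus noise, so neighbors look alike.
x = 3.0 * coords[:, 0] + rng.normal(scale=0.5, size=200)

I_obs = morans_i(x, W)

# With row-standardized weights, the OLS slope of the spatial lag on the
# standardized value equals Moran's I.
z = (x - x.mean()) / x.std()
slope = np.polyfit(z, W @ z, 1)[0]

# Permutation inference: shuffle values across locations to break any
# spatial structure, and count how often random maps are at least as clustered.
perms = np.array([morans_i(rng.permutation(x), W) for _ in range(999)])
pseudo_p = (np.sum(perms >= I_obs) + 1) / (999 + 1)

print(f"I = {I_obs:.3f}, slope = {slope:.3f}, E[I] = {-1 / (len(x) - 1):.5f}, p = {pseudo_p:.3f}")
```

With 999 permutations the smallest attainable pseudo p-value is 1/1000, which is why strongly clustered maps report exactly 0.001.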

Stage 4 — Where exactly? (LISA)

Global Moran’s I says whether the map clusters; LISA (local Moran) says where — each district is classified as a High-High hot spot, Low-Low cold spot, or a spatial outlier (High-Low / Low-High), masked at 5% significance:

lisa = gm.explore_lisa_cluster_map(df, "log_ntl_pc_1996", gdf=gdf, w=w, period=1996)
lisa.fig
print(lisa.interpret())
Local Moran statistics (LISA) locate *where* **log_ntl_pc_1996** in 1996 clusters or stands out, under 6-nearest-neighbor (metric centroids), row-standardized, n=520. The accompanying global Moran's I is 0.724 (pseudo p = 0.001), consistent with overall clustering of similar values.

At the 0.05 significance level, 281 of 520 regions show significant local association: **169 High-High** hot spots (high values surrounded by high neighbors) and **101 Low-Low** cold spots (low surrounded by low) mark clustering, while **9 High-Low** and **2 Low-High** regions are spatial outliers that break with their surroundings. The remaining 239 regions are not significant — their local pattern is compatible with randomness.

LISA pseudo p-values are computed region by region without a multiple-testing adjustment, so treat borderline clusters cautiously and read the map as descriptive of where dependence concentrates.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
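
The local statistic behind such a map can be sketched in a few lines of NumPy. The example below computes local Moran values, assigns scatterplot quadrants, and masks them with a conditional permutation test on toy data. All names and data are hypothetical, the folded pseudo p-value is one common convention, and nothing here is claimed to match the internals of explore_lisa_cluster_map.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 6
coords = rng.random((n, 2))

# Row-standardized k-nearest-neighbor weights built from the toy centroids.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
nearest = np.argsort(dist, axis=1)[:, :k]
W = np.zeros((n, n))
W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0 / k

# Toy variable with a spatial trend; standardize it and take its spatial lag.
x = 3.0 * coords[:, 0] + rng.normal(scale=0.5, size=n)
z = (x - x.mean()) / x.std()
lag = W @ z

# Local Moran statistic: one value per region.
local_I = z * lag

# Quadrant of the Moran scatterplot each region falls in.
quadrant = np.where(
    z > 0,
    np.where(lag > 0, "High-High", "High-Low"),
    np.where(lag > 0, "Low-High", "Low-Low"),
)

# Conditional permutation: hold region i fixed, redraw its k neighbors from
# the other regions, and see how extreme the observed local_I[i] is.
n_perm = 199
pseudo_p = np.empty(n)
for i in range(n):
    others = np.delete(z, i)
    draws = np.array(
        [rng.choice(others, size=k, replace=False).mean() for _ in range(n_perm)]
    )
    larger = np.sum(z[i] * draws >= local_I[i])
    pseudo_p[i] = (min(larger, n_perm - larger) + 1) / (n_perm + 1)

labels = np.where(pseudo_p <= 0.05, quadrant, "Not significant")
print(dict(zip(*np.unique(labels, return_counts=True))))

# With standardized values and row-standardized W, the local statistics
# average to the global Moran's I.
global_I = (n / W.sum()) * (z @ W @ z) / (z @ z)
print(np.isclose(local_I.mean(), global_I))
```

Because each of the n tests is run separately at the 5% level, some significant labels are expected by chance alone, which is the reason for the caution about borderline clusters.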

Stage 5 — The whole distribution, year by year

Convergence questions are distribution questions. The ridgeline stacks the cross-sectional density of each year on one shared grid, so you can watch the shape — not just the mean — move:

gm.explore_distribution_over_time(df, "log_ntl_pc_1996").fig

(kind="animated" plays the same densities as an animation instead.)
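
The idea of stacking densities on one shared grid can be illustrated without any plotting. The sketch below estimates a Gaussian kernel density for three toy cross-sections and evaluates all of them on the same grid. The samples, the years chosen, and the single shared rule-of-thumb bandwidth are assumptions made for the illustration, not settings taken from explore_distribution_over_time.

```python
import numpy as np

def gaussian_kde_on_grid(sample, grid, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at `grid`."""
    u = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

# Toy cross-sections: the mean rises and the spread narrows over time (made up).
rng = np.random.default_rng(3)
years = [1996, 2003, 2010]
samples = {
    1996: rng.normal(-4.0, 1.0, size=520),
    2003: rng.normal(-3.5, 0.8, size=520),
    2010: rng.normal(-3.0, 0.6, size=520),
}

# One grid and one bandwidth shared by every year, so differences between
# rows reflect the data and not the smoothing.
pooled = np.concatenate([samples[y] for y in years])
grid = np.linspace(pooled.min() - 1.0, pooled.max() + 1.0, 400)
bandwidth = 1.06 * pooled.std() * 520 ** (-1 / 5)  # normal-reference rule of thumb

densities = {y: gaussian_kde_on_grid(samples[y], grid, bandwidth) for y in years}

areas, modes = {}, {}
for y in years:
    areas[y] = float(np.sum(densities[y]) * (grid[1] - grid[0]))  # close to 1
    modes[y] = float(grid[np.argmax(densities[y])])
    print(y, round(areas[y], 3), round(modes[y], 2))
```

A peak that shifts right signals growth in the typical district, while a curve that narrows signals districts becoming more alike, which is the distributional sense of convergence.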

Stage 6 — Every district, every year

The space-time heatmap keeps every unit visible: one row per district, one column per year. Sorting the rows by latitude turns geography itself into the y-axis — a north–south transect of the whole panel:

gm.explore_spacetime_heatmap(
    df, "log_ntl_pc_1996", gdf=gdf, sort_by="north_south"
).fig

Rows that keep their shading left to right are persistent; rows that lighten or darken are mobile. sort_by="value" orders by the first period instead.
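
Behind a heatmap like this sits a simple reshape from long to wide. The pandas sketch below pivots a toy panel into a district-by-year grid and orders the rows two ways. The data are made up, and sorting in descending order is an assumption about direction that the text above does not specify.

```python
import pandas as pd

# Toy long-form panel: three districts, three years (values are made up).
panel = pd.DataFrame(
    {
        "district": ["South"] * 3 + ["Mid"] * 3 + ["North"] * 3,
        "year": [1996, 2003, 2010] * 3,
        "value": [1.0, 1.1, 1.2, 2.0, 2.6, 3.4, 0.5, 0.5, 0.6],
        "latitude": [10.0] * 3 + [20.0] * 3 + [30.0] * 3,
    }
)

# One row per district, one column per year.
grid = panel.pivot(index="district", columns="year", values="value")

# Order rows from north to south using each district's latitude.
lat = panel.groupby("district")["latitude"].first()
north_south = grid.loc[lat.sort_values(ascending=False).index]

# Alternative ordering: by the value in the first period.
by_value = grid.sort_values(by=grid.columns[0], ascending=False)

# A crude mobility measure per row: change between first and last period.
change = north_south.iloc[:, -1] - north_south.iloc[:, 0]

print(north_south)
print(change.round(2).to_dict())
```

In this toy grid the "Mid" row changes most from left to right, which is what a mobile district looks like in the heatmap.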

Stage 7 — Does the clustering strengthen or fade?

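Stage 3 tested one year. Running Moran’s I per year closes the loop: is the spatial structure of development deepening or dissolving? Keep in mind that log_ntl_pc_1996 is a 1996 baseline whose value appears to repeat in every year of the panel (compare the three rows of df.head(3) in Stage 0), so the series below is flat by construction; the same caveat applies to the year-by-year views in Stages 5 and 6. Pass a time-varying column such as ntl_total to see movement.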

mot = gm.explore_moran_over_time(df, "log_ntl_pc_1996", gdf=gdf, w=w)
mot.fig
print(mot.interpret())
Global Moran's I for **log_ntl_pc_1996** is tracked over 6 periods (1996 to 2010) on a fixed set of regions, under 6-nearest-neighbor (metric centroids), row-standardized, n=520: it moves from 0.724 to 0.724.

The series is broadly **stable**: the degree to which similar values cluster in space changes little over the window.

Per-period permutation tests flag 6 of 6 periods as significant at the 5% level (filled markers in the figure); open markers are periods where the pattern is compatible with spatial randomness.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
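
A per-period loop over a fixed W is all this stage needs. The NumPy sketch below runs it on a toy panel for two columns: one that changes each period and one that repeats its first-period values, which yields a flat series like the 0.724 to 0.724 printed above. The data, the six unlabeled periods, and the helper function are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, n_periods = 150, 6, 6
coords = rng.random((n, 2))

# Fixed row-standardized k-nearest-neighbor weights, reused for every period.
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
nearest = np.argsort(dist, axis=1)[:, :k]
W = np.zeros((n, n))
W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0 / k

def morans_i(x, W):
    """Global Moran's I for one cross-section."""
    z = x - x.mean()
    return len(x) / W.sum() * (z @ W @ z) / (z @ z)

# Toy panel: a spatial trend that is gradually drowned out by noise (made up).
trend = 3.0 * coords[:, 0]
noise = rng.normal(size=(n_periods, n))
varying = {t: trend + (0.5 + 0.5 * t) * noise[t] for t in range(n_periods)}

# A baseline column repeats the first period's values in every period.
baseline = {t: varying[0] for t in range(n_periods)}

series_varying = [morans_i(varying[t], W) for t in range(n_periods)]
series_baseline = [morans_i(baseline[t], W) for t in range(n_periods)]

print(np.round(series_varying, 3))
print(np.round(series_baseline, 3))
```

Holding W and the set of regions fixed across periods keeps the comparison clean: any change in the series then comes from the values and not from a changing neighborhood definition.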

Where next

You now know the map clusters, where it clusters, and how the distribution moved.

  • Analyze — estimate it: β/σ/club convergence, spatial models with spillovers, Markov dynamics, inequality decompositions (on the Bolivia case study)
  • The India case study — the full replication arc on this same panel
  • Learn — the ideas behind W, Moran’s I and LISA, taught with simulations where you plant the truth
  • The data model — bring your own (gdf, df, df_dict)