The Explore module is your first look at a regional dataset — before you estimate a single model. This page is a case study: you have just been handed 520 Indian districts observed by DMSP-OLS satellite nighttime lights between 1996 and 2010 (from Mendez, Kabiraj & Li) and asked three questions an analyst always starts with: is development spatially clustered, where exactly, and how did the whole regional distribution move over time?
Every Explore function takes the panel and returns a small result object carrying a tidy .df plus an interactive Plotly figure (.fig), and most offer a plain-language .interpret(). Read this page top to bottom: the functions are ordered as a workflow — load the three inputs → map the level → encode the neighborhood → test and localize clustering → watch the distribution move in time and space.
Note
This is exploratory analysis: every reading below describes an association, never a cause. The Analyze module turns these patterns into estimates, and Learn explains the ideas behind them with simulations you control.
Stage 0 — Load the three inputs
geometrics separates geometry, data, and metadata: a geometry with only the entity ID (gdf), a long-form panel (df), and a data dictionary (df_dict). The bundled India case study ships all three; set_labels attaches the dictionary’s labels to every future figure and declares the (entity, time) coordinates once.
The dictionary is data too — it documents every column and drives the labels on every figure:
df_dict.head(8)
|   | var_name | var_def | label | type | role | can_be_na |
|---|---|---|---|---|---|---|
| 0 | statedist | Unique district identifier formed by concatena... | State-district ID | entity | NaN | False |
| 1 | state | Name of the Indian state or union territory th... | State | factor | NaN | False |
| 2 | district | Name of the district under 1991-census boundar... | District | factor | entity_name | False |
| 3 | year | Observation year of the radiance-calibrated DM... | Year | time | NaN | False |
| 4 | ntl_rural | Radiance-calibrated DMSP-OLS total nighttime l... | Rural NTL | numeric | NaN | False |
| 5 | ntl_urban | Radiance-calibrated DMSP-OLS total nighttime l... | Urban NTL | numeric | NaN | False |
| 6 | ntl_total | Radiance-calibrated DMSP-OLS total nighttime l... | Total NTL | numeric | NaN | False |
| 7 | ntl_pc_1996 | Radiance-calibrated nighttime luminosity per c... | NTL per capita (1996) | numeric | NaN | False |
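To make the three-input design concrete without relying on geometrics itself, here is a minimal sketch in plain Python. The records, centroid values, and the helper names `coordinate_columns` and `label_for` are hypothetical and are not part of the library; only the column names, labels, and types are taken from the dictionary above.

```python
# Geometry: one record per entity, keyed by the entity ID only.
# (Hypothetical centroids stand in for real polygons.)
geometry = {"A01": (77.2, 28.6), "B02": (72.9, 19.1)}

# Long-form panel: one row per (entity, time) pair. Values are invented.
panel = [
    {"statedist": "A01", "year": 1996, "ntl_total": 10.0},
    {"statedist": "A01", "year": 2010, "ntl_total": 14.0},
    {"statedist": "B02", "year": 1996, "ntl_total": 30.0},
    {"statedist": "B02", "year": 2010, "ntl_total": 33.0},
]

# Data dictionary: one row per panel column.
data_dict = [
    {"var_name": "statedist", "label": "State-district ID", "type": "entity"},
    {"var_name": "year", "label": "Year", "type": "time"},
    {"var_name": "ntl_total", "label": "Total NTL", "type": "numeric"},
]


def coordinate_columns(dictionary):
    """Return the (entity, time) column names declared by the dictionary."""
    by_type = {row["type"]: row["var_name"] for row in dictionary}
    return by_type["entity"], by_type["time"]


def label_for(dictionary, column):
    """Look up the display label for a column, falling back to its name."""
    labels = {row["var_name"]: row["label"] for row in dictionary}
    return labels.get(column, column)


entity_col, time_col = coordinate_columns(data_dict)

# Every panel row must point at a known geometry, and each (entity, time)
# pair must appear exactly once.
keys = [(row[entity_col], row[time_col]) for row in panel]
panel_is_valid = (
    all(entity in geometry for entity, _ in keys) and len(keys) == len(set(keys))
)

print(entity_col, time_col, label_for(data_dict, "ntl_total"), panel_is_valid)
```

Keeping labels and coordinate roles in the dictionary means a figure routine only needs a column name; everything it prints on an axis can be looked up rather than hard-coded.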
Stage 1 — See the map
explore_choropleth_map classifies with mapclassify (Fisher-Jenks by default, k classes) and draws one legend entry per class, so the legend is the classification:
Pass animate=True instead of a single period to play the whole 1996–2010 film, or switch scheme ("quantiles", "equalinterval", …) to see how much the story depends on the classification — gm.explain("choropleth_classification") explains why.
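To see why the classification scheme matters, the following sketch classifies one skewed set of values under two simple schemes. It is a hand-rolled illustration in plain Python, not the mapclassify implementation, and it does not attempt Fisher-Jenks; the function names and the sample values are hypothetical.

```python
from bisect import bisect_left


def quantile_breaks(values, k):
    """Upper class bounds that put roughly equal counts in each of k classes."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank rule: class j ends at the ceil(n * j / k)-th smallest value.
    return [ordered[(n * j + k - 1) // k - 1] for j in range(1, k + 1)]


def equal_interval_breaks(values, k):
    """Upper class bounds that split the value range into k equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last bound is set to the maximum explicitly to avoid float drift.
    return [lo + width * j for j in range(1, k)] + [hi]


def classify(values, breaks):
    """Class index per value: the first class whose upper bound is >= value."""
    return [bisect_left(breaks, v) for v in values]


# Hypothetical, strongly right-skewed data: one unit far above the rest.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

by_quantile = classify(values, quantile_breaks(values, 3))
by_interval = classify(values, equal_interval_breaks(values, 3))

print("quantiles:     ", by_quantile)
print("equal interval:", by_interval)
```

With skewed data, quantiles spread the units evenly across classes, while equal intervals push almost every unit into the lowest class and leave the middle class empty. The same values therefore produce two very different maps.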
Stage 2 — Encode the neighborhood
Everything spatial starts with a weights matrix W — the formal answer to “who is whose neighbor?”. The paper uses 6 nearest neighbors; explore_connectivity_map draws the graph so you can inspect it before trusting it:
w = gm.make_weights(gdf, method="knn", k=6)
gm.explore_connectivity_map(gdf, w=w).fig
Contiguity is the common alternative (method="queen") — see Spatial dependence and LISA for how to choose, and Analyze for checking that results survive the choice.
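As a rough picture of what a k-nearest-neighbor weights matrix contains, the sketch below builds one by hand from a few points. It is an assumption-laden illustration in plain Python, not how `make_weights` is implemented; the function name `knn_weights` and the centroid coordinates are hypothetical.

```python
import math


def knn_weights(points, k):
    """Row-standardized k-nearest-neighbor weights as {i: {j: weight}}."""
    weights = {}
    for i, p in enumerate(points):
        # Distance from unit i to every other unit, nearest first.
        # Ties are broken by the lower index so the result is deterministic.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors = [j for _, j in others[:k]]
        # Row standardization: each neighbor gets 1/k, so every row sums to 1.
        weights[i] = {j: 1.0 / k for j in neighbors}
    return weights


# Hypothetical projected centroids; real ones would come from the geometry.
centroids = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
w = knn_weights(centroids, k=2)

for unit, row in w.items():
    print(unit, sorted(row))
```

Note that k-nearest-neighbor relations need not be symmetric: in this example unit 3 counts unit 1 among its neighbors, but unit 1 does not count unit 3. Every unit is guaranteed exactly k neighbors, which is one reason the scheme is popular when polygons vary widely in size.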
Stage 3 — Is development spatially clustered?
The Moran scatterplot plots each district’s standardized value against the average value of its neighbors (its spatial lag); under row-standardized weights, the slope of the fitted line is global Moran’s I, the workhorse test of spatial autocorrelation:
Global Moran's I for **log_ntl_pc_1996** in 1996 is 0.724, against an expectation of -0.00193 under spatial randomness — statistically significant at the 1% level (pseudo p = 0.001 from 999 permutations, under 6-nearest-neighbor (metric centroids), row-standardized, n=520).
The dependence is **positive**: similar values cluster in space — high values sit next to high values and low next to low, so the map shows contiguous patches rather than a random scatter.
448 of 520 regions (86%) fall in the clustering quadrants of the scatter (High-High or Low-Low); the rest are surrounded by neighbors unlike themselves. The slope of the fitted line equals Moran's I under row-standardized weights, so a steeper line means stronger clustering.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
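The claim that the scatterplot slope equals Moran's I under row-standardized weights can be checked on a toy example. The sketch below is plain Python with hypothetical values and helper names, not the library's code; it also reproduces two numbers quoted in the interpretation above, the expectation of -1/(n-1) for n = 520 and the 86% quadrant share.

```python
def standardize(values):
    """Center on the mean and scale by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def spatial_lag(z, weights):
    """Weighted average of each unit's neighbors (the scatterplot's y-axis)."""
    return [sum(w_ij * z[j] for j, w_ij in weights[i].items()) for i in range(len(z))]


def morans_i(z, weights):
    """Global Moran's I for mean-zero z: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    s0 = sum(sum(row.values()) for row in weights.values())
    lag = spatial_lag(z, weights)
    cross = sum(z_i * lag_i for z_i, lag_i in zip(z, lag))
    return len(z) / s0 * cross / sum(z_i ** 2 for z_i in z)


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return covariance / sum((a - mean_x) ** 2 for a in x)


# Hypothetical example: six units on a line, each linked to its adjacent
# units, with row-standardized weights (every row sums to 1).
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
weights = {
    0: {1: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {3: 0.5, 5: 0.5},
    5: {4: 1.0},
}

z = standardize(values)
moran = morans_i(z, weights)
slope = ols_slope(z, spatial_lag(z, weights))

expected_i = -1 / (520 - 1)          # expectation under spatial randomness, n = 520
quadrant_share = 100 * 448 / 520     # share of regions in High-High or Low-Low

print(round(moran, 3), round(slope, 3), round(expected_i, 5), round(quadrant_share))
```

The equality holds because row standardization makes the sum of all weights equal to n, so the leading factor n / S0 drops out and what remains is the regression of the spatial lag on the standardized value.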
Stage 4 — Where exactly? (LISA)
Global Moran’s I says whether the map clusters; LISA (local Moran) says where — each district is classified as a High-High hot spot, Low-Low cold spot, or a spatial outlier (High-Low / Low-High), masked at 5% significance:
lisa = gm.explore_lisa_cluster_map(df, "log_ntl_pc_1996", gdf=gdf, w=w, period=1996)
lisa.fig
print(lisa.interpret())
Local Moran statistics (LISA) locate *where* **log_ntl_pc_1996** in 1996 clusters or stands out, under 6-nearest-neighbor (metric centroids), row-standardized, n=520. The accompanying global Moran's I is 0.724 (pseudo p = 0.001), consistent with overall clustering of similar values.
At the 0.05 significance level, 281 of 520 regions show significant local association: **169 High-High** hot spots (high values surrounded by high neighbors) and **101 Low-Low** cold spots (low surrounded by low) mark clustering, while **9 High-Low** and **2 Low-High** regions are spatial outliers that break with their surroundings. The remaining 239 regions are not significant — their local pattern is compatible with randomness.
LISA pseudo p-values are computed region by region without a multiple-testing adjustment, so treat borderline clusters cautiously and read the map as descriptive of where dependence concentrates.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
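The labelling rule behind a LISA cluster map can be sketched in a few lines: each unit's quadrant comes from the signs of its standardized value and its spatial lag, and units that fail the significance threshold are masked. This is a simplified illustration in plain Python; the function name, the sample inputs, and the strict `p < alpha` convention are assumptions, and the pseudo p-values are taken as given rather than computed by permutation.

```python
from collections import Counter


def lisa_labels(z, lag, p_values, alpha=0.05):
    """Label each unit by scatterplot quadrant, masking insignificant units."""
    labels = []
    for z_i, lag_i, p_i in zip(z, lag, p_values):
        if p_i >= alpha:
            # Assumption: significance requires p strictly below alpha.
            labels.append("Not significant")
            continue
        own = "High" if z_i > 0 else "Low"          # the unit itself
        around = "High" if lag_i > 0 else "Low"     # its neighbors' average
        labels.append(f"{own}-{around}")
    return labels


# Hypothetical standardized values, spatial lags, and pseudo p-values.
z = [1.2, -0.8, 1.5, -1.1, 0.3]
lag = [0.9, -0.7, -0.6, 0.8, 0.1]
p_values = [0.01, 0.02, 0.04, 0.03, 0.40]

labels = lisa_labels(z, lag, p_values)
print(Counter(labels))

# Consistency check on the counts reported in the interpretation above.
reported = {"High-High": 169, "Low-Low": 101, "High-Low": 9, "Low-High": 2}
significant = sum(reported.values())
not_significant = 520 - significant
print(significant, not_significant)
```

The reported category counts add up to the 281 significant regions, leaving 239 of the 520 regions unmasked as not significant.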
Stage 5 — The whole distribution, year by year
Convergence questions are distribution questions. The ridgeline stacks the cross-sectional density of each year on one shared grid, so you can watch the shape — not just the mean — move:
(kind="animated" plays the same densities as an animation instead.)
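The idea of putting every year's density on one shared grid can be illustrated with a small Gaussian kernel density estimate. The sketch below is plain Python under stated assumptions: the yearly values, the fixed bandwidth, and the function name `gaussian_kde` are hypothetical, and the library's own density estimator may differ.

```python
import math


def gaussian_kde(values, grid, bandwidth):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


# Hypothetical cross-sections for two years (invented values).
by_year = {
    1996: [1.0, 1.2, 1.5, 1.7, 2.0, 3.5],
    2010: [2.0, 2.3, 2.5, 2.8, 3.0, 4.0],
}
bandwidth = 0.4   # assumed fixed bandwidth, shared by all years
points = 200      # number of grid points

# One grid for all years, padded by three bandwidths on each side, so the
# densities are directly comparable when stacked.
pooled = [v for values in by_year.values() for v in values]
lo, hi = min(pooled) - 3 * bandwidth, max(pooled) + 3 * bandwidth
grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]

densities = {year: gaussian_kde(values, grid, bandwidth) for year, values in by_year.items()}

for year, density in densities.items():
    peak = grid[density.index(max(density))]
    print(year, "mode near", round(peak, 2))
```

Because the grid and bandwidth are shared, a shift in the peak or a change in the spread between years reflects the data rather than a change in the estimator's settings.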
Stage 6 — Every district, every year
The space-time heatmap keeps every unit visible: one row per district, one column per year. Sorting the rows by latitude turns geography itself into the y-axis — a north–south transect of the whole panel:
Rows that keep their shading left to right are persistent; rows that lighten or darken are mobile. sort_by="value" orders by the first period instead.
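Underneath a space-time heatmap is a reshaping step: the long-form panel becomes a matrix with one row per unit and one column per period, and the row order carries the meaning. The sketch below shows that step in plain Python. The district names, latitudes, values, and the choice to put the northernmost unit in the first row are all assumptions for illustration.

```python
# Hypothetical long-form panel: (district, year, value) triples.
panel = [
    ("South", 1996, 1.0), ("South", 2003, 1.1), ("South", 2010, 1.2),
    ("North", 1996, 3.0), ("North", 2003, 3.0), ("North", 2010, 3.1),
    ("Central", 1996, 2.0), ("Central", 2003, 2.6), ("Central", 2010, 3.4),
]
# Hypothetical centroid latitudes in degrees north.
latitude = {"North": 30.0, "Central": 22.0, "South": 10.0}

years = sorted({year for _, year, _ in panel})
cell = {(district, year): value for district, year, value in panel}


def to_matrix(row_order):
    """One row per district, one column per year; missing cells become None."""
    return [[cell.get((district, year)) for year in years] for district in row_order]


# Order rows north to south so the y-axis reads as a geographic transect.
rows_by_latitude = sorted(latitude, key=latitude.get, reverse=True)
# Alternative: order rows by the value observed in the first period.
rows_by_value = sorted(latitude, key=lambda d: cell[(d, years[0])], reverse=True)

matrix = to_matrix(rows_by_latitude)
for district, row in zip(rows_by_latitude, matrix):
    # Change between the first and last period, as a rough mobility signal.
    print(district, row, "change:", round(row[-1] - row[0], 2))
```

In this toy panel the northern and southern rows barely change across columns, which is what persistence looks like, while the central row climbs steadily.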
Stage 7 — Does the clustering strengthen or fade?
Stage 3 tested one year. Running Moran’s I per year closes the loop — is the spatial structure of development deepening or dissolving?
mot = gm.explore_moran_over_time(df, "log_ntl_pc_1996", gdf=gdf, w=w)
mot.fig
print(mot.interpret())
Global Moran's I for **log_ntl_pc_1996** is tracked over 6 periods (1996 to 2010) on a fixed set of regions, under 6-nearest-neighbor (metric centroids), row-standardized, n=520: it moves from 0.724 to 0.724.
The series is broadly **stable**: the degree to which similar values cluster in space changes little over the window.
Per-period permutation tests flag 6 of 6 periods as significant at the 5% level (filled markers in the figure); open markers are periods where the pattern is compatible with spatial randomness.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
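The per-period permutation test can be sketched as follows: for each period, compute Moran's I, then reshuffle the values across locations many times and count how often a shuffled map is at least as clustered. This is a simplified, one-sided version in plain Python with hypothetical data and function names; it is not the library's procedure, which may use a different tail rule.

```python
import random


def morans_i(values, neighbors):
    """Moran's I under row-standardized weights; neighbors[i] lists i's neighbors."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    cross = sum(
        dev[i] * sum(dev[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(len(values))
    )
    return cross / sum(d * d for d in dev)


def permutation_test(values, neighbors, permutations=999, seed=0):
    """Observed I and a one-sided pseudo p-value from random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    # The observed map counts as one draw, so the smallest possible
    # pseudo p-value with 999 permutations is 1 / 1000 = 0.001.
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Hypothetical setting: ten units on a line, each linked to adjacent units.
n = 10
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Two invented periods: a smooth gradient and an alternating pattern.
periods = {
    "smooth": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "alternating": [1, 10, 2, 9, 3, 8, 4, 7, 5, 6],
}

results = {name: permutation_test(values, neighbors) for name, values in periods.items()}
for name, (statistic, p_value) in results.items():
    flag = "significant" if p_value < 0.05 else "not significant"
    print(name, round(statistic, 3), p_value, flag)
```

The floor of 1/1000 explains why a pseudo p-value of 0.001 from 999 permutations means that no shuffled map matched the observed clustering, not that the probability is exactly 0.001.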
Where next
You now know the map clusters, where it clusters, and how the distribution moved.
Analyze — estimate it: β/σ/club convergence, spatial models with spillovers, Markov dynamics, inequality decompositions (on the Bolivia case study)