Learn spatial analysis

The Learn module is geometrics’ teaching layer, and it works in two complementary ways:

Every result speaks. Each explore_* / analyze_* result carries .interpret() — a plain-language reading of that result — and .explain(), the concept behind the method.
Sandboxes with a planted truth. The learn_* functions simulate data from a known data-generating process, run the real geometrics estimator on it, and show whether the truth you planted comes back. Turn the knobs (rho=, shift=, convergence_rate=, …) and watch the concept respond.

Note

Sandboxes are for learning and never run on your data: they generate their own. Even here, where the truth is literally known, .interpret() keeps its associational discipline, because the habit should transfer to real data, where no truth is planted.

Stage 1 — Read a real result in plain language

Any result can explain itself. A quick β-convergence regression on the Bolivia provinces:

import warnings

warnings.filterwarnings("ignore")

import geometrics as gm

gdf, df, df_dict = gm.data.load_bolivia()
df = gm.set_labels(df, df_dict, set_panel=True)

res = gm.analyze_beta_convergence(df, "gdppc", model="ols")
print(res.interpret())

Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.00913** (SE 0.00406), statistically significant at the 5% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.00957 per period and a half-life of about 72.4 periods — the time for half of an initial gap to close at this pace.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

And the concept behind it, from the built-in explainer registry:

print(res.explain().to_markdown()[:600], "...")

### Beta convergence

**What it is.** β-convergence asks whether units that start *behind* grow *faster* and so catch up. The test regresses each unit's average growth rate over a horizon on its **initial level** — canonically the growth of GDP per capita on initial log GDP per capita. A **negative** slope β is convergence: lower starting points are associated with faster growth. The slope maps to a structural **speed of convergence** λ = -ln(1 + β·T) / T (per period) and a **half-life** ln 2 / λ, the time to close half of an initial gap. **Unconditional** (absolute) convergence uses the initi ...
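The two formulas in the explainer can be checked by hand against the numbers in the interpretation above. Here is a minimal sketch in plain Python, taking the slope (-0.00913) and the 10-period window from that output. The helper names are illustrative and not part of geometrics.

```python
import math


def convergence_speed(beta: float, horizon: float) -> float:
    """Speed of convergence per period: lambda = -ln(1 + beta*T) / T."""
    return -math.log(1.0 + beta * horizon) / horizon


def half_life(speed: float) -> float:
    """Periods needed to close half of an initial gap: ln 2 / lambda."""
    return math.log(2.0) / speed


# Values read off the interpret() output above.
beta_hat = -0.00913
horizon = 10

lam = convergence_speed(beta_hat, horizon)
hl = half_life(lam)
print(f"lambda = {lam:.5f} per period, half-life = {hl:.1f} periods")
```

Both values agree with the 0.00957 and 72.4 that interpret() reports, so the printed speed and half-life are a deterministic transformation of the slope and the horizon, not separate estimates.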

Stage 2 — The browsable concept index

Thirty topics ship with the package — ESDA, weights, spatial models and impacts, convergence, distribution dynamics, inequality, and foundations. Every key works with gm.explain(...):

gm.list_topics()

['beta_convergence',
 'choropleth_classification',
 'convergence_clubs',
 'correlation_vs_causation',
 'crs_projections',
 'distribution_dynamics',
 'gini',
 'gwr',
 'lm_diagnostics',
 'local_moran',
 'markov_chains',
 'mgwr',
 'mobility_measures',
 'pearson',
 'row_standardization',
 'sigma_convergence',
 'slx_model',
 'spatial_autocorrelation',
 'spatial_convergence',
 'spatial_durbin_model',
 'spatial_error_model',
 'spatial_impacts',
 'spatial_lag',
 'spatial_lag_model',
 'spatial_markov',
 'spatial_weights',
 'spearman',
 'theil_decomposition',
 'theil_index',
 'weights_robustness']

print(gm.explain("spatial_autocorrelation").to_markdown())

### Spatial autocorrelation (Moran's I)

**What it is.** Spatial autocorrelation is the tendency of nearby units to resemble each other. **Moran's I** measures it globally as the cross-product between each unit's (standardized) value and its spatial lag: positive I means high values cluster near high values and low near low; negative I means checkerboard-like alternation; I near its expectation E[I] = -1/(n-1) means spatial randomness. Inference uses **conditional permutations**: values are reshuffled across the map many times to build a reference distribution, giving a pseudo p-value (`p_sim`). The Moran scatterplot draws value against lag; under a row-standardized W its regression slope *is* Moran's I.

**When to use it.** Run it first on any regional variable — levels and growth rates alike. Strong autocorrelation in a regression's residuals signals that OLS standard errors and possibly coefficients are unreliable and a spatial model is worth considering (`analyze_spatial_diagnostics`). Track it over time with `explore_moran_over_time` to see whether spatial structure is strengthening or dissolving.

**Watch out for.**
- I is one number for the whole map; it can hide offsetting local pockets — pair it with the local version (LISA).
- The value and its significance depend on the chosen W.
- A significant I says values cluster; it does not say *why* — common shocks, spillovers, and omitted spatially-smooth covariates all produce it.

*See also:* local_moran, spatial_weights, lm_diagnostics

*References:* Moran (1950), 'Notes on Continuous Stochastic Phenomena', Biometrika; Anselin (1995), 'Local Indicators of Spatial Association — LISA', Geog. Anal.
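The definitions above translate into a few lines of NumPy. The sketch below is a hand-rolled illustration, not the geometrics implementation. It builds a row-standardized rook-contiguity W on an assumed 12 × 12 grid, computes Moran's I, checks that the Moran scatterplot slope equals I, and derives a one-sided pseudo p-value from plain random permutations of the values across the map. The grid size, the seed, the 999 permutations, and the helper names are all assumptions made for the example.

```python
import numpy as np


def rook_weights(rows: int, cols: int) -> np.ndarray:
    """Row-standardized rook-contiguity weights for a rows x cols grid."""
    n = rows * cols
    W = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    W[i, rr * cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(y: np.ndarray, W: np.ndarray) -> float:
    """Global Moran's I: (n / S0) * z'Wz / z'z, with z the demeaned values."""
    z = y - y.mean()
    return float(len(y) / W.sum() * (z @ W @ z) / (z @ z))


rng = np.random.default_rng(0)
W = rook_weights(12, 12)
n = W.shape[0]

# Spatially random values: I should sit near E[I] = -1/(n-1).
y = rng.normal(size=n)
i_obs = morans_i(y, W)
expected_i = -1.0 / (n - 1)

# Moran scatterplot: regress the spatial lag Wz on z; the slope is I.
z = y - y.mean()
slope = np.polyfit(z, W @ z, 1)[0]

# Permutation inference (upper tail): reshuffle values, recompute I each time.
perms = np.array([morans_i(rng.permutation(y), W) for _ in range(999)])
p_sim = (np.sum(perms >= i_obs) + 1) / (len(perms) + 1)

print(f"I = {i_obs:.4f}, E[I] = {expected_i:.5f}, slope = {slope:.4f}, p_sim = {p_sim:.3f}")
```

With n = 144 the expectation is -1/143 ≈ -0.00699. Because every row of W sums to one, S0 equals n and the scaling factor n / S0 drops out, which is why the scatterplot slope and I coincide.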

Stage 3 — Sandbox: seeing spatial autocorrelation

What does ρ actually look like? Plant no dependence, then strong dependence — the left panel is one simulated map, the right panel tracks Moran’s I across planted ρ:

gm.learn_spatial_autocorrelation(rho=0.0).fig

strong = gm.learn_spatial_autocorrelation(rho=0.8)
strong.fig

print(strong.interpret())

**What this sandbox shows** — the data were simulated, so the truth is known. With the planted dependence at ρ = 0.8, Moran's I averages 0.496 across the simulations, against a no-dependence baseline of E[I] = -0.00699 (the ρ = 0 runs average -0.0196). 100% of the focal-ρ runs are significant at 5% — as ρ rises, neighbors look alike and the statistic pulls away from its null.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
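The same response can be imitated outside the package with NumPy. This is a sketch under assumptions, not how learn_spatial_autocorrelation is implemented. It draws y = (I - ρW)⁻¹ε on an assumed 12 × 12 rook grid, repeats that 200 times per ρ, and averages Moran's I. The data-generating process, grid, seed, and helper names are choices made for the illustration.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook weights on a grid, built from Kronecker products."""
    def path(k):
        # Adjacency of a path graph: each node touches the next and the previous.
        return np.eye(k, k=1) + np.eye(k, k=-1)

    adj = np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))
    return adj / adj.sum(axis=1, keepdims=True)


def morans_i(y, W):
    """Global Moran's I for values y under weights W."""
    z = y - y.mean()
    return float(len(y) / W.sum() * (z @ W @ z) / (z @ z))


rng = np.random.default_rng(1)
W = rook_weights(12, 12)
n = W.shape[0]
n_sims = 200

mean_i = {}
for rho in (0.0, 0.4, 0.8):
    # The spatial multiplier (I - rho W)^-1 turns white noise into a dependent field.
    multiplier = np.linalg.inv(np.eye(n) - rho * W)
    draws = [morans_i(multiplier @ rng.normal(size=n), W) for _ in range(n_sims)]
    mean_i[rho] = float(np.mean(draws))

for rho, value in mean_i.items():
    print(f"rho = {rho:.1f}: mean Moran's I = {value:.3f}")
```

The ρ = 0 average should hover near -1/(n-1), and the averages should climb as ρ rises. The exact levels depend on the assumed grid and weights, so they need not match the sandbox's figures.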

Stage 4 — Sandbox: why spatial econometrics exists

Simulate outcomes that spill over, y = (I - ρW)⁻¹(βx + ε), then fit OLS as if space did not exist. With ρ > 0 the omitted spatial lag Wy inflates the OLS slope; the spatial-lag (SAR) model, which estimates ρ alongside β, brings both back close to their planted values:

omit = gm.learn_omitted_spatial_lag(rho=0.7)
omit.fig

print(omit.interpret())

**What this sandbox shows** — the data were simulated, so the truth is known. OLS, which omits the spatial lag Wy, puts the slope at 1.42 — off the planted β = 1 by 0.425 because the spatial multiplier is absorbed into the coefficient. The ML spatial-lag model recovers β̂ = 1.07 and ρ̂ = 0.621 (planted ρ = 0.7) by modeling the dependence instead of ignoring it.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
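The mechanism can be reproduced without the package. The sketch below makes several assumptions that are not taken from geometrics: a 15 × 15 rook grid, spatially independent x, unit-variance noise, 100 replications, and a textbook concentrated maximum-likelihood fit over a grid of ρ values standing in for whatever estimator the sandbox calls. All names are illustrative.

```python
import numpy as np


def rook_weights(rows, cols):
    """Row-standardized rook weights on a grid, built from Kronecker products."""
    def path(k):
        return np.eye(k, k=1) + np.eye(k, k=-1)

    adj = np.kron(np.eye(rows), path(cols)) + np.kron(path(rows), np.eye(cols))
    return adj / adj.sum(axis=1, keepdims=True)


rng = np.random.default_rng(42)
W = rook_weights(15, 15)
n = W.shape[0]
rho_true, beta_true = 0.7, 1.0
multiplier = np.linalg.inv(np.eye(n) - rho_true * W)

# Log-determinant ln|I - rho W| on a grid of candidate rho values,
# computed once from the eigenvalues of W.
omega = np.linalg.eigvals(W).real
rhos = np.linspace(0.0, 0.95, 96)
logdet = np.log(1.0 - np.outer(rhos, omega)).sum(axis=1)


def fit_once():
    """Simulate one SAR data set; return (OLS slope, ML beta, ML rho)."""
    x = rng.normal(size=n)
    y = multiplier @ (beta_true * x + rng.normal(size=n))
    X = np.column_stack([np.ones(n), x])
    wy = W @ y

    b0, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS of y on X (ignores Wy)
    bL, *_ = np.linalg.lstsq(X, wy, rcond=None)   # OLS of Wy on X
    e0, eL = y - X @ b0, wy - X @ bL

    # Concentrated log-likelihood in rho: residuals are e0 - rho * eL.
    sse = e0 @ e0 - 2.0 * rhos * (e0 @ eL) + rhos**2 * (eL @ eL)
    loglik = logdet - 0.5 * n * np.log(sse / n)
    rho_hat = rhos[np.argmax(loglik)]
    beta_hat = (b0 - rho_hat * bL)[1]
    return b0[1], beta_hat, rho_hat


draws = np.array([fit_once() for _ in range(100)])
ols_slope, sar_beta, sar_rho = draws.mean(axis=0)
print(f"OLS slope = {ols_slope:.3f}, SAR beta = {sar_beta:.3f}, SAR rho = {sar_rho:.3f}")
```

Averaged over replications, the OLS slope sits above the planted β while the spatial-lag fit lands near both planted values. The size of the OLS bias depends on W, ρ, and how x is distributed over space, so it need not match the sandbox's figure.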

Stage 5 — Sandbox: spillovers you planted, impacts recovered

In a spatial Durbin world the true direct and indirect effects are known in closed form — so you can watch the LeSage-Pace decomposition earn its keep. This is the idea behind Analyze Stage 3:

spill = gm.learn_spatial_spillovers(rho=0.5, gamma=0.5)
spill.fig

print(spill.interpret())

**What this sandbox shows** — the data were simulated, so the truth is known. The Monte-Carlo impact decomposition lands close to the planted truth: direct 1.18 vs 1.3 true, indirect 1.48 vs 1.7, total 2.66 vs 3 (ρ̂ = 0.432 against a planted 0.5). This is why spatial-model coefficients are read through impacts: with feedback via ρ, β alone is not the marginal association.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
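The closed-form truth is short enough to compute directly. A minimal sketch, reading gamma as the coefficient on Wx and assuming β = 1, which is consistent with the planted total of 3 but is not stated in the text. The ring-shaped W with 100 units is also an assumption, so the direct/indirect split below need not equal the sandbox's 1.3 and 1.7.

```python
import numpy as np

n = 100
rho, gamma = 0.5, 0.5   # as in the learn_spatial_spillovers call above
beta = 1.0              # assumed: not stated in the text

# Ring lattice: each unit has two neighbors, weights row-standardized to 0.5.
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

# Matrix of partial derivatives dy_i/dx_j in a spatial Durbin model:
# S = (I - rho W)^-1 (beta I + gamma W).
S = np.linalg.inv(np.eye(n) - rho * W) @ (beta * np.eye(n) + gamma * W)

direct = np.trace(S) / n      # average own-unit effect (diagonal of S)
total = S.sum() / n           # average row sum of S
indirect = total - direct     # average spillover (off-diagonal part)

print(f"direct = {direct:.4f}, indirect = {indirect:.4f}, total = {total:.4f}")
```

For any row-standardized W the total equals (β + γ) / (1 - ρ), here 1.5 / 0.5 = 3, whereas the way it splits into direct and indirect parts depends on the weights. Neither part equals β, which is the point the sandbox makes about reading coefficients through impacts.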

Stage 6 — Sandbox: convergence at a known speed

Plant a convergence rate of 0.02, which the table below reports as a true slope of -0.02; the growth-on-initial regression should hand back that slope, and with it the implied speed and half-life:

beta = gm.learn_beta_convergence(convergence_rate=0.02)
beta.fig

beta.df

	quantity	estimated	true
0	beta	-0.020127	-0.020000
1	speed	0.021213	0.021072
2	half_life	32.675050	32.894067
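The "true" column can be reproduced from the formulas in Stage 1, and the recovery itself imitated in a few lines. This is a sketch under assumptions. The horizon T = 5 is inferred, because it makes -ln(1 + βT)/T equal 0.021072 for β = -0.02; it is not documented here. The sample size, noise level, intercept, and seed are invented for the illustration and are not the sandbox's settings.

```python
import numpy as np

rng = np.random.default_rng(7)

beta_true = -0.02   # planted slope, as in the "true" column
horizon = 5         # assumed: inferred from the table, not documented here
n_units = 500       # assumed sample size

# Planted data-generating process: average growth falls with the initial log level.
log_y0 = rng.normal(loc=8.0, scale=0.5, size=n_units)
growth = 0.2 + beta_true * log_y0 + rng.normal(scale=0.005, size=n_units)

# Growth-on-initial regression (OLS with an intercept).
X = np.column_stack([np.ones(n_units), log_y0])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, growth, rcond=None)


def speed(beta, T):
    """Convergence speed per period implied by slope beta over horizon T."""
    return -np.log(1.0 + beta * T) / T


for name, b in (("estimated", beta_hat), ("true", beta_true)):
    lam = speed(b, horizon)
    print(f"{name:>9}: beta = {b:.6f}, speed = {lam:.6f}, half_life = {np.log(2) / lam:.3f}")
```

The "true" row evaluates to a speed of 0.021072 and a half-life of about 32.894, the same values as in the table. The estimated row varies with the seed and the assumed noise level.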

Stage 7 — The full sandbox catalog

Eleven sandboxes cover the package’s method families — each links to its reference page with every knob documented:

Sandbox	The lesson
`learn_spatial_autocorrelation`	What ρ looks like, and how Moran’s I tracks it
`learn_spatial_weights`	The same field under queen / rook / knn — W is a choice
`learn_lisa_clusters`	Planted hot/cold spots, recovered (and false positives counted)
`learn_spatial_spillovers`	Direct/indirect/total impacts vs a closed-form truth
`learn_omitted_spatial_lag`	The bias of ignoring Wy — and how SAR repairs it
`learn_beta_convergence`	A planted convergence rate, recovered
`learn_sigma_convergence`	A planted dispersion path; trend = ln ρ exactly
`learn_convergence_clubs`	Two planted clubs; Phillips-Sul finds them
`learn_markov_chains`	A planted transition matrix, recovered cell by cell
`learn_spatial_markov`	Mobility that depends on the neighbors — detected
`learn_theil_decomposition`	A planted between/within split, decomposed exactly

(The two Markov sandboxes need the dynamics extra: pip install "geometrics[dynamics]".)

Prefer sliders?

The Learn app wraps every sandbox knob in a slider and pairs it with the explainer browser — no install needed:

📚 Launch the Learn app

Where next

Explore — apply the ESDA ideas to the India panel
Analyze — the estimators these sandboxes demystify, on Bolivia
API reference — every knob of every sandbox