The Analyze module estimates what Explore described. This page is a case study on the bundled Bolivia panel — 112 provinces with PWT-anchored GDP per capita, 2012–2022 (from Rossi-Hansberg & Zhang’s local GDP, rescaled to Penn World Table 11.0) — asking the three standing questions of the convergence literature: are poorer provinces catching up, do spillovers carry growth across borders, and is the national gap narrowing?
The functions appear in the order an analysis actually runs: build the growth cross-section → estimate β without space → add spillovers → let the diagnostics pick the model → check robustness to W → σ-convergence → clubs → distribution dynamics → inequality and its decomposition → local heterogeneity. (The India case study runs this same arc on the flagship 520-district panel.)
Note
Every .interpret() below reads an association, never a cause — estimates from observational regional data describe patterns, not policy effects. The Learn module demonstrates each estimator on simulated data where the truth is planted.
Stage 0 — Load and declare
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import geometrics as gm
gdf, df, df_dict = gm.data.load_bolivia()  # 112 provinces x 2012-2022
df = gm.set_labels(df, df_dict, set_panel=True)  # labels + (gid, year), once
w = gm.make_weights(gdf)  # queen contiguity, row-standardized
df.head(3)
|   | level | gid | name | engtype | gid1 | name1 | iso | country | year | threshold | ... | pop_persons | gdp_scale | pop_scale | gdp_pwt | pop_pwt | gdppc | ln_gdppc | n_cells | n_cells_nearest | pop_nat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | adm2 | BOL.1.10_2 | Zudáñez | Province | BOL.1_1 | Chuquisaca | BOL | Bolivia | 2012 | 0_05 | ... | 25950 | 873.436447 | 0.000001 | 117.574532 | 0.026367 | 4459.091741 | 8.402700 | 5 | 0 | 10350000 |
| 1 | adm2 | BOL.1.10_2 | Zudáñez | Province | BOL.1_1 | Chuquisaca | BOL | Bolivia | 2013 | 0_05 | ... | 24873 | 908.187148 | 0.000001 | 125.041489 | 0.025282 | 4945.867842 | 8.506308 | 5 | 0 | 10510000 |
| 2 | adm2 | BOL.1.10_2 | Zudáñez | Province | BOL.1_1 | Chuquisaca | BOL | Bolivia | 2014 | 0_05 | ... | 25215 | 904.053096 | 0.000001 | 131.373522 | 0.025640 | 5123.766779 | 8.541645 | 5 | 0 | 10670000 |

3 rows × 21 columns
Five provinces are fully censored in the source product (polygons but no panel rows) — geometrics warns and carries on; see the Bolivia dataset.
Stage 1 — The growth cross-section
Every convergence regression starts from the same frame: one row per province, its initial level and its annualized log growth. growth_cross_section builds it explicitly, so the β regression that follows has no hidden steps:
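To make the shape of that frame concrete, here is a minimal hand-built sketch in plain Python. The helper name `toy_growth_cross_section` and the toy values are assumptions for illustration only; this is not the geometrics implementation, and its signature is not shown here.

```python
import math

def toy_growth_cross_section(rows, start, end):
    """One record per unit: initial log level and annualized log growth.

    rows: iterable of (unit_id, year, level) tuples with strictly positive levels.
    Hypothetical helper, for illustration only.
    """
    levels = {(u, y): v for u, y, v in rows}
    units = sorted({u for u, _, _ in rows})
    span = end - start
    out = []
    for u in units:
        if (u, start) not in levels or (u, end) not in levels:
            continue  # a unit missing either endpoint has no growth rate
        ln0 = math.log(levels[(u, start)])
        ln1 = math.log(levels[(u, end)])
        out.append({"unit": u, "ln_initial": ln0, "growth": (ln1 - ln0) / span})
    return out

# Toy panel: A doubles, B grows by a quarter, C has no end-of-window row.
toy_panel = [
    ("A", 2012, 100.0), ("A", 2022, 200.0),
    ("B", 2012, 400.0), ("B", 2022, 500.0),
    ("C", 2012, 300.0),
]
toy_cs = toy_growth_cross_section(toy_panel, 2012, 2022)
for record in toy_cs:
    print(record)
```

In the toy data the initially poorer unit A grows faster than B, which is the pattern a β regression would pick up as a negative slope.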
Do initially poorer provinces grow faster? A negative slope of growth on the initial (log) level says yes; speed and half_life translate that slope into years:
Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.00913** (SE 0.00406), statistically significant at the 5% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.00957 per period and a half-life of about 72.4 periods — the time for half of an initial gap to close at this pace.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
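The step from slope to speed and half-life can be checked by hand. The sketch below uses the standard conversion λ = −ln(1 + bT)/T and half-life = ln 2 / λ; the function names are hypothetical, and whether geometrics computes it exactly this way is an assumption, although the formulas agree with the reported figures to rounding.

```python
import math

def convergence_speed(slope, periods):
    """Speed implied by a growth-on-initial-level slope over a window of `periods`."""
    return -math.log(1.0 + slope * periods) / periods

def half_life(speed):
    """Periods needed to close half of an initial gap at the given speed."""
    return math.log(2.0) / speed

lam = convergence_speed(-0.00913, 10)  # reported (rounded) slope, 10-period window
hl = half_life(lam)
print(round(lam, 5), round(hl, 1))
```

With the reported slope this gives roughly 0.00957 and 72.4. Feeding in the total slope of -0.0121 reported for the spatial Durbin model in Stage 3 gives about 0.0129 and 53.7.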
Stage 3 — β with spillovers (the spatial Durbin model)
Provinces are not islands: model="sdm" adds the spatial lags of outcome and covariates, and the convergence estimate becomes a LeSage-Pace total impact with direct and indirect (spillover) components — Monte-Carlo standard errors included:
Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.0121** (SE 0.00761), not statistically significant at conventional levels.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0129 per period and a half-life of about 53.7 periods — the time for half of an initial gap to close at this pace.
The SDM decomposition splits the total into a direct component of -0.00909 (own initial level) and a spillover (indirect) component of -0.00303 operating through neighboring units, under the weights: queen contiguity, row-standardized, n=112.
The spatial-lag parameter ρ = 0.279 says each unit's growth moves together with its neighbors' growth, so part of the convergence pattern is shared across space rather than purely unit-by-unit.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
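The "spatial lag" in these statements is a neighbor average. A minimal sketch, assuming a plain neighbor list and toy growth rates (the helper names are hypothetical and this is not how geometrics stores its weights):

```python
def row_standardize(neighbors):
    """Turn {unit: [neighbor, ...]} into weights whose rows each sum to one."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items() if js}

def spatial_lag(weights, values):
    """Weighted average of the neighbors' values for every unit."""
    return {i: sum(wij * values[j] for j, wij in row.items()) for i, row in weights.items()}

# Toy chain of three units: A - B - C.
toy_neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
toy_growth = {"A": 0.02, "B": 0.04, "C": 0.06}

w_toy = row_standardize(toy_neighbors)
lag = spatial_lag(w_toy, toy_growth)
print(lag)
```

Row-standardization is what lets ρ be read as the association between a unit's growth and the average growth of its neighbors.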
Stage 4 — Which spatial model do the data ask for?
Rather than assuming the SDM, let the Lagrange-multiplier diagnostics inspect the OLS residuals and apply the Anselin-Florax decision rule:
Recommendation: error
A simple LM test rejects at alpha = 0.05 (LM lag = 4.07, p = 0.0438; LM error = 4.09, p = 0.0432) but neither robust form does, an ambiguous configuration. The larger simple statistic points to the error model; treat the choice as tentative and compare specifications directly.
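The logic of that recommendation can be written down as a short decision function. This is a sketch of an LM-based selection rule as described in the text above, not the geometrics code; the function name is hypothetical, and the robust statistics passed in below are placeholders because the page does not report them.

```python
def choose_spatial_model(lm_lag, lm_err, rlm_lag, rlm_err, alpha=0.05):
    """Each argument is a (statistic, p_value) pair. Returns (choice, tentative)."""
    lag_sig, err_sig = lm_lag[1] < alpha, lm_err[1] < alpha
    if not (lag_sig or err_sig):
        return "ols", False  # no evidence of spatial dependence
    if lag_sig != err_sig:
        return ("lag" if lag_sig else "error"), False  # only one simple test rejects
    # Both simple tests reject: defer to the robust forms.
    rlag_sig, rerr_sig = rlm_lag[1] < alpha, rlm_err[1] < alpha
    if rlag_sig != rerr_sig:
        return ("lag" if rlag_sig else "error"), False
    if rlag_sig:  # both robust forms reject: take the larger robust statistic
        return ("lag" if rlm_lag[0] > rlm_err[0] else "error"), False
    # Neither robust form rejects: fall back on the simple statistics, flagged tentative.
    return ("lag" if lm_lag[0] > lm_err[0] else "error"), True

choice, tentative = choose_spatial_model(
    lm_lag=(4.07, 0.0438),   # reported above
    lm_err=(4.09, 0.0432),   # reported above
    rlm_lag=(0.5, 0.48),     # placeholder, not reported on this page
    rlm_err=(0.5, 0.48),     # placeholder, not reported on this page
)
print(choice, tentative)
```

The two simple statistics differ by only 0.02, which is why the choice is flagged as tentative rather than treated as a clear preference for the error model.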
The recommended specification is one model= switch away. The full spreg table with the impact decomposition comes from analyze_spatial_model, shown here for the Durbin specification from Stage 3:
model = gm.analyze_spatial_model(
    cs,
    outcome="growth",
    covariates=["ln_initial"],
    gdf=gdf,
    w=w,
    model="durbin",
    n_draws=1000,
)
model.gt
Spatial durbin (sdm) model of growth
ML estimates | n = 107 | R² = 0.159 | AIC = -865.5

|   | Estimate | Std. Error | z | p |
|---|---|---|---|---|
| CONSTANT | 0.0974** | (0.0474) | 2.05 | 0.0399 |
| ln_initial | -0.0091*** | (0.0025) | -3.70 | 0.0002 |
| W_ln_initial | 0.0004 | (0.0052) | 0.07 | 0.9460 |
| W_growth | 0.2787** | (0.1336) | 2.09 | 0.0369 |

*** p<0.01, ** p<0.05, * p<0.10 (z-based)
W: queen contiguity, row-standardized, n=112
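The coefficient table and the total impact reported in Stage 3 can be tied together by hand. For a row-standardized W with no isolated units, the average total impact of a spatial Durbin model reduces to (β + θ) / (1 − ρ), where β is the coefficient on ln_initial, θ the coefficient on W_ln_initial, and ρ the coefficient on W_growth. The sketch below applies this to the rounded table values, so it is a sanity check rather than a reproduction of the Monte-Carlo procedure. The direct and indirect split needs the full matrix inverse and is not shown.

```python
# Rounded coefficients from the table above.
beta = -0.0091   # ln_initial
theta = 0.0004   # W_ln_initial
rho = 0.2787     # W_growth

# Average total impact, assuming every row of W sums to one.
total_impact = (beta + theta) / (1.0 - rho)
print(round(total_impact, 4))
```

The result, about -0.0121, matches the total slope reported for the SDM, and it shows how the spatial multiplier 1 / (1 − ρ) scales up the sum of the own and neighbor coefficients.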
Stage 5 — Robust to the weights choice?
A conclusion that only holds under one definition of “neighbor” is fragile. analyze_spatial_model_by_weights re-estimates the same model under alternative weights and compares the impacts side by side:
The spatial Durbin (SDM) model was re-estimated under **3 alternative spatial weights** specifications; the table and figure compare the direct, indirect and total impacts of **ln_initial** against the **queen** baseline (total = -0.0121).
The total impact keeps the **same sign in all 3 specifications**, ranging from -0.0121 to -0.0073 (spread 0.00481) — the qualitative conclusion does not hinge on the weights choice.
Every specification's 95% interval covers the baseline estimate, so the alternatives are statistically indistinguishable from the baseline.
By AIC the best-fitting specification is **knn6** (AIC = -865.7); AIC comparisons are only meaningful across models estimated on the same sample.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
Stage 6 — Is the gap narrowing? (σ-convergence)
β asks about catch-up; σ asks whether cross-sectional dispersion actually shrank. Both matter — fast catch-up can coexist with stable dispersion:
Across 107 units and 11 periods, the cross-sectional dispersion of **gdppc** (standard deviation of its log) narrowed by about 0.242% per period (not statistically significant at conventional levels).
A negative log-dispersion trend is σ-convergence: the units are becoming more alike.
All estimated measures (standard deviation, Gini, coefficient of variation) trend downward, so the narrowing is not an artifact of one dispersion metric.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
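The "log-dispersion trend" can be sketched in a few lines: take the standard deviation of log levels in each period, then fit a straight line to its logarithm over time. The function name and toy panel below are assumptions for illustration, and geometrics may differ in details such as the degrees-of-freedom correction or the inference on the slope.

```python
import math
import statistics

def log_dispersion_trend(panel):
    """panel: {period: [positive levels]}. OLS slope of ln(SD of logs) on time."""
    periods = sorted(panel)
    y = [math.log(statistics.stdev([math.log(v) for v in panel[t]])) for t in periods]
    t_mean = statistics.fmean(periods)
    y_mean = statistics.fmean(y)
    num = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(periods, y))
    den = sum((t - t_mean) ** 2 for t in periods)
    return num / den

# Toy panel: three units whose log spread shrinks by 2% each period.
toy_sigma = {}
for t in range(11):
    s = 0.5 * 0.98 ** t
    toy_sigma[t] = [100.0 * math.exp(-s), 100.0, 100.0 * math.exp(s)]

slope = log_dispersion_trend(toy_sigma)
print(round(slope, 4))
```

A slope of -0.0202 here means dispersion falls by about 2% per period, which is the same reading as the "0.242% per period" figure reported for Bolivia.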
Tip
Dispersion is measured on logs, so the series must be strictly positive — gdppc is. For a panel with zeros (India’s night lights), filter first: the India case study shows the pattern.
Stage 7 — One Bolivia or several? (convergence clubs)
Global convergence can fail while clubs of provinces converge to different steady states. The Phillips-Sul log(t) test and clustering find them from the data:
Global convergence of **ln_gdppc** is rejected for the 107 units (log(t) t = -24.3 ≤ -1.65): the panel is not heading toward one common path.
The Phillips-Sul clustering instead finds **9 convergence clubs** — groups whose members converge toward a shared, club-specific path while the clubs themselves stay apart.
Club sizes — Club 1: 4, Club 2: 7, Club 3: 18, Club 4: 23, Club 5: 19, Club 6: 25, Club 7: 2, Club 8: 2, Club 9: 2.
5 units fit no club (the divergent group) and follow paths of their own.
Club membership is a description of co-movement in the relative transition paths, not an explanation of why the groups differ.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
clubs.fig_map if clubs.fig_map is not None else clubs.fig
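The core of the log(t) test is easy to sketch by hand. Each unit's path is expressed relative to the cross-sectional mean, the variance H_t of those relative paths is tracked over time, and log(H_1/H_t) − 2 log(log t) is regressed on log t. The sketch below stops at the slope: it leaves out the HAC standard error behind the -1.65 cutoff and the club-clustering algorithm, and the function names, the trimming fraction r = 0.3, and the toy data are assumptions rather than the geometrics implementation.

```python
import math

def relative_transition(panel):
    """panel: {unit: [x_1..x_T]} -> (h, H): relative paths and their variance per period."""
    units = list(panel)
    n, T = len(units), len(panel[units[0]])
    means = [sum(panel[u][t] for u in units) / n for t in range(T)]
    h = {u: [panel[u][t] / means[t] for t in range(T)] for u in units}
    H = [sum((h[u][t] - 1.0) ** 2 for u in units) / n for t in range(T)]
    return h, H

def log_t_slope(H, r=0.3):
    """OLS slope of log(H_1/H_t) - 2*log(log t) on log t, dropping the first r share."""
    T = len(H)
    start = max(2, int(r * T))  # t is 1-based; log(log t) needs t >= 2
    ts = list(range(start, T + 1))
    x = [math.log(t) for t in ts]
    y = [math.log(H[0] / H[t - 1]) - 2.0 * math.log(math.log(t)) for t in ts]
    xm, ym = sum(x) / len(x), sum(y) / len(y)
    num = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    den = sum((xi - xm) ** 2 for xi in x)
    return num / den

# Toy panels over 11 periods: one pair closing in on 10, one pair drifting apart.
converging = {"A": [10 + 1 / t for t in range(1, 12)], "B": [10 - 1 / t for t in range(1, 12)]}
diverging = {"A": [10 + 0.1 * t for t in range(1, 12)], "B": [10 - 0.1 * t for t in range(1, 12)]}

h_conv, H_conv = relative_transition(converging)
_, H_div = relative_transition(diverging)
print(log_t_slope(H_conv) > 0, log_t_slope(H_div) < 0)
```

A clearly negative slope, as in the diverging toy panel, is the pattern behind a rejection of global convergence.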
Stage 8 — Distribution dynamics (Markov chains)
How mobile are provinces across the income distribution — and does mobility depend on the neighbors? (Requires the dynamics extra: pip install "geometrics[dynamics]".)
Markov transition summary — GDP per capita (2021 US$ PPP)
4 states (quantiles classes, per-period classification), 1,070 transitions

| Measure | Value |
|---|---|
| **Mobility indices** | |
| Shorrocks trace index (giddy measure 'P') | 0.150 |
| Prais determinant index (giddy measure 'D') | 0.395 |
| Bartholomew index (giddy measure 'B1') | 0.150 |
| **Ergodic (steady-state) distribution** | |
| Long-run share in Q1 | 0.252 |
| Long-run share in Q2 | 0.252 |
| Long-run share in Q3 | 0.243 |
| Long-run share in Q4 | 0.252 |
| **Expected sojourn time (periods)** | |
| Consecutive periods in Q1 | 10.385 |
| Consecutive periods in Q2 | 6.136 |
| Consecutive periods in Q3 | 7.647 |
| Consecutive periods in Q4 | 16.875 |
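Two of these summary rows follow directly from the diagonal of the transition matrix: the trace index is (k − tr P) / (k − 1), and the expected sojourn time in state i is 1 / (1 − p_ii). The sketch below estimates a toy matrix by counting transitions, then checks that the reported sojourn times and trace index are consistent with each other. The helper names and toy sequences are assumptions; the actual summary comes from giddy via geometrics.

```python
def transition_matrix(sequences, k):
    """Row-normalized counts of one-step moves between k classes (0..k-1)."""
    counts = [[0] * k for _ in range(k)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]

def trace_mobility(P):
    """Trace-based mobility index: 0 means nobody moves."""
    k = len(P)
    return (k - sum(P[i][i] for i in range(k))) / (k - 1)

def sojourn_times(P):
    """Expected consecutive periods spent in each class (assumes p_ii < 1)."""
    return [1.0 / (1.0 - P[i][i]) for i in range(len(P))]

# Toy class sequences for three units over four periods, two classes.
toy_sequences = [[0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 0, 0]]
P_toy = transition_matrix(toy_sequences, k=2)
print(P_toy, trace_mobility(P_toy), sojourn_times(P_toy))

# Consistency check on the reported table: back out p_ii from the sojourn times.
reported_sojourn = [10.385, 6.136, 7.647, 16.875]
implied_diag = [1.0 - 1.0 / s for s in reported_sojourn]
implied_trace_index = (4 - sum(implied_diag)) / 3
print(round(implied_trace_index, 3))
```

The diagonal implied by the reported sojourn times reproduces the reported trace index of 0.150, and it shows that the richest quartile is the stickiest class.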
The spatially conditioned chains need every mapped province observed in every period — so drop the five censored polygons from the geometry first (their weights are rebuilt on the subset automatically):
The spatial Markov chain splits **gdppc**'s 4-state transition matrix by the neighbors' position (spatial lag under queen contiguity, row-standardized, n=107), giving one matrix per neighborhood class (4 classes); values were expressed relative to each period's mean first.
The unconditional matrix keeps regions in place with average probability 0.9.
Conditioning on context, regions surrounded by **low-value neighbors** stay in place with average probability 0.929, while regions surrounded by **high-value neighbors** do so with 0.904 — movement is more common in prosperous neighborhoods.
The homogeneity tests (LR = 26.1, p = 0.347; Q = 26.2, p = 0.342; 24 degrees of freedom) are not statistically significant at conventional levels: the data do not reject identical transition dynamics across neighborhood contexts.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
Stage 9 — Inequality: trend and decomposition
The same panel, read through inequality indices — including the spatial Gini, which splits inequality into neighbor and non-neighbor pairs:
Cross-sectional inequality in **gdppc** is tracked across 107 units over 11 periods (2012 to 2022).
The Gini index fell from 0.0832 to 0.0788 between the first and last period.
The Theil index fell from 0.0176 to 0.0153 between the first and last period.
The log-trend of the Gini index is -0.00221 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.
The log-trend of the Theil index is -0.00569 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.
In the latest period, inequality between *neighboring* units under the weights (queen contiguity, row-standardized, n=112) contributes 0.00346 to the Gini — about 4% of the overall Gini; the rest comes from pairs that are not neighbors.
The permutation test (p = 0.04) indicates the non-neighbor component of inequality is larger than expected under spatial randomness — differences line up with geography rather than being scattered among neighbors.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
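The neighbor and non-neighbor split rests on writing the Gini index as a sum of absolute differences over all pairs, then sorting each pair into one of two bins. A minimal sketch, assuming a binary neighbor relation and toy values (the function name is hypothetical, and the permutation test is not shown):

```python
def gini_components(values, neighbor_pairs):
    """values: {unit: x}; neighbor_pairs: set of frozenset({i, j}).

    Returns (neighbor part, non-neighbor part, total Gini).
    """
    units = list(values)
    n = len(units)
    mean = sum(values.values()) / n
    denom = 2.0 * n * n * mean
    near = far = 0.0
    for a in range(n):
        for b in range(a + 1, n):
            i, j = units[a], units[b]
            diff = 2.0 * abs(values[i] - values[j])  # counts both (i, j) and (j, i)
            if frozenset((i, j)) in neighbor_pairs:
                near += diff
            else:
                far += diff
    return near / denom, far / denom, (near + far) / denom

# Toy chain A - B - C - D with values 1..4.
toy_values = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
toy_pairs = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D")]}
near, far, total = gini_components(toy_values, toy_pairs)
print(near, far, total)

# Share of the reported Gini that comes from neighboring pairs.
neighbor_share = 0.00346 / 0.0788
print(round(neighbor_share, 3))
```

The last line recovers the "about 4%" quoted above from the two reported numbers.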
How much of provincial inequality is between departments rather than within them? The Theil index decomposes exactly:
The Theil index of **gdppc** splits additively into inequality **between** the 9 name1 groups and inequality **within** them (between + within = total, exactly).
In the latest period (2022), the total Theil index is 0.0153: about 5% of it lies between name1 groups and 95% within them — most inequality plays out among units inside the same name1.
Over the window, the between-group share fell from 6% to 5% — group means are pulling closer together relative to the differences inside groups.
_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
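The exactness of the split is an algebraic identity, which a short sketch can confirm numerically. The version below is the unweighted Theil T index with each unit counted once; the function names and toy groups are assumptions, and whether geometrics applies population weights is not stated on this page.

```python
import math

def theil(xs):
    """Theil T index of strictly positive values."""
    mu = sum(xs) / len(xs)
    return sum((x / mu) * math.log(x / mu) for x in xs) / len(xs)

def theil_decomposition(groups):
    """groups: {label: [positive values]} -> (between, within, total)."""
    all_x = [x for xs in groups.values() for x in xs]
    n = len(all_x)
    mu = sum(all_x) / n
    between = within = 0.0
    for xs in groups.values():
        mu_g = sum(xs) / len(xs)
        share = (len(xs) / n) * (mu_g / mu)  # the group's share of the overall total
        between += share * math.log(mu_g / mu)
        within += share * theil(xs)
    return between, within, theil(all_x)

toy_groups = {"north": [1.0, 2.0, 3.0], "south": [4.0, 6.0]}
between, within, total = theil_decomposition(toy_groups)
print(round(between / total, 3), round(within / total, 3))
```

The between term compares group means with the overall mean, and the within term is a share-weighted average of each group's own Theil index.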
Stage 10 — Local heterogeneity (GWR, briefly)
One β for all of Bolivia may hide local stories. Geographically weighted regression lets the convergence coefficient vary over space and maps it:
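The idea behind a local coefficient can be sketched without any GWR library: refit the regression at each location, down-weighting observations by their distance from it. The sketch below uses a fixed-bandwidth Gaussian kernel and a single regressor; the function name, kernel choice, and toy data are assumptions for illustration, not the settings geometrics uses.

```python
import math

def local_slope(points, x, y, at, bandwidth):
    """Kernel-weighted slope of y on x (with intercept), centered at location `at`."""
    w = [math.exp(-0.5 * (math.dist(p, at) / bandwidth) ** 2) for p in points]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    den = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    return num / den

# Toy data: a western cluster with slope -0.02 and an eastern cluster with slope +0.01.
toy_points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
toy_x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
toy_y = [0.1 - 0.02 * v for v in toy_x[:3]] + [0.1 + 0.01 * v for v in toy_x[3:]]

west = local_slope(toy_points, toy_x, toy_y, at=(1, 0), bandwidth=1.0)
east = local_slope(toy_points, toy_x, toy_y, at=(11, 0), bandwidth=1.0)
print(round(west, 4), round(east, 4))
```

A single pooled regression on the toy data would average the two regimes, which is the kind of local story a map of coefficients is meant to expose.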