Analyze convergence and inequality

The Analyze module estimates what Explore described. This page is a case study on the bundled Bolivia panel — 112 provinces with PWT-anchored GDP per capita, 2012–2022 (from Rossi-Hansberg & Zhang’s local GDP, rescaled to Penn World Table 11.0) — asking the three standing questions of the convergence literature: are poorer provinces catching up, do spillovers carry growth across borders, and is the national gap narrowing?

The functions appear in the order an analysis actually runs: build the growth cross-section → estimate β without space → add spillovers → let the diagnostics pick the model → check robustness to W → σ-convergence → clubs → distribution dynamics → inequality and its decomposition → local heterogeneity. (The India case study runs this same arc on the flagship 520-district panel.)

Note

Every .interpret() below reads an association, never a cause — estimates from observational regional data describe patterns, not policy effects. The Learn module demonstrates each estimator on simulated data where the truth is planted.

Stage 0 — Load and declare

import warnings

warnings.filterwarnings("ignore")

import numpy as np

import geometrics as gm

gdf, df, df_dict = gm.data.load_bolivia()        # 112 provinces x 2012-2022
df = gm.set_labels(df, df_dict, set_panel=True)  # labels + (gid, year), once
w = gm.make_weights(gdf)                         # queen contiguity, row-standardized
df.head(3)

	level	gid	name	engtype	gid1	name1	iso	country	year	threshold	...	pop_persons	gdp_scale	pop_scale	gdp_pwt	pop_pwt	gdppc	ln_gdppc	n_cells	pop_nat
0	adm2	BOL.1.10_2	Zudáñez	Province	BOL.1_1	Chuquisaca	BOL	Bolivia	2012	0_05	...	25950	873.436447	0.000001	117.574532	0.026367	4459.091741	8.402700	5	10350000
1	adm2	BOL.1.10_2	Zudáñez	Province	BOL.1_1	Chuquisaca	BOL	Bolivia	2013	0_05	...	24873	908.187148	0.000001	125.041489	0.025282	4945.867842	8.506308	5	10510000
2	adm2	BOL.1.10_2	Zudáñez	Province	BOL.1_1	Chuquisaca	BOL	Bolivia	2014	0_05	...	25215	904.053096	0.000001	131.373522	0.025640	5123.766779	8.541645	5	10670000

3 rows × 21 columns

Five provinces are fully censored in the source product (polygons but no panel rows), so the analyses below report 107 units rather than 112; geometrics warns and carries on. See the Bolivia dataset.

Stage 1 — The growth cross-section

Every convergence regression starts from the same frame: one row per province, its initial level and its annualized log growth. growth_cross_section builds it explicitly, so the β regression that follows has no hidden steps:

cs = gm.growth_cross_section(df, "gdppc")
cs.head()

	gid	initial	final	growth
0	BOL.1.10_2	4459.091741	6151.072854	0.032168
1	BOL.1.1_2	4387.140378	5913.421678	0.029855
2	BOL.1.2_2	4882.403255	7274.147174	0.039869
3	BOL.1.3_2	4531.888566	6313.165089	0.033150
4	BOL.1.4_2	5109.765173	7034.930632	0.031973
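For readers who want to see the arithmetic, here is a minimal pandas sketch of the same frame. It assumes a long panel with gid, year and value columns and a fixed window from the first to the last year. `growth_cross_section_sketch` is a hypothetical helper, not the geometrics implementation. The first two rows above check out: ln(6151.07 / 4459.09) / 10 ≈ 0.0322.

```python
import numpy as np
import pandas as pd


def growth_cross_section_sketch(panel: pd.DataFrame, value: str,
                                entity: str = "gid", time: str = "year") -> pd.DataFrame:
    """Hypothetical helper: initial level, final level and annualized log growth per entity."""
    first, last = panel[time].min(), panel[time].max()
    # Keep only the two endpoint years and put them side by side, one row per entity.
    wide = (
        panel[panel[time].isin([first, last])]
        .pivot(index=entity, columns=time, values=value)
        .dropna()  # entities missing either endpoint are dropped
    )
    out = pd.DataFrame({"initial": wide[first], "final": wide[last]})
    # Annualized log growth over the window: ln(final / initial) / (last - first).
    out["growth"] = np.log(out["final"] / out["initial"]) / (last - first)
    return out.reset_index()


# Endpoint values of the first two provinces, copied from the table above.
panel = pd.DataFrame({
    "gid": ["BOL.1.10_2", "BOL.1.10_2", "BOL.1.1_2", "BOL.1.1_2"],
    "year": [2012, 2022, 2012, 2022],
    "gdppc": [4459.091741, 6151.072854, 4387.140378, 5913.421678],
})
cs_sketch = growth_cross_section_sketch(panel, "gdppc")
print(cs_sketch.round(6))
```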

Stage 2 — β-convergence, without space

Do initially poorer provinces grow faster? A negative slope of growth on the initial (log) level says yes; speed and half_life translate that slope into a convergence rate per year and the number of years needed to close half of an initial gap:

ols = gm.analyze_beta_convergence(df, "gdppc", model="ols")
print(
    f"beta = {ols.beta_total:.4f}  (speed {ols.speed:.3f}, "
    f"half-life {ols.half_life:.0f} yr)"
)
ols.fig

beta = -0.0091  (speed 0.010, half-life 72 yr)

print(ols.interpret())

Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.00913** (SE 0.00406), statistically significant at the 5% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.00957 per period and a half-life of about 72.4 periods — the time for half of an initial gap to close at this pace.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
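The reported speed and half-life are consistent with the textbook mapping between the cross-section slope β over a window of T periods and the convergence speed λ: β = −(1 − e^(−λT))/T, so λ = −ln(1 + βT)/T and the half-life is ln 2/λ. The sketch below assumes geometrics uses this mapping, and the helper names are hypothetical:

```python
import math


def convergence_speed(beta: float, horizon: float) -> float:
    """Implied speed lambda from a growth-on-initial-level slope over `horizon` periods.

    Assumes the textbook mapping beta = -(1 - exp(-lambda * T)) / T.
    """
    return -math.log(1.0 + beta * horizon) / horizon


def half_life(speed: float) -> float:
    """Periods needed to close half of an initial gap at a constant speed."""
    return math.log(2.0) / speed


lam = convergence_speed(-0.00913, horizon=10)
print(f"speed {lam:.5f}, half-life {half_life(lam):.1f}")  # speed 0.00957, half-life 72.4
```

With β = −0.0121, the same formula gives λ ≈ 0.0129 and a half-life of about 53.7 periods, matching the spatial Durbin output in Stage 3.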

Stage 3 — β with spillovers (the spatial Durbin model)

Provinces are not islands: model="sdm" adds spatial lags of the outcome and the covariates, and the convergence estimate becomes a LeSage-Pace total impact split into direct and indirect (spillover) components, with Monte Carlo standard errors:

sdm = gm.analyze_beta_convergence(
    df, "gdppc", model="sdm", gdf=gdf, w=w, n_draws=1000
)
print(
    f"SDM total: {sdm.beta_total:.4f} = direct {sdm.beta_direct:.4f} "
    f"+ indirect {sdm.beta_indirect:.4f}  (rho = {sdm.rho:.2f})"
)

SDM total: -0.0121 = direct -0.0091 + indirect -0.0030  (rho = 0.28)

print(sdm.interpret())

Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.0121** (SE 0.00761), not statistically significant at conventional levels.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0129 per period and a half-life of about 53.7 periods — the time for half of an initial gap to close at this pace.
The SDM decomposition splits the total into a direct component of -0.00909 (own initial level) and a spillover (indirect) component of -0.00303 operating through neighboring units, under the weights: queen contiguity, row-standardized, n=112.
The spatial-lag parameter ρ = 0.279 says each unit's growth moves together with its neighbors' growth, so part of the convergence pattern is shared across space rather than purely unit-by-unit.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

Stage 4 — Which spatial model do the data ask for?

Rather than assuming the SDM, let the Lagrange-multiplier diagnostics inspect the OLS residuals and apply the Anselin-Florax decision rule:

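cs["ln_initial"] = np.log(cs["initial"])          # regressor: log of the initial level
cs["year"] = 2012                                 # one placeholder period, so the cross-section
cs = gm.set_panel(cs, entity="gid", time="year")  # can be declared as a (gid, year) panel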

diag = gm.analyze_spatial_diagnostics(
    cs, outcome="growth", covariates=["ln_initial"], gdf=gdf, w=w
)
print(f"Recommendation: {diag.recommendation}\n")
print(diag.reasoning)

Recommendation: error

A simple LM test rejects at alpha = 0.05 (LM lag = 4.07, p = 0.0438; LM error = 4.09, p = 0.0432) but neither robust form does, an ambiguous configuration. The larger simple statistic points to the error model; treat the choice as tentative and compare specifications directly.
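The decision rule quoted above can be written out as a small function. This is a hypothetical sketch of one common reading of the Anselin-Florax rule, including the fallback to the larger simple statistic that the output describes for the ambiguous case. It is not the geometrics implementation, and the robust statistics in the example are placeholders because they are not printed above.

```python
from dataclasses import dataclass


@dataclass
class LMResults:
    """LM statistics and p-values computed from OLS residuals (simple and robust forms)."""
    lm_lag: float
    p_lag: float
    lm_error: float
    p_error: float
    rlm_lag: float
    p_rlag: float
    rlm_error: float
    p_rerror: float


def recommend_model(r: LMResults, alpha: float = 0.05) -> str:
    """Hypothetical sketch of the Anselin-Florax decision rule."""
    lag_sig, err_sig = r.p_lag < alpha, r.p_error < alpha
    if not lag_sig and not err_sig:
        return "ols"  # no evidence of spatial dependence
    if lag_sig and not err_sig:
        return "lag"
    if err_sig and not lag_sig:
        return "error"
    # Both simple tests reject: let the robust forms arbitrate.
    rlag_sig, rerr_sig = r.p_rlag < alpha, r.p_rerror < alpha
    if rlag_sig and not rerr_sig:
        return "lag"
    if rerr_sig and not rlag_sig:
        return "error"
    if rlag_sig and rerr_sig:
        return "lag" if r.rlm_lag >= r.rlm_error else "error"
    # Ambiguous: neither robust test rejects, so fall back to the larger simple statistic.
    return "lag" if r.lm_lag >= r.lm_error else "error"


# Simple statistics from the output above; the robust values are placeholders chosen so
# that neither robust test rejects, as the output states.
bolivia = LMResults(lm_lag=4.07, p_lag=0.0438, lm_error=4.09, p_error=0.0432,
                    rlm_lag=0.5, p_rlag=0.48, rlm_error=0.5, p_rerror=0.48)
print(recommend_model(bolivia))  # error
```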

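Any candidate specification is one model= switch away, and analyze_spatial_model returns its full spreg table with the impact decomposition. Because the diagnosis is ambiguous, the example fits the spatial Durbin model, which nests the lag model and, under a common-factor restriction, the error model: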

model = gm.analyze_spatial_model(
    cs,
    outcome="growth",
    covariates=["ln_initial"],
    gdf=gdf,
    w=w,
    model="durbin",
    n_draws=1000,
)
model.gt

Spatial durbin (sdm) model of growth
ML estimates | n = 107 | R² = 0.159 | AIC = -865.5
	Estimate	Std. Error	z	p
CONSTANT	0.0974**	(0.0474)	2.05	0.0399
ln_initial	-0.0091***	(0.0025)	-3.70	0.0002
W_ln_initial	0.0004	(0.0052)	0.07	0.9460
W_growth	0.2787**	(0.1336)	2.09	0.0369
*** p<0.01, ** p<0.05, * p<0.10 (z-based)
W: queen contiguity, row-standardized, n=112
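The coefficients in this table connect to the Stage 3 impacts. For a row-standardized W, the LeSage-Pace total impact has the closed form (β + θ)/(1 − ρ). Taking β = −0.0091 on ln_initial, θ = 0.0004 on W_ln_initial and ρ = 0.2787 gives −0.0087/0.7213 ≈ −0.0121, the SDM total reported in Stage 3. The numpy sketch computes all three impacts from S = (I − ρW)⁻¹(βI + θW) on a toy ring-shaped W, which is an assumption and not Bolivia's queen matrix. Its direct/indirect split therefore will not match Stage 3, and the Monte Carlo standard errors are not reproduced. `sdm_impacts` is a hypothetical helper.

```python
import numpy as np


def sdm_impacts(beta: float, theta: float, rho: float, W: np.ndarray) -> dict:
    """Average direct, indirect and total impacts of one covariate in an SDM.

    S = (I - rho W)^{-1} (beta I + theta W); direct = mean of diag(S),
    total = mean row sum of S, indirect = total - direct.
    """
    n = W.shape[0]
    S = np.linalg.solve(np.eye(n) - rho * W, beta * np.eye(n) + theta * W)
    direct = np.trace(S) / n
    total = S.sum() / n
    return {"direct": direct, "indirect": total - direct, "total": total}


# Toy row-standardized W: 10 units on a ring, each with two neighbors (an assumption).
n = 10
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

imp = sdm_impacts(beta=-0.0091, theta=0.0004, rho=0.2787, W=W)
print({k: round(v, 5) for k, v in imp.items()})
# With row-standardized W the total has a closed form: (beta + theta) / (1 - rho).
print(round((-0.0091 + 0.0004) / (1 - 0.2787), 4))  # -0.0121
```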

Stage 5 — Robust to the weights choice?

A conclusion that only holds under one definition of “neighbor” is fragile. analyze_spatial_model_by_weights re-estimates the same model under alternative weights and compares the impacts side by side:

robust = gm.analyze_spatial_model_by_weights(
    cs,
    outcome="growth",
    covariates=["ln_initial"],
    gdf=gdf,
    weights={
        "queen": gm.make_weights(gdf, method="queen"),
        "knn4": gm.make_weights(gdf, method="knn", k=4),
        "knn6": gm.make_weights(gdf, method="knn", k=6),
    },
    model="durbin",
    n_draws=1000,
)
robust.fig

print(robust.interpret())

The spatial Durbin (SDM) model was re-estimated under **3 alternative spatial weights** specifications; the table and figure compare the direct, indirect and total impacts of **ln_initial** against the **queen** baseline (total = -0.0121).

The total impact keeps the **same sign in all 3 specifications**, ranging from -0.0121 to -0.0073 (spread 0.00481) — the qualitative conclusion does not hinge on the weights choice.

Every specification's 95% interval covers the baseline estimate, so the alternatives are statistically indistinguishable from the baseline.

By AIC the best-fitting specification is **knn6** (AIC = -865.7); AIC comparisons are only meaningful across models estimated on the same sample.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
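To see what an alternative neighbor definition means, here is a numpy sketch of row-standardized k-nearest-neighbor weights built from point coordinates. For provinces these would typically be polygon centroids in a projected CRS; that is an assumption, and gm.make_weights may compute distances differently. Under queen contiguity the number of neighbors varies by province. Under kNN every unit gets exactly k neighbors, and the relation need not be symmetric. `knn_weights` is a hypothetical helper.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int) -> np.ndarray:
    """Dense row-standardized k-nearest-neighbor weights from (n, 2) point coordinates."""
    n = coords.shape[0]
    # Pairwise Euclidean distances; setting the diagonal to inf keeps a unit out of its own list.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest units, per row
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)  # row-standardize: each row sums to 1


rng = np.random.default_rng(0)
coords = rng.uniform(size=(20, 2))  # stand-in centroids
W4 = knn_weights(coords, k=4)
print((W4 > 0).sum(axis=1))  # every unit has exactly 4 neighbors
```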

Stage 6 — Is the gap narrowing? (σ-convergence)

β asks about catch-up; σ asks whether cross-sectional dispersion actually shrank. Both matter — fast catch-up can coexist with stable dispersion:

sigma = gm.analyze_sigma_convergence(df, "gdppc")
sigma.fig

print(sigma.interpret())

Across 107 units and 11 periods, the cross-sectional dispersion of **gdppc** (standard deviation of its log) narrowed by about 0.242% per period (not statistically significant at conventional levels).
A negative log-dispersion trend is σ-convergence: the units are becoming more alike.
All estimated measures (standard deviation, Gini, coefficient of variation) trend downward, so the narrowing is not an artifact of one dispersion metric.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
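The headline σ number can be read as a log-linear trend in dispersion. The minimal sketch below assumes the trend is the OLS slope of the log cross-sectional standard deviation of log gdppc on a period index. The geometrics estimator, its standard errors and the Gini and CV variants are not reproduced, and `log_dispersion_trend` is a hypothetical helper. On synthetic data whose dispersion shrinks by a factor e^(−0.01) each period, it recovers a slope of −0.01.

```python
import numpy as np


def log_dispersion_trend(log_values: np.ndarray) -> float:
    """OLS slope of ln(sd) on a period index; log_values has shape (n_units, n_periods).

    A slope of -0.00242 means dispersion shrinks by roughly 0.242% per period.
    """
    sd = log_values.std(axis=0, ddof=1)  # cross-sectional sd in each period
    t = np.arange(log_values.shape[1])
    slope, _intercept = np.polyfit(t, np.log(sd), deg=1)
    return float(slope)


# Synthetic panel: 50 units, 11 periods, dispersion shrinking by exp(-0.01) per period.
z = np.linspace(-1.0, 1.0, 50)
t = np.arange(11)
log_gdppc = 8.5 + np.outer(z, 0.3 * np.exp(-0.01 * t))
print(round(log_dispersion_trend(log_gdppc), 4))  # -0.01
```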

Tip

Dispersion is measured on logs, so the series must be strictly positive — gdppc is. For a panel with zeros (India’s night lights), filter first: the India case study shows the pattern.
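One way to filter, sketched with pandas, is to drop every unit that has a zero (or missing) value in any period, so the logs are defined and the panel stays balanced. `keep_strictly_positive` is a hypothetical helper and may not match the filter the India case study uses.

```python
import pandas as pd


def keep_strictly_positive(panel: pd.DataFrame, value: str, entity: str = "gid") -> pd.DataFrame:
    """Hypothetical helper: keep only entities whose value is > 0 in every period."""
    # NaN compares as not > 0, so units with missing values are dropped as well.
    ok = panel[value].gt(0).groupby(panel[entity]).transform("all")
    return panel[ok]


demo = pd.DataFrame({
    "gid": ["a", "a", "b", "b", "c", "c"],
    "lights": [1.0, 2.0, 0.0, 3.0, 4.0, 5.0],  # unit "b" has a zero in one period
})
kept = keep_strictly_positive(demo, "lights")
print(kept)
```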

Stage 7 — One Bolivia or several? (convergence clubs)

Global convergence can fail while clubs of provinces converge to different steady states. The Phillips-Sul log(t) test and clustering find them from the data:

clubs = gm.analyze_convergence_clubs(df, "ln_gdppc", gdf=gdf)
print(clubs.interpret())

Global convergence of **ln_gdppc** is rejected for the 107 units (log(t) t = -24.3 ≤ -1.65): the panel is not heading toward one common path.
The Phillips-Sul clustering instead finds **9 convergence clubs** — groups whose members converge toward a shared, club-specific path while the clubs themselves stay apart.
Club sizes — Club 1: 4, Club 2: 7, Club 3: 18, Club 4: 23, Club 5: 19, Club 6: 25, Club 7: 2, Club 8: 2, Club 9: 2.
5 units fit no club (the divergent group) and follow paths of their own.
Club membership is a description of co-movement in the relative transition paths, not an explanation of why the groups differ.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

clubs.fig_map if clubs.fig_map is not None else clubs.fig
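The log(t) regression behind that verdict can be sketched in numpy. In each period, divide every unit's value by the cross-sectional mean to get relative transition paths h_it and measure their spread H_t = mean_i (h_it − 1)². Then regress log(H_1/H_t) − 2 log(log t) on log t after discarding the first fraction r of periods (r = 0.3 here, a common choice). A negative slope whose t-statistic falls below −1.65 rejects convergence. This is a simplified sketch, not the geometrics implementation. It uses plain OLS standard errors where Phillips and Sul use heteroskedasticity- and autocorrelation-consistent ones, skips any trend filtering of the series, and leaves out the clustering algorithm that reapplies the test to candidate clubs. `log_t_test` is a hypothetical helper.

```python
import numpy as np


def log_t_test(X: np.ndarray, r: float = 0.3) -> tuple[float, float]:
    """Simplified Phillips-Sul log(t) regression on a positive (n_units, T) array.

    Returns (b_hat, t_stat); convergence is rejected when t_stat < -1.65.
    Uses plain OLS standard errors instead of HAC errors.
    """
    _, T = X.shape
    h = X / X.mean(axis=0)                 # relative transition paths h_it
    H = ((h - 1.0) ** 2).mean(axis=0)      # cross-sectional spread H_t
    t = np.arange(1, T + 1)
    keep = t >= max(2, int(r * T))         # drop the first fraction r of the sample
    y = np.log(H[0] / H[keep]) - 2.0 * np.log(np.log(t[keep]))
    Z = np.column_stack([np.ones(keep.sum()), np.log(t[keep])])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    se_b = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return float(coef[1]), float(coef[1] / se_b)


# Synthetic panels over 200 periods: gaps that shrink like 1/t versus gaps that grow like t.
d = np.linspace(-1.0, 1.0, 21)
periods = np.arange(1, 201)
converging = 10.0 + np.outer(d, 1.0 / periods)
diverging = 10.0 + 0.01 * np.outer(d, periods)
print(log_t_test(converging), log_t_test(diverging))
```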

Stage 8 — Distribution dynamics (Markov chains)

How mobile are provinces across the income distribution — and does mobility depend on the neighbors? (Requires the dynamics extra: pip install "geometrics[dynamics]".)

mkv = gm.analyze_markov_transitions(df, "gdppc", k=4)
mkv.gt

Markov transition summary — GDP per capita (2021 US$ PPP)
4 states (quantiles classes, per-period classification), 1,070 transitions
Measure	Value
Mobility indices
Shorrocks trace index (giddy measure 'P')	0.150
Prais determinant index (giddy measure 'D')	0.395
Bartholomew index (giddy measure 'B1')	0.150
Ergodic (steady-state) distribution
Long-run share in Q1	0.252
Long-run share in Q2	0.252
Long-run share in Q3	0.243
Long-run share in Q4	0.252
Expected sojourn time (periods)
Consecutive periods in Q1	10.385
Consecutive periods in Q2	6.136
Consecutive periods in Q3	7.647
Consecutive periods in Q4	16.875
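Most quantities in this table have simple definitions. The numpy sketch below assumes per-period quantile classes computed from ranks; giddy's classifier may place class breaks differently, and the function names are hypothetical.

```python
import numpy as np


def quantile_classes(values: np.ndarray, k: int) -> np.ndarray:
    """Per-period quantile classes 0..k-1 for an (n_units, n_periods) array."""
    n = values.shape[0]
    ranks = values.argsort(axis=0).argsort(axis=0)  # 0 = lowest value in that period
    return ranks * k // n


def transition_matrix(classes: np.ndarray, k: int) -> np.ndarray:
    """Row-normalized counts of moves from state a in period t to state b in t + 1."""
    counts = np.zeros((k, k))
    np.add.at(counts, (classes[:, :-1].ravel(), classes[:, 1:].ravel()), 1)
    return counts / counts.sum(axis=1, keepdims=True)


def shorrocks(P: np.ndarray) -> float:
    """Shorrocks trace index: 0 means no mobility, larger means more movement."""
    k = P.shape[0]
    return float((k - np.trace(P)) / (k - 1))


def sojourn_times(P: np.ndarray) -> np.ndarray:
    """Expected consecutive periods in each state: 1 / (1 - p_ii)."""
    return 1.0 / (1.0 - np.diag(P))


def ergodic_distribution(P: np.ndarray) -> np.ndarray:
    """Long-run shares: the left eigenvector of P for eigenvalue 1, scaled to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


P = np.array([[0.9, 0.1], [0.2, 0.8]])
print(shorrocks(P), sojourn_times(P), ergodic_distribution(P))
```

The definitions also tie the table together. The reported sojourn times imply 1 − p_ii of about 0.096, 0.163, 0.131 and 0.059. Their sum divided by k − 1 = 3 is about 0.150, the Shorrocks index reported above.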

The spatially conditioned chains need every mapped province to be observed in every period, so first drop the five censored polygons from the geometry (the weights are then rebuilt automatically on the remaining 107 provinces):

gdf_obs = gdf[gdf["gid"].isin(df["gid"])]
spm = gm.analyze_spatial_markov(df, "gdppc", gdf=gdf_obs, k=4, m=4)
print(spm.interpret())

The spatial Markov chain splits **gdppc**'s 4-state transition matrix by the neighbors' position (spatial lag under queen contiguity, row-standardized, n=107), giving one matrix per neighborhood class (4 classes); values were expressed relative to each period's mean first.
The unconditional matrix keeps regions in place with average probability 0.9.
Conditioning on context, regions surrounded by **low-value neighbors** stay in place with average probability 0.929, while regions surrounded by **high-value neighbors** do so with 0.904 — movement is more common in prosperous neighborhoods.
The homogeneity tests (LR = 26.1, p = 0.347; Q = 26.2, p = 0.342; 24 degrees of freedom) are not statistically significant at conventional levels: the data do not reject identical transition dynamics across neighborhood contexts.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
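The conditioning step can be sketched as follows. The sketch assumes quantile classes for both the mean-relative values and their spatial lag, and it conditions on the neighbors' class at the start of each transition; giddy's Spatial_Markov may classify differently. `spatial_markov_counts` and the ring-shaped toy W are hypothetical. By construction, the m conditional count matrices add up to the pooled one, and the homogeneity tests ask whether the conditional matrices differ from that pooled matrix.

```python
import numpy as np


def quantile_classes(values: np.ndarray, k: int) -> np.ndarray:
    """Per-period quantile classes 0..k-1 for an (n_units, n_periods) array."""
    n = values.shape[0]
    return values.argsort(axis=0).argsort(axis=0) * k // n


def spatial_markov_counts(values: np.ndarray, W: np.ndarray, k: int, m: int):
    """Transition counts split by the neighbors' class at the start of each step."""
    rel = values / values.mean(axis=0)  # express values relative to each period's mean
    lag = W @ rel                       # spatial lag: row-standardized neighbor average
    cls = quantile_classes(rel, k)      # own state
    lag_cls = quantile_classes(lag, m)  # neighborhood context
    counts = np.zeros((m, k, k))
    np.add.at(counts, (lag_cls[:, :-1].ravel(), cls[:, :-1].ravel(), cls[:, 1:].ravel()), 1)
    return counts, cls


# Toy data (an assumption): 40 units on a ring, 11 periods of positive values.
rng = np.random.default_rng(1)
n_units, n_periods, k, m = 40, 11, 4, 4
values = np.exp(rng.normal(8.5, 0.3, size=(n_units, n_periods)))
W = np.zeros((n_units, n_units))
for i in range(n_units):
    W[i, (i - 1) % n_units] = W[i, (i + 1) % n_units] = 0.5

counts, cls = spatial_markov_counts(values, W, k, m)
row_totals = counts.sum(axis=2, keepdims=True)
# One transition matrix per neighborhood class; rows never visited stay at zero.
P_cond = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
print(P_cond.round(2))
```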

Stage 9 — Inequality: trend and decomposition

The same panel, read through inequality indices, including the spatial Gini, which splits the Gini into contributions from neighboring and non-neighboring pairs:

ineq = gm.analyze_inequality_over_time(df, "gdppc", gdf=gdf, w=w)
ineq.fig

print(ineq.interpret())

Cross-sectional inequality in **gdppc** is tracked across 107 units over 11 periods (2012 to 2022).

The Gini index fell from 0.0832 to 0.0788 between the first and last period.

The Theil index fell from 0.0176 to 0.0153 between the first and last period.

The log-trend of the Gini index is -0.00221 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.

The log-trend of the Theil index is -0.00569 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.

In the latest period, inequality between *neighboring* units under the weights (queen contiguity, row-standardized, n=112) contributes 0.00346 to the Gini — about 4% of the overall Gini; the rest comes from pairs that are not neighbors.

The permutation test (p = 0.04) indicates the non-neighbor component of inequality is larger than expected under spatial randomness — differences line up with geography rather than being scattered among neighbors.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
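The spatial split follows Rey and Smith's decomposition of the Gini over all ordered pairs of units. Pairs flagged as neighbors by a binary contiguity indicator form one term, all other pairs form the other, and the two add up to the Gini exactly. The minimal numpy sketch below assumes a binary (0/1) neighbor matrix. The permutation test is not reproduced, and the helper names are hypothetical.

```python
import numpy as np


def gini(x: np.ndarray) -> float:
    """Gini as the sum of |x_i - x_j| over all ordered pairs, divided by 2 n^2 mean."""
    diffs = np.abs(x[:, None] - x[None, :])
    return float(diffs.sum() / (2 * len(x) ** 2 * x.mean()))


def spatial_gini_split(x: np.ndarray, neighbors: np.ndarray) -> tuple[float, float]:
    """Neighbor-pair and non-neighbor-pair contributions to the Gini.

    neighbors is a binary (0/1) contiguity matrix with a zero diagonal.
    """
    diffs = np.abs(x[:, None] - x[None, :])
    denom = 2 * len(x) ** 2 * x.mean()
    near = (neighbors * diffs).sum() / denom
    far = ((1 - neighbors) * diffs).sum() / denom  # the diagonal contributes zero
    return float(near), float(far)


x = np.array([1.0, 2.0, 3.0, 4.0])
chain = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
near, far = spatial_gini_split(x, chain)
print(gini(x), near, far)
```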

How much of provincial inequality lies between departments (name1) rather than within them? The Theil index splits additively and exactly into these two parts:

theil = gm.analyze_theil_decomposition(df, "gdppc", "name1")
theil.fig

print(theil.interpret())

The Theil index of **gdppc** splits additively into inequality **between** the 9 name1 groups and inequality **within** them (between + within = total, exactly).

In the latest period (2022), the total Theil index is 0.0153: about 5% of it lies between name1 groups and 95% within them — most inequality plays out among units inside the same name1.

Over the window, the between-group share fell from 6% to 5% — group means are pulling closer together relative to the differences inside groups.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
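The exact split follows from the Theil T formula. Let s_g = Σ_{i∈g} x_i / Σ_i x_i be each group's share, μ_g the group means and μ the overall mean. Then T = Σ_g s_g ln(μ_g/μ) + Σ_g s_g T_g, where the first sum is the between term and the second is the within term. The pandas sketch below uses hypothetical helper names.

```python
import numpy as np
import pandas as pd


def theil(x: np.ndarray) -> float:
    """Theil T index: mean of (x / mu) * ln(x / mu); x must be strictly positive."""
    s = x / x.mean()
    return float(np.mean(s * np.log(s)))


def theil_decomposition(x: pd.Series, groups: pd.Series) -> dict:
    """Split Theil T into between-group and within-group terms that add up exactly."""
    mu, n = x.mean(), len(x)
    between = within = 0.0
    for _, xg in x.groupby(groups):
        share = xg.sum() / (n * mu)                # group's share of the total
        between += share * np.log(xg.mean() / mu)  # gap between group mean and overall mean
        within += share * theil(xg.to_numpy())     # inequality inside the group
    return {"total": theil(x.to_numpy()), "between": float(between), "within": float(within)}


x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
groups = pd.Series(["a", "a", "b", "b", "c", "c"])
parts = theil_decomposition(x, groups)
print({name: round(v, 4) for name, v in parts.items()})
```

The between-group share quoted above corresponds to between / total.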

Stage 10 — Local heterogeneity (GWR, briefly)

One β for all of Bolivia may hide local stories. Geographically weighted regression lets the convergence coefficient vary over space and maps it:

gwr = gm.analyze_gwr(
    cs, outcome="growth", covariates=["ln_initial"], gdf=gdf
)
gwr.figs["ln_initial"]
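The idea behind GWR fits in a few lines: at each location, run weighted least squares in which nearby units get more weight. The minimal sketch below assumes a fixed Gaussian kernel and a given bandwidth. In practice, and likely in analyze_gwr, the kernel may differ and the bandwidth is selected from the data (for example by AICc). `gwr_local_coefficients` is a hypothetical helper. When the true relationship is the same everywhere, every local coefficient recovers it.

```python
import numpy as np


def gwr_local_coefficients(X: np.ndarray, y: np.ndarray, coords: np.ndarray,
                           bandwidth: float) -> np.ndarray:
    """Local weighted-least-squares coefficients at every observation.

    X must include a column of ones; weights follow a fixed Gaussian distance kernel.
    """
    n = len(y)
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # nearby units count more
        XtW = X.T * w                             # equivalent to X.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas


rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))
ln_initial = rng.normal(8.5, 0.2, size=50)
X = np.column_stack([np.ones(50), ln_initial])
y = 0.1 - 0.009 * ln_initial  # same slope everywhere, no noise
betas = gwr_local_coefficients(X, y, coords, bandwidth=0.3)
print(betas[:3].round(4))
```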

Multiscale GWR (analyze_mgwr) lets each term choose its own bandwidth; see the API reference and the India case study for a full run.

Where next

Explore — the descriptive workflow that should precede all of this
Learn — every estimator above, demonstrated on simulated data with a planted truth
Spatial spillovers — the spreg suite in depth; Distribution dynamics; Regional inequality
The India case study — this arc on 520 districts