Beta and sigma convergence (and clubs)

Do laggards catch up, is the gap narrowing, and does everyone converge to the same place?

Three questions, three tools. β-convergence asks whether units that start behind grow faster (analyze_beta_convergence); σ-convergence asks whether the cross-sectional spread is actually narrowing (analyze_sigma_convergence); and when neither gives a clean verdict, convergence clubs ask whether the panel splits into groups that each converge to their own path (analyze_convergence_clubs). This article runs all three on the bundled India panel — 520 districts observed by satellite nighttime lights, 1996-2010 — using the raw panel variable ntl_total (the paper’s per-capita replication lives in the India case study).

import warnings

warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd

import geometrics as gm

gdf, df, df_dict = gm.data.load_india()
df = gm.set_labels(df, df_dict, set_panel=True)

Every concept in the library ships a built-in explainer. Here is how gm.explain introduces the β-convergence idea:

print(gm.explain("beta_convergence").to_markdown()[:960], "...")
### Beta convergence

**What it is.** β-convergence asks whether units that start *behind* grow *faster* and so catch up. The test regresses each unit's average growth rate over a horizon on its **initial level** — canonically the growth of GDP per capita on initial log GDP per capita. A **negative** slope β is convergence: lower starting points are associated with faster growth. The slope maps to a structural **speed of convergence** λ = -ln(1 + β·T) / T (per period) and a **half-life** ln 2 / λ, the time to close half of an initial gap. **Unconditional** (absolute) convergence uses the initial level alone; **conditional** convergence adds controls for each unit's steady-state determinants and, by the Frisch-Waugh-Lovell theorem, reads the convergence slope from a partial-regression scatter that holds those controls fixed. The same machinery works for any variable — income, schooling, health.

**When to use it.** Use it to summarise catch-up dyn ...

β-convergence, first ignoring space

analyze_beta_convergence builds the growth cross-section internally. For each district it takes total luminosity in levels in 1996 and 2010, computes the annualized log growth between the two years, and regresses that growth on the initial log level. A negative slope is convergence.

ols = gm.analyze_beta_convergence(df, "ntl_total", model="ols")
ols.fig
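
As a rough illustration of that construction, the sketch below rebuilds a growth cross-section by hand and fits the same kind of regression with plain NumPy. It is not the library's implementation: the data are synthetic, and the names used here (initial, final, horizon, true_beta) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, horizon = 200, 14          # hypothetical panel: 200 units, 14-year window
true_beta = -0.015                  # assumed slope used only to simulate the data

# Simulate initial levels, then final levels consistent with the assumed slope.
log_initial = rng.normal(5.0, 1.0, n_units)
annual_growth_true = 0.10 + true_beta * log_initial + rng.normal(0.0, 0.005, n_units)
initial = np.exp(log_initial)
final = np.exp(log_initial + horizon * annual_growth_true)

# Growth cross-section: annualized log growth between the two end years.
growth = (np.log(final) - np.log(initial)) / horizon

# OLS of growth on a constant and the initial log level.
X = np.column_stack([np.ones(n_units), np.log(initial)])
(intercept, beta_hat), *_ = np.linalg.lstsq(X, growth, rcond=None)

print(f"estimated beta = {beta_hat:.4f}")
```

A negative beta_hat here plays the same role as the slope reported by the library: units with a lower starting level grew faster in the simulated data.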

The result object carries the headline scalars — the slope, the implied structural speed λ = -ln(1 + β·T)/T, and the half-life ln 2 / λ (the years needed to close half of an initial gap):

print(
    f"beta = {ols.beta_total:.4f} (SE {ols.se_total:.4f}), R2 = {ols.r2:.3f}, "
    f"N = {ols.n_obs}\n"
    f"speed = {ols.speed:.4f} per year -> half-life = {ols.half_life:.0f} years"
)
beta = -0.0140 (SE 0.0013), R2 = 0.298, N = 520
speed = 0.0156 per year -> half-life = 44 years
print(ols.interpret())
Across 520 units, the growth of **ntl_total** over a 14-period window is associated with its initial log level with a total slope of **-0.014** (SE 0.00133), statistically significant at the 1% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0156 per period and a half-life of about 44.5 periods — the time for half of an initial gap to close at this pace.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
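
The speed and half-life can be checked by hand from the two formulas quoted above. The helper below is a small sketch, not part of geometrics; its name and signature are hypothetical, and it takes the rounded slope printed earlier, so the last digits can differ slightly from the library's output.

```python
import math


def speed_and_half_life(beta, horizon):
    """Convert a convergence slope into a speed (per period) and a half-life."""
    shrink = 1.0 + beta * horizon
    if shrink <= 0.0:
        raise ValueError("1 + beta * horizon must be positive for the log to exist")
    speed = -math.log(shrink) / horizon
    # A non-positive speed means no convergence, so there is no finite half-life.
    half_life = math.log(2.0) / speed if speed > 0.0 else math.inf
    return speed, half_life


# Rounded OLS slope from the output above, over the 14-year window.
speed, half_life = speed_and_half_life(beta=-0.0140, horizon=14)
print(f"speed = {speed:.4f} per year, half-life = {half_life:.1f} years")
```

With these inputs the result lands close to the 0.0156 per year and roughly 44.5 years reported by the library.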

Adding spillovers: the spatial Durbin model

Districts are not islands — initial luminosity and its growth are both spatially clustered (see the India case study for the LISA maps). The model switch re-estimates the same regression with spreg’s spatial family; the paper’s choice is the spatial Durbin model (SDM) on 6-nearest-neighbor weights built, like the paper, on plain lon/lat centroids (crs=None):

w = gm.make_weights(gdf, method="knn", k=6, crs=None)

sdm = gm.analyze_beta_convergence(
    df, "ntl_total", model="sdm", gdf=gdf, w=w, n_draws=2000
)

With a spatial lag in the model, the raw coefficient is no longer the answer. The convergence estimate becomes a LeSage-Pace impact with three parts: a direct part (the association between a district’s own initial level and its own growth), an indirect part (the association with the neighborhood’s initial level, i.e. the spillover), and their total. Standard errors are Monte-Carlo, from n_draws draws (2,000 here for speed; the default is 10,000). The impacts attribute of the result (sdm.impacts here) tabulates the decomposition for every regressor:

sdm.impacts
          term    direct  se_direct  indirect  se_indirect     total  se_total
0  log_initial -0.009948   0.001246 -0.007151     0.003685 -0.017099  0.003427
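
For intuition about where the direct and indirect columns come from, here is a minimal NumPy sketch of the LeSage-Pace averaging for a spatial Durbin model with a single regressor. It assumes row-standardized k-nearest-neighbor weights on made-up coordinates, and the values of rho, beta and theta are hypothetical placeholders, not the estimates behind the table. The Monte-Carlo standard errors are left out.

```python
import numpy as np


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights as a dense matrix."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)               # a unit is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.arange(n)[:, None], nearest] = 1.0 / k  # each row sums to one
    return w


def sdm_impacts(w, rho, beta, theta):
    """Average direct, indirect and total impacts of one regressor."""
    n = w.shape[0]
    multiplier = np.linalg.inv(np.eye(n) - rho * w)
    effects = multiplier @ (beta * np.eye(n) + theta * w)
    direct = np.trace(effects) / n               # mean own-unit effect
    total = effects.sum() / n                    # mean row sum
    return direct, total - direct, total


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(100, 2))   # made-up centroids
w = knn_weights(coords, k=6)

# Hypothetical coefficients, chosen only to illustrate the algebra.
rho, beta, theta = 0.7, -0.010, 0.005
direct, indirect, total = sdm_impacts(w, rho, beta, theta)
print(f"direct = {direct:.4f}, indirect = {indirect:.4f}, total = {total:.4f}")
```

With row-standardized weights the total impact collapses to (beta + theta) / (1 - rho), which is a handy check on the matrix algebra.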

Side by side with OLS — the pattern of the source paper’s Table 1:

comparison = pd.DataFrame(
    {
        "OLS": [ols.beta_total, np.nan, ols.beta_total, np.nan,
                ols.speed, ols.half_life],
        "SDM": [sdm.beta_direct, sdm.beta_indirect, sdm.beta_total, sdm.rho,
                sdm.speed, sdm.half_life],
    },
    index=["direct", "indirect", "total", "rho (spatial lag)",
           "speed (per yr)", "half-life (yr)"],
).round(4)
comparison
                       OLS      SDM
direct             -0.0140  -0.0099
indirect               NaN  -0.0072
total              -0.0140  -0.0171
rho (spatial lag)      NaN   0.7330
speed (per yr)      0.0156   0.0195
half-life (yr)     44.4788  35.4648
print(sdm.interpret())
Across 520 units, the growth of **ntl_total** over a 14-period window is associated with its initial log level with a total slope of **-0.0171** (SE 0.00343), statistically significant at the 1% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0195 per period and a half-life of about 35.5 periods — the time for half of an initial gap to close at this pace.
The SDM decomposition splits the total into a direct component of -0.00995 (own initial level) and a spillover (indirect) component of -0.00715 operating through neighboring units, under the weights: 6-nearest-neighbor (geographic centroids), row-standardized, n=520.
The spatial-lag parameter ρ = 0.733 says each unit's growth moves together with its neighbors' growth, so part of the convergence pattern is shared across space rather than purely unit-by-unit.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

The OLS β understates catch-up: once the spatial lag (ρ ≈ 0.7) and the neighbors’ initial levels enter, the total impact is larger in magnitude than the OLS slope (-0.0171 against -0.0140), and the implied convergence speed rises from 0.0156 to 0.0195 per year. Part of every district’s catch-up is associated with its neighborhood, a channel the OLS specification omits. How these impacts are computed, tested and stress-checked against other weights is the subject of the spatial spillovers article.

Mapping who grew: growth_cross_section

The same one-row-per-unit growth table the regression uses is available directly — handy for mapping the dependent variable before modelling it. It returns a plain DataFrame (entity, initial, final, growth) with the panel entity already declared, so it feeds straight into explore_choropleth_map:

cs = gm.growth_cross_section(df, "ntl_total")
cs = gm.set_labels(cs, {"growth": "NTL growth (annualized log), 1996-2010"})
gm.explore_choropleth_map(cs, "growth", gdf=gdf).fig

σ-convergence: is the gap actually narrowing?

β-convergence is necessary but not sufficient for the distribution to compress — new shocks can re-spread it even while laggards catch up. analyze_sigma_convergence tracks the cross-sectional dispersion of the log of the variable per period (the standard deviation, the Gini, the coefficient of variation) and tests the trend of the log dispersion over time.

Because dispersion is measured on logs, the series must be strictly positive — and one district (Lahul and Spiti, Himachal Pradesh) records zero luminosity in some years. Pass the full panel and geometrics refuses, telling you exactly why:

try:
    gm.analyze_sigma_convergence(df, "ntl_total")
except ValueError as err:
    print(err)
analyze_sigma_convergence: 'ntl_total' has non-positive values — dispersion is measured on log values, so the variable must be strictly positive (pass levels)

So filter to the always-positive balanced panel first, exactly as the Explore page does — dropping the offending district (all its years, keeping the panel balanced), not just the offending rows:

bad = df.loc[df["ntl_total"] <= 0, "statedist"].unique()
pos = gm.set_labels(
    df[~df["statedist"].isin(bad)].copy(), df_dict, set_panel=True
)

sigma = gm.analyze_sigma_convergence(pos, "ntl_total")
sigma.fig
print(sigma.interpret())
Across 519 units and 6 periods, the cross-sectional dispersion of **ntl_total** (standard deviation of its log) narrowed by about 1.07% per period (statistically significant at the 1% level).
A negative log-dispersion trend is σ-convergence: the units are becoming more alike.
All estimated measures (standard deviation, Gini, coefficient of variation) trend downward, so the narrowing is not an artifact of one dispersion metric.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

Both answers agree here: dimmer districts grew faster (β), and the distribution narrowed (σ). That is not guaranteed — report both.
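
The dispersion trend behind this section can be sketched in a few lines of NumPy. This is an illustration on synthetic data, not the library's code: the panel shape and the shrink rate are assumptions, and only the standard deviation of the log is computed (the Gini and the coefficient of variation are omitted).

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 300, 8                      # hypothetical panel shape
periods = np.arange(n_periods)

# Synthetic log values whose cross-sectional spread shrinks about 3% per period.
spread = np.exp(-0.03 * periods)
unit_gap = rng.normal(0.0, 1.0, size=(n_units, 1))
noise = rng.normal(0.0, 0.05, size=(n_units, n_periods))
levels = np.exp(5.0 + unit_gap * spread + noise)

# Dispersion is measured on logs, so the levels must be strictly positive.
if (levels <= 0).any():
    raise ValueError("levels must be strictly positive")

# Standard deviation of the log, one value per period.
sigma = np.log(levels).std(axis=0, ddof=1)

# Trend of log dispersion: the slope approximates the per-period rate of change.
slope, intercept = np.polyfit(periods, np.log(sigma), 1)
print(f"log-dispersion trend = {slope:.4f} per period")
```

A negative slope is the σ-convergence pattern: the simulated units become more alike over time.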

Convergence clubs: one destination, or several?

A single β can also paper over a split panel: some districts converging to a high path, others to a low one. The Phillips-Sul log(t) machinery tests whole-panel convergence and, when it is rejected, clusters the districts into data-driven convergence clubs from their relative transition paths. It runs on the same always-positive subset (the HP smoothing and relative transitions need a gap-free, strictly positive series); pass the matching geometry to get the club map. The clustering sieve refits thousands of log(t) regressions, so expect roughly a minute for the 519 districts — that is normal:

gdf_pos = gdf[~gdf["statedist"].isin(bad)].copy()

clubs = gm.analyze_convergence_clubs(pos, "ntl_total", gdf=gdf_pos)
print(
    f"global log-t = {clubs.global_tstat:.1f} (converged: {clubs.converged}) -> "
    f"{clubs.n_clubs} clubs, {clubs.n_divergent} divergent districts"
)
global log-t = -31.8 (converged: False) -> 10 clubs, 2 divergent districts

Global convergence is emphatically rejected — the districts sort into clubs, each converging to its own path. The membership map shows where those paths live:

clubs.fig
clubs.fig_map
print(clubs.interpret())
Global convergence of **ntl_total** is rejected for the 519 units (log(t) t = -31.8 ≤ -1.65): the panel is not heading toward one common path.
The Phillips-Sul clustering instead finds **10 convergence clubs** — groups whose members converge toward a shared, club-specific path while the clubs themselves stay apart.
Club sizes — Club 1: 4, Club 2: 15, Club 3: 28, Club 4: 61, Club 5: 83, Club 6: 60, Club 7: 133, Club 8: 49, Club 9: 80, Club 10: 4.
2 units fit no club (the divergent group) and follow paths of their own.
Club membership is a description of co-movement in the relative transition paths, not an explanation of why the groups differ.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
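
To make the global log(t) test less of a black box, the sketch below computes relative transition paths and the log(t) regression on two synthetic panels. It simplifies in several labeled ways: no HP smoothing, plain OLS standard errors where the original procedure uses a HAC estimator, L(t) taken as log(t + 1), the first 30% of periods discarded (r = 0.3), and no clustering sieve. The function name and both panels are hypothetical.

```python
import numpy as np


def log_t_test(panel, r=0.3):
    """Slope and t-statistic of a simplified log(t) regression.

    panel: array of shape (n_units, n_periods), strictly positive.
    """
    n_periods = panel.shape[1]
    h = panel / panel.mean(axis=0)               # relative transition paths
    H = ((h - 1.0) ** 2).mean(axis=0)            # cross-sectional variance of h
    t = np.arange(1, n_periods + 1)
    keep = t >= int(r * n_periods)               # drop the early periods
    y = np.log(H[0] / H[keep]) - 2.0 * np.log(np.log(t[keep] + 1))
    X = np.column_stack([np.ones(keep.sum()), np.log(t[keep])])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)        # plain OLS covariance (assumption)
    return coef[1], coef[1] / np.sqrt(cov[1, 1])


t = np.arange(1, 21)
gaps = np.linspace(-0.5, 0.5, 50)[:, None]

# Panel A: initial gaps shrink like 1/t, so every unit heads to one path.
converging = 100.0 * (1.0 + gaps / t)

# Panel B: two groups keep a fixed relative distance forever.
split = np.vstack([np.full((25, 20), 80.0), np.full((25, 20), 120.0)])

for name, panel in [("converging", converging), ("split", split)]:
    b, tstat = log_t_test(panel)
    verdict = "not rejected" if tstat > -1.65 else "rejected"
    print(f"{name}: b = {b:.2f}, t = {tstat:.1f} -> convergence {verdict}")
```

The split panel is the situation the club algorithm is built for: the whole-panel test rejects, yet each group on its own sits on a common path.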

Where next

  • Spatial spillovers — the full spreg suite behind model="sdm": specification diagnostics, impact inference, and robustness to the weights choice
  • The India case study — this toolkit inside the complete replication arc, on the paper’s exact per-capita growth variable
  • The data model — bring your own (gdf, df, df_dict)