The India case study

Regional growth, convergence, and spatial spillovers — a reproducible view from outer space

This article replicates and extends the analysis of “Regional growth, convergence, and spatial spillovers in India” (Mendez, Kabiraj & Li; building on Chanda & Kabiraj 2020, World Development): 520 Indian districts observed by radiance-calibrated DMSP-OLS nighttime lights between 1996 and 2010, used as a satellite proxy for economic activity.

Three questions organize everything:

  1. Convergence — do dimmer (poorer) districts grow faster than brighter ones?
  2. Spatial dependence — do neighboring districts light up together?
  3. Spillovers — does a neighborhood’s brightness help local growth?
import warnings

warnings.filterwarnings("ignore")

import geometrics as gm

gdf, df, df_dict = gm.data.load_india()
df = gm.set_labels(df, df_dict, set_panel=True)
print(f"{gdf.shape[0]} districts x {df['year'].nunique()} years; "
      f"{df_dict.shape[0]} documented variables")
520 districts x 6 years; 28 documented variables

A view from space

Total district luminosity, classified with Fisher-Jenks. The animation steps through all six satellite years with a pooled classification, so colors are comparable across frames:

gm.explore_choropleth_map(df, "ntl_total", gdf=gdf, animate=True).fig

Spatial dependence

The paper uses 6-nearest-neighbor weights built on plain lon/lat centroids. To match it, pass crs=None; the geometrics default would project the geometries first:

w = gm.make_weights(gdf, method="knn", k=6, crs=None)

lisa_initial = gm.explore_lisa_cluster_map(
    df, "log_ntl_pc_1996", gdf=gdf, w=w, period=1996
)
lisa_initial.fig
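
What "6 nearest neighbors on plain lon/lat centroids, row-standardized" amounts to can be sketched in a few lines. This is an illustrative stand-in, not the geometrics implementation: `knn_weights` and the toy centroids are hypothetical, and distances are plain Euclidean on unprojected coordinates.

```python
import math


def knn_weights(points, k):
    """Row-standardized k-nearest-neighbor weights from plain (x, y) pairs.

    Distances are Euclidean on the raw coordinates (no projection).
    Hypothetical helper for illustration only.
    """
    n = len(points)
    weights = []
    for i, (xi, yi) in enumerate(points):
        # Distance from unit i to every other unit (never to itself).
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        row = [0.0] * n
        for _, j in dists[:k]:
            row[j] = 1.0 / k  # each of the k neighbors gets equal weight
        weights.append(row)
    return weights


# Hypothetical (lon, lat) centroids for five toy districts.
centroids = [(77.0, 28.0), (77.5, 28.2), (78.0, 28.1), (80.0, 26.0), (80.3, 26.2)]
W = knn_weights(centroids, k=2)
```

Unlike contiguity weights, k-nearest-neighbor weights are generally asymmetric: in the toy data, district 3 counts district 2 among its neighbors, but district 2 does not count district 3.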

Initial luminosity is strongly clustered — the paper reports Moran’s I = 0.73:

print(f"Moran's I (initial log luminosity pc): {lisa_initial.moran_i:.2f} "
      f"(pseudo p = {lisa_initial.p_sim_global:.3f})")
print(f"High-High districts: {lisa_initial.n_hh}, "
      f"Low-Low: {lisa_initial.n_ll}")
Moran's I (initial log luminosity pc): 0.73 (pseudo p = 0.001)
High-High districts: 169, Low-Low: 101
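
As a reminder of what the global statistic measures, here is a minimal sketch of Moran's I for a given weights matrix. `morans_i` and the toy line of six units are hypothetical; the pseudo p-value reported above comes from random permutations, which this sketch leaves out.

```python
def morans_i(values, W):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]          # deviations from the mean
    s0 = sum(sum(row) for row in W)         # total weight
    cross = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)


def line_weights(n):
    """Row-standardized weights for n units on a line (adjacent units are neighbors)."""
    W = []
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        W.append([1.0 / len(nbrs) if j in nbrs else 0.0 for j in range(n)])
    return W


W_line = line_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], W_line)    # similar values sit together
alternating = morans_i([1, 5, 1, 5, 1, 5], W_line)  # neighbors always differ
```

The clustered pattern yields a clearly positive value and the alternating pattern a negative one, which is the sense in which a value of 0.73 indicates strong clustering of similar districts.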

And so is growth:

growth_lisa = gm.explore_lisa_cluster_map(
    df.query("year == 1996"), "growth_ntl_pc_9610", gdf=gdf, w=w
)
print(f"Moran's I (growth 1996-2010): {growth_lisa.moran_i:.2f} "
      f"(pseudo p = {growth_lisa.p_sim_global:.3f})")
growth_lisa.fig
Moran's I (growth 1996-2010): 0.60 (pseudo p = 0.001)

Convergence: OLS vs the spatial Durbin model

The paper’s dependent variable is the per-capita luminosity growth rate for 1996-2010, which load_india() ships verbatim. A per-capita panel for every satellite year cannot be built, because district population exists only for 1996 and 2001, so the paper’s pre-computed columns are carried unchanged. To run the paper’s exact cross-section through the panel API, rebuild a two-period panel whose 1996-2010 growth reproduces the paper’s dependent variable exactly. The reconstruction treats growth_ntl_pc_9610 as an average annual log growth rate over the 14-year horizon:

import numpy as np
import pandas as pd

HORIZON = 14  # 1996 -> 2010
base = df.query("year == 1996")[
    ["statedist", "state", "district", "ntl_pc_1996", "growth_ntl_pc_9610"]
]
paper_panel = pd.concat(
    [
        base.assign(year=1996, ntl_pc=base["ntl_pc_1996"]),
        base.assign(
            year=2010,
            ntl_pc=base["ntl_pc_1996"]
            * np.exp(HORIZON * base["growth_ntl_pc_9610"]),
        ),
    ],
    ignore_index=True,
)
paper_panel = gm.set_panel(paper_panel, entity="statedist", time="year")
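
The reconstruction works because compounding a log growth rate and then taking the annualized log difference are inverse operations. The sketch below checks this round trip on one made-up district; the values of `y0` and `g` are hypothetical, and the reading of the growth column as an average annual log difference is an assumption taken from the code above.

```python
import math

HORIZON = 14   # years between 1996 and 2010
y0 = 0.8       # hypothetical initial per-capita luminosity
g = 0.035      # hypothetical average annual log growth rate

# Forward step used to build the 2010 observation.
y_end = y0 * math.exp(HORIZON * g)

# Growth recovered from the two-period panel: annualized log difference.
g_back = (math.log(y_end) - math.log(y0)) / HORIZON
```

Up to floating-point error, `g_back` equals `g`, so a growth rate computed from the rebuilt two-period panel matches the shipped column.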

Unconditional convergence, first ignoring space, then with the SDM (the paper’s Table 1, Model 1):

ols = gm.analyze_beta_convergence(paper_panel, "ntl_pc", model="ols")
sdm = gm.analyze_beta_convergence(
    paper_panel, "ntl_pc", model="sdm", gdf=gdf, w=w, n_draws=5000
)

summary = pd.DataFrame(
    {
        "OLS": [ols.beta_total, np.nan, ols.beta_total, ols.speed, ols.half_life],
        "SDM": [sdm.beta_direct, sdm.beta_indirect, sdm.beta_total,
                sdm.speed, sdm.half_life],
    },
    index=["direct", "indirect", "total", "speed (per yr)", "half-life (yr)"],
).round(4)
summary
                    OLS      SDM
direct          -0.0199  -0.0212
indirect            NaN  -0.0005
total           -0.0199  -0.0217
speed (per yr)   0.0233   0.0259
half-life (yr)  29.7493  26.7398

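The headline finding: accounting for spatial spillovers raises the estimated speed of convergence, from 0.0233 to 0.0259 per year, and shortens the half-life from about 29.7 to 26.7 years. Part of every district’s catch-up arrives through its neighborhood (the indirect impact), a channel that OLS has no way to represent.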
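
The speed and half-life rows appear to follow from the total slope through the standard growth-regression formulas, speed = -ln(1 + beta * T) / T and half-life = ln(2) / speed, with T = 14 years. That geometrics computes them exactly this way is an assumption, but plugging in the rounded slopes from the table reproduces the printed values to within rounding. `speed_and_half_life` is a hypothetical helper.

```python
import math


def speed_and_half_life(beta, horizon):
    """Convergence speed and half-life implied by an annualized slope beta."""
    speed = -math.log(1.0 + beta * horizon) / horizon
    half_life = math.log(2.0) / speed
    return speed, half_life


# Rounded total slopes from the summary table, over the 14-year window.
ols_speed, ols_half = speed_and_half_life(-0.0199, 14)
sdm_speed, sdm_half = speed_and_half_life(-0.0217, 14)
```

Because the inputs are rounded to four decimals, the half-lives differ from the table in the first or second decimal place.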

sdm.fig
print(sdm.interpret())
Across 520 units, the growth of **ntl_pc** over a 14-period window is associated with its initial log level with a total slope of **-0.0217** (SE 0.00611), statistically significant at the 1% level.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0259 per period and a half-life of about 26.7 periods — the time for half of an initial gap to close at this pace.
The SDM decomposition splits the total into a direct component of -0.0212 (own initial level) and a spillover (indirect) component of -0.000534 operating through neighboring units, under the weights: 6-nearest-neighbor (geographic centroids), row-standardized, n=520.
The spatial-lag parameter ρ = 0.797 says each unit's growth moves together with its neighbors' growth, so part of the convergence pattern is shared across space rather than purely unit-by-unit.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

Which spatial model do the data ask for?

diag = gm.analyze_spatial_diagnostics(
    df.query("year == 1996"),
    outcome="growth_ntl_pc_9610",
    covariates=["log_ntl_pc_1996"],
    gdf=gdf,
    w=w,
)
print(diag.recommendation)
print(diag.reasoning)
diag.gt
error
At least one simple LM test rejects at alpha = 0.05, so the robust forms decide: robust LM error remains significant (statistic = 124.35, p = 7.06e-29) while robust LM lag does not (p = 0.298). The Anselin-Florax rule reads this as spatially correlated disturbances, pointing to the spatial error (SEM) model.
Spatial dependence diagnostics for NTL per capita growth (1996-2010)
OLS residual tests | W: 6-nearest-neighbor (geographic centroids), row-standardized, n=520
Test                    Statistic  df  p-value
Moran's I (residuals)      0.5831  --   0.0000
LM lag                   452.2328   1   0.0000
LM error                 575.4996   1   0.0000
Robust LM lag              1.0839   1   0.2978
Robust LM error          124.3508   1   0.0000
LM SARMA                 576.5836   2   0.0000
Anselin-Florax recommendation: error
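
The decision logic described in the reasoning text can be written as a small function. This is a simplified sketch of an Anselin-Florax style rule, not the geometrics code: `lm_decision` is hypothetical, and the tie-break used when both robust tests agree is one common convention among several.

```python
def lm_decision(p_lag, p_error, p_robust_lag, p_robust_error, alpha=0.05):
    """Pick 'ols', 'lag', or 'error' from the four LM test p-values."""
    lag_sig = p_lag < alpha
    error_sig = p_error < alpha
    if not lag_sig and not error_sig:
        return "ols"      # no evidence of spatial dependence
    if lag_sig and not error_sig:
        return "lag"
    if error_sig and not lag_sig:
        return "error"
    # Both simple tests reject, so the robust forms decide.
    robust_lag_sig = p_robust_lag < alpha
    robust_error_sig = p_robust_error < alpha
    if robust_error_sig and not robust_lag_sig:
        return "error"
    if robust_lag_sig and not robust_error_sig:
        return "lag"
    # Assumed tie-break: prefer the more significant robust test.
    return "lag" if p_robust_lag < p_robust_error else "error"


# p-values as printed in the diagnostics output above.
choice = lm_decision(
    p_lag=0.0, p_error=0.0, p_robust_lag=0.2978, p_robust_error=7.06e-29
)
```

With the printed p-values the function returns "error", matching the recommendation in the output.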

Robustness to the weights choice

The paper re-estimates its preferred SDM under seven alternative weights (notebook c07). Here are four:

alt = {
    "knn4": gm.make_weights(gdf, method="knn", k=4, crs=None),
    "knn6": w,
    "knn8": gm.make_weights(gdf, method="knn", k=8, crs=None),
    "queen": gm.make_weights(gdf, method="queen"),
}
robust = gm.analyze_spatial_model_by_weights(
    df.query("year == 1996"),
    outcome="growth_ntl_pc_9610",
    covariates=["log_ntl_pc_1996"],
    gdf=gdf,
    weights=alt,
    baseline="knn6",
    n_draws=2000,
)
robust.fig

Local convergence: GWR

Is the association between growth and initial luminosity uniform across India? Geographically weighted regression maps the local convergence coefficient:

gwr = gm.analyze_gwr(
    df.query("year == 1996"),
    outcome="growth_ntl_pc_9610",
    covariates=["log_ntl_pc_1996"],
    gdf=gdf,
)
print(f"adaptive bandwidth: {gwr.bw:.0f} neighbors; local R2 mean "
      f"{gwr.df['local_r2'].mean():.2f}")
gwr.figs["log_ntl_pc_1996"]
adaptive bandwidth: 46 neighbors; local R2 mean 0.40
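
The idea behind an adaptive bandwidth can be illustrated on a one-dimensional toy example: each location gets its own weighted regression, with weights that fall to zero at the distance of its k-th nearest neighbor. The sketch assumes a bisquare kernel and a single covariate; `local_slopes` and the toy data are hypothetical, and whether geometrics uses this kernel is not stated in the output above.

```python
def local_slopes(locations, x, y, k):
    """Local regression slope at every location (adaptive bisquare kernel)."""
    slopes = []
    for s0 in locations:
        dists = [abs(s - s0) for s in locations]
        # Adaptive bandwidth: distance to the k-th nearest other observation
        # (index 0 of the sorted list is the location itself).
        bandwidth = sorted(dists)[k]
        w = [(1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0 for d in dists]
        w_sum = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / w_sum
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_sum
        cov = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
        var = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
        slopes.append(cov / var)  # weighted least-squares slope
    return slopes


# Toy data: the true slope is -1 in the "west" half and -3 in the "east" half.
locations = list(range(20))
x = [float((i * 7) % 5 + 1) for i in locations]
true_beta = [-1.0 if s < 10 else -3.0 for s in locations]
y = [b * xi for b, xi in zip(true_beta, x)]
slopes = local_slopes(locations, x, y, k=5)
```

At the two ends of the line the local slopes recover the regional values, while locations near the boundary blend the two regimes. A single global regression would report only one compromise slope.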

Distribution dynamics

Beyond the regression slope: how does the whole distribution move? One district records zero luminosity in some years, so the log-based and relative measures below use a panel restricted to the 519 districts whose luminosity is always positive.

bad = df.loc[df["ntl_total"] <= 0, "statedist"].unique()
pos = gm.set_labels(
    df[~df["statedist"].isin(bad)].copy(), df_dict, set_panel=True
)
gdf_pos = gdf[~gdf["statedist"].isin(bad)].copy()
w_pos = gm.make_weights(gdf_pos, method="knn", k=6, crs=None)

gm.explore_distribution_over_time(pos, "ntl_total", relative=True).fig

Quintile-to-quintile mobility comes first (k=5), followed by mobility conditioned on the neighborhood, which uses four classes (k=4):

mk = gm.analyze_markov_transitions(pos, "ntl_total", k=5, relative=True)
print(f"Shorrocks mobility: {mk.shorrocks:.2f} "
      f"(diagonal persistence {np.diag(mk.p).mean():.2f})")
mk.fig
Shorrocks mobility: 0.21 (diagonal persistence 0.83)
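
The two printed numbers are linked if the mobility index is the usual Shorrocks (Prais) measure, (k - trace(P)) / (k - 1). That definition is assumed here rather than confirmed from the library; `shorrocks_index` and the stylized matrix are hypothetical.

```python
def shorrocks_index(P):
    """Shorrocks mobility index: 0 for no movement, larger for more mobility."""
    k = len(P)
    trace = sum(P[i][i] for i in range(k))
    return (k - trace) / (k - 1)


# Stylized 5-state matrix with the reported mean diagonal of 0.83;
# the off-diagonal mass is spread evenly purely for illustration.
k, stay = 5, 0.83
P = [[stay if i == j else (1 - stay) / (k - 1) for j in range(k)] for i in range(k)]
mobility = shorrocks_index(P)

# An identity matrix (nobody ever changes class) gives zero mobility.
identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
no_mobility = shorrocks_index(identity)
```

With five states and a mean diagonal of 0.83 the index is 0.2125, consistent with the printed 0.21.
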
smk = gm.analyze_spatial_markov(pos, "ntl_total", gdf=gdf_pos, w=w_pos, k=4)
print(f"Homogeneity LR test: {smk.lr_stat:.1f} (p = {smk.lr_p:.2g}) — "
      "transition dynamics differ by neighborhood")
smk.fig
Homogeneity LR test: 73.6 (p = 6.3e-07) — transition dynamics differ by neighborhood
print(smk.interpret())
The spatial Markov chain splits **ntl_total**'s 4-state transition matrix by the neighbors' position (spatial lag under 6-nearest-neighbor (geographic centroids), row-standardized, n=519), giving one matrix per neighborhood class (4 classes); values were expressed relative to each period's mean first.
The unconditional matrix keeps regions in place with average probability 0.871.
Conditioning on context, regions surrounded by **low-value neighbors** stay in place with average probability 0.837, while regions surrounded by **high-value neighbors** do so with 0.896 — movement is more common in low-value neighborhoods.
The homogeneity tests (LR = 73.6, p = 6.26e-07; Q = 73.5, p = 6.3e-07; 24 degrees of freedom) are statistically significant at the 1% level: transition dynamics **differ across neighborhood contexts** — a region's mobility is associated with the state of its neighbors.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

Regional inequality

σ-convergence and the between/within split — how much of district inequality is between states?

sigma = gm.analyze_sigma_convergence(pos, "ntl_total")
sigma.fig
theil = gm.analyze_theil_decomposition(pos, "ntl_total", "state")
theil.fig
print(theil.interpret())
The Theil index of **ntl_total** splits additively into inequality **between** the 28 state groups and inequality **within** them (between + within = total, exactly).

In the latest period (2010), the total Theil index is 0.504: about 59% of it lies between state groups and 41% within them — differences across state means are the dominant layer of inequality.

Over the window, the between-group share fell from 62% to 59% — group means are pulling closer together relative to the differences inside groups.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
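
The exact additivity mentioned in the interpretation is a property of the Theil T index. The sketch below uses the unweighted form, in which every district counts equally. Whether geometrics weights districts in some other way is not shown here, so treat the formulas as an assumption; `theil`, `theil_decomposition`, and the toy groups are hypothetical.

```python
import math


def theil(values):
    """Theil T index of strictly positive values."""
    mu = sum(values) / len(values)
    return sum((v / mu) * math.log(v / mu) for v in values) / len(values)


def theil_decomposition(groups):
    """Return (total, between, within) for a dict of group -> values."""
    all_values = [v for vals in groups.values() for v in vals]
    n = len(all_values)
    mu = sum(all_values) / n
    between = within = 0.0
    for vals in groups.values():
        mu_g = sum(vals) / len(vals)
        share = (len(vals) / n) * (mu_g / mu)   # group's share of the total
        between += share * math.log(mu_g / mu)  # gap between group means
        within += share * theil(vals)           # inequality inside the group
    return theil(all_values), between, within


# Two toy "states" with very different mean luminosity.
toy = {"A": [10.0, 12.0, 14.0], "B": [2.0, 3.0, 4.0, 3.0]}
total, between, within = theil_decomposition(toy)
between_share = between / total
```

The index takes logarithms of each value relative to the mean, so it requires strictly positive inputs. That requirement is the reason for using the always-positive panel here.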

Convergence clubs

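Finally, the Phillips-Sul log(t) test asks whether all districts share one steady-state path or sort into clubs. Its null hypothesis is convergence, rejected at the 5% level when the one-sided t-statistic falls below -1.65: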

clubs = gm.analyze_convergence_clubs(pos, "ntl_total", gdf=gdf_pos)
print(f"{clubs.n_clubs} clubs, {clubs.n_divergent} divergent districts "
      f"(whole-panel log-t = {clubs.global_tstat:.1f})")
clubs.fig_map
10 clubs, 2 divergent districts (whole-panel log-t = -31.8)

Sources

  • Mendez, C., Kabiraj, S., & Li, J. Regional growth, convergence, and spatial spillovers in India: A reproducible view from outer space (repository and interactive manuscript).
  • Chanda, A., & Kabiraj, S. (2020). Shedding light on regional growth and convergence in India. World Development, 133.
  • Data: DMSP-OLS radiance-calibrated nighttime lights (NOAA/NGDC), district boundaries from the 2001 Census geography.