Regional inequality

Gini and Theil over time, the spatial Gini, and the between/within split

Convergence asks whether poor regions catch up; inequality analysis asks how far apart regions are right now, and whether that gap is closing. This article tracks regional inequality in the bundled Indian district panel — 520 districts observed by DMSP-OLS nighttime lights, 1996–2010 — with the PySAL inequality stack: level measures over time, the Rey–Smith spatial Gini, and the Theil between/within decomposition by state.

import warnings

warnings.filterwarnings("ignore")

import geometrics as gm

gdf, df, df_dict = gm.data.load_india()
df = gm.set_labels(df, df_dict, set_panel=True)

# log-based measures need strictly positive values (see below)
bad = df.loc[df["ntl_total"] <= 0, "statedist"].unique()
pos = gm.set_labels(
    df[~df["statedist"].isin(bad)].copy(), df_dict, set_panel=True
)
gdf_pos = gdf[~gdf["statedist"].isin(bad)].copy()
w_pos = gm.make_weights(gdf_pos, method="knn", k=6, crs=None)

Gini or Theil?

The Gini index compares every pair of regions: it is the mean absolute difference between two randomly drawn units, scaled by twice the mean, running from 0 (everyone equal) to (nearly) 1 (one unit holds everything). It is the most widely reported inequality measure, robust to how you group the data, and most sensitive to transfers around the middle of the distribution — but it does not decompose cleanly into parts.
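As a minimal sketch of that definition (plain NumPy, not the PySAL or geometrics implementation; the function name `gini_pairwise` is purely illustrative), the Gini can be computed directly as the mean absolute difference over all pairs divided by twice the mean:

```python
import numpy as np


def gini_pairwise(x):
    """Gini as the mean absolute pairwise difference, scaled by twice the mean."""
    x = np.asarray(x, dtype=float)
    # |x_i - x_j| for every ordered pair (i, j), including i == j
    mean_abs_diff = np.abs(x[:, None] - x[None, :]).mean()
    return float(mean_abs_diff / (2.0 * x.mean()))


equal = gini_pairwise([5.0, 5.0, 5.0, 5.0])         # everyone equal
concentrated = gini_pairwise([0.0, 0.0, 0.0, 1.0])  # one unit holds everything
print(equal, concentrated)  # 0.0 0.75
```

With n units the upper bound under this formula is (n - 1)/n, which is why the maximum is "nearly" 1: four units top out at 0.75, and 519 districts at about 0.998. The pairwise matrix is O(n²) in memory, so this form suits a few thousand units at most.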

The Theil index is an entropy measure: it weighs each region’s share of the total by the log of that share relative to an equal split. It is more sensitive to the top of the distribution, and its killer feature is exact additive decomposability — total inequality splits into a between-group plus a within-group component, with nothing left over. That is what makes it the workhorse for regional hierarchies like districts within states:

print(gm.explain("theil_decomposition").to_markdown()[:600], "...")

### Theil between/within decomposition

**What it is.** For any grouping of units (states, macro-regions, coastal/interior), the Theil index decomposes *exactly* into a **between-group** term — the inequality that would remain if every unit were replaced by its group mean — and a **within-group** term, the group-size-weighted average of inequality inside each group. The **between share** (between/total) is the headline: a high share says the story is group membership (geography, in regional applications); a low share says most inequality lives inside groups. Tracked over time, the shares revea ...

Strictly positive, or geometrics tells you why not

The Theil index takes logarithms of shares, so it is undefined at zero. One Indian district (Lahul and Spiti, up in the Himalayas) records zero luminosity in some years — and instead of a cryptic numpy warning, geometrics raises a ValueError that names the offenders:

try:
    gm.analyze_inequality_over_time(
        df, "ntl_total", measures=("gini", "theil", "cv")
    )
except ValueError as err:
    print(err)

analyze_inequality_over_time: the Theil index needs strictly positive 'ntl_total' values, but 3 row(s) are <= 0 (entities: ['Himachal PradeshLahul and Spiti'])

That is why the first code chunk dropped that district, leaving the always-positive panel (pos, 519 districts), exactly as the Explore page does, and rebuilt the geometry and weights to match.
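The reason zeros are fatal is visible in the formula itself. The sketch below is an illustration in plain NumPy, not the geometrics source: `theil_strict` and its `labels` argument are hypothetical names, and the error text only imitates the style of the message above. It computes the Theil index from shares and refuses non-positive inputs up front:

```python
import numpy as np


def theil_strict(x, labels=None):
    """Theil T index: sum of s_i * ln(n * s_i), with s_i the share of unit i."""
    x = np.asarray(x, dtype=float)
    bad = np.flatnonzero(x <= 0)
    if bad.size:
        # Name the offending units instead of letting log(0) produce nan/-inf
        names = [str(labels[i]) for i in bad] if labels is not None else bad.tolist()
        raise ValueError(
            f"Theil index needs strictly positive values, "
            f"but {bad.size} value(s) are <= 0 (entities: {names})"
        )
    shares = x / x.sum()
    return float(np.sum(shares * np.log(shares * x.size)))


message = ""
try:
    theil_strict([4.0, 0.0, 6.0], labels=["A", "B", "C"])
except ValueError as err:
    message = str(err)
    print(message)

equal_split = theil_strict([2.0, 2.0, 2.0])  # equal shares: every log term is ln(1)
```

An equal split gives zero because every share equals 1/n, so each term is a share times ln(1).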

Three measures, one trend test

analyze_inequality_over_time computes each requested measure per period, then regresses the log of each measure on time (OLS, HC1 standard errors). A negative, statistically significant slope means inequality is narrowing, the inequality-side counterpart of σ-convergence:

ineq = gm.analyze_inequality_over_time(
    pos, "ntl_total", measures=("gini", "theil", "cv")
)
ineq.df.round(3)

	time	n_units	gini	theil	cv
0	1996	519	0.546	0.527	1.180
1	1999	519	0.559	0.556	1.238
2	2000	519	0.527	0.485	1.114
3	2004	519	0.546	0.529	1.221
4	2005	519	0.541	0.518	1.209
5	2010	519	0.536	0.504	1.178

ineq.fig

The trend table makes the verdict explicit:

ineq.gt

Inequality trends: Total NTL
6 periods, 519 units
	Trend (per period)	Std. error	p-value	R²	Narrowing
Gini index	-0.001305	0.001128	0.247	0.11	no
Theil index	-0.002901	0.002643	0.272	0.0995	no
Coefficient of variation	0.0006457	0.002361	0.784	0.00751	no
Trend = OLS slope of ln(measure) on time with HC1 standard errors. A negative, significant slope means cross-sectional inequality is narrowing over time.

print(ineq.interpret())

Cross-sectional inequality in **ntl_total** is tracked across 519 units over 6 periods (1996 to 2010).

The Gini index fell from 0.546 to 0.536 between the first and last period.

The Theil index fell from 0.527 to 0.504 between the first and last period.

The coefficient of variation was essentially unchanged at 1.18 between the first and last period.

The log-trend of the Gini index is -0.0013 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.

The log-trend of the Theil index is -0.0029 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.

The log-trend of the coefficient of variation is 0.000646 per period (not statistically significant at conventional levels), so the movement is not distinguishable from a flat trend.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

All three measures agree: district-level luminosity inequality is high (Gini around 0.54) and essentially flat over 1996–2010 — no narrowing the trend test can distinguish from noise.
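The trend test can be sketched by hand from the rounded Gini column printed earlier. This is an illustration under stated assumptions, not the geometrics implementation: it assumes the regressor is the calendar year and that the p-value uses a normal reference distribution, and `log_trend_hc1` is a hypothetical name.

```python
import math

import numpy as np


def log_trend_hc1(times, values):
    """OLS slope of ln(value) on time, with an HC1 (White, small-sample) standard error."""
    t = np.asarray(times, dtype=float)
    y = np.log(np.asarray(values, dtype=float))
    n = t.size
    # Centering time leaves the slope and its variance unchanged but conditions X'X better
    X = np.column_stack([np.ones(n), t - t.mean()])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 [X' diag(e^2) X] (X'X)^-1, scaled by n / (n - k)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - X.shape[1])
    slope = float(beta[1])
    se = math.sqrt(float(cov[1, 1]))
    # Two-sided p-value from the standard normal (an assumption about the reference law)
    p_value = math.erfc(abs(slope / se) / math.sqrt(2.0))
    return slope, se, p_value


years = [1996, 1999, 2000, 2004, 2005, 2010]
gini = [0.546, 0.559, 0.527, 0.546, 0.541, 0.536]  # rounded values from the table above
slope, se, p_value = log_trend_hc1(years, gini)
print(f"slope={slope:.6f}  se={se:.6f}  p={p_value:.3f}")
```

Run on these rounded inputs, the sketch should land close to the Gini row of the trend table (a slope of roughly -0.0013 and a p-value around 0.25), with small differences from rounding. With only six periods, any such test has little power, so "not distinguishable from flat" is the cautious reading.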

The spatial Gini: inequality between neighbors

The Gini is a sum over pairs of regions, and every pair is either a neighbor pair or a non-neighbor pair under a spatial weights matrix. Rey & Smith (2013) split it exactly along that line. Pass the geometry and weights and analyze_inequality_over_time adds the decomposition per period:

sg = gm.analyze_inequality_over_time(
    pos,
    "ntl_total",
    measures=("gini", "theil", "cv"),
    gdf=gdf_pos,
    w=w_pos,
)
sg.df[["time", "n_units", "gini", "gini_spatial", "gini_spatial_p"]].round(4)

	time	n_units	gini	gini_spatial	gini_spatial_p
0	1996	519	0.5463	0.0039	0.01
1	1999	519	0.5593	0.0040	0.01
2	2000	519	0.5274	0.0036	0.01
3	2004	519	0.5456	0.0039	0.01
4	2005	519	0.5414	0.0039	0.01
5	2010	519	0.5363	0.0039	0.01

gini_spatial is the component of the overall Gini owed to differences between neighboring districts. Here it is about 0.7% of a Gini of ~0.54, so almost all pairwise inequality lives between districts that are not neighbors. Part of that is by construction: with 6 neighbors per district among 519, neighbor pairs make up only about 1.2% of all pairs. The neighbor component is smaller still, and that is the informative part: neighbors resemble each other, and the big gaps are long-distance gaps. gini_spatial_p is a permutation pseudo p-value (99 permutations by default) testing whether the non-neighbor component exceeds what spatial randomness would produce. At p = 0.01 in every year, the smallest value 99 permutations can return, the spatial structure of inequality is no accident. The w_spec field records the weights used:

print(sg.w_spec)

6-nearest-neighbor (geographic centroids), row-standardized, n=519
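The neighbor/non-neighbor split described above can be sketched in a few lines of NumPy. This follows the idea of the Rey and Smith decomposition under stated assumptions: a binary adjacency matrix, both components scaled like the plain Gini, and a one-sided permutation test on the non-neighbor component. The function names are hypothetical, and the normalization inside PySAL or geometrics may differ.

```python
import numpy as np


def spatial_gini_split(x, adj):
    """Split the Gini's pairwise sum into neighbor and non-neighbor components."""
    x = np.asarray(x, dtype=float)
    adj = np.asarray(adj, dtype=bool)
    diffs = np.abs(x[:, None] - x[None, :])  # every ordered pair (i, j)
    denom = 2.0 * x.size**2 * x.mean()       # same scaling as the plain Gini
    neighbor = diffs[adj].sum() / denom
    non_neighbor = diffs[~adj].sum() / denom  # the diagonal contributes zero
    return float(neighbor), float(non_neighbor)


def non_neighbor_pvalue(x, adj, permutations=99, seed=0):
    """Share of random relabelings whose non-neighbor component is at least as large."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = spatial_gini_split(x, adj)[1]
    hits = sum(
        spatial_gini_split(rng.permutation(x), adj)[1] >= observed
        for _ in range(permutations)
    )
    return (hits + 1) / (permutations + 1)


# Toy map: 10 units on a line, each linked to its immediate neighbors
n = 10
adj = np.zeros((n, n), dtype=bool)
idx = np.arange(n - 1)
adj[idx, idx + 1] = adj[idx + 1, idx] = True
x = np.arange(1.0, n + 1.0)  # a smooth gradient, so neighbors are similar

neighbor, non_neighbor = spatial_gini_split(x, adj)
p_value = non_neighbor_pvalue(x, adj)
print(neighbor, non_neighbor, p_value)
```

The two components add up to the ordinary Gini because every pair is counted exactly once. On a smooth gradient the neighbor component is small and random relabelings almost never push more inequality into non-neighbor pairs, which is the same pattern the district table shows.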

How much inequality is between states?

Districts nest inside states, so the Theil index splits exactly into a between-state component (inequality across state means) and a within-state component (inequality among districts inside each state). With permutations=99, districts are randomly reassigned to states and p_between reports how often a random partition captures a between share at least as large:

theil = gm.analyze_theil_decomposition(
    pos, "ntl_total", "state", permutations=99
)
theil.df.round(4)

	time	theil	between	within	between_share	p_between
0	1996	0.5270	0.3281	0.1989	0.6225	0.01
1	1999	0.5556	0.3471	0.2085	0.6247	0.01
2	2000	0.4852	0.3089	0.1763	0.6367	0.01
3	2004	0.5287	0.3200	0.2087	0.6052	0.01
4	2005	0.5183	0.3121	0.2062	0.6021	0.01
5	2010	0.5041	0.2990	0.2051	0.5931	0.01

theil.fig

print(theil.interpret())

The Theil index of **ntl_total** splits additively into inequality **between** the 28 state groups and inequality **within** them (between + within = total, exactly).

In the latest period (2010), the total Theil index is 0.504: about 59% of it lies between state groups and 41% within them — differences across state means are the dominant layer of inequality.

Over the window, the between-group share fell from 62% to 59% — group means are pulling closer together relative to the differences inside groups.

The permutation test on the between component (p = 0.01) indicates the observed grouping captures more inequality than random reassignments of units to groups typically do.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._

About 60% of district luminosity inequality is a between-state phenomenon, and that share drifted down (62% → 59%) over the window — state means pulled slightly closer together while within-state gaps held. The permutation p-values (0.01 in every period) say the state partition captures far more inequality than chance groupings do: in India, geography — which state you are in — is the dominant layer of regional inequality.
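The between/within split and its permutation test can be sketched on toy data. This is an illustration, not the geometrics code: the function names are hypothetical, the data are simulated, and the within term is weighted by each group's share of the total, the usual form for the Theil T index (whether geometrics weights it exactly this way is an assumption).

```python
import numpy as np


def theil(x):
    """Theil T index of a strictly positive array."""
    shares = x / x.sum()
    return float(np.sum(shares * np.log(shares * x.size)))


def theil_between_within(x, groups):
    """Exact additive split: between-group plus share-weighted within-group Theil."""
    total, n = x.sum(), x.size
    between = within = 0.0
    for g in np.unique(groups):
        xg = x[groups == g]
        share = xg.sum() / total  # group's share of the total
        between += share * np.log(share / (xg.size / n))
        within += share * theil(xg)
    return float(between), float(within)


def between_pvalue(x, groups, permutations=99, seed=0):
    """How often a random reassignment of units to groups matches the observed between term."""
    rng = np.random.default_rng(seed)
    observed = theil_between_within(x, groups)[0]
    hits = sum(
        theil_between_within(x, rng.permutation(groups))[0] >= observed
        for _ in range(permutations)
    )
    return (hits + 1) / (permutations + 1)


# Simulated toy data: one dim group and one bright group of 8 units each
rng = np.random.default_rng(42)
x = np.concatenate([rng.uniform(1, 2, 8), rng.uniform(9, 12, 8)])
groups = np.repeat(["dim", "bright"], 8)

between, within = theil_between_within(x, groups)
total = theil(x)
between_share = between / total
p_between = between_pvalue(x, groups)
print(round(between_share, 3), p_between)
```

Shuffling the labels keeps group sizes fixed, so the test asks only whether this particular assignment of units to groups matters. Because between and within sum to the total exactly, the between share is a clean headline number.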

A lightweight companion: 32 states, one year

load_india_states() ships a small state-level cross-section (32 states and union territories, corrected DMSP-OLS lights over gridded population, 1992). With a single year there is no trend to test, since analyze_inequality_over_time needs at least two periods, but a one-off snapshot takes only a few lines:

import pandas as pd
from inequality.gini import Gini
from inequality.theil import Theil

gdf_s, df_s, dict_s = gm.data.load_india_states()
df_s = gm.set_labels(df_s, dict_s, set_panel=True)

y = df_s["ntl_pc"].to_numpy(dtype=float)
pd.DataFrame(
    {
        "measure": ["Gini", "Theil", "CV"],
        "value": [Gini(y).g, Theil(y).T, y.std(ddof=1) / y.mean()],
    }
).round(3)

	measure	value
0	Gini	0.393
1	Theil	0.248
2	CV	0.751

State-level inequality (Gini 0.39) is well below district-level inequality (0.54): aggregation averages away the within-state gaps, which is precisely the between/within story again. The map shows where the bright and dark states are:

gm.explore_choropleth_map(df_s, "log_ntl_pc", gdf=gdf_s, period=1992).fig

Where next

Distribution dynamics — the same panel through Markov and spatial Markov chains: who moves within the distribution, and does the neighborhood condition mobility?
The India case study — the full replication arc, with σ-convergence and this Theil decomposition in context
gm.explain("gini"), gm.explain("theil_index"), gm.explain("theil_decomposition") — the concept explainers quoted above