The Bolivia dataset

PWT-anchored local GDP at three spatial scales, 2012-2022

geometrics ships a second case study: BOL-005popAdj-PWTscaled, a Bolivia subnational GDP product built from the 0.25° gridded estimates of Rossi-Hansberg & Zhang (2026) under their most aggressive low-density censoring (0_05). The estimates are rescaled so that the national totals exactly equal Penn World Table 11.0 (rgdpo and pop). GDP is therefore expressed in interpretable 2021 PPP US dollars and population matches the PWT national count, while the model’s relative spatial pattern is preserved exactly. The collection is delivered at three analysis scales for 2012–2022, with GADM 4.10 boundaries: departments (ADM1, n=9), provinces (ADM2, n=112), and the raw grid cells (n=1,603).

If you use it, cite the underlying estimates, the benchmark, and the boundaries: Rossi-Hansberg & Zhang (2026, J. Urban Economics 154), Feenstra, Inklaar & Timmer (2015, AER 105(10); PWT 11.0), and GADM 4.10. The full method (with the rescaling math) is documented in datasets/BOL-005popAdj-PWTscaled/README.md.
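
The rescaling idea can be sketched in a few lines. This is only an illustration, assuming a single proportional factor per year (the national benchmark divided by the sum of the raw estimates). The function name `rescale_to_benchmark`, the cell labels, and all numbers are hypothetical; the README remains the authoritative description of the method.

```python
def rescale_to_benchmark(raw, benchmark_total):
    """Scale values so they sum to benchmark_total while keeping relative shares."""
    factor = benchmark_total / sum(raw.values())
    return {cell: value * factor for cell, value in raw.items()}


# Hypothetical raw model output for three cells in one year (arbitrary units)
raw_gdp = {"cell_a": 2.0, "cell_b": 5.0, "cell_c": 3.0}
pwt_total = 120_000.0  # hypothetical national benchmark for that year

scaled = rescale_to_benchmark(raw_gdp, pwt_total)

print(sum(scaled.values()))                    # matches the benchmark total
print(scaled["cell_b"] / scaled["cell_a"])     # same ratio as in the raw data
```

Because every cell is multiplied by the same factor, ratios between cells are unchanged, which is the sense in which the relative spatial pattern is preserved.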

Three scales, one contract

Each loader returns the usual geometrics trio — ID-only geometry, long panel, data dictionary:

import warnings

warnings.filterwarnings("ignore")

import geometrics as gm

gdf, df, df_dict = gm.data.load_bolivia()          # 112 provinces (ADM2)
df = gm.set_labels(df, df_dict, set_panel=True)
print(f"provinces: {gdf.shape[0]} polygons, panel {df.shape}")

gdf1, df1, dd1 = gm.data.load_bolivia_departments()  # 9 departments (ADM1)
df1 = gm.set_labels(df1, dd1, set_panel=True)

provinces: 112 polygons, panel (1177, 21)

The dictionaries ship with the data and document every column, including the scaling provenance:

df_dict[df_dict["var_name"].isin(["gid", "year", "gdp_pwt", "gdppc", "ln_gdppc"])]

	var_name	var_def	label	type	role	can_be_na
1	gid	GADM region code at this level (GID_1 for adm1...	Region code (GID)	entity	NaN	False
8	year	Calendar year of the estimate (2012-2022).	Year	time	NaN	False
14	gdp_pwt	GDP rescaled to PWT; national sum per year equ...	GDP rescaled to PWT (mil. 2021 US$ PPP)	numeric	NaN	False
16	gdppc	gdp_pwt / pop_pwt, 2021 US$/person; 0 for cens...	GDP per capita (2021 US$ PPP)	numeric	NaN	False
17	ln_gdppc	Natural log of gdppc; missing where gdppc <= 0...	Log GDP per capita	numeric	outcome	False

One data fact to know: five provinces have boundary polygons but no panel rows, because all of their grid cells fall below the 0_05 population-density censoring threshold. The province panel therefore covers 107 of the 112 polygons (107 provinces × 11 years = 1,177 rows). geometrics’ alignment machinery warns about them and carries on; the warnings below are expected, not a bug.
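
A quick way to see which polygons lack panel rows is a set difference on the ID key. The sketch below uses made-up ID lists in place of the real tables; the helper `ids_without_rows` and the ID strings are hypothetical. With the real objects, the analogous check would compare the `gid` values of `gdf` and `df`, assuming both expose `gid` as a column.

```python
def ids_without_rows(polygon_ids, panel_ids):
    """Return polygon IDs that never appear in the panel, in polygon order."""
    seen = set(panel_ids)
    return [gid for gid in polygon_ids if gid not in seen]


# Hypothetical stand-ins for the geometry IDs and the long-panel IDs
polygon_ids = ["BOL.1.1_1", "BOL.1.2_1", "BOL.1.3_1", "BOL.2.1_1"]
panel_ids = ["BOL.1.1_1", "BOL.1.1_1", "BOL.1.3_1", "BOL.2.1_1"]

missing = ids_without_rows(polygon_ids, panel_ids)
print(missing)
```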

The map

gm.explore_choropleth_map(df, "ln_gdppc", gdf=gdf, period=2022).fig

Convergence across provinces, 2012-2022

Did poorer provinces grow faster? First aspatially, then with the spatial Durbin model on queen-contiguity weights:

w = gm.make_weights(gdf)  # queen contiguity, row-standardized

ols = gm.analyze_beta_convergence(df, "gdppc", model="ols")
sdm = gm.analyze_beta_convergence(
    df, "gdppc", model="sdm", gdf=gdf, w=w, n_draws=2000
)
print(
    f"OLS beta: {ols.beta_total:.4f} (half-life {ols.half_life:.0f} yr)\n"
    f"SDM total: {sdm.beta_total:.4f} = direct {sdm.beta_direct:.4f} "
    f"+ indirect {sdm.beta_indirect:.4f} (rho = {sdm.rho:.2f})"
)

OLS beta: -0.0091 (half-life 72 yr)
SDM total: -0.0121 = direct -0.0091 + indirect -0.0030 (rho = 0.28)

sdm.fig

print(sdm.interpret())

Across 107 units, the growth of **gdppc** over a 10-period window is associated with its initial log level with a total slope of **-0.0121** (SE 0.00786), not statistically significant at conventional levels.
The slope is negative — the β-convergence pattern: units that started lower tended to grow faster, narrowing initial gaps.
That slope implies a convergence speed of λ = 0.0129 per period and a half-life of about 53.7 periods — the time for half of an initial gap to close at this pace.
The SDM decomposition splits the total into a direct component of -0.00909 (own initial level) and a spillover (indirect) component of -0.00303 operating through neighboring units, under the weights: queen contiguity, row-standardized, n=112.
The spatial-lag parameter ρ = 0.279 says each unit's growth moves together with its neighbors' growth, so part of the convergence pattern is shared across space rather than purely unit-by-unit.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
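
The speed and half-life quoted above can be checked by hand. The sketch below assumes the textbook mapping from an annualized slope β over a window of T periods, λ = −ln(1 + βT)/T, with half-life ln 2/λ. Whether geometrics uses exactly this formula internally is an assumption, although it does reproduce the reported figures for the SDM total slope.

```python
import math


def convergence_speed(beta, periods):
    """Convergence speed implied by an annualized slope over a window of `periods`."""
    return -math.log(1.0 + beta * periods) / periods


def half_life(speed):
    """Periods needed to close half of an initial gap at the given speed."""
    return math.log(2.0) / speed


beta_total = -0.0121  # SDM total slope reported above
window = 10           # 2012-2022

lam = convergence_speed(beta_total, window)
hl = half_life(lam)
print(f"lambda = {lam:.4f}, half-life = {hl:.1f} periods")
# prints: lambda = 0.0129, half-life = 53.7 periods
```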

How much inequality lies between departments?

The province panel carries its parent department (name1), so the Theil between/within decomposition is one call:

theil = gm.analyze_theil_decomposition(df, "gdppc", "name1")
theil.fig

print(theil.interpret())

The Theil index of **gdppc** splits additively into inequality **between** the 9 name1 groups and inequality **within** them (between + within = total, exactly).

In the latest period (2022), the total Theil index is 0.0153: about 5% of it lies between name1 groups and 95% within them — most inequality plays out among units inside the same name1.

Over the window, the between-group share fell from 6% to 5% — group means are pulling closer together relative to the differences inside groups.

_These are associations, not causal effects. A causal reading needs a research design — see `explain('correlation_vs_causation')`._
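
To make the additivity claim concrete, here is a minimal Theil T decomposition in plain Python. It assumes the unweighted form in which every unit counts equally; geometrics may weight units differently (for example by population), so treat this as an illustration of the identity rather than a reimplementation. The values and group labels are invented.

```python
import math
from collections import defaultdict


def theil_t(values):
    """Unweighted Theil T index of strictly positive values."""
    mu = sum(values) / len(values)
    return sum((v / mu) * math.log(v / mu) for v in values) / len(values)


def theil_decomposition(values, groups):
    """Split Theil T into (between, within) components by group label."""
    n = len(values)
    mu = sum(values) / n
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    between = within = 0.0
    for members in by_group.values():
        mu_g = sum(members) / len(members)
        share = (len(members) / n) * (mu_g / mu)  # group's share of the total
        between += share * math.log(mu_g / mu)
        within += share * theil_t(members)
    return between, within


# Hypothetical per-capita values for six units in three groups
values = [4000.0, 5000.0, 6000.0, 9000.0, 11000.0, 7000.0]
groups = ["A", "A", "A", "B", "B", "C"]

total = theil_t(values)
between, within = theil_decomposition(values, groups)
print(f"total={total:.4f} between={between:.4f} within={within:.4f}")
print(f"between share = {between / total:.0%}")
```

The between term compares group means with the overall mean, and the within term is a share-weighted average of each group's own Theil index, which is why the two sum to the total.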

Down to the grid

The raw 0.25° cells behind the admin aggregates — useful when administrative boundaries are themselves part of the question:

gdfg, dfg, ddg = gm.data.load_bolivia_grid()
dfg = gm.set_labels(dfg, ddg, set_panel=True)
print(f"cells: {gdfg.shape[0]}, panel {dfg.shape}")

gm.explore_choropleth_map(dfg, "ln_gdppc", gdf=gdfg, period=2022, tiles=None).fig

cells: 1603, panel (17633, 31)

And spatial structure at cell level — LISA clusters of log GDP per capita:

wg = gm.make_weights(gdfg)  # queen on the grid = rook + corners
lisa = gm.explore_lisa_cluster_map(dfg, "ln_gdppc", gdf=gdfg, w=wg, period=2022)
print(f"Moran's I = {lisa.moran_i:.2f} (p = {lisa.p_sim_global:.3f}); "
      f"HH cells: {lisa.n_hh}, LL cells: {lisa.n_ll}")
lisa.fig

Moran's I = 0.28 (p = 0.001); HH cells: 156, LL cells: 275
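
What "queen on the grid = rook + corners" means, and how a global Moran's I follows from row-standardized weights, can be sketched without any spatial library. The example assumes a complete rectangular grid and a synthetic gradient surface; the real Bolivia grid is irregular, and the inference (permutation p-values, LISA cluster labels) is not reproduced here.

```python
def queen_neighbors(n_rows, n_cols):
    """Map each (row, col) cell to the cells sharing an edge or a corner with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors[(r, c)] = [
                (r + dr, c + dc)
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
                and 0 <= r + dr < n_rows
                and 0 <= c + dc < n_cols
            ]
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized weights (each row sums to 1)."""
    mean = sum(values.values()) / len(values)
    z = {cell: v - mean for cell, v in values.items()}
    # With row-standardized weights the spatial lag is the neighbors' average,
    # and the n / S0 factor equals 1.
    cross = sum(
        z[cell] * sum(z[j] for j in nbrs) / len(nbrs)
        for cell, nbrs in neighbors.items()
    )
    return cross / sum(d * d for d in z.values())


nbrs = queen_neighbors(4, 4)
surface = {(r, c): float(r + c) for r in range(4) for c in range(4)}  # smooth gradient

print(len(nbrs[(0, 0)]), len(nbrs[(0, 1)]), len(nbrs[(1, 1)]))  # corner, edge, interior
print(f"Moran's I = {morans_i(surface, nbrs):.2f}")
```

A smooth surface gives a positive statistic because each cell's deviation from the mean has the same sign as the average deviation of its neighbors.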

Where next

Beta and sigma convergence — the convergence toolkit in depth
Spatial spillovers — diagnostics, the spreg suite, robustness
The data model — bring your own (gdf, df, df_dict)
The other bundled case study: India