analyze_theil_decomposition
analyze_theil_decomposition(
    df,
    var,
    group,
    *,
    entity=None,
    time=None,
    permutations=0,
    seed=12345,
    title=None,
)

Decompose the Theil index between and within a group partition, per period.
For every period, the Theil index of `var` across units is split additively (`inequality.theil.TheilD`) into a between-group component (inequality across the mean levels of the `group` partition, e.g. states) and a within-group component (inequality among units inside each group), so that `theil = between + within` holds exactly. The `between_share` tracks how much of total inequality is a group-level phenomenon.

With `permutations > 0`, the between component gets a permutation pseudo p-value (`inequality.theil.TheilDSim`): units are randomly reassigned to groups, and `p_between` reports how often a random partition yields a between share at least as large as the observed one.
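The additive split can be written out explicitly. The following assumes the standard Theil T definition on shares; the symbols ($y_i$ for the value of unit $i$, $n$ for the number of units, $n_g$ for the size of group $g$, $T_B$ and $T_W$ for the two components) are notation introduced here for illustration, not names used by the library, and reading `between_share` as $T_B / T$ is an assumption based on the description above.

$$
\begin{aligned}
s_i &= \frac{y_i}{\sum_{j=1}^{n} y_j}, \qquad s_g = \sum_{i \in g} s_i, \\
T &= \sum_{i=1}^{n} s_i \ln(n\, s_i), \\
T_B &= \sum_{g} s_g \ln\!\left(\frac{n}{n_g}\, s_g\right), \\
T_W &= \sum_{g} \sum_{i \in g} s_i \ln\!\left(n_g\, \frac{s_i}{s_g}\right), \\
T &= T_B + T_W, \qquad \text{between\_share} = \frac{T_B}{T}.
\end{aligned}
$$

Adding the logarithms inside $T_B$ and $T_W$ unit by unit gives $\ln(n\, s_g / n_g) + \ln(n_g\, s_i / s_g) = \ln(n\, s_i)$, which is why the identity holds exactly rather than approximately.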
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | pd.DataFrame | Long-form panel data frame. | required |
| `var` | str | Numeric variable to decompose. Must be strictly positive, because the Theil index takes logarithms of shares. | required |
| `group` | str | Partition column (e.g. a state id for district units). It must be constant within each entity across periods and define at least two groups. | required |
| `entity` | str \| None | Panel entity identifier. Defaults to the one declared via `geometrics.set_panel`. | None |
| `time` | str \| None | Panel time identifier. Defaults to the one declared via `geometrics.set_panel`. | None |
| `permutations` | int | Number of permutations for the between-component inference (0 disables it and omits the `p_between` column). | 0 |
| `seed` | int | Seed for the permutation draws. `TheilDSim` has no seed parameter and draws from NumPy’s global RNG, so `np.random.seed(seed)` is called once before the per-period loop. | 12345 |
| `title` | str \| None | Title for the figure. | None |
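To make the roles of `permutations` and `seed` concrete, here is a minimal NumPy sketch of a permutation pseudo p-value for the between component in a single period. It is not the `TheilDSim` implementation: the helper names `between_component` and `permutation_p_value` are hypothetical, the sketch uses a local `np.random.default_rng` generator instead of the global seeding described above, and the `(hits + 1) / (permutations + 1)` convention is an assumption that may differ in detail from the library.

```python
import numpy as np


def between_component(y, labels):
    """Between-group Theil component: sum_g s_g * ln((n / n_g) * s_g)."""
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)
    n = y.size
    total = y.sum()
    between = 0.0
    for g in np.unique(labels):
        mask = labels == g
        s_g = y[mask].sum() / total  # group share of the total
        between += s_g * np.log(n / mask.sum() * s_g)
    return between


def permutation_p_value(y, labels, permutations=99, seed=12345):
    """Share of random relabelings whose between component is at least the observed one."""
    rng = np.random.default_rng(seed)  # local generator (assumption, see lead-in)
    labels = np.asarray(labels)
    observed = between_component(y, labels)
    hits = sum(
        between_component(y, rng.permutation(labels)) >= observed
        for _ in range(permutations)
    )
    # Count the observed partition itself so the p-value is never exactly zero.
    return (hits + 1) / (permutations + 1)


income = [10.0, 12.0, 30.0, 36.0]
state = ["north", "north", "south", "south"]
p_first = permutation_p_value(income, state, permutations=99, seed=12345)
p_second = permutation_p_value(income, state, permutations=99, seed=12345)
```

Because the total Theil index does not change when units are reassigned to groups, ranking permutations by the between component and ranking them by the between share give the same p-value. Reusing the same seed reproduces the same draws, so `p_first` and `p_second` are identical.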
Returns
| Name | Type | Description |
|---|---|---|
| | TheilDecompositionResult | Per-period frame `df` (`time`, `theil`, `between`, `within`, `between_share`, plus `p_between` when `permutations > 0`); the stacked between/within area `fig` with the between-share line on the secondary axis; the per-period `gt` table; and `group` / `n_groups` / `permutations`. |
Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If `var` or `group` is not a column of `df`. |
| | TypeError | If `var` is not numeric. |
| | ValueError | If `group` varies within an entity (the offenders are named), defines fewer than two groups, or `var` has non-positive values (the offending entities are named). |
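The error conditions above can be checked ahead of time with plain pandas. The helper below, `check_theil_inputs`, is hypothetical and only mirrors the documented conditions; the library's own checks, their order, and their messages may differ.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_theil_inputs(df, var, group, entity):
    """Hypothetical pre-flight check mirroring the documented error conditions."""
    for col in (var, group):
        if col not in df.columns:
            raise KeyError(f"{col!r} is not a column of df")
    if not is_numeric_dtype(df[var]):
        raise TypeError(f"{var!r} must be numeric")
    # The partition must be constant within each entity across periods.
    groups_per_entity = df.groupby(entity)[group].nunique()
    offenders = groups_per_entity[groups_per_entity > 1].index.tolist()
    if offenders:
        raise ValueError(f"{group!r} varies within entities: {offenders}")
    if df[group].nunique() < 2:
        raise ValueError(f"{group!r} must define at least two groups")
    # The Theil index takes logarithms of shares, so values must be > 0.
    bad = df.loc[df[var] <= 0, entity].unique().tolist()
    if bad:
        raise ValueError(f"{var!r} has non-positive values for entities: {bad}")


panel = pd.DataFrame(
    {
        "district": ["d1", "d2", "d3", "d4"],
        "state": ["north", "north", "south", "south"],
        "year": [2000] * 4,
        "income": [10.0, 12.0, 30.0, 36.0],
    }
)
check_theil_inputs(panel, "income", "state", "district")  # passes silently
```

Running a check like this before the call can surface data problems early, for example a district that changes state between periods.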
Examples
Two states with two districts each, over two years:
import pandas as pd
from geometrics.regional_inequality import analyze_theil_decomposition
df = pd.DataFrame(
    {
        "district": ["d1", "d2", "d3", "d4"] * 2,
        "state": ["north", "north", "south", "south"] * 2,
        "year": [2000] * 4 + [2001] * 4,
        "income": [10.0, 12.0, 30.0, 36.0, 11.0, 13.0, 33.0, 40.0],
    }
)
res = analyze_theil_decomposition(
    df, "income", "state", entity="district", time="year"
)
res.df[["time", "between_share"]].round(3)
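As a cross-check that does not depend on the library, the same decomposition can be computed by hand with pandas and NumPy. This is a sketch under the assumption that the standard Theil T definition on shares applies; `decompose_period` is a hypothetical helper, and the between and within components are computed independently so that their sum can be compared with the total.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "district": ["d1", "d2", "d3", "d4"] * 2,
        "state": ["north", "north", "south", "south"] * 2,
        "year": [2000] * 4 + [2001] * 4,
        "income": [10.0, 12.0, 30.0, 36.0, 11.0, 13.0, 33.0, 40.0],
    }
)


def decompose_period(period):
    """Theil T and its between/within split for one period (hypothetical helper)."""
    y = period["income"]
    n = len(y)
    s = y / y.sum()  # unit shares of the period total
    by_state = period.groupby("state")["income"]
    s_g = by_state.transform("sum") / y.sum()  # group share, repeated per unit
    n_g = by_state.transform("count")  # group size, repeated per unit
    theil = float(np.sum(s * np.log(n * s)))
    between = float(np.sum(s * np.log(n * s_g / n_g)))
    within = float(np.sum(s * np.log(n_g * s / s_g)))
    return pd.Series(
        {
            "theil": theil,
            "between": between,
            "within": within,
            "between_share": between / theil,
        }
    )


# One row per year, with the components as columns.
check = df.groupby("year")[["state", "income"]].apply(decompose_period)
```

In this data the two states differ far more from each other than districts differ inside a state, so most of the inequality should sit in the between component.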