analyze_theil_decomposition

analyze_theil_decomposition(
    df,
    var,
    group,
    *,
    entity=None,
    time=None,
    permutations=0,
    seed=12345,
    title=None,
)

Decompose the Theil index between and within a group partition, per period.

For every period, the Theil index of var across units is split additively (using inequality.theil.TheilD) into two components. The between-group component measures inequality across the mean levels of the group partition (e.g. states); the within-group component measures inequality among units inside each group. The identity theil = between + within holds exactly. The between_share column (between divided by theil) tracks how much of total inequality is a group-level phenomenon.

With permutations > 0, the between component also gets a permutation pseudo p-value (using inequality.theil.TheilDSim): units are randomly reassigned to groups, and p_between reports the proportion of random partitions that yield a between share at least as large as the observed one.
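The additive split can be checked by hand for a single period. The sketch below uses the textbook Theil T formulas on income shares and only the standard library; theil_decomposition is a hypothetical helper written for illustration, and it is an assumption that TheilD applies exactly this weighting.

```python
import math
from collections import defaultdict


def theil_decomposition(values, groups):
    """Return (total, between, within) Theil T components for one period.

    Hypothetical helper for illustration only, not the library's code.
    """
    n = len(values)
    grand_total = sum(values)
    shares = [v / grand_total for v in values]

    # Total Theil T: sum_i s_i * ln(n * s_i), with s_i the share of unit i.
    total = sum(s * math.log(n * s) for s in shares)

    # Aggregate shares and unit counts per group.
    group_share = defaultdict(float)
    group_size = defaultdict(int)
    for s, g in zip(shares, groups):
        group_share[g] += s
        group_size[g] += 1

    # Between: sum_g s_g * ln((n / n_g) * s_g).
    between = sum(
        s_g * math.log(n / group_size[g] * s_g)
        for g, s_g in group_share.items()
    )

    # Within: share-weighted Theil index computed inside each group.
    within = 0.0
    for s, g in zip(shares, groups):
        rel = s / group_share[g]  # share of the unit inside its own group
        within += group_share[g] * rel * math.log(group_size[g] * rel)

    return total, between, within


# Four districts in two states, one period.
incomes = [10.0, 12.0, 30.0, 36.0]
states = ["north", "north", "south", "south"]
total, between, within = theil_decomposition(incomes, states)
print(f"theil={total:.4f} between={between:.4f} within={within:.4f}")
print(f"between_share={between / total:.3f}")
```

Here the within component is computed directly rather than as a residual, so comparing between + within against total is a genuine check of the identity.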

Parameters

Name Type Description Default
df pd.DataFrame Long-form panel data frame. required
var str Numeric variable to decompose. It must be strictly positive, because the Theil index takes logarithms of shares. required
group str Partition column (e.g. a state id for district units). It must be constant within each entity across periods and must define at least two groups. required
entity str | None Entity identifier column of the panel. Defaults to the one declared via :func:geometrics.set_panel. None
time str | None Time identifier column of the panel. Defaults to the one declared via :func:geometrics.set_panel. None
permutations int Number of permutations for the between-component inference (0 disables it and omits the p_between column). 0
seed int Seed for the permutation draws. TheilDSim has no seed parameter and draws from NumPy’s global RNG, so np.random.seed(seed) is called once before the per-period loop. 12345
title str | None Title for the figure. None
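The interaction between permutations and seed can be sketched as follows. This is a minimal illustration, assuming NumPy is available: between_component and permutation_p_value are hypothetical helpers, not the library's code, and counting the observed partition in both numerator and denominator is an assumed convention. Because total inequality does not change when labels are shuffled, ranking by the between component is equivalent to ranking by the between share.

```python
import numpy as np


def between_component(values, labels):
    """Between-group Theil T component (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    n = values.size
    shares = values / values.sum()
    between = 0.0
    for g in np.unique(labels):
        mask = labels == g
        s_g = shares[mask].sum()
        between += s_g * np.log(n / mask.sum() * s_g)
    return between


def permutation_p_value(values, labels, permutations, seed):
    """Pseudo p-value for the between component (illustrative sketch)."""
    # Seeding the global RNG makes the draws reproducible, but it also
    # resets the state seen by any other code that uses np.random.
    np.random.seed(seed)
    observed = between_component(values, labels)
    hits = 1  # assumed convention: the observed partition counts as one draw
    for _ in range(permutations):
        shuffled = np.random.permutation(labels)
        if between_component(values, shuffled) >= observed:
            hits += 1
    return hits / (permutations + 1)


incomes = [10.0, 12.0, 30.0, 36.0]
states = ["north", "north", "south", "south"]
p_first = permutation_p_value(incomes, states, permutations=99, seed=12345)
p_second = permutation_p_value(incomes, states, permutations=99, seed=12345)
print(p_first == p_second)  # same seed, same sequence of shuffles
```

Since seeding touches NumPy's global state, callers that rely on np.random elsewhere in the same session may want to reseed afterwards.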

Returns

Name Type Description
TheilDecompositionResult Per-period frame df (time, theil, between, within, between_share, plus p_between when permutations > 0); the stacked between/within area fig with the between-share line on the secondary axis; the per-period gt table; and group / n_groups / permutations.

Raises

Name Type Description
KeyError If var or group is not a column of df.
TypeError If var is not numeric.
ValueError If group varies within an entity (the offenders are named), defines fewer than two groups, or var has non-positive values (the offending entities are named).
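The checks behind these exceptions can be approximated with a few pandas operations. The sketch below is an assumption about how such validation might look, not the function's actual implementation; check_inputs and its messages are hypothetical.

```python
import pandas as pd


def check_inputs(df, var, group, entity):
    """Hypothetical pre-flight checks mirroring the documented exceptions."""
    for col in (var, group):
        if col not in df.columns:
            raise KeyError(f"{col!r} is not a column of df")

    if not pd.api.types.is_numeric_dtype(df[var]):
        raise TypeError(f"{var!r} must be numeric")

    # An entity must keep the same group label in every period.
    labels_per_entity = df.groupby(entity)[group].nunique()
    movers = labels_per_entity[labels_per_entity > 1].index.tolist()
    if movers:
        raise ValueError(f"{group!r} varies within entities: {movers}")

    if df[group].nunique() < 2:
        raise ValueError(f"{group!r} must define at least two groups")

    # Logarithms of shares require strictly positive values.
    offenders = sorted(df.loc[df[var] <= 0, entity].unique().tolist())
    if offenders:
        raise ValueError(f"{var!r} has non-positive values for: {offenders}")


panel = pd.DataFrame(
    {
        "district": ["d1", "d2", "d3", "d4"] * 2,
        "state": ["north", "north", "south", "south"] * 2,
        "year": [2000] * 4 + [2001] * 4,
        "income": [10.0, 12.0, 30.0, 36.0, 11.0, 13.0, 33.0, 40.0],
    }
)
check_inputs(panel, "income", "state", "district")  # passes silently
```

Running the same checks before calling the function can surface data problems early, for example a district that changes state between periods.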

Examples

Two states with two districts each, over two years:

import pandas as pd

from geometrics.regional_inequality import analyze_theil_decomposition

df = pd.DataFrame(
    {
        "district": ["d1", "d2", "d3", "d4"] * 2,
        "state": ["north", "north", "south", "south"] * 2,
        "year": [2000] * 4 + [2001] * 4,
        "income": [10.0, 12.0, 30.0, 36.0, 11.0, 13.0, 33.0, 40.0],
    }
)
res = analyze_theil_decomposition(
    df, "income", "state", entity="district", time="year"
)
res.df[["time", "between_share"]].round(3)