analyze_markov_transitions

analyze_markov_transitions(
    df,
    var,
    *,
    entity=None,
    time=None,
    k=5,
    scheme='quantiles',
    bins=None,
    per_period=True,
    relative=False,
    title=None,
)

Estimate a discrete Markov chain of movement between distribution states.

Each region’s var is discretized into k states (per period by default, so a state is a rank within that period’s cross-section), and every period-to-period move is pooled into a k-by-k transition-probability matrix (a giddy.markov.Markov estimate). The result carries the ergodic (steady-state) distribution, the expected sojourn times, and the Shorrocks, Prais and Bartholomew mobility indices of the matrix.
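The pooling step described above can be sketched with NumPy alone. This is an illustrative sketch, not the function's implementation: the states array is made-up data, and the steady-state and sojourn formulas are the standard ones for a first-order chain (left eigenvector for eigenvalue 1, and 1 / (1 - p_ii)).

```python
import numpy as np

# Hypothetical class labels 0..k-1: one row per region, one column per period.
states = np.array([
    [0, 0, 1, 1],
    [1, 1, 1, 2],
    [2, 2, 1, 2],
    [0, 1, 0, 0],
])
k = 3

# Pool every period-to-period move into a k-by-k count matrix.
counts = np.zeros((k, k))
np.add.at(counts, (states[:, :-1].ravel(), states[:, 1:].ravel()), 1)

# Row-normalize the counts into transition probabilities.
P = counts / counts.sum(axis=1, keepdims=True)

# Ergodic distribution: left eigenvector of P for eigenvalue 1, scaled to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()

# Expected sojourn time in state i (undefined for an absorbing state, p_ii = 1).
sojourn = 1.0 / (1.0 - np.diag(P))

print(P.round(2))
print(pi.round(3), sojourn)
```

Rows of P index the origin state and columns the destination state, so each row sums to one.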

Parameters

| Name | Type | Description | Default |
|---|---|---|---|
| df | `pd.DataFrame` | Long-form panel with entity, time and var columns. The panel must be balanced in var (every entity observed in every period). | required |
| var | `str` | Numeric variable whose distribution dynamics are analyzed. | required |
| entity | `str \| None` | Entity (unit) id column; defaults to the panel declared via geometrics.set_panel. | None |
| time | `str \| None` | Time id column; defaults to the declared panel. | None |
| k | `int` | Number of states (classes) to discretize into. | 5 |
| scheme | `str` | Classification scheme: "quantiles", "equal_interval" or "fisher_jenks". Ignored when bins is given. | 'quantiles' |
| bins | `Sequence[float] \| None` | Explicit upper class bounds (mapclassify.UserDefined); the same fixed intervals apply in every period, and scheme and per_period are ignored. | None |
| per_period | `bool` | Classify each period’s cross-section separately (the distribution-dynamics convention: states are positions within the period’s distribution). False pools all n*t values into one classification. | True |
| relative | `bool` | Divide var by its cross-sectional mean per period first (so 1.0 marks the period average). | False |
| title | `str \| None` | Figure title (a default naming the variable is used when None). | None |
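To make the per_period and relative options concrete, the sketch below mimics them with pandas on a made-up two-year panel in which every income doubles in the second year. It uses pd.qcut as a stand-in for the quantile classifier; that is an assumption for illustration, and class boundaries from mapclassify may differ in detail.

```python
import pandas as pd

# Hypothetical panel: each region's income doubles between 2000 and 2001.
df = pd.DataFrame({
    "region": ["a", "b", "c", "d"] * 2,
    "year": [2000] * 4 + [2001] * 4,
    "income": [1.0, 2.0, 3.0, 6.0, 2.0, 4.0, 6.0, 12.0],
})
k = 2

# relative=True analogue: divide by the period's cross-sectional mean.
rel = df["income"] / df.groupby("year")["income"].transform("mean")

# per_period=True analogue: quantile classes within each year's cross-section.
per_period = df.groupby("year")["income"].transform(
    lambda s: pd.qcut(s, k, labels=False)
)

# per_period=False analogue: one quantile classification over all n*t values.
pooled = pd.qcut(df["income"], k, labels=False)

print(pd.DataFrame({"rel": rel, "per_period": per_period, "pooled": pooled}))
```

Under per-period classification no region changes state, because positions within each year's distribution are unchanged. Under pooled classification, uniform growth alone moves regions b and c into the upper class.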

Returns

| Name | Type | Description |
|---|---|---|
| | `MarkovTransitionsResult` | The long panel with each (entity, period) state, the labelled transition matrix p and counts, the annotated heatmap fig, the summary table gt, the steady_state and sojourn series, and the shorrocks / prais / bartholomew mobility indices. |

Raises

| Name | Type | Description |
|---|---|---|
| | `ImportError` | If the optional giddy dependency is not installed. |
| | `KeyError` | If var is not a column of df. |
| | `TypeError` | If var is not numeric. |
| | `ValueError` | If k < 2, the scheme is unknown, the panel is unbalanced, or fewer than two periods are observed. |
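A panel can be checked for the balance and period-count conditions before calling the function. The helper below is hypothetical (it is not part of geometrics) and only sketches one way to run that check with pandas.

```python
import pandas as pd


def is_balanced(df, entity, time, var):
    """Hypothetical helper: True if every entity has a non-missing
    value of var in every period and at least two periods exist."""
    # One row per entity, one column per period; gaps show up as NaN.
    wide = df.pivot(index=entity, columns=time, values=var)
    return bool(wide.shape[1] >= 2 and wide.notna().all().all())


panel = pd.DataFrame({
    "region": ["a", "b", "a", "b"],
    "year": [2000, 2000, 2001, 2001],
    "income": [1.0, 2.0, 1.5, 2.5],
})
print(is_balanced(panel, "region", "year", "income"))           # True
print(is_balanced(panel.iloc[:3], "region", "year", "income"))  # False
```

Note that DataFrame.pivot itself raises a ValueError when an (entity, period) pair appears more than once, which is a separate data problem from imbalance.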

Notes

Mobility indices use giddy.mobility.markov_mobility measure codes: shorrocks is measure "P" (the trace index $(k - \mathrm{tr}\,P)/(k-1)$), prais is measure "D" (the determinant index $1 - |\det P|$), and bartholomew is measure "B1" (the trace index weighted by the first period’s observed state distribution).
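The three indices can be reproduced by hand for a small matrix. The sketch below uses a made-up transition matrix and made-up initial state shares, and it assumes the B1 measure takes the form (k - k Σ π_i p_ii) / (k - 1), where π is the initial state distribution; treat that formula as an assumption to verify against the giddy documentation.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.3, 0.7],
])
k = P.shape[0]

# Hypothetical first-period state shares (sum to 1).
ini = np.array([0.5, 0.3, 0.2])

# Measure "P": trace index.
trace_index = (k - np.trace(P)) / (k - 1)

# Measure "D": determinant index.
det_index = 1 - abs(np.linalg.det(P))

# Measure "B1" (assumed form): trace index with diagonal weighted by ini.
b1_index = (k - k * np.sum(ini * np.diag(P))) / (k - 1)

print(round(trace_index, 3), round(det_index, 3), round(b1_index, 3))
```

All three indices are 0 for the identity matrix (no mobility) and grow as probability mass moves off the diagonal. With uniform initial shares the assumed B1 form reduces to the trace index.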

Examples

Three groups of regions whose incomes fluctuate around fixed group levels, so regions tend to keep their income rank from year to year:

import numpy as np
import pandas as pd

from geometrics.distribution_dynamics import analyze_markov_transitions

rng = np.random.default_rng(0)
units = [f"r{i}" for i in range(9)]
base = np.repeat([1.0, 2.0, 3.0], 3)
df = pd.DataFrame(
    [
        {"region": u, "year": y, "income": b + rng.normal(0, 0.5)}
        for y in (2000, 2001, 2002, 2003)
        for u, b in zip(units, base)
    ]
)
res = analyze_markov_transitions(df, "income", entity="region", time="year", k=3)
res.p.round(2)