analyze_markov_transitions

```python
analyze_markov_transitions(
    df,
    var,
    *,
    entity=None,
    time=None,
    k=5,
    scheme='quantiles',
    bins=None,
    per_period=True,
    relative=False,
    title=None,
)
```

Estimate a discrete Markov chain of movement between distribution states.
Each region's `var` is discretized into `k` states. By default this happens per period, so a state is a rank class within that period's cross-section. Every period-to-period move is then pooled into a `k`-by-`k` transition-probability matrix (`giddy.markov.Markov`). The result carries the matrix's ergodic (steady-state) distribution, the expected sojourn times, and the Shorrocks / Prais / Bartholomew mobility indices.
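The pooling step can be sketched with NumPy alone. This is a minimal illustration, not the library's code. `quantile_states` and `transition_matrix` are hypothetical helpers, and the sketch makes three assumptions. First, the panel is balanced and already reshaped to an `(entities, periods)` array. Second, quantile classes are approximated by a rank split; mapclassify's quantile classifier works from bin edges and may treat ties differently. Third, each period is classified separately, as with `per_period=True`.

```python
import numpy as np


def quantile_states(x: np.ndarray, k: int) -> np.ndarray:
    """Assign each value to one of k rank-based quantile classes, 0..k-1."""
    # Double argsort gives each value's rank; the lowest n/k ranks become state 0, etc.
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * k) // x.size


def transition_matrix(y: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Pooled transition counts and row-normalized probabilities for y of shape (n, t)."""
    # Classify each period's cross-section separately (the per_period=True convention).
    states = np.column_stack([quantile_states(y[:, t], k) for t in range(y.shape[1])])
    counts = np.zeros((k, k))
    # Pool every entity's move from period t to t + 1 into one k-by-k table.
    for t in range(y.shape[1] - 1):
        np.add.at(counts, (states[:, t], states[:, t + 1]), 1)
    # Row-normalize; rows with no observed moves stay all zero.
    totals = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
    return counts, p


# Three entities that keep their rank over three periods.
y = np.array([[1.0, 1.1, 0.9],
              [2.0, 2.2, 2.1],
              [3.0, 2.9, 3.2]])
counts, p = transition_matrix(y, k=3)
print(p)  # identity matrix: nobody changes state
```

Each entity contributes `t - 1` moves, so the counts sum to `n * (t - 1)`. Here that is 6, all on the diagonal.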
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `pd.DataFrame` | Long-form panel with `entity`, `time` and `var` columns. The panel must be balanced in `var` (every entity observed in every period). | required |
| `var` | `str` | Numeric variable whose distribution dynamics are analyzed. | required |
| `entity` | `str \| None` | Entity (unit) id column; defaults to the panel declared via `geometrics.set_panel`. | `None` |
| `time` | `str \| None` | Time id column; defaults to the declared panel. | `None` |
| `k` | `int` | Number of states (classes) to discretize into. | `5` |
| `scheme` | `str` | Classification scheme: `"quantiles"`, `"equal_interval"` or `"fisher_jenks"`. Ignored when `bins` is given. | `'quantiles'` |
| `bins` | `Sequence[float] \| None` | Explicit upper class bounds (`mapclassify.UserDefined`). The same fixed intervals apply in every period, and `scheme` / `per_period` are ignored. | `None` |
| `per_period` | `bool` | Classify each period's cross-section separately. This is the distribution-dynamics convention: states are positions within the period's distribution. `False` pools all n × t values into one classification. | `True` |
| `relative` | `bool` | First divide `var` by its cross-sectional mean in each period, so 1.0 marks the period average. | `False` |
| `title` | `str \| None` | Figure title; a default naming the variable is used when `None`. | `None` |
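The difference between `per_period=True` and pooled classification, and the effect of `relative`, can be sketched with a hypothetical `classify` helper. This helper is not part of the library: it uses rank-based quantile classes on an `(entities, periods)` array and only mimics the two options described in the table above.

```python
import numpy as np


def quantile_states(x: np.ndarray, k: int) -> np.ndarray:
    """Rank-based quantile classes 0..k-1 (ties broken by position)."""
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * k) // x.size


def classify(y: np.ndarray, k: int, per_period: bool = True, relative: bool = False) -> np.ndarray:
    """Hypothetical helper mimicking per_period / relative for y of shape (n, t)."""
    y = np.asarray(y, dtype=float)
    if relative:
        # Divide each period (column) by its cross-sectional mean: 1.0 = period average.
        y = y / y.mean(axis=0, keepdims=True)
    if per_period:
        # One classification per period: states are positions within that period.
        return np.column_stack([quantile_states(y[:, t], k) for t in range(y.shape[1])])
    # Pooled: one classification over all n * t values, reshaped back to (n, t).
    return quantile_states(y.ravel(), k).reshape(y.shape)


# Every entity grows by +3 between the two periods; ranks never change.
growth = np.array([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
print(classify(growth, 3))                    # states stay on the diagonal
print(classify(growth, 3, per_period=False))  # pooling reads growth as upward moves

# Proportional growth (every value doubles), rescaled to each period's mean.
prop = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(classify(prop, 3, per_period=False, relative=True))
```

With `per_period=True` the states are `[[0, 0], [1, 1], [2, 2]]`. Pooling gives `[[0, 1], [0, 2], [1, 2]]`, so common growth shows up as mobility. Dividing by the period mean removes proportional growth, so even the pooled classes stay fixed.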
Returns
| Type | Description |
|---|---|
| `MarkovTransitionsResult` | The long panel annotated with each (entity, period) state, the labelled transition-probability matrix `p` and its transition `counts`, the annotated heatmap `fig`, the summary table `gt`, the `steady_state` and `sojourn` series, and the `shorrocks` / `prais` / `bartholomew` mobility indices. |
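The `steady_state` and `sojourn` outputs come from giddy. The arithmetic behind them can be sketched in NumPy, assuming the chain has a unique ergodic distribution. This is an illustration, not giddy's implementation.

The steady state is the left eigenvector of `p` for eigenvalue 1, normalized to sum to 1. The expected sojourn time in state `i` is `1 / (1 - p_ii)`, the mean of the geometric run length when the chain stays put with probability `p_ii`. `steady_state` and `sojourn_times` are hypothetical names.

```python
import numpy as np


def steady_state(p: np.ndarray) -> np.ndarray:
    """Ergodic distribution pi with pi @ p == pi and pi.sum() == 1 (assumes it is unique)."""
    # pi is a left eigenvector of p, i.e. an eigenvector of p.T, for eigenvalue 1.
    vals, vecs = np.linalg.eig(p.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def sojourn_times(p: np.ndarray) -> np.ndarray:
    """Expected consecutive periods in each state, 1 / (1 - p_ii); inf if absorbing."""
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 - np.diag(p))


p = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(steady_state(p))   # about [0.667, 0.333]
print(sojourn_times(p))  # about [10, 5]
```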
Raises
| Type | Description |
|---|---|
| `ImportError` | If the optional `giddy` dependency is not installed. |
| `KeyError` | If `var` is not a column of `df`. |
| `TypeError` | If `var` is not numeric. |
| `ValueError` | If `k < 2`, the scheme is unknown, the panel is unbalanced, or fewer than two periods are observed. |
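Before calling the function, a panel can be screened for the unbalanced-panel and too-few-periods conditions. The sketch below uses a hypothetical `check_balanced` helper written with plain pandas. It treats a missing `var` value as "not observed", which is an assumption. It may not match the library's own check exactly.

```python
import pandas as pd


def check_balanced(df: pd.DataFrame, entity: str, time: str, var: str) -> None:
    """Hypothetical pre-flight check for two documented ValueError conditions."""
    periods = df[time].nunique()
    if periods < 2:
        raise ValueError(f"need at least two periods, found {periods}")
    # Distinct periods in which each entity has a non-missing value of var.
    seen = df.dropna(subset=[var]).groupby(entity)[time].nunique()
    # Entities whose var is missing everywhere drop out of the groupby; count them as 0.
    seen = seen.reindex(df[entity].unique(), fill_value=0)
    short = seen[seen < periods]
    if not short.empty:
        raise ValueError(f"panel is unbalanced in {var!r}: {list(short.index)}")


panel = pd.DataFrame({
    "region": ["a", "a", "b", "b"],
    "year": [2000, 2001, 2000, 2001],
    "income": [1.0, 2.0, 3.0, 4.0],
})
check_balanced(panel, "region", "year", "income")  # passes silently
```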
Notes
Mobility indices use `giddy.mobility.markov_mobility` measure codes. `shorrocks` is measure `"P"`, the trace index $(k - \mathrm{tr}\,P)/(k-1)$. `prais` is measure `"D"`, the determinant index $1 - |\det P|$. `bartholomew` is measure `"B1"`, the trace index weighted by the first period's observed state distribution.
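The three indices can be computed directly from a transition matrix as a cross-check. This is a minimal NumPy sketch, not giddy's code, and `mobility_indices` is a hypothetical helper. The `"B1"` formula is our reading of giddy's documented definition, $(k - k\sum_i \pi_i p_{ii})/(k-1)$, with $\pi$ the initial state shares; verify it against your installed giddy version. One consequence of that formula: with equal initial shares, `"B1"` reduces to the plain trace index `"P"`.

```python
import numpy as np


def mobility_indices(p: np.ndarray, ini: np.ndarray) -> dict[str, float]:
    """Trace ("P"), determinant ("D") and weighted-trace ("B1") indices of p."""
    k = p.shape[0]
    diag = np.diag(p)
    return {
        # "P": (k - tr P) / (k - 1); 0 for the identity (no mobility).
        "shorrocks": float((k - diag.sum()) / (k - 1)),
        # "D": 1 - |det P|
        "prais": float(1.0 - abs(np.linalg.det(p))),
        # "B1": (k - k * sum_i ini_i * p_ii) / (k - 1), ini = initial state shares.
        "bartholomew": float((k - k * ini @ diag) / (k - 1)),
    }


p = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# Equal initial shares, so bartholomew equals shorrocks here (all three are about 0.7).
print(mobility_indices(p, np.array([0.5, 0.5])))
```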
Examples
Nine regions in three income groups that largely keep their rank from year to year:
```python
import numpy as np
import pandas as pd
from geometrics.distribution_dynamics import analyze_markov_transitions

rng = np.random.default_rng(0)
units = [f"r{i}" for i in range(9)]
base = np.repeat([1.0, 2.0, 3.0], 3)  # three income groups of three regions each
df = pd.DataFrame(
    [
        {"region": u, "year": y, "income": b + rng.normal(0, 0.5)}
        for y in (2000, 2001, 2002, 2003)
        for u, b in zip(units, base)
    ]
)
res = analyze_markov_transitions(df, "income", entity="region", time="year", k=3)
res.p.round(2)  # 3-by-3 transition-probability matrix
```