analyze_markov_transitions

analyze_markov_transitions(
    df,
    var,
    *,
    entity=None,
    time=None,
    k=5,
    scheme='quantiles',
    bins=None,
    per_period=True,
    relative=False,
    title=None,
)

Estimate a discrete Markov chain of movement between distribution states.

Each region’s `var` is discretized into `k` states (per period by default, so a state is a rank within that period’s cross-section), and every period-to-period move is pooled into a k-by-k transition-probability matrix (:class:`giddy.markov.Markov`). The result carries the ergodic (steady-state) distribution, expected sojourn times, and the Shorrocks / Prais / Bartholomew mobility indices of the matrix.
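To make the two steps concrete, the following is a minimal NumPy sketch of per-period quantile classification followed by pooling of moves into a transition matrix. It is an illustration only, not the code path used by `giddy` or `mapclassify`; the helper names `quantile_states` and `transition_matrix` and the toy array `y` are hypothetical.

```python
import numpy as np


def quantile_states(y, k):
    """Classify each column (period) of y into k quantile classes 0..k-1."""
    n, t = y.shape
    states = np.empty((n, t), dtype=int)
    for j in range(t):
        # Interior quantile cut points of this period's cross-section.
        cuts = np.quantile(y[:, j], np.arange(1, k) / k)
        # side="left" places a value equal to a cut point in the lower class.
        states[:, j] = np.searchsorted(cuts, y[:, j], side="left")
    return states


def transition_matrix(states, k):
    """Pool every period-to-period move into counts, then row-normalize."""
    counts = np.zeros((k, k))
    origin = states[:, :-1].ravel()       # state in period s
    destination = states[:, 1:].ravel()   # same entity's state in period s+1
    np.add.at(counts, (origin, destination), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return counts, p


# Hypothetical balanced panel: 6 entities (rows) observed in 3 periods (columns).
y = np.array([
    [1.0, 1.1, 3.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 1.2],
    [4.0, 4.1, 4.2],
    [5.0, 5.1, 5.2],
    [6.0, 6.1, 6.2],
])
states = quantile_states(y, k=3)
counts, p = transition_matrix(states, k=3)
print(p)
```

Each row of `p` sums to 1 whenever at least one move starts in that state; rows with no observed origin are left at zero in this sketch.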

Parameters

Name	Type	Description	Default
df	pd.DataFrame	Long-form panel with entity, time and `var` columns. The panel must be balanced in `var` (every entity observed in every period).	required
var	str	Numeric variable whose distribution dynamics are analyzed.	required
entity	str \| None	Entity (unit) id column; defaults to the panel declared via :func:`geometrics.set_panel`.	`None`
time	str \| None	Time id column; defaults to the declared panel.	`None`
k	int	Number of states (classes) to discretize into (default 5).	`5`
scheme	str	Classification scheme: `"quantiles"` (default), `"equal_interval"` or `"fisher_jenks"`. Ignored when `bins` is given.	`'quantiles'`
bins	Sequence[float] \| None	Explicit upper class bounds (:class:`mapclassify.UserDefined`); the same fixed intervals apply in every period and `scheme` / `per_period` are ignored.	`None`
per_period	bool	Classify each period’s cross-section separately (default `True`, the distribution-dynamics convention: states are positions within the period’s distribution). `False` pools all `n*t` values into one classification.	`True`
relative	bool	Divide `var` by its cross-sectional mean per period first (so 1.0 marks the period average). Default `False`.	`False`
title	str \| None	Figure title (a default naming the variable is used when `None`).	`None`
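The effect of `per_period` and `relative` can be seen on a small trending panel. The sketch below uses plain NumPy quantiles as a stand-in for the classification step, so it shows the idea rather than the library's exact class bounds; the array `y` and the helper `classify` are hypothetical.

```python
import numpy as np

# Hypothetical panel: rows are entities, columns are periods. Every value
# grows by 10 per period, so ranks within a period never change.
y = np.array([
    [1.0, 11.0, 21.0],
    [2.0, 12.0, 22.0],
    [3.0, 13.0, 23.0],
])
k = 3
q = np.arange(1, k) / k  # interior quantile levels


def classify(values, cuts):
    """Map values to class indices 0..k-1 given ascending cut points."""
    return np.searchsorted(cuts, values, side="left")


# per_period=True: cut points are recomputed for each period's cross-section.
per_period = np.column_stack(
    [classify(y[:, j], np.quantile(y[:, j], q)) for j in range(y.shape[1])]
)

# per_period=False: one set of cut points from all n*t pooled values.
pooled = classify(y, np.quantile(y, q))

# relative=True: divide by the cross-sectional mean of each period first.
rel = y / y.mean(axis=0)

print(per_period)  # each entity keeps its class across periods
print(pooled)      # common growth pushes every entity up a class each period
```

With per-period classes the common growth is absorbed and no entity moves; with pooled classes the same data shows every entity climbing, which is why the per-period setting is the default for distribution dynamics.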

Returns

Name	Type	Description
	MarkovTransitionsResult	The long panel with each (entity, period) `state`, the labelled transition matrix `p` and `counts`, the annotated heatmap `fig`, the summary table `gt`, the `steady_state` and `sojourn` series, and the `shorrocks` / `prais` / `bartholomew` mobility indices.
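For orientation, the `steady_state` and `sojourn` quantities can be reproduced from a transition matrix with a few lines of NumPy. This is a sketch under the assumption of a single ergodic chain (one stationary distribution); the matrix `P` and the function names are hypothetical and independent of the result object.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])


def steady_state(p):
    """Solve pi @ p = pi with sum(pi) = 1 as a least-squares system."""
    k = p.shape[0]
    a = np.vstack([p.T - np.eye(k), np.ones(k)])
    b = np.append(np.zeros(k), 1.0)
    pi, *_ = np.linalg.lstsq(a, b, rcond=None)
    return pi


def sojourn_times(p):
    """Expected consecutive periods spent in each state: 1 / (1 - p_ii)."""
    stay = np.diag(p)
    with np.errstate(divide="ignore"):
        return 1.0 / (1.0 - stay)  # inf for an absorbing state (p_ii = 1)


pi = steady_state(P)
sojourn = sojourn_times(P)
print(pi, sojourn)
```

A state with `p_ii = 1` is absorbing, so its expected sojourn time is infinite in this formulation.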

Raises

Name	Type	Description
	ImportError	If the optional `giddy` dependency is not installed.
	KeyError	If `var` is not a column of `df`.
	TypeError	If `var` is not numeric.
	ValueError	If `k < 2`, the scheme is unknown, the panel is unbalanced, or fewer than two periods are observed.
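Because an unbalanced panel raises `ValueError`, it can help to locate the gaps before calling the function. The helper below is a hypothetical pre-check written with pandas, not part of the package; it lists the (entity, time) cells where `var` is missing or absent.

```python
import pandas as pd


def missing_cells(df, var, entity, time):
    """Return the (entity, time) pairs with no observed value of var."""
    # pivot itself raises ValueError if an (entity, time) pair is duplicated.
    wide = df.pivot(index=entity, columns=time, values=var)
    return [
        (e, t)
        for e, row in wide.iterrows()
        for t, value in row.items()
        if pd.isna(value)
    ]


# Hypothetical panel in which region "b" is not observed in 2001.
panel = pd.DataFrame(
    {
        "region": ["a", "a", "b"],
        "year": [2000, 2001, 2000],
        "income": [1.0, 1.2, 2.0],
    }
)
gaps = missing_cells(panel, "income", entity="region", time="year")
print(gaps)
```

An empty list means every entity has a value in every period.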

Notes

Mobility indices use :func:`giddy.mobility.markov_mobility` measure codes: `shorrocks` is measure `"P"` (the trace index :math:`(k - \mathrm{tr}\,P)/(k-1)`), `prais` is measure `"D"` (the determinant index :math:`1 - |\det P|`), and `bartholomew` is measure `"B1"` (the trace index weighted by the first period’s observed state distribution).
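The formulas above can be checked by hand with NumPy. In the sketch below the function names and the matrix `P` are hypothetical, and the weighted form used for the `"B1"` measure, (k - k Σ π_i P_ii)/(k - 1) with π the initial state distribution, is an assumption about how the weighting is applied rather than something stated in this page.

```python
import numpy as np

# Hypothetical 3-state transition matrix (rows sum to 1).
P = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
])


def trace_index(p):
    """Measure "P": (k - tr P) / (k - 1)."""
    k = p.shape[0]
    return (k - np.trace(p)) / (k - 1)


def determinant_index(p):
    """Measure "D": 1 - |det P|."""
    return 1.0 - abs(np.linalg.det(p))


def weighted_trace_index(p, pi0):
    """Assumed form of measure "B1": diagonal weighted by initial shares pi0."""
    k = p.shape[0]
    return (k - k * np.sum(pi0 * np.diag(p))) / (k - 1)


uniform = np.full(3, 1 / 3)
print(trace_index(P), determinant_index(P), weighted_trace_index(P, uniform))
```

All three indices are 0 for the identity matrix (no mobility), and under a uniform initial distribution the weighted index coincides with the plain trace index.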

Examples

Three groups of regions that largely keep their income rank from year to year (the random noise allows occasional moves between neighbouring states):

import numpy as np
import pandas as pd

from geometrics.distribution_dynamics import analyze_markov_transitions

rng = np.random.default_rng(0)
units = [f"r{i}" for i in range(9)]
base = np.repeat([1.0, 2.0, 3.0], 3)
df = pd.DataFrame(
    [
        {"region": u, "year": y, "income": b + rng.normal(0, 0.5)}
        for y in (2000, 2001, 2002, 2003)
        for u, b in zip(units, base)
    ]
)
res = analyze_markov_transitions(df, "income", entity="region", time="year", k=3)
res.p.round(2)