explore_distribution_over_time

explore_distribution_over_time(
    df,
    var,
    *,
    entity=None,
    time=None,
    relative=False,
    periods=None,
    kind='ridgeline',
    bandwidth=None,
    title=None,
)

Track how the cross-sectional distribution of one variable evolves over time.

A Gaussian kernel density of var is estimated for each period with scipy.stats.gaussian_kde and evaluated on a single grid shared by all periods, so the densities are directly comparable. kind="ridgeline" stacks one filled density per period with a small vertical offset (newest period on top); kind="animated" shows a single density animated over the periods, with a play button and slider.
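The shared-grid idea can be sketched directly with scipy. This is a minimal illustration, not the library's implementation: the sample values, the 10% padding, and the 200-point grid are all assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Two hypothetical periods of cross-sectional values (illustrative data only).
samples = {
    2000: np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]),
    2010: np.array([2.0, 2.5, 3.5, 4.5, 5.0, 6.5, 7.0, 7.5]),
}

# One grid spanning the pooled range of all periods.
# The padding and the number of grid points are assumptions.
pooled = np.concatenate(list(samples.values()))
pad = 0.1 * (pooled.max() - pooled.min())
grid = np.linspace(pooled.min() - pad, pooled.max() + pad, 200)

# Fit one KDE per period, but evaluate every one on the same grid,
# so the resulting curves share an x-axis and can be compared point by point.
densities = {period: gaussian_kde(values)(grid) for period, values in samples.items()}
```

Because every period is evaluated at the same x-values, the curves can be overlaid, stacked, or differenced without interpolation.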

Parameters

Name Type Description Default
df pd.DataFrame Long panel holding var per entity and period. required
var str Numeric column of df whose distribution is tracked. required
entity str | None Column holding the entity identifier. Defaults to the entity id declared via geometrics.set_panel. None
time str | None Column holding the time identifier. Defaults to the time id declared via geometrics.set_panel. A time id is required, either passed here or declared beforehand. None
relative bool Divide var by its cross-sectional mean per period before density estimation (the distribution-dynamics convention): 1.0 marks the period average and a dashed vertical line is drawn at 1. False
periods Sequence[Any] | None Subset of periods to include (default: all periods in df). Unknown periods raise ValueError. None
kind Literal['ridgeline', 'animated'] "ridgeline" (stacked filled densities, one trace per period) or "animated" (one density trace animated over periods with a slider). 'ridgeline'
bandwidth float | str | None Kernel bandwidth passed to scipy.stats.gaussian_kde as bw_method (a scalar factor or "scott" / "silverman"). None uses scipy’s default (Scott’s rule). None
title str | None Figure title. Defaults to a description built from the variable label. None
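The relative and bandwidth options can be illustrated outside the library. The sketch below assumes a small made-up panel and a hypothetical column name gdppc_rel; it shows the per-period mean normalization described for relative=True and the three forms that gaussian_kde accepts for bw_method.

```python
import pandas as pd
from scipy.stats import gaussian_kde

# Made-up panel: four entities observed in two periods.
df = pd.DataFrame(
    {
        "year": [2000] * 4 + [2010] * 4,
        "gdppc": [1.0, 2.0, 3.0, 6.0, 2.0, 4.0, 6.0, 12.0],
    }
)

# Divide each value by its period's cross-sectional mean,
# so 1.0 marks the period average in every period.
df["gdppc_rel"] = df["gdppc"] / df.groupby("year")["gdppc"].transform("mean")
period_means = df.groupby("year")["gdppc_rel"].mean()

# bw_method may be omitted (Scott's rule), a rule name, or a scalar factor.
values_2000 = df.loc[df["year"] == 2000, "gdppc_rel"].to_numpy()
kde_default = gaussian_kde(values_2000)
kde_silverman = gaussian_kde(values_2000, bw_method="silverman")
kde_scalar = gaussian_kde(values_2000, bw_method=0.5)
```

After normalization, each period's mean is 1 by construction, which is why a reference line at 1 is meaningful across periods. A smaller scalar factor gives a narrower kernel and a bumpier density.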

Returns

Name Type Description
DistributionOverTimeResult Frozen result with the tidy evaluation frame df (columns time, value, density), the themed fig, and notes.
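A tidy frame with columns time, value, and density is convenient for follow-up summaries. The sketch below builds a synthetic stand-in with that layout (the numbers are hand-made bell curves, not library output) and locates the mode of each period's density.

```python
import numpy as np
import pandas as pd

# Stand-in for the documented tidy frame (columns time, value, density).
# The densities are synthetic bell curves, not output from the library.
grid = np.linspace(0.0, 10.0, 101)
frames = []
for period, centre in [(2000, 4.0), (2010, 5.0)]:
    density = np.exp(-0.5 * (grid - centre) ** 2) / np.sqrt(2 * np.pi)
    frames.append(pd.DataFrame({"time": period, "value": grid, "density": density}))
tidy = pd.concat(frames, ignore_index=True)

# Locate the mode (grid point with the highest density) for each period.
peak_rows = tidy.loc[tidy.groupby("time")["density"].idxmax(), ["time", "value"]]
modes = peak_rows.set_index("time")["value"]
```

The same groupby pattern extends to other per-period summaries, such as the density height at a fixed value.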

Raises

Name Type Description
KeyError If var is not a column of df.
TypeError If var is not numeric.
ValueError If no time id resolves, kind is unknown, a requested period is absent, or a period has fewer than 2 distinct values.
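The last ValueError condition can be checked before calling the function. The helper below is hypothetical (periods_too_sparse is not part of the library); it lists the periods with fewer than 2 distinct values, for which a kernel density cannot be estimated because the sample variance is zero.

```python
import pandas as pd


def periods_too_sparse(df, var, time):
    """Return periods with fewer than 2 distinct non-missing values of var.

    Hypothetical pre-check helper, not a geometrics function.
    """
    distinct = df.groupby(time)[var].nunique()  # NaN is ignored by default
    return distinct[distinct < 2].index.tolist()


# Made-up panel: 2010 has two observations but only one distinct value.
df = pd.DataFrame(
    {
        "year": [2000, 2000, 2010, 2010],
        "gdppc": [1.0, 2.0, 3.0, 3.0],
    }
)
sparse = periods_too_sparse(df, "gdppc", "year")
```

Dropping the flagged periods, or passing only the remaining ones via periods, avoids the error.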

Examples

Ridgeline of a small two-period panel:

import pandas as pd

from geometrics.spacetime import explore_distribution_over_time

df = pd.DataFrame(
    {
        "region": list("abcdefgh") * 2,
        "year": [2000] * 8 + [2010] * 8,
        "gdppc": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
        + [2.0, 2.5, 3.5, 4.5, 5.0, 6.5, 7.0, 7.5],
    }
)
res = explore_distribution_over_time(df, "gdppc", entity="region", time="year")
len(res.fig.data)