analyze_convergence_clubs

analyze_convergence_clubs(
    df,
    var,
    *,
    entity=None,
    time=None,
    gdf=None,
    hp_filter=True,
    hp_lambda=400.0,
    trim=0.3,
    tcrit=_TCRIT,
    cr=0.0,
    increment=0.05,
    max_cr=3.0,
    fraction=0.0,
    adjust=False,
    merge='ps',
    tiles='carto-positron',
    title=None,
)

Phillips-Sul log(t) convergence test and data-driven club clustering for a panel.

Runs the full club-convergence workflow on one variable: optionally smooth each unit’s series with the Hodrick-Prescott filter (lambda = 400 for annual data); form the relative transition path h_it = x_it / mean_i(x_it); run the log(t) regression test for the whole panel; and, when global convergence is rejected, apply the clustering algorithm to split the units into convergence clubs, then merge adjacent clubs that jointly converge. This answers the descriptive question “do these units form one converging group, several catch-up clubs, or none?”.
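The first two steps of that workflow can be sketched in plain NumPy. This is only an illustration under stated assumptions: `hp_trend` and `relative_transition` are hypothetical helper names (not part of `geometrics`), the panel is held as a wide `(T, N)` array, and the trend uses the textbook closed-form HP solution, which may differ in detail from what the library does internally.

```python
import numpy as np


def hp_trend(x, lamb=400.0):
    """HP trend of one series via the closed-form penalised least-squares solution."""
    x = np.asarray(x, dtype=float)
    T = x.size
    # Second-difference operator D, shape (T - 2, T).
    D = np.diff(np.eye(T), n=2, axis=0)
    # The trend tau solves (I + lamb * D'D) tau = x.
    return np.linalg.solve(np.eye(T) + lamb * D.T @ D, x)


def relative_transition(X):
    """h_it = x_it / mean_i(x_it) for a (T, N) array (rows = periods)."""
    return X / X.mean(axis=1, keepdims=True)


# Toy panel (made-up numbers): 3 units, 20 periods, gaps shrinking over time.
rng = np.random.default_rng(0)
t = np.arange(20)[:, None]
X = 10.0 + np.array([0.5, 0.0, -0.5]) * 0.9 ** t + rng.normal(0, 0.01, (20, 3))

trend = np.column_stack([hp_trend(X[:, i]) for i in range(X.shape[1])])
h = relative_transition(trend)  # each row averages to 1 by construction
```

By construction the cross-sectional mean of `h` is 1 in every period, which is why the test statistic below is built from deviations `h_it - 1`.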

The variable is used as supplied (no log is taken), so for the canonical income case pass log GDP per capita (or log labor productivity). The panel must be balanced (every unit present in every period) because the HP filter needs a gap-free series.
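Before calling the function it can help to take the log and confirm the panel is balanced. The sketch below is one way to do that with pandas; `is_balanced` is a hypothetical helper and the column names (`unit`, `year`, `gdppc`) are made up for illustration. The function's own balance check may be implemented differently.

```python
import numpy as np
import pandas as pd


def is_balanced(df, entity, time):
    """True when every entity appears exactly once in every period."""
    counts = df.groupby([entity, time]).size()
    n_cells = df[entity].nunique() * df[time].nunique()
    no_duplicates = bool((counts == 1).all())
    return no_duplicates and len(counts) == n_cells


# Toy panel in levels (made-up numbers).
panel = pd.DataFrame(
    {
        "unit": ["a", "a", "b", "b"],
        "year": [2000, 2001, 2000, 2001],
        "gdppc": [100.0, 110.0, 200.0, 205.0],
    }
)
# The function takes no log itself, so build the log variable first.
panel["log_gdppc"] = np.log(panel["gdppc"])

balanced = is_balanced(panel, "unit", "year")
```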

Parameters

Name	Type	Description	Default
df	pd.DataFrame	Balanced panel data frame.	required
var	str	Numeric variable to analyse (e.g. `"log_gdppc"`). Used as supplied.	required
entity	str \| None	Name of the entity (cross-section unit) identifier column. Defaults to the one declared via :func:`geometrics.set_panel`.	`None`
time	str \| None	Name of the time identifier column. Defaults to the one declared via :func:`geometrics.set_panel`.	`None`
gdf	gpd.GeoDataFrame \| None	Optional entity geometry; when given, the result carries a club-membership choropleth `fig_map` (`None` otherwise).	`None`
hp_filter	bool	Apply the Hodrick-Prescott filter per unit and analyse the trend (default). `False` analyses the variable as given (already smooth).	`True`
hp_lambda	float	HP smoothing parameter (`400` for annual data, the convergence-literature default).	`400.0`
trim	float	Initiating sample fraction `r` of the log(t) regression: the first `round(r*T)` periods are discarded. Phillips-Sul recommend `0.3` for small/moderate `T` and `0.2` for large `T`.	`0.3`
tcrit	float	One-sided convergence critical value for the t-statistic (`-1.65`, the 5% level).	`_TCRIT`
cr	float	Sieve inclusion threshold `c*` for adding units to a core group.	`0.0`
increment	float	Increment by which `cr` is raised (original PS-2007 refinement rule) when the assembled club fails its joint test.	`0.05`
max_cr	float	Ceiling for the raised `cr`.	`3.0`
fraction	float	Cross-section sort key: `0` (default) sorts by the last period; `> 0` sorts by the mean of the last `(1 - fraction)` share of periods (for noisy endpoints).	`0.0`
adjust	bool	Use the Schnurbus et al. (2016) club refinement (add the best candidate one at a time) instead of the original Phillips-Sul `cr`-increment rule.	`False`
merge	str	Adjacent-club merging after clustering: `"ps"` (default) applies the Phillips-Sul (2009) merge test iteratively until no clubs merge, `"single"` does one pass, `"none"` reports the raw clusters.	`'ps'`
tiles	str \| None	MapLibre basemap style for `fig_map` (`None` draws the vector backend).	`'carto-positron'`
title	str \| None	Title for the headline figure.	`None`
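To make the `fraction` sort key concrete, here is a rough sketch of how the ordering could be computed from a wide `(T, N)` array. `sort_order` is a hypothetical helper; the rounding rule (`floor(fraction * T)`) and the descending order (highest value first, as in Phillips-Sul) are assumptions, not confirmed details of this function.

```python
import numpy as np


def sort_order(X, fraction=0.0):
    """Unit indices ordered from highest to lowest sort key."""
    X = np.asarray(X, dtype=float)
    T = X.shape[0]
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    if fraction == 0.0:
        key = X[-1]  # last period only
    else:
        start = int(np.floor(fraction * T))  # assumed rounding rule
        key = X[start:].mean(axis=0)  # mean over the last (1 - fraction) share
    return np.argsort(-key, kind="stable")


# Toy data: rows are periods, columns are units.
X = np.array(
    [
        [1.0, 5.0, 3.0],
        [2.0, 4.0, 9.0],
        [3.0, 2.0, 0.0],
    ]
)
order_last = sort_order(X)                # ranks on the final row
order_mean = sort_order(X, fraction=0.5)  # ranks on the mean of the last two rows
```

Averaging over the tail changes the ranking here because the third unit has a noisy final observation, which is the situation the parameter is meant for.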

Returns

Name	Type	Description
	ConvergenceClubsResult	The tidy long `df` (`entity`, `time`, `value` = HP trend, `relative` = `h_it`, `club` with `0` = divergent); the within-club average figure `fig`; the all-paths figure `fig_paths`; the per-club small-multiples `fig_clubs`; the membership choropleth `fig_map` (`None` without `gdf`); the classification table `gt` / `summary` and the `membership` frame; the whole-panel `global_beta` / `global_tstat` and `converged` flag; and the club counts and run parameters.

Raises

Name	Type	Description
	KeyError	If `var` is not a column of `df`.
	TypeError	If `var` is not numeric.
	ValueError	If `trim` is outside `(0, 1)`, `merge` is unknown, the panel is unbalanced or too short/small, the per-period cross-sectional mean is (near) zero, or the global log(t) test is not estimable.

Notes

The log(t) test regresses, for :math:`t = [rT], \ldots, T`,

.. math:: \log(H_1 / H_t) - 2 \log(\log t) = a + b \log t + \varepsilon_t,

where :math:`H_t = N^{-1} \sum_i (h_{it} - 1)^2` is the cross-sectional variance of the relative transition paths. Under the null of convergence :math:`b = 2\alpha \ge 0`; a one-sided :math:`t_b > -1.65` fails to reject it. The standard error is the Phillips-Sul scalar long-run variance form with an Andrews (1991) quadratic-spectral HAC of the residuals. The clustering sorts units by their final value, forms a core group by maximising :math:`t_b`, sieves in the remaining units, and recurses on the residual; adjacent clubs are then merged when they jointly converge. This is a faithful port of the Stata psecta package (Du 2017); see Phillips & Sul (2007, 2009) and Schnurbus et al. (2016).
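As a rough illustration of the regression itself, the sketch below computes the slope for a `(T, N)` array of relative transition paths. `logt_regression` is a hypothetical helper, not the library's code. Two simplifications are assumptions: the first `round(trim * T)` periods are dropped (following the `trim` parameter description), and the t-statistic uses a plain OLS standard error rather than the quadratic-spectral HAC form described above, so its values will not match the function's output.

```python
import numpy as np


def logt_regression(h, trim=0.3):
    """OLS slope b and a naive (non-HAC) t-statistic of the log(t) regression."""
    h = np.asarray(h, dtype=float)
    T = h.shape[0]
    H = ((h - 1.0) ** 2).mean(axis=1)      # cross-sectional variance H_t
    first = max(int(round(trim * T)), 1)   # periods dropped (t >= 2 keeps log(log t) finite)
    t = np.arange(first + 1, T + 1, dtype=float)  # 1-based periods kept
    y = np.log(H[0] / H[first:]) - 2.0 * np.log(np.log(t))
    Z = np.column_stack([np.ones_like(t), np.log(t)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(t) - 2)
    se_b = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return coef[1], coef[1] / se_b


# Toy converging panel: deviations from 1 shrink like t^(-1/2).
periods = np.arange(1, 31, dtype=float)[:, None]
d = np.array([0.03, -0.01, -0.02])  # sums to zero, so each row of h averages to 1
h_conv = 1.0 + d * periods ** -0.5

b, t_b = logt_regression(h_conv)
converges = t_b > -1.65  # one-sided rule stated in the Notes
```

With shrinking deviations the slope is positive, so the null of convergence is not rejected; reversing the exponent so that deviations grow produces a strongly negative slope.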

Examples

Two planted clubs (units converge within their group, not across groups):

import numpy as np
import pandas as pd

from geometrics.clubs import analyze_convergence_clubs

rng = np.random.default_rng(0)
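rows = []
# Two clubs with long-run levels 10.0 and 8.5, ten units each.
for k, mu in enumerate((10.0, 8.5), start=1):
    for j in range(10):
        # Initial gap from the club level; it shrinks by 10% each period.
        dev = rng.uniform(-0.4, 0.4)
        for t in range(1, 31):
            rows.append((f"c{k}u{j}", t, mu + dev * 0.9 ** (t - 1)))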
df = pd.DataFrame(rows, columns=["unit", "year", "log_y"])
res = analyze_convergence_clubs(df, "log_y", entity="unit", time="year")
res.n_clubs, res.converged