analyze_convergence_clubs

analyze_convergence_clubs(
    df,
    var,
    *,
    entity=None,
    time=None,
    gdf=None,
    hp_filter=True,
    hp_lambda=400.0,
    trim=0.3,
    tcrit=_TCRIT,
    cr=0.0,
    increment=0.05,
    max_cr=3.0,
    fraction=0.0,
    adjust=False,
    merge='ps',
    tiles='carto-positron',
    title=None,
)

Phillips-Sul log(t) convergence test and data-driven club clustering for a panel.

Runs the full club-convergence workflow on one variable: optionally smooth each unit’s series with the Hodrick-Prescott filter (lambda = 400 for annual data); form the relative transition path h_it = x_it / mean_i(x_it), where mean_i is the cross-sectional mean in period t; run the log(t) regression test for the whole panel; and, when global convergence is rejected, apply the clustering algorithm to split the units into convergence clubs, then merge adjacent clubs that jointly converge. This answers the descriptive question: do these units form one converging group, several catch-up clubs, or none?
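
The relative transition path step can be illustrated with plain pandas. This is a minimal sketch on a toy panel, not the library's internal code; the frame and the column names (unit, year, x, relative) are assumptions made for the illustration.

```python
import pandas as pd

# Toy long-format panel; the column names are hypothetical.
panel = pd.DataFrame(
    {
        "unit": ["a", "a", "b", "b", "c", "c"],
        "year": [1, 2, 1, 2, 1, 2],
        "x": [8.0, 9.0, 10.0, 10.0, 12.0, 11.0],
    }
)

# h_it = x_it / mean_i(x_it): divide each value by the cross-sectional
# mean of its own period.
period_mean = panel.groupby("year")["x"].transform("mean")
panel["relative"] = panel["x"] / period_mean

# By construction the relative paths average to one in every period.
print(panel.groupby("year")["relative"].mean())
```

Because each period is rescaled by its own mean, h_it measures a unit's position relative to the panel average, and convergence shows up as the h_it values closing in on one.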

The variable is used as supplied — no log is taken — so for the canonical income case pass log GDP per capita (or log labor productivity). The panel must be balanced (every unit present in every period) because the HP filter needs a gap-free series.
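
Both input requirements can be checked before calling the function. The sketch below is only an illustration: the helper is_balanced and the columns gdppc and log_gdppc are hypothetical names, and the check covers row structure only (one row per unit and period), not missing values inside the variable.

```python
import numpy as np
import pandas as pd


def is_balanced(df: pd.DataFrame, entity: str, time: str) -> bool:
    """Return True when every entity appears exactly once in every period."""
    # Duplicate (entity, time) pairs already break the panel structure.
    if df.duplicated([entity, time]).any():
        return False
    # With unique pairs, a balanced panel has exactly N * T rows.
    return len(df) == df[entity].nunique() * df[time].nunique()


full = pd.DataFrame(
    {
        "unit": ["a", "a", "b", "b"],
        "year": [1, 2, 1, 2],
        "gdppc": [1000.0, 1100.0, 2000.0, 2100.0],
    }
)
# The function takes no log itself, so build the log variable first.
full["log_gdppc"] = np.log(full["gdppc"])

gappy = full.iloc[:-1]  # unit "b" is missing year 2

print(is_balanced(full, "unit", "year"), is_balanced(gappy, "unit", "year"))
```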

Parameters

Name Type Description Default
df pd.DataFrame Balanced panel data frame. required
var str Numeric variable to analyse (e.g. "log_gdppc"). Used as supplied. required
entity str | None Entity (unit) identifier column. Defaults to the column declared via :func:`geometrics.set_panel`. None
time str | None Time identifier column. Defaults to the column declared via :func:`geometrics.set_panel`. None
gdf gpd.GeoDataFrame | None Optional entity geometry; when given, the result carries a club-membership choropleth fig_map (None otherwise). None
hp_filter bool Apply the Hodrick-Prescott filter per unit and analyse the trend (default). False analyses the variable as given (already smooth). True
hp_lambda float HP smoothing parameter (400 for annual data, the convergence-literature default). 400.0
trim float Initiating sample fraction r of the log(t) regression: the first round(r*T) periods are discarded. Phillips-Sul recommend 0.3 for small/moderate T and 0.2 for large T. 0.3
tcrit float One-sided convergence critical value for the t-statistic (-1.65, the 5% level). _TCRIT
cr float Sieve inclusion threshold c* for adding units to a core group. 0.0
increment float Increment by which cr is raised (original PS-2007 refinement rule) when the assembled club fails its joint test. 0.05
max_cr float Ceiling for the raised cr. 3.0
fraction float Cross-section sort key: 0 (default) sorts by the last period; > 0 sorts by the mean of the last (1 - fraction) share of periods (for noisy endpoints). 0.0
adjust bool Use the Schnurbus et al. (2016) club refinement (add the best candidate one at a time) instead of the original Phillips-Sul cr-increment rule. False
merge str Adjacent-club merging after clustering: "ps" (default) applies the Phillips-Sul (2009) merge test iteratively until no clubs merge, "single" does one pass, "none" reports the raw clusters. 'ps'
tiles str | None MapLibre basemap style for fig_map (None draws the vector backend). 'carto-positron'
title str | None Title for the headline figure. None
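
To make hp_filter and hp_lambda concrete, the sketch below computes a Hodrick-Prescott trend per unit with the textbook closed form (I + lambda D'D)^{-1} y, where D is the second-difference operator. It is an illustration under stated assumptions: hp_trend is a hypothetical helper, the panel is synthetic, and the documentation above does not say which HP implementation the function itself uses.

```python
import numpy as np
import pandas as pd


def hp_trend(y, lamb=400.0):
    """HP trend of one series via the closed form (I + lamb * D'D)^{-1} y."""
    y = np.asarray(y, dtype=float)
    T = y.size
    # Second-difference operator of shape (T - 2, T); each row is [1, -2, 1].
    D = np.diff(np.eye(T), n=2, axis=0)
    return np.linalg.solve(np.eye(T) + lamb * D.T @ D, y)


# Hypothetical panel: two units, 20 annual observations each.
rng = np.random.default_rng(1)
years = np.arange(2000, 2020)
steps = np.arange(years.size)
panel = pd.DataFrame(
    {
        "unit": np.repeat(["a", "b"], years.size),
        "year": np.tile(years, 2),
        "log_y": np.concatenate(
            [
                9.0 + 0.02 * steps + rng.normal(0, 0.05, years.size),
                8.0 + 0.03 * steps + rng.normal(0, 0.05, years.size),
            ]
        ),
    }
)

# The filter is applied unit by unit, so sort by time within each unit first.
panel = panel.sort_values(["unit", "year"])
panel["trend"] = panel.groupby("unit")["log_y"].transform(hp_trend, lamb=400.0)
print(panel.head())
```

A larger lamb penalises curvature more heavily and pushes the trend toward a straight line; a series that is already linear passes through unchanged.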

Returns

Name Type Description
ConvergenceClubsResult The tidy long df (entity, time, value = HP trend, relative = h_it, club with 0 = divergent); the within-club average figure fig; the all-paths figure fig_paths; the per-club small-multiples fig_clubs; the membership choropleth fig_map (None without gdf); the classification table gt / summary and the membership frame; the whole-panel global_beta / global_tstat and converged flag; and the club counts and run parameters.

Raises

Name Type Description
KeyError If var is not a column of df.
TypeError If var is not numeric.
ValueError If trim is outside (0, 1), merge is unknown, the panel is unbalanced or too short/small, the per-period cross-sectional mean is (near) zero, or the global log(t) test is not estimable.

Notes

The log(t) test regresses, for :math:`t = [rT], \ldots, T`,

.. math:: \log(H_1 / H_t) - 2 \log(\log t) = a + b \log t + \varepsilon_t,

where :math:`H_t = N^{-1} \sum_i (h_{it} - 1)^2` is the cross-sectional variance of the relative transition paths. Under the null of convergence :math:`b = 2\alpha \ge 0`; a one-sided :math:`t_b > -1.65` fails to reject it. The standard error is the Phillips-Sul scalar long-run variance form with an Andrews (1991) quadratic-spectral HAC of the residuals.

The clustering sorts units by their final value, forms a core group by maximising :math:`t_b`, sieves in the remaining units, and recurses on the residual; adjacent clubs are then merged when they jointly converge. This is a faithful port of the Stata psecta package (Du 2017); see Phillips & Sul (2007, 2009) and Schnurbus et al. (2016).
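
The point estimate of b in the log(t) regression can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: logt_slope is a hypothetical helper, it uses plain OLS, and it omits the HAC standard error, so it returns the slope but not the t-statistic the test decides on. The start index follows the t = [rT] convention of the formula; the exact rounding used by the function may differ.

```python
import numpy as np


def logt_slope(h, trim=0.3):
    """OLS slope b of the log(t) regression for an (N, T) array of h_it."""
    h = np.asarray(h, dtype=float)
    T = h.shape[1]
    # H_t = N^{-1} sum_i (h_it - 1)^2 for t = 1..T.
    H = ((h - 1.0) ** 2).mean(axis=0)
    # Regression sample t = [rT], ..., T (1-based); log(log t) needs t >= 2.
    start = max(int(trim * T), 2)
    t = np.arange(start, T + 1)
    y = np.log(H[0] / H[t - 1]) - 2.0 * np.log(np.log(t))
    X = np.column_stack([np.ones(t.size), np.log(t)])
    a, b = np.linalg.lstsq(X, y, rcond=None)[0]
    return b


# One converging group: each unit's gap to the common level decays over time.
rng = np.random.default_rng(0)
dev = rng.uniform(-0.4, 0.4, size=10)
periods = np.arange(1, 31)
x = 10.0 + dev[:, None] * 0.9 ** (periods - 1)

h = x / x.mean(axis=0)  # relative transition paths h_it
b = logt_slope(h)
print(b)
```

For this converging group the dispersion H_t shrinks quickly, so the estimated slope is positive; a full test would add the long-run variance estimate to obtain t_b and compare it with -1.65.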

Examples

Two planted clubs (units converge within their group, not across groups):

import numpy as np
import pandas as pd

from geometrics.clubs import analyze_convergence_clubs

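rng = np.random.default_rng(0)
rows = []
# Two groups with different long-run levels mu (10.0 and 8.5), ten units each.
for k, mu in enumerate((10.0, 8.5), start=1):
    for j in range(10):
        # Each unit starts dev away from its group level; the gap decays geometrically.
        dev = rng.uniform(-0.4, 0.4)
        for t in range(1, 31):
            rows.append((f"c{k}u{j}", t, mu + dev * 0.9 ** (t - 1)))
df = pd.DataFrame(rows, columns=["unit", "year", "log_y"])
res = analyze_convergence_clubs(df, "log_y", entity="unit", time="year")
# Number of clubs found and the whole-panel convergence flag.
res.n_clubs, res.converged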