analyze_convergence_clubs
analyze_convergence_clubs(
    df,
    var,
    *,
    entity=None,
    time=None,
    gdf=None,
    hp_filter=True,
    hp_lambda=400.0,
    trim=0.3,
    tcrit=_TCRIT,
    cr=0.0,
    increment=0.05,
    max_cr=3.0,
    fraction=0.0,
    adjust=False,
    merge='ps',
    tiles='carto-positron',
    title=None,
)

Phillips-Sul log(t) convergence test and data-driven club clustering for a panel.
Runs the full club-convergence workflow on one variable: optionally smooth each unit’s series with the Hodrick-Prescott filter (lambda = 400 for annual data); form the relative transition path h_it = x_it / mean_i(x_it); run the log(t) regression test for the whole panel; and, when global convergence is rejected, apply the clustering algorithm to split the units into convergence clubs, then merge adjacent clubs that jointly converge. This answers the descriptive question “do these units form one converging group, several catch-up clubs, or none?”.
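The relative transition path and its cross-sectional variance can be illustrated with a few lines of pandas. This is a minimal sketch rather than the library's implementation: the helper names `relative_transition` and `cross_sectional_variance` and the toy panel are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def relative_transition(df, var, entity, time):
    """Return a wide (time x entity) frame of h_it = x_it / mean_i(x_it).

    Hypothetical helper, not part of the documented API.
    """
    wide = df.pivot(index=time, columns=entity, values=var).sort_index()
    # Divide each period's row by that period's cross-sectional mean.
    return wide.div(wide.mean(axis=1), axis=0)


def cross_sectional_variance(h):
    """H_t = N^{-1} * sum_i (h_it - 1)^2, one value per period."""
    return ((h - 1.0) ** 2).mean(axis=1)


# Toy panel: two units that approach each other over three periods.
demo = pd.DataFrame({
    "unit": ["a", "a", "a", "b", "b", "b"],
    "year": [1, 2, 3, 1, 2, 3],
    "log_y": [8.0, 9.0, 10.0, 12.0, 11.0, 10.0],
})
h = relative_transition(demo, "log_y", "unit", "year")
H = cross_sectional_variance(h)
```

By construction every period of `h` averages to 1, so a shrinking `H` means the units are closing in on the common path.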
The variable is used as supplied — no log is taken — so for the canonical income case pass log GDP per capita (or log labor productivity). The panel must be balanced (every unit present in every period) because the HP filter needs a gap-free series.
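To make the balance requirement and the smoothing step concrete, the sketch below checks that a panel is balanced and then extracts a Hodrick-Prescott trend per unit. It assumes the textbook HP formulation, in which the trend solves (I + lambda * D'D) tau = y with D the second-difference operator; `is_balanced` and `hp_trend` are hypothetical helpers, and the library may compute the filter differently.

```python
import numpy as np
import pandas as pd


def is_balanced(df, entity, time):
    """True when every entity appears exactly once in every period."""
    counts = df.groupby([entity, time]).size().unstack(time)
    # Missing entity-period cells become NaN, which compares unequal to 1.
    return bool((counts == 1).all().all())


def hp_trend(y, lamb=400.0):
    """HP trend of one series (needs at least 3 observations)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator D with shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    # First-order condition of the HP problem: (I + lamb * D'D) tau = y.
    return np.linalg.solve(np.eye(n) + lamb * (D.T @ D), y)


demo = pd.DataFrame({
    "unit": ["a"] * 5 + ["b"] * 5,
    "year": list(range(1, 6)) * 2,
    "log_y": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 2.5, 2.0, 2.5, 2.0],
})
balanced = is_balanced(demo, "unit", "year")

# Sort by time within unit before filtering, then smooth each unit separately.
demo = demo.sort_values(["unit", "year"])
demo["trend"] = demo.groupby("unit")["log_y"].transform(
    lambda s: hp_trend(s.to_numpy())
)
```

A perfectly linear series has zero second differences, so the filter returns it unchanged; the zig-zag in unit `b` is flattened instead.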
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| df | pd.DataFrame | Balanced panel data frame. | required |
| var | str | Numeric variable to analyse (e.g. "log_gdppc"). Used as supplied. | required |
| entity | str \| None | Column holding the entity identifier. Defaults to the one declared via geometrics.set_panel. | None |
| time | str \| None | Column holding the time identifier. Defaults to the one declared via geometrics.set_panel. | None |
| gdf | gpd.GeoDataFrame \| None | Optional entity geometry; when given, the result carries a club-membership choropleth fig_map (None otherwise). | None |
| hp_filter | bool | Apply the Hodrick-Prescott filter per unit and analyse the trend (default). False analyses the variable as given, for series that are already smooth. | True |
| hp_lambda | float | HP smoothing parameter (400 for annual data, the convergence-literature default). | 400.0 |
| trim | float | Initiating sample fraction r of the log(t) regression: the first round(r*T) periods are discarded. Phillips-Sul recommend 0.3 for small/moderate T and 0.2 for large T. | 0.3 |
| tcrit | float | One-sided convergence critical value for the t-statistic (-1.65, the 5% level). | _TCRIT |
| cr | float | Sieve inclusion threshold c* for adding units to a core group. | 0.0 |
| increment | float | Increment by which cr is raised (original PS-2007 refinement rule) when the assembled club fails its joint test. | 0.05 |
| max_cr | float | Ceiling for the raised cr. | 3.0 |
| fraction | float | Cross-section sort key: 0 (default) sorts by the last period; > 0 sorts by the mean of the last (1 - fraction) share of periods (for noisy endpoints). | 0.0 |
| adjust | bool | Use the Schnurbus et al. (2016) club refinement (add the best candidate one at a time) instead of the original Phillips-Sul cr-increment rule. | False |
| merge | str | Adjacent-club merging after clustering: "ps" (default) applies the Phillips-Sul (2009) merge test iteratively until no clubs merge, "single" does one pass, "none" reports the raw clusters. | 'ps' |
| tiles | str \| None | MapLibre basemap style for fig_map (None draws the vector backend). | 'carto-positron' |
| title | str \| None | Title for the headline figure. | None |
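The trim and fraction parameters are easiest to read with numbers attached. The sketch below is an interpretation of the table entries, not the library's code: `discarded_periods` and `sort_key` are hypothetical helpers, the floor used to place the start of the averaging window is an assumption, and so is the descending sort order.

```python
import numpy as np
import pandas as pd


def discarded_periods(T, trim=0.3):
    """Number of initial periods dropped from the log(t) regression."""
    return int(round(trim * T))


def sort_key(wide, fraction=0.0):
    """Per-entity sort key from a wide (time x entity) frame."""
    if fraction <= 0:
        # Default: value in the last period.
        return wide.iloc[-1]
    # Assumption: the window covering the last (1 - fraction) share of
    # periods starts at floor(fraction * T).
    start = int(np.floor(fraction * len(wide)))
    return wide.iloc[start:].mean()


wide = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [4.0, 3.0, 2.0, 5.0]},
    index=[1, 2, 3, 4],
)
n_drop = discarded_periods(30)        # trim=0.3 with T=30 periods
last = sort_key(wide)                 # last-period values
smooth = sort_key(wide, fraction=0.5) # mean over the last half of the sample
# Assumption: units are ranked from highest to lowest key.
order = last.sort_values(ascending=False).index.tolist()
```

Averaging over the tail matters when the final period is noisy: unit `b` ranks first on its last value but ties with `a` once the last two periods are averaged.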
Returns
| Name | Type | Description |
|---|---|---|
| | ConvergenceClubsResult | The tidy long df (entity, time, value = HP trend, relative = h_it, club with 0 = divergent); the within-club average figure fig; the all-paths figure fig_paths; the per-club small-multiples fig_clubs; the membership choropleth fig_map (None without gdf); the classification table gt / summary and the membership frame; the whole-panel global_beta / global_tstat and converged flag; and the club counts and run parameters. |
Raises
| Name | Type | Description |
|---|---|---|
| | KeyError | If var is not a column of df. |
| | TypeError | If var is not numeric. |
| | ValueError | If trim is outside (0, 1), merge is unknown, the panel is unbalanced or too short/small, the per-period cross-sectional mean is (near) zero, or the global log(t) test is not estimable. |
Notes
The log(t) test regresses, for $t = [rT], \ldots, T$,

$$\log(H_1 / H_t) - 2 \log(\log t) = a + b \log t + \varepsilon_t,$$

where $H_t = N^{-1} \sum_i (h_{it} - 1)^2$ is the cross-sectional variance of the relative transition paths. Under the null of convergence $b = 2\alpha \ge 0$; a one-sided $t_b > -1.65$ fails to reject it. The standard error is the Phillips-Sul scalar long-run variance form with an Andrews (1991) quadratic-spectral HAC of the residuals.

The clustering sorts units by their final value, forms a core group by maximising $t_b$, sieves in the remaining units, and recurses on the residual; adjacent clubs are then merged when they jointly converge. This is a faithful port of the Stata psecta package (Du 2017); see Phillips & Sul (2007, 2009) and Schnurbus et al. (2016).
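The regression step described in these notes can be sketched with NumPy alone. Two simplifications are assumed and should not be read as the library's behaviour: the t-statistic uses the plain OLS standard error rather than the quadratic-spectral HAC form, and the sample start follows the trim description (the first round(r*T) periods are dropped). `logt_regression` is a hypothetical name.

```python
import numpy as np


def logt_regression(H, trim=0.3):
    """OLS of log(H_1/H_t) - 2 log(log t) on a constant and log t.

    Returns (b, t_b). Simplification: ordinary OLS standard error,
    not the HAC estimator described in the notes.
    """
    H = np.asarray(H, dtype=float)
    T = H.size
    t = np.arange(1, T + 1)
    # Drop the first round(trim * T) periods.
    keep = t > int(round(trim * T))
    y = np.log(H[0] / H[keep]) - 2.0 * np.log(np.log(t[keep]))
    X = np.column_stack([np.ones(keep.sum()), np.log(t[keep])])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (len(y) - 2)
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1], coef[1] / se_b


# Synthetic variance path built so that the true slope is b = 1:
# H_t = exp(noise_t) / (log(t)^2 * t) for t >= 2, with H_1 = 1.
T = 30
t = np.arange(1, T + 1)
H = np.ones(T)
H[1:] = np.exp(0.01 * (-1.0) ** t[1:]) / (np.log(t[1:]) ** 2 * t[1:] ** 1.0)

b, tstat = logt_regression(H)
converged = tstat > -1.65  # one-sided rule from the notes
```

On this constructed path the estimated slope sits close to 1 and the t-statistic is far above -1.65, so convergence is not rejected.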
Examples
Two planted clubs (units converge within their group, not across groups):
import numpy as np
import pandas as pd
from geometrics.clubs import analyze_convergence_clubs

rng = np.random.default_rng(0)
rows = []
for k, mu in enumerate((10.0, 8.5), start=1):
    for j in range(10):
        dev = rng.uniform(-0.4, 0.4)
        for t in range(1, 31):
            rows.append((f"c{k}u{j}", t, mu + dev * 0.9 ** (t - 1)))
df = pd.DataFrame(rows, columns=["unit", "year", "log_y"])
res = analyze_convergence_clubs(df, "log_y", entity="unit", time="year")
res.n_clubs, res.converged