Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Analysis: Fastest-Inflating Food Categories (CPI & PPI)

Authors
Affiliations
University of California, Berkeley
University of California, Berkeley
University of California, Berkeley
University of California, Berkeley

Analysis Question: Which food categories have had the fastest increase in prices over the past 20–50 years?

We will explore this question using two preprocessed forecast datasets:

  • Historical Consumer Price Index (CPI) forecast series

  • Historical Producer Price Index (PPI) forecast series

Each contains annual percent-change forecasts by food-related category from 1974–2024.

To quantify “fastest increase,” we will measure the average annual percent change in each category over different time windows:

  • A long-run window (~50 years): 1974–2024*

  • A recent window (20 years): 2004–2024

For each dataset (CPI and PPI), we will:

  1. Convert wide data to tidy long format (Year, category, pct_change)

  2. Compute mean annual inflation by category within each window

  3. Rank categories and identify the top 5 fastest-inflating categories

  4. Compare whether categories that inflate quickly in CPI also do so in PPI

Imports + Load Data

import os
import pandas as pd
import matplotlib.pyplot as plt
import sys
sys.path.append('..')  # Just add the parent directory for utils
from utils.data_loader import load_inflation_data
from utils.transformers import reshape_to_long_format

plt.rcParams["figure.figsize"] = (14, 6)
plt.rcParams["axes.grid"] = True

# Load processed data
cpi_wide, ppi_wide = load_inflation_data()

display(cpi_wide.head()) 
display(ppi_wide.head())
Loading...
Loading...

Convert to tidy long format

The preprocessed files are in wide format (one column per category). For analysis, we will convert them to long format with columns:

  • Year

  • category

  • pct_change (annual percent change forecast)

cpi_long = reshape_to_long_format(cpi_wide)
ppi_long = reshape_to_long_format(ppi_wide)

display(cpi_long.head())
display(ppi_long.head())
Loading...
Loading...

Defining the metric: average annual percent change

For each category and time window, we compute `pct_change,’ which is the annual percent change forecast in that category. A higher value means the category is, on average, expected to experience faster price growth in that period.

# Helper function:
def mean_inflation_by_category(df_long: pd.DataFrame,
                               start_year: int,
                               end_year: int) -> pd.Series:
    """
    Compute mean annual percent change per category within a given year window.

    Parameters
    ----------
    df_long : pd.DataFrame
        Long-format data with columns: Year, category, pct_change.
    start_year : int
        Inclusive start of the window.
    end_year : int
        Inclusive end of the window.

    Returns
    -------
    pd.Series
        Mean pct_change per category, sorted descending order.
    """
    window = df_long[
        (df_long["Year"] >= start_year) & (df_long["Year"] <= end_year)
    ]
    means = (
        window.groupby("category")["pct_change"]
        .mean()
        .sort_values(ascending=False)
    )
    return means

# Define Time Windows:
full_start, full_end = 1974, 2024   # ~50-year window
recent_start, recent_end = 2004, 2024  # Last 20 years

CPI: Fastest-inflating categories (long-run vs last 20 years)

We first look at CPI forecasts and compute average annual inflation by category for:

  • 1974–2024 (approximately 50 years)

  • 2004–2024 (recent 20-year window)

Compute Means

cpi_mean_full = mean_inflation_by_category(cpi_long, full_start, full_end)
cpi_mean_recent = mean_inflation_by_category(cpi_long, recent_start, recent_end)

display(cpi_mean_full.head())
display(cpi_mean_recent.head())
category Sugar_and_sweets 4.588235 Fresh_fruits 4.576471 Cereals_and_bakery_products 4.321569 Food_away_from_home 4.278431 Fresh_fruits_and_vegetables 4.268627 Name: pct_change, dtype: float64
category Beef_and_veal 4.395238 Eggs 4.304762 Fats_and_oils 3.519048 Food_away_from_home 3.423810 Meats 3.376190 Name: pct_change, dtype: float64

Top 5 Tables

cpi_top5_full = cpi_mean_full.head(5)
cpi_top5_recent = cpi_mean_recent.head(5)

display(cpi_top5_full)
display(cpi_top5_recent)
category Sugar_and_sweets 4.588235 Fresh_fruits 4.576471 Cereals_and_bakery_products 4.321569 Food_away_from_home 4.278431 Fresh_fruits_and_vegetables 4.268627 Name: pct_change, dtype: float64
category Beef_and_veal 4.395238 Eggs 4.304762 Fats_and_oils 3.519048 Food_away_from_home 3.423810 Meats 3.376190 Name: pct_change, dtype: float64

Bar Plots

fig, ax = plt.subplots()
cpi_top5_full.plot(kind="barh", ax=ax)
plt.gca().invert_yaxis() # top to bottom
ax.set_title("CPI: Top 5 Fastest-Inflating Categories (1974–2024)")
ax.set_xlabel("Average annual percent change")
plt.tight_layout()
plt.savefig('../figures/cpi_top_5_inflating_categories_1974_2024.png')
plt.show()
<Figure size 1400x600 with 1 Axes>
fig, ax = plt.subplots()
cpi_top5_recent.plot(kind="barh", ax=ax)
plt.gca().invert_yaxis() # top to bottom
ax.set_title("CPI: Top 5 Fastest-Inflating Categories (2004–2024)")
ax.set_xlabel("Average annual percent change")
plt.tight_layout()
plt.savefig('../figures/cpi_top_5_inflating_categories_2004_2024.png')
plt.show()
<Figure size 1400x600 with 1 Axes>

PPI: Fastest-inflating categories (long-run vs last 20 years)

We will now repeat the same analysis using PPI forecasts, which focus on producer-level prices.

Compute means

ppi_mean_full = mean_inflation_by_category(ppi_long, full_start, full_end)
ppi_mean_recent = mean_inflation_by_category(ppi_long, recent_start, recent_end)

display(ppi_mean_full.head())
display(ppi_mean_recent.head())
category Farm_level_eggs 6.534694 Wholesale_fats_and_oils 4.131373 Farm_level_milk 3.476471 Farm_level_vegetables 3.450980 Wholesale_beef 3.409804 Name: pct_change, dtype: float64
category Farm_level_eggs 13.176190 Farm_level_wheat 5.128571 Farm_level_milk 4.914286 Wholesale_fats_and_oils 4.890476 Farm_level_vegetables 4.747619 Name: pct_change, dtype: float64

Top 5 Tables

ppi_top5_full = ppi_mean_full.head(5)
ppi_top5_recent = ppi_mean_recent.head(5)

display(ppi_top5_full)
display(ppi_top5_recent)
category Farm_level_eggs 6.534694 Wholesale_fats_and_oils 4.131373 Farm_level_milk 3.476471 Farm_level_vegetables 3.450980 Wholesale_beef 3.409804 Name: pct_change, dtype: float64
category Farm_level_eggs 13.176190 Farm_level_wheat 5.128571 Farm_level_milk 4.914286 Wholesale_fats_and_oils 4.890476 Farm_level_vegetables 4.747619 Name: pct_change, dtype: float64

Bar Plots

fig, ax = plt.subplots()
ppi_top5_full.plot(kind="barh", ax=ax)
plt.gca().invert_yaxis() # top to bottom
ax.set_title("PPI: Top 5 Fastest-Inflating Categories (1974–2024)")
ax.set_xlabel("Average annual percent change")
plt.tight_layout()
plt.savefig('../figures/ppi_top_5_inflating_categories_1974_2024.png')
plt.show()
<Figure size 1400x600 with 1 Axes>
fig, ax = plt.subplots()
ppi_top5_recent.plot(kind="barh", ax=ax)
plt.gca().invert_yaxis() # top to bottom
ax.set_title("PPI: Top 5 Fastest-Inflating Categories (2004–2024)")
ax.set_xlabel("Average annual percent change")
plt.tight_layout()
plt.savefig('../figures/ppi_top_5_inflating_categories_2004_2024.png')
plt.show()
<Figure size 1400x600 with 1 Axes>

Side-by-Side Comparison Tables

CPI:

cpi_q1_summary = pd.DataFrame({
    "CPI_top5_full": cpi_top5_full.index,
    "CPI_full_mean": cpi_top5_full.values,
    "CPI_top5_recent": cpi_top5_recent.index,
    "CPI_recent_mean": cpi_top5_recent.values,
})

cpi_q1_summary
Loading...

PPI:

ppi_q1_summary = pd.DataFrame({
    "PPI_top5_full": ppi_top5_full.index,
    "PPI_full_mean": ppi_top5_full.values,
    "PPI_top5_recent": ppi_top5_recent.index,
    "PPI_recent_mean": ppi_top5_recent.values,
})

ppi_q1_summary
Loading...

Combine CPI + PPI into a single comparison table

combined_q1_summary = pd.DataFrame({
    "CPI_top5_full": cpi_top5_full.index,
    "CPI_full_mean": cpi_top5_full.values,
    "PPI_top5_full": ppi_top5_full.index,
    "PPI_full_mean": ppi_top5_full.values,
    "CPI_top5_recent": cpi_top5_recent.index,
    "CPI_recent_mean": cpi_top5_recent.values,
    "PPI_top5_recent": ppi_top5_recent.index,
    "PPI_recent_mean": ppi_top5_recent.values,
})

combined_q1_summary
Loading...

Saving Results

# make a directory called food_price_trend_summary under ../outputs/
save_directory = "../outputs/food_price_trend_summary/"
os.makedirs(save_directory, exist_ok=True)

# Save CPI results:
cpi_mean_full.to_csv(save_directory + "cpi_mean_full_window.csv")
cpi_mean_recent.to_csv(save_directory + "cpi_mean_recent_window.csv")
cpi_q1_summary.to_csv(save_directory + "q1_cpi_top5_summary.csv", index=False)

# Save PPI results:
ppi_mean_full.to_csv(save_directory + "ppi_mean_full_window.csv")
ppi_mean_recent.to_csv(save_directory + "ppi_mean_recent_window.csv")
ppi_q1_summary.to_csv(save_directory + "q1_ppi_top5_summary.csv", index=False)

# Save combined CPI + PPI comparison table:
combined_q1_summary.to_csv(save_directory + "q1_combined_cpi_ppi_summary.csv",
                           index=False)

print("All Q1 outputs saved successfully!")
All Q1 outputs saved successfully!