Super Bowl Data Analysis
Introduction
This project uses real-world sports data from Pro Football Reference to explore the most recent Super Bowl (Super Bowl LIX, 2025). Using Python, the pandas library, and Matplotlib, it answers a set of key performance-related questions about the game.
The game and player statistics come from the Pro Football Reference page for this Super Bowl and are loaded from CSV files for analysis.
Objectives
- Load and clean real-world sports data using pandas
- Perform exploratory data analysis (EDA)
- Ask and answer 6 key questions about the Super Bowl game
- Visualize important stats
- Share insights with others
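Dataset Overview
The data comes from Pro Football Reference, a public database of football statistics that reaches back to the founding of the National Football League. Two CSV files are used here: Team_Stats.csv, which holds the team totals for each side, and Player_Stats.csv, which holds each player's passing, rushing, receiving, and fumble numbers.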
Questions We’ll Explore
- Which team gained more total yards?
- Which team had more time of possession?
- Did turnovers affect the game outcome?
- Who was the most effective passer?
- Which player had the most rushing yards?
- What were the scoring patterns per quarter?
import pandas as pd
import matplotlib.pyplot as plt
team_stats = pd.read_csv("/workspaces/codespaces-jupyter/data/Team_Stats.csv", index_col=0)
player_stats = pd.read_csv("/workspaces/codespaces-jupyter/data/Player_Stats.csv", skiprows=1, index_col=0)
print("Team Stats:\n", team_stats, "\n")
print("Player Stats:\n", player_stats)
Team Stats:
                              KAN            PHI
First Downs                    12             21
Rush-Yds-TDs              11-49-0       45-135-1
Cmp-Att-Yd-TD-INT   21-32-257-3-2  17-23-221-2-1
Sacked-Yards                 6-31           2-11
Net Pass Yards                226            210
Total Yards                   275            345
Fumbles-Lost                  2-1            0-0
Turnovers                       3              1
Penalties-Yards              7-75           8-59
Third Down Conv.             3-11           3-12
Fourth Down Conv.             0-1            0-1
Time of Possession          23:02          36:58

Player Stats:
                      Tm  PassCmp  PassAtt  PassYds  PassTD  PassInt  Sk  \
Player
Patrick Mahomes      KAN       21       32      257       3        2   6
Kareem Hunt          KAN        0        0        0       0        0   0
Samaje Perine        KAN        0        0        0       0        0   0
Isiah Pacheco        KAN        0        0        0       0        0   0
Xavier Worthy        KAN        0        0        0       0        0   0
Travis Kelce         KAN        0        0        0       0        0   0
DeAndre Hopkins      KAN        0        0        0       0        0   0
JuJu Smith-Schuster  KAN        0        0        0       0        0   0
Marquise Brown       KAN        0        0        0       0        0   0
Noah Gray            KAN        0        0        0       0        0   0
Jalen Hurts          PHI       17       22      221       2        1   2
Kenny Pickett        PHI        0        1        0       0        0   0
Saquon Barkley       PHI        0        0        0       0        0   0
Kenneth Gainwell     PHI        0        0        0       0        0   0
DeVonta Smith        PHI        0        0        0       0        0   0
A.J. Brown           PHI        0        0        0       0        0   0
Jahan Dotson         PHI        0        0        0       0        0   0
Dallas Goedert       PHI        0        0        0       0        0   0
Johnny Wilson        PHI        0        0        0       0        0   0

                     SkYds  LngPass   Rate  ...  RushYds  RushTD  LngRush  \
Player                                      ...
Patrick Mahomes         31       50   95.4  ...       25       0        8
Kareem Hunt              0        0    NaN  ...        9       0        6
Samaje Perine            0        0    NaN  ...        8       0        8
Isiah Pacheco            0        0    NaN  ...        7       0        6
Xavier Worthy            0        0    NaN  ...        0       0        0
Travis Kelce             0        0    NaN  ...        0       0        0
DeAndre Hopkins          0        0    NaN  ...        0       0        0
JuJu Smith-Schuster      0        0    NaN  ...        0       0        0
Marquise Brown           0        0    NaN  ...        0       0        0
Noah Gray                0        0    NaN  ...        0       0        0
Jalen Hurts             11       46  119.7  ...       72       1       17
Kenny Pickett            0        0   39.6  ...       -4       0       -1
Saquon Barkley           0        0    NaN  ...       57       0       10
Kenneth Gainwell         0        0    NaN  ...       10       0        4
DeVonta Smith            0        0    NaN  ...        0       0        0
A.J. Brown               0        0    NaN  ...        0       0        0
Jahan Dotson             0        0    NaN  ...        0       0        0
Dallas Goedert           0        0    NaN  ...        0       0        0
Johnny Wilson            0        0    NaN  ...        0       0        0

                     Tgt  Rec  RecYds  RecTD  LngRec  Fmb  FL
Player
Patrick Mahomes        0    0       0      0       0    1   1
Kareem Hunt            1    1       5      0       5    0   0
Samaje Perine          1    0       0      0       0    0   0
Isiah Pacheco          2    1       5      0       5    0   0
Xavier Worthy          8    8     157      2      50    0   0
Travis Kelce           6    4      39      0      13    0   0
DeAndre Hopkins        5    2      18      1      11    0   0
JuJu Smith-Schuster    2    2      16      0      11    0   0
Marquise Brown         6    2      15      0       9    0   0
Noah Gray              1    1       2      0       2    0   0
Jalen Hurts            0    0       0      0       0    0   0
Kenny Pickett          0    0       0      0       0    0   0
Saquon Barkley         7    6      40      0      22    0   0
Kenneth Gainwell       0    0       0      0       0    0   0
DeVonta Smith          5    4      69      1      46    0   0
A.J. Brown             5    3      43      1      22    0   0
Jahan Dotson           3    2      42      0      27    0   0
Dallas Goedert         2    2      27      0      20    0   0
Johnny Wilson          1    0       0      0       0    0   0

[19 rows x 21 columns]
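Several rows of the team table pack more than one number into a single string (for example "Rush-Yds-TDs"), so they need to be split before they can be used in arithmetic. The following is a minimal sketch of one way to do that, not part of the original notebook. It rebuilds a small slice of the table inline from the output above, and the column names `RushAtt`, `RushYds`, `RushTD`, and `YdsPerAtt` are illustrative assumptions.

```python
import pandas as pd

# Values copied from the Team Stats output above (read_csv loads them as strings).
team_stats = pd.DataFrame(
    {"KAN": ["11-49-0", "275"], "PHI": ["45-135-1", "345"]},
    index=["Rush-Yds-TDs", "Total Yards"],
)

# Split the compound row on "-" into one integer column per component.
# Assumption: no component is negative, otherwise the "-" split would misfire.
rushing = team_stats.loc["Rush-Yds-TDs"].str.split("-", expand=True).astype(int)
rushing.columns = ["RushAtt", "RushYds", "RushTD"]  # hypothetical column names

# A derived stat that is only possible once the values are numeric.
rushing["YdsPerAtt"] = rushing["RushYds"] / rushing["RushAtt"]
print(rushing)
```

The same pattern should apply to the other hyphenated rows, such as "Sacked-Yards" or "Penalties-Yards", with different column labels.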
Question 1: Which team gained more total yards?
total_yards = team_stats.loc["Total Yards"]
print("Total Yards:\n", total_yards)
winner = total_yards.idxmax()
print(f"\nThe team with more total yards is: {winner} with {total_yards[winner]} yards.")
Total Yards:
KAN    275
PHI    345
Name: Total Yards, dtype: object

The team with more total yards is: PHI with 345 yards.
Answer: Based on the above, the Philadelphia Eagles gained more total yards (345 to 275), which points to the more productive offense in this game.
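The output above reports `dtype: object`, meaning the yardage values are stored as strings. Comparing strings happens to give the right answer for "275" and "345", but it would not for values with different digit counts (the string "99" sorts above "345"). A hedged sketch of a safer comparison, with the two values copied inline and the names `total_yards_num`, `leader`, and `margin` introduced only for illustration:

```python
import pandas as pd

# Values copied from the output above; stored as strings, as in the notebook.
total_yards = pd.Series({"KAN": "275", "PHI": "345"}, name="Total Yards")

# Convert to integers so the comparison is numeric rather than alphabetical.
total_yards_num = pd.to_numeric(total_yards)

leader = total_yards_num.idxmax()
margin = total_yards_num.max() - total_yards_num.min()
print(f"{leader} outgained the opponent by {margin} yards.")
```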
Question 2: Which team had more time of possession?
def time_to_minutes(time_str):
    minutes, seconds = map(int, time_str.split(':'))
    return minutes + seconds / 60
time_possession = team_stats.loc["Time of Possession"].apply(time_to_minutes)
print("Time of Possession (in minutes):\n", time_possession)
winner = time_possession.idxmax()
print(f"\nThe team with more possession time is: {winner} with {time_possession[winner]:.2f} minutes.")
Time of Possession (in minutes):
KAN    23.033333
PHI    36.966667
Name: Time of Possession, dtype: float64

The team with more possession time is: PHI with 36.97 minutes.
Answer: Based on the above, the Philadelphia Eagles held the ball for 36:58 of the game's 60 minutes, compared with 23:02 for the Kansas City Chiefs. More possession time means more opportunities to score, which helps explain how the Eagles put up so many points.
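Possession time is often easier to compare as a share of the game than as decimal minutes. The sketch below is an illustrative variation on the conversion used above, not the notebook's own code. It works in whole seconds to avoid rounding, and the helper name `time_to_seconds` is an assumption.

```python
import pandas as pd

def time_to_seconds(time_str):
    """Convert an 'MM:SS' string to a whole number of seconds."""
    minutes, seconds = map(int, time_str.split(":"))
    return minutes * 60 + seconds

# Values copied from the Team Stats output above.
possession = pd.Series({"KAN": "23:02", "PHI": "36:58"}).apply(time_to_seconds)

# Each team's percentage of the total possession time.
share = possession / possession.sum() * 100
print(share.round(1))
```

The two times add up to exactly 3,600 seconds, which is a quick sanity check that the source values cover a full 60-minute game.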
Question 3: Did turnovers affect the game outcome?
turnovers = team_stats.loc["Turnovers"]
print("Turnovers:\n", turnovers)
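# Teams are labeled by the yardage result from Question 1, not by the final score.
yardage_leader = total_yards.idxmax()
other_team = [t for t in turnovers.index if t != yardage_leader][0]
print(f"\n{yardage_leader} won the yardage battle but had {turnovers[yardage_leader]} turnover(s).")
print(f"{other_team} had {turnovers[other_team]} turnover(s).")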
Turnovers:
KAN    3
PHI    1
Name: Turnovers, dtype: object

PHI won the yardage battle but had 1 turnover(s).
KAN had 3 turnover(s).
Answer: Based on the above, the Kansas City Chiefs committed two more turnovers than the Philadelphia Eagles (3 to 1). However, they lost by more than the maximum number of points the Eagles could have gained from those two extra possessions. The turnovers made the final score more lopsided, but they did not make or break the outcome.
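The reasoning in this answer can be checked with a few lines of arithmetic. This is a rough sketch under two stated assumptions: the quarter-by-quarter scores are the ones used in Question 6 of this notebook, and a single possession is worth at most 8 points (a touchdown plus a two-point conversion). All variable names are illustrative.

```python
# Final scores, summed from the per-quarter points listed in Question 6.
final_score = {"KAN": sum([0, 0, 6, 16]), "PHI": sum([7, 17, 10, 6])}
margin = final_score["PHI"] - final_score["KAN"]

# Kansas City had 3 turnovers and Philadelphia had 1.
extra_turnovers = 3 - 1

# Assumption: a possession yields at most a touchdown plus a two-point conversion.
max_points_per_possession = 8
max_swing = extra_turnovers * max_points_per_possession

print(f"Margin of victory: {margin}")
print(f"Most points available from {extra_turnovers} extra possessions: {max_swing}")
print(f"Margin exceeds that swing: {margin > max_swing}")
```

This is only an upper bound on what the Eagles could have gained. It does not account for points Kansas City might have scored had it kept the ball.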
Question 4: Who was the most effective passer?
passers = player_stats.reset_index()[["Player", "Tm", "Rate"]].dropna()
passers["Rate"] = pd.to_numeric(passers["Rate"])
best_passer = passers.loc[passers["Rate"].idxmax()]
print("Most effective passer:\n", best_passer)
print(f"\nThe most effective passer was {best_passer['Player']} ({best_passer['Tm']}) with a rating of {best_passer['Rate']}.")
Most effective passer:
Player    Jalen Hurts
Tm                PHI
Rate            119.7
Name: 10, dtype: object

The most effective passer was Jalen Hurts (PHI) with a rating of 119.7.
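Answer: Based on the above, Philadelphia Eagles starting quarterback Jalen Hurts was the most effective passer. Kansas City Chiefs starting quarterback Patrick Mahomes threw for more yards and touchdowns (257 and 3 versus 221 and 2), but Hurts completed a higher share of his passes (17 of 22 versus 21 of 32), threw fewer interceptions (1 versus 2), and took fewer sacks (2 versus 6). That efficiency is the main reason Hurts has the game-high passer rating, and it contributed to his being named Super Bowl MVP.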
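The `Rate` column can be reproduced from the raw passing numbers, which makes it clearer why Hurts rates higher despite fewer yards. The sketch below applies the standard NFL passer rating formula: four components, each clamped to the range 0 to 2.375, then averaged and scaled. The function name `passer_rating` and the inline stat lines are illustrative, with values copied from the Player Stats output.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating computed from one passer's stat line."""
    def clamp(x):
        # Each component is limited to the range [0, 2.375].
        return max(0.0, min(x, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)       # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)            # yards per attempt
    c = clamp((touchdowns / attempts) * 20)             # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25)  # interception rate
    return (a + b + c + d) / 6 * 100

# (completions, attempts, yards, touchdowns, interceptions) from the table above.
stat_lines = {
    "Patrick Mahomes": (21, 32, 257, 3, 2),
    "Jalen Hurts": (17, 22, 221, 2, 1),
    "Kenny Pickett": (0, 1, 0, 0, 0),
}
ratings = {name: round(passer_rating(*line), 1) for name, line in stat_lines.items()}
print(ratings)
```

Because every component is divided by attempts, the rating rewards efficiency rather than volume. Kenny Pickett's single incomplete attempt still earns a rating because the interception component starts at its maximum.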
Question 5: Which player had the most rushing yards?
rushers = player_stats.reset_index()[["Player", "Tm", "RushYds"]].dropna()
rushers["RushYds"] = pd.to_numeric(rushers["RushYds"], errors="coerce")
rushers = rushers[rushers["RushYds"] > 0]
top_rusher = rushers.loc[rushers["RushYds"].idxmax()]
print("Most effective rusher:\n", top_rusher)
print(f"\nTop rusher: {top_rusher['Player']} ({top_rusher['Tm']}) with {top_rusher['RushYds']} rushing yards.")
Most effective rusher:
Player     Jalen Hurts
Tm                 PHI
RushYds             72
Name: 10, dtype: object

Top rusher: Jalen Hurts (PHI) with 72 rushing yards.
Answer: Based on the above, Philadelphia Eagles starting quarterback Jalen Hurts was also the game's leading rusher with 72 yards. He outrushed even his teammate Saquon Barkley (57 yards), the reigning Offensive Player of the Year, who this season broke the record for most rushing yards in a single season with the postseason included.
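Looking past the single top rusher gives a fuller picture of each ground game. The following sketch, which is not part of the original notebook, ranks the top three rushers and totals the rushing yards by team. It uses only the players with non-zero rushing yards in the Player Stats output, copied inline, and the names `top3` and `team_totals` are illustrative.

```python
import pandas as pd

# Players with non-zero rushing yards, copied from the Player Stats output.
rushing = pd.DataFrame({
    "Player": ["Patrick Mahomes", "Kareem Hunt", "Samaje Perine", "Isiah Pacheco",
               "Jalen Hurts", "Kenny Pickett", "Saquon Barkley", "Kenneth Gainwell"],
    "Tm": ["KAN", "KAN", "KAN", "KAN", "PHI", "PHI", "PHI", "PHI"],
    "RushYds": [25, 9, 8, 7, 72, -4, 57, 10],
})

# The three highest individual rushing totals.
top3 = rushing.nlargest(3, "RushYds")
print(top3)

# Rushing yards summed per team (negative yardage is included on purpose).
team_totals = rushing.groupby("Tm")["RushYds"].sum()
print(team_totals)
```

The team totals match the "Rush-Yds-TDs" row of the team table (49 for KAN, 135 for PHI), which is a useful cross-check between the two files.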
Question 6: What were the scoring patterns per quarter?
scores_by_quarter = {
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "KAN": [0, 0, 6, 16],
    "PHI": [7, 17, 10, 6]
}
score_df = pd.DataFrame(scores_by_quarter)
plt.plot(score_df["Quarter"], score_df["KAN"], label="KAN", marker="o")
plt.plot(score_df["Quarter"], score_df["PHI"], label="PHI", marker="o")
plt.title("Scoring Patterns by Quarter")
plt.xlabel("Quarter")
plt.ylabel("Points")
plt.legend()
plt.grid(True)
plt.show()
Answer: Based on the above, the Philadelphia Eagles outscored the Kansas City Chiefs when it mattered, leading 24-0 at halftime and 34-6 after three quarters. They surrendered only 6 points while their starters were in the game and 16 after the starters were pulled with 8 minutes to go.
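A running total shows how the lead developed more directly than the per-quarter points do. This is a small sketch built on the same quarter scores as the plot above. The `running` frame and the `PHI lead` column are names introduced here for illustration.

```python
import pandas as pd

# Same per-quarter points as in the plot above.
score_df = pd.DataFrame({
    "Quarter": ["Q1", "Q2", "Q3", "Q4"],
    "KAN": [0, 0, 6, 16],
    "PHI": [7, 17, 10, 6],
}).set_index("Quarter")

# Cumulative score at the end of each quarter.
running = score_df.cumsum()

# Philadelphia's lead at the end of each quarter.
running["PHI lead"] = running["PHI"] - running["KAN"]
print(running)
```

The lead peaks at the end of the third quarter and narrows only in the fourth, which is consistent with the answer above.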
Summary & Insights
This analysis provides an accessible look into Super Bowl LIX using open sports data and Python tools. A few key takeaways:
- The Philadelphia Eagles outgained the Kansas City Chiefs in total yards, contributing to their success.
- Philadelphia Eagles quarterback Jalen Hurts accounted for 293 combined passing and rushing yards (221 + 72), with his 72 yards on the ground being an all-time record for most rushing yards by a quarterback in a Super Bowl.
- The Philadelphia Eagles turned the ball over two fewer times than the Kansas City Chiefs (1 to 3), but the score was so lopsided that the points gained from turnovers did little to affect the outcome of the game.