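# How to Draw Ridgeline Plots with ggplot2 in R to Show NBA Players' Shot-Distance Distributions
## Introduction
In basketball analytics, the distribution of a player's shot distances is a useful way to characterize how that player scores. A histogram or density plot shows a single distribution well, but the chart becomes crowded once several players or teams have to be compared at the same time. A ridgeline plot stacks partially overlapping density curves with a vertical offset, so each distribution stays readable while the group remains easy to compare.
This article uses the R packages `ggplot2` and `ggridges` with NBA shot data to build a publication-quality ridgeline plot. It walks through the whole process: getting the data, cleaning it, plotting it, and polishing the chart.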
---
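## 1. Preparation
### 1.1 Installing the required R packages
```r
install.packages(c("tidyverse", "ggridges", "nbastatR", "scales"))
# nbastatR is developed on GitHub (abresler/nbastatR). If install.packages()
# cannot find it, install it from there instead:
# install.packages("remotes")
# remotes::install_github("abresler/nbastatR")

library(tidyverse) # ggplot2 plus the data-wrangling tools
library(ggridges)  # ridgeline plot extension
library(nbastatR)  # NBA data interface
library(scales)    # axis formatting
```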
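## 2. Getting and cleaning the data
### 2.1 Fetching shot data
Use the `nbastatR` package to fetch shot data for the 2022-23 season:
```r
# Downloads every shot of the season, so this call can take a while.
# If your nbastatR version rejects teams = "All", check ?teams_shots
# for the argument that requests all teams.
df_shots <- teams_shots(teams = "All",
                        seasons = 2023,
                        season_types = "Regular Season")
```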
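### 2.2 Cleaning the data
Extract the key fields and store the shot distance in feet:
```r
# Column names are the ones used in this article; confirm them with names(df_shots).
df_clean <- df_shots %>%
  filter(!is.na(shotDistance)) %>%          # drop shots with no recorded distance
  mutate(distance_ft = shotDistance) %>%    # distance is already in feet
  select(playerName, distance_ft, shotMade) %>%
  group_by(playerName) %>%
  filter(n() >= 100) %>%                    # keep players with at least 100 shots
  ungroup()
```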
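The filtering logic can be tried without calling the NBA API. The sketch below is only an illustration in base R: the data frame `shots`, the three player names, and the shot counts are all made up, and the column names simply mirror the ones used in this article.

```r
# Simulated stand-in for df_shots; players and counts are hypothetical.
set.seed(42)
shots <- data.frame(
  playerName   = rep(c("Player A", "Player B", "Player C"), times = c(250, 120, 40)),
  shotDistance = c(rexp(250, 1 / 12), rexp(120, 1 / 8), rexp(40, 1 / 15)),
  stringsAsFactors = FALSE
)
shots$shotDistance[c(5, 300)] <- NA  # a couple of missing distances

# Same two steps as the pipeline: drop NAs, then keep players with >= 100 shots.
shots <- shots[!is.na(shots$shotDistance), ]
n_by_player <- table(shots$playerName)
keep_players <- names(n_by_player)[n_by_player >= 100]
shots_clean <- shots[shots$playerName %in% keep_players, ]

print(table(shots_clean$playerName))
```

Player C has only 40 attempts and is dropped. A density estimated from so few shots would be noisy, which is the reason for the 100-shot threshold.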
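## 3. A basic ridgeline plot
```r
ggplot(df_clean, aes(x = distance_ft, y = playerName)) +
  geom_density_ridges()
```
With every qualifying player on the y axis, this plot is far too crowded to read, and `ggridges` prints a message about the joint bandwidth it picked on its own. Two changes are needed:

1. Select a handful of representative players.
2. Set the bandwidth explicitly.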
```r
top_players <- c("Stephen Curry", "LeBron James", "Giannis Antetokounmpo",
                 "Luka Doncic", "Nikola Jokic", "Joel Embiid")

df_top <- df_clean %>%
  filter(playerName %in% top_players)

ggplot(df_top, aes(x = distance_ft, y = playerName)) +
  geom_density_ridges(bandwidth = 1.5) + # fixed bandwidth, in feet
  labs(title = "NBA player shot-distance distributions",
       x = "Shot distance (ft)",
       y = NULL)
```
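To see what the bandwidth does, it helps to look at a plain kernel density estimate. The sketch below uses base R's `density()` on simulated distances. It assumes that the `bandwidth` argument of `geom_density_ridges()` plays the same role as `bw` here, and the simulated data and the three bandwidth values are chosen only for illustration.

```r
set.seed(1)
# Simulated distances (feet): one cluster at the rim, one near the arc.
dist_ft <- c(rnorm(300, mean = 3, sd = 1.5), rnorm(300, mean = 25, sd = 1.5))
dist_ft <- dist_ft[dist_ft >= 0]

d_narrow <- density(dist_ft, bw = 0.5)  # small bandwidth: follows the data closely
d_mid    <- density(dist_ft, bw = 1.5)  # the value used in this article
d_wide   <- density(dist_ft, bw = 6)    # large bandwidth: both peaks are heavily smoothed

# A wider kernel spreads the same probability mass out, so the peak gets lower.
peaks <- sapply(list(narrow = d_narrow, mid = d_mid, wide = d_wide),
                function(d) max(d$y))
print(round(peaks, 3))
```

A bandwidth that is too small produces jagged ridges, and one that is too large hides the difference between rim attempts and three-pointers. A value around 1 to 2 feet is a reasonable starting point for data like this.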
## 4. Advanced techniques
First add each player's position:
```r
position_dict <- c("Stephen Curry" = "PG",
                   "LeBron James" = "SF",
                   "Giannis Antetokounmpo" = "PF",
                   "Luka Doncic" = "PG",
                   "Nikola Jokic" = "C",
                   "Joel Embiid" = "C")

df_top <- df_top %>%
  mutate(position = factor(position_dict[playerName]))
```
Then map the fill colour to position:
```r
ggplot(df_top, aes(x = distance_ft, y = playerName, fill = position)) +
  geom_density_ridges(alpha = 0.7, bandwidth = 1.5) +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()
```
Label each ridge with the player's average shot distance:
```r
mean_dist <- df_top %>%
  group_by(playerName) %>%
  summarise(mean_dist = mean(distance_ft))

ggplot(df_top, aes(x = distance_ft, y = playerName, fill = position)) +
  geom_density_ridges(alpha = 0.7, bandwidth = 1.5) +
  geom_text(data = mean_dist,
            aes(x = 25, y = playerName,
                label = paste0("Avg: ", round(mean_dist, 1), "ft")),
            inherit.aes = FALSE, # mean_dist has no `position` column for the fill mapping
            hjust = 0, size = 3) +
  scale_x_continuous(limits = c(0, 35)) # fix the x range; shots beyond 35 ft are dropped
```
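Mark the three-point zone with a semi-transparent rectangle. The NBA arc is 23.75 ft from the basket above the break and 22 ft in the corners, so the dashed line is only approximate for corner shots:
```r
ggplot(df_top, aes(x = distance_ft, y = playerName)) +
  scale_y_discrete() + # declare y as discrete before the numeric rectangle limits are seen
  annotate("rect", xmin = 23.75, xmax = 25,
           ymin = -Inf, ymax = Inf, alpha = 0.2, fill = "blue") +
  geom_density_ridges(fill = "orange", alpha = 0.7, bandwidth = 1.5) +
  geom_vline(xintercept = 23.75, linetype = "dashed") +
  labs(caption = "Dashed line: three-point distance (23.75 ft)")
```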
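The reference line also suggests a simple numeric summary: the share of each player's shots taken from at least 23.75 ft. The sketch below is a base R illustration on simulated data. The two player labels and their distributions are hypothetical, and corner threes (22 ft) are deliberately ignored, so the result understates the true three-point share.

```r
set.seed(7)
# Simulated shots for two hypothetical players (not real NBA data).
shots <- data.frame(
  playerName  = rep(c("Perimeter Player", "Interior Player"), each = 400),
  distance_ft = c(pmax(0, rnorm(400, mean = 22, sd = 7)),
                  pmax(0, rnorm(400, mean = 7,  sd = 6))),
  stringsAsFactors = FALSE
)

three_pt_line <- 23.75 # above-the-break distance only; corner threes are not counted

# The mean of a logical vector is the proportion of TRUE values.
share_deep <- tapply(shots$distance_ft >= three_pt_line, shots$playerName, mean)
print(round(share_deep, 2))
```

A number like this pairs well with the ridgeline plot: the plot shows the shape of the distribution, and the share puts a single value on how much of it lies beyond the dashed line.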
## 5. The final chart
```r
final_plot <- ggplot(df_top, aes(x = distance_ft,
                                 y = reorder(playerName, distance_ft), # order by mean distance
                                 fill = after_stat(x))) +
  geom_density_ridges_gradient(bandwidth = 1.5,
                               rel_min_height = 0.01,
                               gradient_lwd = 0.3) +
  scale_fill_viridis_c(name = "Distance (ft)", option = "C") +
  labs(title = "Shot-distance distributions of NBA stars, 2022-23 season",
       subtitle = "Fill colour encodes shot distance",
       x = "Shot distance (ft)",
       y = NULL,
       caption = "Data: NBA Stats | Visualization: ggplot2") +
  theme_ridges(font_size = 12) +
  theme(legend.position = "right",
        plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(color = "grey40"))
```
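What the key arguments do:

- `after_stat(x)`: maps the x value (shot distance) to the fill colour
- `rel_min_height`: cuts off the low tails at the bottom of each ridge
- `gradient_lwd`: line width used at the boundaries between gradient segments
- `scale_fill_viridis_c`: a colour-blind-friendly colour gradient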
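The effect of `rel_min_height` can be approximated on a single curve with base R. This is a simplified sketch, not the `ggridges` implementation: it assumes the cutoff is a fraction of the peak height, uses simulated distances, and looks at one density only, whereas a real ridgeline plot has several.

```r
set.seed(3)
dist_ft <- rexp(500, rate = 1 / 10) # simulated, right-skewed distances in feet

d <- density(dist_ft, bw = 1.5)
rel_min_height <- 0.01

# Keep only the part of the curve that reaches at least 1% of the peak height.
keep <- d$y >= rel_min_height * max(d$y)

print(range(d$x))       # full extent of the estimated curve
print(range(d$x[keep])) # extent that would still be drawn
```

Trimming removes the long, nearly flat tails that would otherwise stretch each ridge across the whole x axis. Raising the threshold trims more aggressively, at the cost of hiding rare long-distance attempts.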
## 6. Extensions
```r
# Team-level comparison
p_team <- df_shots %>%
  group_by(nameTeam) %>%
  filter(n() > 2000) %>% # keep teams with more than 2000 shots
  ungroup() %>%
  ggplot(aes(x = shotDistance, y = reorder(nameTeam, shotDistance))) +
  geom_density_ridges(fill = "darkblue", alpha = 0.6)
```
```r
# Slice by season (only informative if df_shots covers more than one season)
df_shots %>%
  mutate(season = as.factor(yearSeason)) %>%
  ggplot(aes(x = shotDistance, y = season, height = after_stat(density))) +
  geom_density_ridges(stat = "density", trim = TRUE)
```
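## 7. Summary
With the `ggridges` package, a complicated comparison of distributions becomes an intuitive ridgeline plot. This article covered:

1. The full workflow from getting the data to cleaning it
2. How to draw a basic ridgeline plot
3. Advanced techniques such as colour mapping and annotations
4. Refinements for a final, publication-quality chart

The approach is not limited to basketball data. It applies to any situation where the distributions of several groups need to be compared, such as the age distribution of users across products or the temperature distribution across regions.

Tip: the complete code and data are available in the GitHub repository. In practice, the bandwidth and the cutoff threshold may need adjusting to suit the data.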