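# How to Draw a Heatmap of GO Enrichment Analysis Results with ggplot2 in R
## Abstract
Gene Ontology (GO) enrichment analysis is a core method for interpreting high-throughput data in bioinformatics. This article describes how to turn GO enrichment results into a readable heatmap with the ggplot2 package in R, covering data preprocessing, plot customization, and interpretation of the result. Complete code examples and parameter notes are provided to help researchers produce publication-quality GO heatmaps.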
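## 1. Overview of GO Enrichment Analysis and Visualization
### 1.1 Principles of GO Enrichment Analysis
The Gene Ontology (GO) describes gene function along three aspects:
- Molecular Function (MF)
- Biological Process (BP)
- Cellular Component (CC)

Enrichment analysis uses a statistical test to identify GO terms that are significantly over-represented among differentially expressed genes. Common methods include:
- Hypergeometric test
- Fisher's exact test
- GSEA (gene set enrichment analysis), which works on a ranked gene list rather than a fixed cutoff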
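To make the over-representation test concrete, here is a minimal base-R sketch for a single GO term. All four counts are hypothetical values chosen for illustration. It computes the hypergeometric upper-tail probability with `phyper()` and checks it against a one-sided `fisher.test()` on the equivalent 2x2 table.

```r
# Hypothetical counts for one GO term (illustrative values only)
N <- 10000  # annotated genes in the background
M <- 500    # background genes annotated to the term
n <- 1000   # differentially expressed (DE) genes
k <- 120    # DE genes annotated to the term

# Probability of seeing k or more annotated genes in the DE list by chance
p_hyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# The same question posed as a one-sided Fisher's exact test
tab <- matrix(c(k, n - k, M - k, N - M - n + k), nrow = 2,
              dimnames = list(c("in_term", "not_in_term"),
                              c("DE", "not_DE")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Both approaches should agree up to numerical precision
agree <- isTRUE(all.equal(p_hyper, p_fisher, tolerance = 1e-6))
```

In a real analysis this test is repeated for every GO term, and the resulting p-values are then corrected for multiple testing.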
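### 1.2 Visualization Requirements
Raw enrichment results typically contain:
- Term name
- P value / q value
- Enrichment factor
- Gene count

A heatmap, or a bubble-style variant of it, uses color and point size together to show:
- Significance (-log10(p-value))
- Degree of enrichment (gene ratio)
- How terms relate to one another (for example, grouping by ontology category)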
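The plotted quantities can be derived from a raw result table in a few lines. The sketch below uses a hypothetical two-row table and assumes the ratios are stored as `"k/n"` strings. It also assumes one common definition of the enrichment factor, gene ratio divided by background ratio; other tools may define it differently. The helper `ratio_to_num()` is introduced here for illustration only.

```r
# Hypothetical enrichment rows; ratios are stored as "k/n" strings
res <- data.frame(
  Description = c("term A", "term B"),
  GeneRatio = c("30/1000", "25/1000"),
  BgRatio = c("150/10000", "100/10000"),
  pvalue = c(1e-3, 1e-2),
  stringsAsFactors = FALSE
)

# Illustrative helper: convert "k/n" strings to numeric ratios
ratio_to_num <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) / as.numeric(p[2]), numeric(1))
}

res$gene_ratio <- ratio_to_num(res$GeneRatio)
res$bg_ratio <- ratio_to_num(res$BgRatio)

# Assumed definition: enrichment factor = gene ratio / background ratio
res$fold_enrichment <- res$gene_ratio / res$bg_ratio

# Significance on a scale where larger means more significant
res$log_p <- -log10(res$pvalue)
```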
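## 2. Data Preparation and Preprocessing
### 2.1 Loading Example Data
```r
# Simulated GO enrichment results (illustrative values only).
# GeneRatio and BgRatio are kept as "k/n" strings so that the
# preprocessing step can parse them.
go_terms <- data.frame(
  ID = c("GO:0008152", "GO:0009987", "GO:0002376",
         "GO:0006955", "GO:0006950"),
  Description = c("metabolic process", "cellular process",
                  "immune system process", "immune response",
                  "response to stress"),
  GeneRatio = c("120/1000", "85/1000", "45/1000", "30/1000", "25/1000"),
  BgRatio = c("500/10000", "600/10000", "200/10000", "150/10000", "100/10000"),
  pvalue = c(1e-12, 1e-8, 1e-5, 0.001, 0.01),
  # Adjusted p-values are never smaller than the raw p-values
  p.adjust = c(1e-10, 1e-6, 1e-4, 0.002, 0.012),
  qvalue = c(1e-10, 1e-6, 1e-4, 0.0015, 0.01),
  Count = c(120, 85, 45, 30, 25),
  Category = c("BP", "BP", "BP", "BP", "BP")
)
```
The ratio strings are then converted to numbers, the p-values are log-transformed, and the term order is fixed for plotting:
```r
library(dplyr)

plot_data <- go_terms %>%
  mutate(
    log_p = -log10(pvalue),  # transform p-values
    # Parse "k/n" into a numeric ratio
    GeneRatio_num = sapply(strsplit(as.character(GeneRatio), "/"),
                           function(x) as.numeric(x[1]) / as.numeric(x[2])),
    # Reverse the levels so the first term is drawn at the top
    Description = factor(Description, levels = rev(unique(Description)))
  )
```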
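A basic heatmap maps `log_p` to the tile fill:
```r
library(ggplot2)

# One column per category, one row per GO term
ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_tile(aes(fill = log_p), color = "white") +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_minimal()
```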
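Key components of this plot:
- `geom_tile()`: draws the grid of heatmap tiles
- `aes(fill)`: the variable mapped to color
- `color`: the tile border color
- `scale_fill_gradient()`: a continuous color scale

A bubble-style variant encodes gene count as point size and significance as color:
```r
ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_point(aes(size = Count, color = log_p)) +
  scale_color_gradientn(colors = c("blue", "yellow", "red")) +
  scale_size(range = c(3, 10)) +
  theme_bw(base_size = 12) +
  labs(x = "", y = "",
       color = "-log10(p-value)",
       size = "Gene Count")
```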
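When there are several comparison groups, the group can be placed on the x-axis:
```r
# With multiple comparison groups
# (the group labels are assigned arbitrarily here, for illustration only)
plot_data$Group <- rep(c("Treatment", "Control"), each = 3)[1:5]

ggplot(plot_data, aes(x = Group, y = Description)) +
  geom_tile(aes(fill = log_p)) +
  facet_grid(. ~ Category, scales = "free") +
  scale_fill_viridis_c(option = "magma")
```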
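In the grouped example, each term falls into only one group, so the grid is sparse. As a hedged sketch of the more typical case, the code below builds a full term-by-group grid from hypothetical `-log10(p)` values. The term and group names, and the values themselves, are made up for illustration. `NA` marks a term that was not enriched in a comparison and is drawn in gray.

```r
library(ggplot2)

# Hypothetical terms and comparisons (illustrative only)
terms <- c("immune response", "response to stress", "metabolic process")
groups <- c("Treatment_vs_Control", "Mutant_vs_WT")

# One row per term/group combination; the first variable varies fastest
grid_data <- expand.grid(Description = terms, Group = groups,
                         stringsAsFactors = FALSE)

# Made-up -log10(p) values; NA means the term was not enriched
grid_data$log_p <- c(3.0, 2.0, 12.0, 5.5, NA, 8.0)

p_grid <- ggplot(grid_data, aes(x = Group, y = Description, fill = log_p)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "magma", na.value = "grey90") +
  labs(x = NULL, y = NULL, fill = "-log10(p-value)") +
  theme_minimal()
```

Printing `p_grid` draws the heatmap. With real data, the long table would come from stacking the per-comparison enrichment results rather than from `expand.grid()`.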
Numeric labels can be printed on the tiles, with the theme adjusted for readability:
```r
ggplot(plot_data, aes(x = Category, y = Description)) +
  geom_tile(aes(fill = log_p), alpha = 0.8) +
  # Print -log10(p) to one decimal place on each tile
  geom_text(aes(label = sprintf("%.1f", log_p)),
            color = "white", size = 3) +
  scale_fill_distiller(palette = "Spectral") +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5))
```
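Recommended color palettes by scenario:

| Scenario | Recommended palettes |
|---|---|
| Single continuous variable | viridis, magma, inferno |
| Diverging data | RdBu, PiYG, PRGn |
| Categorical data | Set1, Paired, Dark2 |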
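A custom gradient can be built with `colorRampPalette()`:
```r
# Interpolate 10 colors between two hex codes
my_palette <- colorRampPalette(c("#2E86AB", "#F24236"))(10)

ggplot(plot_data) +
  geom_tile(aes(x = Category, y = Description, fill = log_p)) +
  scale_fill_gradientn(colors = my_palette)
```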
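The most recent plot can be exported with `ggsave()`:
```r
# Vector PDF (cairo_pdf needs an R build with cairo support;
# dpi is not needed for vector output)
ggsave("GO_heatmap.pdf",
       width = 10, height = 6,
       device = cairo_pdf)

# High-resolution TIFF
ggsave("GO_heatmap.tiff",
       compression = "lzw", dpi = 300,
       units = "in", width = 8, height = 5)
```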
A complete example that starts from `clusterProfiler::enrichGO()` output:
```r
library(ggplot2)
library(dplyr)

# Data preparation
# The enrichGO() arguments are elided in the source. The ONTOLOGY column used
# for faceting below is only present when enrichGO() is run with ont = "ALL".
data <- clusterProfiler::enrichGO(...) %>%
  as.data.frame() %>%
  filter(p.adjust < 0.05) %>%
  arrange(pvalue) %>%
  head(20) %>%
  mutate(
    log_p = -log10(p.adjust),
    # GeneRatio is a "k/n" string; convert it for the x-axis
    GeneRatio_num = sapply(strsplit(GeneRatio, "/"),
                           function(x) as.numeric(x[1]) / as.numeric(x[2])),
    Description = stringr::str_wrap(Description, width = 40)
  )

# Advanced plot (dot-plot style)
ggplot(data, aes(x = GeneRatio_num, y = reorder(Description, log_p))) +
  geom_point(aes(size = Count, color = log_p)) +
  scale_color_gradientn(
    colors = rev(RColorBrewer::brewer.pal(11, "Spectral")),
    limits = c(0, max(data$log_p))) +
  scale_size_continuous(range = c(3, 8)) +
  facet_grid(ONTOLOGY ~ ., scales = "free", space = "free") +
  labs(x = "Gene Ratio", y = "",
       color = "-log10(adj.p)",
       size = "Gene Count",
       title = "GO Enrichment Analysis") +
  theme_classic(base_size = 12) +
  theme(
    strip.background = element_rect(fill = "grey90"),
    panel.spacing = unit(0.2, "lines"),
    axis.text.y = element_text(lineheight = 0.8))
```
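Long term names can be wrapped, set in a smaller font, or truncated. These are fragments to add to an existing plot, and the `...` placeholders are left as in the source:
```r
# Option 1: wrap long descriptions onto several lines
plot_data %>%
  mutate(Description = stringr::str_wrap(Description, width = 30)) %>%
  ggplot(aes(...)) + ...

# Option 2: reduce the y-axis font size
theme(axis.text.y = element_text(size = 8))

# Option 3: truncate labels to 20 characters
scale_y_discrete(labels = function(x) substr(x, 1, 20))
```
Missing values can be given a neutral fill color:
```r
scale_fill_gradient(na.value = "gray90")
```
To compare GO and KEGG results in one plot, stack the two tables and add a `Type` column:
```r
# go_data and kegg_data are enrichment result data frames
bind_rows(
  mutate(go_data, Type = "GO"),
  mutate(kegg_data, Type = "KEGG")) %>%
  ggplot(aes(x = Type, ...)) + ...
```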
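The GO and KEGG fragment leaves its aesthetics elided. One possible way to fill it in is sketched below, assuming both tables carry `Description` and `p.adjust` columns. The two small data frames are hypothetical stand-ins for real enrichment results, and the choice of `fill = log_p` is an assumption rather than the source's mapping.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-ins for two enrichment result tables
go_data <- data.frame(
  Description = c("immune response", "response to stress"),
  p.adjust = c(1e-4, 5e-3)
)
kegg_data <- data.frame(
  Description = c("Cytokine-cytokine receptor interaction", "Apoptosis"),
  p.adjust = c(1e-3, 2e-2)
)

# Stack the tables, tag their origin, and compute the plotted value
combined <- bind_rows(
  mutate(go_data, Type = "GO"),
  mutate(kegg_data, Type = "KEGG")
) %>%
  mutate(log_p = -log10(p.adjust))

# Assumed mapping: source type on x, term on y, significance as fill
p_combined <- ggplot(combined, aes(x = Type, y = Description, fill = log_p)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "blue", high = "red", na.value = "gray90") +
  labs(x = NULL, y = NULL, fill = "-log10(adj.p)") +
  theme_minimal()
```

Because each term belongs to only one source, half of the grid cells stay empty. Faceting by `Type` with free y scales is an alternative layout if that is distracting.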