您好,登录后才能下订单哦!
密码登录
登录注册
点击 登录注册 即表示同意《亿速云用户服务条款》
# R语言中miRNA与靶基因相关性作图的示例分析
## 摘要
本文通过一个完整的分析流程,展示如何使用R语言进行miRNA与靶基因相关性分析及可视化。内容包括数据获取、预处理、相关性计算、统计检验和图形绘制,并提供可复用的代码示例。
---
## 1. 引言
miRNA作为重要的基因表达调控分子,其与靶基因的相互作用分析是功能基因组学研究的关键环节。R语言凭借其强大的统计分析和可视化能力,成为此类分析的理想工具。本文将基于`TCGA`数据库的乳腺癌数据集(BRCA),演示完整的分析流程。
---
## 2. 数据准备
### 2.1 数据下载与加载
```r
# 安装必要的包
if (!require("TCGAbiolinks")) install.packages("TCGAbiolinks")
if (!require("ggplot2")) install.packages("ggplot2")
library(TCGAbiolinks)
library(ggplot2)
# 下载TCGA-BRCA的miRNA和mRNA数据
query_mirna <- GDCquery(
project = "TCGA-BRCA",
data.category = "Transcriptome Profiling",
data.type = "miRNA Expression Quantification"
)
query_mrna <- GDCquery(
project = "TCGA-BRCA",
data.category = "Transcriptome Profiling",
data.type = "Gene Expression Quantification",
workflow.type = "STAR - Counts"
)
GDCdownload(query_mirna)
GDCdownload(query_mrna)
mirna_data <- GDCprepare(query_mirna)
mrna_data <- GDCprepare(query_mrna)
# 提取表达矩阵
mirna_expr <- assay(mirna_data, "miRNA_expression")
mrna_expr <- assay(mrna_data, "unstranded")
# 过滤低表达基因
keep_mirna <- rowSums(mirna_expr > 1) >= 10
keep_mrna <- rowSums(mrna_expr > 1) >= 10
mirna_expr <- mirna_expr[keep_mirna, ]
mrna_expr <- mrna_expr[keep_mrna, ]
# 对数转换
mirna_expr <- log2(mirna_expr + 1)
mrna_expr <- log2(mrna_expr + 1)
# 选择目标miRNA(示例:hsa-mir-21)
target_mirna <- "hsa-mir-21"
mirna_vector <- as.numeric(mirna_expr[target_mirna, ])
# 计算与所有mRNA的Pearson相关性
cor_results <- apply(mrna_expr, 1, function(x) {
cor.test(mirna_vector, as.numeric(x), method = "pearson")
})
# 提取结果数据框
cor_df <- data.frame(
gene = names(cor_results),
cor = sapply(cor_results, function(x) x$estimate),
pvalue = sapply(cor_results, function(x) x$p.value)
)
# 添加FDR校正
cor_df$fdr <- p.adjust(cor_df$pvalue, method = "fdr")
# 设置阈值
sig_threshold <- 0.05
cor_threshold <- 0.3
sig_genes <- cor_df[
cor_df$fdr < sig_threshold & abs(cor_df$cor) > cor_threshold,
]
# 选择top靶基因示例
top_gene <- "TP53"
gene_expr <- as.numeric(mrna_expr[top_gene, ])
ggplot(data.frame(miRNA = mirna_vector, Gene = gene_expr),
aes(x = miRNA, y = Gene)) +
geom_point(alpha = 0.6, color = "#2c7bb6") +
geom_smooth(method = "lm", color = "#d7191c", se = FALSE) +
labs(
title = paste(target_mirna, "vs", top_gene, "Expression Correlation"),
x = paste(target_mirna, "log2(CPM)"),
y = paste(top_gene, "log2(CPM)")
) +
theme_minimal() +
annotate("text", x = Inf, y = Inf,
label = paste("r =", round(cor_df[top_gene, "cor"], 2),
"\nFDR =", format(cor_df[top_gene, "fdr"], scientific = TRUE)),
hjust = 1.1, vjust = 1.1)
# 安装额外包
if (!require("pheatmap")) install.packages("pheatmap")
library(pheatmap)
# 准备数据(top 50基因)
top50_genes <- rownames(sig_genes[order(abs(sig_genes$cor), decreasing = TRUE), ][1:50])
plot_data <- t(scale(t(mrna_expr[top50_genes, ])))
pheatmap(
plot_data,
clustering_method = "complete",
show_rownames = TRUE,
show_colnames = FALSE,
color = colorRampPalette(c("blue", "white", "red"))(50),
main = paste("Expression Heatmap of", target_mirna, "Target Genes")
)
# 安装网络可视化包
if (!require("igraph")) install.packages("igraph")
library(igraph)
# 构建网络数据
net_data <- data.frame(
from = rep(target_mirna, nrow(sig_genes)),
to = rownames(sig_genes),
weight = abs(sig_genes$cor)
)
# 创建网络图
g <- graph_from_data_frame(net_data, directed = FALSE)
# 设置可视化参数
V(g)$size <- ifelse(V(g)$name == target_mirna, 15, 8)
V(g)$color <- ifelse(V(g)$name == target_mirna, "#ff7f00", "#1f78b4")
E(g)$width <- E(g)$weight * 3
plot(g,
layout = layout_with_fr,
vertex.label.cex = 0.7,
main = paste(target_mirna, "Regulatory Network"))
if (!require("clusterProfiler")) BiocManager::install("clusterProfiler")
library(clusterProfiler)
# 转换基因ID
entrez_ids <- mapIds(org.Hs.eg.db,
keys = rownames(sig_genes),
column = "ENTREZID",
keytype = "SYMBOL")
# KEGG富集分析
kegg_res <- enrichKEGG(
gene = na.omit(entrez_ids),
organism = "hsa",
pvalueCutoff = 0.05
)
dotplot(kegg_res, showCategory = 15) +
ggtitle("KEGG Pathway Enrichment of Target Genes")
# 获取临床数据
clinical <- GDCquery_clinic("TCGA-BRCA", "clinical")
# 合并表达与生存数据
surv_data <- data.frame(
patient = substr(colnames(mirna_expr), 1, 12),
mirna = as.numeric(mirna_expr[target_mirna, ]),
status = clinical$vital_status,
time = clinical$days_to_last_follow_up
)
# 进行KM分析
library(survival)
fit <- survfit(Surv(time, status == "Dead") ~ (mirna > median(mirna)),
data = surv_data)
ggsurvplot(fit,
data = surv_data,
pval = TRUE,
risk.table = TRUE,
title = paste(target_mirna, "Expression Survival Analysis"))
本文展示了R语言中miRNA-靶基因相关性分析的完整流程,涵盖从数据获取到高级功能分析的全过程。所有代码均可直接修改用于其他数据集分析。
”`
注:实际运行时需要根据具体数据调整参数,部分代码可能需要较长时间执行(特别是数据下载和预处理步骤)。建议在服务器环境下运行大规模分析。
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。