R语言怎样读入比对好的fasta文件做NJ树并做boostrap检验

发布时间：2021-11-22 15:25:13 作者：柒染
来源：亿速云阅读：540

# R语言怎样读入比对好的fasta文件做NJ树并做boostrap检验

## 1. 准备工作

### 1.1 安装必要R包
首先需要安装并加载以下R包：
```r
install.packages(c("ape", "phangorn", "seqinr"))
library(ape)
library(phangorn)
library(seqinr)

1.2 数据准备

准备已经比对好的fasta格式序列文件（如alignment.fasta），建议使用MUSCLE、MAFFT等工具完成多序列比对。

2. 读取fasta文件

2.1 使用seqinr读取

seqs <- read.fasta("alignment.fasta", seqtype = "DNA")

2.2 转换为phyDat格式

phangorn包需要将数据转换为phyDat对象：

dna <- as.phyDat(seqs, type = "DNA")

3. 构建NJ树

3.1 计算遗传距离

使用ape包的dist.dna函数计算距离矩阵：

dist_matrix <- dist.dna(as.DNAbin(seqs), model = "K80")  # K80模型

3.2 构建NJ树

nj_tree <- nj(dist_matrix)
plot(nj_tree, main = "NJ Tree")

4. Bootstrap检验

4.1 执行bootstrap

使用phangorn包的bootstrap.phyDat函数：

bs <- bootstrap.phyDat(dna, 
                      FUN = function(x) nj(dist.dna(as.DNAbin(x))),
                      bs = 1000)  # 1000次重复

4.2 计算支持率

bs_tree <- plotBS(nj_tree, bs, type = "p")  # p表示在节点显示百分比

4.3 可视化设置

plot(bs_tree, main = "NJ Tree with Bootstrap Support")
nodelabels(bs_tree$node.label, cex = 0.7)  # 调整标签大小

5. 结果输出

5.1 保存树文件

write.tree(bs_tree, "bootstrap_tree.nwk")

5.2 导出PDF

pdf("NJ_tree_with_bootstrap.pdf", width = 10, height = 8)
plot(bs_tree, main = "NJ Tree with Bootstrap Support")
nodelabels(bs_tree$node.label, cex = 0.7)
dev.off()

6. 参数优化建议

距离模型选择：
- "RAW": 原始差异
- "K80": Kimura 2-parameter（默认）
- "TN93": Tamura-Nei模型

Bootstrap次数：

一般建议500-1000次

计算量大时可使用并行计算：


library(parallel)
bs <- bootstrap.phyDat(dna, FUN = function(x) nj(dist.dna(as.DNAbin(x))),
                   bs = 1000, multicore = TRUE, mc.cores = 4)

7. 常见问题解决

内存不足：
- 减少bootstrap重复次数
- 使用gc()手动清理内存

节点标签重叠：

plot(bs_tree, show.node.label = FALSE)
tiplabels(bs_tree$node.label, adj = c(1.2, -0.2), frame = "n", cex = 0.6)

树的美化：

library(ggtree)
ggtree(bs_tree) + 
 geom_tiplab() + 
 geom_nodelab(aes(label = label), hjust = -0.3) +
 theme_tree2()

8. 完整代码示例

# 加载包
library(ape)
library(phangorn)
library(seqinr)

# 读取数据
seqs <- read.fasta("alignment.fasta", seqtype = "DNA")
dna <- as.phyDat(seqs, type = "DNA")

# 构建NJ树
dist_matrix <- dist.dna(as.DNAbin(seqs), model = "K80")
nj_tree <- nj(dist_matrix)

# Bootstrap检验
bs <- bootstrap.phyDat(dna, FUN = function(x) nj(dist.dna(as.DNAbin(x))), bs = 1000)
bs_tree <- plotBS(nj_tree, bs, type = "p")

# 可视化
plot(bs_tree, main = "NJ Tree with Bootstrap Support")
nodelabels(bs_tree$node.label, cex = 0.7)

# 保存结果
write.tree(bs_tree, "final_tree.nwk")

通过以上步骤，您可以在R中完成从比对好的fasta文件到带bootstrap支持的NJ树构建全过程。根据实际数据特点，可调整距离模型和bootstrap参数以获得更可靠的结果。 “`