Here is a brief tutorial on neural-network programming with PyTorch on Ubuntu:
First install Python and PyTorch; either pip or conda works. With pip:
sudo apt update
sudo apt install python3 python3-pip
pip3 install torch torchvision torchaudio
For GPU support, install CUDA and a CUDA-enabled build of PyTorch that matches your CUDA version (the official install selector on pytorch.org gives the exact command).
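Once installed, you can verify the setup from Python; torch.cuda.is_available() returns True only when a CUDA-enabled build of PyTorch can see a usable GPU:

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only with a CUDA build and a visible GPU
```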
Use a modular design: encapsulate the different parts of the network in separate classes or functions. Classic architectures such as ResNet are good references; design according to the characteristics of your data and the requirements of the task. For example, a simple fully connected network:
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)  # input layer: 784 features -> 128 hidden units
        self.fc2 = nn.Linear(128, 10)   # output layer: 128 hidden units -> 10 classes

    def forward(self, x):
        x = x.view(-1, 784)             # flatten, e.g. 28x28 images into 784-dim vectors
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
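As a quick sanity check, you can feed the network a random batch shaped like 28x28 grayscale images and confirm the output has one logit per class (a minimal sketch; the batch size 2 is arbitrary, and the class is repeated here so the snippet runs on its own):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):  # repeated from above so the snippet is self-contained
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleNet()
batch = torch.randn(2, 1, 28, 28)  # two fake grayscale 28x28 images
logits = model(batch)
print(logits.shape)                # torch.Size([2, 10])
```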
Write training and validation loops using cross-entropy loss and the Adam optimizer:
import torch
import torch.optim as optim

model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
num_epochs = 10  # choose to suit your task

# Assumes train_loader and val_loader already exist
for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    model.eval()
    total = 0
    correct = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f'Epoch [{epoch+1}/{num_epochs}], Accuracy: {100 * correct / total:.2f}%')
For distributed training, use the torch.distributed module. Install the NCCL library (used for GPU communication), configure the environment variables, then write the distributed training code and launch it with torchrun (the current launcher, which replaces the older torch.distributed.launch) or mpirun.
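The core change in the training code is wrapping the model in DistributedDataParallel. Below is a minimal single-process sketch: it uses the gloo backend so it runs on CPU without NCCL or a GPU, and the master address/port and rank/world_size are placeholder values that torchrun normally sets for you.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Placeholder rendezvous settings; torchrun sets these in real multi-process runs.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# gloo works on CPU; use the "nccl" backend for multi-GPU training.
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(784, 10)
ddp_model = DDP(model)  # gradients are averaged across processes during backward

out = ddp_model(torch.randn(4, 784))
print(out.shape)        # torch.Size([4, 10])

dist.destroy_process_group()
```

With more than one process, each rank would also use a DistributedSampler in its DataLoader so that every process sees a distinct shard of the data.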
Use torch.save and torch.load to save and load models:
# Save the model's parameters (the recommended state_dict format)
torch.save(model.state_dict(), 'model.pth')

# Load them into a fresh instance of the same architecture
model = SimpleNet()
model.load_state_dict(torch.load('model.pth'))
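A quick round-trip check (a sketch using a temporary file; the class is repeated so the snippet runs standalone) confirms that a reloaded model reproduces the original's outputs exactly:

```python
import os
import tempfile
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleNet(nn.Module):  # repeated from above so the snippet is self-contained
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        return self.fc2(F.relu(self.fc1(x)))

model = SimpleNet()
model.eval()
x = torch.randn(3, 784)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "model.pth")
    torch.save(model.state_dict(), path)

    restored = SimpleNet()
    restored.load_state_dict(torch.load(path))
    restored.eval()

    with torch.no_grad():
        assert torch.equal(model(x), restored(x))  # identical outputs after reload
```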