Ubuntu上Rust项目的性能调优路线图
一 建立可度量的基准
[dev-dependencies]
criterion = "0.3"
use criterion::{criterion_group, criterion_main, Criterion};
fn fibonacci_slow(n: u64) -> u64 {
match n {
0 => 1,
1 => 1,
n => fibonacci_slow(n - 1) + fibonacci_slow(n - 2),
}
}
fn fibonacci_fast(n: u64) -> u64 {
let mut a = 1u64;
let mut b = 1u64;
for _ in 2..=n {
let tmp = a + b;
a = b;
b = tmp;
}
b
}
fn bench_fib_slow(c: &mut Criterion) {
c.bench_function("fib 20 slow", |b| b.iter(|| fibonacci_slow(black_box(20))));
}
fn bench_fib_fast(c: &mut Criterion) {
c.bench_function("fib 20 fast", |b| b.iter(|| fibonacci_fast(black_box(20))));
}
criterion_group!(benches, bench_fib_slow, bench_fib_fast);
criterion_main!(benches);
cargo bench。基准测试能给出可复现的纳秒级数据与分布信息,是后续所有优化的验收标准。二 编译期优化
cargo build --release 默认启用 opt-level=3,远优于 debug 模式。[profile.release]
opt-level = 3 # 最高级别优化
lto = "thin" # 跨 crate 优化;"fat" 更强但更慢
codegen-units = 1 # 提升跨函数优化,代价是编译更久
panic = "abort" # 去掉栈展开,减少开销
strip = "symbols" # 减小二进制体积
RUSTFLAGS="-C target-cpu=native" cargo build --release[profile.release-with-debug]
inherits = "release"
debug = true
strip = "none"
cargo build --profile release-with-debug,随后可做性能剖析与火焰图分析。三 运行时与内存优化
Vec::with_capacity(N)、HashMap::with_capacity(N);热点路径尽量复用缓冲区。&T/&mut T,减少 clone() 与 Arc 在高频路径的使用。[T; N]、ArrayString、SmallVec 等减少堆分配与缓存未命中。Cow<'a, str> 等仅在必要时分配。#[repr(C)] 控制布局。tokio::sync::Mutex 替代 std::sync::Mutex,避免阻塞运行时线程。BufReader/BufWriter);批量处理减少系统调用与分配次数。with_capacity,避免多次扩容与拷贝。四 并发与异步调优
let s: i32 = numbers.par_iter().sum();join!/try_join!)。Semaphore)避免过度任务拆分与调度开销。ArcSwap、DashMap 等高性能并发容器。五 性能剖析与系统调优
sudo apt install linux-perf
cargo install flamegraph
sudo perf record -g ./target/release/your_app
cargo flamegraph --bin your_app
ulimit -n 100000
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=4096