55caa57649203f2e682f5a6316fa6ace81f8bdc4
iidp平台存储架构优化方案:采用JuiceFS替代NFS.md
... | ... | @@ -0,0 +1,398 @@ |
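1 | +## iidp Platform Storage Architecture Optimization: Replacing NFS with JuiceFS |

2 | + |

3 | +### 1. Requirements Analysis |

4 | + |

5 | +#### 1.1 Current State |

6 | + |

7 | +As a core business system, the iidp platform needs a storage layer that meets the following requirements: |

8 | + |

9 | +- **Storage shared across business pods**: one place to keep the engine base `jar` packages, application `jar` packages (business and built-in apps), and frontend resource packages (base and business zip files) |

10 | +- **Concurrent access from multiple pods**: the app market, the business engine, and other pods read and write the same directories at the same time |

11 | +- **Mixed file sizes**: |

12 | +  - Small files (2 KB-10 KB): configuration files, metadata files, frontend dist files |

13 | +  - Medium files (1-10 MB): frontend resource packages, lightweight apps |

14 | +  - Large files (50-200 MB): business app packages, engine base packages, server packages |

15 | +- **High-concurrency access**: while apps are uploaded or published in the app market, multiple pods read and write the same files concurrently; the online IDE edits metamodel files in real time and also uploads them, manages them in git, and serves shared downloads |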
|
16 | + |
|
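17 | +The current NFS setup has four core problems: |

18 | + |

19 | +1. **Performance bottleneck**: high latency on small-file reads and writes, low throughput on large-file transfers |

20 | +2. **Single point of failure**: if the NFS server fails, the whole platform becomes unavailable |

21 | +3. **Limited scalability**: performance degrades sharply as the number of clients grows |

22 | +4. **Operational complexity**: tuning is difficult and monitoring tools are limited |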
|
23 | + |
|
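24 | +### 2. Technology Selection |

25 | + |

26 | +#### 2.1 Core Advantages of JuiceFS |

27 | +| Dimension | JuiceFS | NFS | Benefit | |

28 | +|---------|--------|-----|--------| |

29 | +| Architecture | Metadata and data separated | Monolithic | Horizontal scaling | |

30 | +| Performance | Local cache acceleration | Network transfer only | 8-15× big-file throughput in our tests (section 3.2) | |

31 | +| High availability | Multi-replica / automatic failover | Single point of failure | 99.95%+ availability | |

32 | +| Kubernetes integration | Native CSI driver | No built-in dynamic provisioner | Dynamic provisioning | |

33 | +| Storage backend | MinIO, S3, and others | Local storage only | Plugs into existing infrastructure | |

34 | +| Monitoring | Prometheus + Dashboard | Basic monitoring | Deep observability | |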
|
35 | + |
|
36 | +#### 2.2 Fit of Key Features |

37 | +1. **Mount point reuse**: |

38 | +   ```mermaid |

39 | +   graph TB |

40 | +       subgraph k8s["Kubernetes cluster"] |

41 | +           A[App Pod 1] -->|shared| C[JuiceFS PVC] |

42 | +           B[App Pod 2] -->|shared| C |

43 | +           D[App market Pod] -->|shared| C |

44 | +       end |

45 | +       C --> E[Single Mount Pod] |

46 | +       E --> F[Metadata Redis] |

47 | +       E --> G[Object storage MinIO] |

48 | +   ``` |

49 | +   ![img.png](img.png) |

50 | + |

51 | +   With the two PVCs `apps` and `apps-frontend`, only two mount points need to be maintained, no matter how many pods use them |
|
52 | + |
|
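53 | +2. **Layered caching**: |

54 | +   - Local SSD cache: speeds up access to hot data |

55 | +   - Distributed cache: cache shared across nodes |

56 | +   - Transparent cache synchronization: keeps multiple clients consistent |

57 | + |

58 | +3. **Production track record**: |

59 | +   - GitHub: 11.8k stars, 300+ contributors |

60 | +   - Large-scale adopters: Li Auto, Zhihu, Xiaohongshu, and others with PB-scale deployments |

61 | +   - Official Dashboard for in-depth monitoring: http://192.168.184.122:30414/pods |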
|
62 | + |
|
63 | +### 3. Performance Comparison |

64 | + |

65 | +#### 3.1 Test Environment |

66 | + |

67 | +- **Hardware**: company-provided server (8 vCPU / 16 GB RAM) |

68 | +- **Network**: Gigabit Ethernet (unconfirmed) |

69 | +- **Setups compared**: |

70 | +  - NFSv4.1 (current setup) |

71 | +  - MinIO accessed directly |

72 | +  - JuiceFS with a MinIO backend |
|
73 | + |
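
Because the network type is unconfirmed, a quick sanity check helps when reading the figures in section 3.2. The sketch below converts a 1 Gbit/s line rate to MiB/s (ignoring protocol overhead) and compares it with the reported JuiceFS big-file write throughput. If the link really is Gigabit Ethernet, anything above the ceiling cannot have crossed the wire, which would suggest the data was absorbed by local cache or write buffers, or that client and MinIO share a host. The function and variable names are illustrative only.

```python
# Theoretical payload ceiling of a 1 Gbit/s link, ignoring protocol overhead.
LINK_BITS_PER_S = 1_000_000_000
MIB = 1024 * 1024


def link_ceiling_mib_s(bits_per_s: int) -> float:
    """Convert a line rate in bit/s to MiB/s."""
    return bits_per_s / 8 / MIB


ceiling = link_ceiling_mib_s(LINK_BITS_PER_S)
juicefs_write_mib_s = 825.47  # JuiceFS big-file write throughput from section 3.2

print(f"GbE ceiling: {ceiling:.1f} MiB/s")
print(f"JuiceFS write exceeds link rate: {juicefs_write_mib_s > ceiling}")
```
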
|
74 | +#### 3.2 Performance Metrics |

75 | +| Metric | NFS | MinIO direct | JuiceFS | JuiceFS vs NFS | |

76 | +|------|-----|-----------|---------|----------| |

77 | +| **Big-file write throughput** | 52.23 MiB/s | 134.42 MiB/s (warp PUT) | **825.47 MiB/s** | 15.8× | |

78 | +| **Big-file read throughput** | 71.97 MiB/s | 108.87 MiB/s (objbench download) | **587.95 MiB/s** | 8.2× | |

79 | +| **Big-file write time (50 MiB)** | 1.91 s | 0.19 s per 4 MiB object | **0.12 s** | 94% ↓ | |

80 | +| **Small-file write latency** | 19.58 ms | 35.48 ms | **13.10 ms** | 33% ↓ | |

81 | + |

82 | +| Metric | NFS | MinIO direct | JuiceFS | JuiceFS vs NFS | |

83 | +|------|-----|-----------|---------|----------| |

84 | +| **Small-file writes (files/s)** | 102.1 | 112.74 | **152.6** | 49% ↑ | |

85 | +| **Small-file reads (files/s)** | 240.9 | 368.76 | **254.3** | 5.6% ↑ | |

86 | +| **Stat (files/s)** | **10,524.5** | 2,284.84 (head objects) | 2,298.5 | 78% ↓ | |

87 | +| **Stat latency** | **0.19 ms** | 1.75 ms | 0.87 ms | 4.6× ↑ | |
|
88 | + |
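
The comparison column can be recomputed from the raw `juicefs bench` outputs in section 8. The following is a minimal sketch of that arithmetic; the dictionary keys and helper names are made up for illustration, and the numbers are copied from the NFS and JuiceFS result tables in section 8.

```python
# Raw values copied from the two `juicefs bench` result tables in section 8.
nfs = {"write_mib_s": 52.23, "read_mib_s": 71.97,
       "small_write_files_s": 102.1, "small_read_files_s": 240.9,
       "stat_ms": 0.19}
jfs = {"write_mib_s": 825.47, "read_mib_s": 587.95,
       "small_write_files_s": 152.6, "small_read_files_s": 254.3,
       "stat_ms": 0.87}


def speedup(new: float, old: float) -> float:
    """Ratio of the JuiceFS value to the NFS value."""
    return new / old


def pct_change(new: float, old: float) -> float:
    """Relative change in percent, positive when the JuiceFS value is larger."""
    return (new - old) / old * 100


for key in nfs:
    ratio = speedup(jfs[key], nfs[key])
    change = pct_change(jfs[key], nfs[key])
    print(f"{key}: {ratio:.2f}x ({change:+.1f}%)")
```

Note that `stat_ms` is a latency, so a ratio above 1 there means JuiceFS was slower than NFS for stat in this run.
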
|
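89 | +#### 3.3 JuiceFS-Specific Performance Characteristics |

90 | +1. **Metadata operations**: |

91 | +   - File attribute lookup: 0.87 ms/file on JuiceFS versus 0.19 ms/file on NFS in the section 8 runs, so stat is slower on JuiceFS here |

92 | +   - List throughput: 18,598.6 objects/s (objbench against MinIO) |

93 | + |

94 | +2. **Cache efficiency**: |

95 | +   ```mermaid |

96 | +   pie |

97 | +       title Cache hit ratio |

98 | +       "JuiceFS cache hit" : 75 |

99 | +       "Disk read" : 15 |

100 | +       "Network read" : 10 |

101 | +   ``` |

102 | + |

103 | +| **Operation** | **MinIO baseline** | **JuiceFS** | **Change** | **Explanation** | |

104 | +|---------------------|---------------|-----------------|--------------|------------------------------------------------------------------------------| |

105 | +| **Small-object write** | 112.6 ops/s | **152.6 ops/s** | **↑35.5%** | Client-side write coalescing and batched metadata commits reduce the number of object storage API calls | |

106 | +| **Small-object read** | 368.0 ops/s | 254.3 ops/s | ↓30.9% | FUSE file reads compared with direct S3 GETs; the bench run served reads from the local cache (229 cache reads, 0 object GETs) | |

107 | +| **Delete** | 1441.0 ops/s | not measured | - | Both source figures (1441.0 and 1458.48 ops/s) come from the same objbench run against MinIO; the JuiceFS bench recorded 0 object deletes | |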
|
108 | + |
|
109 | +### 4. Implementation Details |

110 | + |

111 | +#### 4.1 Kubernetes Integration |

112 | +```yaml |

113 | +apiVersion: storage.k8s.io/v1 |

114 | +kind: StorageClass |

115 | +metadata: |

116 | +  name: juicefs-sc |

117 | +provisioner: csi.juicefs.com |

118 | +parameters:  # NOTE: the JuiceFS CSI driver normally reads these three values from a Secret referenced via csi.storage.k8s.io/provisioner-secret-name and node-publish-secret-name |

119 | +  storage: minio |

120 | +  bucket: "http://minio-service:9000/jfs-bucket" |

121 | +  metaurl: "redis://redis-service:6379/8" |

122 | +mountOptions:  # top-level StorageClass field, not a parameter |

123 | +  - cache-dir=/var/juicefs |

124 | +  - cache-size=20480  # MiB |

125 | +  - max-uploads=50 |

126 | +  - writeback_cache |

127 | +``` |
|
128 | + |
|
129 | +#### 4.2 Recommended Core Parameters |

130 | +1. **Cache policy**: |

131 | +   ```bash |

132 | +   --cache-dir=/mnt/ssd_cache   # SSD-backed cache |

133 | +   --cache-size=20480           # 20 GiB cache (value in MiB) |

134 | +   --free-space-ratio=0.2       # keep 20% of the disk free |

135 | +   ``` |

136 | + |

137 | +2. **I/O tuning**: |

138 | +   ```bash |

139 | +   --max-uploads=100            # more concurrent uploads (a juicefs mount option, not a FUSE -o option) |

140 | +   -o writeback_cache           # kernel-level write coalescing |

141 | +   -o keep_pagecache            # keep the page cache |

142 | +   ``` |

143 | + |

144 | +3. **Monitoring**: |

145 | +   ```bash |

146 | +   --metrics=localhost:9567     # Prometheus scrape endpoint |

147 | +   --consul=192.168.1.100:8500  # service registration |

148 | +   ``` |
|
149 | + |
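
To show how the cache and I/O options above fit together on one command line, here is a small sketch that assembles a `juicefs mount` argument list. The helper names and the `/jfs` mount point are assumptions for illustration; the metadata URL is the one from section 4.1, and only options already listed in this section are used. The sketch builds the argument list and prints it, it does not run the mount.

```python
import shlex


def gib_to_mib(gib: int) -> int:
    """--cache-size is given in MiB, so 20 GiB becomes 20480."""
    return gib * 1024


def build_mount_args(meta_url: str, mount_point: str, cache_dir: str,
                     cache_gib: int, free_ratio: float, max_uploads: int) -> list:
    """Assemble `juicefs mount [options] META-URL MOUNTPOINT` (hypothetical helper)."""
    return [
        "juicefs", "mount",
        f"--cache-dir={cache_dir}",
        f"--cache-size={gib_to_mib(cache_gib)}",
        f"--free-space-ratio={free_ratio}",
        f"--max-uploads={max_uploads}",
        "-o", "writeback_cache",
        meta_url, mount_point,
    ]


args = build_mount_args("redis://redis-service:6379/8", "/jfs",
                        "/mnt/ssd_cache", 20, 0.2, 100)
print(shlex.join(args))
```
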
|
150 | +#### 4.3 High-Availability Architecture |

151 | +```mermaid |

152 | +graph TD |

153 | +    subgraph k8s["K8s cluster"] |

154 | +        JFS[Mount Pod] --> Redis[Redis Sentinel cluster] |

155 | +        JFS --> MinIO[MinIO cluster] |

156 | +    end |

157 | + |

158 | +    subgraph monitoring["Monitoring"] |

159 | +        Prometheus -->|scrapes| JFS |

160 | +        Grafana --> Prometheus |

161 | +        Alerting[Alert center] --> Grafana |

162 | +    end |

163 | +``` |
|
164 | + |
|
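165 | +### 5. Migration and Rollout Path |

166 | + |

167 | +#### 5.1 Phased Migration Plan |

168 | +| Phase | Goal | Time window | Rollback | |

169 | +|------|------|---------|----------| |

170 | +| Parallel run | Dual-write new data | 1-2 weeks | Remove the JuiceFS route | |

171 | +| Historical data migration | Incremental rsync | Maintenance window | Scripted automatic rollback | |

172 | +| Traffic cutover | DNS switch | 5 minutes | Switch DNS back | |

173 | +| Verification | Compare monitoring metrics | 48 hours | Alerts trigger automatic rollback | |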
|
174 | + |
|
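175 | +#### 5.2 Sample Data Migration Script |

176 | +```bash |

177 | +#!/bin/bash |

178 | +# Incremental migration: mirror the NFS directories into JuiceFS every 5 minutes |

179 | +while true; do |

180 | +    rsync -avz --delete /nfs/apps/ /jfs/apps/ || echo "apps sync failed" >&2 |

181 | +    rsync -avz --delete /nfs/frontend/ /jfs/frontend/ || echo "frontend sync failed" >&2 |

182 | +    sleep 300  # 5-minute sync interval |

183 | +done |

184 | +``` |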
|
185 | + |
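
Before cutting traffic over, it can be worth checking that the mirrored tree actually matches the source. The sketch below is one possible way to do that and is not part of the original plan: it walks a source directory, compares SHA-256 digests against the destination, and reports files that are missing or differ. The function names are hypothetical; the paths would be the `/nfs/...` and `/jfs/...` directories from the script above.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src: str, dst: str):
    """Return (missing, mismatched) relative paths for files found under src."""
    missing, mismatched = [], []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path):
                missing.append(rel)
            elif file_digest(src_path) != file_digest(dst_path):
                mismatched.append(rel)
    return sorted(missing), sorted(mismatched)
```

A call such as `compare_trees("/nfs/apps", "/jfs/apps")` returning two empty lists would indicate that every source file exists in JuiceFS with identical content. Files present only in the destination are not reported by this sketch.
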
|
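186 | +### 6. Risk Control |

187 | + |

188 | +#### 6.1 Potential Risks and Mitigations |

189 | +| Risk | Likelihood | Impact | Mitigation | |

190 | +|--------|------|------|-------------------| |

191 | +| Cache inconsistency | Medium | Medium | Force synchronization with fsync | |

192 | +| Metadata latency | Low | High | Tune the Redis cluster or switch to MySQL | |

193 | +| Mount point failure | Low | High | Automatic restart | |

194 | +| Insufficient capacity | Medium | High | Automatic scale-out policy | |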
|
195 | + |
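
The fsync mitigation for cache inconsistency is applied on the writer side. As a rough illustration of what that could look like in application code (an assumption, since the document does not say where fsync would be enabled), the sketch below writes to a temporary file, forces it to stable storage with `os.fsync`, and then renames it into place so readers never observe a half-written file. `durable_write` is a hypothetical helper name.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data so that readers see either the old or the new file, never a partial one."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the filesystem to persist the data
    os.replace(tmp, path)      # atomic rename on POSIX filesystems
```

The trade-off is latency: each call waits for the data to be persisted instead of returning once it is buffered, so this pattern fits metamodel or config files better than bulk uploads.
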
|
196 | +#### 6.2 Monitoring Thresholds |

197 | +```yaml |

198 | +monitors: |

199 | +  - name: juicefs_cache_hit_ratio  # derived value, e.g. from juicefs_blockcache_hits and juicefs_blockcache_miss |

200 | +    warn: "<0.6" |

201 | +    crit: "<0.4" |

202 | + |

203 | +  - name: juicefs_used_buffer_ratio |

204 | +    warn: ">0.8"  # quoted, because a bare leading > starts a YAML block scalar |

205 | +    crit: ">0.9" |

206 | + |

207 | +  - name: juicefs_fuse_ops |

208 | +    crit: "latency_ms > 1000" |

209 | +``` |
|
210 | + |
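
The thresholds above come in two flavors: for the cache hit ratio a lower value is worse, while for buffer usage a higher value is worse. The following sketch shows how such rules might be evaluated; the function names are illustrative, and the hit ratio is computed from hit and miss counters as an assumption about how the derived metric would be built.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    """Fraction of reads served from cache; with no reads yet, report 1.0 (healthy)."""
    total = hits + misses
    return hits / total if total else 1.0


def classify_low(value: float, warn: float, crit: float) -> str:
    """For metrics where lower is worse, such as the cache hit ratio."""
    if value < crit:
        return "crit"
    if value < warn:
        return "warn"
    return "ok"


def classify_high(value: float, warn: float, crit: float) -> str:
    """For metrics where higher is worse, such as the buffer usage ratio."""
    if value > crit:
        return "crit"
    if value > warn:
        return "warn"
    return "ok"


# The JuiceFS bench run in section 8 shows 229 cache reads and 0 object GETs.
ratio = cache_hit_ratio(229, 0)
print(ratio, classify_low(ratio, warn=0.6, crit=0.4))
print(classify_high(0.85, warn=0.8, crit=0.9))
```
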
|
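211 | +### 7. Conclusions and Recommendations |

212 | + |

213 | +#### 7.1 Technical Feasibility |

214 | +1. **Performance**: |

215 | +   - Big-file read/write throughput improves 8-15× |

216 | +   - Small-file throughput improves 49% for writes and 5.6% for reads |

217 | +   - Small-file write latency drops 33% (19.58 ms to 13.10 ms); stat latency is higher on JuiceFS (0.87 ms versus 0.19 ms on NFS) |

218 | + |

219 | +2. **Architecture**: |

220 | +   - Fits the Kubernetes multi-pod shared storage scenario well |

221 | +   - Reuses the existing MinIO storage infrastructure |

222 | +   - Removes the NFS single point of failure, provided Redis and MinIO run as clusters (section 4.3) |

223 | + |

224 | +3. **Operations**: |

225 | +   - Monitoring dashboards give detailed visibility |

226 | +   - Automated failure recovery |

227 | +   - Capacity expansion without disruption |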
|
228 | + |
|
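229 | +#### 7.2 Implementation Recommendations |

230 | +1. **Phased rollout**: |

231 | +   - Phase 1: migrate the app market module first |

232 | +   - Phase 2: migrate the business engine |

233 | +   - Phase 3: migrate frontend resource management |

234 | + |

235 | +2. **Tuning priorities**: |

236 | + |

237 | +   ![img_1.png](img_1.png) |

238 | +   ```mermaid |

239 | +   graph LR |

240 | +       A[SSD cache disk] --> B[Write coalescing] |

241 | +       C[Memory cache] --> D[Concurrent uploads] |

242 | +       E[Metadata cluster] --> F[Monitoring and alerting] |

243 | +   ``` |

244 | + |

245 | + |

246 | +3. **Long-term direction**: |

247 | +   - Automatic tiered storage (hot/warm/cold data) |

248 | +   - Cross-region replication |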
|
249 | + |
|
250 | +### 8. Benchmark Reference Data |

251 | + |

252 | + |

253 | +I set up a single-node MinIO with Docker on a company server |

254 | +and benchmarked it with MinIO's official tool, warp (https://github.com/minio/warp). |

255 | +The results are as follows: |
|
256 | +``` |
|
257 | +Reqs: 26887, Errs:0, Objs:26887, Bytes: 157.50GiB |
|
258 | +- DELETE Average: 9 Obj/s; Current 5 Obj/s, 77.0 ms/req |
|
259 | +- GET Average: 40 Obj/s, 403.3MiB/s; Current 40 Obj/s, 395.4MiB/s, 194.0 ms/req, TTFB: 88.5ms |
|
260 | +- PUT Average: 13 Obj/s, 134.4MiB/s; Current 14 Obj/s, 136.2MiB/s, 773.3 ms/req |
|
261 | +- STAT Average: 27 Obj/s; Current 30 Obj/s, 46.3 ms/req |
|
262 | + |
|
263 | + |
|
264 | +Report: DELETE. Concurrency: 20. Ran: 4m57s |
|
265 | +* Average: 8.95 obj/s |
|
266 | +* Reqs: Avg: 71.4ms, 50%: 63.3ms, 90%: 133.6ms, 99%: 234.0ms, Fastest: 2.6ms, Slowest: 382.1ms, StdDev: 43.9ms |
|
267 | + |
|
268 | +Throughput, split into 297 x 1s: |
|
269 | +* Fastest: 18.48 obj/s |
|
270 | +* 50% Median: 8.82 obj/s |
|
271 | +* Slowest: 2.06 obj/s |
|
272 | + |
|
273 | +────────────────────────────────── |
|
274 | + |
|
275 | +Report: GET. Concurrency: 20. Ran: 4m57s |
|
276 | +* Average: 403.29 MiB/s, 40.33 obj/s |
|
277 | +* Reqs: Avg: 189.2ms, 50%: 172.3ms, 90%: 321.1ms, 99%: 475.3ms, Fastest: 24.9ms, Slowest: 1211.2ms, StdDev: 94.7ms |
|
278 | +* TTFB: Avg: 81ms, Best: 4ms, 25th: 36ms, Median: 63ms, 75th: 112ms, 90th: 167ms, 99th: 275ms, Worst: 610ms StdDev: 60ms |
|
279 | + |
|
280 | +Throughput, split into 297 x 1s: |
|
281 | +* Fastest: 686.4MiB/s, 68.64 obj/s |
|
282 | +* 50% Median: 390.8MiB/s, 39.08 obj/s |
|
283 | +* Slowest: 136.5MiB/s, 13.65 obj/s |
|
284 | + |
|
285 | +────────────────────────────────── |
|
286 | + |
|
287 | +Report: PUT. Concurrency: 20. Ran: 4m57s |
|
288 | +* Average: 134.42 MiB/s, 13.44 obj/s |
|
289 | +* Reqs: Avg: 785.0ms, 50%: 771.7ms, 90%: 1034.3ms, 99%: 1285.9ms, Fastest: 300.7ms, Slowest: 2193.2ms, StdDev: 176.9ms |
|
290 | + |
|
291 | +Throughput, split into 297 x 1s: |
|
292 | +* Fastest: 207.4MiB/s, 20.74 obj/s |
|
293 | +* 50% Median: 136.5MiB/s, 13.65 obj/s |
|
294 | +* Slowest: 49.9MiB/s, 4.99 obj/s |
|
295 | + |
|
296 | +────────────────────────────────── |
|
297 | + |
|
298 | +Report: STAT. Concurrency: 20. Ran: 4m57s |
|
299 | +* Average: 26.88 obj/s |
|
300 | +* Reqs: Avg: 46.6ms, 50%: 38.9ms, 90%: 86.6ms, 99%: 167.7ms, Fastest: 2.0ms, Slowest: 454.3ms, StdDev: 31.0ms |
|
301 | + |
|
302 | +Throughput, split into 297 x 1s: |
|
303 | +* Fastest: 49.00 obj/s |
|
304 | +* 50% Median: 26.43 obj/s |
|
305 | +* Slowest: 7.00 obj/s |
|
306 | + |
|
307 | + |
|
308 | +────────────────────────────────── |
|
309 | + |
|
310 | +Report: Total. Concurrency: 20. Ran: 4m57s |
|
311 | +* Average: 537.72 MiB/s, 89.61 obj/s |
|
312 | + |
|
313 | +Throughput, split into 297 x 1s: |
|
314 | +* Fastest: 833.6MiB/s, 146.81 obj/s |
|
315 | +* 50% Median: 511.6MiB/s, 89.18 obj/s |
|
316 | +* Slowest: 236.0MiB/s, 35.61 obj/s |
|
317 | + |
|
318 | + |
|
319 | +Cleanup |
|
320 | +Cleanup Done |
|
321 | +``` |
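
One detail worth reading out of the warp report: dividing throughput by object rate gives the object size warp was using, which turns out to be about 10 MiB. That places the warp numbers in the medium-file range from section 1.1, not the 128 KiB small-file case that `juicefs bench` measures. A minimal sketch of the arithmetic, with values copied from the GET and PUT reports above and a made-up helper name:

```python
# (MiB/s, obj/s) pairs from the warp "Report: GET" and "Report: PUT" sections.
warp = {"GET": (403.29, 40.33), "PUT": (134.42, 13.44)}


def implied_object_mib(mib_per_s: float, obj_per_s: float) -> float:
    """Average object size implied by a throughput and an object rate."""
    return mib_per_s / obj_per_s


for op, (mib_s, obj_s) in warp.items():
    print(f"{op}: {implied_object_mib(mib_s, obj_s):.1f} MiB per object")
```
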
|
322 | +Results of `juicefs objbench`: |
|
323 | +``` |
|
324 | +Start Performance Testing ... |
|
325 | +put small objects: 100/100 [==============================================================] 112.6/s used: 887.91245ms |
|
326 | +get small objects: 100/100 [==============================================================] 368.0/s used: 271.76285ms |
|
327 | +upload objects: 25/25 [==============================================================] 20.5/s used: 1.218708009s |
|
328 | +download objects: 25/25 [==============================================================] 27.2/s used: 919.960997ms |
|
329 | +list objects: 500/500 [==============================================================] 18598.6/s used: 27.061319ms |
|
330 | +head objects: 125/125 [==============================================================] 2209.6/s used: 56.73503ms |
|
331 | +delete objects: 125/125 [==============================================================] 1441.0/s used: 86.870648ms |
|
332 | +Benchmark finished! block-size: 4.0 MiB, big-object-size: 100 MiB, small-object-size: 128 KiB, small-objects: 100, NumThreads: 4 |
|
333 | ++--------------------+--------------------+-----------------------+ |
|
334 | +| ITEM | VALUE | COST | |
|
335 | ++--------------------+--------------------+-----------------------+ |
|
336 | +| upload objects | 82.12 MiB/s | 194.84 ms/object | |
|
337 | +| download objects | 108.87 MiB/s | 146.97 ms/object | |
|
338 | +| put small objects | 112.74 objects/s | 35.48 ms/object | |
|
339 | +| get small objects | 368.76 objects/s | 10.85 ms/object | |
|
340 | +| list objects | 19537.05 objects/s | 25.59 ms/ 125 objects | |
|
341 | +| head objects | 2284.84 objects/s | 1.75 ms/object | |
|
342 | +| delete objects | 1458.48 objects/s | 2.74 ms/object | |
|
343 | +| change permissions | not support | not support | |
|
344 | +| change owner/group | not support | not support | |
|
345 | +| update mtime | not support | not support | |
|
346 | ++--------------------+--------------------+-----------------------+ |
|
347 | +``` |
|
348 | + |
|
349 | +Results of `juicefs bench` on the existing NFS mount directory: |
|
350 | + |
|
351 | +``` |
|
352 | +juicefs bench . --big-file-size 50M -p 2 |
|
353 | + Write big blocks: 100/100 [====================================================] 51.7/s used: 1.934549134s |
|
354 | + Read big blocks: 100/100 [====================================================] 71.9/s used: 1.391573646s |
|
355 | + Write small blocks: 200/200 [====================================================] 102.1/s used: 1.95950659s |
|
356 | + Read small blocks: 200/200 [====================================================] 240.2/s used: 832.645761ms |
|
357 | + Stat small files: 200/200 [====================================================] 8434.9/s used: 23.787622ms |
|
358 | +Benchmark finished! |
|
359 | +BlockSize: 1.0 MiB, BigFileSize: 50 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 2 |
|
360 | ++------------------+-----------------+---------------+ |
|
361 | +| ITEM | VALUE | COST | |
|
362 | ++------------------+-----------------+---------------+ |
|
363 | +| Write big file | 52.23 MiB/s | 1.91 s/file | |
|
364 | +| Read big file | 71.97 MiB/s | 1.39 s/file | |
|
365 | +| Write small file | 102.1 files/s | 19.58 ms/file | |
|
366 | +| Read small file | 240.9 files/s | 8.30 ms/file | |
|
367 | +| Stat file | 10524.5 files/s | 0.19 ms/file | |
|
368 | ++------------------+-----------------+---------------+ |
|
369 | +``` |
|
370 | + |
|
371 | +Results of `juicefs bench` on the JuiceFS-mounted directory: |
|
372 | +``` |
|
373 | +juicefs bench . --big-file-size 50M -p 2 |
|
374 | + Write big blocks: 100/100 [====================================================] 801.2/s used: 124.823072ms |
|
375 | + Read big blocks: 100/100 [====================================================] 583.9/s used: 171.326371ms |
|
376 | + Write small blocks: 200/200 [====================================================] 152.6/s used: 1.311017584s |
|
377 | + Read small blocks: 200/200 [====================================================] 253.9/s used: 787.66073ms |
|
378 | + Stat small files: 200/200 [====================================================] 2273.5/s used: 88.041224ms |
|
379 | +Benchmark finished! |
|
380 | +BlockSize: 1.0 MiB, BigFileSize: 50 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 2 |
|
381 | +Time used: 6.2 s, CPU: 26.5%, Memory: 310.7 MiB |
|
382 | ++------------------+-----------------+---------------+ |
|
383 | +| ITEM | VALUE | COST | |
|
384 | ++------------------+-----------------+---------------+ |
|
385 | +| Write big file | 825.47 MiB/s | 0.12 s/file | |
|
386 | +| Read big file | 587.95 MiB/s | 0.17 s/file | |
|
387 | +| Write small file | 152.6 files/s | 13.10 ms/file | |
|
388 | +| Read small file | 254.3 files/s | 7.87 ms/file | |
|
389 | +| Stat file | 2298.5 files/s | 0.87 ms/file | |
|
390 | +| FUSE operation | 4567 operations | 1.05 ms/op | |
|
391 | +| Update meta | 608 operations | 3.66 ms/op | |
|
392 | +| Put object | 226 operations | 114.78 ms/op | |
|
393 | +| Get object | 0 operations | 0.00 ms/op | |
|
394 | +| Delete object | 0 operations | 0.00 ms/op | |
|
395 | +| Write into cache | 226 operations | 1.37 ms/op | |
|
396 | +| Read from cache | 229 operations | 5.46 ms/op | |
|
397 | ++------------------+-----------------+---------------+ |
|
398 | +``` |
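
To keep summary tables in sync with raw outputs like the ones above, the result tables can be parsed programmatically instead of copied by hand. The sketch below is one way to do it, assuming the plain three-column ASCII table as printed by the CLI (without the diff gutter shown in this view). The function names and the `SAMPLE` excerpt are illustrative; the sample rows are taken from the JuiceFS run above.

```python
def parse_bench_table(text: str) -> dict:
    """Map each ITEM to its (VALUE, COST) strings; border and header lines are skipped."""
    rows = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skips "+-----+" borders and any non-table lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) != 3 or cells[0] == "ITEM":
            continue
        item, value, cost = cells
        rows[item] = (value, cost)
    return rows


def leading_number(cell: str):
    """Return the leading numeric part of a cell such as '825.47 MiB/s', or None."""
    try:
        return float(cell.split()[0])
    except (ValueError, IndexError):
        return None


SAMPLE = """
+------------------+-----------------+---------------+
|       ITEM       |      VALUE      |     COST      |
+------------------+-----------------+---------------+
|   Write big file |    825.47 MiB/s |   0.12 s/file |
|        Stat file |  2298.5 files/s |  0.87 ms/file |
+------------------+-----------------+---------------+
"""

rows = parse_bench_table(SAMPLE)
print(rows["Write big file"], leading_number(rows["Stat file"][1]))
```
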
|
... | ... | \ No newline at end of file |