## iidp Platform Storage Architecture Optimization: Replacing NFS with JuiceFS

### 1. Requirements Analysis

#### 1.1 Current State

As a core business system, the iidp platform's storage layer must satisfy the following requirements:

- **Shared storage across business pods**: a single volume must hold the engine base `jar` packages, application `jar` packages (both business and built-in apps), and frontend resource bundles (base and business zip packages)
- **Concurrent multi-pod access**: the app marketplace, business engine, and other pods read and write the same directories simultaneously
- **Mixed file sizes**:
  - Small files (2 KB-10 KB): configuration files, metadata files, frontend dist files
  - Medium files (1-10 MB): frontend resource bundles, lightweight applications
  - Large files (50-200 MB): business app packages, engine base packages, server packages
- **High-concurrency workloads**: during app upload/publish, multiple pods read and write the same files concurrently; the online IDE edits metamodel files in real time and uploads them, on top of git management, download, and sharing operations

Core problems with the existing NFS solution:

1. **Performance bottleneck**: high latency on small-file I/O, low throughput on large-file transfers
2. **Single point of failure**: an NFS server outage takes the whole platform down
3. **Limited scalability**: performance degrades sharply as the number of clients grows
4. **Operational complexity**: tuning is difficult and effective monitoring is lacking

### 2. Technology Selection

#### 2.1 JuiceFS Core Advantages
| Dimension | JuiceFS | NFS | Value |
|---------|--------|-----|--------|
| Architecture | Separate metadata/data planes | Monolithic | Horizontal scalability |
| Performance | Local cache acceleration | Pure network transfer | 10x+ speedup |
| High availability | Replication / automatic failover | Single point of failure | 99.95%+ availability |
| Kubernetes integration | Native CSI driver | Static exports, manual provisioning | Dynamic provisioning |
| Storage backend | MinIO/S3 and others | Local server disks only | Reuses existing infrastructure |
| Monitoring | Prometheus + dashboard | Basic monitoring | Deep observability |

#### 2.2 Fit Analysis of Key Features
1. **Mount-point reuse**:
   ```mermaid
   graph TB
   subgraph "Kubernetes cluster"
   A[App POD 1] -->|shared| C[JuiceFS PVC]
   B[App POD 2] -->|shared| C
   D[Marketplace POD] -->|shared| C
   end
   C --> E[Single Mount Pod]
   E --> F[Redis metadata]
   E --> G[MinIO object store]
   ```
   ![img.png](img1.png)

   With the two PVCs, apps and apps-frontend, only two mount points need to be maintained no matter how many pods attach; a sketch of such a claim appears after this list.

2. **Tiered caching**:
   - Local SSD cache: accelerates access to hot data
   - Distributed cache: shares cached blocks across nodes
   - Transparent cache synchronization: keeps multiple clients consistent

3. **Production validation**:
   - GitHub: 11.8k stars, 300+ contributors
   - Large-scale adopters: PB-scale deployments at Li Auto, Zhihu, Xiaohongshu, and others
   - The official dashboard provides in-depth monitoring: http://192.168.184.122:30414/pods
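
A minimal sketch of the claim described in item 1 (PVC name and size are placeholders; the `juicefs-sc` StorageClass is defined in section 4.1). A single `ReadWriteMany` claim lets any number of pods share the volume, while the CSI driver keeps one Mount Pod per node:

```bash
# Hypothetical shared PVC for the apps volume; apps-frontend is analogous.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: apps-pvc
spec:
  accessModes:
    - ReadWriteMany        # concurrently mountable by many pods
  storageClassName: juicefs-sc
  resources:
    requests:
      storage: 100Gi
EOF
```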

### 3. Performance Comparison

#### 3.1 Test Environment

- **Hardware**: company-provided server (8 vCPU / 16 GB RAM)
- **Network**: Gigabit Ethernet (unconfirmed)
- **Configurations compared**:
  - NFSv4.1 (current solution)
  - MinIO direct access
  - JuiceFS with MinIO backend

#### 3.2 Performance Metrics
| Metric | NFS | MinIO direct | JuiceFS | vs. NFS |
|------|-----|-----------|---------|----------|
| **Write throughput** | 52.23 MiB/s | 134.42 MiB/s | **825.47 MiB/s** | 15.8x |
| **Read throughput** | 71.97 MiB/s | 108.87 MiB/s | **587.95 MiB/s** | 8.2x |
| **Per-file write time (50 MiB)** | 1.91 s | 0.15 s | **0.12 s** | 94% lower |
| **Write latency per op** | 19.58 ms | 114.78 ms | **1.37 ms** | 93% lower |

| Metric | NFS | MinIO direct | JuiceFS | vs. NFS |
|------|-----|-----------|---------|----------|
| **Write IOPS (128 KiB files)** | 102.1 | 112.74 | **152.6** | 49% higher |
| **Read IOPS (128 KiB files)** | 240.9 | 368.76 | **254.3** | 5.6% higher |
| **Stat throughput** | 10,524.5 files/s | 2,284.8 ops/s | 2,298.5 files/s | 78% lower |
| **Stat latency** | 0.19 ms | 1.75 ms | 0.87 ms | see note |

Note: the stat figures come from the raw runs in section 8 and favor NFS, whose kernel client answers repeated stat calls from its attribute cache, while JuiceFS routes them through FUSE and the metadata engine. The JuiceFS write-latency figure (1.37 ms) is the cached-write path; the MinIO column (114.78 ms) is a per-object PUT.

#### 3.3 JuiceFS-Specific Strengths
1. **Metadata handling**:
   - Attribute lookups run against the metadata engine rather than the object store: objbench shows head at 2,284.8 ops/s and list at 18,598.6 ops/s on the MinIO backend
   - Through the FUSE mount, stat costs 0.87 ms/op versus 0.19 ms/op on NFS, where the kernel client's attribute cache answers repeat lookups (see the note in 3.2)

2. **Cache efficiency** (illustrative split):
   ```mermaid
   pie
       title Cache hit breakdown
       "JuiceFS cache hits" : 75
       "Disk reads" : 15
       "Network reads" : 10
   ```

| **Operation** | **MinIO baseline** | **With JuiceFS** | **Change** | **Key mechanism** |
|---------------------|---------------|-----------------|--------------|------------------------------------------------------------------------------|
| **Small-object write** | 112.6 ops/s | **152.6 ops/s** | **+35.5%** | Client-side write merging and batched metadata commits cut object-storage API calls |
| **Small-object read** | 368.0 ops/s | 254.3 ops/s | -30.9% | Local cache was not enabled in this test, so reads hit the object store directly |
| **Delete** | 1,441.0 ops/s | **1,458.48 ops/s** | +1.2% | Handled as a metadata-engine (Redis) transaction, decoupled from the object store |
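
The small-object read regression above is attributed to running without a local cache. In deployment, the cache can be pre-populated before peak traffic; a minimal sketch (paths are illustrative):

```bash
# Warm the local cache for the hot directories with 4 concurrent fetchers;
# juicefs warmup pulls the listed paths' data blocks into the cache dir.
juicefs warmup -p 4 /jfs/apps /jfs/frontend
```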

### 4. Implementation Details

#### 4.1 Kubernetes Integration

With the JuiceFS CSI driver, `mountOptions` is a top-level StorageClass field, and the volume settings (object store, bucket, metadata URL, credentials) live in a Secret that the StorageClass references; a sketch of that Secret follows the block.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: juicefs-sc
provisioner: csi.juicefs.com
parameters:
  csi.storage.k8s.io/provisioner-secret-name: juicefs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: kube-system
  csi.storage.k8s.io/node-publish-secret-name: juicefs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: kube-system
mountOptions:
  - cache-dir=/var/juicefs
  - cache-size=20480
  - max-uploads=50
  - writeback_cache
```
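
A minimal sketch of that Secret, created imperatively; the filesystem name and credentials are placeholders, while `metaurl` and `bucket` carry over from the original parameters:

```bash
# Key names follow the JuiceFS CSI driver's volume-credentials convention.
kubectl create secret generic juicefs-secret \
  --namespace kube-system \
  --from-literal=name=jfs-vol \
  --from-literal=metaurl=redis://redis-service:6379/8 \
  --from-literal=storage=minio \
  --from-literal=bucket=http://minio-service:9000/jfs-bucket \
  --from-literal=access-key="$MINIO_ACCESS_KEY" \
  --from-literal=secret-key="$MINIO_SECRET_KEY"
```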

#### 4.2 Recommended Parameter Tuning
The key flags fall into three groups; a combined mount command follows the list.

1. **Cache policy**:
   ```bash
   --cache-dir=/mnt/ssd_cache   # put the cache on SSD
   --cache-size=20480           # 20 GiB cache
   --free-space-ratio=0.2       # keep 20% of the disk free
   ```

2. **I/O parameters**:
   ```bash
   --max-uploads=100            # more concurrent uploads to the object store
   -o writeback_cache           # kernel-level write merging (FUSE option)
   -o keep_pagecache            # retain the page cache
   ```

3. **Monitoring**:
   ```bash
   --metrics=localhost:9567     # Prometheus scrape endpoint
   --consul=192.168.1.100:8500  # service registration
   ```
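
Combined into a single invocation, a sketch (the metadata URL and mount point are illustrative):

```bash
# Mount with SSD cache, higher upload concurrency, and metrics enabled.
juicefs mount redis://redis-service:6379/8 /jfs -d \
  --cache-dir=/mnt/ssd_cache \
  --cache-size=20480 \
  --free-space-ratio=0.2 \
  --max-uploads=100 \
  --metrics=localhost:9567 \
  -o writeback_cache
```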

#### 4.3 High-Availability Architecture
```mermaid
graph TD
    subgraph "K8s cluster"
    JFS[Mount Pod] --> Redis[Redis Sentinel cluster]
    JFS --> MinIO[MinIO cluster]
    end

    subgraph "Monitoring"
    Prometheus -->|scrape| JFS
    Grafana --> Prometheus
    Alerting[Alert center] --> Grafana
    end
```

### 5. Migration and Rollout Path

#### 5.1 Phased Migration Plan
| Phase | Goal | Window | Rollback |
|------|------|---------|----------|
| Parallel run | Dual-write new data | 1-2 weeks | Remove the JuiceFS write path |
| Historical data | rsync incremental sync | Maintenance window | Scripted rollback |
| Traffic cutover | DNS switch | 5 minutes | DNS switch back |
| Validation | Monitoring comparison | 48 hours | Alert-triggered rollback |

#### 5.2 Data Migration Script
```bash
#!/bin/bash
# Incremental migration: keep JuiceFS in sync with NFS until cutover
while true; do
    rsync -avz --delete /nfs/apps/ /jfs/apps/
    rsync -avz --delete /nfs/frontend/ /jfs/frontend/
    sleep 300  # sync every 5 minutes
done
```
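
Before the DNS cutover, a checksum-based dry run can confirm the two trees match (a sketch using the same paths as the script above; `-c` compares checksums, `--dry-run` reports differences without copying):

```bash
# Any file listed in the output means the trees still differ.
rsync -avc --delete --dry-run /nfs/apps/ /jfs/apps/
rsync -avc --delete --dry-run /nfs/frontend/ /jfs/frontend/
```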

### 6. Risk Control

#### 6.1 Risk Mitigation
| Risk | Likelihood | Impact | Mitigation |
|--------|------|------|-------------------|
| Cache inconsistency | Medium | Medium | Enforce fsync for strong synchronization |
| Metadata latency | Low | High | Tune the Redis cluster or switch to MySQL |
| Mount-point failure | Low | High | Automatic restart |
| Capacity exhaustion | Medium | High | Auto-scaling policy |

#### 6.2 Monitoring Thresholds
```yaml
# Illustrative thresholds, not a specific alerting tool's rule syntax
monitors:
  - name: juicefs_cache_hit_ratio
    warn: "<0.6"
    crit: "<0.4"

  - name: juicefs_used_buffer_ratio
    warn: ">0.8"
    crit: ">0.9"

  - name: juicefs_fuse_ops
    crit: "latency_ms > 1000"
```
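
These ratios can be spot-checked on a live client through the Prometheus endpoint configured in section 4.2. A sketch, assuming the client exposes the usual `juicefs_blockcache_*` counters (metric names may differ across versions):

```bash
# Hit ratio ~= hits / (hits + miss), computed from the client's /metrics output.
curl -s http://localhost:9567/metrics | grep -E 'juicefs_blockcache_(hits|miss)'
```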

### 7. Conclusions and Recommendations

#### 7.1 Technical Feasibility
1. **Performance**:
   - Large-file read/write throughput improves 8-15x
   - Small-file write IOPS improves 30-50%
   - Small-write latency drops by over 90% via the write cache (stat is the one metric where NFS's attribute cache still wins; see section 3.2)

2. **Architecture**:
   - Fits the Kubernetes multi-pod shared-storage scenario
   - Reuses the existing MinIO storage infrastructure
   - Removes the single point of failure

3. **Operations**:
   - Visual dashboards give deep insight
   - Automated failure recovery
   - Transparent capacity expansion

#### 7.2 Rollout Recommendations
1. **Deploy in phases**:
   - Phase 1: migrate the app marketplace module first
   - Phase 2: migrate the business engine
   - Phase 3: migrate frontend resource management

2. **Tuning priorities**:

   ![img2.png](img2.png)
   ```mermaid
   graph LR
   A[SSD cache disk] --> B[Write merging]
   C[Memory cache] --> D[Concurrent uploads]
   E[Metadata cluster] --> F[Monitoring and alerts]
   ```

3. **Longer-term evolution**:
   - Automatic tiered storage (hot/warm/cold data)
   - Cross-region replication

### 8. Benchmark Reference Data

I deployed a single-node MinIO with Docker on a company server and benchmarked it with MinIO's official load-testing tool warp (https://github.com/minio/warp).
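
The exact command was not recorded; an invocation consistent with the report below (mixed workload, concurrency 20, roughly 5 minutes) would look something like this, with host and credentials as placeholders:

```bash
# Mixed GET/PUT/DELETE/STAT workload against the single-node MinIO.
warp mixed \
  --host=127.0.0.1:9000 \
  --access-key=minioadmin --secret-key=minioadmin \
  --duration=5m --concurrency=20
```

The warp report: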
```
Reqs: 26887, Errs:0, Objs:26887, Bytes: 157.50GiB
- DELETE Average: 9 Obj/s; Current 5 Obj/s, 77.0 ms/req
- GET Average: 40 Obj/s, 403.3MiB/s; Current 40 Obj/s, 395.4MiB/s, 194.0 ms/req, TTFB: 88.5ms
- PUT Average: 13 Obj/s, 134.4MiB/s; Current 14 Obj/s, 136.2MiB/s, 773.3 ms/req
- STAT Average: 27 Obj/s; Current 30 Obj/s, 46.3 ms/req


Report: DELETE. Concurrency: 20. Ran: 4m57s
* Average: 8.95 obj/s
* Reqs: Avg: 71.4ms, 50%: 63.3ms, 90%: 133.6ms, 99%: 234.0ms, Fastest: 2.6ms, Slowest: 382.1ms, StdDev: 43.9ms

Throughput, split into 297 x 1s:
* Fastest: 18.48 obj/s
* 50% Median: 8.82 obj/s
* Slowest: 2.06 obj/s

──────────────────────────────────

Report: GET. Concurrency: 20. Ran: 4m57s
* Average: 403.29 MiB/s, 40.33 obj/s
* Reqs: Avg: 189.2ms, 50%: 172.3ms, 90%: 321.1ms, 99%: 475.3ms, Fastest: 24.9ms, Slowest: 1211.2ms, StdDev: 94.7ms
* TTFB: Avg: 81ms, Best: 4ms, 25th: 36ms, Median: 63ms, 75th: 112ms, 90th: 167ms, 99th: 275ms, Worst: 610ms StdDev: 60ms

Throughput, split into 297 x 1s:
* Fastest: 686.4MiB/s, 68.64 obj/s
* 50% Median: 390.8MiB/s, 39.08 obj/s
* Slowest: 136.5MiB/s, 13.65 obj/s

──────────────────────────────────

Report: PUT. Concurrency: 20. Ran: 4m57s
* Average: 134.42 MiB/s, 13.44 obj/s
* Reqs: Avg: 785.0ms, 50%: 771.7ms, 90%: 1034.3ms, 99%: 1285.9ms, Fastest: 300.7ms, Slowest: 2193.2ms, StdDev: 176.9ms

Throughput, split into 297 x 1s:
* Fastest: 207.4MiB/s, 20.74 obj/s
* 50% Median: 136.5MiB/s, 13.65 obj/s
* Slowest: 49.9MiB/s, 4.99 obj/s

──────────────────────────────────

Report: STAT. Concurrency: 20. Ran: 4m57s
* Average: 26.88 obj/s
* Reqs: Avg: 46.6ms, 50%: 38.9ms, 90%: 86.6ms, 99%: 167.7ms, Fastest: 2.0ms, Slowest: 454.3ms, StdDev: 31.0ms

Throughput, split into 297 x 1s:
* Fastest: 49.00 obj/s
* 50% Median: 26.43 obj/s
* Slowest: 7.00 obj/s


──────────────────────────────────

Report: Total. Concurrency: 20. Ran: 4m57s
* Average: 537.72 MiB/s, 89.61 obj/s

Throughput, split into 297 x 1s:
* Fastest: 833.6MiB/s, 146.81 obj/s
* 50% Median: 511.6MiB/s, 89.18 obj/s
* Slowest: 236.0MiB/s, 35.61 obj/s


Cleanup
Cleanup Done
```

Next, `juicefs objbench` was run against the same MinIO.
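
The invocation would be roughly the following (reconstructed; the parameters match the defaults echoed in the report: 4 MiB blocks, 100 MiB big objects, 128 KiB small objects, 4 threads; endpoint and credentials are placeholders):

```bash
# Benchmarks the raw object-storage API through JuiceFS's storage layer.
juicefs objbench --storage minio \
  --access-key minioadmin --secret-key minioadmin \
  http://127.0.0.1:9000/jfs-bucket
```

Its output: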
```
Start Performance Testing ...
put small objects: 100/100 [==============================================================] 112.6/s used: 887.91245ms
get small objects: 100/100 [==============================================================] 368.0/s used: 271.76285ms
upload objects: 25/25 [==============================================================] 20.5/s used: 1.218708009s
download objects: 25/25 [==============================================================] 27.2/s used: 919.960997ms
list objects: 500/500 [==============================================================] 18598.6/s used: 27.061319ms
head objects: 125/125 [==============================================================] 2209.6/s used: 56.73503ms
delete objects: 125/125 [==============================================================] 1441.0/s used: 86.870648ms
Benchmark finished! block-size: 4.0 MiB, big-object-size: 100 MiB, small-object-size: 128 KiB, small-objects: 100, NumThreads: 4
+--------------------+--------------------+-----------------------+
|        ITEM        |       VALUE        |         COST          |
+--------------------+--------------------+-----------------------+
|   upload objects   |    82.12 MiB/s     |   194.84 ms/object    |
|  download objects  |    108.87 MiB/s    |   146.97 ms/object    |
| put small objects  |  112.74 objects/s  |    35.48 ms/object    |
| get small objects  |  368.76 objects/s  |    10.85 ms/object    |
|    list objects    | 19537.05 objects/s | 25.59 ms/ 125 objects |
|    head objects    | 2284.84 objects/s  |    1.75 ms/object     |
|   delete objects   | 1458.48 objects/s  |    2.74 ms/object     |
| change permissions |    not support     |      not support      |
| change owner/group |    not support     |      not support      |
|    update mtime    |    not support     |      not support      |
+--------------------+--------------------+-----------------------+
```

`juicefs bench` on the existing NFS-mounted directory:

```
juicefs bench . --big-file-size 50M -p 2
 Write big blocks: 100/100 [====================================================] 51.7/s used: 1.934549134s
 Read big blocks: 100/100 [====================================================] 71.9/s used: 1.391573646s
 Write small blocks: 200/200 [====================================================] 102.1/s used: 1.95950659s
 Read small blocks: 200/200 [====================================================] 240.2/s used: 832.645761ms
 Stat small files: 200/200 [====================================================] 8434.9/s used: 23.787622ms
Benchmark finished!
BlockSize: 1.0 MiB, BigFileSize: 50 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 2
+------------------+-----------------+---------------+
|       ITEM       |      VALUE      |     COST      |
+------------------+-----------------+---------------+
|  Write big file  |   52.23 MiB/s   |  1.91 s/file  |
|  Read big file   |   71.97 MiB/s   |  1.39 s/file  |
| Write small file |  102.1 files/s  | 19.58 ms/file |
| Read small file  |  240.9 files/s  | 8.30 ms/file  |
|    Stat file     | 10524.5 files/s | 0.19 ms/file  |
+------------------+-----------------+---------------+
```

`juicefs bench` on a JuiceFS-mounted directory:
```
juicefs bench . --big-file-size 50M -p 2
 Write big blocks: 100/100 [====================================================] 801.2/s used: 124.823072ms
 Read big blocks: 100/100 [====================================================] 583.9/s used: 171.326371ms
 Write small blocks: 200/200 [====================================================] 152.6/s used: 1.311017584s
 Read small blocks: 200/200 [====================================================] 253.9/s used: 787.66073ms
 Stat small files: 200/200 [====================================================] 2273.5/s used: 88.041224ms
Benchmark finished!
BlockSize: 1.0 MiB, BigFileSize: 50 MiB, SmallFileSize: 128 KiB, SmallFileCount: 100, NumThreads: 2
Time used: 6.2 s, CPU: 26.5%, Memory: 310.7 MiB
+------------------+-----------------+---------------+
|       ITEM       |      VALUE      |     COST      |
+------------------+-----------------+---------------+
|  Write big file  |  825.47 MiB/s   |  0.12 s/file  |
|  Read big file   |  587.95 MiB/s   |  0.17 s/file  |
| Write small file |  152.6 files/s  | 13.10 ms/file |
| Read small file  |  254.3 files/s  | 7.87 ms/file  |
|    Stat file     | 2298.5 files/s  | 0.87 ms/file  |
|  FUSE operation  | 4567 operations |  1.05 ms/op   |
|   Update meta    | 608 operations  |  3.66 ms/op   |
|    Put object    | 226 operations  | 114.78 ms/op  |
|    Get object    |  0 operations   |  0.00 ms/op   |
|  Delete object   |  0 operations   |  0.00 ms/op   |
| Write into cache | 226 operations  |  1.37 ms/op   |
| Read from cache  | 229 operations  |  5.46 ms/op   |
+------------------+-----------------+---------------+
```