# sie-snest-gw: Design and Implementation of an Nginx-Based Gateway

## 1. Requirements Analysis

### 1.1 Background

The system has long used `Nginx` as its proxy gateway, and many proxied endpoints already run on it, including front-end static assets (`html`), `api-doc`, front-end configuration files, and `WebGME`. As a mature proxy server, `Nginx` offers high performance and stability. However, the existing `Nginx` gateway only provides static proxying and cannot satisfy the need for dynamic, per-`app` service registration and discovery, while the `sie-snest-engine` core requires app-level route dispatching.

### 1.2 Requirements

1. **Preserve existing Nginx capabilities**: keep the current Nginx configuration and proxied endpoints; only the `/api` route proxied to the engine may change.
2. **Dynamic service registration and discovery**: provide app-level service registration and discovery, enabling dynamic dispatching and routing.
3. **Non-intrusive enhancement of Nginx**: extend Nginx's capabilities without modifying any existing Nginx code.
4. **Efficient communication**: communicate over a Unix domain socket for better performance.

## 2. Design

### 2.1 Preserving Existing Nginx Capabilities

- **Existing Nginx configuration**: keep the current configuration files and proxied endpoints unchanged so that existing services keep running.
- **Proxied endpoints preserved**: all endpoints other than the `/api` route proxied to the engine stay as they are.

### 2.2 Dynamic Service Registration and Discovery

- **App-level registration and discovery**: implement a service registration and discovery mechanism keyed by app, enabling dynamic addressing and routing.

### 2.3 Non-Intrusive Enhancement of Nginx

- **Sidecar pattern**: run sie-snest-gw as a sidecar service alongside Nginx, sharing the same pod.
- **Unix domain socket communication**: because the two containers share a network namespace, the sidecar and Nginx communicate over a Unix domain socket, which outperforms a conventional network socket.

### 2.4 Architecture

```
+-----------+          +-----------+
|           |          |           |
|  Client   +--------->+   Nginx   |
|           |          |           |
+-----------+          +-----+-----+
                             |
                         Unix DS
                             |
                       +-----+-----+
                       | sie-snest |
                       |    -gw    |
                       +-----+-----+
                             |
                       +-----+-----+
                       |  Backend  |
                       +-----------+
```

Existing design:

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_4.png]]

Enhanced design:

The key change is the new sie-snest-gw, attached to Nginx as a sidecar sharing the same pod. The two communicate over a Unix domain socket (localhost also works; both are very fast), so Nginx gains new capabilities without sacrificing performance.

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_5.png]]

## 3. Technical Implementation

### 3.1 Golang Implementation

- **High performance**: sie-snest-gw is written in Golang, ensuring high performance and a low memory footprint.
- **Mature gateway ecosystem**: Golang's rich ecosystem is leveraged to extend the gateway's functionality.

### 3.2 Kubernetes Resource Watching

- **k8s client-go SDK**: the official Kubernetes client-go SDK watches resource changes and updates the in-memory routing information in real time.
- **Dynamic updates**: app-level service registration information is updated as resources change, enabling dynamic routing.

### 3.3 High-Performance Proxy Library

- **Go standard library proxy**: the proxy is built on the Go standard library, providing a high-performance, highly available proxy.
- **Load balancing**: load-balancing strategies distribute traffic evenly and keep services highly available.

### 3.4 Implementation Steps

1. **Initialize the sie-snest-gw service**:
   - Write the main program in Golang and initialize the service.
   - Configure the Unix domain socket used to communicate with Nginx.
2. **Watch Kubernetes resources**:
   - Use the client-go SDK to watch service resource changes in Kubernetes.
   - Update the in-memory, app-level service registration information.
3. **Implement dynamic routing**:
   - Generate routing rules dynamically from the service registration information.
   - Use the Go standard library's proxy to handle client requests and forward them to the appropriate backend service.
4. **Communicate with Nginx**:
   - Configure Nginx to forward requests to sie-snest-gw over the Unix domain socket.
   - Handle requests in sie-snest-gw and return the responses to Nginx.
5. **Deploy and test**:
   - Deploy sie-snest-gw as an Nginx sidecar.
   - Run full functional and performance tests to verify stability and performance.

### 3.5 Resource Usage

- Image size: the sie-snest-gw image is about 17 MB.

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_6.png]]

- Memory usage: sie-snest-gw and Nginx together use about 15 MB.

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_7.png]]

### 3.6 Upgrade and Rollback

To stay compatible with the existing system, and to cover the case where a problem discovered after an upgrade cannot be fixed immediately, the gateway must support rolling back to the original setup at any time. The operation is trivial: only the `/api` proxy entry in `nginx.conf` needs to change. The three `proxy_pass` lines below show the alternatives; exactly one should be active at a time.

```
location ^~/api/ {
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    client_max_body_size 2048m;
    proxy_pass http://iidp-app:8060/;                    # existing style: proxy to the master node
    proxy_pass http://localhost:8060;                    # localhost: shared network namespace, stays in the kernel stack, never hits the wire
    proxy_pass http://unix:/socket/gw-iidp.socket:/api/; # Unix domain socket
}
```

### 3.7 Readiness and Liveness

```
// readiness and liveness probe
router.GET("/healthz", func(ctx *gin.Context) {
	ctx.JSON(http.StatusOK, gin.H{
		"status": "ok",
	})
})
```

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/healthz.jpg]]

As shown above, readiness and liveness probes are configured on the gateway to detect whether it is ready and alive. These checks are independent of Nginx: Nginx does not know the gateway exists, and the gateway does not know Nginx exists. Their only connection is the `/api` endpoint; Nginx does not care who implements it, and the gateway does not care who calls it, so the two are fully decoupled. A restart of either container, for whatever reason, never affects the other.

### 3.8 Circuit Breaking and Rate Limiting

As part of the Nginx gateway, sie-snest-gw must provide rate limiting and circuit breaking to protect the upstream engine services.

- Rate limiting

Uber's open-source `go.uber.org/ratelimit` is used. Adding the rate-limiting middleware to the handler chain is all that is needed:

```
var (
	r = ratelimit.New(500) // 500 operations per second
)

func RateLimit() gin.HandlerFunc {
	return func(ctx *gin.Context) {
		r.Take() // Take blocks (sleeps) until the next operation may proceed
		ctx.Next()
	}
}
```

- Circuit breaking

The widely used open-source hystrix port `github.com/afex/hystrix-go/hystrix` is used. The proxy call is wrapped with circuit-breaker protection, as shown below:

```
err = hystrix.Do(appNameTag+AtSeparator+ip, func() error {
	var herr error
	startTime := time.Now() // record the start time

	// create a reverse proxy
	proxy := httputil.NewSingleHostReverseProxy(targetUrl)

	// error handler
	proxy.ErrorHandler = func(rw http.ResponseWriter, req *http.Request, err error) {
		herr = err
		klog.Errorf("proxy to %v app=%v err=%v", key, appName, err.Error())
		ctx.String(http.StatusBadGateway, "Bad Gateway: %v", err)
	}

	ctx.Request.Host = targetUrl.Host
	ctx.Request.URL.Path = path
	proxy.ServeHTTP(ctx.Writer, ctx.Request)

	// compute and log the request latency
	elapsedTime := time.Since(startTime).Milliseconds()
	klog.Infof("proxy to %v app=%v cost=%vms", key, appName, elapsedTime)
	return herr
}, func(err error) error {
	// once the breaker is open, return the error directly to protect the proxied engine service
	klog.Errorf("hystrix %v app=%v err=%v", key, appName, err.Error())
	return err
})
if err != nil {
	pkg.Error(ctx, pkg.ErrInternalServerError, "proxy hystrix failed: "+err.Error())
	return
}
```

## 4. Summary

With the design above, the sie-snest-gw gateway provides app-level dynamic service registration and discovery without modifying any existing Nginx code, and improves overall performance through efficient Unix domain socket communication. The high-performance Golang sidecar runs alongside Nginx, keeping the system highly available and extensible.
## 5. Testing

### 5.1 Performance Testing

Overall, under low concurrency and low request volume, the Unix domain socket outperforms a network socket. Under high concurrency and high request volume the advantage shrinks: business-logic processing dominates the total time, so the saving in transport time becomes less visible.

Sending and receiving over UDS is very simple. The sender writes data directly into the receiver's receive queue without going through the kernel network protocol stack.

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_1.png]]

Latency test:

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_2.png]]

Throughput test:

[[http://iidp.chinasie.com:9999/iidpminio/sie-snest-gw/img_3.png]]

#### Unix domain socket results, request counts 10000 / 20000 / 40000

- ##### Concurrency 50

```
--------------------- 并发 50 ---------------------------------
122/lzb/test> ./bench -c 50 -n 10000
INFO[0000] total: 10000 concurrency: 50 requests per client: 200
INFO[0002] took 2133 ms for 10000 requests
INFO[0002] sent     requests    : 10000
INFO[0002] received requests    : 10000
INFO[0002] received requests_OK : 10000
INFO[0002] throughput  (TPS)    : 5688
INFO[0002] mean: 10474865 ns, median: 8584496 ns, max: 62494839 ns, min: 546029 ns, p99.9: 49550780 ns
INFO[0002] mean: 8 ms, median: 6 ms, max: 52 ms, min: 0 ms, p99.9: 29 ms
122/lzb/test> ./bench -c 50 -n 20000
INFO[0000] total: 20000 concurrency: 50 requests per client: 400
INFO[0008] took 8744 ms for 20000 requests
INFO[0008] sent     requests    : 20000
INFO[0008] received requests    : 20000
INFO[0008] received requests_OK : 20000
INFO[0008] throughput  (TPS)    : 3287
INFO[0008] mean: 21662669 ns, median: 9012562 ns, max: 393328734 ns, min: 706598 ns, p99.9: 357892903 ns
INFO[0008] mean: 11 ms, median: 8 ms, max: 293 ms, min: 0 ms, p99.9: 57 ms
122/lzb/test> ./bench -c 50 -n 40000
INFO[0000] total: 40000 concurrency: 50 requests per client: 800
INFO[0061] took 61471 ms for 40000 requests
INFO[0061] sent     requests    : 40000
INFO[0061] received requests    : 40000
INFO[0061] received requests_OK : 40000
INFO[0061] throughput  (TPS)    : 1650
INFO[0061] mean: 74883410 ns, median: 11363670 ns, max: 947822594 ns, min: 447573 ns, p99.9: 689353520 ns
INFO[0061] mean: 54 ms, median: 11 ms, max: 747 ms, min: 0 ms, p99.9: 589 ms
```

- ##### Concurrency 100

```
------------------------------ 并发 100 ---------------------------------
122/lzb/test> ./bench -c 100 -n 10000
INFO[0000] total: 10000 concurrency: 100 requests per client: 100
INFO[0001] took 1941 ms for 10000 requests
INFO[0001] sent     requests    : 10000
INFO[0001] received requests    : 10000
INFO[0001] received requests_OK : 10000
INFO[0001] throughput  (TPS)    : 6151
INFO[0001] mean: 18612570 ns, median: 15089722 ns, max: 99981825 ns, min: 640579 ns, p99.9: 77418796 ns
INFO[0001] mean: 18 ms, median: 15 ms, max: 99 ms, min: 0 ms, p99.9: 77 ms
122/lzb/test> ./bench -c 100 -n 20000
INFO[0000] total: 20000 concurrency: 100 requests per client: 200
INFO[0003] took 3460 ms for 20000 requests
INFO[0003] sent     requests    : 20000
INFO[0003] received requests    : 20000
INFO[0003] received requests_OK : 20000
INFO[0003] throughput  (TPS)    : 5780
INFO[0003] mean: 16859393 ns, median: 14338056 ns, max: 130512875 ns, min: 611710 ns, p99.9: 79118046 ns
INFO[0003] mean: 16 ms, median: 14 ms, max: 130 ms, min: 0 ms, p99.9: 79 ms
122/lzb/test> ./bench -c 100 -n 40000
INFO[0000] total: 40000 concurrency: 100 requests per client: 400
INFO[0058] took 58969 ms for 40000 requests
INFO[0058] sent     requests    : 40000
INFO[0058] received requests    : 40000
INFO[0058] received requests_OK : 40000
INFO[0058] throughput  (TPS)    : 1678
INFO[0058] mean: 140379894 ns, median: 22283178 ns, max: 1493640861 ns, min: 589891 ns, p99.9: 1090024161 ns
INFO[0058] mean: 140 ms, median: 22 ms, max: 1493 ms, min: 0 ms, p99.9: 1090 ms
```

#### localhost results, request counts 10000 / 20000 / 40000

- ##### Concurrency 50

```
122/lzb/test> ./bench -c 50 -n 10000
INFO[0000] total: 10000 concurrency: 50 requests per client: 200
INFO[0001] took 1995 ms for 10000 requests
INFO[0001] sent     requests    : 10000
INFO[0001] received requests    : 10000
INFO[0001] received requests_OK : 10000
INFO[0001] throughput  (TPS)    : 5012
INFO[0001] mean: 9809687 ns, median: 8410406 ns, max: 63636117 ns, min: 656349 ns, p99.9: 44334232 ns
INFO[0001] mean: 9 ms, median: 8 ms, max: 63 ms, min: 0 ms, p99.9: 44 ms
122/lzb/test> ./bench -c 50 -n 20000
INFO[0000] total: 20000 concurrency: 50 requests per client: 400
INFO[0007] took 7197 ms for 20000 requests
INFO[0007] sent     requests    : 20000
INFO[0007] received requests    : 20000
INFO[0007] received requests_OK : 20000
INFO[0007] throughput  (TPS)    : 2778
INFO[0007] mean: 17742685 ns, median: 12823960 ns, max: 229250773 ns, min: 951364 ns, p99.9: 143942836 ns
INFO[0007] mean: 17 ms, median: 12 ms, max: 229 ms, min: 0 ms, p99.9: 143 ms
122/lzb/test> ./bench -c 50 -n 40000
INFO[0000] total: 40000 concurrency: 50 requests per client: 800
INFO[0060] took 60706 ms for 40000 requests
INFO[0060] sent     requests    : 40000
INFO[0060] received requests    : 40000
INFO[0060] received requests_OK : 40000
INFO[0060] throughput  (TPS)    : 658
INFO[0060] mean: 72863090 ns, median: 13186213 ns, max: 627875408 ns, min: 777760 ns, p99.9: 466958052 ns
INFO[0060] mean: 72 ms, median: 13 ms, max: 627 ms, min: 0 ms, p99.9: 466 ms
```

- ##### Concurrency 100

```
-------------------- 并发为 100
122/lzb/test> ./bench -c 100 -n 10000
INFO[0000] total: 10000 concurrency: 100 requests per client: 100
INFO[0002] took 2446 ms for 10000 requests
INFO[0002] sent     requests    : 10000
INFO[0002] received requests    : 10000
INFO[0002] received requests_OK : 10000
INFO[0002] throughput  (TPS)    : 4088
INFO[0002] mean: 23438591 ns, median: 19675704 ns, max: 133427069 ns, min: 961182 ns, p99.9: 107706982 ns
INFO[0002] mean: 23 ms, median: 19 ms, max: 133 ms, min: 0 ms, p99.9: 107 ms
122/lzb/test> ./bench -c 100 -n 20000
INFO[0000] total: 20000 concurrency: 100 requests per client: 200
INFO[0005] took 5919 ms for 20000 requests
INFO[0005] sent     requests    : 20000
INFO[0005] received requests    : 20000
INFO[0005] received requests_OK : 20000
INFO[0005] throughput  (TPS)    : 3378
INFO[0005] mean: 29085037 ns, median: 23538610 ns, max: 313951227 ns, min: 981326 ns, p99.9: 191517186 ns
INFO[0005] mean: 29 ms, median: 23 ms, max: 313 ms, min: 0 ms, p99.9: 191 ms
122/lzb/test> ./bench -c 100 -n 40000
INFO[0000] total: 40000 concurrency: 100 requests per client: 400
INFO[0063] took 63539 ms for 40000 requests
INFO[0063] sent     requests    : 40000
INFO[0063] received requests    : 40000
INFO[0063] received requests_OK : 40000
INFO[0063] throughput  (TPS)    : 629
INFO[0063] mean: 157058698 ns, median: 31084441 ns, max: 1529706480 ns, min: 990788 ns, p99.9: 1178707764 ns
INFO[0063] mean: 157 ms, median: 31 ms, max: 1529 ms, min: 0 ms, p99.9: 1178 ms
```

### 5.2 radix Benchmark

Comparing a radix tree against a map: the results below show the radix tree outperforming the map on lookups (`Get` is roughly 25% faster), while the map remains about twice as fast on inserts. A map is backed by a hash table and must handle collisions, whereas a radix tree is a prefix tree; for word-like keys such as route prefixes, prefix-tree lookup is more efficient than hashing, which is what matters for a read-heavy routing table.

```shell
> go test -bench=BenchmarkInsert
BenchmarkInsert-16      4102198    378.5 ns/op
BenchmarkInsertMap-16   5822322    236.5 ns/op
PASS
ok   test   3.597s

> go test -bench=BenchmarkGet
BenchmarkGet-16        11889345    121.1 ns/op
BenchmarkGetMap-16      9763209    158.8 ns/op
PASS
ok   test   12.258s
```

A second insert run:

```shell
> go test -bench=BenchmarkInsert
BenchmarkInsert-16      1382823    784.6 ns/op
BenchmarkInsertMap-16   3293953    375.9 ns/op
PASS
ok   test   3.714s
```

### 5.3 Gateway Load Test

- ##### Concurrency 500, round 1 (via Nginx, port 32301), meta-test, through the sidecar

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| checkheath | 3330702 | 88 | 4 | 7112 | 171.9 | 0.00% | 5532.45187 | 1420.93 | 3463.19 | 263 |
| TOTAL | 3330702 | 88 | 4 | 7112 | 171.9 | 0.00% | 5532.45187 | 1420.93 | 3463.19 | 263 |

- ##### Concurrency 500, round 2 (via Nginx, port 32301), meta-test, through the sidecar

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| checkheath | 4468417 | 66 | 4 | 14490 | 230.99 | 0.00% | 7430.56858 | 1908.43 | 4651.36 | 263 |
| TOTAL | 4468417 | 66 | 4 | 14490 | 230.99 | 0.00% | 7430.56858 | 1908.43 | 4651.36 | 263 |

- ##### Concurrency 500 (via Nginx, port 32301), meta-dev comparison

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| checkheath | 338573 | 871 | 3 | 3182 | 384.53 | 0.00% | 563.11986 | 143.53 | 343.7 | 261 |
| TOTAL | 338573 | 871 | 3 | 3182 | 384.53 | 0.00% | 563.11986 | 143.53 | 343.7 | 261 |

- ##### Concurrency 500 (bypassing Nginx, port 30524), through the sidecar

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| checkheath | 7377730 | 39 | 3 | 1657 | 20.99 | 0.00% | 12294.84374 | 2605.45 | 7696.28 | 217 |
| TOTAL | 7377730 | 39 | 3 | 1657 | 20.99 | 0.00% | 12294.84374 | 2605.45 | 7696.28 | 217 |

- ##### Rate limit of 2000 TPS, through the sidecar

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| checkheath | 1198111 | 246 | 6 | 2224 | 34.96 | 0.00% | 1996.0732 | 423 | 1249.5 | 217 |
| TOTAL | 1198111 | 246 | 6 | 2224 | 34.96 | 0.00% | 1996.0732 | 423 | 1249.5 | 217 |

### 5.4 Business Workload Comparison

- ##### Concurrency 500, dim-mes basic-data query, meta-test, through the sidecar

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| dim-mes-基础数据查 | 139423 | 2114 | 1 | 9728 | 808.97 | 0.06% | 231.84229 | 2413.87 | 331.73 | 10661.6 |
| TOTAL | 139423 | 2114 | 1 | 9728 | 808.97 | 0.06% | 231.84229 | 2413.87 | 331.73 | 10661.6 |

- ##### Concurrency 500, dim-mes basic-data query, meta-dev

| Label | # Samples | Average | Min | Max | Std. Dev. | Error % | Throughput | Received KB/sec | Sent KB/sec | Avg. Bytes |
|---|---|---|---|---|---|---|---|---|---|---|
| dim-mes-基础数据查 | 94646 | 3174 | 62 | 17989 | 1970.89 | 0.05% | 154.1855 | 1607.86 | 220.74 | 10678.4 |
| TOTAL | 94646 | 3174 | 62 | 17989 | 1970.89 | 0.05% | 154.1855 | 1607.86 | 220.74 | 10678.4 |