Overview
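Circuit breaking could already be implemented before Istio; the best-known option was the Hystrix library. (Netflix developed it, but it no longer receives updates and is in maintenance mode only.)
However, Hystrix requires adding the circuit-breaking calls to each microservice's own code, and it supports only JVM-based applications, so it cannot be applied to microservices written in Go, Python, and so on.
Istio controls all network traffic through a proxy (Envoy) that runs outside the microservice, and that proxy can also act as a circuit breaker. The advantage is that it can be applied to any microservice without changing its code.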
Deploying the demo applications
apiVersion: apps/v1
kind: Deployment
metadata:
  name: position-simulator
spec:
  selector:
    matchLabels:
      app: position-simulator
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: position-simulator
    spec:
      containers:
      - name: position-simulator
        image: richardchesterwood/istio-fleetman-position-simulator:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: position-tracker
spec:
  selector:
    matchLabels:
      app: position-tracker
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: position-tracker
    spec:
      containers:
      - name: position-tracker
        image: richardchesterwood/istio-fleetman-position-tracker:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  selector:
    matchLabels:
      app: api-gateway
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: richardchesterwood/istio-fleetman-api-gateway:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: webapp
        version: original
    spec:
      containers:
      - name: webapp
        image: richardchesterwood/istio-fleetman-webapp-angular:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vehicle-telemetry
spec:
  selector:
    matchLabels:
      app: vehicle-telemetry
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: vehicle-telemetry
    spec:
      containers:
      - name: vehicle-telemtry
        image: richardchesterwood/istio-fleetman-vehicle-telemetry:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staff-service
spec:
  selector:
    matchLabels:
      app: staff-service
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: staff-service
        version: safe
    spec:
      containers:
      - name: staff-service
        image: richardchesterwood/istio-fleetman-staff-service:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staff-service-risky-version
spec:
  selector:
    matchLabels:
      app: staff-service
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: staff-service
        version: risky
    spec:
      containers:
      - name: staff-service
        image: richardchesterwood/istio-fleetman-staff-service:6-bad # this image contains the fault and is deployed as the risky version
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-webapp
spec:
  # This defines which pods are going to be represented by this Service
  # The service becomes a network endpoint for either other services
  # or maybe external users to connect to (eg browser)
  selector:
    app: webapp
  ports:
  - name: http
    port: 80
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-position-tracker
spec:
  # This defines which pods are going to be represented by this Service
  # The service becomes a network endpoint for either other services
  # or maybe external users to connect to (eg browser)
  selector:
    app: position-tracker
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-api-gateway
spec:
  selector:
    app: api-gateway
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-vehicle-telemetry
spec:
  selector:
    app: vehicle-telemetry
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-staff-service
spec:
  selector:
    app: staff-service
  ports:
  - name: http
    port: 8080
  type: ClusterIP
Gateway and VirtualService configuration
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ingress-gateway-configuration
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "kiali-mng-dev.saraminhr.co.kr" # Domain name of the external website
---
# All traffic routed to the fleetman-webapp service
# No DestinationRule needed as we aren't doing any subsets, load balancing or outlier detection.
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: fleetman-webapp
  namespace: default
spec:
  hosts: # which incoming host are we applying the proxy rules to???
  - "kiali-mng-dev.saraminhr.co.kr"
  gateways:
  - ingress-gateway-configuration
  http:
  - route:
    - destination:
        host: fleetman-webapp
- Verification
With the faulty risky version deployed alongside the safe one, the browser intermittently shows a 500 error.
- Checking with curl
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001459
connect: 0.002182
appconnect: 0.000000
pretransfer: 0.002226
redirect: 0.000000
starttransfer: 0.019133
--------------------------------------
total: 0.019139
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001552
connect: 0.002251
appconnect: 0.000000
pretransfer: 0.002260
redirect: 0.000000
starttransfer: 0.019725
--------------------------------------
total: 0.019842
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001496
connect: 0.002103
appconnect: 0.000000
pretransfer: 0.002477
redirect: 0.000000
starttransfer: 0.022399
--------------------------------------
total: 0.022466
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}namelookup: 0.001412
connect: 0.002050
appconnect: 0.000000
pretransfer: 0.002138
redirect: 0.000000
starttransfer: 1.285805
--------------------------------------
total: 1.285837
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"timestamp":"2023-11-07T07:49:21.555+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}namelookup: 0.001339
connect: 0.001931
appconnect: 0.000000
pretransfer: 0.001974
redirect: 0.000000
starttransfer: 5.003001
--------------------------------------
total: 5.003088
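- Requests fail intermittently, and there also appears to be added latency.
- Jaeger also shows delays of more than 4 seconds in other services.
- In Kiali, the single risky version appears to be causing latency across the whole system.
- Circuit breaker configuration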
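The `curl.txt` file passed to `-w @curl.txt` in the commands above is not shown in the post. A minimal sketch of what it might contain, assuming it simply maps each label printed in the output to curl's standard write-out timing variables (the exact contents of the author's file are an assumption):

```shell
# Recreate a write-out format file that matches the labels seen in the output above.
# The quoted heredoc keeps %{...} and \n literal so that curl interprets them.
cat > curl.txt <<'EOF'
namelookup: %{time_namelookup}\n
connect: %{time_connect}\n
appconnect: %{time_appconnect}\n
pretransfer: %{time_pretransfer}\n
redirect: %{time_redirect}\n
starttransfer: %{time_starttransfer}\n
--------------------------------------\n
total: %{time_total}\n
EOF

# Usage (same request as in the transcript):
# curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
```

`starttransfer` is the value to watch here: it is the time until the first response byte arrives, which is where the 1.2 s and 5 s delays show up.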
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: circuit-breaker-for-the-entire-default-namespace
spec:
  host: "fleetman-staff-service.default.svc.cluster.local"
  trafficPolicy:
    outlierDetection: # thresholds that trip the circuit breaker
      consecutive5xxErrors: 2
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
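[consecutive5xxErrors]
The number of consecutive 5xx errors a host must return before the circuit breaker trips and the host is ejected.
Here two consecutive errors trip the circuit breaker (the count is lowered for the test environment).
[interval]
The time between outlier-detection analysis sweeps, here 10 seconds.
It is not a window within which the errors must occur; the errors only have to be consecutive.
[baseEjectionTime]
How long an ejected host is kept out of the load-balancing pool.
In other words, how long the circuit breaker stays open for that host. The actual ejection time is baseEjectionTime multiplied by the number of times the host has been ejected, so a host that keeps failing stays out longer.
[maxEjectionPercent]
The maximum percentage of hosts in the load-balancing pool that may be ejected.
This setup has two pods, so at 100% both of them can be ejected.
At 10% it looks as if nothing could be ejected (one pod is already 50%),
but Envoy ejects at least one host regardless of this value,
so the failing host is still ejected once the circuit breaker trips.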
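The arithmetic behind maxEjectionPercent and baseEjectionTime can be sketched in a few lines of shell. This is a simplified model rather than Envoy's actual algorithm: the helper names are hypothetical, rounding down is an assumption (Envoy's exact rule can differ between versions), and the upper cap on ejection time is not modeled.

```shell
# Hypothetical helper: roughly how many hosts may be ejected at once.
# Assumption: the percentage is rounded down, with a floor of one host,
# since Envoy ejects at least one host regardless of the percentage.
max_ejectable() {
  hosts=$1
  percent=$2
  n=$(( hosts * percent / 100 ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}

# Hypothetical helper: ejection duration in seconds for the Nth ejection,
# modeled as baseEjectionTime multiplied by the number of ejections so far.
ejection_seconds() {
  base=$1
  times_ejected=$2
  echo $(( base * times_ejected ))
}

max_ejectable 2 100    # prints 2: both pods may be ejected
max_ejectable 2 10     # prints 1: still one pod, despite 10% of 2 being less than 1
ejection_seconds 30 2  # prints 60: second ejection with baseEjectionTime 30s
```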
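- Verification
While the circuit breaker is configured for a service, Kiali marks it with a lightning-bolt icon.
- Verifying the behavior with curl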
while true; do curl http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck; echo; sleep 0.5; done
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:50.949+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:53.483+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
^C
After the first two errors the circuit breaker tripped, and no further errors occurred.
- In the web browser, too, the photos now load without delay.
- To apply a circuit breaker to every service, there is a global configuration.
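Rather than reading the loop output by eye, the failures can be counted. The sketch below assumes one JSON response body per line, as the loop above prints, and treats any line containing `"status":500` as a failure; `count_failures` is a hypothetical helper name.

```shell
# Hypothetical helper: read response bodies from stdin, one per line,
# and print "<failed> <total>". Blank lines are ignored.
count_failures() {
  awk '/"status":500/ { failed++ } NF { total++ } END { printf "%d %d\n", failed, total }'
}

# Example with three canned responses (two good, one failed): prints "1 3".
printf '%s\n' \
  '{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}' \
  '{"status":500,"error":"Internal Server Error"}' \
  '{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}' \
  | count_failures
```

Against the live service, a bounded version of the loop could be piped in: `for i in $(seq 1 30); do curl -s http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck; echo; sleep 0.5; done | count_failures`.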
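The post does not show the global configuration. One way it might look is a DestinationRule placed in Istio's root namespace with a wildcard host. This is a sketch under assumptions: `istio-system` is assumed to be the root namespace, the resource name and file name are hypothetical, and the outlierDetection values are copied from the per-service rule above. A more specific DestinationRule for a given service would normally take precedence over a wildcard rule like this, so it is worth verifying the result in your own mesh.

```shell
# Write a mesh-wide DestinationRule to a file (sketch; not from the original post).
cat > global-circuit-breaker.yaml <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: circuit-breaker-for-the-entire-mesh   # hypothetical name
  namespace: istio-system                     # assumption: istio-system is the root namespace
spec:
  host: "*.local"                             # assumption: wildcard covering in-cluster services
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 2
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
EOF

# Apply it once the file looks right:
# kubectl apply -f global-circuit-breaker.yaml
```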