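Circuit breaking is not a new problem, and solutions already existed; the best known was the Hystrix library. (Netflix developed it, but it no longer receives updates; only maintenance of the existing features is provided.)
However, Hystrix requires the circuit-breaking logic to be written into each microservice's own code, which is cumbersome, and it supports only JVM-based applications, so it cannot be applied to microservices written in Go, Python, and so on.
Istio controls all network traffic through a proxy (Envoy) that sits outside the microservice, and circuit breaking can be applied there as well. In other words, it can be applied to any microservice without changing the microservice's code.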
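To make the contrast concrete, the following is a minimal sketch of what an in-application circuit breaker roughly involves. It is written in Python purely for illustration; the class name `CircuitBreaker` and its parameters `max_failures`, `reset_after`, and `clock` are hypothetical and are not Hystrix's API. Every remote call would have to be wrapped like this inside each service, which is the burden the proxy-based approach avoids.

```python
import time


class CircuitBreaker:
    """Tiny in-process circuit breaker (illustrative only, not Hystrix's API)."""

    def __init__(self, max_failures=2, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open
        self.clock = clock                # injectable clock, handy for tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, fallback=None):
        # While the circuit is open, skip the remote call entirely.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback
            # The open period has passed: close the circuit and try again.
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0  # a success resets the consecutive-failure count
        return result
```

A caller would write something like `breaker.call(fetch_driver, "City Truck", fallback=None)`, where `fetch_driver` stands for whatever function performs the remote request.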
Deploying the Demo Applications
apiVersion: apps/v1
kind: Deployment
metadata:
  name: position-simulator
spec:
  selector:
    matchLabels:
      app: position-simulator
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: position-simulator
    spec:
      containers:
      - name: position-simulator
        image: richardchesterwood/istio-fleetman-position-simulator:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: position-tracker
spec:
  selector:
    matchLabels:
      app: position-tracker
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: position-tracker
    spec:
      containers:
      - name: position-tracker
        image: richardchesterwood/istio-fleetman-position-tracker:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
spec:
  selector:
    matchLabels:
      app: api-gateway
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
      - name: api-gateway
        image: richardchesterwood/istio-fleetman-api-gateway:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        command: ["java","-Xmx50m","-jar","webapp.jar"]
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: webapp
        version: original
    spec:
      containers:
      - name: webapp
        image: richardchesterwood/istio-fleetman-webapp-angular:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vehicle-telemetry
spec:
  selector:
    matchLabels:
      app: vehicle-telemetry
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: vehicle-telemetry
    spec:
      containers:
      - name: vehicle-telemtry
        image: richardchesterwood/istio-fleetman-vehicle-telemetry:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staff-service
spec:
  selector:
    matchLabels:
      app: staff-service
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: staff-service
        version: safe
    spec:
      containers:
      - name: staff-service
        image: richardchesterwood/istio-fleetman-staff-service:6
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: staff-service-risky-version
spec:
  selector:
    matchLabels:
      app: staff-service
  replicas: 1
  template: # template for the pods
    metadata:
      labels:
        app: staff-service
        version: risky
    spec:
      containers:
      - name: staff-service
        image: richardchesterwood/istio-fleetman-staff-service:6-bad # this image contains the fault and is deployed as the risky version
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: production-microservice
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-webapp
spec:
  # This defines which pods are going to be represented by this Service
  # The service becomes a network endpoint for either other services
  # or maybe external users to connect to (eg browser)
  selector:
    app: webapp
  ports:
  - name: http
    port: 80
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-position-tracker
spec:
  # This defines which pods are going to be represented by this Service
  # The service becomes a network endpoint for either other services
  # or maybe external users to connect to (eg browser)
  selector:
    app: position-tracker
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-api-gateway
spec:
  selector:
    app: api-gateway
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-vehicle-telemetry
spec:
  selector:
    app: vehicle-telemetry
  ports:
  - name: http
    port: 8080
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: fleetman-staff-service
spec:
  selector:
    app: staff-service
  ports:
  - name: http
    port: 8080
  type: ClusterIP
Gateway and VirtualService Configuration
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ingress-gateway-configuration
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "kiali-mng-dev.saraminhr.co.kr" # Domain name of the external website
---
# All traffic routed to the fleetman-webapp service
# No DestinationRule needed as we aren't doing any subsets, load balancing or outlier detection.
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: fleetman-webapp
  namespace: default
spec:
  hosts: # which incoming host are we applying the proxy rules to???
  - "kiali-mng-dev.saraminhr.co.kr"
  gateways:
  - ingress-gateway-configuration
  http:
  - route:
    - destination:
        host: fleetman-webapp
- Verification
After deploying the faulty risky version alongside the safe one, checking in the browser shows a 500 error every so often.
- Checking with curl
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001459
connect: 0.002182
appconnect: 0.000000
pretransfer: 0.002226
redirect: 0.000000
starttransfer: 0.019133
total: 0.019139
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001552
connect: 0.002251
appconnect: 0.000000
pretransfer: 0.002260
redirect: 0.000000
starttransfer: 0.019725
total: 0.019842
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup: 0.001496
connect: 0.002103
appconnect: 0.000000
pretransfer: 0.002477
redirect: 0.000000
starttransfer: 0.022399
total: 0.022466
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}namelookup: 0.001412
connect: 0.002050
appconnect: 0.000000
pretransfer: 0.002138
redirect: 0.000000
starttransfer: 1.285805
total: 1.285837
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"timestamp":"2023-11-07T07:49:21.555+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}namelookup: 0.001339
connect: 0.001931
appconnect: 0.000000
pretransfer: 0.001974
redirect: 0.000000
starttransfer: 5.003001
total: 5.003088
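- Requests fail every so often, and there also appears to be added latency.
- Jaeger likewise shows delays of more than 4 seconds in other services.
- In Kiali, the single risky version appears to cause delays across the whole application.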
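The five curl runs above can be summarized with a few lines of Python. This is only a sketch: the helper name `summarize` and the 1-second `slow_threshold` are illustrative choices, and the status code 200 for the first four requests is an assumption, since curl printed only the response bodies (the fifth body reports status 500 itself). The timings are the `total` values from the output above.

```python
def summarize(samples, slow_threshold=1.0):
    """Summarize (http_status, total_seconds) pairs from repeated requests."""
    failures = sum(1 for status, _ in samples if status >= 500)
    slow = sum(1 for _, seconds in samples if seconds >= slow_threshold)
    worst = max((seconds for _, seconds in samples), default=0.0)
    return {
        "requests": len(samples),
        "failures": failures,
        "slow": slow,
        "worst_seconds": worst,
    }


# Values taken from the curl output above; 200 is assumed for the successful bodies.
samples = [
    (200, 0.019139),
    (200, 0.019842),
    (200, 0.022466),
    (200, 1.285837),  # placeholder.png response, noticeably slower
    (500, 5.003088),  # Internal Server Error after about 5 seconds
]

print(summarize(samples))
```

With these inputs the summary reports 5 requests, 1 failure, and 2 slow responses, with the worst taking about 5 seconds.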
- Circuit Breaker configuration
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: circuit-breaker-for-the-entire-default-namespace
spec:
  host: "fleetman-staff-service.default.svc.cluster.local"
  trafficPolicy:
    outlierDetection: # criteria that trigger the circuit breaker
      consecutive5xxErrors: 2
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
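- consecutive5xxErrors: how many consecutive errors must occur before the circuit breaker trips. Here it trips after 2 consecutive errors (the count is lowered because this is a test environment).
- interval: the time between outlier-detection analysis sweeps (the Istio reference describes it as the time interval between ejection sweep analysis). Here the hosts are evaluated every 10 seconds.
- baseEjectionTime: how long an ejected host is kept out of the load-balancer pool, that is, how long the circuit breaker stays applied to that host. Here it is 30 seconds.
- maxEjectionPercent: the maximum share of hosts that may be cut off from traffic, that is, up to what percentage can be ejected. This setup has 2 pods, so at 100% both can be ejected. At 10% ejection looks impossible (one pod is already 50%), but when the circuit breaker trips and no host fits within the 10% limit, Envoy still forcibly ejects that host.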
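The settings described above can be modeled in a few lines of Python. This is a simplified sketch of the behavior as this post describes it, not Envoy's actual implementation: the names `HostState`, `record_response`, `ejection_seconds`, and `max_ejectable` are hypothetical, the "at least one host" rule follows the description above, and the scaling of the ejection time by the number of ejections follows the Istio reference for `baseEjectionTime`.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    consecutive_5xx: int = 0
    times_ejected: int = 0
    ejected: bool = False


def record_response(host, status, threshold=2):
    """Update one host's counter; return True if this response triggers ejection."""
    if 500 <= status <= 599:
        host.consecutive_5xx += 1
    else:
        host.consecutive_5xx = 0  # any non-5xx response resets the streak
    if host.consecutive_5xx >= threshold and not host.ejected:
        host.ejected = True
        host.times_ejected += 1
        host.consecutive_5xx = 0
        return True
    return False


def ejection_seconds(host, base_ejection_time=30):
    """Ejection duration grows with the number of times the host was ejected."""
    return base_ejection_time * host.times_ejected


def max_ejectable(total_hosts, max_ejection_percent):
    """Hosts that may be ejected at once; at least one, per the description above."""
    if total_hosts <= 0:
        return 0
    return max(1, total_hosts * max_ejection_percent // 100)


risky = HostState()
for status in (502, 200, 502, 502):
    tripped = record_response(risky, status)

print(tripped, ejection_seconds(risky), max_ejectable(2, 100), max_ejectable(2, 10))
```

In this run the lone 502 followed by a 200 does not trip the breaker; only the two back-to-back 502 responses do, which matches `consecutive5xxErrors: 2`.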
- Verification
While the circuit breaker is active, it is shown with a lightning-bolt icon.
- Checking the behavior with curl
while true; do curl http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck; echo; sleep 0.5; done
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:50.949+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:53.483+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
Two errors occur at the start, the circuit breaker trips, and from then on no further errors appear.
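The same observation can be checked programmatically from the loop output. The sketch below assumes each output line is one JSON body, as in the run above; the helper names `classify` and `requests_after_last_error` are illustrative, and a body without a `status` field is treated as a success.

```python
import json


def classify(lines):
    """Label each JSON response body as "ok" or "error"."""
    outcomes = []
    for line in lines:
        body = json.loads(line)
        # Error bodies carry a numeric "status" field; successful ones do not.
        outcomes.append("error" if body.get("status", 200) >= 500 else "ok")
    return outcomes


def requests_after_last_error(outcomes):
    """Count how many responses followed the most recent error."""
    if "error" not in outcomes:
        return len(outcomes)
    last_error = len(outcomes) - 1 - outcomes[::-1].index("error")
    return len(outcomes) - 1 - last_error


ok_body = '{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}'
error_body = '{"status":500,"error":"Internal Server Error"}'  # shortened error body

outcomes = classify([ok_body, error_body, ok_body, error_body, ok_body, ok_body, ok_body])
print(outcomes.count("error"), requests_after_last_error(outcomes))
```

A long run of successes after the last error is what one would expect to see once the risky pod has been ejected.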
- In the web browser as well, the photos load correctly and without delay.
- To apply the circuit breaker to every service, a global setting is available.