
Overview

Circuit breaking was already possible before Istio, most notably with Hystrix, a library developed by Netflix. (Hystrix is no longer actively developed; only maintenance of the existing features is provided.)

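However, Hystrix has to be wired into the code of each microservice (by wrapping calls in its circuit-breaker functions), which is cumbersome. It also supports only JVM-based applications, so it cannot be applied to microservices written in Go, Python, and so on.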

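Istio controls all network traffic through a proxy (Envoy) that runs outside the microservice, and that proxy can also act as a circuit breaker. In other words, circuit breaking can be applied to any microservice, whatever its language, without changing its code.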

  1. Deploy the demo applications

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: position-simulator
     spec:
       selector:
         matchLabels:
           app: position-simulator
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: position-simulator
         spec:
           containers:
           - name: position-simulator
             image: richardchesterwood/istio-fleetman-position-simulator:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             command: ["java","-Xmx50m","-jar","webapp.jar"]
             imagePullPolicy: Always
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: position-tracker
     spec:
       selector:
         matchLabels:
           app: position-tracker
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: position-tracker
         spec:
           containers:
           - name: position-tracker
             image: richardchesterwood/istio-fleetman-position-tracker:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             command: ["java","-Xmx50m","-jar","webapp.jar"]
             imagePullPolicy: Always
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: api-gateway
     spec:
       selector:
         matchLabels:
           app: api-gateway
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: api-gateway
         spec:
           containers:
           - name: api-gateway
             image: richardchesterwood/istio-fleetman-api-gateway:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             command: ["java","-Xmx50m","-jar","webapp.jar"]
             imagePullPolicy: Always
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: webapp
     spec:
       selector:
         matchLabels:
           app: webapp
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: webapp
             version: original
         spec:
           containers:
           - name: webapp
             image: richardchesterwood/istio-fleetman-webapp-angular:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             imagePullPolicy: Always
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: vehicle-telemetry
     spec:
       selector:
         matchLabels:
           app: vehicle-telemetry
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: vehicle-telemetry
         spec:
           containers:
           - name: vehicle-telemtry
             image: richardchesterwood/istio-fleetman-vehicle-telemetry:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             imagePullPolicy: Always
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: staff-service
     spec:
       selector:
         matchLabels:
           app: staff-service
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: staff-service
             version: safe
         spec:
           containers:
           - name: staff-service
             image: richardchesterwood/istio-fleetman-staff-service:6
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             imagePullPolicy: Always
             ports:
             - containerPort: 8080
     ---
     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: staff-service-risky-version
     spec:
       selector:
         matchLabels:
           app: staff-service
       replicas: 1
       template: # template for the pods
         metadata:
           labels:
             app: staff-service
             version: risky
         spec:
           containers:
           - name: staff-service
             image: richardchesterwood/istio-fleetman-staff-service:6-bad    # This image contains the fault; it is deployed as the "risky" version.
             env:
             - name: SPRING_PROFILES_ACTIVE
               value: production-microservice
             imagePullPolicy: Always
             ports:
             - containerPort: 8080
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: fleetman-webapp
     spec:
       # This defines which pods are going to be represented by this Service
       # The service becomes a network endpoint for either other services
       # or maybe external users to connect to (eg browser)
       selector:
         app: webapp
       ports:
         - name: http
           port: 80
       type: ClusterIP
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: fleetman-position-tracker
     spec:
       # This defines which pods are going to be represented by this Service
       # The service becomes a network endpoint for either other services
       # or maybe external users to connect to (eg browser)
       selector:
         app: position-tracker
       ports:
         - name: http
           port: 8080
       type: ClusterIP
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: fleetman-api-gateway
     spec:
       selector:
         app: api-gateway
       ports:
         - name: http
           port: 8080
       type: ClusterIP
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: fleetman-vehicle-telemetry
     spec:
       selector:
         app: vehicle-telemetry
       ports:
         - name: http
           port: 8080
       type: ClusterIP
     ---
     apiVersion: v1
     kind: Service
     metadata:
       name: fleetman-staff-service
     spec:
       selector:
         app: staff-service
       ports:
         - name: http
           port: 8080
       type: ClusterIP
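
The post does not show how the manifest above was applied. As a rough sketch, assuming it is saved to a file (the name `fleetman-demo.yaml` is hypothetical) and that automatic sidecar injection is wanted in the `default` namespace, the deployment could look like this:

```shell
# deploy_demo MANIFEST [NAMESPACE]
# Assumption: the namespace is not yet labeled for Istio sidecar injection.
deploy_demo() {
  local manifest="$1"
  local namespace="${2:-default}"
  # Ask Istio to inject the Envoy sidecar into pods created in this namespace.
  kubectl label namespace "$namespace" istio-injection=enabled --overwrite &&
    # Create the Deployments and Services from the manifest.
    kubectl apply -n "$namespace" -f "$manifest" &&
    # List the pods; with the sidecar injected each pod should report 2/2 containers.
    kubectl get pods -n "$namespace"
}

# Usage: deploy_demo fleetman-demo.yaml
```

The label only affects pods created after it is set, which is why it comes before `kubectl apply`.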

  2. Configure the Gateway and VirtualService

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ingress-gateway-configuration
spec:
  selector:
    istio: ingressgateway # use Istio default gateway implementation
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "kiali-mng-dev.saraminhr.co.kr"   # Domain name of the external website
---
# All traffic routed to the fleetman-webapp service
# No DestinationRule needed as we aren't doing any subsets, load balancing or outlier detection.
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: fleetman-webapp
  namespace: default
spec:
  hosts:      # which incoming host are we applying the proxy rules to???
    - "kiali-mng-dev.saraminhr.co.kr"
  gateways:
    - ingress-gateway-configuration
  http:
    - route:
      - destination:
          host: fleetman-webapp
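
To check that the Gateway and VirtualService route traffic before DNS is involved, one option is to call the ingress gateway directly with the expected Host header. This is only a sketch: it assumes the default `istio-ingressgateway` Service in `istio-system` is exposed through a LoadBalancer IP, which may not match the environment used in this post.

```shell
# Print the external IP of the default Istio ingress gateway
# (assumption: a LoadBalancer Service named istio-ingressgateway).
ingress_ip() {
  kubectl get svc istio-ingressgateway -n istio-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# check_webapp HOSTNAME
# Request "/" through the gateway with the given Host header and
# print only the HTTP status code.
check_webapp() {
  curl -s -o /dev/null -w '%{http_code}\n' -H "Host: $1" "http://$(ingress_ip)/"
}

# Usage: check_webapp kiali-mng-dev.saraminhr.co.kr
```

A 200 here suggests the host in the Gateway and the host in the VirtualService line up; a 404 usually points to a mismatch between them.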

  3. Verify

With the faulty risky version deployed alongside the safe one, the browser shows an occasional 500 error.

  4. Check with curl
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup:    0.001459
connect:       0.002182
appconnect:    0.000000
pretransfer:   0.002226
redirect:      0.000000
starttransfer: 0.019133
--------------------------------------
total:         0.019139
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup:    0.001552
connect:       0.002251
appconnect:    0.000000
pretransfer:   0.002260
redirect:      0.000000
starttransfer: 0.019725
--------------------------------------
total:         0.019842
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}namelookup:    0.001496
connect:       0.002103
appconnect:    0.000000
pretransfer:   0.002477
redirect:      0.000000
starttransfer: 0.022399
--------------------------------------
total:         0.022466
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}namelookup:    0.001412
connect:       0.002050
appconnect:    0.000000
pretransfer:   0.002138
redirect:      0.000000
starttransfer: 1.285805
--------------------------------------
total:         1.285837
[SARAMIN] root@sri-mng-kube-dev1:/usr/local/src/istio
04:49 PM
root # curl -w @curl.txt http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
{"timestamp":"2023-11-07T07:49:21.555+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}namelookup:    0.001339
connect:       0.001931
appconnect:    0.000000
pretransfer:   0.001974
redirect:      0.000000
starttransfer: 5.003001
--------------------------------------
total:         5.003088

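  • Requests fail intermittently and some are noticeably slow: one response took about 1.3 s and the failed request about 5 s, against roughly 0.02 s normally.
  • Jaeger also shows delays of more than 4 seconds in other services.
  • In Kiali, the single risky version appears to be slowing down the whole system.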
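
The `curl.txt` file passed to `-w @curl.txt` is not shown in the post. The sketch below writes a format file that would produce the same labels as the output above; the exact layout is a guess, but the `%{time_...}` names are standard curl write-out variables.

```shell
# write_curl_format [FILE]
# Hypothetical reconstruction of curl.txt, not the author's actual file.
# curl drops the real line breaks when it reads a -w @file, so each line
# ends with a literal \n escape to get a line break in the output.
write_curl_format() {
  cat > "${1:-curl.txt}" <<'EOF'
namelookup:    %{time_namelookup}\n
connect:       %{time_connect}\n
appconnect:    %{time_appconnect}\n
pretransfer:   %{time_pretransfer}\n
redirect:      %{time_redirect}\n
starttransfer: %{time_starttransfer}\n
--------------------------------------\n
total:         %{time_total}\n
EOF
}

# Usage: write_curl_format curl.txt
```

Because the format does not begin with `\n`, the first label is printed directly after the response body, which matches the fused `}namelookup:` seen in the transcript.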
  5. Configure the circuit breaker
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: circuit-breaker-for-the-entire-default-namespace
spec:
  host: "fleetman-staff-service.default.svc.cluster.local"
  trafficPolicy:
    outlierDetection: # Criteria that trip the circuit breaker
      consecutive5xxErrors: 2
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100

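[consecutive5xxErrors]
The number of consecutive 5xx errors a host must return before the circuit breaker trips and the host is ejected.
Here the circuit breaker trips after 2 consecutive errors (the count is kept low because this is a test environment).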
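
As a simplified model of this rule (not Envoy's actual implementation), the function below walks through the status codes returned by a single upstream host and reports at which response the streak reaches the threshold. The streak is tracked per host, so successful responses from a different pod in between do not reset it.

```shell
# first_ejection THRESHOLD STATUS...
# Prints the 1-based position of the response that trips the breaker,
# or "none" if the streak never reaches the threshold.
first_ejection() {
  local threshold="$1"
  shift
  local streak=0 position=0 status
  for status in "$@"; do
    position=$((position + 1))
    if [ "$status" -ge 500 ]; then
      streak=$((streak + 1))   # another 5xx in a row
    else
      streak=0                 # any non-5xx response resets the streak
    fi
    if [ "$streak" -ge "$threshold" ]; then
      echo "$position"
      return 0
    fi
  done
  echo "none"
}

# Usage: first_ejection 2 200 502 502
```

With a threshold of 2, a host that answers 200, 502, 502 is ejected on its third response, while 502, 200, 502 never trips the breaker.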

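[interval]
The time between ejection sweep analyses, that is, how often Envoy re-evaluates the hosts (here every 10 seconds).
It is not a window within which the errors have to occur: consecutive5xxErrors counts back-to-back 5xx responses from a host, however far apart in time they are.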

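[baseEjectionTime]
How long is an ejected host kept out of the load-balancing pool?
This is the minimum ejection duration. A host stays ejected for baseEjectionTime multiplied by the number of times it has been ejected, so a host that keeps failing is kept out longer each time.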
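
The arithmetic can be sketched as follows. This follows the rule as described in the Istio documentation (base time multiplied by the ejection count) and ignores any upper limit the proxy may additionally apply.

```shell
# ejection_seconds BASE_SECONDS EJECTION_COUNT
# Prints how long a host stays out of the pool on its Nth ejection.
ejection_seconds() {
  echo $(( $1 * $2 ))
}

# Usage: ejection_seconds 30 2
```

With baseEjectionTime set to 30s as in this post, the first ejection lasts 30 seconds, the second 60 seconds, and the third 90 seconds.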

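[maxEjectionPercent]
The maximum percentage of hosts in the load-balancing pool that may be ejected.
The current setup has 2 pods, so at 100% both of them can be ejected.
At 10% it might look as if no host could be ejected (a single pod is already 50%), but Envoy ejects at least one host regardless of the value, so when the circuit breaker trips the failing host is still ejected.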
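
A small calculation makes the "at least one host" behavior concrete. This is a simplified model that rounds the percentage down and then enforces a minimum of one; the exact rounding inside Envoy is an assumption here.

```shell
# max_ejectable HOST_COUNT MAX_EJECTION_PERCENT
# Prints how many hosts may be ejected at the same time.
max_ejectable() {
  local hosts="$1" percent="$2"
  local allowed=$(( hosts * percent / 100 ))   # integer division rounds down
  if [ "$allowed" -lt 1 ]; then
    allowed=1                                  # at least one host can always be ejected
  fi
  echo "$allowed"
}

# Usage: max_ejectable 2 10
```

For the 2 pods in this demo the result is 1 at 10% and 2 at 100%, so the risky pod can be ejected in either case, but only 100% would allow both pods to be removed at once.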

  6. Verify

While the circuit breaker is in effect, Kiali marks the service with a lightning-bolt icon.

  • Check the behavior with curl
while true; do curl http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck; echo; sleep 0.5; done
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/placeholder.png"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:50.949+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"timestamp":"2023-11-07T08:39:53.483+0000","status":500,"error":"Internal Server Error","message":"status 502 reading RemoteStaffMicroserviceCalls#getDriverFor(String)","path":"//vehicles/driver/City%20Truck"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
{"name":"Pam Parry","photo":"https://rac-istio-course-images.s3.amazonaws.com/1.jpg"}
^C

The first two errors trip the circuit breaker, and after that no more errors occur.
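
Rather than reading the output by eye, the errors can be counted. The helper below is a minimal sketch that counts response lines containing `"status":500`, which is how the failures appear in the output above.

```shell
# count_errors
# Reads response bodies (one per line) on stdin and prints how many
# of them are the 500 error payload.
count_errors() {
  # grep -c prints the number of matching lines; "|| true" keeps a
  # count of zero from being reported as a failure.
  grep -c '"status":500' || true
}

# Usage (40 requests, half a second apart):
#   for i in $(seq 40); do
#     curl -s http://kiali-mng-dev.saraminhr.co.kr/api/vehicles/driver/City%20Truck
#     echo; sleep 0.5
#   done | count_errors
```

Running the loop once before and once after applying the DestinationRule gives two numbers to compare; with the circuit breaker in place the count should stay at about the threshold of 2.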

  • In the web browser, too, the photos now load without delay.
  • To apply a circuit breaker to every service, a global setting is available.