Friday, August 30, 2019

Circuit Breaker and Bulkhead on Microservice

(this is a very old post that I forgot to publish)

I’ve been thinking about how we can leverage circuit breakers in our services.  Let’s say we have the following services:

UI -> svc A -> svc B

Putting a circuit breaker on svc A would make it fail fast and give svc B a chance to recover.
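
To make the fail-fast idea concrete, here is a minimal sketch of a breaker that svc A could wrap around its calls to svc B. The class name, the thresholds and the injectable clock are my own assumptions for illustration, not taken from any particular library, and the sketch is not thread-safe.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so the timeout can be tested
        self.failures = 0
        self.opened_at = None       # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"      # cool-down elapsed, allow a trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            # Fail fast without touching svc B at all.
            raise CircuitOpenError("svc B breaker is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures while closed, (re)opens.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the breaker and resets the count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the breaker is open, svc A returns an error immediately instead of waiting on a timeout, which is what gives svc B room to recover.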

And, say, we have a pool of svc B instances and we use some service discovery to “find” a svc B when svc A starts up (we probably don’t want to do that lookup every time svc B is needed).

If that instance of svc B fails, we shouldn’t just trip the breaker; we should find another svc B from the service registry.  In that case the circuit breaker should really be implemented in the service registry (to mark that instance as open), and we’d need some way to half-open and eventually close the breaker for that instance.
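
A rough sketch of what breaker state inside the registry could look like. Everything here (the `Registry` class, the method names, the cool-down value) is hypothetical and only meant to show the per-instance open / half-open / closed bookkeeping, not how any real registry does it.

```python
import time


class Registry:
    """Hypothetical service registry that keeps breaker state per instance."""

    def __init__(self, instances, cooldown=30.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock
        # None means closed (healthy); a timestamp means opened at that time.
        self.opened_at = {name: None for name in instances}

    def report_failure(self, name):
        # Open (or re-open) the breaker for this instance only.
        self.opened_at[name] = self.clock()

    def report_success(self, name):
        # A successful call, including a half-open trial, closes the breaker.
        self.opened_at[name] = None

    def lookup(self):
        """Return the first instance that is closed or due for a trial call."""
        now = self.clock()
        for name, opened in self.opened_at.items():
            if opened is None or now - opened >= self.cooldown:
                return name
        raise LookupError("no svc B instance available")
```

svc A would call `lookup()`, try the instance, and report the outcome back. This sketch always returns the first eligible instance, so a real version would want some rotation so that every cooled-down instance eventually gets a trial call.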

Now, instead of having a direct connection to svc B, we put a load balancer in front of svc B:

UI -> svc A -> LB -> (svc B)xN

Now, putting a circuit breaker on svc A actually doesn’t make (too) much sense.  If a couple of svc B instances got very busy and timed out, svc A might open the circuit breaker while some of the svc B instances are actually fine.  And if just one of them is slow, the circuit breaker on svc A might never open, and svc A will suffer intermittent performance issues.  We could instead put the circuit breaker on the LB; then, from svc A’s point of view, the breaker won’t trip unless the LB itself or all svc B instances are dead.  However, timeouts would be different.  Since the LB will trip the breaker for that svc B instance, tripping the breaker on svc A would just be failing requests for no reason (assuming there are more svc B instances available).
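
Here is a small sketch of the LB-side variant: a round-robin balancer that ejects an instance after a few consecutive failures, so svc A only sees errors until the bad instance is taken out. The class and the `max_failures` parameter are assumptions for illustration; a real LB would also re-admit instances after a cool-down (the half-open step), which is left out here for brevity.

```python
import itertools


class LoadBalancer:
    """Hypothetical round-robin LB with per-instance ejection."""

    def __init__(self, backends, max_failures=2):
        self.backends = dict(backends)              # name -> callable
        self.max_failures = max_failures
        self.failures = {name: 0 for name in self.backends}
        self._ring = itertools.cycle(list(self.backends))

    def healthy(self):
        return [n for n in self.backends if self.failures[n] < self.max_failures]

    def call(self, request):
        if not self.healthy():
            # Only now would a breaker on svc A have a reason to trip.
            raise RuntimeError("all svc B instances are ejected")
        # Advance the ring until we land on an instance that is not ejected.
        for name in self._ring:
            if self.failures[name] < self.max_failures:
                break
        try:
            result = self.backends[name](request)
        except Exception:
            self.failures[name] += 1    # consecutive failures for this instance
            raise
        self.failures[name] = 0
        return result
```

With three instances and one that always times out, svc A sees two failed requests and then nothing but successes, which is the behaviour a single breaker on svc A cannot give us.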

Using Mesos and Marathon, we can do either service discovery (the consuming service looks up the producing service directly) or load balancing (it has HAProxy integration, and if we use Consul, it has nginx integration too, and I read that it’s quicker to change the config).  We’ll have to make a decision, and that will affect the Docker/Mesos/Marathon exercise I’m working on (I’ll make sure the scenario we pick works).

And a more general question: should we implement the circuit breaker per service (server) or per API (URL)?  I don’t know what Hystrix has implemented, but I’d think per service would be good enough, i.e. any failing API could trip the breaker.
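
In code, the per-service vs per-API choice mostly comes down to what key the breaker state is stored under. A tiny sketch, where the `breaker_key` helper and the example URL are made up for illustration:

```python
from urllib.parse import urlsplit


def breaker_key(url, per_api=False):
    """Return the key under which breaker state for this call would be kept."""
    parts = urlsplit(url)
    if per_api:
        # One breaker per host + path: finer grained, but many more breakers.
        return f"{parts.netloc}{parts.path}"
    # One breaker per host: any failing API trips it for the whole service.
    return parts.netloc
```

Per-API keys also get messy when paths carry IDs (every `/orders/<id>` would get its own breaker unless the path is normalized first), which is one more reason per service looks good enough to me.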

As for bulkheads, a microservice architecture is basically the bulkhead pattern applied at the service layer.  And Node.js, accidentally, applies it at the process level: I couldn’t find any documentation, but the one process that Node.js has appears to be limited to one processor (it runs JavaScript on a single thread).  The example the book keeps using, the self-denial attack, is something we can prepare for ahead of time, and I really doubt our customers will have that use case.  But if there’s anything to do, we might have to do it with our orchestration framework (I’d recommend Mesos now as it’s the most mature framework; most other solutions are either built on top of Mesos or are new, like Google Kubernetes or ClusterHQ Flocker).
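
If we ever wanted a bulkhead inside a single service as well (say, capping how many concurrent calls svc A may have in flight to svc B so one slow dependency can’t eat every worker), a semaphore is about the smallest possible sketch. The `Bulkhead` class and its limit are hypothetical, not something we have today.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when the compartment has no free slot."""


class Bulkhead:
    """Hypothetical bulkhead: at most max_concurrent calls in flight."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Reject immediately instead of queueing behind a slow dependency.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

Each dependency would get its own `Bulkhead`, so exhausting the slots for svc B leaves calls to other services untouched.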

BTW, it appears deployment is well thought out in Marathon.  https://mesosphere.github.io/marathon/docs/deployments.html  It should be really fun to try out.

Also, if we are using the same LB for all services, it will become a hotspot, and we might want to have an LB for each service, or one for every few services.

With all these services and LBs, plus monitoring and logging, it’s vital that we have the orchestration piece done; an installer just won’t cut it.



