Thursday, October 24, 2019

Setting up Spring Boot with SpringDoc OpenApi UI

Adding springdoc-openapi-ui to a project is very easy; just add

<dependency>
  <groupId>org.springdoc</groupId>
  <artifactId>springdoc-openapi-ui</artifactId>
  <version>last-release-version</version>
</dependency>

to your project. You can then access the Swagger UI at /swagger-ui.html.

My Spring Boot app also serves a static UI at context path / (root), packaged under classpath:/static, which is one of the default locations Spring Boot autoconfiguration provides. The problem is that with springdoc-openapi-ui on the classpath, the static pages stop working. It turns out that's because springdoc-openapi-ui has a SwaggerConfig configuration that adds a resource handler, like this:

if (swaggerPath.contains("/")) uiRootPath = swaggerPath.substring(0, swaggerPath.lastIndexOf('/'));
registry.addResourceHandler(uiRootPath + "/**").addResourceLocations(WEB_JARS_PREFIX_URL + "/").resourceChain(false);

The default swaggerPath is "/swagger-ui.html", so uiRootPath resolves to an empty string and the handler is registered for /**.

This prevents the resource handler added by Spring autoconfiguration from working, as both are bound to /**.

To resolve this, we just have to set swaggerPath to something with more than one level.  e.g.

springdoc.swagger-ui.path=/swagger/apidoc.html

This maps the /webjars/ directory to /swagger/** instead.
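To see why the extra path level matters, here is a small sketch that mirrors the substring logic quoted above. It is not springdoc's code, just the same string arithmetic in JavaScript, and the function name resourceHandlerPattern is made up for illustration.

```javascript
// Mirrors the quoted substring logic; resourceHandlerPattern is a hypothetical name.
function resourceHandlerPattern(swaggerPath) {
  let uiRootPath = "";
  if (swaggerPath.includes("/")) {
    // Everything before the last "/" becomes the root of the handler pattern.
    uiRootPath = swaggerPath.substring(0, swaggerPath.lastIndexOf("/"));
  }
  return uiRootPath + "/**";
}

// Default path: the last "/" is at index 0, so the root is empty.
console.log(resourceHandlerPattern("/swagger-ui.html"));     // "/**"
// Two-level path: the handler is scoped to /swagger/** and leaves /** alone.
console.log(resourceHandlerPattern("/swagger/apidoc.html")); // "/swagger/**"
```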

Tuesday, September 24, 2019

Setup HTTP Git Server using Nginx on Docker

I was working with the Kubernetes executor for Nextflow, which appeared to be able to pull pipelines only from GitHub or Bitbucket (which turned out not to be true). However, the pipeline script is proprietary, and it's company policy to house the source code on the internal GHE. So I decided to set up a Git server as described here: https://www.howtoforge.com/tutorial/ubuntu-git-server-installation/

(Note that Git's own website also covers the use of `git-http-backend`: https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP)

Also, I'm running the nextflow script in a pod, so the git server needs to run in a pod too.

Here's the dockerfile I used to build the docker image.



The nginx config is mostly the same as the one in the howtoforge tutorial.

The run.sh file (since there's no systemd in Docker, we have to launch nginx and fcgiwrap in the background):


Since my goal is to launch Nextflow, I start fcgiwrap and nginx in the background and then launch Nextflow. If you'd rather run a pod that just serves the files over Git, you can launch run.sh as the CMD in the Dockerfile and start nginx with daemon off.

Friday, August 30, 2019

Circuit Breaker and Bulkhead on Microservice

(this is a very old post that I forgot to publish)

I’m thinking about how we can leverage circuit breakers in our services.  Let’s say we have the following services:

UI -> svc A -> svc B

Putting a circuit breaker on svc A would make it fail fast and give svc B a chance to recover.

Now, say we have a pool of svc B instances, and we use some service discovery to “find” a svc B when svc A starts up (we probably don’t want to do that every time svc B is needed).

If that instance of svc B fails, we shouldn’t just trip the breaker; we should find another svc B from the service registry.  In that case, the circuit breaker should really be implemented in the service registry (to mark that instance as open), and we’d need some way to close (half-open) the breaker for that instance.

Now, instead of having a direct connection to svc B, we put a loadbalancer before svc B,

UI -> svc A -> LB -> (svc B)xN

Now, putting a circuit breaker on svc A actually doesn’t make (too) much sense.  If a couple of svc B instances get very busy and time out, svc A might open the circuit breaker while some of the svc B instances are actually fine.  And if just one of them is slow, the circuit breaker on svc A might never open, and svc A will suffer intermittent performance issues.  We could instead put the circuit breaker on the LB; then, from svc A's point of view, the breaker won’t trip unless the LB itself or all svc B instances are dead.  However, timeouts would be different.  Since the LB will trip the breaker for that svc B instance, tripping the breaker on svc A would just fail requests for no reason (assuming there are more svc B instances available).

Using Mesos and Marathon, we can do either service discovery (the consuming service looks up the providing service directly) or load balancing (it has HAProxy integration, and with Consul it has nginx integration too, which I read is quicker at applying config changes).  We’ll have to make a decision, and that will affect the Docker/Mesos/Marathon exercise I’m working on (I’ll make sure the scenario we pick works).

And a more general question: should we implement the circuit breaker per service (server) or per API (URL)?  I don’t know what Hystrix has implemented, but I’d think per service would be good enough, i.e. any failing API could trip the breaker.
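To make the per-service idea concrete, here is a minimal sketch of a breaker with closed, open and half-open states, keyed by service name. This is not how Hystrix works; every name here (CircuitBreaker, breakerFor, failureThreshold, resetTimeoutMs) is hypothetical, and the thresholds are arbitrary.

```javascript
// A minimal per-service circuit breaker sketch; all names and defaults are hypothetical.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 5000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;       // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null; // null means the breaker is closed
  }

  get state() {
    if (this.openedAt === null) return "closed";
    // After the reset timeout, let a trial request through (half-open).
    return this.now() - this.openedAt >= this.resetTimeoutMs ? "half-open" : "open";
  }

  allowRequest() {
    return this.state !== "open";
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure() {
    const wasHalfOpen = this.state === "half-open";
    this.failures += 1;
    // A failed trial request, or too many failures, (re)opens the breaker.
    if (wasHalfOpen || this.failures >= this.failureThreshold) {
      this.openedAt = this.now();
    }
  }

  // Wrap any call to the downstream service, whichever API (URL) it targets.
  async call(fn) {
    if (!this.allowRequest()) throw new Error("circuit open: failing fast");
    try {
      const result = await fn();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}

// One breaker per service, so a failure on any API of that service counts.
const breakers = new Map();
function breakerFor(serviceName) {
  if (!breakers.has(serviceName)) breakers.set(serviceName, new CircuitBreaker());
  return breakers.get(serviceName);
}
```

Switching to per-API breakers would only mean keying the map by URL instead of service name, at the cost of each breaker seeing fewer requests.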

As for bulkhead, a microservice architecture is basically the bulkhead pattern at the service layer.  And Node.js, accidentally, applies it at the process level: I couldn’t find any documentation, but the single process that Node.js runs appears to bind to one processor.  The example the book keeps using, the self-denial attack, is something we can prepare for ahead of time, and I really doubt our customers will have that use case.  But if there’s anything to do, we might have to do it with our orchestration framework (I’d recommend Mesos for now as it’s the most mature framework; most other solutions are either built on top of Mesos or new, like Google Kubernetes or ClusterHQ Flocker).

BTW, deployment appears to be well thought out on Marathon: https://mesosphere.github.io/marathon/docs/deployments.html  It should be really fun to try out.

Also, if we use the same LB for all services, it will become a hotspot, so we might want an LB for each service, or for every few services.

With all these services and LBs, plus monitoring and logging, it's vital that we get the orchestration piece done; an installer just won't cut it.




gmail's plus sign trick

I am working on a test for user registration on my website.  The problem is that I need to create a new account every time, and each account is tied to the user's email address (Gmail).  I can't create a new email account every time I run the test: not only would I end up with a lot of email accounts (even if I could automate that), but enabling the API for each new account would also be a lot of work.

It turns out there's a plus sign trick that lets seemingly different email addresses deliver to the same email account.  someone@gmail.com and someone+12345@gmail.com will both deliver to the same someone@gmail.com inbox!

With that, I can generate gmail addresses and search the inbox by "to: someone+12345@gmail.com" to retrieve the email for the test.
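A small sketch of how a test could generate such addresses and the matching inbox search. The helper names plusAddress and inboxQuery are made up for illustration, and using a timestamp as the tag is just one way to get a fresh address per run.

```javascript
// Hypothetical helpers: build a plus-tagged address and the Gmail search for it.
function plusAddress(baseEmail, tag) {
  const at = baseEmail.lastIndexOf("@");
  if (at <= 0) throw new Error(`not an email address: ${baseEmail}`);
  // Insert "+tag" between the local part and the "@domain" part.
  return `${baseEmail.slice(0, at)}+${tag}${baseEmail.slice(at)}`;
}

function inboxQuery(address) {
  // Gmail search operator for the recipient.
  return `to:${address}`;
}

// A fresh address per test run, e.g. tagged with the current timestamp.
const testAddress = plusAddress("someone@gmail.com", Date.now());
console.log(testAddress, "->", inboxQuery(testAddress));
```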

Getting HTML from Gmail body using GMail API and protractor

Part of the automated tests we are building involves checking email, verifying its contents, and clicking on a link to continue the registration process.  To do that, I set up a new Gmail account and followed Google's Quickstart instructions to enable the Gmail API.  Well, all you really have to do is click the "ENABLE THE GMAIL API" button on the page.  But before you do that, make sure you have selected the correct Google account in the upper right corner.

Now that we have enabled the API and downloaded the credentials JSON file, we can follow the example in the Quickstart instructions to authenticate to the Gmail API.  However, getNewToken simply displays a URL on the console, and you are supposed to manually open the URL in a browser and copy the code back to the program.  But we're writing an automated test with Protractor, so let's automate that too!

It's mostly the same as the example, except that when token.json is not found, it opens a new browser window and grabs the code automatically.  Also, note that the code uses an "AppPo" class; it's just a simple utility class I use to check whether a button exists before clicking it.

The Gmail API's list method returns a list of emails with only the message id and the thread id, so we have to call the get method to retrieve the content.  The structure it returns is too complicated for my taste, and after all, I just want to grab the HTML content, so I decided to get the raw content and reconstruct the HTML.  There you have it: we can now call searchEmail and pass its return value to getHtmlFromEmailBody to get the HTML content for all emails returned.  With cheerio, we can easily find all links like this.

const $ = gmailUtils.getHtmlFromEmailBody(...);
$('a').each( (i,a) => console.log( $(a).attr('href') ) );
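For the "get the raw content" step, here is a rough sketch of the decoding, not the code from this post. With format set to raw, the Gmail API returns the whole message as a base64url string in the raw field. The function names decodeRawMessage and extractHtml are hypothetical, and the sketch assumes the HTML part is not quoted-printable or base64 encoded inside the message, which real emails often are.

```javascript
// Decode the base64url string from the `raw` field into the message text.
function decodeRawMessage(raw) {
  // base64url uses "-" and "_" where standard base64 uses "+" and "/".
  const base64 = raw.replace(/-/g, "+").replace(/_/g, "/");
  return Buffer.from(base64, "base64").toString("utf8");
}

// Assumption: the HTML part is stored unencoded, so it can be sliced out directly.
function extractHtml(messageText) {
  const match = messageText.match(/<html[\s>][\s\S]*<\/html>/i);
  return match ? match[0] : null;
}

// Usage with a made-up message, encoded the same way the API would return it.
const sample =
  "To: someone+12345@gmail.com\r\nContent-Type: text/html\r\n\r\n" +
  '<html><body><a href="https://example.com/confirm">Confirm</a></body></html>\r\n';
const sampleRaw = Buffer.from(sample, "utf8")
  .toString("base64")
  .replace(/\+/g, "-")
  .replace(/\//g, "_")
  .replace(/=+$/, "");
console.log(extractHtml(decodeRawMessage(sampleRaw)));
```

The extracted string is what would then be loaded into cheerio to look up the links.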

Wednesday, July 10, 2019

Snakemake

I'm evaluating different workflow languages, and it turns out I like Snakemake a lot!

I've converted https://gencore.bio.nyu.edu/variant-calling-pipeline/ to use Snakemake, https://github.com/arthurtsang/variant-calling-pipeline

What I like about it:


  1. The YAML-like syntax is clean and easy to follow.
  2. Wildcards are very useful and easy to understand.
    • If I have a rule with "{sample}.fastq" as its output, and another rule with "A.fastq, B.fastq, C.fastq" as its input, the first rule is executed once each for A, B and C.
  3. expand is also very helpful.
    • It tripped me up a bit initially that the wildcard in the expand function cannot refer to a global variable, i.e. if we have SAMPLES=["A", "B", "C"] defined, expand( "/some/directory/{SAMPLES}.fastq") won't work.  The correct syntax is expand( "/some/directory/{samples}.fastq", samples = SAMPLES ).
  4. Integration with Conda.
    • Unfortunately, it pollutes the code a bit by requiring a conda directive in every rule, but it works out really nicely and easily.
    • I haven't tried it, but it should be possible to build a custom channel hosted somewhere in the infrastructure for distributing private binaries.
  5. Using filenames to build the DAG.
    • Rules are connected by matching the output of one rule to the input of another.  It's kind of like how Spring works out which bean to create first.
    • The limitation is that everything is file based, i.e. if a step doesn't really need to generate a file, we have to touch an empty state file so that Snakemake can build the DAG.
    • Also, there is no support for Linux-style pipes.  You can't really pipe the result of one rule to another.

EPIPE write EPIPE error when using Protractor with control flow turned off

I was hit with an `EPIPE write EPIPE` error when running Protractor with control flow turned off (see https://github.com/angular/protractor/issues/4294).  It is caused by misusing await somewhere in the code.
tslint/IntelliJ is pretty good at warning the developer that a returned promise has been ignored.  However, one case it didn’t catch, and which caught me by surprise, is the `ElementArrayFinder` returned by `element.all()`.  It’s not a promise, but if you use it directly, like `element.all().find()` or my favorite `element.all().getText()`, you’ll run into the EPIPE error about 0.1% of the time.
Unfortunately, the test I’m working on calls that about 8000 times, so it always fails after an hour of running.
Anyway, from the GitHub issue, it appears to be a bug in the selenium driver that is fixed in the latest 4.0.0-beta driver, which Protractor is not using.  The solution is to `await` the `ElementArrayFinder` too; it will return an `ElementFinder[]` which you can loop through.
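A minimal sketch of that workaround, for illustration only. The helper name collectTexts and the selector are made up, and Protractor's `element` and `by` globals are passed in as parameters so the snippet needs no imports.

```javascript
// Hypothetical helper: await the ElementArrayFinder first, then each element.
async function collectTexts(element, by, selector) {
  // Awaiting the ElementArrayFinder resolves it to a plain ElementFinder[].
  const finders = await element.all(by.css(selector));
  const texts = [];
  for (const finder of finders) {
    // Await every call on each ElementFinder as well, one at a time.
    texts.push(await finder.getText());
  }
  return texts;
}

// In a spec it could be used as:
//   const texts = await collectTexts(element, by, "li.item");
```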