Minimalist Mesh for Micro Services

So the story goes like this:
  • You have container workloads running in production on Nomad.
  • You are in a bare-metal environment.
  • Different data centers use different container networking solutions: Contiv and Flannel.
  • The cluster perimeter is secured (firewalls, WAF).
  • Service-to-service communication within the cluster is unencrypted (the journey started before service mesh concepts were established).
  • The customer insists that service-to-service communication use HTTPS whenever it crosses machine boundaries, even inside the perimeter.
  • An incremental, service-by-service migration is mandatory.

Options

Introduce a full-fledged service mesh

  • A complete networking and software stack upgrade is impossible without downtime.

Replace the existing container networking with one that supports encryption

  • Is there such a solution that is production-ready?

Solution

“Introduce a lightweight sidecar proxy that can do this job”

Details

  • Nginx as a sidecar.
  • We added it to the base images we supply; setting a single flag in the Dockerfile launches the sidecar.
  • To launch the sidecar, we extended the configuration file of ContainerPilot, which we were already using.
  • Certificates are generated automatically when the container launches. How did we achieve this without a control plane? We offloaded certificate generation to the container's own startup scripts.
  • The next question: how do all services in the cluster share the same certificate authority? During the startup sequence, we inject the intermediate CA certificate and key into the container from Vault, and use that intermediate key to sign the dynamically generated certificate.
  • The sidecar reuses the existing variables that specify the service ports to run a reverse proxy that forwards traffic to the application process inside the container.
  • The ContainerPilot configuration also switches its behaviour to register the service with the new TLS port instead of the non-TLS port it used before.
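To make the reverse-proxy step concrete, here is a minimal Python sketch of how a startup script might render the nginx sidecar configuration from the service-port variables the container already has. The variable names `SERVICE_PORT` and `SIDECAR_TLS_PORT`, the certificate paths, and the default TLS port (application port + 10000) are hypothetical assumptions for illustration, not the configuration we actually shipped.

```python
from string import Template

# Hypothetical nginx server block; "$$" escapes nginx's own "$" variables
# so that string.Template leaves them alone.
NGINX_TEMPLATE = Template("""\
server {
    listen ${tls_port} ssl;
    ssl_certificate     ${cert_path};
    ssl_certificate_key ${key_path};

    location / {
        # Forward decrypted traffic to the app inside the same container.
        proxy_pass http://127.0.0.1:${app_port};
        proxy_set_header Host $$host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
""")


def render_sidecar_config(env, cert_path="/etc/sidecar/tls.crt",
                          key_path="/etc/sidecar/tls.key"):
    """Render the sidecar config from the container's port variables."""
    app_port = int(env["SERVICE_PORT"])
    # Assumed convention: the TLS port defaults to app port + 10000.
    tls_port = int(env.get("SIDECAR_TLS_PORT", app_port + 10000))
    if not 0 < tls_port < 65536:
        raise ValueError(f"TLS port out of range: {tls_port}")
    return NGINX_TEMPLATE.substitute(tls_port=tls_port, app_port=app_port,
                                     cert_path=cert_path, key_path=key_path)


if __name__ == "__main__":
    print(render_sidecar_config({"SERVICE_PORT": "8080"}))
```

Because the template only reads variables the container already exposes, the application image itself does not need to change.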

Overall, we had our first service running TLS in production within a week, with the application team doing nothing beyond setting a single variable in their configuration file.
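The registration switch that this single variable drives can be sketched the same way. The Python below builds a ContainerPilot-style configuration as a dict and serialises it to JSON (which ContainerPilot's JSON5 format accepts). When the sidecar flag is on, it registers the service on the TLS port and adds an nginx job. The field names (`consul`, `jobs`, `name`, `exec`, `port`, `health`) follow ContainerPilot 3's job schema as we understand it and should be checked against your version; the flag name `SIDECAR_TLS`, the service name, commands, health path and ports are all hypothetical.

```python
import json


def build_containerpilot_config(service, app_cmd, app_port, sidecar_enabled,
                                tls_port=None, consul="localhost:8500"):
    """Return a ContainerPilot-style config dict (hypothetical sketch)."""
    if sidecar_enabled:
        if tls_port is None:
            raise ValueError("tls_port is required when the sidecar is enabled")
        # Register the TLS port so consumers only discover the HTTPS endpoint.
        registered_port = tls_port
        # Check through nginx; -k skips verification because this is a
        # local liveness probe, not a trust decision.
        check = f"curl -fsk https://127.0.0.1:{tls_port}/"
    else:
        registered_port = app_port
        check = f"curl -fs http://127.0.0.1:{app_port}/"

    jobs = [{
        "name": service,
        "exec": app_cmd,
        "port": registered_port,
        "health": {"exec": check, "interval": 10, "ttl": 30},
    }]
    if sidecar_enabled:
        # No "port": ContainerPilot supervises nginx without registering it.
        jobs.append({"name": "nginx-sidecar",
                     "exec": ["nginx", "-g", "daemon off;"]})
    return {"consul": consul, "jobs": jobs}


if __name__ == "__main__":
    # In the container, sidecar_enabled would come from the single
    # (hypothetical) flag, e.g. os.environ.get("SIDECAR_TLS") == "true".
    cfg = build_containerpilot_config("orders", ["/app/orders"], 8080,
                                      sidecar_enabled=True, tls_port=18080)
    print(json.dumps(cfg, indent=2))
```

Routing the health check through the TLS port means a crashed sidecar marks the service unhealthy, rather than leaving a registered endpoint that nothing is serving.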
