With our next release of Houston, we’re moving our proxy technology from NGINX to Envoy. We’ve followed Envoy closely since before the first line of code was written. A number of us worked at Twitter, some of us alongside Matt Klein on the team that built TFE, Twitter’s edge proxy. We knew that Lyft was planning to build an open source proxy modeled loosely on TFE, and we were hungry for it. Unfortunately it wasn’t ready, so we built on NGINX instead. We were excited to see Envoy’s initial feature set, which incorporated a number of ideas inspired by operating microservices at scale. We’re even more excited to see how quickly its community has formed and its technology has matured. The features it provides now let us deliver capabilities to our customers far more quickly than we could on NGINX, and the roadmap is extremely exciting.
Picking a Proxy
When we started Turbine Labs, we knew that a load-balancing proxy was going to be a critical component of our infrastructure. This was in the very long ago (Autumn 2015), and the proxy landscape wasn’t the vibrant space we see today. We chose NGINX because it was lightweight, production-tested, open-source, (relatively) easy to extend, and had a thriving community of users.
We understood we’d have to do a lot of additional work to build a fully functional traffic management solution. Service discovery, stats management, and finer-grained load balancing are critical features of modern infrastructure. We put a lot of work into wrapping these features around an NGINX proxy, but still had a lot to do. Envoy lets us dramatically accelerate the time to implement some incredibly useful features (such as gRPC, native tracing, and traffic shadowing), while providing similar (or better) performance, stability, and community benefits.
Adopting any new technology requires working through disqualifiers. Because we deploy proxies on-premise, we need not only to make ourselves comfortable with our proxy, but also to anticipate the questions our customers will ask. For open-source projects these questions generally fall into the following categories:
- When it fails, how does it fail?
- Is it easy to get help?
- Is it easy to make changes?
- How much does it cost?
We’ve been closely watching Envoy’s progress, and are amazed at how quickly it has matured. It’s being run in production at companies handling tons of traffic in a wide range of configurations. It’s now a CNCF project, meaning governance is transparent and open. Contributors come from a broad set of companies, and we’ve been able to contribute substantively and directly to the Envoy project, instead of writing our own bespoke NGINX modules.
Cost is an interesting consideration for proxies. As sidecars become a more widespread approach to managing traffic, the footprint of the proxy becomes a larger concern. While many customers will continue to run centralized pools of load balancers, we wanted a proxy that would gracefully support a sidecar deployment model. Envoy is written in modern C++11, which enables it to run with a very small memory footprint, significantly reducing the burden of a sidecar deployment compared to proxies that depend on heavier runtimes.
Benefits for Vendors
Large changes to one’s technology stack should always be approached with caution. We didn’t take the decision to move to Envoy lightly, but the benefits we gain, and those we can pass on to our customers, are dramatic.
From the beginning, Envoy was built to be manageable at scale. We’ve put a lot of work into making our NGINX-based proxy manageable, but that configuration interface isn’t an API we can readily expose to other tools. The Envoy data plane API provides an open standard for centralized management of a large fleet of Envoys. Instead of copying files around, we can provide a central, open point of control.
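To make this concrete, here’s a rough sketch of what that central point of control looks like in a bootstrap config: instead of shipping routing and cluster definitions as files, Envoy is pointed at a management server and fetches them over the data plane API. The cluster name `xds_cluster`, the address, and the v2-style field names are our assumptions for illustration; exact fields vary across Envoy versions.

```yaml
# Hypothetical sketch: delegate cluster (CDS) and listener (LDS)
# configuration to a central management server rather than static files.
# "xds_cluster" and its address are assumed names for this example.
dynamic_resources:
  cds_config:
    api_config_source:
      api_type: GRPC
      cluster_names: [xds_cluster]
  lds_config:
    api_config_source:
      api_type: GRPC
      cluster_names: [xds_cluster]
static_resources:
  clusters:
  - name: xds_cluster
    type: STATIC
    connect_timeout: 1s
    http2_protocol_options: {}   # xDS is served over gRPC, so HTTP/2
    hosts:
    - socket_address: { address: 127.0.0.1, port_value: 18000 }
```

With a bootstrap like this, pushing a routing change is an API call to the management server, not a config file copy and reload on every host.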
NGINX is an extremely successful, stable open-source project. But its configuration files and module ecosystem have a large surface area, and a large base of existing users to support. Contributing to core NGINX is challenging, which in many cases leads to writing custom modules or Lua scripts to extend its functionality. Envoy has a much narrower focus, a more modern language for development, and a larger appetite to support changes to the core proxy. Instead of writing a pile of Envoy filters and shipping a custom binary, we’ve contributed more than thirty commits to Envoy over the last few months, including major features like the OSX build, subset load balancing, and detailed upstream logging.
Richer Cluster Model
Most of our extensions to NGINX were in support of enhancing its upstream model with more detail. In environments where multiple versions of the same service are deployed simultaneously, simply knowing the host and port of an instance isn’t enough. Envoy (through patches we’ve contributed) allows the attachment of arbitrary metadata to service instances, and the definition of routing rules based on that metadata. This enables advanced traffic management techniques like incremental blue/green releases, seamless monolith decomposition, and testing in production.
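As a sketch of how that fits together: the cluster declares which metadata keys define subsets of its endpoints, and a route pins traffic to one subset. The service name, the `version` key, and the values below are illustrative, and field names follow the v2-style API, which may differ in your Envoy version.

```yaml
# Cluster side: declare that endpoints are grouped into subsets
# keyed on their "version" metadata ("user_service" is an assumed name).
clusters:
- name: user_service
  lb_subset_config:
    fallback_policy: ANY_ENDPOINT   # if no subset matches, use any host
    subset_selectors:
    - keys: [version]

# Route side: send this route's traffic only to endpoints whose
# metadata carries version "1.2".
routes:
- match: { prefix: "/" }
  route:
    cluster: user_service
    metadata_match:
      filter_metadata:
        envoy.lb: { version: "1.2" }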
NGINX supports a handful of protocols. Envoy’s architecture makes it easy to add support for new protocols, and it comes with a wide variety out of the box. While HTTP still accounts for a large portion of internet traffic, adding visibility into Redis, Mongo, Dynamo, websockets, and gRPC traffic is a huge win for organizations.
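For a sense of what protocol support looks like in practice, here’s a hedged sketch of a listener that terminates Redis traffic with Envoy’s Redis proxy filter, which surfaces per-command stats. Cluster and stat names are assumptions, and filter config fields vary by version.

```yaml
# Hypothetical listener speaking Redis on port 6379; "redis_backend"
# is an assumed upstream cluster name.
listeners:
- address:
    socket_address: { address: 0.0.0.0, port_value: 6379 }
  filter_chains:
  - filters:
    - name: envoy.redis_proxy
      config:
        stat_prefix: redis_ingress   # namespace for emitted stats
        cluster: redis_backend
        settings: { op_timeout: 5s } # per-operation timeout
```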
Dynamic Service Discovery
As microservices, containers, and orchestrators have become more prevalent, service topologies have become much more dynamic. A list of servers in a file gets stale quickly. Envoy uses an eventually consistent model for service discovery that is API-driven, and deals well with instances coming and going frequently. We currently collect service discovery data from a variety of platforms and orchestrators, and Envoy’s cluster discovery service (CDS) provides a more natural abstraction for us than a fixed config file. Envoy takes this even further by supporting routing topology discovery through the listener discovery service (LDS) and route discovery service (RDS). This allows dynamic reconfiguration of large portions of service topology from a central point of control, which is extremely useful.
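An endpoint discovery response, for example, is just data the management server can recompute on every poll; a sketch of one (v2-style, with assumed service names and addresses) looks like:

```yaml
# Hypothetical EDS response for one cluster. Hosts can appear and
# disappear between responses without restarting or reloading the proxy.
resources:
- "@type": type.googleapis.com/envoy.api.v2.ClusterLoadAssignment
  cluster_name: user_service
  endpoints:
  - lb_endpoints:
    - endpoint:
        address:
          socket_address: { address: 10.0.0.5, port_value: 8080 }
    - endpoint:
        address:
          socket_address: { address: 10.0.0.6, port_value: 8080 }
```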
Microservices mean that the network is relied on more heavily as a service abstraction boundary. As the number of interdependent services grows, it becomes increasingly rare that the system is 100% up; instead it usually exists in a partially degraded state. Managing network policies such as retries, timeouts, and rate limiting is critical to maintaining a smooth customer experience in the face of bumps in system health. Envoy allows configuration of these policies at both the proxy (on a per-route basis) and the client layer (on a per-request basis). This yields flexible implementation of extremely fine-grained resilience policies that are difficult to implement with NGINX.
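A per-route policy might look like the following sketch (cluster name, timeouts, and retry conditions are illustrative, not a recommendation):

```yaml
# Hypothetical per-route resilience policy: an overall deadline,
# plus bounded retries each with their own budget.
routes:
- match: { prefix: "/api" }
  route:
    cluster: backend
    timeout: 2s                 # overall deadline for the request
    retry_policy:
      retry_on: "5xx,connect-failure"
      num_retries: 2
      per_try_timeout: 0.5s     # budget for each individual attempt
```

On the client side, a caller can tighten these per request with headers such as `x-envoy-upstream-rq-timeout-ms`, which is what makes the per-request layer possible without redeploying proxy config.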
Envoy comes with industry-standard request logging, but also provides out-of-the-box integration with a wide array of telemetry systems. It also includes native support for Zipkin and Lightstep, for a deeper view into the overall request chain.
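Wiring up tracing is a small amount of config; a hedged v2-style sketch for Zipkin (where `zipkin` is an assumed cluster name pointing at the collector) looks like:

```yaml
# Hypothetical tracing config: emit spans to a Zipkin collector.
tracing:
  http:
    name: envoy.zipkin
    config:
      collector_cluster: zipkin            # assumed cluster name
      collector_endpoint: "/api/v1/spans"  # Zipkin v1 span ingest path
```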
We’ve been extremely pleased with the process of moving to Envoy. It’s stable, fast, lightweight, and has a great community. Its architecture makes it a natural fit for microservices, but it’s an equally capable edge proxy. As a vendor, it’s fantastic to have configuration APIs instead of static files.
We’re excited to build out our next batch of capabilities for Houston, the easiest way to manage Envoy at scale. If you have an active Envoy fleet, or are considering moving to Envoy, we’d love to talk. With a wide range of service discovery integrations and a great management UI, we can help you get Envoy deployed and operable quickly and smoothly. Start your free trial today!