Logging Traffic Between NGINX and Upstream Servers at CDN77

NGINX | December 12, 2018

Regardless of what platform you’re running, logging is often a core requirement for big data processing, statistics, audits, client reports, and transactions, as well as for debugging client‑server communication and possible issues.

In this blog, we address the role of logging in debugging, where it provides the ultimate tool for locating the various issues so common to Internet communications.

Logging with a direct connection between client and web server

Logging client‑web server communication greatly helps in debugging issues related to the browser version, the client’s network, and the files being accessed. It does not end there, however. What you log depends entirely on your platform and requirements.

NGINX’s Logging Architecture

When you run a reverse proxy, load balancer, or content delivery network (CDN), you’re adding another node into the communication flow between client and server. An intermediate server that manipulates data, like a reverse proxy, often introduces unexpected issues.

As a reverse proxy, NGINX converts the connection from client to web server (or other app server) into two separate connections by terminating the connection from the client and creating a new one to the server. The “split” into two connections also creates two separate contexts for logging, and NGINX supports logging for them somewhat differently.

For traffic between the client and the reverse proxy, NGINX provides both an error log and an access log, which records processing events and actions that are not errors. You enable the error log with the error_log directive, and can set the severity level of the errors to be logged. You enable the access log with the access_log directive. The associated log_format directive enables you to customize the kind of information included in the logs and the format of the log entries. You can write the logs to a file, to syslog, or both.
For traffic between the reverse proxy and the web or app server (which NGINX refers to as an upstream server), NGINX supports the error log. It does not, however, support access logging for this traffic. The only way to see non‑error events between the proxy server and an upstream server is to set the severity level in the error log to debug. The downside of this setting is that huge amounts of data are logged. This both slows down request processing and creates very large files which can quickly fill up storage. (Note that you must also recompile NGINX with the --with-debug argument on the configure command, as support for debugging is not enabled by default.)

Our Upstream Log Solution

As a CDN provider, we often run into upstream servers that are not properly communicating with our reverse proxy servers and clients. The debug‑level messages in the error log don’t always provide the kind of information we need to fix an issue with the upstream server.

The solution soon became quite evident: generate an upstream access log by adding a function that gathers just the essential information from the communication between the reverse proxy and upstream servers.

Logging with a split connection between client and web server

The general idea is to have NGINX call our function every time a request to an upstream server is made. This allows us to program all logic related to the upstream logging in the function itself.

The standard ngx_http_upstream_module handles upstream requests, and we need it to call the function for us. The module doesn’t currently support such functionality, so we patched it to enable the callback when needed.

The logging itself is handled in a separate module we wrote, which uses the log callback capability we added in the patched upstream module. The new module defines a new upstream_log directive for configuring the logging functionality. The directive uses the same parser as the access_log directive, so data can be written either in a file or sent to a syslog server using a socket.

When NGINX reads the upstream_log directive in nginx.conf during startup, two functions are called:

ngx_http_upstream_log_set_log, which parses the directive and prepares the log structure itself (ngx_log_t) using ngx_log_set_log internally
ngx_http_upstream_log_init (in a post‑configuration step), which registers our main logging function with the upstream module

This way, when a request needs to be proxied upstream, everything is ready. The upstream module fires up a connection to the upstream server, and our patch makes sure the logging function is called to log the request details.
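
The register-then-call flow described above can be sketched in plain C. This is only a minimal illustration under stated assumptions, not the CDN77 patch itself: request_t, upstream_log_register, upstream_finalize, and example_log_handler are hypothetical names, and the real code works with NGINX's own request structure inside the upstream module.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a proxied request (hypothetical, not ngx_http_request_t). */
typedef struct {
    int  upstream_status;  /* status code returned by the upstream server */
    long upstream_msec;    /* time spent on the upstream request, in ms */
} request_t;

/* Signature of the hook that the patched upstream code would call. */
typedef void (*upstream_log_handler_pt)(const request_t *r);

/* Hook slot owned by the upstream code; NULL means no logger is registered. */
static upstream_log_handler_pt upstream_log_handler = NULL;

/* Called once by the logging module, for example in a post-configuration step. */
void upstream_log_register(upstream_log_handler_pt handler)
{
    assert(handler != NULL);
    upstream_log_handler = handler;
}

/* Called by the upstream code once the upstream connection has been closed. */
void upstream_finalize(const request_t *r)
{
    if (upstream_log_handler != NULL) {
        upstream_log_handler(r);
    }
}

/* Example handler: counts calls and remembers the last status it saw. */
static int logged_calls = 0;
static int last_status = 0;

void example_log_handler(const request_t *r)
{
    logged_calls++;
    last_status = r->upstream_status;
}
```

Keeping the hook as a single function pointer means the upstream code needs to know nothing about log formats or destinations. If no logger is registered, finalizing a request simply skips the call.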

The log format is hardwired into the upstream module. We still have the option to add support for configuration using the log_format directive, but it’s not necessary for our use case.

Logged Values from the Upstream

The logging function is called right after the connection to the upstream server is closed. Its argument is a pointer to the request currently being processed (the ngx_http_request_t structure), which enables the function to access and log all data in the structure. The upstream field (pointer to ngx_http_upstream_t) is of particular interest as it contains data about the upstream request. We’re particularly interested in:

The origin response status code
The duration of the request
The value of an HTTP response header or of the Host header

The access to the entire request structure is what gives the module flexibility, because a wide variety of information can be logged.
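
As a rough sketch of what such a logging function might do with those values, the following plain C code formats a status code, a duration, and a Host value into a single log line. The structures are simplified stand-ins invented for this example (they are not the real ngx_http_request_t or ngx_http_upstream_t layouts), and the output format is an assumption, not the format CDN77 actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Stand-in for a length-counted string (hypothetical). */
typedef struct {
    size_t               len;
    const unsigned char *data;
} str_t;

/* Stand-in for the per-request upstream data (hypothetical fields). */
typedef struct {
    unsigned       status;         /* origin response status code */
    unsigned long  response_msec;  /* duration of the upstream request, in ms */
    str_t          host;           /* Host header value sent upstream */
} upstream_info_t;

/* Stand-in for the request; only the upstream pointer matters here. */
typedef struct {
    upstream_info_t *upstream;
} request_t;

/*
 * Write one log line into buf.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if there is no upstream data or the line does not fit.
 */
int format_upstream_log(char *buf, size_t size, const request_t *r)
{
    const upstream_info_t *u;
    int                    n;

    assert(buf != NULL);

    if (r == NULL || r->upstream == NULL || size == 0) {
        return -1;
    }

    u = r->upstream;

    /* "%.*s" limits the read to host.len bytes; the data need not be NUL-terminated. */
    n = snprintf(buf, size, "status=%u time=%lu.%03lu host=%.*s",
                 u->status,
                 u->response_msec / 1000, u->response_msec % 1000,
                 (int) u->host.len, (const char *) u->host.data);

    if (n < 0 || (size_t) n >= size) {
        return -1;
    }

    return n;
}
```

Because the function receives the whole request, adding another field to the line is a matter of reading one more member and extending the format string.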

Issues We Encountered

We initially implemented the functionality in ngx_http_core_module. This was good enough for a prototype, but it wasn’t a very clean solution, as it could complicate future updates and modifications. Eventually, we separated the upstream logging function into a standalone module as described in Our Upstream Log Solution.

Of course, there were some implementation issues. Most notably, in some places we mishandled ngx_str_t strings, for example by using the C library function sprintf instead of ngx_snprintf. Because an ngx_str_t carries an explicit length and is not guaranteed to be null‑terminated, this can write undefined data into the upstream log or even cause a segmentation fault in the worker process. We solved these problems after extensive debugging and testing using tools like Valgrind and AddressSanitizer.
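
To illustrate the class of bug, here is a small plain C sketch of formatting a length-counted string safely. The str_t type is a stand-in for ngx_str_t and format_host is a hypothetical helper. Inside NGINX, the equivalent would be ngx_snprintf with its %V conversion, which takes a pointer to an ngx_str_t. The standard C analogue shown here is a precision-limited %.*s.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Stand-in for ngx_str_t: a length plus a pointer, with no guaranteed NUL. */
typedef struct {
    size_t         len;
    unsigned char *data;
} str_t;

/*
 * Format "host=<value>" into buf.
 *
 * The precision passed to "%.*s" caps the read at s->len bytes, so snprintf
 * never searches for a terminator beyond the end of the string. A plain "%s"
 * would keep reading until it happened to hit a zero byte, which is how
 * unrelated memory ends up in a log line or a worker crashes.
 *
 * Returns what snprintf returns: the length the full line would need.
 */
int format_host(char *buf, size_t size, const str_t *s)
{
    assert(buf != NULL && s != NULL);

    return snprintf(buf, size, "host=%.*s",
                    (int) s->len, (const char *) s->data);
}
```

In this sketch, a str_t that points at the first 8 bytes of a larger buffer is formatted as exactly those 8 bytes, regardless of what follows them in memory.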

NGINX Features Used in CDN77

The main reason CDN77 uses NGINX is for its caching capabilities. The CDN server is a node introduced between the client and web server (upstream server), passing client requests and requesting appropriate files from the upstream server. When the file gets cached, it’s delivered to other users requesting the same file from the same location.

“Locally served files” is one of the features we’re using to have NGINX provide a file from the server’s disk when it receives a request.

Some additional features and configurations are necessary for secure caching. We make use of SSL (TLS 1.3 with 0‑RTT) or secure tokens that can be generated for a specific IP address to protect the content properly.

Other features that we use include custom error pages for our clients, the default NGINX implementation of OCSP stapling, and FastCGI for PHP to reduce the number of required PHP processes.

Conclusion

Upstream logging has helped us, and continues to help, with debugging various issues. It also provided an excellent opportunity to dive deeper into NGINX’s core functionality and simplified numerous other projects we have undertaken.

About CDN77

CDN77 makes content delivery better and more convenient worldwide. With more than 30 data centers, we’re capable of effectively caching and delivering content all over the globe. This includes static content on your website, software distribution, video on demand (VoD), and live streaming through various protocols, such as HLS or MPEG‑DASH using a dedicated streaming engine.

Getting started with CDN77 is very easy, quick, and straightforward. Sign up for a free trial, create a CDN resource, and use the generated CDN URL or customized CNAME record to integrate it with your website or streaming solution. All the features, settings, and possible custom solutions ensure CDN77 meets your requirements.