Nginx Built-in Variables

2020.01.21

Notes taken from:

agentzh’s Nginx Tutorials (version 2020.03.19)

Nginx variables that are pre-defined by either the Nginx core or Nginx modules are called “built-in variables”.

uri and request_uri

The built-in variable $uri, provided by ngx_http_core, holds the decoded URI of the current request, excluding any query string arguments. The $request_uri variable, provided by the same module, holds the raw, non-decoded form of the URI, including any query string. Let’s look at the following example:

location /test {
  echo "uri = $uri";
  echo "request_uri = $request_uri";
}

The enclosing server configuration block is omitted for brevity. Below is the result of testing this /test interface with different requests:

$ curl 'http://localhost:8080/test'
uri = /test
request_uri = /test

$ curl 'http://localhost:8080/test?a=3&b=4'
uri = /test
request_uri = /test?a=3&b=4

$ curl 'http://localhost:8080/test/hello%20world?a=3&b=4'
uri = /test/hello world
request_uri = /test/hello%20world?a=3&b=4

## Variables with infinite names

There’s another very common kind of built-in variable that does not have a fixed name. Instead, it has infinitely many variations: all the variables whose names start with the prefix arg_, like $arg_foo and $arg_bar. The $arg_name variable evaluates to the value of the URI argument called name in the current request. The value obtained this way is not decoded, so it may still contain %XX sequences. Let’s check out a complete example:

location /test {
  echo "name: $arg_name";
  echo "class: $arg_class";
}

Then we test this interface with various different URI argument combinations:

$ curl 'http://localhost:8080/test'
name:
class:

$ curl 'http://localhost:8080/test?name=Tom&class=3'
name: Tom
class: 3

$ curl 'http://localhost:8080/test?name=hello%20world&class=9'
name: hello%20world
class: 9

In fact, $arg_name matches not only the argument name name, but also NAME or even Name; letter case does not matter here. Behind the scenes, Nginx converts the URI argument names to lower case before matching them against the name specified by $arg_xxx.
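
As a quick check of the case-insensitive matching, a minimal sketch reusing the /test interface might look like the following. It assumes the ngx_echo module and port 8080 as in the other examples; the argument values are made up for illustration.

```nginx
location /test {
    # The argument name is compared without regard to letter case,
    # so ?name=..., ?NAME=... and ?Name=... all populate $arg_name.
    echo "name: $arg_name";
}

# $ curl 'http://localhost:8080/test?NAME=Marry'
# name: Marry
#
# $ curl 'http://localhost:8080/test?Name=Jimmy'
# name: Jimmy
```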

If you want to decode special sequences like %20 in the URI argument values, you could use the set_unescape_uri directive provided by the 3rd-party module ngx_set_misc.

location /test {
  set_unescape_uri $name $arg_name;
  set_unescape_uri $class $arg_class;

  echo "name: $name";
  echo "class: $class";
}

$ curl 'http://localhost:8080/test?name=hello%20world&class=9'
name: hello world
class: 9

The space has indeed been decoded!

Another thing that we can observe is that the set_unescape_uri directive can also implicitly create Nginx user-defined variables.

The Nginx core offers a lot of such built-in variables in addition to $arg_xxx, like the $cookie_xxx variable group for fetching HTTP cookie values, the $http_xxx variable group for fetching request headers, as well as the $sent_http_xxx variable group for retrieving response headers. Refer to the official documentation for the ngx_http_core module.
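
To make these variable groups a little more concrete, here is a small sketch that reads one request header and one cookie. It assumes the ngx_echo module as in the other examples; the header name X-My-Header and the cookie name user are hypothetical and chosen only for illustration.

```nginx
location /test {
    # $http_xxx: the request header whose name, lower-cased and with
    # dashes turned into underscores, is xxx (here: X-My-Header).
    echo "x-my-header: $http_x_my_header";

    # $cookie_xxx: the value of the request cookie named xxx.
    echo "cookie user: $cookie_user";
}

# $ curl -H 'X-My-Header: hello' --cookie 'user=Tom' 'http://localhost:8080/test'
# x-my-header: hello
# cookie user: Tom
```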

Read-only built-in variables

All the user-defined variables are writable. However, most of the built-in variables are effectively read-only, like the $uri and $request_uri variables that we just introduced. Assignments to such read-only variables must always be avoided.
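
For illustration only, this is what such a forbidden assignment looks like. The /bad location is hypothetical. With a configuration like this, Nginx is expected to refuse to start and to report an [emerg] error about a duplicate "uri" variable, although the exact wording may differ between versions.

```nginx
location /bad {
    # Do NOT do this: $uri is a read-only built-in variable.
    # Nginx is expected to reject this configuration at startup
    # (an [emerg] error about a duplicate "uri" variable).
    set $uri /blah;
    echo $uri;
}
```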

Attempts to write to some other read-only built-in variables, such as $arg_xxx, can even crash the server in certain Nginx versions.

Writable built-in variable $args

Some built-in variables are writable as well. For instance, when reading the built-in variable $args, we get the URL query string of the current request, but when writing to it, we’re effectively modifying the query string:

location /test {
  set $orig_args $args;
  set $args "a=3&b=4";
  
  echo "original args: $orig_args";
  echo "args: $args";
}

$ curl 'http://localhost:8080/test?a=0&b=1&c=2'
original args: a=0&b=1&c=2
args: a=3&b=4

It should be noted that when reading $args, Nginx executes a special piece of code that fetches data from the particular place where the Nginx core stores the URL query string of the current request. On the other hand, when we overwrite $args, Nginx executes another special piece of code that stores the new value into the same place in the core. Other parts of Nginx also read from that place whenever the query string is needed, so our modification to $args immediately affects everything that uses the query string later on.

Below is an example to demonstrate that assignments to $args affect the HTTP proxy module ngx_proxy:

server {
  listen 8080;
  location /test {
    set $args "foo=1&bar=2";
    proxy_pass http://localhost:8081/args;
  }
}

server {
  listen 8081;
  
  location /args {
    echo "args: $args";
  }
}

$ curl 'http://localhost:8080/test?blah=7'
args: foo=1&bar=2

Variable “get handler” and “set handler”

In the previous section, we learned that when reading the built-in variable $args, Nginx executes a special piece of code to obtain the value on the fly, and when writing to this variable, Nginx executes another special piece of code to propagate the change. In Nginx’s terminology, the code executed for reading a variable is called its “get handler” and the code for writing to it is called its “set handler”.

When a variable is created at “configure time”, the Nginx module creating it must decide whether to allocate a value container for it and whether to attach a custom “get handler” and/or “set handler” to it. Variables owning a value container are called “indexed variables” in Nginx. Otherwise, they are said to be not indexed.

Variable groups like $arg_xxx, discussed in earlier sections, do not have a value container and thus are not indexed. When $arg_xxx is read, it is its “get handler” at work: the handler scans the current URL query string on the fly and extracts the value of the specified URL argument. Nginx never tries to parse all the URL arguments beforehand. Instead, it scans the whole URL query string for a particular argument in the “get handler” every time that argument is requested by reading the corresponding $arg_xxx variable.
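
One way to observe this on-the-fly scanning is to overwrite $args between two reads of the same $arg_xxx variable. The sketch below assumes the ngx_echo module and port 8080 as in the other examples; the variable $orig_a is a hypothetical user variable introduced only for this demonstration.

```nginx
location /test {
    # First read: the "get handler" scans the original query string.
    set $orig_a $arg_a;

    # Replace the query string stored in the core.
    set $args "a=5";

    echo "original a: $orig_a";
    # Second read: the "get handler" scans the new query string.
    echo "a: $arg_a";
}

# $ curl 'http://localhost:8080/test?a=3'
# original a: 3
# a: 5
```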

Value containers for caching & ngx_map

Some Nginx variables choose to use their value containers as a data cache when the “get handler” is configured. In this setting, the “get handler” is run only once, i.e., at the first time the variable is read, which reduces overhead when the variable is read multiple times during its lifetime.

map $args $foo {
  default	0;
  debug		1;
}

server {
  listen 8080;
  
  location /test {
    set $orig_foo $foo;
    set $args debug;
    
    echo "original foo: $orig_foo";
    echo "foo: $foo";
  }
}

Nginx’s map directive is used to define a “mapping” relationship between two Nginx variables, or in other words, a “function relationship”. Here in this example, we use the map directive to define the “mapping” relationship between user variable $foo and built-in variable $args. When using the mathematical function notation, y = f(x), our $args variable is effectively the “independent variable”, x, while $foo is the “dependent variable”, y. That is, the value of $foo depends on the value of $args, or rather, we map the value of $args onto the $foo variable (in some way). Therefore, we obtain the following complete mapping rule in this example: if the value of $args is debug, variable $foo gets the value 1; otherwise $foo gets the value 0. So essentially, this is a conditional assignment to the variable $foo.

$ curl 'http://localhost:8080/test'
original foo: 0
foo: 0

The first output line indicated that the value of $orig_foo is 0, which is exactly what we expected: the original request does not take a URL query string, so the initial value of $args is empty, leading to the 0 initial value of $foo, according to the “default” condition in the mapping rule.

But surprisingly, the second output line indicates that the final value of $foo is still 0, even after we overwrite $args with the value debug. The reason is pretty simple: the first time variable $foo is read, its value computed by ngx_map’s “get handler” is cached in its value container. We already learned that Nginx modules may choose to use the value container of a variable they create as a data cache for its “get handler”. Evidently, the ngx_map module considers the mapping computation between variables expensive enough to cache the result automatically, so that the next time the same variable is read within the lifetime of the current request, Nginx can just return the cached result without invoking the “get handler” again. We can verify this with:

$ curl 'http://localhost:8080/test?debug'
original foo: 1
foo: 1

The map directive is actually a unique example, because it not only registers a “get handler” for the user variable, but also allows the user to define the computing rule of that “get handler” directly in the Nginx configuration file. Meanwhile, it must be made clear that not all variables using a “get handler” cache the result. For instance, the $arg_xxx variable does not use its value container at all.

Side note for use contexts of directives

We should note that the map directive is put outside the server configuration block, that is, it is defined directly within the outermost http configuration block.

Every configuration directive does have a pre-defined set of use contexts in the configuration file. When in doubt, always refer to the corresponding documentation for exact use contexts of a particular directive.
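
As a rough sketch of what use contexts mean in practice, the layout below places map directly inside the http block (its only allowed context) and set inside a location block (set is allowed in server, location, and if blocks). The variable $bar is hypothetical, and the events block and other top-level settings are omitted for brevity.

```nginx
http {
    # map may only appear directly inside the http block.
    map $args $foo {
        default 0;
        debug   1;
    }

    server {
        listen 8080;

        location /test {
            # set may appear in server, location, and if blocks.
            set $bar 1;
            echo "foo: $foo, bar: $bar";
        }
    }
}
```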

Lazy evaluation of variable values

We have learned how the map directive works. It is the “get handler” that performs the value computation and the related assignment, and the “get handler” will not run at all unless the corresponding user variable is actually read. Therefore, for requests that never access that variable, no useless computation is involved.

The technique that postpones that value computation off to the point where the value is actually needed is called “lazy evaluation” in the computing world. In contrast, it is much more common to see “eager evaluation”.
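
A small sketch of lazy evaluation with map, assuming the ngx_echo module as in the other examples. The locations /lazy and /read are hypothetical names used only here.

```nginx
map $args $foo {
    default 0;
    debug   1;
}

server {
    listen 8080;

    location /lazy {
        # $foo is never read while serving this location, so the
        # "get handler" registered by map never runs for these requests.
        echo "hello";
    }

    location /read {
        # Reading $foo here is what triggers the mapping computation.
        echo "foo: $foo";
    }
}
```

Requests to /lazy therefore pay nothing for the map block, however many entries it contains.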

Variable in subrequests

A detour to subrequest

You might have assumed that the “requests” mentioned so far are just HTTP requests initiated from the client side. In fact, there are two kinds of “requests” in the Nginx world. One is called main requests, and the other is called subrequests.

Main requests are those initiated externally by HTTP clients, whereas subrequests are a special kind of request initiated from within the Nginx core. But please do not confuse subrequests with the HTTP requests created by the ngx_proxy module!

Subrequests may look very much like HTTP requests in appearance. Their implementation, however, has nothing to do with either the HTTP protocol or any kind of socket communication. A subrequest is an abstract invocation for decomposing the task of the main request into smaller “internal requests” that can be served independently by multiple different location blocks, either in series or in parallel. Subrequests can also be recursive: any subrequest can initiate more subrequests, targeting other location blocks or even the current location itself. According to Nginx’s terminology, if request A initiates a subrequest B, then A is called the “parent request” of B.

location /main {
  echo_location /foo;
  echo_location /bar;
}

location /foo {
  echo foo;
}

location /bar {
  echo bar;
}

$ curl 'http://localhost:8080/main'
foo
bar

The subrequests initiated by echo_location always run sequentially, in their literal order in the configuration file. The response bodies of these two subrequests are concatenated in their running order to form the final response body of their parent request.

It should be noted that communication between location blocks via subrequests is limited to the same server block (i.e., the same virtual server configuration).

Independent variable container in subrequests

Variables with the same name in a parent request and a subrequest will generally not interfere with each other: the main request and each of its subrequests own separate copies of the variable containers.
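
The independence of these containers can be sketched with echo_location, assuming the ngx_echo module as in the other examples. The variable $var and the locations below are illustrative.

```nginx
location /main {
    set $var main;

    # Each subrequest gets its own copy of the variable containers.
    echo_location /foo;
    echo_location /bar;

    # Still "main": assignments in /foo and /bar did not leak back.
    echo "main: $var";
}

location /foo {
    set $var foo;
    echo "foo: $var";
}

location /bar {
    set $var bar;
    echo "bar: $var";
}

# $ curl 'http://localhost:8080/main'
# foo: foo
# bar: bar
# main: main
```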

Shared variable containers among requests

Subrequests initiated by certain Nginx modules do share variable containers with their parent requests, like those initiated by 3rd-party module ngx_auth_request.

location /main {
  set $var main;
  auth_request /sub;
  echo "main: $var";
}

location /sub {
  set $var sub;
  echo "sub: $var";
}

$ curl "http://localhost:8080/main"
main: sub

Obviously, the value change of $var in the subrequest to /sub does affect the main request to /main. Thus the variable container of $var is indeed shared between the main request and the subrequest created by the ngx_auth_request module. The auth_request directive discards the response body of the subrequest it manages and only checks the subrequest’s response status code. When the status code looks good, like 200, auth_request allows Nginx to continue processing the main request; otherwise it immediately aborts the main request by returning a 403 error page. In this example, the subrequest to /sub simply returns the 200 response implicitly created by the echo directive in /sub.

Even though sharing variable containers among the main request and all its subrequests could make bidirectional data exchange easier, it could also lead to unexpected subtle issues that are hard to debug in real-world configurations. Because users often forget that a variable with the same name is actually used in some deeply embedded subrequest and just use it for something else in the main request, this variable could get unexpectedly modified during processing.

Such bad side effects make many 3rd-party modules like ngx_echo, ngx_lua and ngx_srcache choose to disable the variable sharing behaviour for subrequests by default.

Built-in variables in subrequests

Built-in variables sensitive to the subrequest context

When reading $args in a subrequest, its “get handler” should naturally return the query string for the subrequest.

location /main {
  echo "main args: $args";
  echo_location /sub "a=1&b=2";
}

location /sub {
  echo "sub args: $args";
}

$ curl "http://localhost:8080/main?c=3"
main args: c=3
sub args: a=1&b=2

It is clear that when $args is read in the main request (to /main), its value is the URL query string of the main request; whereas when in the subrequest (to /sub), it is the query string of the subrequest. This behaviour indeed matches our intuition.

Built-in variables for main requests only

Unfortunately, not all built-in variables are sensitive to the context of subrequests. Several built-in variables always act on the main request even when they are used in a subrequest. The built-in variable $request_method is such an exception.

Whenever $request_method is read, we always get the request method name (such as GET and POST) of the main request, no matter whether the current request is a subrequest or not.

location /main {
  echo "main method: $request_method";
  echo_location /sub;
}

location /sub {
  echo "sub method: $request_method";
}

Now, let’s do a POST request to /main:

$ curl --data hello "http://localhost:8080/main"
main method: POST
sub method: POST

Here we use the --data option of the curl utility to specify the POST request body; this option also makes curl use the POST method for the request. The result turns out as expected: the variable $request_method evaluates to the main request’s method name, POST, despite its use in a GET subrequest.

To get the actual method name of the current request, whether it is a subrequest or not, we can turn to the built-in variable $echo_request_method provided by the ngx_echo module.
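
A sketch of the same test using $echo_request_method in place of $request_method, assuming the ngx_echo module and the same /main and /sub locations:

```nginx
location /main {
    echo "main method: $echo_request_method";
    echo_location /sub;
}

location /sub {
    # Reports the method of the current (sub)request; echo_location
    # issues its subrequest with the GET method.
    echo "sub method: $echo_request_method";
}

# $ curl --data hello 'http://localhost:8080/main'
# main method: POST
# sub method: GET
```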

Variable container sharing and value caching together

map $uri $tag {
  default	0;
  /main 	1;
  /sub		2;
}

...
location /main {
  auth_request /sub;
  echo "main tag: $tag";
}
location /sub {
  echo "sub tag: $tag";
}

$ curl "http://localhost:8080/main"
main tag: 2

It works like this: the $tag variable was first read in the subrequest to /sub, where the “get handler” registered by map computed the value 2 for $tag, and that value was cached in the value container of $tag from then on. Because the parent request shares the same container with the subrequest created by auth_request, when the parent request read $tag later, the cached value 2 was returned directly.

From this example, we can conclude that it is hardly ever a good idea to enable variable container sharing in subrequests.

林宏

Frank Lin, PhD
