Why doesn't lighttpd always generate ETags?
- Saturday September 20 2014
- code-review http
ETags are a wonderful part of the HTTP standard. You can read about them in RFC 2616 and on Wikipedia. An ETag is an optional opaque identifier returned by an HTTP server to identify a version of a resource. Servers generate the ETag of a file from a piece of data that changes when the file does, such as the file's modification time. Clients can send the ETag back to the server in the If-None-Match header of later requests for the same resource. The server compares the ETag provided by the client against the current ETag of the resource. If the ETags match, the server can respond with a status line of 304 Not Modified. The server omits the actual content of the resource in the body of the response since the client has it already.
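The exchange can be sketched from the server's side. This is a minimal model and not how lighttpd implements it: the function name respond and the tuple it returns are made up for illustration, and it only compares a single ETag (no lists, no "*", no weak validators).

```python
def respond(current_etag, body, if_none_match=None):
    """Return (status, headers, body) for a GET of one resource.

    Hypothetical helper for illustration only.
    """
    headers = {"ETag": current_etag}
    if if_none_match is not None and if_none_match == current_etag:
        # The client's copy is current: send the headers but no body.
        return 304, headers, b""
    return 200, headers, body


# First request: the client has no ETag yet, so the full body comes back.
status, headers, body = respond('"3912221703"', b"Hello ETags!\n")

# Second request: the client echoes the ETag and gets an empty 304.
status2, _, body2 = respond('"3912221703"', b"Hello ETags!\n", headers["ETag"])

print(status, status2, len(body2))
```

The saving is the body: on the second request only the status line and headers cross the network.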
This is particularly excellent if you have a process that needs to take some action when a resource changes. The process can poll the resource at some frequency and only actually get the new copy of the resource when it changes. As you may have noticed, I have been working with the pickle format recently. In my case, a Python process periodically writes a new pickled file into a part of the filesystem that is served by the web server lighttpd. On Ubuntu 12.04 LTS, the lighttpd package from the repositories serves requests with ETags by default.
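A polling client along these lines might look like the sketch below. It uses only the standard library; the names fetch_if_changed and poll, the opener parameter (there so the function can be exercised without a network), and the 60 second interval are my own choices for illustration. Note that urllib reports a 304 response by raising HTTPError.

```python
import time
import urllib.error
import urllib.request


def fetch_if_changed(url, etag=None, opener=urllib.request.urlopen):
    """Return (etag, body); body is None when the server answers 304."""
    request = urllib.request.Request(url)
    if etag is not None:
        # Ask the server to skip the body if our copy is still current.
        request.add_header("If-None-Match", etag)
    try:
        with opener(request) as response:
            return response.headers.get("ETag"), response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return etag, None
        raise


def poll(url, handle, interval=60):
    """Call handle(body) each time the resource changes. Runs forever."""
    etag = None
    while True:
        etag, body = fetch_if_changed(url, etag)
        if body is not None:
            handle(body)
        time.sleep(interval)
```

If the server never sends an ETag, etag stays None and every poll downloads the whole file again.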
ETag generation is part of the core of lighttpd. If your distribution doesn't enable ETags by default in lighttpd, you can do so by setting the following value in your lighttpd.conf.
# This configuration file is usually located at /etc/lighttpd/lighttpd.conf on Debian and Ubuntu machines.
static-file.etags = "enable"
The missing ETags
For example, using curl to retrieve a file from a server on my home network returns the resource with an ETag. The -D - flag causes curl to display the HTTP headers.
ericu@ericu-desktop:~$ curl -D - http://corei7/~ericu/example.txt
HTTP/1.1 200 OK
Content-Type: text/plain
Accept-Ranges: bytes
ETag: "3912221703"
Last-Modified: Sat, 20 Sep 2014 16:37:00 GMT
Content-Length: 13
Date: Sat, 20 Sep 2014 16:37:26 GMT
Server: lighttpd/1.4.28

Hello ETags!
The ETag header in this case has a value of "3912221703". The double quotes are actually part of the header value. I executed touch example.txt on the server to change the modified time of the file. Then I performed the same request again.
ericu@ericu-desktop:~$ curl -D - http://corei7/~ericu/example.txt
HTTP/1.1 200 OK
Content-Type: text/plain
Accept-Ranges: bytes
ETag: "3912220878"
Last-Modified: Sat, 20 Sep 2014 16:39:09 GMT
Content-Length: 13
Date: Sat, 20 Sep 2014 16:39:11 GMT
Server: lighttpd/1.4.28

Hello ETags!
The ETag now has a value of "3912220878" because lighttpd generated a new ETag when the file changed.
Interestingly, I found that this did not work with pickled data. This is what I got when I serialized the string "Hello ETags!" using the pickle module into a file called example.pkl.
ericu@ericu-desktop:~$ curl -D - http://corei7/~ericu/example.pkl
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Accept-Ranges: bytes
Content-Length: 20
Date: Sat, 20 Sep 2014 16:41:14 GMT
Server: lighttpd/1.4.28

S'Hello ETags!'
p0
The content is returned but there is no ETag header. This means I cannot use the pattern of having a process poll for a change in the file. Polling would result in unacceptably high bandwidth usage, as the file would be transferred completely on each request.
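For reference, a file like example.pkl can be produced along the lines of the sketch below. The helper names write_pickle and read_pickle and the temporary directory are assumptions for illustration; the real target would be a directory served by lighttpd. Protocol 0 is the text-based pickle format. The response body shown earlier looks like Python 2 output, and Python 3 writes slightly different bytes for the same string.

```python
import os
import pickle
import tempfile


def write_pickle(path, value):
    """Serialize value to path using the text-based protocol 0."""
    with open(path, "wb") as handle:
        pickle.dump(value, handle, protocol=0)


def read_pickle(path):
    """Load a single pickled value back from path."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Written to a temporary directory here so the sketch runs anywhere.
target = os.path.join(tempfile.mkdtemp(), "example.pkl")
write_pickle(target, "Hello ETags!")
print(read_pickle(target))
```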
Finding the problem
I initially poked around in the lighttpd configuration. I assumed I just needed to flip some configuration value to cause lighttpd to always generate ETags. Unfortunately, the only thing I was able to find in my search was numerous individuals reporting problems with ETags and mod_compress. No change to lighttpd.conf seemed to do what I wanted.
Giving up on that avenue, I decided to pull the source code for lighttpd. On Ubuntu 12.04 LTS the version is 1.4.28. You can download the source here. Using find, xargs, and grep I located the section of code responsible for ETag generation in src/mod_staticfile.c. Here is the relevant excerpt.
440 if (NULL == array_get_element(con->response.headers, "Content-Type")) {
441     if (buffer_is_empty(sce->content_type)) {
442         /* we are setting application/octet-stream, but also announce that
443          * this header field might change in the seconds few requests
444          *
445          * This should fix the aggressive caching of FF and the script download
446          * seen by the first installations
447          */
448         response_header_overwrite(srv, con, CONST_STR_LEN("Content-Type"), CONST_STR_LEN("application/octet-stream"));
449
450         allow_caching = 0;
451     } else {
452         response_header_overwrite(srv, con, CONST_STR_LEN("Content-Type"), CONST_BUF_LEN(sce->content_type));
453     }
454 }
455
456 if (con->conf.range_requests) {
457     response_header_overwrite(srv, con, CONST_STR_LEN("Accept-Ranges"), CONST_STR_LEN("bytes"));
458 }
459
460 if (allow_caching) {
461     if (p->conf.etags_used && con->etag_flags != 0 && !buffer_is_empty(sce->etag)) {
462         if (NULL == array_get_element(con->response.headers, "ETag")) {
463             /* generate e-tag */
464             etag_mutate(con->physical.etag, sce->etag);
465
466             response_header_overwrite(srv, con, CONST_STR_LEN("ETag"), CONST_BUF_LEN(con->physical.etag));
467         }
468     }
On line 460, if allow_caching is non-zero, an ETag is generated if none already exists. However, allow_caching is set to zero on line 450. The if statement on line 440 checks for the absence of a Content-Type header in the response. On line 441, if the content type of the file is empty, a default of application/octet-stream is used and caching is disabled. The result of this is simple: if lighttpd doesn't know the content type of a file, it won't generate an ETag.
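The control flow of the excerpt can be paraphrased in Python. This is my own simplified model, not lighttpd code: the function name static_file_headers and its parameters are invented, and it leaves out the etag_flags check, the check for an existing header, and whatever etag_mutate does to the value.

```python
def static_file_headers(content_type, etag, etags_enabled=True):
    """Rough paraphrase of the excerpt's decision about the ETag header."""
    headers = {}
    allow_caching = True
    if not content_type:
        # Lines 441-450: unknown type, so fall back to a generic type
        # and disable caching.
        headers["Content-Type"] = "application/octet-stream"
        allow_caching = False
    else:
        # Line 452: the type is known, so caching stays allowed.
        headers["Content-Type"] = content_type
    if allow_caching and etags_enabled and etag:
        # Lines 460-466: only reached while caching is still allowed.
        headers["ETag"] = etag
    return headers


print(static_file_headers("text/plain", '"3912221703"'))
print(static_file_headers("", '"3912221703"'))
```

The second call models example.pkl: the file has an ETag internally, but the header is never emitted because the content type was unknown.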
I am unsure why lighttpd was implemented in this fashion. I can't find anything in RFC 2616 indicating that this is the required behavior. The comments on lines 442-447 indicate that this may be some sort of workaround for caching problems in Firefox.
Fixing the problem
Armed with the knowledge that lighttpd needs to know the content type of a file to generate an ETag, I set out to tell lighttpd the content type of my pickle files. I found the configuration value mimetype.assign in lighttpd's documentation. It was absent from my lighttpd.conf, so I set it as follows.
mimetype.assign = ( ".pkl" => "application/pickle" )
However, this didn't work. Lighttpd refused to start. On Ubuntu the following line is present in lighttpd.conf.
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
This means that lighttpd runs the script and reads configuration from its standard output. Unsurprisingly, this script generates the configuration value mimetype.assign, which would explain why my own assignment conflicted with it. I didn't bother understanding all of the script, but it reads from /etc/mime.types if the file exists. I edited that file and added the following line to the end.
application/pickle pkl pickle
This defines files ending in .pkl or .pickle to have a MIME type of application/pickle. Next, lighttpd must be restarted so that it picks up the new configuration. Then I performed my curl request again for the pickle file.
ericu@ericu-desktop:~$ curl -D - http://corei7/~ericu/example.pkl
HTTP/1.1 200 OK
Content-Type: application/pickle
Accept-Ranges: bytes
ETag: "3635400004"
Last-Modified: Sat, 20 Sep 2014 16:41:03 GMT
Content-Length: 20
Date: Sat, 20 Sep 2014 17:07:31 GMT
Server: lighttpd/1.4.28

S'Hello ETags!'
p0
Success! Lighttpd now generates an ETag for my files ending in .pkl.
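To make the /etc/mime.types step concrete, here is a rough Python sketch of the kind of transformation a script like create-mime.assign.pl would need to perform. It is not a port of that script, which I did not read closely. The function name mime_assign_from_lines and the exact output layout are assumptions.

```python
def mime_assign_from_lines(lines):
    """Build mimetype.assign text from mime.types-style lines.

    Each input line is a MIME type followed by zero or more extensions.
    """
    entries = []
    seen = set()
    for line in lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        mime_type, *extensions = line.split()
        for ext in extensions:
            # Keep only the first mapping for any extension.
            if ext not in seen:
                seen.add(ext)
                entries.append('".%s" => "%s",' % (ext, mime_type))
    return "mimetype.assign = (\n" + "\n".join(entries) + "\n)"


print(mime_assign_from_lines(["application/pickle pkl pickle"]))
```

For the line I added, this yields one entry for .pkl and one for .pickle, both mapped to application/pickle.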