Fixing what isn't broke
- Tuesday March 22 2016
- html web javascript
Starting on the morning of March 9, 2016, I noticed a peculiar pattern in requests to my employer's web application. Suddenly, HTTP POST requests to the web application were failing. The codebase for this web application is fast moving, but some of the requests were to code paths that hadn't been touched in ages.
I managed to identify all of the failing POST requests in the logs from the application. The first thing I noticed was that over 50 of the requests were from a single user on a single page. I found that if the request to the server failed, the user was given no indication as to what happened, so the user likely just clicked the button many times. At least one user had already been frustrated by this regression.
I tried reproducing the problem by taking the same actions the user had, but everything worked fine for me. I noticed a few patterns in the requests that failed:
- All of the users were running Windows
- All of the POST requests were coming from web forms with the option to attach a file
- All of the users were using Firefox 45
The fact that the users were all running Windows turned out to have nothing to do with this problem. Firefox 45 had been released just the day before. I decided that Firefox 45 must have a bug and set out to track it down.
The "bug" in Firefox 45
Our production system logs all of the parameters in a request to the web application, with sensitive information scrubbed out. As a result I can see exactly what each user was placing into the web form before the POST action was made to the web server. Since our web application is Rack-based, the application code requires very little knowledge of the underlying details of HTTP and web standards. Uploaded files are always parsed into a Ruby hash object that you interact with just like a file. Without going into too much detail, this means you don't have to worry about parsing a request body in the multipart/form-data format as laid out in RFC 2388. Well, at least you don't normally have to.
Eventually I installed Firefox 45.0 myself and reproduced the problem. I also discovered that as long as I attached a file in the HTML form, the POST request worked as expected. Doubling back, I checked the production logs again. Sure enough, when POST requests from Firefox 45 included an attached file, the web application worked as expected.
Each successful request had parameters that looked like the following:
{"attachment"=> {:filename=>"0077894A7.pdf", :type=>"application/pdf", :name=>"attachment", :tempfile=>#<File:/tmp/RackMultipart20160322-5885-1qul7qf.pdf>, :head=> "Content-Disposition: form-data; name=\"attachment\"; filename=\"0077894A7.pdf\"\r\nContent-Type: application/pdf\r\n"}}
Each failing request had parameters that looked like the following:
{"attachment"=>""}
This immediately jumped out to me as wrong. How was the empty string making its way to our web application? At this point my suspicion shifted back to our web application failing to parse the request body properly.
Reproducing the problem
The normal way to have a user upload a file to a web application is to serve the user HTML that includes a <form> element containing an <input> element with a type attribute of file.
Something I've intuitively been aware of for years is how a browser handles this type of form. If the user selects a file, the request body sent to the server includes the file. If the user does not select a file, the request body still includes the field, but with an empty filename. As a result, most web application middleware checks the filename sent along with the POST request: if the filename is empty, it assumes no file was uploaded.
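To make that rule concrete, here is a small sketch of how a parser might classify a single part from its Content-Disposition header. The helper name classify_part and its three return values are hypothetical, not Rack's API, and the regular expression ignores escaped quotes, so treat it as an illustration only.

```ruby
# Hypothetical helper, not Rack's API: classify one multipart part by its
# Content-Disposition header. Escaped quotes inside filenames are not handled.
def classify_part(content_disposition)
  match = content_disposition.match(/;\s*filename="([^"]*)"/)
  if match.nil?
    :text_field # no filename parameter: treated as an ordinary form value
  elsif match[1].empty?
    :no_file    # filename="": a file input with nothing selected
  else
    :file       # a real upload
  end
end

classify_part('form-data; name="attachment"; filename="0077894A7.pdf"') # => :file
classify_part('form-data; name="attachment"; filename=""')              # => :no_file
classify_part('form-data; name="attachment"')                           # => :text_field
```

Note that the last case cannot be told apart from a regular text input that was simply left blank.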
I developed a test application to demonstrate the problem using the Sinatra gem. You can get the source for that application here. While developing the test application, I made another discovery: to reproduce the problem in isolation, I had to use the JavaScript FormData object along with an XMLHttpRequest. Just creating a regular HTML form and submitting it did not reproduce the problem.
In case you're not familiar with Ruby, here are the basics of running the test application from your favorite shell:

#gem install sinatra
#gem install pry
#ruby form_attachment_demo.rb

The application is then listening on http://localhost:4567.
Grabbing the raw request body
To investigate this problem further I needed to grab the complete request body as sent to the web server. You can use a program like tcpdump to do this, but most modern browsers have the option to export a browsing session as a HAR file. This allows all the request and response data to be recorded and exported as JSON.
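As a sketch of the extraction step, the Ruby script below reads a HAR file and returns the URL and body of every POST request it recorded. It assumes the HAR 1.2 layout, where each request sits under log.entries[].request and a captured body lives in postData.text; browsers do not always record a body, so entries without one are skipped. The function name post_bodies and the script and file names in the usage comment are illustrative.

```ruby
require 'json'

# Return the URL and body of every POST request recorded in a HAR file.
# Assumes the HAR 1.2 layout: log.entries[].request.postData.text
def post_bodies(har_json)
  entries = JSON.parse(har_json).fetch('log').fetch('entries')
  entries.map do |entry|
    request = entry.fetch('request')
    next unless request['method'] == 'POST'

    body = request.dig('postData', 'text')
    next if body.nil? # the browser did not record a body for this request

    { url: request['url'], body: body }
  end.compact
end

# Hypothetical usage: ruby har_post_bodies.rb exported-session.har
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  post_bodies(File.read(ARGV[0])).each do |req|
    puts "POST #{req[:url]}"
    puts req[:body]
    puts
  end
end
```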
After exporting a HAR file, you can use anything that can parse JSON to extract the request body from the request you care about. Using my test application, I captured the following from the application log and the HAR file when using Firefox 44:
I, [2016-03-22T20:11:45.046923 #7199] INFO -- : User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0
I, [2016-03-22T20:11:45.046977 #7199] INFO -- : Params: {}
::1 - - [22/Mar/2016:20:11:45 -0500] "POST / HTTP/1.1" 200 - 0.0220
The request body from the HAR file is as follows:
-----------------------------39758295915015798572665205
Content-Disposition: form-data; name="attachment"; filename=""
Content-Type: application/octet-stream

-----------------------------39758295915015798572665205--
This is what I consider to be the expected behavior. The request body includes the attachment form element, but with an empty filename. The implementation of the rack gem (version 1.6.4) contains the following in lib/rack/multipart/parser.rb on line 228:
if filename == ""
  # filename is blank which means no file has been selected
  return
Looking at the log file, we can see that the parameters hash passed to the web application from the Rack middleware is empty. So Rack is able to identify the absence of a file and does not include the attachment key in the request parameters.
When I used Firefox 45, I got a different result:
I, [2016-03-22T20:09:11.318775 #7076] INFO -- : User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
I, [2016-03-22T20:09:11.318883 #7076] INFO -- : Params: {"attachment"=>""}
::1 - - [22/Mar/2016:20:09:11 -0500] "POST / HTTP/1.1" 200 - 0.0124
The parameters hash here contains an empty string where I would normally expect nothing at all. This is exactly the same problem I saw in my employer's web application. The request body from the HAR file is as follows:
-----------------------------2062120312297476491396361917
Content-Disposition: form-data; name="attachment"

-----------------------------2062120312297476491396361917--
The important thing to notice here is that the Content-Type line is missing entirely and there is no filename parameter at all, not even a blank one. Without a filename, the part looks exactly like an ordinary text field whose value happens to be empty, so Rack passes it through as a regular parameter. So what is really going on here is that Firefox 45 broke the FormData object. Due to this breakage, the rack middleware we use can no longer figure out which parameters to filter and suppress.
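To see why the two bodies end up so different, here is a deliberately tiny multipart/form-data parser. It is not Rack's parser; it only mimics the rack 1.6.4 rule quoted above (skip a part whose filename is blank) and otherwise treats a part without a filename as a text field. It assumes CRLF line endings, no nested multiparts, and empty part contents for the two browser cases, and it reuses one boundary for both bodies for brevity.

```ruby
# Deliberately tiny multipart/form-data parser, NOT Rack's implementation.
# Assumes CRLF line endings and ignores nested multiparts and quote escapes.
def parse_form_data(body, boundary)
  params = {}
  body.split("--#{boundary}").each do |chunk|
    # Skip the empty text before the first delimiter and the closing "--".
    next if chunk.empty? || chunk.start_with?('--')

    head, content = chunk.sub(/\A\r\n/, '').split("\r\n\r\n", 2)
    content = content.to_s.chomp("\r\n") # CRLF that precedes the next delimiter
    name = head[/[;\s]name="([^"]*)"/, 1]
    filename = head[/filename="([^"]*)"/, 1]

    if filename.nil?
      params[name] = content # no filename: an ordinary text field
    elsif filename.empty?
      next # mimics rack 1.6.4: blank filename means no file was selected
    else
      params[name] = { filename: filename, size: content.bytesize }
    end
  end
  params
end

boundary = '---------------------------39758295915015798572665205'
firefox44 = "--#{boundary}\r\n" \
            "Content-Disposition: form-data; name=\"attachment\"; filename=\"\"\r\n" \
            "Content-Type: application/octet-stream\r\n" \
            "\r\n\r\n" \
            "--#{boundary}--\r\n"
firefox45 = "--#{boundary}\r\n" \
            "Content-Disposition: form-data; name=\"attachment\"\r\n" \
            "\r\n\r\n" \
            "--#{boundary}--\r\n"

p parse_form_data(firefox44, boundary) # empty hash, like the Firefox 44 log
p parse_form_data(firefox45, boundary) # "attachment" maps to "", like the Firefox 45 log
```

The Firefox 45 part falls into the first branch simply because nothing in its headers marks it as a file input.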
What is really going on?
I consider this a regression introduced with Firefox 45, and it breaks many websites. It turns out I am not the only person to experience this. This post on Stack Overflow covers the same problem. A formal specification of the expected behavior can be found here.
This change appears to originate from an attempt to change the FormData object to report some files as being named "blob". This bug report shows that the problem of the empty string being sent to the server had been identified. Apparently it was not corrected before release. I do not understand the goal of such a change, but it is clearly not backward compatible. This seems to be a case of trying to fix something that was not broke.
Patching around this problem
Once I knew the problem was limited to Firefox 45, I had a couple of options for fixing the issue. The one I finally settled on was a patch to the rack gem. Most of the web forms in the application used form elements with a name like file_attachment, so I patched the rack gem to check for parameters with a name including attachment and a value of the empty string, and simply remove that key from the parameters hash. The patch also strips an empty upload with a content type of application/octet-stream, which is what the specification calls for when no file is selected. This addresses the problem with Firefox 45 in middleware rather than requiring the application code to understand it.
Here is my patch:
require 'set'
require 'active_support/all'
require 'rack'

module Rack
  module Multipart
    class Parser
      UPLOAD_HASH_KEYS = Set.new([:type, :name, :tempfile, :head]).freeze

      mattr_accessor :old_parse
      @@old_parse = Rack::Multipart::Parser.instance_method(:parse)

      # This was first seen in Firefox 45. The recommendation from a working group
      # is here
      # https://html.spec.whatwg.org/multipage/forms.html#constructing-form-data-set:the-input-element-7
      # The recommendation says that an HTML form file attachment with no file selected is to be uploaded as an empty
      # file with a content type of 'application/octet-stream'. What Firefox 45 actually does is
      # send up the empty string in the multipart POST. This code checks for and strips both.
      def strip_empty_attachments(h)
        h.select! do |k, v|
          if v.is_a?(String)
            # Firefox 45: a blank string under a key containing 'attachment'
            !(v.empty? && k.include?('attachment'))
          elsif v.is_a?(Hash)
            keys = v.keys
            keys.select! { |x| x.is_a?(Symbol) }
            keys = Set.new(keys)
            if UPLOAD_HASH_KEYS.subset?(keys)
              # Spec behavior: an empty upload typed application/octet-stream
              !(v.fetch(:tempfile).length == 0 && v.fetch(:type) == 'application/octet-stream')
            else
              # Nested parameters: recurse, but keep the outer key
              strip_empty_attachments(v)
              true
            end
          else
            true
          end
        end
        h
      end

      # Run the original parser, then strip empty attachments from its result
      def parse
        result = Rack::Multipart::Parser.old_parse.bind(self).call
        return result if !result.present?
        strip_empty_attachments(result)
      end
    end
  end
end
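Because the patch needs Rack and ActiveSupport loaded, one quick way to check the filtering rule on its own is a standalone copy of strip_empty_attachments that uses only the standard library. The logic is the same, except that it calls size rather than length so a StringIO can stand in for the upload tempfile; the parameter hash in the usage lines is made up for illustration.

```ruby
require 'set'
require 'stringio'

UPLOAD_HASH_KEYS = Set[:type, :name, :tempfile, :head].freeze

# Standalone copy of the patch's filtering rule, runnable without Rack.
def strip_empty_attachments(h)
  h.select! do |k, v|
    case v
    when String
      # Firefox 45 shape: a blank string under a key containing 'attachment'
      !(v.empty? && k.to_s.include?('attachment'))
    when Hash
      if UPLOAD_HASH_KEYS.subset?(Set.new(v.keys.grep(Symbol)))
        # Spec shape: an empty upload typed application/octet-stream
        !(v.fetch(:tempfile).size.zero? && v.fetch(:type) == 'application/octet-stream')
      else
        strip_empty_attachments(v) # recurse into nested params, keep the outer key
        true
      end
    else
      true
    end
  end
  h
end

# Hypothetical parameters covering each case
params = {
  'file_attachment' => '',                        # Firefox 45 shape: stripped
  'note' => '',                                   # not an attachment: kept
  'user' => { 'photo_attachment' => '' },         # nested: inner key stripped
  'upload' => { type: 'application/octet-stream', # empty spec-style upload: stripped
                name: 'upload', tempfile: StringIO.new(''), head: '' }
}
strip_empty_attachments(params)
p params # only "note" and an emptied "user" hash remain
```

Keeping the key check on the substring attachment is a deliberately narrow rule: a blank ordinary text field such as note passes through untouched.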