Fixing what isn't broke

Starting on the morning of March 9, 2016, I noticed a peculiar pattern in requests to my employer's web application. Suddenly, HTTP POST requests to the web application were failing. The codebase for this web application is fast moving, but some of the requests were to code paths that hadn't been touched in ages.

I managed to identify all of the failing POST requests in the logs from the application. The first thing I noticed was over 50 of the requests were from a single user on a single page. I found that if the request to the server failed the user was given no indication as to what happened. As a result the user likely just clicked the button many times. So at least one user had already been frustrated by this regression.

I tried reproducing the problem by taking the same actions the user had, but everything worked fine for me. I noticed a few patterns in the requests that failed:

  • All of the users were running Windows
  • All of the POST requests were coming from web forms with the option to attach a file
  • All of the users were using Firefox 45

The fact that the users were all running Windows turned out to have nothing to do with this problem. Firefox 45 had been released just the day before. I decided that Firefox 45 must have a bug and set out to track it down.

The "bug" in Firefox 45

Our production system logs all of the parameters in a request to the web application with sensitive information scrubbed out. As a result I can see exactly what each user was placing into the web form before the POST action was made to the web server. Since our web application is Rack-based, the application code requires very little knowledge of the underlying details of HTTP and web standards. Uploaded files are always parsed into a Ruby hash object that you interact with just like a file. Without going into too much detail, this means you don't have to worry about parsing a request body in the multipart/form-data format as laid out in RFC 2388. Well, at least you don't normally have to.

Eventually I installed Firefox 45.0 myself and reproduced the problem. I also discovered that as long as I always attached a file in the HTML form, the POST request worked as expected. Doubling back, I checked the production logs again. Sure enough, if the POST requests were made with an attached file from Firefox 45, the web application worked as expected.

Each successful request had parameters that looked like the following

{"attachment"=>
  {:filename=>"0077894A7.pdf",
   :type=>"application/pdf",
   :name=>"attachment",
   :tempfile=>#<File:/tmp/RackMultipart20160322-5885-1qul7qf.pdf>,
   :head=>
    "Content-Disposition: form-data; name=\"attachment\"; filename=\"0077894A7.pdf\"\r\nContent-Type: application/pdf\r\n"}}

Each failing request had parameters that looked like the following

{"attachment"=>""}

This immediately jumped out to me as wrong. How was the empty string making its way to our web application? At this point my suspicions jumped back that our web application was failing to parse the request body properly.

Reproducing the problem

The normal way to have a user upload a file to a web application is to serve the user HTML that includes a <form> element with another <input> element with a type attribute of file.

Something I've intuitively been aware of for years is how a browser handles this type of form. If the user selects a file, the request body sent to the server includes the file. If the user does not select a file, the request body still includes the form field, but with an empty filename. What this means is that most web application middleware checks for a non-empty filename in the parameters sent along with the POST request. If no filename is present, it assumes no file was uploaded.

I developed a test application to demonstrate the problem using the Sinatra gem. You can get the source for that application here. In developing the test application, I made another discovery. In order to reproduce the problem in isolation I had to use the JavaScript FormData object along with an XMLHttpRequest. Just creating a regular HTML form and submitting it did not reproduce the problem.

In case you're not familiar with Ruby, here are the basics of running the test application from your favorite shell

#gem install sinatra
#gem install pry
#ruby form_attachment_demo.rb

The application is then listening on http://localhost:4567

Grabbing the raw request body

To investigate this problem further I needed to grab the complete request body as sent to the web server. You can use a program like tcpdump to do this, but most modern browsers have the option to export a browsing session as a HAR file. This allows all the request and response data to be recorded and exported as JSON.
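Since a HAR file is plain JSON, pulling the POST bodies out of one takes only a few lines of Ruby. What follows is a minimal sketch, not the script I actually used: the method name post_bodies_from_har and the hand-written SAMPLE_HAR fragment are made up for illustration. The only thing it relies on is the standard HAR layout, where each entry under log.entries carries its body in request.postData.text.

```ruby
require 'json'

# Return the raw body of every POST request recorded in a HAR document.
# HAR layout: log -> entries[] -> request -> postData -> text
def post_bodies_from_har(har_json)
  har = JSON.parse(har_json)
  bodies = har.fetch('log').fetch('entries').map do |entry|
    request = entry.fetch('request')
    next unless request['method'] == 'POST'
    request.dig('postData', 'text')
  end
  bodies.compact # drop non-POST entries and POSTs without a body
end

# Hypothetical, hand-written HAR fragment standing in for a real export.
SAMPLE_HAR = JSON.generate({
  'log' => {
    'entries' => [
      { 'request' => { 'method' => 'GET', 'url' => 'http://localhost:4567/' } },
      { 'request' => {
        'method' => 'POST',
        'url' => 'http://localhost:4567/',
        'postData' => {
          'mimeType' => 'multipart/form-data; boundary=xyz',
          'text' => "--xyz\r\nContent-Disposition: form-data; name=\"attachment\"\r\n\r\n\r\n--xyz--\r\n"
        }
      } }
    ]
  }
})

puts post_bodies_from_har(SAMPLE_HAR)
```

With a real export you would pass File.read on the saved HAR file instead of SAMPLE_HAR.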

After exporting a HAR file, you can use anything that can parse JSON to extract the request body from the request you care about. Using my test application, I captured the following from the application log and the HAR file when using Firefox 44.

I, [2016-03-22T20:11:45.046923 #7199]  INFO -- : User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0
I, [2016-03-22T20:11:45.046977 #7199]  INFO -- : Params: 
{}
::1 - - [22/Mar/2016:20:11:45 -0500] "POST / HTTP/1.1" 200 - 0.0220

The request body from the HAR is as follows

-----------------------------39758295915015798572665205
Content-Disposition: form-data; name="attachment"; filename=""
Content-Type: application/octet-stream


-----------------------------39758295915015798572665205--

This is what I consider to be the expected behavior. The request body includes the attachment form element, but with an empty filename. The implementation of the rack gem (version 1.6.4) contains the following in lib/rack/multipart/parser.rb on line 228

  if filename == ""
    # filename is blank which means no file has been selected
    return

Looking at the log file, we can see that the parameters hash passed to the web application from the rack middleware is empty. So Rack is able to identify the absence of a file and does not include the attachment key in the request parameters.

When I used Firefox 45 I got a different result

I, [2016-03-22T20:09:11.318775 #7076]  INFO -- : User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
I, [2016-03-22T20:09:11.318883 #7076]  INFO -- : Params: 
{"attachment"=>""}
::1 - - [22/Mar/2016:20:09:11 -0500] "POST / HTTP/1.1" 200 - 0.0124

The parameters hash here contains an empty string where I would normally expect nothing at all. This is the same exact problem I saw in my employer's web application. The request body from the HAR file is as follows

-----------------------------2062120312297476491396361917
Content-Disposition: form-data; name="attachment"


-----------------------------2062120312297476491396361917--

The important thing to notice here is that the Content-Type line is missing entirely and there is nothing indicating that filename is the blank string. So what is really going on here is that Firefox 45 broke the FormData object. Due to this breakage, the Rack middleware we use can no longer figure out which parameters to filter and suppress.
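To make the difference concrete, here is a deliberately simplified stand-in for a multipart parser. It is not Rack's real implementation and only copes with tiny bodies like the two captured above. The method name parse_multipart and the boundary string demo-boundary are invented for this sketch. It applies the same rule as the Rack excerpt: an empty filename means "no file selected", while a part with no filename attribute at all is treated as an ordinary text field.

```ruby
# Simplified stand-in for a multipart/form-data parser (NOT Rack's code).
def parse_multipart(body, boundary)
  params = {}
  body.split("--#{boundary}").each do |part|
    head, content = part.split("\r\n\r\n", 2)
    next if content.nil? # preamble or the closing "--" marker

    name = head[/\bname="([^"]*)"/, 1]
    next if name.nil?

    filename = head[/\bfilename="([^"]*)"/, 1] # nil when the attribute is absent
    content = content.sub(/\r\n\z/, '')        # trim the CRLF before the next boundary

    if filename == ''
      next # blank filename: no file was selected, so drop the field
    elsif filename
      params[name] = { filename: filename, body: content }
    else
      params[name] = content # no filename attribute: plain text field
    end
  end
  params
end

BOUNDARY = 'demo-boundary' # hypothetical boundary, shorter than a real one

# Shape of the Firefox 44 body: empty filename plus a Content-Type header.
FIREFOX_44_BODY =
  "--#{BOUNDARY}\r\n" \
  "Content-Disposition: form-data; name=\"attachment\"; filename=\"\"\r\n" \
  "Content-Type: application/octet-stream\r\n" \
  "\r\n" \
  "\r\n" \
  "--#{BOUNDARY}--\r\n"

# Shape of the Firefox 45 body: no filename attribute, no Content-Type header.
FIREFOX_45_BODY =
  "--#{BOUNDARY}\r\n" \
  "Content-Disposition: form-data; name=\"attachment\"\r\n" \
  "\r\n" \
  "\r\n" \
  "--#{BOUNDARY}--\r\n"

p parse_multipart(FIREFOX_44_BODY, BOUNDARY) # empty hash: the field is dropped
p parse_multipart(FIREFOX_45_BODY, BOUNDARY) # "attachment" maps to an empty string
```

Under these assumptions the parser has no way to tell the Firefox 45 part apart from a text input that was left blank, which is why the empty string reaches the application.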

What is really going on?

Obviously, I feel this is a regression introduced with Firefox 45 that breaks many websites. It turns out I am not the only person to experience this. This post on Stack Overflow covers the same problem. A formal specification of the expected behavior can be found here.

This change appears to originate from an attempt to change the FormData object to report some files as being named "blob". This bug report details that the problem of the empty string being sent to the server was identified. Apparently it was not corrected before release. I do not understand the goal of such a change, but it is clearly not compatible. This seems to be a case of trying to fix something that was not broke.

Patching around this problem

Once I knew the problem was limited to Firefox 45, I had a couple of options in fixing the issue. The one I finally settled on was a patch to the rack gem. Most of the web forms in the application used form elements with a name like file_attachment. So I patched the rack gem to check for parameters with a name including attachment and a value of the empty string. It simply removes the key from the parameters hash. This addresses the problem with Firefox 45 in middleware rather than requiring the application code to understand it.

Here is my patch:

require 'active_support/all'
require 'rack'

module Rack
  module Multipart
    class Parser
      UPLOAD_HASH_KEYS = Set.new( [:type, :name , :tempfile, :head] ).freeze

      mattr_accessor :old_parse
      @@old_parse = Rack::Multipart::Parser.instance_method(:parse)

      # This was first seen in Firefox 45. The recommendation from a working group
      # is here
      #  https://html.spec.whatwg.org/multipage/forms.html#constructing-form-data-set:the-input-element-7
      # The recommendation says that an HTML form file attachment with no file selected is to be uploaded as an empty
      # file with a content type of 'application/octet-stream'. What Firefox 45 actually does is
      # send the empty string in the multipart POST. This code checks for and strips both.

      def strip_empty_attachments(h)
        h.select! do |k,v|
          if v.is_a?(String)
            !(v.empty? && k.include?('attachment'))
          elsif v.is_a?(Hash)
            keys = v.keys
            keys.select! { |x| x.is_a?(Symbol) }
            keys = Set.new(keys)
            if UPLOAD_HASH_KEYS.subset?(keys)
              !( v.fetch(:tempfile).length == 0 && v.fetch(:type) == 'application/octet-stream' )
            else
              strip_empty_attachments(v)
              true
            end
          else
            true            
          end        
        end
        h
      end 

      def parse
        result = Rack::Multipart::Parser.old_parse.bind(self).call
        return result if !result.present?

        strip_empty_attachments(result)
      end
    end
  end
end
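
If you want to see the effect of the filtering rule without loading Rack or ActiveSupport, the following standalone sketch applies roughly the same logic to plain hashes. It is a simplified re-statement for illustration, not the patch itself: it recognizes an upload hash by the presence of :tempfile and :type only, and it uses StringIO objects as stand-ins for the temp files Rack would create.

```ruby
require 'stringio'

# Simplified, standalone version of the filtering rule (assumption: an
# upload hash is any hash carrying both :tempfile and :type).
def strip_empty_attachments(params)
  params.reject! do |key, value|
    case value
    when String
      # Firefox 45 case: blank string under an attachment-like key
      value.empty? && key.to_s.include?('attachment')
    when Hash
      if value.key?(:tempfile) && value.key?(:type)
        # Spec case: zero-length file sent as application/octet-stream
        value[:tempfile].size.zero? && value[:type] == 'application/octet-stream'
      else
        strip_empty_attachments(value) # ordinary nested params: recurse
        false
      end
    else
      false
    end
  end
  params
end

params = {
  'title' => 'Quarterly report',
  'file_attachment' => '',
  'extra_attachment' => { type: 'application/octet-stream', tempfile: StringIO.new('') },
  'real_attachment' => { type: 'application/pdf', tempfile: StringIO.new('%PDF-') }
}

# Only 'title' and 'real_attachment' survive the filter.
puts strip_empty_attachments(params).keys
```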

Copyright Eric Urban 2016, or the respective entity where indicated