Node multipart/form-data Explained

I was recently working on a project that involved sending large amounts of data through a series of HTTP-based web services. After running through various scenarios to optimize the throughput of these services, we landed on using multipart/form-data to transmit data between them.

In tech interviews, I often ask candidates to explain the difference between the GET and POST HTTP verbs. Most candidates understand that GET is used for requesting information and uses the query string, while POST is used for submitting data and data is submitted via form data.

In all my years, I don't think I've had someone mention the content type used to encode form data... yet a web developer should have a working knowledge of these content types. So let's back up and first discuss the encoding content types.

Encoding Content Type

The encoding content types are defined in the W3C HTML Specification. They are typically set with the enctype attribute on a FORM HTML element.

As a developer, chances are pretty good that you've seen them and worked with them:

  • application/x-www-form-urlencoded
  • multipart/form-data

By default, application/x-www-form-urlencoded is used to submit standard form fields.

<form action="/update" method="post">  
  <input type="text" name="username" />
  <button type="submit">Submit</button>
</form>  

We often switch to multipart/form-data where we need to upload a file:

<form action="/update" method="post" enctype="multipart/form-data">
  <input type="text" name="username" />
  <input type="file" name="avatar" />
  <button type="submit">Submit</button>
</form>  

But what do the encoding content types do?

Here's what the W3C has to say:

The enctype attribute of the FORM element specifies the content type used to encode the form data set for submission to the server.

So, functionally, the content type is specifying that the keys and values submitted in a form will be encoded in a specific format.

Setup

Before we dive into the specifics, let's create a simple Express app that will output the headers and content of a request so we can see exactly what a request looks like:

let express = require('express');  
let app     = express();

app.post('/raw', (req, res) => {

  // output the headers
  console.log(req.headers);

  // capture the encoded form data
  req.on('data', (data) => {
    console.log(data.toString());
  });

  // send a response when finished reading
  // the encoded form data
  req.on('end', () => {
    res.send('ok');
  });
});

// start server on port 8080
app.listen(8080);  

The above code simply creates an endpoint at /raw and will log the headers and request body to stdout.
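One caveat with the handler above: the data event can fire more than once for a single request, so a large body is logged in several pieces. A minimal sketch of buffering the whole body before using it is shown here. collectBody is a hypothetical helper name, and the EventEmitter only stands in for req so the snippet runs on its own.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: gather every 'data' chunk and hand the
// complete body to onDone once the stream emits 'end'.
function collectBody(req, onDone) {
  const chunks = [];
  req.on('data', (chunk) => chunks.push(chunk));
  req.on('end', () => onDone(Buffer.concat(chunks)));
}

// Stand-in for an incoming request whose body arrives in two chunks.
const fakeReq = new EventEmitter();
let received = null;

collectBody(fakeReq, (body) => {
  received = body.toString();
  console.log(received);
});

fakeReq.emit('data', Buffer.from('username=brian'));
fakeReq.emit('data', Buffer.from('+mancini'));
fakeReq.emit('end');
```

Inside the /raw route, the same helper could be called as collectBody(req, (body) => { ... }) in place of the two req.on calls.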

Using Postman we can submit requests with various encoding types and form data.

Encoding with x-www-form-urlencoded

x-www-form-urlencoded is the default encoding content type. It is also the simplest form of transmitting data.

It involves URL encoding the name/value pairs according to the rules outlined in the HTML specification. URL encoding is something you've undoubtedly seen before.

If you type "C# multipart/form-data" into Google the URL you navigate to is https://www.google.com/#safe=off&q=c%23+multipart/form-data.

You can see the q=c%23+multipart/form-data is the actual name/value pair for the query. In the URL, it is encoded to make a valid URL.
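As a rough illustration, Node can produce this kind of encoding with the built-in URLSearchParams class. This is only a sketch of the encoding step, not what Google or a browser runs internally. Note that URLSearchParams is stricter than the URL above and also escapes the slash.

```javascript
// URLSearchParams is a global in Node; it serializes name/value pairs
// using the application/x-www-form-urlencoded rules.
const query = new URLSearchParams({ q: 'c# multipart/form-data' }).toString();

// "#" becomes %23, the space becomes "+", and "/" becomes %2F.
console.log(query);
```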

x-www-form-urlencoded requests will also have an HTTP header specified for Content-Type with a value of application/x-www-form-urlencoded.

Let's take a look at what a full request looks like from the server's perspective.

In this example, two fields, username and nickname, are submitted via a Postman request.

The server outputs the following headers in the request:

host: 'localhost:8080',  
connection: 'keep-alive',  
'content-length': '41',  
'cache-control': 'no-cache',  
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',  
'content-type': 'application/x-www-form-urlencoded',  
accept: '*/*',  
dnt: '1',  
'accept-encoding': 'gzip, deflate',  
'accept-language': 'en-US,en;q=0.8'  

The important ones here are the content-length and content-type headers. The content-length relates to the length of transmitted data. The content-type is our old friend application/x-www-form-urlencoded.

More interestingly, and more pertinent to this discussion is the actual request body submitted to the server.

The form data that was submitted is URL encoded, just as if it was in a query string:

username=brian+mancini&nickname=turkey%24  
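To get from that raw body back to usable fields on the server, one option is the built-in URLSearchParams class. The sketch below assumes the body has already been read into a string, and it also checks the byte length against the content-length header logged earlier.

```javascript
// The raw request body logged by the /raw endpoint.
const rawBody = 'username=brian+mancini&nickname=turkey%24';

// URLSearchParams decodes "+" as a space and %24 as "$".
const fields = Object.fromEntries(new URLSearchParams(rawBody));
console.log(fields);

// The body is 41 bytes long, which is the content-length the client sent.
const bodyLength = Buffer.byteLength(rawBody);
console.log(bodyLength);
```

In an Express app you would normally let the express.urlencoded() middleware do this parsing and read the result from req.body.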

Encoding with form-data

As we mentioned previously, the other encoding content-type is multipart/form-data.

The W3C provides some guidance on when it is appropriate to use this encoding type:

The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.

So there you have it: use form-data when sending data that contains:

  1. files
  2. non-ASCII data
  3. binary data

But what does a multipart request look like? Using our buddy Postman, we'll submit a request that uses form-data, with a username text field and a somefile file field.

The server will log the headers and the content. The headers will look similar to below:

host: 'localhost:8080',  
connection: 'keep-alive',  
'content-length': '306',  
'cache-control': 'no-cache',  
'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36',  
'content-type': 'multipart/form-data; boundary=----WebKitFormBoundaryvlb7BC9EAvfLB2q5',  
accept: '*/*',  
dnt: '1',  
'accept-encoding': 'gzip, deflate',  
'accept-language': 'en-US,en;q=0.8'  

The most interesting piece of the headers is the content-type:

'content-type': 'multipart/form-data; boundary=----WebKitFormBoundaryvlb7BC9EAvfLB2q5',  

You'll see that this contains two pieces of information:

  1. multipart/form-data
  2. boundary

The first piece specifies that our request was submitted as multipart/form-data, and the boundary is what is used to separate the "multiple parts" of the multipart request body.
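A server has to pull the boundary out of this header before it can split the body. A minimal sketch of that step follows. getBoundary is a hypothetical helper, and the regular expression only covers the common unquoted and quoted forms. In the body itself, each part is introduced by the boundary prefixed with two extra hyphens.

```javascript
// Hypothetical helper: extract the boundary parameter from a
// Content-Type header value, or return null if there is none.
function getBoundary(contentType) {
  const match = /;\s*boundary=(?:"([^"]+)"|([^;\s]+))/i.exec(contentType || '');
  return match ? match[1] || match[2] : null;
}

const header = 'multipart/form-data; boundary=----WebKitFormBoundaryvlb7BC9EAvfLB2q5';
const boundary = getBoundary(header);

// The delimiter line used inside the body is "--" followed by the boundary.
const delimiter = '--' + boundary;

console.log(boundary);
console.log(delimiter);
```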

The HTML specification does a decent job of explaining the rules. multipart/form-data conforms to standard multipart MIME data streams as outlined in RFC2045. This means a few things:

  • a message consists of a series of parts
  • each part is separated by a boundary
  • the boundary cannot occur in the data
  • each part must contain a Content-Disposition header with the value of form-data
  • each part must contain a name attribute specifying the name of the part

Additionally...

  • each part may be encoded and should specify the Content-Transfer-Encoding header if it is encoded
  • files should include a filename in the Content-Disposition header
  • files should specify the Content-Type for the transmitted file

Whew! That's a lot of rules to follow, but it's a bit easier to grok once you see it in action.

Here's what the body of the Postman request above looks like:

------WebKitFormBoundaryvlb7BC9EAvfLB2q5
Content-Disposition: form-data; name="username"

brian mancini  
------WebKitFormBoundaryvlb7BC9EAvfLB2q5
Content-Disposition: form-data; name="somefile"; filename="test.txt"  
Content-Type: text/plain

hello world!  
------WebKitFormBoundaryvlb7BC9EAvfLB2q5--
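The body above can be reproduced by hand, which makes the rules a little more concrete. The following is a sketch only: buildMultipart is a hypothetical helper, it handles string values rather than binary data, and it does not escape quotes in field names or filenames. Lines are terminated with CRLF, as multipart messages require.

```javascript
const CRLF = '\r\n';

// Hypothetical helper: serialize parts into a multipart/form-data body.
// Each part is { name, value, filename?, contentType? }.
function buildMultipart(boundary, parts) {
  const chunks = [];
  for (const part of parts) {
    let disposition = `Content-Disposition: form-data; name="${part.name}"`;
    if (part.filename) disposition += `; filename="${part.filename}"`;

    chunks.push(`--${boundary}${CRLF}`); // delimiter that opens the part
    chunks.push(disposition + CRLF);
    if (part.contentType) chunks.push(`Content-Type: ${part.contentType}${CRLF}`);
    chunks.push(CRLF); // blank line ends the part headers
    chunks.push(part.value + CRLF);
  }
  chunks.push(`--${boundary}--${CRLF}`); // closing delimiter
  return Buffer.from(chunks.join(''));
}

const body = buildMultipart('----WebKitFormBoundaryvlb7BC9EAvfLB2q5', [
  { name: 'username', value: 'brian mancini' },
  { name: 'somefile', filename: 'test.txt', contentType: 'text/plain', value: 'hello world!' },
]);

console.log(body.toString());
console.log(body.length); // 306 bytes, the content-length logged earlier
```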

In the sample request, we had two inputs, which correspond to two parts in our request.

The first part was a text input with the value brian mancini. The part looks like:

------WebKitFormBoundaryvlb7BC9EAvfLB2q5
Content-Disposition: form-data; name="username"

brian mancini  

The Content-Disposition header has a name value that matches the name of the form control that was submitted.

The second part was a file that was uploaded with the file contents of hello world!. This part looks like:

------WebKitFormBoundaryvlb7BC9EAvfLB2q5
Content-Disposition: form-data; name="somefile"; filename="test.txt"  
Content-Type: text/plain

hello world!  

This Content-Disposition header includes the name of the field, somefile, as well as the submitted filename, test.txt. Lastly, because the file was a text file it includes the Content-Type of text/plain.

If this was a binary file instead of a text file, the unencoded data would appear after the part's headers and the Content-Type would be application/octet-stream.
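To see how a server might take such a body apart, here is a deliberately naive parser. It is a sketch under several assumptions: parseMultipart is a hypothetical helper, the body is treated as a string (so it is not safe for binary files), and it ignores Content-Transfer-Encoding. Real applications should use a proper parser instead.

```javascript
// Hypothetical helper: split a multipart body into its parts.
function parseMultipart(body, boundary) {
  return body
    .split(`--${boundary}`)
    .slice(1) // drop whatever precedes the first delimiter
    .filter((chunk) => !chunk.startsWith('--')) // drop the closing delimiter
    .map((chunk) => {
      // Each chunk is wrapped in a leading and a trailing CRLF.
      const part = chunk.replace(/^\r\n/, '').replace(/\r\n$/, '');
      const splitAt = part.indexOf('\r\n\r\n'); // blank line after headers
      const headers = {};
      for (const line of part.slice(0, splitAt).split('\r\n')) {
        const colon = line.indexOf(':');
        headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
      }
      const disposition = headers['content-disposition'] || '';
      return {
        name: (/\bname="([^"]*)"/.exec(disposition) || [])[1],
        filename: (/\bfilename="([^"]*)"/.exec(disposition) || [])[1],
        contentType: headers['content-type'],
        content: part.slice(splitAt + 4),
      };
    });
}

// The sample request body from this section, with CRLF line endings.
const boundary = '----WebKitFormBoundaryvlb7BC9EAvfLB2q5';
const body = [
  `--${boundary}`,
  'Content-Disposition: form-data; name="username"',
  '',
  'brian mancini',
  `--${boundary}`,
  'Content-Disposition: form-data; name="somefile"; filename="test.txt"',
  'Content-Type: text/plain',
  '',
  'hello world!',
  `--${boundary}--`,
  '',
].join('\r\n');

const parts = parseMultipart(body, boundary);
console.log(parts);
```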

Now that we have an understanding of how multipart/form-data works, let's take a look at some tools that can help us with our Node apps.

Multer

As you saw in the last section, form-data has a lot going on. If you look at the sample application we created at the beginning to log request output, you'll see that the request body contains the raw multipart/form-data. We need a way to get that information parsed so that our application can easily use it.

Thankfully we have a few tools to help us out. In particular, multer is a fantastic tool for easily working with form data.

multer is built on top of busboy, which is a parser for HTML form data. Multer provides middleware for multipart/form-data.

Below is a simple app that will allow a single file or multiple files to be uploaded:

let multer  = require('multer');  
let express = require('express');  
let app     = express();  
let upload  = multer({ storage: multer.memoryStorage() });

app.post('/single', upload.single('somefile'), (req, res) => {  
  console.log(req.body);
  console.log(req.file);
  res.send();
});

app.post('/array', upload.array('somefile'), (req, res) => {  
  console.log(req.body);
  console.log(req.files);
  res.send();
});

app.listen(8080);  

The middleware passed as an argument to each route applies the multipart/form-data parsing. multer has a variety of methods to control and validate the input data.

With multer, normal text form data is automatically parsed onto req.body regardless of the use of single, array, or fields. These three methods refer to how file uploads are handled.

Single File Upload

In the first handler, /single, we use the single method to construct the middleware and instruct the middleware to look for a file that has been uploaded with the field name somefile.

For a Postman request to this handler that sends the username field and a somefile file, the resulting output from the service will contain:

Output from req.body:

{ username: 'brian mancini' }

Output from req.file:

{ 
  fieldname: 'somefile',
  originalname: 'test.txt',
  encoding: '7bit',
  mimetype: 'text/plain',
  buffer: <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 21>,
  size: 12 
}

As you can see, the text fields were parsed into the request body and the file was attached to the req.file property.
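Because the example uses memory storage, the file contents are available directly on the buffer property. The sketch below shows one way to read a text upload back out. readTextUpload is a hypothetical helper, and the file object is a hand-built stand-in with the same shape as the req.file output above so the snippet runs without a server.

```javascript
// Hand-built stand-in for the req.file object logged above.
const file = {
  fieldname: 'somefile',
  originalname: 'test.txt',
  encoding: '7bit',
  mimetype: 'text/plain',
  buffer: Buffer.from('68656c6c6f20776f726c6421', 'hex'),
  size: 12,
};

// Hypothetical helper: decode an in-memory upload as UTF-8 text,
// refusing anything that was not sent as a text type.
function readTextUpload(upload) {
  if (!upload.mimetype.startsWith('text/')) {
    throw new Error(`expected a text file, got ${upload.mimetype}`);
  }
  return upload.buffer.toString('utf8');
}

const text = readTextUpload(file);
console.log(text);
```

Keep in mind that the mimetype comes from the client's Content-Type for that part, so it should be treated as a hint rather than a guarantee.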

Multifile Upload

If we want to upload multiple files we can use the array method.

In the /array handler above, we use the array method of multer to look for multiple files uploaded with the same field name, such as a request that uploads two files under somefile at the same time. This will place the file results into req.files as an array of file objects.

The resulting output for this method is:

Output of req.body:

{ username: 'brian mancini' }

Output of req.files:

[ 
  { 
    fieldname: 'somefile',
    originalname: 'test.txt',
    encoding: '7bit',
    mimetype: 'text/plain',
    buffer: <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 21>,
    size: 12 
  },
  { 
    fieldname: 'somefile',
    originalname: 'test.txt',
    encoding: '7bit',
    mimetype: 'text/plain',
    buffer: <Buffer 68 65 6c 6c 6f 20 77 6f 72 6c 64 21>,
    size: 12 
  } 
]

From these properties, you can easily get standard text data as well as parsed files from multipart/form-data requests to Node in a variety of different input types.

Conclusion

Hopefully this has broadened your horizons on multipart/form-data and how to use it with Node applications.

If you want to download the example code, check out the GitHub repo: http://github.com/bmancini55/node-multer-test.
