Newline delimited JSON

From Trephine

This short article describes a file format which I've found extraordinarily useful on many occasions, but haven't seen written about. It's called "newline delimited JSON" (NDJ), and is great for tackling a wide array of programming problems.

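A newline delimited JSON file is exactly what it sounds like: a file containing arbitrary JSON-encoded data structures, one per line, separated by line breaks ("\n"). Because the line break is the delimiter, each structure must be encoded on a single line (no pretty-printing). This is always possible, since JSON requires newlines inside strings to be escaped as \n. In addition, if the first two characters of a given line are "//", then that line is to be considered a comment and summarily ignored. Blank lines (lines consisting of only whitespace) are ignored as well.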

Here's a simple example:

// example.ndj
{"id":1,"name":"Smith","tags":["agent","program"]}
{"id":2,"name":"Neo","tags":["whoa","knows kungfu"]}
{"id":3,"name":"Egon","tags":["streams","wrong movie"]}
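Writing NDJ is the mirror image of reading it: encode each value without indentation and join the results with "\n". Here is a minimal JavaScript sketch. The `toNDJ` name and its optional `comment` parameter are illustrative only, not part of any library. It relies on `JSON.stringify` with no indent argument producing a single line.

```javascript
/**
 * Serialize an array of values as newline delimited JSON (hypothetical helper).
 * @param {Array} values Values to encode, one per line.
 * @param {string} [comment] Optional comment, written as a leading "//" line.
 * @returns {string} NDJ text, terminated by a trailing newline.
 */
function toNDJ(values, comment) {
  const lines = [];
  if (comment !== undefined) {
    // A comment must stay on one line, so collapse any embedded line breaks.
    lines.push("// " + String(comment).replace(/[\r\n]+/g, " "));
  }
  for (const value of values) {
    // No indent argument: JSON.stringify emits a single line and escapes
    // newlines inside strings as the two characters "\n".
    const encoded = JSON.stringify(value);
    if (encoded === undefined) {
      // e.g. undefined or a function, which have no JSON representation
      throw new TypeError("value cannot be encoded as JSON: " + String(value));
    }
    lines.push(encoded);
  }
  return lines.join("\n") + "\n";
}
```

Calling `toNDJ(records, "example.ndj")` on the three records above reproduces example.ndj line for line. Ending the output with a newline is a design choice that lets several outputs be concatenated (or appended to an existing file) without merging two records onto one line.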

Benefits of newline delimited JSON:

  • Easy to parse (see below for a few examples)
  • Easy to stream both as input and output
  • Allows for data structures of arbitrary complexity
  • Allows for heterogeneous data (a mix of different kinds)
  • Easy to split for parallel processing
  • JSON libraries are widely available (many languages have built-in support) and typically fast
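The "easy to split" point follows from every record ending at a newline. Input can be cut into roughly equal pieces, with each cut moved forward to just past the next "\n", and each piece can then be parsed independently. Below is a minimal sketch of that splitting step. It assumes the whole input is already in memory as a string, and the `splitNDJ` name is hypothetical.

```javascript
/**
 * Split NDJ text into at most `parts` chunks, cutting only after line breaks,
 * so each chunk can be parsed independently (e.g. by separate workers).
 * @param {string} text NDJ input.
 * @param {number} parts Desired number of chunks (>= 1).
 * @returns {string[]} Chunks whose concatenation equals `text`.
 */
function splitNDJ(text, parts) {
  const chunks = [];
  const target = Math.ceil(text.length / Math.max(1, parts));
  let start = 0;
  while (start < text.length) {
    // Aim for an even split, then move the cut to just past the next "\n"
    // so that no record is divided between two chunks.
    let cut = Math.min(start + target, text.length);
    if (cut < text.length) {
      const nl = text.indexOf("\n", cut - 1);
      cut = nl === -1 ? text.length : nl + 1;
    }
    chunks.push(text.slice(start, cut));
    start = cut;
  }
  return chunks;
}
```

Each chunk can be handed to a parser such as `parseNDJ` below and the results merged afterwards. The same idea works on byte offsets in a file: in UTF-8 the newline byte never occurs inside a multi-byte character, so seeking to an offset and scanning forward to the next "\n" is safe.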

Here's an example NDJ parser written in JavaScript:

/**
 * Simple stream-like NDJ parser.
 * @param source The input string.
 * @param callback Function to pass decoded objects to as encountered.
 * @param errorcallback Function to call when a line can't be decoded (optional).
 */
function parseNDJ( source, callback, errorcallback ) {
  var pos, next = -1, len = source.length, sub, data;
  while ( (pos = next+1) < len ) {
    // Find the end of the current line (or of the input).
    next = source.indexOf( "\n", pos );
    if (next == -1) next = len;
    sub = source.substring( pos, next );
    // Skip "//" comment lines and blank (whitespace-only) lines.
    if (sub.slice(0,2) == '//' || (/^\s*$/).test(sub)) continue;
    try {
      // JSON.parse() accepts only JSON; eval() would execute arbitrary code.
      data = JSON.parse( sub );
    } catch (err) {
      if (errorcallback) errorcallback( err, sub );
      continue;
    }
    // Called outside the try block, so errors thrown by the callback propagate.
    callback( data );
  }
}
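`parseNDJ` needs the entire input as one string. When data arrives in pieces (network reads, file chunks), a line can be split across two pieces, so the unfinished tail has to be buffered until its newline arrives. Here is a minimal sketch of that, using the same comment and blank-line rules. The `NDJStream` class and its `push`/`end` methods are hypothetical names, not a standard API.

```javascript
/**
 * Incremental NDJ parser (hypothetical helper): feed it text chunks as they
 * arrive; it calls `callback` for each complete, decoded line.
 */
class NDJStream {
  constructor(callback, errorcallback) {
    this.callback = callback;
    this.errorcallback = errorcallback;
    this.buffer = ""; // trailing partial line carried over between chunks
  }

  // Add a chunk; every line it completes is decoded immediately.
  push(chunk) {
    this.buffer += chunk;
    let start = 0;
    let nl;
    while ((nl = this.buffer.indexOf("\n", start)) !== -1) {
      this.handleLine(this.buffer.slice(start, nl));
      start = nl + 1;
    }
    this.buffer = this.buffer.slice(start);
  }

  // Signal end of input; a final line without a trailing "\n" is decoded too.
  end() {
    if (this.buffer !== "") this.handleLine(this.buffer);
    this.buffer = "";
  }

  handleLine(line) {
    // Skip "//" comment lines and whitespace-only lines.
    if (line.slice(0, 2) === "//" || /^\s*$/.test(line)) return;
    let data;
    try {
      data = JSON.parse(line);
    } catch (err) {
      if (this.errorcallback) this.errorcallback(err, line);
      return;
    }
    this.callback(data);
  }
}
```

The chunks are assumed to be strings already. If they are raw bytes, decode them with something that copes with a multi-byte character split across chunks, such as `TextDecoder` with `{ stream: true }`, before calling `push`.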

And here's one written in PHP (requires PHP 5.2+, or a substitute library that provides the json_decode() function):

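/**
 * Simple streaming NDJ parser.
 * @param $file File to parse (use 'php://stdin' for standard input)
 * @param $callback Function to call for each decoded data object.
 * @param $errorcallback Function to call with each line that can't be decoded (optional).
 * @return bool False if the file can't be opened, true otherwise.
 */
function parseNDJ( $file, $callback, $errorcallback = null ) {
  $fp = fopen( $file, 'r' );
  if ( !$fp ) return false;
  while ( ( $line = fgets( $fp ) ) !== false ) {
    // Skip blank (whitespace-only) lines and "//" comment lines.
    if ( preg_match( '%^\\s*$%', $line ) ) continue;
    if ( substr( $line, 0, 2 ) == '//' ) continue;
    // true = decode JSON objects into associative arrays.
    $data = json_decode( $line, true );
    // json_decode() returns null on failure; a literal "null" line is still valid.
    if ( $data === null && trim( $line ) !== 'null' ) {
      if ( $errorcallback ) call_user_func( $errorcallback, $line );
      continue;
    }
    call_user_func( $callback, $data );
  }
  fclose( $fp );
  return true;
}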

So there it is. I wanted to throw together examples for Python and Ruby, but I just didn't have the time. As always, I look forward to your comments!

Public domain declaration

Just so there's no confusion: all of the code snippets on this page are provided "AS IS", without warranty of any kind, express or implied.

All of the code snippets on this page are hereby released into the public domain by me, the copyright holder. This applies worldwide. Or in case this is not legally possible: The copyright holder grants any entity the right to use this work for any purpose, without any conditions, unless such conditions are required by law.

If you'd feel better with a "real" license, you're free to use the code snippets on this page under the MIT license as described on the about page.

Any links back to this site are always appreciated, but not required. Enjoy!

--Jim R. Wilson (jimbojw) 21:33, 15 April 2009 (UTC)