Map Reduce in MongoHub

Posted on 12th December 2012 in development, mongodb, Open Source Software, Software

I recently had a request to produce a list of the users who had created the most content in a system that I was building.

All of the data is stored in MongoDB, which made this a little more challenging than a normal query. Each piece of content in the system has an embedded document with some details about the user who created it, including their username, so I immediately thought of performing a Map/Reduce on the collection to gather the data that had been asked for.

There are a number of examples on the web for how to perform Map/Reduce in MongoDB, but I will admit that I am not too familiar with the CLI for MongoDB, as I have been primarily using MongoHub to interact with the data. I knew that there was a MapReduce tab on the MongoHub interface, which I had never used before, so I wanted to try it out.

I struggled for a while to figure out exactly what needed to be entered where on the MapReduce tab and what format the entries should take, and had no luck finding any examples online explaining this tab’s usage. Through a combination of the MongoDB docs, this very helpful thread on the mongodb-user mailing list, and reading through the MongoHub source code, I finally came up with something that worked.

[Image: Map Reduce in MongoHub]

The JavaScript in the Map text entry is called for every document returned by the Query, which in my case is every document in my chosen collection.

The JavaScript in the Reduce text entry is called for each unique key emitted from the map function, with an array containing the values that accompanied each emit for that key. In my case the key is each unique username and the array contains the value 1 for each item that user created.
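As a rough illustration of the two functions described above, here is a minimal sketch that can be run outside the database. The sample documents, the `user.username` field path and the local `emit` stand-in are all assumptions made for the example; they are not taken from the real collection or from MongoHub itself.

```javascript
// Hypothetical content documents; the embedded user.username field is an assumption.
const docs = [
  { title: "Post A", user: { username: "alice" } },
  { title: "Post B", user: { username: "bob" } },
  { title: "Post C", user: { username: "alice" } },
];

// Map: runs once per document, with `this` bound to that document.
function map() {
  emit(this.user.username, 1);
}

// Reduce: receives one key and the array of values emitted for it.
function reduce(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Local stand-in for the database's emit(), so the sketch runs on its own.
const emitted = new Map();
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

// Simulate the Map phase, then the Reduce phase for each unique key.
docs.forEach((doc) => map.call(doc));
const counts = {};
for (const [key, values] of emitted) {
  counts[key] = reduce(key, values);
}

console.log(counts); // { alice: 2, bob: 1 }
```

Only the bodies of `map` and `reduce` correspond to what goes into the Map and Reduce text entries; the rest is scaffolding to make the example runnable.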

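By summing the contents of the array I get the number of items created by each user in the system. I later changed this to simply return the length of the ‘count’ array, which gave the same total for my data because each entry was 1. Be aware that MongoDB may call the reduce function more than once for the same key, passing in earlier results, so summing is the safer choice in general.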
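To see how summing and taking the array length can diverge, here is a small sketch that simulates reduce being run in two batches for one key and then once more on the partial results. The batch sizes and the username are made up for the example.

```javascript
// Two candidate reduce functions: one sums the values, one counts them.
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
const lengthReduce = (key, values) => values.length;

// Simulate five emits for one user, reduced as two batches and then re-reduced.
function reduceInBatches(reduceFn) {
  const first = reduceFn("alice", [1, 1, 1]);
  const second = reduceFn("alice", [1, 1]);
  return reduceFn("alice", [first, second]);
}

const viaSum = reduceInBatches(sumReduce);       // 3 + 2 = 5
const viaLength = reduceInBatches(lengthReduce); // length of [3, 2] = 2

console.log(viaSum, viaLength); // 5 2
```

With a single reduce call per key both approaches agree, which would explain why the shortcut appeared to work.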

The part that took me the longest to work out was the Out text entry, even though it was the simplest in the end. I was unsure as to what should be put in here and also what format the entry should take. This example of Map/Reduce in the MongoDB docs was the key to my answer.

Once I had the Out field filled in correctly I was able to run the query and create a new collection called “Ambassador” which contained the results of the Map/Reduce. I was then able to perform queries on this data to discover who had created the most content.
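Each document in a Map/Reduce output collection has the shape `{ _id: <key>, value: <reduced value> }`, so finding the top contributors comes down to sorting on `value`. The sketch below does this in plain JavaScript on made-up data; the usernames and counts are invented for the example.

```javascript
// Hypothetical contents of the output collection.
const ambassador = [
  { _id: "alice", value: 12 },
  { _id: "bob", value: 30 },
  { _id: "carol", value: 7 },
];

// Sort by value, highest first, and keep the top two. In the mongo shell
// this is roughly db.Ambassador.find().sort({ value: -1 }).limit(2).
const top = [...ambassador].sort((a, b) => b.value - a.value).slice(0, 2);

console.log(top.map((d) => d._id)); // [ 'bob', 'alice' ]
```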

Hopefully this example will help someone else figure out how to use this tab in a shorter time than it took me.
