Map Reduce in MongoHub

Posted on 12th December 2012 in development, mongodb, Open Source Software, Software

I recently had a request to produce a list of the users who had created the most content in a system that I was building.

All of the data is stored in MongoDB, which made this a little more challenging than a normal query. Each piece of content in the system has an embedded document with some details about the user who created it, including their username, so I immediately thought of performing a Map/Reduce on the collection to gather the data that had been asked for.

There are a number of examples on the web for how to perform Map/Reduce in MongoDB, but I will admit that I am not too familiar with the CLI for MongoDB, as I have been primarily using MongoHub to interact with the data. I knew that there was a MapReduce tab on the MongoHub interface, which I had never used before, so I wanted to try it out.

I struggled for a while to figure out exactly what needed to be entered where on the MapReduce tab and what format the entries should take, with no luck finding any examples online explaining this tab’s usage. Using a combination of the MongoDB docs, this very helpful thread on the mongodb-user mailing list and the MongoHub source code, I finally came up with something that worked.

[Screenshot: Map Reduce in MongoHub]

The JavaScript in the Map text entry is called for every document returned by the Query, which in my case is every document in my chosen collection.
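
As a rough sketch, the Map entry I ended up with looked something like the function below. The createdBy.username field is only a stand-in for wherever the embedded user document keeps the username in your schema; the only idea taken from my setup is emitting a 1 per document.

    // Map entry: called once for every document matched by the Query.
    function() {
        // Emit the username as the key and 1 as the value, so each
        // emit represents one piece of content created by that user.
        emit(this.createdBy.username, 1);
    }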

The JavaScript in the Reduce text entry is called for every unique key emitted from the map function, with an array containing the data that accompanied each emit. In my case the key is each unique username and the data is an array containing the value 1 in each item.
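
The matching Reduce entry can then simply add those 1s up. A minimal sketch, using the conventional (key, values) parameter names rather than anything MongoHub-specific:

    // Reduce entry: called for each unique username, with "values"
    // holding all of the 1s emitted for that key.
    function(key, values) {
        var total = 0;
        for (var i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }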

By summing the contents of the array I get the number of items created by each user in the system. I later changed this to simply return the length of the ‘count’ array, which gave the same total value as each entry was 1.
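
The simplified version amounts to the following (the array is the one I refer to as ‘count’ above):

    // Because every emitted value is 1, the array length equals the sum.
    function(key, count) {
        return count.length;
    }

One caveat: on larger data sets MongoDB may call reduce more than once for the same key, feeding earlier reduce output back in as one of the values, so summing the values is the safer general-purpose choice.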

The part that took me the longest to work out was the Out text entry, even though it turned out to be the simplest in the end. I was unsure what should be put in there and what format the entry should take. This example of Map/Reduce in the MongoDB docs was the key to my answer.

Once I had the Output field filled in correctly I was able to run the query and create a new collection called “Ambassador” which contained the results of the Map/Reduce. I was then able to perform queries on this data to discover who had created the most content.
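
For anyone more comfortable in the shell than in MongoHub, the MapReduce tab settings above correspond roughly to the call below. The collection name content and the createdBy.username field are placeholders; only the “Ambassador” output collection comes from my setup.

    // Roughly the shell equivalent of the MapReduce tab settings above.
    db.content.mapReduce(
        function() { emit(this.createdBy.username, 1); },
        function(key, values) {
            var total = 0;
            values.forEach(function(v) { total += v; });
            return total;
        },
        { out: "Ambassador" }
    );

    // The results are stored as { _id: <username>, value: <count> }
    // documents, so the most prolific users are just a sort away:
    db.Ambassador.find().sort({ value: -1 }).limit(10);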

Hopefully this example will help someone else figure out how to use this tab in a shorter time than it took me.


Doppler Project : Progress So Far

Posted on 3rd February 2008 in Doppler, Open Source Software

So the end of January has come and we have still not released Doppler v3.  Some progress has been made on fixing issues; however, some have proven difficult to either reproduce or solve.  There are 22 open issues listed in the project; only 7 of these appear to be software issues, the others being future work listings.

Outstanding Bugs

4619 : Doppler fails to close podcast files after download
4615 : Doppler fails to clear old logs
5197 : Doppler consumes 100% CPU if total size spacesaver is reached and oldest file in directory is .incomplete
5252 : Retrying a failed download causes an exception
4964 : Repeated failure to retrieve size of mp3s
4616 : Doppler fails to set genre of downloaded podcast
4618 : Doppler fails to download podcasts on first try but then succeeds on second try
4614 : Trouble with DPI at 192

The first issue appears to be caused by the Windows Media Player plug-in; I will write about this separately as it needs more research.
The second issue I have been unable to reproduce; my version appears to delete all previous logs without any problem.
Issue 4616 has now (Monday 3rd Feb 2008) been partially fixed: the genre will now be set if the file is an mp3 file.
I am starting to look into the others on the list when I have time.

Migration to vs 2008

One of the positive things this week on the project was the actual move to VS 2008.  This is something that has been attempted a couple of times.  If you look at the check-in history for the project, you will see a number of commits claiming to have moved the project to VS 2008.

Starting build script creation

So with all the changes to the solution, and no longer requiring iTunes to be installed for compilation, it is time to begin the Continuous Integration work.

This might seem pointless right now, as there are no tests for the project, but I want to start out with a simple script to help everyone working with the project understand what is going on.  The plan was to do this today/tonight; however, I ended up fixing one of the issues above.

IRC channel

It is a bit lonely in the #Doppler IRC channel on irc.freenode.net.  I have had three visitors :) The first was HolisticDeveloper, who popped in to talk to me about a couple of issues.  The second was none other than Erwin himself, who came in to chat about project file migrations and the Windows Media Player issue.  The final visitor was someone looking for help.  They seemed to be having problems with a subscribed feed that was not downloading.  I wasn’t much help as I am running the latest developer version of the code.  I subscribed to the feed that was causing issues and found that it started downloading immediately.  I hope they have some luck finding out what the problem was.


Preparing for Doppler v4 Iteration 1

Posted on 18th January 2008 in Doppler, Open Source Software

The Doppler Open Source Software project began in October 2007, when the source code for Doppler was released on CodePlex, following this post on DopplerRadio.

I joined the project not long after this and started looking forward to working on the source code.  Since then I have fixed a few problems in the current development version and actually checked source in.  This probably doesn’t seem like a big achievement; however, this is my first OSS project and I am proud to be contributing.

The current goal for the project is to finish Doppler 3 CTP and make it stable enough to release.  We seem to be getting rid of problems at a reasonable rate and aim to have this task complete by the end of January 2008.

In the middle of December we had a Live Meeting to discuss the future of the project, the roles people will play and what we want to get out of the time working with the code base.

It was decided that we would try to re-write the application using as much of the technology provided in the .NET 3.5 stack as possible.  This would include: -

I am sure that there are some areas that I have missed; this is just what I can remember right now.

We are going to attempt to use a Test Driven Design approach with Behaviour Driven Development style specifications to drive the project.  The BDD documentation will provide a set of user stories that we will attempt to prioritise and split into iterations.  I am not sure how well this is going to work on an OSS project, but it will be interesting to see.

I believe that one of the important decisions made was that anything used during the development of Doppler v4 will be freely available, or at least have a free version available.  The tools that are in use right now, that I know of, are: -

TeamCity and NCover are both commercial products; however, JetBrains offer a free professional version of TeamCity and Gnoso still make some beta versions of NCover available for free.  Both companies also provide a free license for OSS projects.  We are currently using the free professional version of TeamCity, with the intention of approaching them for an OSS license shortly.  Gnoso have already granted us an OSS project license for the professional versions of NCover and NCoverExplorer.

The reason for choosing NUnit instead of the built-in testing features of Visual Studio 2008 was two-fold: -

  1. Visual Studio 2008 Express editions do not support the Microsoft Test features.
  2. NUnit is familiar to several of the developers and is an excellent testing framework.

As I work with the Doppler project I am going to write about my experiences on here.  The first subject will be the setting up of a Continuous Integration process for the project.
