Map Reduce in MongoHub

Posted on 12th December 2012 in development, mongodb, Open Source Software, Software

I recently had a request to produce a list of the users who had created the most content in a system that I was building.

All of the data is stored in MongoDB, which made this a little more challenging than a normal query. Each piece of content in the system has an embedded document with some details about the user who created it, including their username, so I immediately thought of performing a Map/Reduce on the collection to gather the data that had been asked for.

There are a number of examples on the web for how to perform Map/Reduce in MongoDB, but I will admit that I am not too familiar with the CLI for MongoDB, as I have been primarily using MongoHub to interact with the data. I knew that there was a MapReduce tab on the MongoHub interface, which I had never used before, so I wanted to try it out.

I struggled for a while to figure out exactly what needed to be entered where on the MapReduce tab and what format the entries should take, with no luck finding any examples online explaining this tab’s usage. Using a combination of the MongoDB docs, this very helpful thread on the mongodb-user mailing list and a read through the code of MongoHub, I finally came up with something that worked.

Map Reduce in MongoHub

The JavaScript in the Map text entry is called for every document returned by the Query, which in my case is every document in my chosen collection.

The JavaScript in the Reduce text entry is called for each unique key emitted from the map function, with an array containing the values that accompanied each emit. In my case the key is each unique username and the data is an array containing the value 1 in each item.

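By summing the contents of the array I get the number of items created by each user in the system. I later changed this to simply return the length of the ‘count’ array, which gave the same totals because each entry was 1. (Summing is the safer of the two: MongoDB may call reduce more than once for the same key and pass earlier results back in, in which case the length would undercount.)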
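As a rough sketch of the kind of functions described above, here is a Map and a Reduce together with a tiny local stand-in for the server so the pair can be tried outside MongoDB. The field name `creator.username`, the sample documents and the `runMapReduce` helper are all assumptions made up for illustration; they are not taken from my real collection or from MongoHub.

```javascript
// Sample documents; the `creator.username` field name is an assumption.
const docs = [
  { title: "Post A", creator: { username: "alice" } },
  { title: "Post B", creator: { username: "bob" } },
  { title: "Post C", creator: { username: "alice" } },
  { title: "Post D", creator: { username: "alice" } },
];

// Map: MongoDB calls this once per document, with `this` bound to the document.
function map() {
  emit(this.creator.username, 1);
}

// Reduce: receives a key and the array of values emitted for that key.
// Summing stays correct even if earlier reduce results are passed back in.
function reduce(key, values) {
  let total = 0;
  for (const v of values) total += v;
  return total;
}

// Hypothetical local stand-in for the server side of a Map/Reduce run.
// Unlike MongoDB, it calls reduce exactly once for every key.
function runMapReduce(documents, mapFn, reduceFn) {
  const groups = new Map();
  // `emit` is provided by MongoDB; here it is defined just for the simulation.
  globalThis.emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of documents) mapFn.call(doc);
  delete globalThis.emit;
  return [...groups].map(([key, values]) => ({
    _id: key,
    value: reduceFn(key, values),
  }));
}

const results = runMapReduce(docs, map, reduce);
console.log(results);
```

Only the bodies of `map` and `reduce` correspond to what goes into the Map and Reduce text entries; everything else exists purely to make the example runnable on its own.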

The part that took me the longest to work out was the Out text entry, even though it was the simplest in the end. I was unsure as to what should be put in here and also what format the entry should take. This example of Map/Reduce in the MongoDB docs was the key to my answer.

Once I had the Out field filled in correctly I was able to run the query and create a new collection called “Ambassador” which contained the results of the Map/Reduce. I was then able to perform queries on this data to discover who had created the most content.
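To give an idea of the follow-up query, a Map/Reduce output collection holds one document per key, with the key in `_id` and the reduced result in `value`. The sketch below sorts some made-up rows of that shape locally; the usernames, the counts and the `topCreators` helper are assumptions for illustration only.

```javascript
// Made-up rows in the shape Map/Reduce writes to its output collection.
const ambassador = [
  { _id: "alice", value: 3 },
  { _id: "bob", value: 1 },
  { _id: "carol", value: 7 },
];

// Local equivalent of a shell query such as:
//   db.Ambassador.find().sort({ value: -1 }).limit(2)
function topCreators(rows, limit) {
  // Copy before sorting so the input array is left untouched.
  return [...rows].sort((a, b) => b.value - a.value).slice(0, limit);
}

const top = topCreators(ambassador, 2);
console.log(top);
```

Against the real collection the sort would be done by MongoDB itself, as in the commented query.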

Hopefully this example will help someone else figure out how to use this tab in a shorter time than it took me.

Adding GD to PHP on OS X 10.5.5 Leopard (Desktop)

Posted on 18th November 2008 in apache, development, OSX, php

Introduction

I am trying to help out on a CakePHP project for a friend. One of the things that has been added recently was a phpCaptcha component. This component uses the GD module to create images from text. Of course everything works fine on the test website, but not on my machine.
I am working on a MacBook Pro, Intel Core 2 Duo running Mac OS X Version 10.5.5.

This is mainly for my own memory, as it took me a few hours to get this working, when it should have only taken a few minutes.

Disclaimer

If anyone actually finds this page and follows the instructions on it, you do so at your own risk. Back up your system before you start. Please follow the information provided by topicdesk.com. The information here worked on my system on 18th November 2008; it may or may not work for you. If something goes wrong, I will not be able to fix it for you. I will accept no responsibility for any use of this information.

Background Research

There are a number of tutorials out there that already deal with this issue, but the ones that I found were either slightly out of date or missed something. So rather than have to go through the discovery process again I am leaving myself a note here.

Let me stress that point: this is a note for myself. Other people who know far more about OS X/Apache/PHP than me have written excellent articles, which have helped me get to this point.

First of all here are some links to people who have already written about the process.

  • topicdesk.com – A PDF that contains the instructions for adding the GD Extension to PHP5 on OS X Server 10.5.x
  • Kenior Design – Good instructions with a decent level of detail. Interesting conversations in the comments.
  • 90kts.com – A shorthand version of the topicdesk pdf. The comments to this post helped me get to my solution.

The instructions found in the PDF from topicdesk.com got me as far as installing GD and editing my php.ini file, which meant that GD was listed in phpinfo(). (Instructions on how to use phpinfo() are in the PDF from topicdesk.com.)

At this point I thought that I was home and dry, but this wasn’t the case. When trying to access a captcha image I was still not seeing anything. Checking my Apache error log (tail /var/log/apache2/error_log), I discovered this error message.

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug

Which led me to these pages that describe what is going on.

With all these sources of information at hand I went ahead and tried to follow the steps outlined, and failed. The mistakes were simple ones, but for someone who is not used to working from the OS X command line, I am sure they are not uncommon.

These are the mistakes that I made:

  • When building the various downloads, ensure that you are building for the correct architecture, or build for multiple platforms. My MacBook Pro has an Intel Core 2 Duo, which is a 64-bit CPU, not a 32-bit one.
  • Before downloading any of the source code that is listed in these posts, ensure that you are downloading the version appropriate to your installation.
    1. I am using OS X 10.5.5, most of the references have links to 10.5.4 or earlier.
    2. My PHP installation is 5.2.6, again the reference material talks about 5.2.5 or earlier.

Successful Installation

So these are the command-line instructions that I used to successfully install GD into PHP 5.2.6 on Mac OS X Leopard 10.5.5. (Most of this is identical to the PDF from topicdesk.com, with changes to the download locations.) I am including these steps here merely for completeness. Please refer to their PDF for more information:

Installing libjpeg was exactly how the topicdesk PDF explained.

mkdir -p /SourceCache
cd /SourceCache

curl -O http://www.ijg.org/files/jpegsrc.v6b.tar.gz
tar xzpf jpegsrc.v6b.tar.gz

cd /SourceCache/jpeg-6b
cp /usr/share/libtool/config.sub .
cp /usr/share/libtool/config.guess .

(This is what I used, as I am installing for my 64-bit CPU)

MACOSX_DEPLOYMENT_TARGET=10.5 CFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp" CCFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" CXXFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" LDFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -bind_at_load" ./configure --enable-shared

make
sudo mkdir -p /usr/local/include
sudo mkdir -p /usr/local/bin
sudo mkdir -p /usr/local/lib
sudo mkdir -p /usr/local/man/man1
sudo make install

(Deviation from the topicdesk.com PDF)
Now we need to get a newer version of FreeType so that we don’t see that horrible error. I chose to use a version from this location http://download.savannah.gnu.org/releases/freetype/ specifically the 2.3.7 version.

cd /SourceCache
curl -O http://download.savannah.gnu.org/releases/freetype/freetype-2.3.7.tar.gz
tar xvfp freetype-2.3.7.tar.gz
cd freetype-2.3.7

This is the important line as it re-compiles FreeType with these options: --with-fsspec=no --with-fsref=no --with-quickdraw-toolbox=no --with-quickdraw-carbon=no
This prevents FreeType from causing the fork() error.

MACOSX_DEPLOYMENT_TARGET=10.5 CFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp" CCFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" CXXFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" LDFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -bind_at_load" ./configure --with-fsspec=no --with-fsref=no --with-quickdraw-toolbox=no --with-quickdraw-carbon=no

make
sudo make install

Then we get back to the topicdesk PDF. Again my versions of PHP and OS X are different from the ones that they are referring to, so I went looking here http://www.opensource.apple.com/darwinsource/10.5.5/ to find the right code to download, and found http://www.opensource.apple.com/darwinsource/10.5.5/apache_mod_php-44.1/

cd /SourceCache
curl -O http://www.opensource.apple.com/darwinsource/10.5.5/apache_mod_php-44.1/php-5.2.6.tar.bz2
tar xjf php-5.2.6.tar.bz2
cd php-5.2.6/ext/gd

sudo phpize

This next line introduces another change, in that I am now linking to the newly compiled FreeType library, not the pre-installed version. (--with-freetype-dir=/usr/local/lib)

MACOSX_DEPLOYMENT_TARGET=10.5 CFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe -no-cpp-precomp" CCFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" CXXFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -g -Os -pipe" LDFLAGS="-arch ppc -arch ppc64 -arch i386 -arch x86_64 -bind_at_load" ./configure --with-zlib-dir=/usr --with-jpeg-dir=/usr/local/lib --with-png-dir=/usr/X11R6 --with-freetype-dir=/usr/local/lib --with-xpm-dir=/usr/X11R6

make
sudo make install

sudo apachectl graceful

Finished! Wasn’t so bad after all. Remember that the configure options I have used here are because I am compiling on a 64-bit CPU. I imagine that I could change some of those compilation options to build only the 64-bit version, but as I was following other tutorials to get this done, I didn’t want to stray off the path even further than I had to.

Football Application – Preparing to code

Posted on 18th July 2008 in development

A Domain Model?

I don’t know. I have only read the first chapter of Eric Evans’ Domain-Driven Design so far, so I might be on the right track here.
From what we have figured out so far from the requirements we have the following entities.
  • Club
  • Member
  • Team
  • Manager
  • Player
  • Staff-Member
This is where I am starting from. It might not be the right place, but it seems as good as any right now. I believe that with these concepts I should be able to start building an application that the users will understand.

Some more detailed requirements

Further discussions with people related to the club highlight the fact that the system should be available outside of office hours and over the internet; apparently the managers/coaches of the kids’ teams are volunteers who have day jobs. It should be secured in some way so that club information stays private. This leads me to the decision to build the application as a website. The website will be hosted on a Windows Server machine with IIS and ASP.NET available.
(Ok, so that is a little staged, but isn’t all of this?)
I decide to leverage what I can of the Membership, Roles, Profile and Personalization features offered by ASP.NET. (Learning about the ASP.NET MVC or Monorail is a step too far right now, I might do that on the second iteration.)
After this decision is made I need to figure out what to do first. As the club secretary is the person who will be acting as the administrator of the software, I need to make sure that they have the features that they want as soon as possible. We need to get them familiar with the system and allow them to get the initial sets of data entered.
Somehow I manage to get in touch with the club secretary again to talk through some features of the system. This is what I got out of the conversation.
(This is going to make BDD people cry, please let me know what I can do to make this better)

When I try to access data in the system, I should not be allowed to, unless I have logged in.
  Given that a user is not logged in,
  When sensitive data is accessed,
  The login screen should be shown.

When I want to contact a member, I need to be able to see a list of all members, so that I can see the ones I want.
  Given that there are members in the system,
  When the member list page loads,
  A list is populated with all the members of the club.

When a member joins the club, they need to be added to the system, so that I can make sure they are organised.
  Given the member does not already have details in the system,
  When I enter their details,
  They appear in the list of members for my club.

  Given the member is already in the system,
  When I enter their details,
  I should be made aware that they already exist,
  And I should have the choice to duplicate the entry.

When a member leaves the club, I need to be able to remove them from the system, so that I don’t end up with lots of out of date information.
  Given that a member exists in the system,
  When I delete their information,
  They are removed from all lists of members,
  And they are removed from all Teams.
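
To check that the add and remove scenarios above hang together, here is a small in-memory sketch of them. It is written in JavaScript only to keep it short and runnable; it is not the ASP.NET implementation, and the `Club` class, its method names and the idea of identifying a member by name alone are all assumptions made for illustration.

```javascript
// In-memory stand-in for the member store (hypothetical, not the real system).
class Club {
  constructor() {
    this.members = [];
    this.teams = new Map(); // team name -> Set of member names
  }

  // Entering a member's details. Reports whether they already existed and
  // only stores a duplicate when the caller explicitly chooses to.
  addMember(name, { allowDuplicate = false } = {}) {
    const alreadyExists = this.members.includes(name);
    if (alreadyExists && !allowDuplicate) {
      return { added: false, alreadyExists };
    }
    this.members.push(name);
    return { added: true, alreadyExists };
  }

  assignToTeam(name, team) {
    if (!this.teams.has(team)) this.teams.set(team, new Set());
    this.teams.get(team).add(name);
  }

  // Deleting a member removes them from the member list and from every team.
  removeMember(name) {
    this.members = this.members.filter((m) => m !== name);
    for (const squad of this.teams.values()) squad.delete(name);
  }
}

const club = new Club();
const first = club.addMember("Sam");  // new member
const second = club.addMember("Sam"); // same details entered again
club.assignToTeam("Sam", "Under 12s");
club.removeMember("Sam");
const stillInTeam = club.teams.get("Under 12s").has("Sam");
console.log(first, second, club.members, stillInTeam);
```

Returning a small result object from `addMember` is one way to let the UI warn about an existing member and then offer the choice to duplicate; other designs would work just as well.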

As I am leaving, the club owner wants to find out how things are going, so we go through the list of requirements that I have gathered so far and he is quite happy with them. In fact he thinks that it sounds so good that we should make this available to other clubs to use, at a price of course. This introduces another concept: multiple clubs active in the system at the same time. We might consider setting each club up with their own system, but let’s not go there until we know that it is a problem.
I think that this will do for now. There is actually quite a lot of work involved in getting to this point. Some infrastructure needs to be in place to handle some of the non-functional requirements that we have.
  • The system needs to be stable.
  • If there is a problem we need to be able to find out what happened.
  • The system needs to be responsive.
Let’s jump in and start coding. I will be trying to do as much as I can in a TDD style, though I am not sure how the database interaction part is going to go yet; I always hate writing tests that hit the database.
As the title of the first post suggested, I will be using NHibernate for data access and the other goodies that object relational mappers offer. Ninject will be used for any dependency injection needs; I am sure that there will be some somewhere. jQuery will be used for the web UI and any Ajax capabilities. What about the iPhone? Well I am not sure about that yet. All I know is that I have an iPod Touch, a USB cable, a MacBook Pro and an iPhone developer account. I am sure that I can come up with something for remote administration, even if it is just an iPhone Web Application.
I’ll be back when I have something interesting to show.

NHibernate, Ninject, jQuery and the iPhone?

Posted on 17th July 2008 in development

Background

I am intending on writing a small application over the next day or two that will help me understand, at a reasonable level, some interesting technologies.

The software will be a fictional Football Club organisation application that can be used to manage club members, teams, management staff, fixtures and whatever else comes up along the way. By Football I mean the game where you kick the round ball with your feet, not whatever this soccer thing is; I have never seen anyone play a football game in their socks ;)

The reason for doing this is to delve into some of the technologies that I have been meaning to learn for some time. After reading through documentation, articles, blog posts, tutorials and whatever text I could get my hands on, I am ready to start attempting to implement something. If anyone does find this series then be warned, it is more of a learning experience than a manual. Something that I say or do in the first post may be ripped out and done in a different way later on. Also if you spot something that isn’t right, let me know; that’s what comments are there for.

Requirements

So let’s get going then. The first thing we need is a set of requirements. I am going to use a style that might be similar to BDD requirements, although I’ve never really done any BDD before.
Requirements gathering can be quite boring, so I’m going to tell a story.

I set off in search of some people with a vested interest in the software. The first person I meet is the club owner.
  • As a club owner, I want my club to be well organised, so that we run efficiently.
That’s it, that is the only requirement he has. After that he gets on his phone and I leave quietly. Don’t you just love requirements? Ok, let’s try and get some more out of the people at the club. I will assume that they have agreed that a software solution will be created that has some sort of data storage ability. I’m certainly not going to go through this exercise and tell them to go and buy everyone their own leather bound diary!

I find the club secretary who is responsible for looking after the smooth running and organisation of the club.
  1. As club secretary, I want to be able to know who the members of the club are, so that I can contact them when I need to.
  2. As club secretary, I want to be able to control which managers have access to which teams through the system, so that they are not overwhelmed by the amount of information.
  3. As club secretary, I want to be able to send emails to our members, so that I can send out reminders or newsletters.
Wandering around the club offices I find the treasurer, who is responsible for the finances of the club. The treasurer is far too busy to talk, so I only get a few requirements.
  1. As club treasurer, I want to be able to keep track of membership fees, so that I can balance the books.
  2. As club treasurer, I want to be able to keep track of any fines awarded to players or staff, so that we can ensure they are paid in time.
Later on I manage to get a meeting with a few of the team managers associated with the club. Ok, so all I managed to do was catch up with a manager during half time at a game.
  1. As a team manager, I want to know who is available to me for selection, so that I can pick the best team.
  2. As a team manager, I want to know which members of the club’s ‘back room’ staff have been assigned to my team, so that I can make sure the players are well looked after.
  3. As a team manager, I want to be able to access my fixture list whenever I need to, so that I can make sure everyone knows who we are playing next.
While I’m at the game, I decide to talk to some of the substitutes. They have some requirements too.
  1. As a player, I want to be able to access the fixtures, because the manager sometimes forgets.
  2. As a player, I want to be able to let the manager know when I am on holiday, so that he doesn’t put me in the team.
  3. As a player, I want my personal information to be secure, so that I don’t wake up to find someone has stolen my identity.
As I am thanking the players and setting off to get back to work, (doesn’t everyone work on a Saturday evening?), one of the team’s fanatical fans confronts me. “You making a website for this lot? Well I got things I want on it too!”
Oh well, might as well humour him.
  1. As a fan, I want to be able to see the fixtures, so that I don’t miss a game.
  2. As a fan, I want to be able to see the results of the games, so when I miss one I know what the score was.
  3. As a fan, I want to be able to read news stories about the team, so that I am never out of the loop.
  4. As a fan, I want to be able to talk on the internet with other fans, so that we can make up new chants.
After that I switched off and drew fluffy clouds all over my notes.

Overall I think that we have a good enough set of feature requests to get us going. As everyone in the club is too busy to talk to us now, we are going to have to work out priorities by ourselves. We have a week day to get something together to show the club owner. The intention of the software is to help organise the club, so the fans’ website requirements can probably be put deep into the product backlog!
Now that you have some background to what I am trying to do, the next post will hopefully start getting into some technology.

Using Ninject in a Web Application

Posted on 10th July 2008 in development, ninject

I have been meaning to look at Ninject for a while now, and today I finally got my chance.
I am only using some basic features of Ninject to replace my normal use of Constructor Injection that I tend to favour.

This first example is based on the code needed to drive my jQuery examples. When I reached the point of requiring server side data, I decided that I wanted to try out Ninject.

I wanted to load some data using an HttpHandler. I used a handler as I wanted to also save data using the same URL.
The handler has what might be a PageController, which will determine the action to take based on the request being an HTTP GET or POST.
The controller itself relies on an implementation of IDataLayer which handles the data access.

Currently I have only explored the basic binding techniques in Ninject. This involves creating a class that specifies the bindings, then registering it with the Kernel.

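    using System;
    using Ninject.Core;

    namespace Core.NinjectModules{
        public class PeopleModule : StandardModule{
            public override void Load(){
                // The generic type arguments were lost from the original listing.
                // IDataLayer and People come from the surrounding text; DataLayer is
                // a placeholder name for whichever implementation is being bound.
                Bind<IDataLayer>().To<DataLayer>();
                Bind<People>().ToSelf();
            }
        }
    }

This is the binding module for specifying the dependencies of my People class.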

    using System;
    using System.Text;
    using Ninject.Core;

    namespace Core{
        public class People{
            IDataLayer _dataAccess;

            // [Inject] marks the constructor Ninject should use to build People.
            [Inject]
            public People(IDataLayer dataAccess){
                _dataAccess = dataAccess;
            }
            ...
        }
    }

This is the People class, with the single attribute needed to tell Ninject what to use for injection.

It is using the constructor injection feature. I have removed the rest of the implementation of the class to help with reading.
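
For anyone unfamiliar with the idea, the binding-plus-constructor-injection pattern can be sketched in a few lines. The example below is JavaScript rather than C#, and it is a toy container of my own invention, not Ninject’s API: the `Kernel` class, the string tokens, `InMemoryDataLayer` and `loadPeople` are all hypothetical names used only to show the shape of the technique.

```javascript
// Toy container: maps a token to a factory that knows how to build the object.
class Kernel {
  constructor() {
    this.bindings = new Map();
  }

  // Register how to create whatever `token` stands for.
  bind(token, factory) {
    this.bindings.set(token, factory);
    return this; // allow chaining
  }

  // Resolve a token, letting the factory pull its own dependencies from the kernel.
  get(token) {
    const factory = this.bindings.get(token);
    if (!factory) throw new Error(`No binding for ${token}`);
    return factory(this);
  }
}

// Stand-in data layer (hypothetical); a real one would talk to a database.
class InMemoryDataLayer {
  loadPeople() {
    return ["Ada", "Grace"];
  }
}

// People never creates its data layer; it receives one through the constructor.
class People {
  constructor(dataAccess) {
    this.dataAccess = dataAccess;
  }

  process() {
    return this.dataAccess.loadPeople().join(",");
  }
}

const kernel = new Kernel()
  .bind("IDataLayer", () => new InMemoryDataLayer())
  .bind("People", (k) => new People(k.get("IDataLayer")));

const people = kernel.get("People");
console.log(people.process());
```

The point of the pattern is that swapping `InMemoryDataLayer` for another implementation means changing one binding, not the `People` class.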

Ninject offers a set of features specifically for web applications. They can be found in the Ninject.Framework.Web namespace.

From this namespace I used the NinjectHttpApplication and HttpHandlerBase classes.

The NinjectHttpApplication provides the implementation required to attach a ninject Kernel to your HttpApplication.

It is an abstract class that requires the user to implement a CreateKernel() method, in which the user initialises the Kernel.

    using System;
    using Ninject.Framework.Web;
    using Ninject.Core;

    namespace jQueryExamples
    {
        public class Global : NinjectHttpApplication{
            protected override Ninject.Core.IKernel CreateKernel(){
                // Register the PeopleModule bindings with a new kernel.
                IKernel kernel = new StandardKernel(new Core.NinjectModules.PeopleModule());
                return kernel;
            }
        }
    }

This is all that I needed to register my People module with the Ninject Kernel.

I then needed the People controller to be injected into my handler. This is where the HttpHandlerBase comes into it.


    using System;
    using System.Web;
    using System.Web.Services;
    using Core;
    using Ninject.Core;

    namespace jQueryExamples.handlers{

        [WebService(Namespace = "http://tempuri.org/")]
        [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
        public class people : Ninject.Framework.Web.HttpHandlerBase{
            private People _people;

            // Property injection: Ninject sets this before the request is processed.
            [Inject]
            public People PeopleController{
                get { return _people; }
                set { _people = value; }
            }

            protected override void DoProcessRequest(HttpContext context){
                _people.RequestType = context.Request.RequestType;
                string responseText = _people.Process();
                context.Response.Clear();
                context.Response.Write(responseText);
                context.Response.ContentType = "text/xml";
                context.Response.StatusCode = 200;
            }

            public override bool IsReusable{
                get { return false; }
            }
        }
    }

This is the entire implementation of my handler. The People class will be injected into the handler when it is needed.

That is it for now for Ninject. I can do the basics; next I need to learn about the context specific bindings etc.
