The Holy Grail of the Funky Data Model

I’ve blogged about the “Funky Data Model” before…

When I was working with some very clever Oracle database dudes I came up with what I thought was a great idea. The idea was that rather than having database tables for users, pages, blog posts etc., we could have just two tables called “things” and “types of things”.

The Oracle guys rolled their eyes and said everyone goes through this stage of thinking sometime and they even had a name for it… “the funky data model”. They told me that The Funky Data Model, although cool, never, ever works. They were probably right.
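For the curious, here is a minimal sketch of what the two-table idea might look like, using Python’s built-in sqlite3 module. The table names come from the idea above; the `data` column holding a JSON blob and the helper functions `add_thing` and `things_of_type` are assumptions made purely for illustration, not anything the Oracle guys or I actually built.

```python
import json
import sqlite3

# An in-memory database with just two tables: "types of things" and "things".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE types_of_things (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE NOT NULL
    );
    CREATE TABLE things (
        id      INTEGER PRIMARY KEY,
        type_id INTEGER NOT NULL REFERENCES types_of_things(id),
        data    TEXT NOT NULL  -- assumption: the thing itself, stored as JSON
    );
""")


def add_thing(type_name, data):
    """Store a thing, creating its type on first use (hypothetical helper)."""
    conn.execute(
        "INSERT OR IGNORE INTO types_of_things (name) VALUES (?)", (type_name,)
    )
    type_id = conn.execute(
        "SELECT id FROM types_of_things WHERE name = ?", (type_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO things (type_id, data) VALUES (?, ?)",
        (type_id, json.dumps(data)),
    )


def things_of_type(type_name):
    """Return every thing of the given type, in insertion order."""
    rows = conn.execute(
        "SELECT t.data FROM things t "
        "JOIN types_of_things ty ON ty.id = t.type_id "
        "WHERE ty.name = ? ORDER BY t.id",
        (type_name,),
    )
    return [json.loads(row[0]) for row in rows]


add_thing("user", {"name": "andy"})
add_thing("blog post", {"title": "The Funky Data Model"})
add_thing("blog post", {"title": "Eggs on Toast", "tags": ["food"]})
```

Getting things in is easy. The catch is that the database knows nothing about what is inside `data`, so any question more interesting than “give me all the blog posts” has to be answered in application code.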

But there’s something that keeps pulling me back to The Funky Data Model, simply because working with relational databases is a bit of a pain. Relational databases really are wonderful things: you put your data in and, if you’ve put it in properly, you can get your data out… quickly.

It would seem that saving data would be an easy thing for a computer to do, wouldn’t it? It’s not. Well, actually, saving data is an easy thing to do, but if you ever want to get it back before next Tuesday then you are probably going to find that a relational database or two will be in the mix somewhere.

The problem with relational databases is that they are pretty “fixed”. For example, if you want to add a “recipe” that has a title, then all recipes have to have a title, a recipe can’t have two titles. You have to pretty much decide what you are going to put in your database before you put it in. Life (and data) often isn’t like that… it’s messier.
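To make the “fixed” part concrete, here is a small sketch using Python’s built-in sqlite3 module. The `recipes` table, the `image` column and the `try_insert` helper are all made up for illustration; other relational databases will word their complaints differently but object in much the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The shape of a recipe has to be decided up front.
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO recipes (title) VALUES (?)", ("Eggs on Toast",))


def try_insert(sql, params):
    """Run an INSERT and report either 'ok' or the name of the sqlite3 error."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.Error as exc:
        return type(exc).__name__


# Every recipe must have a title...
missing_title = try_insert("INSERT INTO recipes (title) VALUES (?)", (None,))

# ...and a recipe cannot carry a field the table was never told about.
extra_column = try_insert(
    "INSERT INTO recipes (title, image) VALUES (?, ?)",
    ("Bacon Sandwich", "bacon.jpg"),
)

print(missing_title)  # IntegrityError
print(extra_column)   # OperationalError
```

Adding the image means changing the table first (an `ALTER TABLE`), and then every recipe has an image column whether it has an image or not.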

So last week, I thought I’d explore what are called schema-less databases for a day and see how far I got. These databases are less “fixed” than relational ones, closer to my idea of what the funky data model is all about and probably don’t work but I wanted to see if I could maybe take one small step towards the funky data model for the hell of it.

My “plan” was to stick a load of data in and then see how easy (and quick) it was to get out using Python (my programming language of choice).

I gave dbxml a wide berth simply because the means of getting your data out, called XQuery, seems to require brains I don’t have.

In the morning I tried working with CouchDB. I used MacPorts to install all the dependencies and started adding data. Things were going swimmingly until I discovered that you get your data out by writing snippets of JavaScript. My JavaScript isn’t great, but then JavaScript itself isn’t great. The whole JavaScript-iness of it put me off.

In the afternoon, slightly disappointed with CouchDB, I took Andy’s suggestion and gave the ZODB a whirl. The ZODB is what underpins both Zope and Plone. It’s an object database with a long track record and, as you know, very Python-friendly. After working with the database for a while I struggled to get the “server part”, called ZEO, running on Mac OS X. And because I base lots of technological decisions on gut feelings and tea leaves, and because I give up quickly, I gave up on the ZODB quickly and thought the “there has to be a better way” thought beloved of many a funky data model hunter.

In the evening, thinking that relational databases weren’t so bad after all, I gave a last-ditch trial run to MongoDB. Installation was easy enough and the documentation is very pretty (one of the most important things for geek stuff).

MongoDB is a database that stores its data as documents, which from Python look just like dictionaries. A dictionary is complex enough to store anything; for example, a recipe might look like this…

recipe = {'title': "Eggs on Toast",
          'ingredients': ["eggs", "butter", "toast"],
          'yumminess': 7.5}

… but interestingly, because Mongo is “schema-less”, I could add extra data to my recipe, such as an image or “preparation instructions”, without requiring all my other recipes to have one.
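Here is a rough sketch of what that freedom looks like with plain Python dictionaries, no database involved. The second recipe, its `image` and `preparation` values, and the `titles_with_images` helper are invented for the example.

```python
# Two recipes living side by side, each with its own shape.
recipes = [
    {'title': "Eggs on Toast",
     'ingredients': ["eggs", "butter", "toast"],
     'yumminess': 7.5},
    {'title': "Bacon Sandwich",
     'ingredients': ["bacon", "bread"],
     'yumminess': 9.0,
     'image': "bacon.jpg",                          # only this recipe has one
     'preparation': "Fry the bacon, butter the bread."},
]


def titles_with_images(docs):
    """Return the titles of the recipes that happen to have an image."""
    return [doc['title'] for doc in docs if 'image' in doc]


with_images = titles_with_images(recipes)

# The price of the freedom: code that reads a recipe can no longer assume
# a field is there, so it asks politely with .get() and a default.
images = [doc.get('image', "no image") for doc in recipes]
```

Nothing forces the two recipes to agree on their fields, which is exactly the point, and exactly the thing to keep an eye on later.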

Even more interestingly, you can create “collections” simply by adding something to a collection. This too is unlike the relational model, where you have to make the container BEFORE you put things in it and you have to decide exactly WHAT A THING is before you can put it into your container. With MongoDB you can simply type…

db.recipes.save( {'title': "Bacon Sandwich"} )

… and not only have you added a recipe, you’ve created the collection called “recipes”. This is staggeringly “fall off the chair” close to what I think of as a Funky Data Model.

Now, I know what you’re thinking. You are thinking “Any fool can put data INTO a database, it’s getting it OUT that counts”. And you, as ever, are totally right. How do you do it?

I was pleasantly surprised that you get data out of MongoDB by creating a query that is itself a dictionary. So using my example above, to write a query that gets my bacon sandwich out of the database, I simply …

# find_one() is called on the collection; it returns the first match (or None)
bacon_sandwich = db.recipes.find_one( {'title': "Bacon Sandwich"} )
print(bacon_sandwich['title'])

… There! How funky, easy and simple was that? So, I then set about tweaking my engagement engine crawlers to fill up a database to see how it performed in the “getting data out” tests. Within minutes I’d added my database to Django and although I’d only added a few thousand records, it seemed to be able to get them out very quickly and easily indeed. I’d expected that my crawlers, which run multiple threads, would blow MongoDB up, but it seemed to cope fine.

Of course, one of the wonderful things about relational databases is that you can write pretty complex queries that bring you data back quickly (normally). I can see that MongoDB is probably going to struggle when I start looking for recipes that include “bacon” and have a yumminess factor of 8 or above (which is most of them as it happens).
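For what it’s worth, the bacon question can still be written as a dictionary. The sketch below is a toy, in-memory stand-in written in plain Python so the shape of the query is visible; it is not how MongoDB works internally, and the `matches` and `find` helpers and the sample recipes are made up. It borrows two bits of MongoDB’s query style: a plain value tested against a list field matches if any element equals it, and `$gte` means “greater than or equal to”. Whether the real thing is quick on a big dataset is a separate question this sketch can’t answer.

```python
def matches(doc, query):
    """Return True if doc satisfies every field condition in query (toy version)."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Only the $gte operator is handled in this toy version.
            threshold = condition['$gte']
            if value is None or value < threshold:
                return False
        elif isinstance(value, list):
            # A plain value against a list field: match if any element equals it.
            if condition not in value:
                return False
        elif value != condition:
            return False
    return True


def find(docs, query):
    """Return the documents that match the query dictionary."""
    return [doc for doc in docs if matches(doc, query)]


recipes = [
    {'title': "Eggs on Toast", 'ingredients': ["eggs", "butter", "toast"], 'yumminess': 7.5},
    {'title': "Bacon Sandwich", 'ingredients': ["bacon", "bread"], 'yumminess': 9.0},
    {'title': "Sad Bacon Salad", 'ingredients': ["bacon", "lettuce"], 'yumminess': 6.0},
]

query = {'ingredients': "bacon", 'yumminess': {'$gte': 8}}
tasty_bacon = [doc['title'] for doc in find(recipes, query)]
```

With the sample data above, only the bacon sandwich survives both conditions.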

My big problem now is that when looked in the eye by the Funky Data Model, I realise that, like a blank piece of paper, a totally fluid database is a very daunting thing. Would you create a collection called “recipes” or one called “sandwiches”? Would you create a collection called “ingredients” or, as in the real world, put the ingredients in the sandwich itself? Or both?
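The ingredients dilemma can be sketched with plain Python dictionaries. Both shapes below are invented for illustration (the field names `name` and `ingredient_ids`, the numeric ids and the two helper functions are all assumptions), and neither is offered as the right answer.

```python
# Option 1: put the ingredients in the sandwich itself.
embedded_recipe = {
    'title': "Bacon Sandwich",
    'ingredients': [{'name': "bacon"}, {'name': "bread"}],
}

# Option 2: a separate "ingredients" collection, referenced by id.
ingredients = {
    1: {'name': "bacon"},
    2: {'name': "bread"},
}
referenced_recipe = {
    'title': "Bacon Sandwich",
    'ingredient_ids': [1, 2],
}


def names_from_embedded(recipe):
    """Everything needed is already inside the recipe: one lookup."""
    return [item['name'] for item in recipe['ingredients']]


def names_from_referenced(recipe, ingredient_collection):
    """Each id has to be followed into the other collection, like a hand-made join."""
    return [ingredient_collection[i]['name'] for i in recipe['ingredient_ids']]


embedded_names = names_from_embedded(embedded_recipe)
referenced_names = names_from_referenced(referenced_recipe, ingredients)
```

Embedding makes reading a recipe trivial but repeats “bacon” in every recipe that uses it. Referencing keeps one copy of “bacon” but leaves the joining up to you, which looks suspiciously like relational thinking creeping back in.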

So… here I am, really impressed with MongoDB and realising that my brain is still vaguely stuck in relational mode. It seems that with MongoDB I might have to spend a lot more time thinking about how to get my data out, time that you wouldn’t need to spend with a relational database.

I’m planning to give MongoDB a more thorough test in the next few weeks. I’m both excited by and scared of its capabilities: it really might be the funky data model that has eluded me for so long, and I may not have the abilities to deal with it.

I’m also planning to be a bit more careful about what I wish for.


6 Responses to The Holy Grail of the Funky Data Model

  1. dm says:

    try from the shell:

    db.recipes.ensureIndex({ingredients:1});
    db.recipes.find(
    {
    ingredients : 'bacon',
    yumminess : { $gt : 8.0 }
    } );

    should be fast.

  2. dm says:

    um,

    db.recipes.find({ingredients : 'bacon',yumminess : { $gt : 8.0 }} );

    on one line in the shell, shell doesn’t like multiline i think.

  3. Dan says:

    db.recipes.find( { title : /bacon/i , yumminess : { $gte : 8 } } );

    should do the trick.

    can i see the results :)

  4. Dan says:

    This should get you what you want:

    db.recipes.find( { title : /bacon/i , yumminess : { $gte : 8 } } );

    Can you send me the result :)

  5. Mike Miller says:

    Just stumbled upon this. One nice thing about the CouchDB community is that there are libs in any language you like. If you’re not happy with curl at the command line, I’m very happy with couchdb-python lib from cmlenz. Mixed with ipython it’s really natural, and we’ve been running it in production for half a year now, so very well supported.
