justjs: node.js tutorials

New here? You might want to start at the beginning.

5/19 '12

Models, MongoDB and Modules, oh my!

Please note: "oh my" should be read in George Takei's voice for maximum effect.

Previously, on Nodes of Our Lives...

Last time out, we refactored our code to map URLs to functions in a much more elegant way, using Express routes. But we didn't add any new features to the blog. So in this installment we'll do something a little more tactile: we'll store posts in a MongoDB database, so that we can actually add new posts! This blog is just getting more real all the time.

Separation of Concerns: Breaking Things Up Into Modules

Our app is getting bigger. And as it gets bigger, the code gets harder to grapple with in a single file. It's time to break it up into several Node modules.

Readability isn't the only reason to split code up into modules. An even better reason: code for unrelated purposes shouldn't be jumbled together. Ideally, code that performs a particular function lives in its own module, relying as little as possible on knowledge of the outside world.

One can take this too far, but it's a good guiding principle. For instance, the route callback that displays a post to the user needs to fetch posts by their slugs. But it doesn't have to contain all the agonizing details of how to do that with MongoDB in particular. And if it doesn't - if you factor out that code to a separate module - then you can replace MongoDB with another database at any time, without searching your source code from top to bottom for things that might break. All you have to do is consistently provide the same set of functions.

Our First Module: settings.js

We need to start breaking up our code somewhere. Handily, since we're bringing the database into the picture, we need a place to keep our database settings. And a module to contain our settings is just about the simplest module possible.

Modules in node.js are very straightforward. A module is just a .js file and can contain any JavaScript code you wish. Any variables you create in that file are completely invisible to all other modules - with one especially important exception: module.exports.

When you write:

var settings = require('./settings');

in your server.js file, the require function returns whatever was assigned to module.exports in the settings.js file.

This simple rule means that you can party down with as many global variables as you want in each module without worrying about the impact on other modules. Just remember that if you want to make something available to other modules, you should make it a property of the module.exports object.
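For instance, a tiny module - a hypothetical sketch, not part of our blog - might look like this:
// greeting.js (hypothetical example)
// This variable is private: nothing outside this file can see it
var secret = 'only visible inside greeting.js';

module.exports = {
  hello: function(name) {
    return 'Hello, ' + name + '!';
  }
};
Any other file that calls require('./greeting') gets back that exported object, so greeting.hello('world') works just fine - but there's no way to reach secret from outside.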

So let's get started! Create a blog-3 folder in your node-apps folder. Inside blog-3, create a new file called settings.js and paste in:

 

module.exports = { 
  db: {
    host: '127.0.0.1',
    port: 27017,
    name: 'justjsblogdemo'
  },
  http: {
    port: 3000
  }
};
We've just created a module that exports two properties: db and http. Each of these has properties of its own. One set is handy for setting up a MongoDB database connection, while the other tells Express what port number to listen on.
 
At the top of server.js, we could now write:
var settings = require('./settings');
Note the use of ./ at the start of the filename. This tells Node not to look for a module installed with npm, but to look in the current directory instead.
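Side by side, the two styles look like this (the async line comes from later in this installment, once we've installed that module with npm):
var async = require('async');          // no ./ : found in node_modules, installed with npm
var settings = require('./settings');  // ./    : found as settings.js in this folder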
 
However, I'm going to start off by first creating a container called "context" for all of the important objects that make up the application. That way we can easily pass both the settings and various objects like the Express application object to modules that have a need for them. We also get the option of being pickier and passing only the parts we deem important to a particular module. The latter is a smart choice for modules you might want to reuse in other projects.
 
So let's set up our context, beginning with adding the project-wide settings as a property of the context:
var context = {};
context.settings = require('./settings');
Now that we have our project settings, we're ready to configure the database:
setupDb();

function setupDb()
{
  context.db = require('./db');
  context.db.init(context, setupApp);
}
Notice that the setupDb function first requires the db.js file, which we'll examine in a moment, and then calls the init method of the object we've just assigned to context.db. By passing the context to context.db.init, we make our database code aware of the project-wide settings. And by passing a callback function, setupApp, we're able to proceed to the next step of starting up our application: creating the Express application object, much as we did last week. But to make the code more maintainable, we'll migrate the actual route code to a separate module, just as we did with the settings and the database code.
 
Here's the setupApp function:
function setupApp(err)
{
  if (err)
  {
    throw err;
  }
  context.app = require('./app');
  context.app.init(context, listen);
}
Error Handling and Node: the callback(err) pattern
 
Notice that setupApp expects an argument called err. If we've just asked for a database connection, and the operation fails, then we need a way to know about it. The standard way to do this in Node is to always pass an error object as the first argument of a callback - or pass null if everything is just fine. If other arguments make sense for the callback, they come after err.
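As a quick illustration - a hypothetical helper, not code from the blog - here's what a function that follows the callback(err) convention typically looks like, using Node's built-in fs module:
var fs = require('fs');

// Hypothetical: read and parse a JSON file, following the callback(err) convention
function readConfig(callback)
{
  fs.readFile('settings.json', 'utf8', function(err, data) {
    if (err)
    {
      // Something went wrong: the error goes in the first argument
      return callback(err);
    }
    // Success: null in the error slot, then the useful result
    callback(null, JSON.parse(data));
  });
}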
 
If we want to handle errors in a particular way - if, for instance, we know how to work around them and make everything OK, or we want to log them somewhere special - we can examine the err object. In this case, we just stop the server by throwing an exception, since we can't have a blog without a database:
if (err)
{
  throw err;
}

async.series: you CAN carry out a simple list of steps

"Hey, wait a minute! This callback business is tedious and confusing."
 
Hey, you're right! This callback stuff is great when you want to let 500 things happen at once and not be stuck waiting for the first one to complete. And it also avoids the dangers of programming with threads. Callbacks help us stay out of the server's way and keep things very, very fast.
 
But when you really do need things to happen in a simple, specific order, it gets annoying real quick. Plus it's hard to follow what's going on, and tough to maintain the code.
 
Fortunately there's a better way. We can take advantage of the async module. 
 
The async module provides a convenient way to call a series of callback functions in a specific order. Which gives us back the ability to do simple, sequential things, as we might in other frameworks and languages.
 
Actually, the async module gives us lots of cool toys. But the one we need right now is async.series.
 
The async.series function lets us specify a list of functions to be invoked in order, plus a final callback that is invoked once they've all finished - or invoked right away with an error if any of them fails.
 
First, don't forget to use the npm command to add the async module to your project:
cd node-apps/blog-3

npm install async
Now, here's how we'll take advantage of async.series:
var async = require('async');
async.series([setupDb, setupApp, listen], ready);
We still write our setupDb, setupApp and listen functions, almost as before. The big difference is that they don't have to worry about receiving an err parameter, and they don't have to be hard-coded to call the next step. Instead they receive a single callback function as a parameter, and their sole responsibility is to call that callback when they're done, passing either null (when everything is great) or an error object. We also write a "ready" function that gets called last, to take care of any errors or simply declare victory.
 
Here's setupDb, adapted to be called by async.series:
function setupDb(callback)
{
  context.db = require('./db.js');
  context.db.init(context, callback);
}
Notice that we pass on the callback function to context.db.init. Since that function also expects a callback and calls it in the usual way, it's fine to delegate responsibility for invoking the callback.
 
Now let's look at setupApp:
function setupApp(callback)
{
  context.app = require('./app.js');
  context.app.init(context, callback);
}
Very similar. Both functions require a module, then call an init method that eventually invokes a callback.
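In fact, any module we load this way just has to follow the same simple shape. Here's a minimal sketch - not the real db.js or app.js, just the contract they both satisfy:
// Sketch of the general shape shared by db.js and app.js
module.exports = {
  init: function(context, callback) {
    // ... set things up, using context.settings and friends ...
    // then report back: null for success, or an error object
    callback(null);
  }
};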
 
Here's the listen function:
// Ready to roll - start listening for connections
function listen(callback)
{
  context.app.listen(context.settings.http.port);
  callback(null);
}
This one is a little bit different because the context.app.listen function works immediately. So there's no need to pass the callback function on to some other function to be invoked later. Instead we just call it ourselves with a null argument, meaning that everything went fine.
 
These functions are much more maintainable now that we don't have to explicitly code each one to call the next one and worry about errors from the last one! We get to do that in one much more readable place when we call async.series.
 
But let's not forget the second argument to async.series: the ready function. This one does take an err argument, which will be present if any of the functions in our list failed:
function ready(err)
{
  if (err)
  {
    throw err;
  }
  console.log("Ready and listening at http://localhost:" + context.settings.http.port);
}
Isn't it nice to worry about errors in just one place? 

The db module: talking to MongoDB

We've broken our application up into several modules. One of them is the "db" module. This module provides simple functions for fetching and storing blog posts, without worrying about the details of the database being used. Worrying about that is the db module's job. In this way, we create an interface that can easily be satisfied by a completely different database without changing code anywhere else in the project. 
 
Folks who develop in traditional "MVC" (Model, View, Controller) frameworks would refer to our db module as the "model layer."
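To see why that separation pays off, here's a hypothetical sketch - something we won't actually write in this series - of an in-memory version of the same module. It satisfies the same contract, so nothing else in the project would have to change:
// db-memory.js (hypothetical): same interface, no MongoDB required
var posts = [];

module.exports = {
  init: function(context, callback) {
    // Nothing to connect to, so we're ready immediately
    callback(null);
  },
  posts: {
    findOneBySlug: function(slug, callback) {
      var found = null;
      posts.forEach(function(post) {
        if (post.slug === slug)
        {
          found = post;
        }
      });
      callback(null, found);
    }
    // ... findAll, insert and so on, with the same signatures ...
  }
};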
 
Start by creating a new file, db.js, in your blog-3 folder. 
 
We'll kick things off by requiring the mongodb module:
var mongo = require('mongodb');
Now we can use the mongo object to work with MongoDB.
 
Next we'll declare some variables that are visible only in this module. Remember, if it's not part of module.exports, it's nobody's business but your own. These are for our own private use in implementing the stuff we do export:
var db;
var postCollection;
var context;
var settings;
Now let's get things moving by implementing our "init" function. The "init" function provides a way to pass in the context as well as a callback function to be invoked once the database is ready.
 
We'll follow a common pattern by setting up the module.exports object pretty much the same way we did in settings.js, using the syntax:
{ property: value, .. } 
Just keep in mind this works for functions exactly the same way it works for other properties. They become methods of the object, accessed with the "." operator just like other properties.
 
In addition, we'll set the variable "db" to refer to the same thing as module.exports. This is a common convenience to let us refer to the object we're creating with a lot less typing.
 
Here's what the module looks like:
module.exports = db = {
  // Initialize the module. Invokes callback when ready (or on error)
  init: function(contextArg, callback) {
    context = contextArg;
    settings = context.settings;
    // ... Additional init code here ...
  },
  // ... More properties of the db object go here ...
};
Now let's take a look at the guts of the "init" function, the bit that does the real database work. We'll start by creating a database connection:
    var dbConnection = new mongo.Db(
      settings.db.name,
      new mongo.Server(settings.db.host, settings.db.port, {}),
      {});
Notice that we passed in settings we received via the context object. We need to know the host and port the MongoDB server is running on, as well as the name we want for our database.
 
Creating the connection just makes an object; it doesn't connect right away. So let's call dbConnection.open. As I'm sure you have already guessed, it takes a callback function that is invoked once the connection is ready:
    dbConnection.open(function(err) { ... });
Inside the callback function, we start by checking for an error. If there is one, we invoke our own callback (the one passed to the init function by the setupDb function in server.js) with the error, then return right away so we don't try to use a database we don't have:
      if (err)
      {
        callback(err);
        return;
      }
If not, we're ready to get a collection object. MongoDB "collections" are similar to "tables" in SQL databases, but we don't have to explicitly create them; the first time we use them, they just come zinging right into existence:
      postCollection = dbConnection.collection('post');
We could use our collection right away at this point. But there's one more thing we'd like to do: make sure that slugs are unique, and that we can find posts by their slugs very quickly.
 
If someone tries to create a post with the same title as another post, we'll wind up with two posts that have the same URL and no way to distinguish them. So to prevent that, we use the ensureIndex function of our collection to index the "slug" field, and we specify the unique: true option to enforce that no two posts can have the same slug.
 
Needless to say, ensureIndex takes a callback function too. Once that callback fires, we're ready to invoke our own callback, signaling that the database is ready to use:
      postCollection.ensureIndex("slug", { unique: true }, function(err, indexName) 
      {
        callback(err);
      });
    });
Take care to balance your }'s and )'s. JavaScript may look like Java, PHP and friends, but it's really much more like LISP, Scheme and other languages that positively live for the joy of nesting things. Functions can go inside statements, and you still have to close those statements properly. Property lists can include functions, and you still have to separate them with commas. It takes a little getting used to.
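For reference, here's the whole init method assembled from the pieces above, so you can see exactly how everything nests and closes:
  // Initialize the module. Invokes callback when ready (or on error)
  init: function(contextArg, callback) {
    context = contextArg;
    settings = context.settings;
    var dbConnection = new mongo.Db(
      settings.db.name,
      new mongo.Server(settings.db.host, settings.db.port, {}),
      {});
    dbConnection.open(function(err) {
      if (err)
      {
        callback(err);
        return;
      }
      postCollection = dbConnection.collection('post');
      postCollection.ensureIndex("slug", { unique: true }, function(err, indexName)
      {
        callback(err);
      });
    });
  },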
 

Callbacks, Functions and Closures

You may notice that in this case I chose to write all of the callback functions "inline," rather than declaring them as separate, named functions. Node programmers tend to favor writing callback functions inline. My preference depends on readability, and whether I want to take advantage of the ability to see variables that are declared in the enclosing function - something JavaScript allows nested functions to do.
 
If you're paying close attention, you'll notice that JavaScript also allows my nested callback functions to access the dbConnection variable even though the original init() function has already returned. The callback functions are written inside it, but they don't actually get called until later, when the MongoDB module acknowledges that the database is ready. Yet dbConnection still has its value in those callback functions. This extremely powerful feature of JavaScript is called a closure.
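If closures are new to you, here's a tiny standalone sketch (nothing to do with the blog) of a nested function hanging on to a variable after the outer function has returned:
function makeCounter()
{
  var count = 0;             // local to makeCounter
  return function() {        // this nested function "closes over" count
    count = count + 1;
    return count;
  };
}

var counter = makeCounter(); // makeCounter has already returned...
console.log(counter());      // 1 ...but count lives on
console.log(counter());      // 2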
 
The ability to create closures is a key reason why callback-driven web development is so much more popular in JavaScript than it has been in other languages. Callbacks often yield fast, flexible code, regardless of the programming language. But in most languages, programmers tend to move away from callbacks (and events, a close cousin) whenever circumstances allow.  
 
That's because in most languages it is much harder to grasp the context of each callback function and understand the flow of event-driven code. But in a functional language like JavaScript, we can use nested functions, closures, and tools like async.series to get the performance of callbacks without making code too hard to understand and maintain.

Enough Talk! Let's Store Something!

We've spent enough time initializing things! Let's manipulate some posts.
 
We'd like functions to find all posts, find one specific post by its slug, and create a new post. (Later we'll need more.) All of these functions have something in common: they deal specifically with posts, not with anything else we might add to the database later.
 
So let's group them together in a "posts" object, nested inside the db object:
  posts: {
    // Find all posts in reverse order (blog order)
    findAll: function(callback) {
      // Code goes here 
    },
    // Fetch a particular post by its slug
    findOneBySlug: function(slug, callback) {
      // Code goes here
    },
    // Insert a new post
    insert: function(post, callback) {
      // Code goes here
    }
  }
The "posts" object is simply the next property after the "init" function in our "db" object (don't forget the comma after that closing "}" for the init function). This allows us to add more nested objects later to work with other types of data.
 
Now let's look at the inside of the findAll method. This method asks MongoDB to fetch all of the posts, in "blog order" (reverse chronological), then invoke a callback function with the results:
    findAll: function(callback) {
      postCollection.find().sort({created: -1}).toArray(function(err, posts) {
        callback(err, posts);
      });
    },
To fetch all of the posts in the collection we call the find() method without any arguments. find() returns a "cursor" on which we can call more methods to clarify exactly what we want MongoDB to do. Calling the sort() method with { created: -1 } indicates that we want to sort on the "created" field of each post, in reverse order (since blogs traditionally show the newest post first). And calling the toArray function with a callback finishes the job, asking MongoDB to execute the query we've described up to this point and then invoke our callback.
 
The findOneBySlug method is similar, using the findOne() method, a convenient variation on find() that retrieves just one object. In addition, we specify a particular field to search for and a specific value to be matched:
    findOneBySlug: function(slug, callback) {
      postCollection.findOne({slug: slug}, function(err, post) {
        callback(err, post);
      });
    },
By specifying { slug: slug } we tell MongoDB that we are only interested in the post whose "slug" field is equal to the "slug" argument passed to the findOneBySlug method.
 
The insert method does a little bit more work than the others. In addition to passing on a request to MongoDB, insert also takes care of converting the title of the post to a reasonable "slug" for use in the post's URL. "insert" also has to worry about the very real possibility of an error, since the user might try to insert two posts with the same title:
    insert: function(post, callback) {
      post.slug = db.slugify(post.title);

      // Set the creation date/time
      post.created = new Date();

      postCollection.insert(post, { safe: true }, function(err) { 
        if (err) 
        { 
          callback(err); 
        } 
        else 
        { 
          callback(err, post); 
        } 
      }); 
    }
db.slugify is a simple method that converts a title to a plausible slug. It doesn't check whether someone else is already using it; the database index will do that (and I'll demonstrate a way to automatically fix it in a future installment). I'll look at db.slugify in a moment.
 
postCollection.insert is where the real magic happens. When we call postCollection.insert, we are asking MongoDB to insert a new document into the database.

MongoDB: safe at one speed, not so safe at another

MongoDB is designed to run very, very fast, sometimes even at the expense of safety. In some situations, it might be okay to just keep right on zooming along without worrying about the occasional unreported error. In others... not so much.
 
Since our blog is in the "not so much" category - users take a dim view of lost posts - it's important that we pass the { safe: true } option to the insert method of our postCollection object. This ensures that MongoDB will actually wait to be sure the insert was successful before invoking the callback function.
 
Don't worry, though. Even with safe: true, MongoDB is still very fast indeed.
 
Making Reasonable Slugs
 
One last method of the db object to look at: slugify. slugify converts titles to slugs by using a series of regular expressions. And it does a good job, as long as you don't have non-Latin characters that you want to preserve in slugs. If you do, you might want to check out the xregexp module in npm.
 
Here's the slugify source code:
  slugify: function(s)
  {
    // Everything not a letter or number becomes a dash
    s = s.replace(/[^A-Za-z0-9]/g, '-');
    // Consecutive dashes become one dash
    s = s.replace(/\-+/g, '-');
    // Leading dashes go away
    s = s.replace(/^\-/, '');
    // Trailing dashes go away
    s = s.replace(/\-$/, '');
    // If the string is empty, supply something so that routes still match
    if (!s.length)
    {
      s = 'none';
    }
    return s.toLowerCase();
  }
Hmm... slugify works well enough that it may be worth turning it into a little npm module in its own right. But not tonight!

Tomorrow, We Ride!

Whew! That... was epic. We covered modules, refactoring, async.series, closures, and MongoDB.
 
So let's take a moment to breathe. Tomorrow we'll look at the app.js module, which contains our route functions for the latest version of the blog, including the "New Post" form that allows us to create and save blog posts.
 
 

 
