justjs: node.js tutorials


7/18 '12

@synomat: a Twitter thesaurus, powered by node.js

@synomat is a Twitter-powered thesaurus. It's easy to play with: tweet a word to @synomat and she tweets back a list of synonyms. Simple enough, but how does it work? node.js and the ntwitter module to the rescue!

It might surprise you to learn that not every node.js app is a website. Fundamentally, node.js is not a webserver, although it contains modules that are stupendously good at it. At its core, Node is an engine for doing work asynchronously as the need arises, responding to events as they happen. Node is good for pretty much anything that meets that description. And running a Twitter bot - an automated Twitter account that provides helpful services to the public - definitely meets that description.

ntwitter is a Node module that provides access to the Twitter API. That includes lots of things: logging in, tweeting, searching, and monitoring "streams" of tweets by particular users or groups of users. That last feature is really, really great for writing automated Twitter clients or "bots."
Rather than using the http module (either directly or indirectly via Express), @synomat uses ntwitter as its main point of departure.

Of course, it takes one more thing to make a Twitter thesaurus bot: a decent thesaurus! For that we should thank Grady Ward, who released a public domain thesaurus back in 1996.

Ward's thesaurus, aka the "Moby thesaurus," is as simple as they come. Each line contains a comma-separated list of words. The first word is the "base word" the user is looking up; the rest are synonyms. And as it turns out, it's very easy to parse this format with Node.

Let's dive into server.js and see how it all works!

(Remember, the complete source code is on github. No need to type it all in or copy and paste snippets unless you like doing it that way. You'll need to install various modules with npm, as we previously discussed.)

We'll start out by importing modules. You should be familiar with all of these by now, except for ntwitter. We also require an options.js file which will contain Twitter credentials for both a Twitter app - our bot must be registered as a Twitter application - and a Twitter user:

var _ = require('underscore');
var options = require('./options.js');
var ntwitter = require('ntwitter');
var fs = require('fs');

If you're not checking out the code from github, don't forget to install the ntwitter module with npm:

npm install ntwitter

Let's look at options.js. This file contains a consumer_key and consumer_secret, which work just like those in my vidmood application, so check out that previous installment if you are unfamiliar with the process of registering a Twitter app on dev.twitter.com.

The file also contains an access_token_key (which Twitter refers to as an "access token") and an access_token_secret. These grant the application access to a specific user's account. In the vidmood application, users grant permission to the vidmood app manually, and then the access token and secret come into being without our intervention. But for a simple app like @synomat that will be tweeting on behalf of our own Twitter account, we can skip that manual process and generate the token and secret directly on dev.twitter.com. To do that, first register your app, then click on the "details" tab and scroll down. You'll see a "create my access token" button. Click that, then you'll be able to copy and paste the access token and secret. If you don't see them right away, try clicking over to the "settings" tab, then back to "details." (Twitter does claim there may be some delay before the information becomes available.)

Here's a full-fledged options.js file. Naturally for your own bot you would change the username as well as the credentials:

module.exports = {
  twitter: {
    username: 'synomat',
    consumer_key: 'xxxx',
    consumer_secret: 'xxxx',
    access_token_key: 'xxxx',
    access_token_secret: 'xxxx'
  }
};

"Why can't I just put a username and password in there and be done with it?" Twitter requires that OAuth be used by all clients. Yes, it's a pain for simple apps like this that don't act on behalf of third parties, but I don't make the rules. Fortunately the ntwitter module does the tough bits for us. We just supply the credentials.

Now we're ready to take advantage of the ntwitter module. It's possible to interact with Twitter on behalf of more than one user or app, so rather than use the ntwitter module directly, we use it to construct a new object representing this particular user and app:

var tweeter = new ntwitter(options.twitter);

Now let's stash our bot's Twitter username in a module-level variable where we can use it more conveniently:

var me = options.twitter.username;

Our next steps are to load the thesaurus file, then start up the Twitter bot. I've abstracted these tasks to functions to make the code easier to read:

var thesaurus = loadThesaurus();
bot();

Loading the thesaurus is easy. Since the bot hasn't started up yet, it's acceptable for us to read the whole file with a synchronous call, even though that takes some time. We can split it into lines with JavaScript's familiar split() method, then iterate over those lines with Underscore's each():

function loadThesaurus()
{
  console.log("Loading thesaurus...");
  var thesaurus = {};
  var lines = fs.readFileSync(__dirname + '/mthesaur.txt').toString().split("\n");
  _.each(lines, function(line) {
    // Process each line
  });
  console.log("Thesaurus loaded.");
  return thesaurus;
}

Of course, we also need to split up each line of the thesaurus file. The words in that file are separated by commas, so we'll split each line at the commas:

    var words = line.split(',');
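To see what split() hands back, here's a quick sketch. The sample line is made up for illustration and is not copied from the Moby file. It also shows one thing worth knowing before we filter out blank lines: splitting an empty string gives an array with one empty string in it, not an empty array.

```javascript
// A made-up sample line in the same comma-separated format
var words = 'happy,cheerful,content,glad'.split(',');
console.log(words[0]);        // happy
console.log(words.slice(1));  // [ 'cheerful', 'content', 'glad' ]

// An empty line does not produce an empty array
var blank = ''.split(',');
console.log(blank.length);    // 1
console.log(blank[0] === ''); // true
```

So a test on the first word is a more reliable way to spot a blank line than a test on the array's length.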

We'll ignore empty lines. Note that split() never returns an empty array: an empty line yields an array containing a single empty string, so we check the first word rather than the array's length. If the first word isn't empty, we know it is the term the rest are synonyms for. Pop the list of words, nicely punctuated, into our thesaurus object with the term as the name of the property, and we're ready to go... right?

    if (words[0]) {
      var term = words[0];
      thesaurus[term] = words.join(', ');
    }

Well... almost. The thesaurus contains lots of synonyms for most words. And tweets can only be 140 characters long. We could respond with more than one tweet, but it's not really in the Twitter spirit.

The laziest solution? Join the words up again, separated the way we'd like them to be punctuated when we tweet them out. Check the length. If it's too long, pop the last word off the list and try again. If it's short enough, we're done. This isn't the fastest way, but it's good enough for our purposes.

Here's the code to shorten the list before adding it to the thesaurus. Note the arithmetic to make sure our tweets aren't too long. We need space for the username we're replying to, the punctuation around it and the response itself. The "pop()" method is a handy way to remove the last element from an array.

      while (true)
      {
        var synonyms = words.join(', ');
        // 140 characters minus 15 character username minus
        // leading @ in reply minus space after username
        if (synonyms.length > (140 - 15 - 2))
        {
          words.pop();
          continue;
        }
        else
        {
          break;
        }
      }
      thesaurus[term] = synonyms;
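If you'd like to play with the shortening logic on its own, here's the same idea pulled out into a standalone function. This is just a sketch: fitWords and its maxLength parameter are names I've made up for illustration, and they don't appear in server.js.

```javascript
// Hypothetical helper: drop words from the end of the list until the
// joined string fits within maxLength characters.
function fitWords(words, maxLength)
{
  var kept = words.slice(); // copy so the caller's array is untouched
  while (kept.length && kept.join(', ').length > maxLength)
  {
    kept.pop();
  }
  return kept.join(', ');
}

// 140 characters, minus a 15 character username, minus the leading @
// and the space after the username
var budget = 140 - 15 - 2;

console.log(budget);                                  // 123
console.log(fitWords(['big', 'large', 'huge'], 12));  // big, large
```

In the bot itself the limit would be the 123 character budget rather than the tiny limit used here to make the trimming visible.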

"Hey, why are you loading the thesaurus into memory each time we start up the app? Can't you load it into a database?" Well, sure we could. The thesaurus does take up a fair bit of memory, and if we pushed it into MongoDB, we wouldn't have to execute this code each time the app starts up. But the app doesn't start up very often, and we can make that transition at any time. Meanwhile, having the thesaurus in memory means that lookups are even faster than they'd be with MongoDB.

Let's Speak Twitter!

Let's move on to the bot() function, where all the magic happens.

We'll start by verifying our Twitter credentials:

function bot()
{
  tweeter.verifyCredentials(function (err, data) {
    if (err)
    {
      console.log("Credentials bad. Bummer. Go check that in dev.twitter.com.");
      // Stop here so we don't also claim success below
      return;
    }
    console.log("Verified credentials");
  })
 //  ... Great, now let's use them
}

The "verifyCredentials" method of our tweeter object confirms that our Twitter credentials are good. Now we're ready to do more exciting things.

All of the methods of the tweeter object are "chainable." That means that each method returns the tweeter object itself, allowing us to call another method without starting a new statement. It looks like this:

function bot()
{
  tweeter.verifyCredentials(function (err, data) {
    // etc
  })
  .nextMethodCallHere() ...
}

However, keep in mind that you cannot assume verifyCredentials() has actually succeeded just because it has returned. As always in Node, it works asynchronously and invokes its callback when it's truly complete (or has truly failed). In our case, we call verifyCredentials() just to log a message to the console if anything is wrong, so there's no harm in getting started with the next thing while that is going on. But you should keep in mind that method chaining is not a way to do asynchronous things in a predictable order. For that, you need to use callbacks and perhaps async.series, as we saw in a previous installment.
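Here's a small sketch of that pitfall. The chainable object below is made up for illustration and is not ntwitter. Its first() method returns right away and only invokes its callback later, which is the general shape of an asynchronous, chainable method.

```javascript
// A made-up chainable object, not ntwitter itself. first() schedules
// its callback for later and returns the object right away.
var log = [];
var chainable = {
  first: function(callback)
  {
    setTimeout(function() {
      log.push('first callback');
      callback();
    }, 0);
    log.push('first returned');
    return this;
  },
  second: function()
  {
    log.push('second called');
    return this;
  }
};

chainable.first(function() {
  console.log(log.join(' -> '));
  // first returned -> second called -> first callback
}).second();
```

The second method runs before the first method's callback does, even though it comes later in the chain.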

We're ready to listen to tweets! Recall that our goal is to respond to incoming tweets by providing a thesaurus service. But how do we listen?

Twitter provides a "stream API" that offers a way to get a continuous flow of updates regarding a particular user. And the ntwitter module wraps this API in pretty much the nicest way possible. Here's how it works:

  .stream('user', { track: options.twitter.username }, function(stream) {
    console.log("Listening to tweets");
    stream.on('data', function (data) {
      // Do something with the data
    });
  })

The first argument to the .stream method is the name of the stream API we want to access. We're interested in a single user's tweets, so we'll specify the "user" API.

The second is an object containing parameters for that API. Here we specify the string we want to search for - in this case, a username - as the "track" property. We'll filter the results to get the tweets we want. There are also ways to specifically receive "replies," which would seem appropriate for catching messages to @synomat, but those properties only return responses from people who follow @synomat. We want to be more inclusive than that so we'll dig through all mentions of synomat and filter them on our own.

Listening to Events

You might think that the callback function for the stream() method would receive tweets. But that's not the case. Instead, it receives a "stream" object on which it can choose to monitor various events. This makes sense because incoming tweets are not the only event that can happen to a stream.

In Node, it is common for objects to support the "on" method, which is used to watch a particular named event relating to the object, invoking a callback when the event happens. In our case, we want to monitor the "data" event. Thus this line:

    stream.on('data', function (data) {
      // Do something with the data
    });
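If the "on" pattern is new to you, here's a sketch using Node's built-in EventEmitter. The fakeStream object is a stand-in I've made up so we can fire events by hand. I'm assuming the real stream object behaves like an emitter in this respect, since it supports "on" in the same way.

```javascript
var EventEmitter = require('events').EventEmitter;

// A stand-in for the stream object; the real one comes from ntwitter
var fakeStream = new EventEmitter();
var received = [];

fakeStream.on('data', function(data) {
  received.push(data.text);
});
fakeStream.on('end', function() {
  received.push('(end)');
});

// Emitting an event invokes every callback registered for that name
fakeStream.emit('data', { text: '@synomat happy' });
fakeStream.emit('end');

console.log(received); // [ '@synomat happy', '(end)' ]
```

Back in the bot, the 'data' handler on the real stream is registered the same way with stream.on().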

The callback of this method receives the actual tweets mentioning synomat - and, curiously, other things as well.

When we use the track property we receive tweets that are sent "to" the username we're tracking. But we also receive other things. The full list includes:

  • Tweets by the synomat user (rather than to it)
  • Tweets to them (tweets beginning with "@synomat ")
  • Tweets mentioning them (tweets with "@synomat" somewhere in the body)
  • Things that aren't tweets at all, such as arrays of friend ids

This is clearly the API we want. But since only the second case above relates directly to our needs, we'll need to make sure each "data" object we receive is really a tweet, and really a tweet "to" synomat.

We'll begin by ruling out data objects that don't have a "user" property, which does an effective job of filtering out things that aren't tweets at all. Then we'll grab the username of the person who sent the tweet:

      if (!data.user)
      {
        // Not a tweet. For example I've received a list of friend ids
        // here for some reason
        return;
      }
      var them = data.user.screen_name;

Now we'll make sure the tweet begins with @synomat. Twitter makes this easy for us with the in_reply_to_screen_name property:

if (data.in_reply_to_screen_name === me) {
  // Get excited and do things!
}
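Put together, the two checks amount to something like the sketch below. The isTweetTo function is a hypothetical helper, and the sample objects are made up, carrying only the fields the checks look at. Real data from Twitter has many more properties.

```javascript
// Hypothetical helper: true only for tweets addressed to our bot
function isTweetTo(data, me)
{
  if (!data.user)
  {
    return false; // not a tweet at all, e.g. a list of friend ids
  }
  return data.in_reply_to_screen_name === me;
}

// Made-up sample objects
var friends = { friends: [1, 2, 3] };
var mention = {
  user: { screen_name: 'alice' },
  in_reply_to_screen_name: null,
  text: 'I like @synomat'
};
var direct = {
  user: { screen_name: 'alice' },
  in_reply_to_screen_name: 'synomat',
  text: '@synomat happy'
};

console.log(isTweetTo(friends, 'synomat')); // false
console.log(isTweetTo(mention, 'synomat')); // false
console.log(isTweetTo(direct, 'synomat'));  // true
```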

Now we can look at the tweet text, which is in the data.text property. We'll check to make sure that it ends with a word. The regular expression starts with a space, so the word must be separate from whatever comes before it, then uses "\w+", which matches one or more word constituent characters (in English, at least), followed by \s* (to tolerate trailing whitespace) and $ (to signify it must come at the end of the tweet). Then we'll convert the word to lowercase.

Notice that the match() method returns an array. Element 0 of that array contains the entire regular expression match. Element 1 contains the portion that matches the first subpattern in parentheses. And this is good, because it lets us capture the word with "(\w+)".

        var result = data.text.match(/ (\w+)\s*$/);
        if (result)
        {
          var word = result[1].toLowerCase();
          // The word is now "bird", not "Bird"
        }
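To get a feel for what that pattern accepts, here's the same regular expression wrapped in a small function and run against a few made-up tweets. The lastWord name is mine, for illustration only.

```javascript
// Hypothetical helper: return the final word of a tweet in lowercase,
// or null if the tweet doesn't end with a space-separated word.
function lastWord(text)
{
  var result = text.match(/ (\w+)\s*$/);
  return result ? result[1].toLowerCase() : null;
}

console.log(lastWord('@synomat Bird'));    // bird
console.log(lastWord('@synomat bird  '));  // bird
console.log(lastWord('@synomat'));         // null
console.log(lastWord('@synomat bird!'));   // null
```

Note the last two cases: a tweet with nothing after the username doesn't match because the pattern requires a space before the word, and trailing punctuation defeats the match because "!" is not a word character.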

At last we can respond! If our thesaurus contains the word, we'll send a list of synonyms. Otherwise, we'll apologize for being lame:

          if (_.has(thesaurus, word))
          {
            reply(thesaurus[word]);
          }
          else
          {
            reply("sorry, I don't know the word " + word + ".");
          }

You should always use _.has to test whether an object that you are using to store your own data really has a particular property or not. This avoids confusion with properties that actually belong to all objects, such as the hasOwnProperty method itself.
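Here's a sketch of what could go wrong if someone tweets the word "constructor" at the bot. To keep it self-contained I've written a tiny has() function with Object.prototype.hasOwnProperty instead of loading Underscore. As far as I know that is the same kind of own-property check _.has performs, but treat that as my assumption.

```javascript
var thesaurus = { bird: 'bird, fowl' };

// A plain property lookup or the "in" operator also finds properties
// inherited from Object.prototype
console.log('constructor' in thesaurus);       // true
console.log(typeof thesaurus['constructor']);  // function

// An own-property check only sees what we stored ourselves
function has(obj, key)
{
  return Object.prototype.hasOwnProperty.call(obj, key);
}

console.log(has(thesaurus, 'bird'));         // true
console.log(has(thesaurus, 'constructor'));  // false
```

Without the own-property check, the bot would try to tweet a function back at the user instead of apologizing.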

If the regular expression didn't match, we send back another polite note:

        else
        {
          reply("just tweet me one word and I will tweet back synonyms, analogues, equivalents.");
        }

Great! But something is missing. Where is the reply() function?

We'll nest it right here inside the stream.on("data", ...) callback function. That way, the reply() function can see the "me" and "them" variables for extra convenience:

      function reply(msg)
      {
        tweeter.updateStatus("@" + them + " " + msg, function(err, data)
        {
          if (err)
          {
            console.log(err);
            // Not a big deal if a tweet fails. We could log something interesting though.
          }
        });
      }

The "updateStatus" method of our tweeter object lets us send any tweet we wish. And its callback function also lets us capture errors if they take place. Since this is just a thesaurus service, we won't do anything dramatic with the errors, apart from logging them to the console. We'll get errors fairly often - because someone asks for the same word twice (Twitter forbids duplicate tweets), because we've hit Twitter's API rate limit, or simply because Twitter is currently displaying the fail whale.

Hmm, maybe we should log an ASCII art fail whale to the console.

   o' o' o'
  
><>

... Nah.

We're nearly through! In fact, the app is complete at this point and works just fine. But there are two more events worth listening for: "end" and "destroy." The "end" event occurs when Twitter closes the stream politely. The "destroy" event occurs when Twitter hangs up not-so-politely. Either way, our response is the same: wait a second, just to let Twitter calm down and regret its actions. Then call Twitter back and demand an apology.

    stream.on('end', function (response) {
      // Handle a disconnection
      console.log('end event, listening again');
      // setTimeout takes the callback first, then the delay in milliseconds.
      // listen is assumed to be the function that opens the stream again.
      setTimeout(listen, 1000);
    });
    stream.on('destroy', function (response) {
      // Handle a 'silent' disconnection from Twitter, no end/error event fired
      console.log('destroy event, listening again');
      setTimeout(listen, 1000);
    });
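Since both handlers do the same thing, you could fold them into one helper. Here's a sketch under a few assumptions: reconnectOn is a name I've made up, listen stands for whatever function opens the stream again, and the optional schedule parameter exists only so the sketch can be tried without waiting on a real timer. Note that setTimeout takes the callback first and the delay in milliseconds second.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: call listen() again after delayMs whenever the
// stream reports that the connection is gone. schedule defaults to
// setTimeout.
function reconnectOn(stream, listen, delayMs, schedule)
{
  schedule = schedule || setTimeout;
  ['end', 'destroy'].forEach(function(name) {
    stream.on(name, function() {
      console.log(name + ' event, listening again');
      // Callback first, delay second
      schedule(listen, delayMs);
    });
  });
}

// Try it with a stand-in stream and a scheduler that records the call
var scheduled = [];
var fakeStream = new EventEmitter();
function listen() {}
reconnectOn(fakeStream, listen, 1000, function(fn, ms) {
  scheduled.push({ fn: fn, ms: ms });
});
fakeStream.emit('end');

console.log(scheduled.length); // 1
console.log(scheduled[0].ms);  // 1000
```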

Our Twitter bot is complete! But how do we launch it? Same as any other Node app:

node server.js

A Twitter bot can run directly on your development laptop with no real problems, so you don't strictly need a separate Twitter app and Twitter username just for testing purposes.

Of course, to create a stable application that is always running and never misses a tweet, you should run your bot on a production server. Here's a tip: even though it's not a web app, you can deploy it with stagecoach, side by side with the rest of your Node apps. We covered how to do this in the previous installment, and the procedure is no different. The Twitter bot simply won't bother to listen for HTTP connections on the port assigned to it. Naturally you can also deploy to Nodejitsu, Heroku and friends.

Have fun, amusement, good time, animal pleasure, blitheness!

 
