justjs: node.js tutorials

5/31 '12

Let's be bold: rich text editing and validation

We've got a blog... if you call being able to type in a single run-on paragraph with no line breaks a blog. Let's fix it so we can style blog posts any way we like.

We could tackle this the old-fashioned way... one of several old-fashioned ways, actually:

  • We could let users type in HTML, and just trust it.
  • We could let users type in HTML, and validate it to remove scary stuff like XSS attacks.
  • We could offer users a more convenient alternative markup language, like Markdown.
  • We could just convert newlines to <br /> tags and URLs to links.

We've all seen sites that use each of these strategies. And honestly they have their place. For comments and status updates, the last one can be a good choice; fancy formatting is probably just a distraction when folks are writing short comments. And Markdown is pretty awesome if you know what you're doing. Making people type HTML just to get line breaks is obnoxious, but it's not uncommon to mix that strategy with the "convert newlines to <br /> tags" strategy.
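For the curious, here's roughly what that last strategy looks like in code. This is a minimal sketch, not something we'll use in the blog. The names formatPlainText and escapeHtml are made up for illustration, and the URL matching is deliberately naive (it will happily swallow a trailing period, for instance).

```javascript
// Escape the characters that are special in HTML so user input shows up as text.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: plain text in, HTML with links and line breaks out.
function formatPlainText(text) {
  // Escape first, so nothing the user typed can act as markup
  var html = escapeHtml(text);
  // Turn bare http/https URLs into links (naive pattern, for illustration only)
  html = html.replace(/https?:\/\/[^\s<]+/g, function(url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
  // Convert newlines (Unix or Windows style) to <br /> tags
  return html.replace(/\r?\n/g, '<br />\n');
}
```

The order matters: we escape before adding our own tags, so the only markup in the result is markup we generated ourselves.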

So which one are we going to pick? None of them! We're going to give the user a full-blown rich text editor, just like Tumblr or Google Docs or Gmail. Because there's no reason why you should have to learn a markup language just to use italics.

The Rich Text Editor: Already Built Into Your Browser...

There are many rich text editors for the web. No, wait, let me rephrase that: there is exactly one rich text editor for the web... the one that's baked right into your web browser. Adding rich text editing can be as easy as this:

<div contenteditable="true">
This text is editable.
</div>

Every "rich text editor" you can add to a website is nothing but a collection of buttons to trigger functions in this built-in rich text editor, accumulated workarounds for frequently encountered problems with it, and extensions to make it do just a little more.

... But Making It Work Well Is Another Story

Sounds like the Emperor has no clothes! So why doesn't everyone just take advantage of "contenteditable" for themselves? Because while it is easy to make an element editable in modern browsers, the number of workarounds necessary to really make this cooperate nicely in every browser is pretty scary. So much so that the difference between a well-written wrapper for "contenteditable" and a naively coded one can be enormous.

That's why there are zillions of rich text editor projects out there, but only a tiny handful that can be trusted to work across Internet Explorer, Safari, Opera, Firefox and Chrome in a reliable and predictable fashion. Subtract the editors that have problematic or non-free licenses or dependencies on JavaScript frameworks most people don't use, and you're left with two options that are pretty much guaranteed to work.

Those two options are TinyMCE and CKEditor. Both have been around for a long time. Both are open source with permissive licenses.

CKEditor is a complete rewrite of the rich text editor formerly known as FCKEditor. I recommend CKEditor because it has a larger development community behind it than TinyMCE, and when things go wrong, a larger open source community equals quicker fixes.

"Isn't CKEditor slow?" You're probably thinking of FCKEditor, which had a well-deserved reputation for being "heavier" than TinyMCE. CKEditor is a successful rewrite with much better performance. Even so, we will load CKEditor only on the pages that really need it.

"If we're going to use a rich text editor, what does this really have to do with node? The rich text editor runs on the browser." Good question. First, this blog is primarily about Node, but also the entire web development stack around Node applications. The goal is to learn how to build complete Node apps, and also to see how JavaScript techniques that matter for Node also matter for jQuery and other client-side work. So this isn't the last time we'll look at browser-side code. Second, we'll be looking at the server side when it comes time to actually save the rich text without allowing unwelcome things like XSS attacks to sneak through. And we'll also be looking at how to integrate CKEditor with our view layer effectively.

Installing CKEditor

You can download CKEditor from the CKEditor site. It doesn't really matter whether you download the .zip file or the .tar.gz file. Your browser will invite you to open either one with your Mac's Archive Utility app. This will extract the file to a subdirectory of your Downloads folder called ckeditor.

Copy your node-apps/blog-5 project to node-apps/blog-6. Then move the ckeditor folder to node-apps/blog-6/static/ckeditor (that is, make it a child of the static folder in your blog-6 directory).

If you take a peek at the ckeditor folder, you'll notice three things:

1. It's pretty big! You can delete the _samples and _source subdirectories. You can also remove every subdirectory of "lang" that starts with a two-letter language code, except for "en" (English). Feel free, of course, to leave the other languages in place.

2. CKEditor is about 90% JavaScript and 10% HTML and CSS for the various dialogs it presents. But it comes with example code to integrate it with server-side languages we're not using, like PHP and ASP. You can remove these, but since our server is based on Node and wouldn't know how to run an ASP file if you begged it to, there's no real security risk in leaving them in place.

3. There's a config.js file. You can edit this and uncomment the "config.language" line if you want to set a default language for the interface other than English.

Integrating CKEditor

CKEditor is remarkably easy to integrate. CKEditor is designed to "progressively enhance" the good old textarea HTML element. The idea is that server-side code can receive the form submission just as it always did.

Of course, what CKEditor submits to the server is rich text, not plaintext. In practice, this means CKEditor submits HTML tags. It is exactly as if the user had manually typed:

<p>This is a paragraph with some <b>bold</b> text</p>

... In an ordinary textarea.

So how do we ask CKEditor to enhance textareas? And how do we prevent it from enhancing textareas we'd rather leave as-is?

A good strategy is to assign a CSS class to the textareas that should be enhanced. And that's exactly what we'll do, in views/new.ejs:

<textarea name="body" class="post-body rich-text-editor"></textarea>

Note the addition of the rich-text-editor class.

That's great, but how do we take advantage of the class? Simple: in views/layout.ejs, we'll ask jQuery for all of the textareas that have that class. Then we'll enhance each one.

This code loads CKEditor, then calls CKEditor's replace() function with each element that has the .rich-text-editor class. We do it just before the </body> tag:

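    <script type="text/javascript" src="/static/ckeditor/ckeditor.js"></script>
    <script type="text/javascript">
      // Wait until the DOM is ready...
      $(function() {
        // ...then visit every element that has the rich-text-editor class
        $('.rich-text-editor').each(function() {
          // "this" is the textarea to be replaced with a rich text editor
          CKEDITOR.replace(this);
        });
      });
    </script>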

jQuery and Node: Not So Different

Many programmers use jQuery without really understanding it. A good grasp of Node really helps here, because, like Node, jQuery relies on callbacks and nested functions.

jQuery's famous "$" is just a JavaScript function named "$". The "$" function is jQuery's main point of entry. If you pass a callback function to $, it is called when the DOM (Document Object Model) is ready; in other words, when all of the HTML elements in the page are ready to be safely manipulated. This is different from using an "onload" handler on the body element because it does not require waiting for every image to load.

When you pass a CSS-style selector like .rich-text-editor to the $ function, you get back a collection of all of the elements in the page that match that selector (in this case, the textareas we want to enhance). These collections support lots of useful functions that can be chained together, including each(), which is similar but not identical to Underscore's each() function; we pass a callback function to it and it is invoked once for each matching element, with "this" set to the matching element. Note that jQuery also has many methods that don't require using each(); you can call hide(), for instance, to hide every matching element in the collection without the need for a callback function for each one. We use each() when we want to do something to each element that is not included in jQuery's built-in set of functions.
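To make the "this" business concrete, here's a toy stand-in for a jQuery collection. It is emphatically not jQuery's real implementation, and ToyCollection is a made-up name. It just shows the mechanism: each() calls your callback once per element, with "this" set to that element, and hands back the collection so calls can be chained.

```javascript
// A toy stand-in for a jQuery collection; NOT jQuery's actual source.
function ToyCollection(elements) {
  this.elements = elements;
}

// Invoke the callback once per element with "this" set to that element.
// As in jQuery, the callback's arguments are (index, element).
ToyCollection.prototype.each = function(callback) {
  for (var i = 0; i < this.elements.length; i++) {
    callback.call(this.elements[i], i, this.elements[i]);
  }
  // Return the collection so further calls can be chained
  return this;
};

var seen = [];
new ToyCollection([{ name: 'body' }, { name: 'summary' }]).each(function(index) {
  seen.push(index + ':' + this.name);
});
// seen is now ['0:body', '1:summary']
```

Note the argument order: jQuery passes the index first and the element second, while Underscore's each() passes the element first. That's one of the ways the two are similar but not identical.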

Thanks to each(), we're ready to use CKEditor's API to finish the job. A call to CKEDITOR.replace(this); replaces the element we pass to it with a rich text editor that behaves identically.

CKEditor's ability to make its rich text editors behave exactly like the textareas they replace isn't perfect. If we were submitting the form via AJAX, we would have to nudge the rich text editor to update its underlying, hidden textarea first. We'll touch on that in a later installment. But for a normal form submission like ours the progressive enhancement works with no extra effort.
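If you're curious what that nudge might look like, here's a sketch. CKEditor keeps its live editors in CKEDITOR.instances, and each editor has an updateElement() method that copies its contents back to the textarea it replaced. The syncRichTextEditors name is made up, and passing the CKEDITOR object in as a parameter is just a choice that lets us try the function outside the browser.

```javascript
// Copy each editor's current contents back into the textarea it replaced.
// "ckeditor" is expected to be the global CKEDITOR object.
function syncRichTextEditors(ckeditor) {
  var count = 0;
  for (var name in ckeditor.instances) {
    if (Object.prototype.hasOwnProperty.call(ckeditor.instances, name)) {
      ckeditor.instances[name].updateElement();
      count++;
    }
  }
  // Report how many editors were synced
  return count;
}

// In the browser, you would call this just before serializing the form
// for an AJAX request:
//   syncRichTextEditors(CKEDITOR);
```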

Displaying Rich Text

If you fire up the site now, you'll see a rich text editor with a toolbar not unlike Google Docs or Microsoft Word, in place of the plain-vanilla textarea. This is great, but once you save the post and go to view it, you'll see... escaped HTML tags. Not so awesome.

To fix that, all we have to do is edit views/post.ejs and change:

<%= post.body %>

To:

<%- post.body %>

While <%= is great for deliberately escaping things so that unwanted HTML doesn't creep in, <%- is perfect for situations where we actually want the HTML tags to take effect, because it does not escape what it outputs.
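Here's roughly what the escaping amounts to. This is a sketch of the idea, not EJS's actual source, and escapeHtml is a name made up for illustration:

```javascript
// Replace HTML's special characters with entities so they display as text.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

var body = '<p>Some <b>bold</b> text</p>';

// Roughly what the escaping tag sends to the browser: the tags become visible text
var escaped = escapeHtml(body);
// escaped is '&lt;p&gt;Some &lt;b&gt;bold&lt;/b&gt; text&lt;/p&gt;'

// What the unescaped tag sends: the markup itself, untouched
var raw = body;
```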

Sanitizing the Rich Text

"Rich text is great, but doesn't this mean that an irresponsible user could write a post containing malicious markup?" Yup! To be exact, if we decide to allow someone to post but don't trust them 100%, that person would be able to carry out an XSS (cross-site scripting) attack by inserting HTML that includes JavaScript code, altering the behavior of the site to mislead other users (although they would not be able to compromise the server itself).

Fortunately there is a handy node module to clean this up for us. The validator module, by Chris O'Hara, includes a variety of useful functions for validating and sanitizing form fields. With validator you can easily verify valid email addresses, check that numbers are in a certain range, and carry out a variety of other validation tasks. And the validator module also includes an xss() function which can be used to eliminate XSS attack vectors from HTML markup without discarding the rest of the tags.

To take advantage of validator, first install it:

cd node-apps/blog-6
npm install validator

Then add a line to the top of app.js to require it and save a reference to the "sanitize" function:

var sanitize = require('validator').sanitize;

The validator module includes two different kinds of functionality: validation and sanitization. Validation checks to be sure input meets certain expectations or rejects it, while sanitization actually cleans up the data. The xss() function is in the latter category.
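The difference is easy to see with two tiny functions. These are made-up helpers for illustration, not part of the validator module's API:

```javascript
// Validation: answer yes or no, and never change the input.
function isShortEnough(s, max) {
  return s.length <= max;
}

// Sanitization: return a cleaned-up copy of the input.
function collapseWhitespace(s) {
  return s.replace(/\s+/g, ' ').trim();
}

var title = '  hello   world \n';
var cleaned = collapseWhitespace(title); // 'hello world'
var ok = isShortEnough(cleaned, 20);     // true
```

A validator leaves you with a decision to make (accept or reject), while a sanitizer leaves you with new data to store.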

Now let's modify the app.post('/new') route to take advantage of xss() by adding a new line just before our call to context.db.posts.insert():

post.body = sanitize(post.body).xss().trim();

Notice that we must call the sanitize() function first, passing in the body of the post. After that we can call as many sanitization methods as we want, chaining them together to get the desired result - in this case, removing XSS attacks, then trimming off any leading and trailing whitespace.

Any new posts you add after this point should automatically discard any links you add via CKEditor's link icon that attempt to embed JavaScript, such as "javascript:alert('test')".

"Can't I just remove links from the toolbar by editing ckeditor's configuration file?" Yes, but remember this: never trust the web browser! Any attempt to secure your website with JavaScript on the browser side is doomed to failure because any skilled hacker can just rip out your JavaScript in a copy of the webpage and substitute their own code to submit whatever they please to the web server. The only validation that really protects you is validation on the server side, in your node code. Validation on the browser side is only appropriate as a convenient way to give honest users feedback. Don't trust it.

Don't Slow That Site Down!

Launch the server and visit the site. Add a new post with some styling via the rich text editor. Lo and behold, you get rich styling. Victory!

Just one problem: we're loading CKEditor all the time. And that's slow. Slow for the user, because the browser does extra work. And slow for the server, because it must deliver extra files to every single site visitor. Which is asking for trouble as your traffic grows.

We can fix it by loading CKEditor only when we really need it.

Loading CKEditor in the layout is still a good idea. But we need to do it on demand. We can do that by taking advantage of slots in the new.ejs template.

Just add this line to views/new.ejs:

<% slots.richTextEditor = true %>

Now we can look for this flag in the layout to decide whether to go to the trouble of firing up CKEditor, like so:

  <% if (slots.richTextEditor) { %>
    <script type="text/javascript" src="/static/ckeditor/ckeditor.js"></script>
    <script type="text/javascript">
      $(function() {
        $('.rich-text-editor').each(function() {
          CKEDITOR.replace(this);
        });
      });
    </script>
  <% } %>

Launch the site again and take advantage of the "view source" feature of the browser. You'll see that CKEditor is only loaded on the one page that currently needs it: the "new post" form. If we need it on other pages, we can just add the same slots.richTextEditor = true code to those pages as well.

Next Up: Summaries and Caching

Rich text posts are all very well, but right now, visitors to our blog's home page don't see much. All they get are the titles of posts. Most users appreciate a bigger clue of what's inside each post. In the next installment we'll look at how to summarize the individual blog posts on the home page without displaying the full text. And since accessing and summarizing every blog post on every visit to the home page threatens to slow the server down a bit, we'll look at how to cache the result so that the home page is not rebuilt until a new post is added. Caching can lead to unpredictable behavior if it is not done well. But it is also an important tactic for popular sites running in production. We'll see how Node's single-threaded nature makes caching dramatically easier than in most environments.

 

 
