
Catch errors before they go live by linting your HTML with Perl

Tags: Scripting languages, Programming languages, Development tools, Contributor Melonfire, HTML::Lint, Perl, HTML


Takeaway: The CPAN module HTML::Lint, built atop the popular HTML::Parser module, is designed to verify your markup against W3C standards and point out errors that could cause it to "break" or render badly in client browsers.

If you're coding in Perl, it's pretty obvious when there's an error in your code -- the parser will spew all kinds of error messages over your screen, alerting you to the problem and letting you take immediate action to fix it. If you're developing HTML pages, though, no such early warning system exists -- any error in your markup will usually be silently ignored by the browser. Worse, some browsers even attempt to "automatically" correct for common markup errors, introducing a whole set of new problems in the process.

The simplest solution, then, is to check (or "lint") your HTML before putting it live. And that's where a useful little CPAN module called HTML::Lint comes in. This Perl module, built atop the popular HTML::Parser module, is designed to verify your markup against W3C standards and point out errors that could cause it to "break" or render badly in client browsers.

This document explores some of HTML::Lint's capabilities, using it to check HTML pages and display the errors it finds. To begin with, download and install the module (if you don't already have it) by running the following command at your shell prompt:

shell> perl -MCPAN -e "install HTML::Lint"

Linting Files

Once you've got it installed, create and save the following HTML file (as abc.html):

<html>
<head></head>
A is for apple, B is for baby
</body>
</html>

As you can see, this file has a deliberate error -- the opening <body> tag is missing. It's pretty obvious here, but if you had a larger and more complex file, such missing tags would be harder to detect. That's why the next step is to write some Perl code to detect this error using HTML::Lint.

Create and save the following script (as linter.pl):

#!/usr/bin/perl
use strict;
use warnings;

# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();

# parse file
$lint->parse_file("abc.html") or die("Cannot find file!");

# check for errors; errors() returns the error count in scalar context
($lint->errors) ? print "Your code stinks!\n" : print "Your code rocks!\n";

This is pretty simple: the script initializes an HTML::Lint object and then uses the object's parse_file() method to parse the HTML file created previously. Errors, if any, are collected by the object and made available through its errors() method; depending on whether any were found, one of the two messages is printed to the console.

Here's the output you might see:

shell> ./linter.pl
Your code stinks!

Of course, this is somewhat impractical if you have a large number of files to lint. In such cases, you'd want to pass the HTML file name and path to the script at run-time, rather than hard-coding it into the script. Listing A is a revision of the previous script which lets you do just this.

Listing A


#!/usr/bin/perl
use strict;
use warnings;

# read file name from command line
if (!$ARGV[0]) { die ("ERROR: No file name provided"); }

# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();

# parse file
$lint->parse_file($ARGV[0]) or die("ERROR: Cannot find file");

# check for errors
($lint->errors) ? print "Your code stinks!\n" : print "Your code rocks!\n";

# print error count
print "Errors found: ", scalar($lint->errors), "\n";

In this case, the script expects a file path as its first argument; this is read from the special Perl @ARGV array. The script then looks for this file, parses it and displays a message depending on whether or not it found an error. The last line in the script is new: it prints a count of the errors found by the parser, which is what the errors() method returns when called in scalar context.
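Since the goal is to catch errors before a page goes live, one possible extension (not part of the original listings) is to turn the error count into the script's exit status, so that a publishing or build script can stop when problems are found. The sketch below assumes the errors() method described in the HTML::Lint documentation; count_errors() is a hypothetical helper name introduced only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# count_errors() is a hypothetical helper: it lints one file and
# returns the number of errors HTML::Lint reports for it
sub count_errors {
    my ($file) = @_;
    die("ERROR: Cannot read $file\n") unless -r $file;
    my $lint = HTML::Lint->new();
    $lint->parse_file($file);
    return scalar($lint->errors);
}

# when a file name is given, report the count and exit non-zero on errors
if (@ARGV) {
    my $count = count_errors($ARGV[0]);
    print "Errors found: $count\n";
    exit($count ? 1 : 0);
}
```

A shell wrapper could then test the exit status before copying the page to the live server, for example with ./linter.pl abc.html && echo "OK to publish".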

And here's how to use it:

shell> ./linter.pl /tmp/abc.html
Your code stinks!
Errors found: 1

Handling Errors


While it should now be clear how HTML::Lint can find errors in your code, there's still one problem -- printing messages about how much your code stinks is amusing, but not really helpful in diagnosing the problem. What you'd really like are detailed error messages that indicate both the nature of each error and the line number on which it occurred.

Fortunately, HTML::Lint stores this information in the error objects it collects, and it's easy to extract and display it. The next example (Listing B) builds on the previous one to display more detailed error information.

Listing B


#!/usr/bin/perl
use strict;
use warnings;

# read file name from command line
if (!$ARGV[0]) { die ("ERROR: No file name provided"); }

# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();

# parse file
$lint->parse_file($ARGV[0]) or die("ERROR: Cannot find file");

# check for errors
($lint->errors) ? print "Your code stinks!\n" : print "Your code rocks!\n";

# print detailed error list; errors() returns error objects in list context
foreach my $e ($lint->errors) {
      print $e->where(), ": ", $e->errtext(), "\n";
}

In this case, the list returned by the errors() method is processed using a foreach loop. Each element is an individual error object, and its location and detailed error text are printed using the object's where() and errtext() methods.
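The error objects expose more than where() and errtext(). As a rough sketch, assuming the line(), column() and errcode() accessors listed in the HTML::Lint::Error documentation, the following lints markup held in a string (using parse() and eof() instead of parse_file()) and breaks each error into separate fields. The lint_string() helper and the output format are illustrative choices, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# lint_string() is a hypothetical helper: it lints HTML held in a
# string and returns one hash reference per error found
sub lint_string {
    my ($html) = @_;
    my $lint = HTML::Lint->new();
    $lint->parse($html);   # feed the markup to the linter
    $lint->eof();          # signal that there is no more input
    return map {
        +{
            line   => $_->line(),
            column => $_->column(),
            code   => $_->errcode(),
            text   => $_->errtext(),
        }
    } $lint->errors;
}

# the sample page from earlier, with its missing <body> tag
my $html = "<html>\n<head></head>\nA is for apple, B is for baby\n</body>\n</html>\n";

foreach my $e (lint_string($html)) {
    print "line $e->{line}, column $e->{column} [$e->{code}]: $e->{text}\n";
}
```

Keeping the fields separate makes it easier to sort errors by line or to group them by error code before printing.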

Here's an example of what you might see:

shell> ./linter.pl /tmp/abc.html
Your code stinks!
(4:1): </body> with no opening <body>

Of course, you can modify the script above to lint multiple files at once, log errors to a log file instead of displaying them, or even filter out all but the most critical errors. For examples of these and other tricks, visit the HTML::Lint page. Take a look when you have a minute, and happy coding!
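As a starting point for those modifications, here is a minimal sketch that lints several files in one run, appends the results to a log file and keeps only structural errors. It assumes the only_types() method and the HTML::Lint::Error::STRUCTURE constant described in the module's documentation; the lint_files() helper and the lint.log file name are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# lint_files() is a hypothetical helper: it lints each file with a fresh
# linter, appends the errors to a log file and returns the total count
sub lint_files {
    my ($log_path, @files) = @_;
    open(my $log, '>>', $log_path)
        or die("ERROR: Cannot open $log_path: $!\n");
    my $total = 0;
    foreach my $file (@files) {
        # note unreadable files in the log and move on to the next one
        unless (-r $file) {
            print $log "$file: cannot be read\n";
            next;
        }
        my $lint = HTML::Lint->new();
        # report structural problems only, skipping less critical checks
        $lint->only_types(HTML::Lint::Error::STRUCTURE);
        $lint->parse_file($file);
        foreach my $e ($lint->errors) {
            print $log "$file ", $e->where(), ": ", $e->errtext(), "\n";
            $total++;
        }
    }
    close($log);
    return $total;
}

# lint every file named on the command line; lint.log is a hypothetical name
if (@ARGV) {
    my $count = lint_files("lint.log", @ARGV);
    print "Errors logged: $count\n";
}
```

Creating a new linter for each file keeps the error lists separate, at the cost of a little extra setup per file.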
