Catch errors before they go live by linting your HTML with Perl
Takeaway: HTML::Lint, a CPAN module built atop the popular HTML::Parser module, is designed to verify your markup against W3C standards and point out errors that could cause it to "break" or render badly in client browsers.
If you're coding in Perl, it's pretty obvious when there's an error in your code -- the parser will spew all kinds of error messages over your screen, alerting you to the problem and letting you take immediate action to fix it. If you're developing HTML pages, though, no such early warning system exists -- any error in your markup will usually be silently ignored by the browser. Worse, some browsers even attempt to "automatically" correct for common markup errors, introducing a whole set of new problems in the process.
The simplest solution, then, is to check (or "lint") your HTML before putting it live. And that's where a useful little CPAN module called HTML::Lint comes in. This Perl module, built atop the popular HTML::Parser module, is designed to verify your markup against W3C standards and point out errors that could cause it to "break" or render badly in client browsers.
This document explores some of HTML::Lint's capabilities, using it to check HTML pages and display the errors it finds. To begin with, download and install the module (if you don't already have it) by running the following command at your shell prompt:
shell> perl -MCPAN -e "install HTML::Lint"
Linting files
Once you've got it installed, create and save the following HTML file (as abc.html):
<html>
<head></head>
A is for apple, B is for baby
</body>
</html>
As you can see, this file has a deliberate error -- the opening <body> tag is missing. It's pretty obvious here, but if you had a larger and more complex file, such missing tags would be harder to detect. That's why the next step is to write some Perl code to detect this error using HTML::Lint.
Create and save the following script (as linter.pl):
#!/usr/bin/perl
use strict;
use warnings;
# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();
# parse file
$lint->parse_file("abc.html") or die("Cannot find file!\n");
# check for errors (errors() returns the error count in scalar context)
print $lint->errors() ? "Your code stinks!\n" : "Your code rocks!\n";
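This is pretty simple: the script initializes an HTML::Lint object and then uses the object's parse_file() method to parse the HTML file created previously. The object collects any errors it finds, and its errors() method returns them. Called in scalar context, as in the last line, errors() returns the number of errors, so the script prints one message or the other depending on whether any were found.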
Here's the output you might see:
shell> ./linter.pl
Your code stinks!
Of course, this is somewhat impractical if you have a large number of files to lint. In such cases, you'd want to pass the HTML file name and path to the script at run-time, rather than hard-coding it into the script. Listing A is a revision of the previous script which lets you do just this.
Listing A
#!/usr/bin/perl
use strict;
use warnings;
# read file name from command line
my $file = $ARGV[0] or die("ERROR: No file name provided\n");
# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();
# parse file
$lint->parse_file($file) or die("ERROR: Cannot find file\n");
# check for errors (errors() returns the error count in scalar context)
print $lint->errors() ? "Your code stinks!\n" : "Your code rocks!\n";
# print error count
print "Errors found: ", scalar($lint->errors()), "\n";
In this case, the script expects a file path as its first argument; Perl stores command-line arguments in the special @ARGV array. The script then parses this file and displays a message depending on whether or not it found an error. The last line in the script is new: it prints a count of the errors found by the parser, obtained by calling HTML::Lint's errors() method in scalar context.
And here's how to use it:
shell> ./linter.pl /tmp/abc.html
Your code stinks!
Errors found: 1
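Listing A reads markup from a file, but sometimes the HTML you want to check is already in a Perl string (produced by a template, for instance). HTML::Lint also offers parse() and eof() methods for feeding it text directly. The sketch below assumes that interface and wraps it in a hypothetical helper called lint_string(); treat it as a starting point rather than a definitive implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# lint_string() is a hypothetical helper for this example, not part of HTML::Lint.
# It lints HTML held in memory and returns the HTML::Lint::Error objects found
# (or just their number when called in scalar context).
sub lint_string {
    my ($html) = @_;
    my $lint = HTML::Lint->new();
    $lint->parse($html);   # parse() may also be called repeatedly, one chunk at a time
    $lint->eof();          # tell the linter the input is complete
    return $lint->errors();
}

# The same broken markup as abc.html, this time held in a string
my $page = "<html>\n<head></head>\nA is for apple, B is for baby\n</body>\n</html>\n";

my $count = lint_string($page);   # scalar context: number of errors
print "Errors found: $count\n";
```

Because the helper takes a string, the same check can run on generated pages before they are ever written to disk.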
Handling errors
While it should now be clear how HTML::Lint can find errors in your code, there's still one problem: printing messages about how much your code stinks is amusing, but not really helpful in diagnosing the problem. What you'd really like are detailed error messages that indicate both the nature of each error and the line number on which it occurred.
Fortunately, each error HTML::Lint reports is an HTML::Lint::Error object that carries this information, and it's easy to extract and display it. The next example (Listing B) builds on the previous one to display more detailed error information.
Listing B
#!/usr/bin/perl
use strict;
use warnings;
# read file name from command line
my $file = $ARGV[0] or die("ERROR: No file name provided\n");
# initialize linter
use HTML::Lint;
my $lint = HTML::Lint->new();
# parse file
$lint->parse_file($file) or die("ERROR: Cannot find file\n");
# check for errors
print $lint->errors() ? "Your code stinks!\n" : "Your code rocks!\n";
# print detailed error list: one HTML::Lint::Error object per problem
foreach my $e ($lint->errors()) {
    print $e->where(), ": ", $e->errtext(), "\n";
}
In this case, the list returned by errors() is processed using a foreach loop. Each element is an HTML::Lint::Error object: its where() method supplies the location of the error in (line:column) form, and its errtext() method supplies a description of the problem.
Here's an example of what you might see:
shell> ./linter.pl /tmp/abc.html
Your code stinks!
(4:1): </body> with no opening <body>
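Listing B prints every error on its own line, which can get long for a large page. One way to condense the output is to group errors by kind. The sketch below assumes HTML::Lint::Error's errcode() method, which returns a short identifier for each kind of error; tally_errors() is a hypothetical helper, not part of the module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;

# tally_errors() is a hypothetical helper for this example. It groups a
# linter's errors by error code and returns a hash ref: code => [ locations ].
sub tally_errors {
    my ($lint) = @_;
    my %by_code;
    foreach my $e ($lint->errors()) {
        push @{ $by_code{ $e->errcode() } }, $e->where();
    }
    return \%by_code;
}

# Print one summary line per error code for the file named on the command line
if (@ARGV) {
    my $lint = HTML::Lint->new();
    $lint->parse_file($ARGV[0]) or die("ERROR: Cannot find file\n");
    my $tally = tally_errors($lint);
    foreach my $code (sort keys %$tally) {
        my @where = @{ $tally->{$code} };
        printf "%-20s %3d  %s\n", $code, scalar(@where), join(' ', @where);
    }
}
```

Each summary line shows the error code, how many times it occurred, and where.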
Of course, you can modify Listing B to lint multiple files at once, log errors to a log file instead of displaying them, or even filter out all but the most critical errors. For examples of these and other tricks, see the HTML::Lint documentation on CPAN. Take a look when you have a minute, and happy coding!
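As a rough illustration of those ideas, the sketch below lints several files, appends each error to a log file instead of the screen, and uses only_types() with the HTML::Lint::Error::STRUCTURE constant so that only structural problems are reported. The helper name lint_to_log() and the command-line layout (log file first, then the HTML files) are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Lint;
use HTML::Lint::Error ();

# lint_to_log() is a hypothetical helper for this example. It lints each file,
# keeps only STRUCTURE-type errors (such as unopened or unclosed tags), appends
# them to $log_path and returns how many it wrote.
sub lint_to_log {
    my ($log_path, @files) = @_;
    open(my $log, '>>', $log_path) or die("Cannot open $log_path: $!\n");
    my $total = 0;
    foreach my $file (@files) {
        unless (-r $file) {
            warn "Skipping unreadable file $file\n";
            next;
        }
        my $lint = HTML::Lint->new();   # fresh linter per file
        # Report only errors of the STRUCTURE type; other types are ignored
        $lint->only_types(HTML::Lint::Error::STRUCTURE());
        $lint->parse_file($file);
        foreach my $e ($lint->errors()) {
            print {$log} "$file ", $e->where(), ": ", $e->errtext(), "\n";
            $total++;
        }
    }
    close($log) or die("Cannot close $log_path: $!\n");
    return $total;
}

# Usage: first argument is the log file, the rest are HTML files to check
if (@ARGV >= 2) {
    my ($log_path, @files) = @ARGV;
    my $n = lint_to_log($log_path, @files);
    print "$n structural error(s) written to $log_path\n";
}
```

Saved under a name such as lintlog.pl (hypothetical), it could be run as ./lintlog.pl lint.log *.html. Opening the log in append mode means repeated runs build up a history rather than overwriting it.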