Monday, February 22, 2010

Optimizing HTML

Why clean markup?

Client-side optimization is getting a lot of attention lately, but some of its basic aspects seem to go unnoticed. If you look carefully at pages on the web (even those that are supposed to be highly optimized), it’s easy to spot a good number of redundancies, as well as inefficient or archaic structures, in their markup. All this baggage adds extra weight to pages that are supposed to be as light as possible.

The reason to keep documents clean is not so much faster load times as it is having a solid and robust foundation to build upon. Clean markup means better accessibility, easier maintenance, and good search engine visibility. Smaller size is just a property of clean documents, and another reason to keep them that way.

In this post, we’ll take a look at HTML optimization: removing some of the common markup smells, reducing document size by getting rid of redundant structures, and employing minification techniques. We’ll look at currently available minification tools and analyze what they do wrong and right. We’ll also talk about what can be done in the future.
Markup smells

So what are the most common offenders?
1. HTML comments in scripts

One of the gross redundancies nowadays is the inclusion of HTML comments (<!-- ... -->) in script blocks. There’s not much to say here, except that browsers that actually need this error-prevention measure (such as ‘95 Netscape 1.0) are pretty much extinct. Comments in scripts are just unnecessary baggage and should be removed ferociously.
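
As a rough illustration of what removing these markers could look like in a build step, here is a minimal sketch. The function name stripScriptCommentWrappers is hypothetical (it is not taken from any existing tool), and it assumes the markers sit at the very start and end of the inline script text.

```javascript
// Hypothetical helper: removes legacy "<!--" ... "//-->" hiding markers
// from the text content of an inline script.
function stripScriptCommentWrappers(scriptText) {
  return scriptText
    // Opening marker. Browsers ignore the rest of the "<!--" line inside a
    // script, so the remainder of that line is dropped together with it.
    .replace(/^\s*<!--[^\n]*\n?/, "")
    // Closing marker, written either as "//-->" or as a bare "-->".
    .replace(/\s*(?:\/\/)?\s*-->\s*$/, "");
}

const legacyScript = "<!--\nalert('hi');\n//-->";
console.log(stripScriptCommentWrappers(legacyScript)); // alert('hi');
```

Script text without the markers passes through unchanged, so the helper can be applied to every inline script without checking first.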
2. <![CDATA[ ... ]]> sections

Another often needless error-prevention measure is wrapping the contents of SCRIPT elements in CDATA sections, usually hidden behind script comments as //<![CDATA[ ... //]]>.

It’s a noble goal that falls short in reality. While CDATA blocks are a perfectly good way to prevent an XML processor from recognizing < and & as the start of markup, this only matters in true XHTML documents, those that are served with the “application/xhtml+xml” content type. The majority of the web is still served as “text/html” (since, for example, IE doesn’t understand XHTML to this date), and so is parsed by browsers as HTML, not as XML.

Unless you’re serving documents as “application/xhtml+xml”, there’s little reason to have CDATA sections hanging around. Even if you’re planning to use XHTML in the future, it might make sense to remove the unnecessary weight from the document now, and only add it back later, when it is actually needed.
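
For documents served as “text/html”, stripping the wrappers could be sketched roughly as follows. The helper name stripCdataWrappers is made up for this example, and it only handles the common //-commented form of the markers.

```javascript
// Hypothetical helper: removes "//<![CDATA[" ... "//]]>" wrappers from the
// text of an inline script. Only appropriate for pages parsed as HTML.
function stripCdataWrappers(scriptText) {
  return scriptText
    // Opening marker, with or without the leading "//" script comment.
    .replace(/^\s*(?:\/\/\s*)?<!\[CDATA\[\s*/, "")
    // Closing marker, with or without the leading "//" script comment.
    .replace(/\s*(?:\/\/\s*)?\]\]>\s*$/, "");
}

const wrappedScript = "//<![CDATA[\nvar ok = 1 < 2 && true;\n//]]>";
console.log(stripCdataWrappers(wrappedScript)); // var ok = 1 < 2 && true;
```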

And, of course, the ultimate solution here is to avoid inline scripts altogether (to take advantage of caching of external scripts).
3. onclick="…", onmouseover="", etc.

There are some valid use cases for intrinsic event attributes, such as performance or targeting ancient browsers (although I’m not aware of any environment that understands event attributes, as in onclick="...", but not property-based assignments, as in element.onclick = ...). Besides the well-known reasons to avoid them, such as separation of concerns and reusability, there’s the matter of markup pollution. By moving event logic to an external script, we can take advantage of that script’s caching: event handler logic doesn’t need to be transferred to the client every time the document is requested.
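
A minimal sketch of what the external-script side might look like is shown here. The helper name bindClick and the "save" element id in the usage comment are assumptions made for the example, not something taken from a particular page.

```javascript
// Hypothetical helper living in an external, cacheable script. It replaces
// an onclick="..." attribute in the markup.
function bindClick(element, handler) {
  if (typeof element.addEventListener === "function") {
    // Standards-based registration.
    element.addEventListener("click", handler, false);
  } else {
    // Property-based assignment for environments without addEventListener.
    element.onclick = handler;
  }
}

// In a browser, usage could look like this (the "save" id is an assumption):
// bindClick(document.getElementById("save"), function () { /* ... */ });
```

The markup then carries only the element itself, and the handler code is fetched once along with the rest of the script.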
4. onclick="javascript:…"

An interesting confusion of the javascript: pseudo-protocol and intrinsic event handlers results in this redundant mix (with 106,000 (!) occurrences). The truth is that the entire contents of an event handler attribute become the body of a function. That function then serves as the event handler (usually after having its scope augmented to include some or all of the ancestors and the element itself). The “javascript:” prefix merely becomes an unnecessary label and rarely serves any purpose.
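
The label behavior can be checked outside of a browser. The sketch below approximates the attribute-to-function step with the Function constructor; this is only an approximation, since it does not model the scope augmentation mentioned above.

```javascript
// Approximation of how attribute text becomes a function body. Inside a
// function body, "javascript:" parses as a statement label, nothing more.
const withPrefix = new Function("event", "javascript: return 42;");
const withoutPrefix = new Function("event", "return 42;");

console.log(withPrefix() === withoutPrefix()); // true
```

Both functions behave identically, which is why the prefix can be dropped from event attributes without changing what the handler does.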
5. href="javascript:void(0)"

Continuing with the “javascript:” pseudo-protocol, there’s the infamous href="javascript:void(0)" snippet, used as a way to prevent default anchor behavior. This terrible practice, of course, makes the anchor completely inaccessible when Javascript is disabled, not available, or errors out. It should go without saying that the ideal solution is to include a proper URL in href and stop the default anchor behavior in an event handler. If, on the other hand, the anchor element is created dynamically and then inserted into the document (or is hidden initially, then shown via Javascript), a plain href="#" is a leaner and faster alternative to the “javascript:” version.
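
One possible shape for the “proper URL plus event handler” approach is sketched below. The helper name enhanceLink and the "/help" URL in the usage comment are hypothetical. The sketch relies on the fact that returning false from a handler assigned through the onclick property cancels the default navigation.

```javascript
// Hypothetical helper: keeps a real URL in href and cancels navigation only
// when the script-driven action has run.
function enhanceLink(link, action) {
  link.onclick = function () {
    action(link.href);
    // Returning false from an onclick property handler cancels navigation.
    return false;
  };
}

// In a browser, for markup such as <a id="help" href="/help">Help</a>:
// enhanceLink(document.getElementById("help"), function (url) { /* ... */ });
```

If the script never loads, the handler is never attached and the link simply navigates to its href.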
6. style="…"

There’s nothing inherently wrong with the style attribute, except that by moving its contents to an external stylesheet, we can take advantage of resource caching. This is similar to avoiding event attributes, mentioned earlier. Even if you only need to style one particular element and are not planning to reuse its styles, remember that style information has to be transferred every time the document is requested. Moving styles to an external resource prevents this, as the stylesheet is transferred once and then cached on the client.
7. charset attribute on inline SCRIPT elements

The thing is that the charset attribute only really makes sense on “external” SCRIPT elements, those that have a “src” attribute, yet it also turns up on inline scripts. HTML 4.01 even says:

Note that the charset attribute refers to the character encoding of the script designated by the src attribute; it does not concern the content of the SCRIPT element.

Testing shows that actual browser behavior also matches the spec in this regard.

Searching for this pattern reveals about 2000 occurrences. Not surprising, given that even popular apps like TextMate include wrong usage of charset.
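
A cleanup pass for this smell could be sketched as below. The function name stripInlineScriptCharset is invented for the example, and the regular expressions assume reasonably tidy markup: an attribute value containing “>” would confuse them, so a real tool would want a proper HTML parser.

```javascript
// Hypothetical helper: drops the charset attribute from SCRIPT start tags
// that have no src attribute, and leaves external scripts untouched.
function stripInlineScriptCharset(html) {
  return html.replace(/<script\b[^>]*>/gi, function (tag) {
    if (/\ssrc\s*=/i.test(tag)) {
      return tag; // external script: charset is meaningful here
    }
    // Handles double-quoted, single-quoted, and unquoted attribute values.
    return tag.replace(/\s+charset\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/i, "");
  });
}

const inlineExample = '<script type="text/javascript" charset="utf-8">var a;</script>';
console.log(stripInlineScriptCharset(inlineExample));
// <script type="text/javascript">var a;</script>
```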



Copyright http://perfectionkills.com/optimizing-html/
