HTML vs XHTML

July 2002 by Philipp Lenssen

XHTML is the most recent recommendation on how to write web pages. I will discuss how its syntax differs from HTML, and why you might want to switch.

First, there was HTML3.2

Let's skip a bit of history. The world of commercial, complex web pages pretty much saw the light of day with HTML3.2. HTML3.2 is what some call the deprecated, table-layout way of doing things. And it's still in use today on most web pages.

And then came HTML4.0/ HTML4.01

Actually, HTML4 was a step backward -- and a step forward at the same time. It goes back to ideas that should have been (and, if not for the browser wars, probably would have been) popular already with HTML3.0. That is, HTML for structuring the content, and CSS for suggesting a presentation.

Last but not least: XHTML1.0/ XHTML1.1

Is XHTML yet another approach? Hardly so. It's just based on a slightly different syntax. It's just as accessible (and backward-compatible) as HTML4 was, so if you feel comfortable with HTML4.0/ HTML4.01 and don't care about the syntax details, you won't find a compelling reason to switch.
Instead of SGML, which was the basis for HTML4, XHTML is now based on XML.

However, some technical reasons to indeed make the change are (you can check if they apply to you):

Syntactical differences

Document Type Declaration

Let's start with the DTD, the Document Type Declaration, which references a Document Type Definition (yes, also called DTD, to add to the confusion):

HTML4:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
...
</html>

XHTML1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...
</html>

Closing tags & empty elements

Now, that was easy. What might take a bit more getting used to is closing what once were optional end tags. Take the <p> element, for instance:

HTML4:
<p>This is the first paragraph ...
<p>And this is the second paragraph

XHTML1:
<p>This is the first paragraph ...</p>
<p>And this is the second paragraph</p>

Why is this important in XML? The answer is easy: the document tree can be built without knowing the document type, that is, without knowing whether a closing tag is optional or whether an element can contain content. In HTML, for example, if you didn't know what a <br> element was, you couldn't tell whether it needed to be closed or could contain other tags (and, as a result, whether the following elements would be children of the br or its siblings).
In XML/ XHTML, this becomes obvious from the document itself:

HTML4:
<p>This is the first paragraph ... and here's a break.<br>
And this is the second line</p>

XHTML1:
<p>This is the first paragraph ... and here's a break.<br />
And this is the second line</p>

As you can see, there's an additional space and a slash ("/") following the br. The space is only there to avoid confusing older browsers that don't know XML/ XHTML. The "/" character tells the browser this element is empty.
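To make the point about building the tree without a DTD a bit more tangible, here is a minimal sketch in Python. It is only an illustration under my own assumptions: it uses the generic XML parser from Python's standard library (xml.etree.ElementTree), which knows nothing about HTML or XHTML, and the helper name child_tags is made up for this example.

```python
import xml.etree.ElementTree as ET

# XHTML-style fragment: the empty br element is explicitly closed.
xhtml = (
    "<p>This is the first paragraph ... and here's a break.<br />"
    "And this is the second line</p>"
)

# HTML4-style fragment: <br> is opened but never closed.
html4 = (
    "<p>This is the first paragraph ... and here's a break.<br>"
    "And this is the second line</p>"
)


def child_tags(markup):
    """Return the tag names of the root's children, or None if the
    markup is not well-formed XML. (Hypothetical helper for this sketch.)"""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return [child.tag for child in root]


# The generic parser builds the tree from the XHTML version alone ...
xhtml_children = child_tags(xhtml)

# ... but cannot handle the HTML4 version: without a DTD it has no way
# of knowing that br is empty, so the closing </p> does not match.
html4_children = child_tags(html4)

# In the XHTML tree, br is an empty child of p; the text that follows
# it is kept next to the element (ElementTree stores it in .tail).
br = ET.fromstring(xhtml)[0]
br_is_empty = len(br) == 0 and br.text is None
text_after_br = br.tail
```

The parser never sees a DTD here. It accepts the XHTML fragment because the markup itself says that br is empty, and it rejects the HTML4 fragment because that information is missing.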

Lower-case/ Upper-case

In HTML, you could write element names in upper- or in lower-case letters, and they would still be the same. In XHTML, there is a difference between those two ways of writing it, because XML is case-sensitive. The creators of XHTML decided on lower-case because they had to decide on one thing, and so now XHTML elements and attributes have to be written all lower-case:

HTML4:
<ADDRESS ID="company">This is an address.</ADDRESS>

XHTML1:
<address id="company">This is an address.</address>
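The case-sensitivity of XML can be checked with a generic XML parser. The following is a small sketch, again assuming Python's standard xml.etree.ElementTree module; the helper is_well_formed is a name I made up for the example. Note that such a parser only checks well-formedness; it takes a validating parser and the XHTML DTD to actually reject upper-case element names.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup.
    (Hypothetical helper for this sketch.)"""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


mixed = '<ADDRESS id="company">This is an address.</address>'
lower = '<address id="company">This is an address.</address>'
upper = '<ADDRESS ID="company">This is an address.</ADDRESS>'

# Fine in HTML, but in XML the start and end tags no longer match.
mixed_ok = is_well_formed(mixed)
lower_ok = is_well_formed(lower)

# All upper-case is well-formed XML, yet it names a different element
# (and a different attribute) than the lower-case version does.
upper_root = ET.fromstring(upper)
upper_tag = upper_root.tag
lower_id_on_upper = upper_root.get("id")
```

So to an XML parser, ADDRESS and address are simply two unrelated names, which is why XHTML had to settle on one spelling.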

Omitted attribute names

In HTML4, one could use a shortcut for certain attributes and simply omit the attribute name. In XHTML1, attributes must always appear in full name-value-pair form. Also, quotation marks around the value are always necessary (single or double quotes), not just sometimes as was the case with HTML4:

HTML4:
<option value=usa selected>USA</option>
<option value=germany>Germany</option>

XHTML1:
<option value="usa" selected="selected">USA</option>
<option value="germany">Germany</option>
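Both attribute rules can be tried out with a generic XML parser as well. This is a rough sketch under the same assumptions as before (Python's standard xml.etree.ElementTree, and a made-up helper called is_well_formed); the option elements with a value attribute are just my own sample markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup.
    (Hypothetical helper for this sketch.)"""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


candidates = {
    # HTML4 shortcut: attribute written in its minimized form.
    "minimized": '<option value="usa" selected>USA</option>',
    # HTML4 allowed some attribute values without quotes.
    "unquoted": '<option value=usa selected="selected">USA</option>',
    # XHTML form, with single quotes.
    "single_quotes": "<option value='usa' selected='selected'>USA</option>",
    # XHTML form, with double quotes.
    "double_quotes": '<option value="usa" selected="selected">USA</option>',
}

results = {name: is_well_formed(markup) for name, markup in candidates.items()}
```

Only the two fully written-out, quoted variants get through; whether single or double quotes are used makes no difference to the parser.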

Who wins the fight?

HTML vs XHTML, who wins the fight? In short, the answer is: you do.
Because you have two equally powerful choices, you are likely to prefer one over the other depending on your needs and working environment. If you don't have a preference today, my suggestion is to go for XHTML because it's the most recent recommendation from the W3C. Other than that, consider which tools you have available, and which HTML flavor best suits your tools & taste.