WWW FAQs: What are HTML and XHTML?

2003-09-04: XHTML, which stands for Extensible HyperText Markup Language, is a simple markup language used to make web pages.

Although most modern word processors and many specialized tools can be used to make web pages without learning XHTML at all, learning XHTML itself is a useful way to learn more about the web and gives you more control over the results. Luckily, XHTML is very simple and quite easy to learn.

What's this XHTML stuff? What happened to HTML?

XHTML is the latest generation of HTML. HTML was originally intended to be an application of SGML (Standard Generalized Markup Language), a general-purpose markup language. But many HTML pages do not comply with the requirements of SGML, which makes HTML tougher for computers to work with in useful ways.

In more recent years, the World Wide Web Consortium has taken steps to correct the problem. SGML has been largely replaced by XML (Extensible Markup Language), a new general-purpose markup language that is easier to work with than SGML. And XHTML, which replaces HTML, is a newer standard which complies fully with the requirements of XML but remains compatible with older web browsers.
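
To make the difference concrete, here is a rough sketch of my own (not taken from either specification) showing the same content written first in the loose style that browsers have long accepted as HTML, and then in the stricter style XHTML expects. The rules on display are that element and attribute names are lowercase, every element is closed, and every attribute value is quoted.

```html
<!-- Loose HTML: browsers accept it, but it is not well-formed XML -->
<P>Here is my cat:<BR>
<IMG SRC=cat.jpg ALT="My cat">
<P>Second paragraph

<!-- The same content written to XML's rules, as XHTML requires -->
<p>Here is my cat:<br />
<img src="cat.jpg" alt="My cat" /></p>
<p>Second paragraph</p>
```

Because the second version follows XML's rules, an ordinary XML parser can read it without any knowledge of HTML's special cases.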

A Simple Example

Here is a simple example of a valid XHTML document. To try this out for yourself, simply create a new file called mypage.html with any text editor, such as Windows Notepad. Paste in the XHTML below, make any changes that please you, and save the document. Then pick "open" from the File menu of your web browser, locate the file you have just made, and open it. If you make further changes, you will need to "save" again and then click "reload" or "refresh" in your browser to see the results.

Of course, this is just a simple example. XHTML can do far, far more than this. A complete tutorial can be found at Dave's HTML Guide.

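<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Title of My Page Goes Here</title>
</head>
<body>
<h1>Heading Of My Page Goes Here</h1>
<p><a href="http://news.google.com/">Follow this link to Google News</a></p>
<p>Here is a picture of my cat:
<img src="cat.jpg" alt="Photograph of my cat"/></p>
</body>
</html>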

What's That DOCTYPE About?

Good grief! Most of this looks friendly enough, but what's that scary "DOCTYPE" line all about?

The DOCTYPE tells the web browser what version of XHTML we're using. In this case I've specified XHTML 1.0 Strict, because this code is 100% compliant with the rules of XHTML. You don't need to understand this line in detail - just know that you should include it if you plan to write standards-compliant web pages. And you should.

Those who must use HTML elements that aren't included in strict XHTML can use the "transitional DTD" (Document Type Definition) instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
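
As a rough illustration of when the transitional DTD matters, the sketch below uses the center element and the target attribute, both of which XHTML 1.0 Transitional allows and XHTML 1.0 Strict does not. The page content is invented for this example.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>A Transitional Page</title>
</head>
<body>
<!-- center is a presentational element that Strict leaves out -->
<center><h1>My Cat</h1></center>
<!-- the target attribute is also only available in Transitional -->
<p><a href="http://news.google.com/" target="_blank">Google News in a new window</a></p>
</body>
</html>
```

If you don't need elements or attributes like these, the strict DOCTYPE is the better choice.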

Understanding XHTML: A Basic Introduction

The XHTML elements in the page above are nearly self-explanatory. All elements that describe the page but are not actually part of the content appear inside the head element. All of the elements that actually make up the visible page itself are part of the body element. Everything between the opening <head> "tag" and the closing </head> "tag" is considered a part of the head element. The same goes for body. And everything should be contained within a single html element.
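
The nesting described above can be sketched as a bare skeleton. The comments are my own annotations, and the title and paragraph text are just placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <!-- describes the page; not part of the visible content -->
    <title>Placeholder Title</title>
  </head>
  <body>
    <!-- everything the visitor actually sees goes here -->
    <p>Placeholder content.</p>
  </body>
</html>
```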

The text between <h1> and </h1> is displayed as a "level one heading," which is typically a very large, bold font.

The p element encloses a paragraph. In strict XHTML, inline content such as text, images and links must be enclosed in a paragraph or another "block-level" element rather than placed directly in the body.
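
Here is a small sketch of that rule. These are two alternative fragments for comparison, not one document, and the wording of the text is made up for the example.

```html
<!-- Not valid in XHTML 1.0 Strict: text and a link directly inside body -->
<body>
Read the news:
<a href="http://news.google.com/">Google News</a>
</body>

<!-- Valid: the same content wrapped in a paragraph -->
<body>
<p>
Read the news:
<a href="http://news.google.com/">Google News</a>
</p>
</body>
```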

The text between the opening <a> and closing </a> "tags" becomes a link to another web page; the URL of the web page to be linked to is found in the href attribute of the <a> element as shown in the example above. Note that XHTML requires element and attribute names to be written in lowercase.

The <img> element includes an image in the page; the image is displayed at that point in the page, as long as the image file specified by the URL in the src attribute actually exists. Since the src attribute I used here contains a simple filename, the cat picture will be shown as long as the file cat.jpg is in the same directory as the page. The same trick can be used in href attributes in <a> elements, to conveniently link to pages in the same directory. For more information about images and how to create them in formats appropriate for the web, see the image file formats entry.
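
A quick sketch of simple filenames versus full URLs follows. The file about.html is hypothetical; substitute any page that actually sits in the same directory as yours.

```html
<!-- cat.jpg is looked for in the same directory as this page -->
<p><img src="cat.jpg" alt="Photograph of my cat" /></p>

<!-- the same trick for a link; about.html is a hypothetical file -->
<p><a href="about.html">About this site</a></p>

<!-- a full URL works no matter where this page is stored -->
<p><a href="http://news.google.com/">Google News</a></p>
```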

The alt attribute of the img element contains text to be presented in place of the image to users who cannot see it, such as blind users whose software reads the page aloud. XHTML requires it, and since this text is also read by search engines like Google, it's important to include it - Google probably won't know your page is about cats if there is no text about cats on the page!

The "alt text" should describe the image in a useful way for those (including both computers and people) who cannot otherwise see it.
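
As a rough guide, compare the alt text in the fragments below. The filename divider.gif is made up for this example, and using an empty alt for purely decorative images is a common convention rather than something this article prescribes.

```html
<!-- Unhelpful: says nothing about what the picture shows -->
<p><img src="cat.jpg" alt="image" /></p>

<!-- More useful: describes the content of the picture -->
<p><img src="cat.jpg" alt="Photograph of my cat" /></p>

<!-- Purely decorative image: the attribute is present but left empty -->
<p><img src="divider.gif" alt="" /></p>
```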

The <img> element has a / before the > to signify that it is not a container and that no closing </img> is expected.
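
Other elements that never contain anything follow the same pattern. The sketch below shows br (line break) and hr (horizontal rule) alongside img. The space before the slash is optional as far as XML is concerned; the compatibility guidelines in the XHTML 1.0 specification suggest including it to help older browsers.

```html
<p>
First line<br />
Second line
</p>
<hr />
<p><img src="cat.jpg" alt="Photograph of my cat" /></p>
```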

What To Do With Your XHTML Page

Of course, a web page sitting in a file on your own computer is not yet visible to anyone in the outside world. See the setting up websites entry to learn more about how to create websites that others can see.

Legal Note: yes, you may use sample HTML, JavaScript, PHP and other code presented above in your own projects. You may not reproduce large portions of the text of the article without our express permission.
