The Document Tree
The Document Tree is the relational structure of an HTML/XHTML document. It is not implemented quite like the "tree structure" found in most programming languages; it differs in several respects that are too technical to describe here, so if you already know one of the standard tree structures, don't assume the information here is redundant. The elements of an HTML/XHTML document form a document tree that links them together in different relationships. Every element within a document tree has exactly one parent, except for the root of the tree, which has none. In HTML, the root is the HTML element, which wraps all other elements but is wrapped by none.
A visual representation helps a lot in this case. Here's a sample of HTML within a page:
<div id="container">
  <h1>Main Heading</h1>
  <p>Most programmers <em>rely</em> on caffeine</p>
  <p>Most programmers like Anime</p>
  <div class="lister">
    <h2>The 'P' Interpreted Languages</h2>
    <ul>
      <li>Perl</li>
      <li>PHP</li>
      <li>Python</li>
    </ul>
  </div>
</div>
Now, here's the visual representation of the document tree of the above HTML:
As you can see, the document tree, in relation to HTML, is all about which elements are contained within which other elements. The above example doesn't include the BODY, HEAD, or HTML elements, so this isn't a complete representation of an HTML document, but that doesn't matter: a document tree is still valid when it's just a "branch" of elements cut off from a main tree. You could say that this fragment has a pseudo root element, since the uppermost DIV has no parent within this context. We know it can't just float in space, though: in a valid HTML/XHTML document it must be contained within other elements, and ultimately within the BODY of the page.
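The containment described above can also be written out as indented text. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser (the sample markup happens to be well-formed, so an XML parser accepts it); the label and outline helpers are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The sample markup from above, treated as a well-formed fragment.
SAMPLE = """
<div id="container">
  <h1>Main Heading</h1>
  <p>Most programmers <em>rely</em> on caffeine</p>
  <p>Most programmers like Anime</p>
  <div class="lister">
    <h2>The 'P' Interpreted Languages</h2>
    <ul>
      <li>Perl</li>
      <li>PHP</li>
      <li>Python</li>
    </ul>
  </div>
</div>
"""


def label(element):
    """Hypothetical helper: upper-case tag name plus id/class if present."""
    name = element.tag.upper()
    if "id" in element.attrib:
        name += "#" + element.attrib["id"]
    if "class" in element.attrib:
        name += "." + element.attrib["class"]
    return name


def outline(element, depth=0):
    """Return one indented line per element, children below their parent."""
    lines = ["  " * depth + label(element)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE.strip())
print("\n".join(outline(root)))
```

The output has eleven lines, one per element, starting with DIV#container; each level of indentation marks one step of containment, so the three LI lines sit three levels below the pseudo root.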
The Different Relationships
A set of different relationships links the elements together. Let's go over them one by one:
- Ancestor - An ancestor is an element that is connected to another element, directly or through intermediate elements, and sits above it in the document tree. Every connecting step from the ancestor must point downward. In the example above, the DIV of id="container" is an ancestor of all the other elements in the tree. Likewise, the DIV of class="lister" is an ancestor of the H2 element, the UL element, and the three LI elements.
- Descendant - A descendant is the opposite of an ancestor. It is an element that is connected to another element, directly or through intermediate elements, and sits below it in the document tree. Every connecting step from the descendant must point upward. In the example above, all of the other elements in the tree are descendants of the DIV of id="container". Likewise, the H2 element, the UL element, and the three LI elements are all descendants of the DIV of class="lister".
- Parent - The parent is the element directly connected to an element below it. In the document tree, every element has exactly one parent, apart from the root element, which has none. As mentioned above, in HTML/XHTML the root element is the HTML element. In the example above, the first P element is the parent of the EM element, and the DIV of class="lister" is the parent of the H2 and UL elements. Note: a parent element is always also an ancestor element. When both terms apply, it is considered a parent first and an ancestor second.
- Child - A child is roughly the opposite of a parent, except that a parent can have multiple children. A child is an element that is directly connected to an element above it. In the example above, the UL element has three LI child elements, and the first P element has an EM child element. Note: a child element is always also a descendant element. When both terms apply, it is considered a child first and a descendant second.
- Sibling - Siblings are a group of two or more elements that share the same parent. These elements need not be of the same type; they merely have to be children of the same parent element. In the example above, the H1 element, the two P elements, and the DIV of class="lister" are all siblings, since they're all children of the DIV of id="container". An easier set of siblings to spot is of course the three LI elements, which share the same parent: the UL element.
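The five relationships above can be sketched in a few lines of code. This is only an illustration, assuming Python's standard xml.etree.ElementTree module; its elements do not remember their parent, so the sketch builds a parent_of map first. The function names (ancestors, descendants, children, siblings) are chosen for this example and are not part of any standard API.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div id="container">
  <h1>Main Heading</h1>
  <p>Most programmers <em>rely</em> on caffeine</p>
  <p>Most programmers like Anime</p>
  <div class="lister">
    <h2>The 'P' Interpreted Languages</h2>
    <ul>
      <li>Perl</li>
      <li>PHP</li>
      <li>Python</li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())

# ElementTree elements do not store their parent, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(element):
    """All elements reached by repeatedly stepping up to the parent."""
    found = []
    while element in parent_of:
        element = parent_of[element]
        found.append(element)
    return found


def descendants(element):
    """Every element below the given one (iter() yields the element itself first)."""
    return list(element.iter())[1:]


def children(element):
    """Only the elements directly connected below the given one."""
    return list(element)


def siblings(element):
    """The other children of the same parent; the root has none."""
    parent = parent_of.get(element)
    return [] if parent is None else [e for e in parent if e is not element]


lister = root.find("div")   # the DIV of class="lister"
ul = root.find(".//ul")
em = root.find(".//em")

print([e.tag for e in ancestors(ul)])        # ['div', 'div']
print([e.tag for e in descendants(lister)])  # ['h2', 'ul', 'li', 'li', 'li']
print(parent_of[em].tag)                     # p
print([e.tag for e in children(ul)])         # ['li', 'li', 'li']
print([e.tag for e in siblings(lister)])     # ['h1', 'p', 'p']
```

Note that parent and children are each a single step, while ancestors and descendants repeat that step until the tree runs out.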
Here's another visual representation before we move on:
The above diagram brings to light another aspect: multiple relationships. One element can have more than one relationship with its surrounding elements, and often does. For example, the UL element is a child of the DIV of class="lister", a descendant of the DIV of id="container", the parent of three LI elements, and a sibling of the H2 element. That's not just many relationships, it's many types of relationships. Simply having many relationships is easy: the DIV of id="container" has 4 children and 6 further descendants.
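Both claims in this paragraph can be checked mechanically. The sketch below is an illustration only, again assuming Python's standard xml.etree.ElementTree module; the ul_relationships dictionary and the other variable names are invented for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div id="container">
  <h1>Main Heading</h1>
  <p>Most programmers <em>rely</em> on caffeine</p>
  <p>Most programmers like Anime</p>
  <div class="lister">
    <h2>The 'P' Interpreted Languages</h2>
    <ul>
      <li>Perl</li>
      <li>PHP</li>
      <li>Python</li>
    </ul>
  </div>
</div>
"""

container = ET.fromstring(SAMPLE.strip())
parent_of = {child: parent for parent in container.iter() for child in parent}

lister = container.find("div")
h2 = lister.find("h2")
ul = lister.find("ul")

# Four different types of relationship, all held by the one UL element.
ul_relationships = {
    "child of DIV.lister": parent_of[ul] is lister,
    "descendant of DIV#container": any(e is ul for e in container.iter()),
    "parent of three LI elements": [li.tag for li in ul] == ["li", "li", "li"],
    "sibling of H2": parent_of[h2] is parent_of[ul],
}
print(ul_relationships)

# Many relationships of one kind: count the container's children and descendants.
children = list(container)
descendants = list(container.iter())[1:]  # iter() yields the container itself first
print(len(children), len(descendants), len(descendants) - len(children))  # 4 10 6
```

The last line shows where the numbers come from: 10 descendants in total, 4 of them children, which leaves 6 that are connected only through another element.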
More Subtle Relationships
There are some more subtle relationships: some are merely semantic, and some are specific to the document tree.
Here are two semantic ones:
- Direct Descendant - This is a semantic definition that simply means child. It is a descendant that is directly connected to the element, which, by definition, makes it a child element.
- Direct Ancestor - Another semantic definition, and you guessed it: parent. This time it is an ancestor that is directly connected to the element, which, by definition, makes it a parent element.
These terms exist mainly for clarity in definitions. For example, instead of saying "The DIV of id="container" has 4 children and 6 descendants.", you could say: "The DIV of id="container" has 10 descendants, 4 of which are direct descendants".
The following relationships are what really set the Document Tree apart from other tree structures, especially the Preceding/Following relationships, which go against many other standards and may sometimes be difficult to understand. While these relationships bleed a bit into the Document Object Model, they all apply to CSS in one way or another. Even the Preceding/Following relationships apply when dealing with the margin property, because of collapsing margins. With the exception of the Preceding/Following relationships, they all deal solely with siblings, which narrows the scope of thought as you go through them:
- Preceding Sibling - This is sometimes referred to as the Immediate Preceding Sibling, or the Previous Child. The preceding sibling of an element is the element immediately before it, provided that element is also a sibling. For example, the H1 element in the example above is the preceding sibling of the first P element, but not of the second P element. Also, the H2 element is the preceding sibling of the UL element.
- Following Sibling - This is sometimes referred to as the Immediate Following Sibling, or the Next Child. The following sibling of an element is the element immediately after it, provided that element is also a sibling. For example, the last P element in the example above is the following sibling of the first P element, and the last LI element is the following sibling of the second LI element, but not of the first LI element.
- Adjacent - An adjacent element is easy to describe once you understand the Preceding Sibling and Following Sibling relationships, because it's simply either one of them. If an element is either the following sibling or the preceding sibling of another element, it is also adjacent to that element. In other words, if an element immediately follows, or is immediately followed by, another element, and that element is also its sibling, then the two elements are adjacent. In the example above, the first P element is adjacent to the second P element and to the H1 element. However, the H1 element and the second P element are not adjacent, since they have an element between them.
- Preceding - The preceding relationship is sometimes difficult to understand at first glance. It's basically like the Preceding Sibling relationship, just without the Sibling part. If you think about it, what other preceding element could there be that isn't a sibling? Well, in the example above, take a look at the first LI element. In the HTML, what precedes it? That is to say, what opening tag comes before it? Technically, it's the UL. The UL element may contain the three LI elements, but it begins right before the first LI element. An element "appears" in the document tree where it begins, not where it ends. So the UL element may not be a Preceding Sibling of the first LI element, but it is the Preceding element of the first LI element.
Possibly a quote from the W3C website would clarify this better than I have: "An element A is called a preceding element of an element B, if and only if (1) A is an ancestor of B or (2) A is a preceding sibling of B.". The quote is from the "Conformance: Requirements and Recommendations" page of the CSS 2.1 Specification.
- Following - The following relationship is described in terms of the preceding relationship: element one is a following element of element two if element two is a preceding element of element one. So we can reverse the example above: since the UL element is the preceding element of the first LI element, the first LI element is the following element of the UL element.
- First Child - This is an easy one, and you may have already guessed what it means. It's simply the first child element of any parent in the document tree. In a group of two or more siblings, the first sibling to appear in the document is the first child. There is some ambiguity when it comes to "only children". Technically, if there is only one child element, then it is both the first and the last child. However, depending on the implementation, it is sometimes only the first child. As far as CSS or the DOM is concerned, this shouldn't bother you. Any real-world situation you'll encounter will accommodate all possible relational options. This is really more of a lower-level software development debate.
- Last Child - Yep, you guessed it: the last element within a group of siblings. Nothing much to add here, it's simply the opposite of the first child relationship: the last child element of any parent element in the document.
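The order-based relationships in this list can be sketched the same way. This is a rough illustration, assuming Python's standard xml.etree.ElementTree module. The function names are invented for the example; preceding_sibling and following_sibling use the "immediate" sense described in this article, and is_preceding follows the quoted W3C sentence literally (an ancestor or a preceding sibling), which is one possible reading rather than a definitive implementation.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<div id="container">
  <h1>Main Heading</h1>
  <p>Most programmers <em>rely</em> on caffeine</p>
  <p>Most programmers like Anime</p>
  <div class="lister">
    <h2>The 'P' Interpreted Languages</h2>
    <ul>
      <li>Perl</li>
      <li>PHP</li>
      <li>Python</li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE.strip())
parent_of = {child: parent for parent in root.iter() for child in parent}


def position(element):
    """Return (all children of the element's parent, index of the element)."""
    kids = list(parent_of[element])
    return kids, next(i for i, kid in enumerate(kids) if kid is element)


def preceding_sibling(element):
    """The sibling immediately before the element, or None."""
    if element not in parent_of:  # the root has no siblings
        return None
    kids, i = position(element)
    return kids[i - 1] if i > 0 else None


def following_sibling(element):
    """The sibling immediately after the element, or None."""
    if element not in parent_of:
        return None
    kids, i = position(element)
    return kids[i + 1] if i + 1 < len(kids) else None


def adjacent(a, b):
    """True if b is the preceding or the following sibling of a."""
    return preceding_sibling(a) is b or following_sibling(a) is b


def is_ancestor(a, b):
    """True if a is reached by stepping up from b one parent at a time."""
    while b in parent_of:
        b = parent_of[b]
        if b is a:
            return True
    return False


def is_preceding(a, b):
    """The quoted rule: a is an ancestor of b, or a preceding sibling of b."""
    return is_ancestor(a, b) or preceding_sibling(b) is a


def is_first_child(element):
    return element in parent_of and preceding_sibling(element) is None


def is_last_child(element):
    return element in parent_of and following_sibling(element) is None


h1 = root.find("h1")
first_p, second_p = root.findall("p")
h2, ul, em = root.find(".//h2"), root.find(".//ul"), root.find(".//em")
lis = ul.findall("li")

print(preceding_sibling(first_p) is h1, preceding_sibling(ul) is h2)  # True True
print(adjacent(first_p, second_p), adjacent(h1, second_p))            # True False
print(is_preceding(ul, lis[0]), preceding_sibling(lis[0]) is None)    # True True
print(is_first_child(em), is_last_child(em))                          # True True
```

In this sketch an only child such as the EM element counts as both the first and the last child, which is the "technical" answer mentioned in the First Child entry.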
Wrapping Up
I hope this cleared up the relational structures that are present in the document tree. They come up in quite a few fields that have to do with W3C document formats, such as XML, XHTML, the DOM, and other derived document formats. Understanding them is also required in order to properly use CSS selectors and the many JavaScript functions/methods that relate to the DOM. By the way, if you really want to dig into the DOM spec, it is published in full on the W3C website. It may be a bit intimidating at first, and rather abstract at times, but if you deal with many web-related standards then you'll be amazed at how many things it applies to, and how much of it you're already familiar with.