April 16, 2003

Using HTML tags properly to help external search results

There are some essentials to building Web pages that get found by external search engines. Understanding the tags in HTML and how they are (or rather, should be) used is important. The main tags for most popular search engines are the title, heading (h1, h2, etc.), paragraph (p), and anchor (a). Different search engines have given some weight in their rankings to meta tags, but most either do not use them or have decreased their value.

Google gives a lot of weight to the title tag, which is often what shows in the link Google gives its user to click for the entry. The order of the wording in the title tag is important too, as the most specific information should be toward the front. A user searching for news may find a weblog toward the top of the results ahead of CNN, as CNN puts its name ahead of the title of the article. A title should echo the contents of the page, as that will help the page's ranking; pages whose titles are not echoed in their content can get flagged for removal from search engines.

The headings help echo what is in the title and provide breaking points in the document. Headings not only help the user scan the page easily, but also are used by search engines to ensure the page is what it states it is. The echoing of terms is used to move an entry toward the top of the rankings, as the mechanical search engines get reinforcement that the information is on target for what their users may be seeking.
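The echo between title and headings can be checked mechanically. The following is only a rough sketch using Python's standard `html.parser` module; the sample page, the `TitleHeadingParser` class, and the `shared_terms` helper are made up for illustration, and no search engine is claimed to work this way.

```python
from html.parser import HTMLParser


class TitleHeadingParser(HTMLParser):
    """Collects the text of the title tag and of each heading tag."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag whose text is being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            if tag != "title":
                self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in self.HEADINGS:
            self.headings[-1] += data


def shared_terms(title, headings):
    """Return the lowercase words the title shares with the headings."""
    title_words = set(title.lower().split())
    heading_words = set(" ".join(headings).lower().split())
    return title_words & heading_words


# Hypothetical page: the specific terms lead the title and recur in the h1.
page = """<html><head><title>Ten Penh review - Example Weblog</title></head>
<body><h1>Ten Penh review</h1><p>Dinner at Ten Penh was ...</p></body></html>"""

parser = TitleHeadingParser()
parser.feed(page)
print(parser.title)
print(parser.headings)
print(sorted(shared_terms(parser.title, parser.headings)))
```

A page whose title shares no terms with its headings would print an empty list here, which is the kind of mismatch described above.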

The paragraph tags mark the body text of the page, and search engines use the text within them as further reinforcement of the terms in the title and headings.

The anchor tags are used for links, and these are what the search engines follow to find and scrape other Web pages. The text used for the links is also used by the search engines to weight their rankings. If you want users to find information deep in your site, put a short, clear description between the anchor tags. The W3C standards include the ability to use a title attribute, which some search tools also use. The title attribute is also used by some screen readers (used by those with visual difficulties and those who want their information read aloud to them, because they may be driving or have their hands otherwise occupied) to replace the information between the anchor tags or to augment that information.
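To make the link text and title attribute concrete, here is a minimal sketch of what a simple scraper could pull out of each anchor, again using Python's standard `html.parser`. The `LinkParser` class, the URLs, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href, link text, and title attribute of each anchor."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._open = None  # the anchor currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attrs = dict(attrs)
            self._open = {
                "href": attrs.get("href"),
                "text": "",
                "title": attrs.get("title"),
            }

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            # Collapse runs of whitespace in the collected link text.
            self._open["text"] = " ".join(self._open["text"].split())
            self.links.append(self._open)
            self._open = None


# Hypothetical markup: one descriptive link and one vague "click here" link.
snippet = """<p>Read the
<a href="/weblog/ten-penh-review.html"
   title="Weblog review of a meal at Ten Penh">Ten Penh dinner review</a>
or <a href="/archive/page7.html">click here</a>.</p>"""

link_parser = LinkParser()
link_parser.feed(snippet)
for link in link_parser.links:
    print(link["href"], "|", link["text"], "|", link["title"])
```

The first link tells a scraper what the target page is about twice over; the second gives it only "click here" and no title attribute.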


The application I built to manage this weblog section is built to use each of these elements. This often results in high rankings in Google (and, relatedly, Yahoo), but that is not the intent; I am just a little fussy in that area. It gets to be very odd when my weblog posting reviewing a meal at Ten Penh is at or near the top of a Google search for Ten Penh, while the link for the Ten Penh restaurant itself is near the bottom of the first page.

Why is the restaurant not the top link? There are a few possible reasons. The restaurant page has its name as "tenpenh" in the title tag, which is very odd or sloppy. The page contains neither a heading tag nor a paragraph tag, as the site is built with Flash. There is no semantic structure in Flash, even for those search engines that scrape Flash. Equally, the internal page links are not read by a search engine, as they are also in Flash. A norm for many sites is making the site logo in the upper left corner a clickable link to the home page. With an alt attribute on the image tag inside that anchor link, each page can add value to the home page's rank (if the alt attribute read "Ten Penh Home", for example).
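As a small illustration of the logo link, the sketch below records the alt text of any image that sits inside an anchor, which is the only text a scraper has to go on for an image link. The `ImageLinkParser` class, the `image_links` helper, and the file names are assumptions made up for this example.

```python
from html.parser import HTMLParser


class ImageLinkParser(HTMLParser):
    """Records (href, alt) for every image found inside an anchor tag."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._in_anchor = False
        self._href = None  # href of the anchor currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._in_anchor = True
            self._href = attrs.get("href")
        elif tag == "img" and self._in_anchor:
            # A missing alt attribute leaves the link with no text at all.
            self.found.append((self._href, attrs.get("alt", "")))

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False
            self._href = None


def image_links(html):
    """Return the (href, alt) pairs for image links in the given markup."""
    parser = ImageLinkParser()
    parser.feed(html)
    return parser.found


# Hypothetical logo links, with and without an alt attribute.
with_alt = '<a href="/"><img src="/images/logo.gif" alt="Ten Penh Home"></a>'
without_alt = '<a href="/"><img src="/images/logo.gif"></a>'

print(image_links(with_alt))
print(image_links(without_alt))
```

With the alt attribute in place, every page carrying the logo hands the scraper a "Ten Penh Home" link to the home page; without it, the link carries no words.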

Not only does Flash hinder the scraping of information, the use of JavaScript links wipes those links out as a means to increase search rankings. Pages with dynamic links, which are often believed to ease browsing (this may or may not prove to be the case in actual user testing, depending on the site's users and the site's goals), make the information in the site harder for external search engines to find. Links and text written out by JavaScript are not scrapable.
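The effect is easy to see with any scraper that reads markup without running scripts. This is a minimal sketch under that assumption, using Python's standard `html.parser`; the `HrefParser` class, the `followable` helper, the `openPage` function name, and the page itself are hypothetical.

```python
from html.parser import HTMLParser


class HrefParser(HTMLParser):
    """Collects the href of every anchor tag present in the markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def followable(hrefs):
    """Keep only hrefs that point at a page rather than at a script call."""
    return [h for h in hrefs if not h.lower().startswith("javascript:")]


# Hypothetical page with one plain link, one javascript: link, and one
# link that only exists after the script runs.
page_with_js = """<body>
<a href="/menu.html">Menu</a>
<a href="javascript:openPage('reservations')">Reservations</a>
<script type="text/javascript">
document.write('<a href="/hours.html">Hours</a>');
</script>
</body>"""

href_parser = HrefParser()
href_parser.feed(page_with_js)
print(href_parser.hrefs)
print(followable(href_parser.hrefs))
```

Only the plain link to the menu page survives. The link written out by the script is never seen as a link, because the parser treats the contents of the script tag as text and does not execute it.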

Posted Comments

Great post. I had recently seen a design firm mentioned in the "who's what, where" section of a newspaper's help wanteds (it's blurbs about new hires and promotions at local companies). So I did a Google search for the firm's name but couldn't find their webpage directly because it's built entirely in Flash. (I ended up finding it indirectly, by looking at other pages that had mentioned the firm's name.)


