new "ignore this content" tag for robots

Submitted by greggles on Sun, 2006-11-05 22:53

One of the problems that search engines run into is that a page is just a blob of text to them. When they look for keywords in that text, they can't be 100% sure whether they are looking at the "content" of the page or at links to other pages.

This can cause serious problems!

Poor little search engines

Here's an example from my site. Right now, a search for mp3 transfer wind energy returns my page as the top result. That's not surprising, because that's not exactly a difficult set of keywords to rank on. What is surprising is that the page that ranks is one where those words appear together, but not in the content. "mp3 transfer" is a popular page about a service I offer to transfer CDs to mp3s if someone buys themselves a new mp3 device or something. The wind power page is an article I wrote...about wind power in Colorado. So, on the wind power page, you get a list of popular content which includes the mp3 transfer page. And that confuses the search engines. Bummer!

Solution: noindex html tag

The solution is to let the search engines know "here's some stuff that's not worth it for you to index." It could be implemented as a marker on a span or div so that we don't have to introduce a whole new HTML tag, but it would be a valuable little tool!
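To make the idea concrete, here's a rough sketch of how a crawler might honor such a marker. Everything here is an assumption for illustration: the class name "noindex" on a span or div is hypothetical (no search engine is known to support it), and the names NoindexTextExtractor and indexable_text are made up. It uses only Python's standard html.parser module and assumes reasonably well-formed markup, meaning every non-void tag gets closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoindexTextExtractor(HTMLParser):
    """Collects page text, skipping anything inside an element marked
    class="noindex" (a hypothetical marker, not an existing standard)."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a noindex element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at a noindex element; keep counting nested tags
        # so we know when the marked element itself is closed.
        if self.skip_depth or "noindex" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def indexable_text(html):
    """Return the text a crawler would index, with whitespace collapsed."""
    parser = NoindexTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# A toy version of the wind power page with its "popular content" block.
page = (
    '<div class="content"><h1>Wind power in Colorado</h1>'
    '<p>Wind energy is growing.</p></div>'
    '<div class="noindex"><h2>Popular content</h2>'
    '<ul><li><a href="/mp3-transfer">mp3 transfer</a></li></ul></div>'
)
print(indexable_text(page))
```

With the popular content block marked, the words "mp3 transfer" never reach the index for the wind power page, so the page would no longer match a search that mixes the two topics.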

Gaming the tool

OK, great, so now people would start gaming this tag. They would stick parts of their page that they don't want indexed (something that might bring down their ranking somehow?) into the noindex tag. But I'm not sure I see this as a real problem... The nofollow attribute (rel="nofollow") already lets the search engines know not to give credit to a link. I'm just having a hard time seeing how an expanded version of nofollow that covered whole blocks of content could really be worse.
