The Google Search Engine
Introduction
Google runs on a unique combination of advanced hardware and software. The
speed you experience can be attributed in part to the efficiency of our search
algorithm and in part to the thousands of low-cost PCs that are networked
together to create a superfast search engine.
PageRank
The heart of our software is PageRank™, a system for ranking web pages.
PageRank relies on the uniquely democratic nature of the web by using its vast
link structure as an indicator of an individual page's value. In essence, Google
interprets a link from page A to page B as a vote, by page A, for page B. But
Google looks at more than the sheer volume of votes, or links, that a page
receives; it also analyzes the page that casts the vote. Votes cast by pages
that are themselves "important" weigh more heavily and help to make other pages
"important."
Important, high-quality sites receive a higher PageRank, which Google remembers
each time it conducts a search. Of course, important pages mean nothing to you
if they don't match your query. So, Google combines PageRank with sophisticated
text-matching techniques to find pages that are both important and relevant to
your search. Google goes far beyond the number of times a term appears on a page
and examines all aspects of the page's content (and the content of the pages
linking to it) to determine if it's a good match for your query.
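The voting idea described above can be illustrated with a small sketch. It follows the commonly published iterative formulation of PageRank and is not necessarily what Google runs in production. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the three-page example graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from link structure.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not values
    taken from this document.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and C both link to B, and B links back to A.
example_links = {"A": ["B"], "C": ["B"], "B": ["A"]}
example_ranks = pagerank(example_links)
for page in sorted(example_ranks, key=example_ranks.get, reverse=True):
    print(page, round(example_ranks[page], 3))
```

In this toy graph, B collects votes from both A and C, so it ranks highest. A outranks C because its single vote comes from the important page B, while C receives no votes at all.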
Design and Content Guidelines:
Make a site with a clear hierarchy and text links. Every page should be reachable from at least one static text link.
Offer a site map to your users with links that point to the important parts of your site. If the site map is larger than 100 or so links, you may want to break the site map into separate pages.
Create a useful, information-rich site and write pages that clearly and accurately describe your content.
Think about the words users would type to find your pages, and make sure that your site actually includes those words within it.
Try to use text instead of images to display important names, content, or links. The Google crawler doesn't recognize text contained in images.
Make sure that your TITLE tags and ALT attributes are descriptive and accurate.
Check for broken links and for correct HTML.
If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small.
Keep the links on a given page to a reasonable number (fewer than 100).
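Several of the guidelines above (a descriptive TITLE, ALT text on images, fewer than 100 links per page) can be checked mechanically. The following is a minimal sketch using Python's standard `html.parser` module. The names `PageAudit`, `audit`, and `MAX_LINKS` are hypothetical, and the checks only approximate the guidelines: they detect missing text, not whether the text is actually descriptive.

```python
from html.parser import HTMLParser

MAX_LINKS = 100  # threshold taken from the "fewer than 100" guideline


class PageAudit(HTMLParser):
    """Collects the title, the link count, and images lacking ALT text."""

    def __init__(self):
        super().__init__()
        self.link_count = 0
        self.images_missing_alt = 0
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.link_count += 1
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Return a list of human-readable problems found in one HTML page."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()

    problems = []
    if not parser.title.strip():
        problems.append("missing or empty TITLE")
    if parser.images_missing_alt:
        problems.append(f"{parser.images_missing_alt} image(s) without ALT text")
    if parser.link_count >= MAX_LINKS:
        problems.append(f"{parser.link_count} links on one page")
    return problems


sample = (
    "<html><head><title>Widgets</title></head>"
    '<body><a href="/a">A</a><img src="x.png"></body></html>'
)
print(audit(sample))
```

A check like this catches only mechanical omissions. Whether a page has a clear hierarchy or uses the words visitors would search for still needs a human review.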
Technical Guidelines:
Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead.
Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler.
If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.
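Three of the technical guidelines above lend themselves to a short sketch: stripping session parameters so that equivalent URLs look identical, answering an If-Modified-Since request, and checking what a robots.txt file allows. The example uses only Python's standard library. The function names, the `SESSION_PARAMS` set, and the sample URLs are hypothetical, and the conditional-request logic is a simplification of what a real web server does.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical names of query parameters that carry session state.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def canonical_url(url):
    """Drop session-tracking parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 304 if the page is unchanged since the crawler's copy, else 200.

    if_modified_since is the raw header value (or None if absent);
    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: serve the full page
    # HTTP dates have one-second resolution, so ignore microseconds.
    return 304 if last_modified.replace(microsecond=0) <= since else 200


def allowed_by_robots(robots_txt, user_agent, url):
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
changed = datetime(2004, 1, 1, tzinfo=timezone.utc)
print(canonical_url("http://example.com/page?id=7&sid=abc123"))
print(status_for_conditional_get("Thu, 01 Jan 2004 00:00:00 GMT", changed))
print(allowed_by_robots(robots, "Googlebot", "http://example.com/private/x.html"))
```

Running a check like `allowed_by_robots` against your own robots.txt before deploying it is one way to avoid blocking a crawler by accident.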
When your site is ready:
Once your site is online, submit it to Google.
Make sure all the sites that should know about your pages are aware your site is online.
Submit your site to relevant directories such as the Open Directory Project and Yahoo.
Periodically review Google's Webmaster section for more information.
Quality Guidelines - Basic principles:
Make pages for users, not for search engines. Don't deceive your users, or present different content to search engines than you display to users.
Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"
Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.
Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our terms of service. Google does not recommend the use of products such as WebPosition Gold™ that send automatic or programmatic queries to Google.
Quality Guidelines - Specific recommendations:
Avoid hidden text or hidden links.
Don't employ cloaking or sneaky redirects.
Don't send automated queries to Google.
Don't load pages with irrelevant words.
Don't create multiple pages, subdomains, or domains with substantially duplicate content.
Avoid "doorway" pages created just for search engines, or other "cookie cutter" approaches such as affiliate programs with little or no original content.
These quality guidelines cover the most common forms of deceptive or manipulative behavior, but Google may respond negatively to other misleading practices not listed here (e.g., tricking users by registering misspellings of well-known web sites). It's not safe to assume that just because a specific deceptive technique isn't included on this page, Google approves of it. Webmasters who spend their energies upholding the spirit of the basic principles listed above will provide a much better user experience and subsequently enjoy better ranking than those who spend their time looking for loopholes they can exploit.