Sunday, 20 July 2008

Block or remove pages using a robots.txt file

You can use a robots.txt file to block Googlebot from crawling pages on your site.

For example, if you're creating a robots.txt file manually, you can block Googlebot from crawling all pages under a particular directory (for example, lemurs) with the following entry:

User-agent: Googlebot
Disallow: /lemurs/

To block Googlebot from crawling all files of a specific file type (for example, .gif), you'd use the following robots.txt entry:

User-agent: Googlebot
Disallow: /*.gif$

To block Googlebot from crawling any URL that includes a ? (more specifically, any URL that begins with your domain name, followed by any string, followed by a question mark, followed by any string), you'd use the following entry:

User-agent: Googlebot
Disallow: /*?

While we won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org) can appear in Google search results. However, no content from your pages will be crawled, indexed, or displayed.

To entirely prevent a page from being added to the Google index even if other sites link to it, use a noindex meta tag, and ensure that the page is not blocked by robots.txt; if it were blocked, Googlebot would never crawl the page and so would never see the tag. When Googlebot crawls the page, it will recognize the noindex meta tag and drop the URL from the index.
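
For reference, a minimal form of that meta tag, placed in the page's <head> section and applying to all crawlers, looks like this:

<meta name="robots" content="noindex">

To address only Googlebot, name="googlebot" can be used in place of name="robots".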

The original article can be read in full at http://www.google.com/support/webmasters/bin/answer.py?answer=35303
