How to control indexing of your website in search engines
Posted by: Prince in SEO, Solutions
How to remove a webpage of your website from search engines using meta tags?
This approach is suitable when you do not have root access to the server and cannot create a “robots.txt” file.
To prevent all robots from indexing a page on your site, place the following meta tag into the <head> section of your page:
<meta name="robots" content="noindex">
To allow other robots to index the page while preventing only Google’s robots from indexing it:
<meta name="googlebot" content="noindex">
When Google sees the noindex meta tag on a page, it drops the page completely from its search results, even if other pages link to it. Other search engines, however, may interpret this directive differently, so a link to the page can still appear in their search results.
If the content is currently in Google’s index, Google will remove it the next time it crawls the site. To expedite removal, use the URL removal request tool in Google Webmaster Tools.
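Before requesting removal, it can help to confirm that the tag is really present in the HTML your server sends. The following is a minimal Python sketch, not an official tool: the class name RobotsMetaParser and the function is_noindex are made up for this illustration, and the logic only looks at <meta> tags named “robots” or “googlebot”.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # crawler name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            values = {v.strip().lower() for v in attr.get("content", "").split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def is_noindex(html, crawler="robots"):
    """Return True if the page asks the given crawler not to index it.

    A tag addressed to "robots" applies to every crawler; a tag addressed
    to "googlebot" applies only when crawler == "googlebot".
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(crawler.lower(), set())
    return "noindex" in (generic | specific)
```

For example, is_noindex(page_html, crawler="googlebot") is true for a page carrying either of the two tags shown above, while is_noindex(page_html) ignores a tag addressed only to googlebot.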
What is a Robots Meta Tag?
You can use a special HTML <META> tag to tell robots not to index the content of a page, and/or not scan it for links to follow.
For example:
<html>
<head>
<title>Test Page</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
There are two important considerations when using the robots <META> tag:
- Robots can ignore your <META> tag. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
- The NOFOLLOW directive only applies to links on this page. A robot may well find the same links on some other page without a NOFOLLOW (perhaps on some other site), and so still arrive at the page you wanted to keep it away from.
How to write a Robots Meta Tag?
Where to put it:
Like any <META> tag it should be placed in the HEAD section of an HTML page, as in the example above. You should put it in every page on your site, because a robot can encounter a deep link to any page on your site.
What to put into it:
The robots meta tag has two attributes: “NAME” and “CONTENT”.
The “NAME” attribute must be “ROBOTS”.
Valid values for the “CONTENT” attribute are: “INDEX”, “NOINDEX”, “FOLLOW”, “NOFOLLOW”. Multiple comma-separated values are allowed, but obviously only some combinations make sense. If there is no robots <META> tag, the default is “INDEX, FOLLOW”, so there’s no need to spell that out. That leaves:
<META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW">
<META NAME="ROBOTS" CONTENT="INDEX, NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
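If your pages are generated from templates, the three useful combinations can be produced from two yes/no choices. This is only a sketch in Python; the function name robots_meta and its parameters are invented for the example, and it returns an empty string for the default “INDEX, FOLLOW” case because no tag is needed there.

```python
def robots_meta(index=True, follow=True):
    """Build a robots <META> tag for the given choices.

    Returns an empty string when both are True, since "INDEX, FOLLOW"
    is the default behaviour and does not need to be spelled out.
    """
    if index and follow:
        return ""
    content = ", ".join([
        "INDEX" if index else "NOINDEX",
        "FOLLOW" if follow else "NOFOLLOW",
    ])
    return '<META NAME="ROBOTS" CONTENT="{}">'.format(content)
```

Calling robots_meta(index=False) gives the first tag in the list above, and robots_meta(index=False, follow=False) gives the last one.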
How to remove cached copies of web pages using robots meta tag?
Google automatically takes a “snapshot” of each page it crawls and archives it. This “cached” version lets users retrieve a webpage if the original page is ever unavailable. The cached page appears to users exactly as it looked when Google last crawled it, and Google displays a message at the top of the page to indicate that it’s a cached version. Users can access the cached version by choosing the “Cached” link on the search results page.
Do one of the following:
To update the cached version of a page:
Change the content of the page. The next time Google crawls the page, it will update the cached version.
To remove cached versions of a page from Google’s index and prevent Google from caching the page in the future:
Add a noarchive meta tag to that page. The next time Google crawls the site, it will see the tag and remove the cached copy.
To prevent all search engines from showing a “Cached” link for your site, place this tag in the <HEAD> section of your page:
<meta name="robots" content="noarchive">
To prevent only Google from displaying one, use the following tag:
<meta name="googlebot" content="noarchive">
Once this is complete, you can use the URL removal tool in Webmaster Tools to request expedited removal of the cached content, which then stays removed for a minimum of six months.
How to remove snippets that appear below web pages in Google search results and describe the content of your page?
A snippet is a text excerpt that appears below a page’s title in Google’s search results and describes the content of the page.
To prevent Google from displaying snippets for your page, place this tag in the <HEAD> section of your page:
<meta name="googlebot" content="nosnippet">
Note: Removing snippets also removes cached pages.
How to remove outdated pages from Google’s index by returning the proper server response?
Google updates its entire index regularly. When Google crawls the web, it automatically finds new pages, removes outdated links, and reflects updates to existing pages, keeping the Google index fresh and as up-to-date as possible.
If outdated pages from your site appear in the search results, ensure that those pages return an HTTP status of either 404 (Not Found) or 410 (Gone). These status codes tell Googlebot that the requested URL isn’t valid.
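How you return these codes depends on your server, so the following is only a rough Python sketch using the standard http.server module. The page list, the removed path /old-page.html and the names status_for and StatusHandler are all hypothetical; the point is simply that a removed page answers 410, an unknown page answers 404, and neither answers 200 with an error message in the body.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content, for illustration only.
LIVE_PAGES = {"/": "<h1>Home</h1>"}
GONE_PATHS = {"/old-page.html"}  # pages removed on purpose


def status_for(path):
    """Pick the HTTP status code for a requested path."""
    if path in LIVE_PAGES:
        return 200
    if path in GONE_PATHS:
        return 410  # Gone: the page existed and was removed
    return 404      # Not Found: nothing is known at this URL


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Note: self.path includes any query string; this sketch does not strip it.
        status = status_for(self.path)
        body = LIVE_PAGES.get(self.path, "").encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve on localhost until interrupted (call this yourself to try it out)."""
    HTTPServer(("127.0.0.1", port), StatusHandler).serve_forever()
```

With run() started, requesting /old-page.html should show a 410 status in the response headers, which you can confirm with any tool that prints headers.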
How to remove images from Google Image Search using a robots.txt file?
To remove an image from Google’s image index, add a robots.txt file to the root of the server that blocks the image.
For example, if you want Google to exclude the logo.jpg image that appears on your site at www.yoursite.com/images/logo.jpg, add the following to your robots.txt file:
User-agent: Googlebot-Image
Disallow: /images/logo.jpg
To remove all the images on your site from Google’s index, place the following robots.txt file in your server root:
User-agent: Googlebot-Image
Disallow: /
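You can sanity-check plain rules like the two above before uploading them. The sketch below uses Python’s standard urllib.robotparser module; the helper name blocked and the extra URL photo.jpg are made up for the example. Keep in mind that this parser follows the basic robots.txt rules and is not Google’s own implementation, so treat the result as a rough check.

```python
from urllib.robotparser import RobotFileParser

SINGLE_IMAGE = """\
User-agent: Googlebot-Image
Disallow: /images/logo.jpg
"""

ALL_IMAGES = """\
User-agent: Googlebot-Image
Disallow: /
"""


def blocked(robots_txt, agent, url):
    """Return True if the given robots.txt text disallows url for agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


LOGO = "http://www.yoursite.com/images/logo.jpg"
PHOTO = "http://www.yoursite.com/images/photo.jpg"  # hypothetical second image

print(blocked(SINGLE_IMAGE, "Googlebot-Image", LOGO))   # the logo rule applies
print(blocked(SINGLE_IMAGE, "Googlebot-Image", PHOTO))  # other images are untouched
print(blocked(ALL_IMAGES, "Googlebot-Image", PHOTO))    # everything is blocked
```

Because both files name only Googlebot-Image, the same check with a different user agent, such as Googlebot, reports the URLs as not blocked.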
Additionally, Google has extended the robots.txt standard with simple pattern matching. Disallow patterns may include “*” to match any sequence of characters, and patterns may end in “$” to indicate the end of a name. To remove all files of a specific file type (for example, to keep .jpg images but exclude .gif images), you’d use the following robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
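To see how “*” and “$” behave, here is a small Python sketch that turns a Disallow pattern into a regular expression. It is an approximation written for this post, not Google’s actual matcher, and the function names pattern_to_regex and is_disallowed are invented. It assumes a pattern without a trailing “$” matches any path that starts with it.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Disallow pattern with "*" and a trailing "$" into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" matches any run of characters; everything else is taken literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(pattern, path):
    """Return True if path falls under the given Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None
```

With the entry above, is_disallowed("/*.gif$", "/images/banner.gif") is true, while /images/logo.jpg is left alone. The trailing “$” also means a path such as /images/banner.gif?size=2 is not matched, since it does not end in .gif.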