Sometimes a publisher launches a new website and wants to remove the old one from search engines. The publisher can do this with the help of a “robots.txt” file.
The “robots.txt” file is a plain text file in the root of the web server. It is used to ask search engines to remove your site and to prevent robots from crawling it in the future.
To prevent all robots from crawling your site, create a file named “robots.txt” in your server root and paste the following content into it:
User-agent: *
Disallow: /
To remove your site from Google only and prevent just Googlebot from crawling your site in the future, paste the following content into the file instead:
User-agent: Googlebot
Disallow: /
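Before uploading either file, you can check how a parser reads the rules. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper load_rules and the crawler name "ExampleBot" are made up for illustration, and real crawlers may interpret some details differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

def load_rules(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# The two files shown above.
block_all = load_rules("User-agent: *\nDisallow: /\n")
block_google = load_rules("User-agent: Googlebot\nDisallow: /\n")

# "ExampleBot" is a hypothetical crawler name used only for this check.
print(block_all.can_fetch("ExampleBot", "/index.html"))     # False
print(block_google.can_fetch("Googlebot", "/index.html"))   # False
print(block_google.can_fetch("ExampleBot", "/index.html"))  # True
```

The first file blocks every robot, while the second blocks only Googlebot and leaves other robots free to crawl.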
Each protocol and port must have its own robots.txt file. In particular, if you serve content via both http and https, you’ll need a separate robots.txt file for each of these protocols. For example, to allow all robots (including Googlebot) to crawl all http pages but no https pages, you’d use the robots.txt files below.
For the http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
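Since the file is looked up per protocol and port, it can help to compute which robots.txt URL applies to a given page. This is a small sketch with Python's urllib.parse; the function name robots_url, the page paths, and the port number are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL that governs page_url.

    The scheme, host, and port are kept; the path is replaced.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page URLs on the example server.
print(robots_url("http://yourserver.com/shop/item.html"))
print(robots_url("https://yourserver.com/shop/item.html"))
print(robots_url("https://yourserver.com:8443/shop/item.html"))
```

Each of the three pages maps to a different robots.txt URL, so each needs its own file.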
Note: If a robot discovers your site by other means, for example by following a link to your URL from another site, your content may still appear in Google’s index and search results. To entirely prevent a page from being added to the Google index even if other sites link to it, use a noindex meta tag.
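The noindex meta tag goes in the head of the page, in the form shown in the PAGE string below. As a rough sketch, this Python snippet uses the standard html.parser module to check whether a page carries the tag; the class RobotsMetaFinder, the function has_noindex, and the sample page are hypothetical. Keep in mind that a crawler has to be able to fetch the page to see the tag, so the page must not also be blocked in robots.txt.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True

def has_noindex(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex

# Hypothetical page from the old site.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Old page</body></html>'
print(has_noindex(PAGE))
```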
Some more examples:
Example 1:
The following example “/robots.txt” file specifies that no robots should visit any URL starting with “/India/delhi/” or “/test/”, or the page “/prince.html”:
# robots.txt for http://www.princejain.com/
User-agent: *
Disallow: /India/delhi/ # This is an infinite virtual URL space
Disallow: /test/ # these will soon disappear
Disallow: /prince.html
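To confirm what Example 1 blocks, here is a minimal sketch that feeds the same rules to Python's urllib.robotparser. The crawler name "ExampleBot" and the page paths being tested are invented for the check.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_1 = """\
# robots.txt for http://www.princejain.com/
User-agent: *
Disallow: /India/delhi/ # This is an infinite virtual URL space
Disallow: /test/ # these will soon disappear
Disallow: /prince.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_1.splitlines())

# Hypothetical paths; rules match by URL prefix.
for path in ["/India/delhi/map.html", "/test/a.html", "/prince.html", "/India/mumbai/", "/index.html"]:
    print(path, parser.can_fetch("ExampleBot", path))
```

The first three paths are reported as blocked, and the last two as allowed, because they do not start with any disallowed prefix.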
Example 2:
This example “/robots.txt” file specifies that no robots should visit any URL starting with “/India/delhi/”, except the robot called “Googlebot”:
# robots.txt for http://www.princejain.com/
User-agent: *
Disallow: /India/delhi/ # This is an infinite virtual URL space
# Googlebot knows where to go.
User-agent: Googlebot
Disallow:
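The empty Disallow line is what gives Googlebot full access. A short sketch with Python's urllib.robotparser illustrates this; the rules are written here without a space inside the path, and "ExampleBot" and the page path are assumptions made for the check.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_2 = """\
# robots.txt for http://www.princejain.com/
User-agent: *
Disallow: /India/delhi/ # This is an infinite virtual URL space
# Googlebot knows where to go.
User-agent: Googlebot
Disallow:
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_2.splitlines())

page = "/India/delhi/map.html"  # hypothetical page under the blocked prefix
print(parser.can_fetch("Googlebot", page))   # True: empty Disallow allows everything
print(parser.can_fetch("ExampleBot", page))  # False: falls under the "*" rules
```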
Example 3:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /