
Search engines use robots, also known as spiders or bots, to crawl and index webpages. If your site or page is under development or contains sensitive content, you may want to block bots from crawling and indexing it. Learn how to block entire websites, pages, and links with robots.txt files, and how to block specific pages and links with HTML meta tags. Read on to discover how to block specific bots from accessing your content.

Method 1 of 2:

Blocking Search Engines with robots.txt Files

  1. A robots.txt file is a plain (ASCII) text file that tells search engine spiders what they are allowed to access on your site. Files and folders disallowed in a robots.txt file will not be crawled and indexed by search engine spiders that honor the file. You may need a robots.txt file if:
    • You want to block specific content from search engine spiders.
    • You are developing a live site and are not ready to have search engine spiders crawl and index it.
    • You want to limit access to reputable bots.
  2. To create the file, launch a plain text editor or a code editor. Save the file as: robots.txt. The file name must be all lowercase.
    • Do not forget the “s.”
    • When you save the file, choose the extension “.txt”. If you are using Word, select the “Plain Text” option.
  3. It is possible to block every reputable search engine spider from crawling and indexing your site with a “full-disallow” robots.txt. Write the following lines in your text file:
      User-agent: *
      Disallow: /
    • Using a “full-disallow” robots.txt file is generally not recommended. When a bot, such as Bingbot, reads this file, it will not index your site, and the search engine will not display your website.
    • User-agent : the name of the search engine spider, or robot, that the rules apply to
    • * : the asterisk signifies that the code applies to all user-agents
    • Disallow: / : the forward slash indicates that the entire site is off-limits to bots [1]
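
One way to sanity-check a full-disallow file before uploading it is to feed the same two lines to Python's standard `urllib.robotparser` module. This is only a sketch: the helper name `is_allowed` and the sample path `/index.html` are illustrative, and real search engines may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The full-disallow rules from the step above.
FULL_DISALLOW = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if the given rules let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Every bot is refused, whatever its name.
for bot in ("bingbot", "googlebot"):
    print(bot, is_allowed(FULL_DISALLOW, bot, "/index.html"))
```

Both lines should report `False`, since the single `Disallow: /` rule covers every path for every user-agent.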
  4. Instead of blocking all bots, consider blocking specific spiders from certain areas of your site. [2] Common conditional-allow commands include:
    • Block a specific bot: replace the asterisk next to User-agent with googlebot, googlebot-news, googlebot-image, bingbot, or teoma. [3]
    • Block a directory and its contents:
      User-agent: *
      Disallow: /sample-directory/
    • Block a webpage:
      User-agent: *
      Disallow: /private_file.html
    • Block an image:
      User-agent: googlebot-image
      Disallow: /images_mypicture.jpg
    • Block all images:
      User-agent: googlebot-image
      Disallow: /
    • Block a specific file format:
      User-agent: *
      Disallow: /*.gif$
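
The directory, page, and bot-specific rules above can be combined into one file and checked the same way. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allowed` and the sample paths are illustrative. Note two assumptions about this parser: it matches paths by simple prefix, so wildcard patterns such as the file-format rule are not exercised here, and it uses only the first `User-agent: *` group, so the general rules are kept in a single group.

```python
from urllib.robotparser import RobotFileParser

# Several of the conditional rules from the step above in one file.
RULES = """\
User-agent: googlebot-image
Disallow: /

User-agent: *
Disallow: /sample-directory/
Disallow: /private_file.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(bot: str, path: str) -> bool:
    """Return True if bot may fetch path under RULES."""
    return parser.can_fetch(bot, path)


checks = [
    ("bingbot", "/sample-directory/page.html"),  # inside the blocked directory
    ("bingbot", "/private_file.html"),           # the blocked page
    ("bingbot", "/index.html"),                  # not mentioned, so allowed
    ("googlebot-image", "/index.html"),          # this bot is blocked everywhere
]
for bot, path in checks:
    print(f"{bot:16} {path:30} {allowed(bot, path)}")
```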
  5. Many people want to welcome, instead of block, search engine spiders because they want their entire site indexed. To accomplish this, you have three options. First, you can opt out of creating a robots.txt file—when the robot does not find a robots.txt file, it will continue to crawl and index your entire site. Second, you can create an empty robots.txt file—the robot will find the robots.txt file, recognize that it is empty, and continue to crawl and index your site. Lastly, you can write a full-allow robots.txt file. Use the code:
      User-agent: *
      Disallow:
    • When a bot, such as Googlebot, reads this file, it is free to crawl your entire site.
    • User-agent : the name of the search engine spider, or robot, that the rules apply to
    • * : the asterisk signifies that the code applies to all user-agents
    • Disallow : the blank disallow command indicates that all files and folders are accessible
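
To see that an empty file and a full-allow file behave the same, both can be run through Python's standard `urllib.robotparser`. This is a sketch only: the helper name `can_crawl` and the sample path are illustrative, and the missing-file case is not shown because it requires a live server.

```python
from urllib.robotparser import RobotFileParser

# Option two: an empty robots.txt file.
EMPTY = ""

# Option three: the full-allow rules from the step above.
FULL_ALLOW = """\
User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, bot: str, path: str) -> bool:
    """Return True if the given rules let bot fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


print(can_crawl(EMPTY, "googlebot", "/any/page.html"))
print(can_crawl(FULL_ALLOW, "googlebot", "/any/page.html"))
```

Both calls should print `True`: with no matching rule, or with a blank `Disallow`, nothing is off-limits.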
  6. After you have written the robots.txt file, save the changes. Upload the file to your site's root directory. For example, if your domain is www.yourdomain.com, place the robots.txt file at www.yourdomain.com/robots.txt.
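
Because bots only look for the file at the root of the domain, it can help to work out the expected location from any page address on the site. The following sketch does that with Python's standard `urllib.parse`; the function name `robots_txt_url`, the `https` scheme, and the sample page path are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; drop the page path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.yourdomain.com/blog/post.html?page=2"))
# prints https://www.yourdomain.com/robots.txt
```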
Method 2 of 2:

Blocking Search Engines with Meta Tags

  1. The robots meta tag allows programmers to set parameters for bots, or search engine spiders. These tags are used to block bots from indexing and crawling an entire site or just parts of the site. You can also use these tags to block a specific search engine spider from indexing your content. These tags appear in the head of your HTML file. [4]
    • This method is commonly used by programmers who do not have access to a website's root directory.
  2. It is possible to block all bots from indexing a page and/or from following a page's links. This tag is commonly used when a live site is under development. Once the site is complete, it is strongly recommended that you remove this tag. If you do not remove the tag, your page will not be indexed or searchable via search engines. [5]
    • You may block bots from indexing the page and from following any of the links:
      <meta name="robots" content="noindex, nofollow">
      
    • You may block all bots from indexing the page:
      <meta name="robots" content="noindex">
      
    • You may block all bots from following the page's links:
      <meta name="robots" content="nofollow">
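
To confirm which directives a page actually carries, the robots meta tag can be read back out of the HTML. The sketch below uses Python's standard `html.parser`; the names `_MetaCollector` and `robots_directives` and the sample page are illustrative, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class _MetaCollector(HTMLParser):
    """Collect the directives of every <meta> tag with a given name."""

    def __init__(self, name: str):
        super().__init__()
        self.name = name.lower()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != self.name:
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str, name: str = "robots") -> set:
    """Return the directives found in <meta name=...> tags in html."""
    collector = _MetaCollector(name)
    collector.feed(html)
    return collector.directives


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(sorted(robots_directives(PAGE)))
# prints ['nofollow', 'noindex']
```

Passing a different `name`, such as `"bingbot"`, looks for a tag aimed at that specific bot instead.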
      
  3. If you allow the bots to index the page, the page will be indexed; if you prevent the spiders from following the links, the link path from this specific page to other pages will break. [6] Insert the following line of code into the head of your page:
      <meta name="robots" content="index, nofollow">
      
  4. If you allow the bots to follow the links, the link path from this specific page to other pages will remain intact; if you restrict them from indexing the page, your web page will not appear in the index. [7] Insert the following line of code into the head of your page:
      <meta name="robots" content="noindex, follow">
      
  5. To hide a single link on a page, add a rel attribute to the <a> link tag. You may wish to use this attribute on links on other pages that lead to the specific page you want to block. [8]
      <a href="yourdomain.html" rel="nofollow">Insert Link to Blocked Page</a>
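
A quick way to audit which links on a page carry the nofollow hint is to sort them with a small script. This sketch uses Python's standard `html.parser`; the names `_LinkCollector` and `split_links`, and the second sample link `about.html`, are illustrative assumptions.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Sort <a href> links by whether their rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.nofollow = []
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


def split_links(html: str):
    """Return (nofollow_links, followed_links) found in html."""
    collector = _LinkCollector()
    collector.feed(html)
    return collector.nofollow, collector.followed


SAMPLE = (
    '<a href="yourdomain.html" rel="nofollow">Insert Link to Blocked Page</a> '
    '<a href="about.html">About</a>'
)
print(split_links(SAMPLE))
# prints (['yourdomain.html'], ['about.html'])
```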
      
  6. Instead of blocking all bots from your web page, you may wish to prevent one bot from crawling and indexing the page. To accomplish this, replace “robots” within the meta tag with the name of a specific bot. [9] Examples include: googlebot, googlebot-news, googlebot-image, bingbot, and teoma. [10]
      <meta name="bingbot" content="noindex, nofollow">
      
  7. If you want to ensure that your page will be indexed and its links will be followed, you can insert a follow-allow “robots” meta tag into the head of your page. [11] Use the following code:
      <meta name="robots" content="index, follow">
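
All of the tags in this method differ only in the bot name and in the two index/follow choices, so they can be generated from one small helper. The following is a minimal sketch; the function name `robots_meta_tag` and its parameters are hypothetical and chosen only for illustration.

```python
from html import escape


def robots_meta_tag(index: bool, follow: bool, bot: str = "robots") -> str:
    """Build a robots meta tag for all bots ("robots") or one named bot."""
    directives = (
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    )
    content = ", ".join(directives)
    # Escape the bot name so it is safe inside a quoted attribute.
    return f'<meta name="{escape(bot, quote=True)}" content="{content}">'


print(robots_meta_tag(index=False, follow=False))
# prints <meta name="robots" content="noindex, nofollow">
print(robots_meta_tag(index=True, follow=False))
# prints <meta name="robots" content="index, nofollow">
print(robots_meta_tag(index=False, follow=False, bot="bingbot"))
# prints <meta name="bingbot" content="noindex, nofollow">
```

The output is meant to be pasted into the head of the page, as described in the steps above.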
      

Community Q&A

  • Question
    What does the phrase 'Blocking Search Engines with Meta Tags' mean? It's ambiguous: is it a) use meta tags to block search engines, or b) block search engines that have meta tags?
    Pingu
    Top Answerer
    This article describes using meta tags to block search engines. You can use this method to block search engines regardless of whether they have meta tags or not.
      About This Article

      Thanks to all authors for creating a page that has been read 236,543 times.
