Search engines use robots, also known as spiders or bots, to crawl and index webpages. If your site or page is under development or contains sensitive content, you may want to block bots from crawling and indexing it. You can block entire websites, pages, and links with a robots.txt file, and block specific pages and links with HTML <meta> tags. Read on to learn how to block specific bots from accessing your content.
Steps
-
Understand robots.txt files. A robots.txt file is a plain ASCII text file that tells search engine spiders what they are allowed to access on your site. Files and folders disallowed in a robots.txt file are off-limits for crawling and indexing by search engine spiders that honor the file. You may need a robots.txt file if:
- You want to block specific content from search engine spiders.
- You are developing a live site and are not ready for search engine spiders to crawl and index it.
- You want to limit access to reputable bots.
-
Create and save a robots.txt file. To create the file, launch a plain text editor or a code editor. Save the file as robots.txt. The file name must be all lowercase.
- Do not forget the “s.”
- When you save the file, choose the “.txt” extension. If you are using Word, select the “Plain Text” option.
-
Write a full-disallow robots.txt file. It is possible to block every reputable search engine spider from crawling and indexing your site with a “full-disallow” robots.txt. Write the following lines in your text file:
User-agent: *
Disallow: /
- Using a “full-disallow” robots.txt file is not recommended for a site you want people to find. When a bot, such as Bingbot, reads this file, it will not index your site and the search engine will not display your website.
- User-agent: another term for a search engine spider, or robot
- *: the asterisk signifies that the code applies to all user-agents
- Disallow: /: the forward slash indicates that the entire site is off-limits to bots [1]
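One way to check a full-disallow file before uploading it is to run it through a parser. The sketch below uses Python's standard urllib.robotparser module. The helper name is_allowed and the www.yourdomain.com URLs are placeholders chosen for illustration, and the result only shows how this one parser reads the file, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The full-disallow file from this step, one directive per line.
FULL_DISALLOW = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, bot: str, url: str) -> bool:
    """Return True if `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


# www.yourdomain.com is a placeholder domain, not a real site.
for bot in ("Googlebot", "Bingbot"):
    page = "https://www.yourdomain.com/index.html"
    print(bot, "allowed:", is_allowed(FULL_DISALLOW, bot, page))
```

Every bot name and every path should come back as not allowed, because the single rule covers all user-agents and the whole site.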
-
Write a conditional-allow robots.txt file. Instead of blocking all bots, consider blocking specific spiders from certain areas of your site. [2] Common conditional-allow commands include:
- Block a specific bot: replace the asterisk next to User-agent with googlebot, googlebot-news, googlebot-image, bingbot, or teoma. [3]
- Block a directory and its contents:
User-agent: *
Disallow: /sample-directory/
- Block a webpage:
User-agent: *
Disallow: /private_file.html
- Block an image:
User-agent: googlebot-image
Disallow: /images_mypicture.jpg
- Block all images:
User-agent: googlebot-image
Disallow: /
- Block a specific file format (here, every URL ending in .gif):
User-agent: *
Disallow: /*.gif$
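The rules above can be combined in one file and tested together. The following is a minimal sketch using Python's standard urllib.robotparser. The combined file, the allowed helper, and the www.yourdomain.com base URL are assumptions made for illustration. This module matches paths by simple prefix, so the wildcard file-format rule is left out of the test file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file combining several of the conditional rules from this step.
CONDITIONAL = """\
User-agent: googlebot-image
Disallow: /images_mypicture.jpg

User-agent: *
Disallow: /sample-directory/
Disallow: /private_file.html
"""

parser = RobotFileParser()
parser.parse(CONDITIONAL.splitlines())

BASE = "https://www.yourdomain.com"  # placeholder domain


def allowed(bot: str, path: str) -> bool:
    """Return True if `bot` may fetch BASE + path under CONDITIONAL."""
    return parser.can_fetch(bot, BASE + path)


for bot, path in [
    ("bingbot", "/sample-directory/page.html"),
    ("bingbot", "/index.html"),
    ("googlebot-image", "/images_mypicture.jpg"),
    ("googlebot-image", "/sample-directory/page.html"),
]:
    print(f"{bot:16} {path:30} allowed={allowed(bot, path)}")
```

Note the last case: a bot that has its own User-agent group follows only that group, so in this sketch googlebot-image is not bound by the rules listed under the asterisk.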
-
Encourage bots to index and crawl your site. Many people want to welcome, instead of block, search engine spiders because they want their entire site indexed. To accomplish this, you have three options. First, you can opt out of creating a robots.txt file—when the robot does not find a robots.txt file, it will continue to crawl and index your entire site. Second, you can create an empty robots.txt file—the robot will find the robots.txt file, recognize that it is empty, and continue to crawl and index your site. Lastly, you can write a full-allow robots.txt file. Use the code:
User-agent: *
Disallow:
- When a bot, such as Googlebot, reads this file, it will feel free to visit your entire site.
- User-agent: another term for a search engine spider, or robot
- *: the asterisk signifies that the code applies to all user-agents
- Disallow: the blank disallow command indicates that all files and folders are accessible
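To see that an empty file and a full-allow file behave the same way, both can be fed to a parser. This is a small sketch with Python's standard urllib.robotparser. The can_crawl helper and the placeholder page URL are illustrative names, and the missing-file case is not simulated because it requires a live server.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, bot: str, url: str) -> bool:
    """Return True if `bot` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, url)


EMPTY_FILE = ""                              # an empty robots.txt
FULL_ALLOW = "User-agent: *\nDisallow:\n"    # the full-allow file from this step
PAGE = "https://www.yourdomain.com/any/page.html"  # placeholder URL

print("empty file:", can_crawl(EMPTY_FILE, "Googlebot", PAGE))
print("full-allow:", can_crawl(FULL_ALLOW, "Googlebot", PAGE))
```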
-
Save the robots.txt file to the root of your domain. After you have written the robots.txt file, save the changes, then upload the file to your site's root directory. For example, if your domain is www.yourdomain.com, place the robots.txt file at www.yourdomain.com/robots.txt.
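Because bots look for the file only at the root, the expected location can be derived from any page address on the site. The sketch below does this with Python's standard urllib.parse. The function name robots_txt_url and the sample addresses are illustrative assumptions.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the site that serves `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder address on the example domain used in this step.
print(robots_txt_url("https://www.yourdomain.com/blog/post.html?ref=home"))
```

A robots.txt file saved anywhere else, such as inside /blog/, would not be at the address this function returns, which is why it has to sit in the root directory.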
-
Understand HTML robots meta tags. The robots meta tag allows programmers to set parameters for bots, or search engine spiders. These tags are used to block bots from indexing and crawling an entire site or just parts of the site. You can also use these tags to block a specific search engine spider from indexing your content. These tags appear in the head of your HTML file. [4]
- This method is commonly used by programmers who do not have access to a website's root directory.
-
Block bots from a single page. It is possible to block all bots from indexing a page and/or from following a page's links. This tag is commonly used when a live site is under development. Once the site is complete, it is strongly recommended that you remove this tag. If you do not remove the tag, your page will not be indexed or searchable via search engines. [5]
- You may block bots from indexing the page and from following any of the links:
<meta name="robots" content="noindex, nofollow">
- You may block all bots from indexing the page:
<meta name="robots" content="noindex">
- You may block all bots from following the page's links:
<meta name="robots" content="nofollow">
-
Allow the bots to index a page, but not follow its links. If you allow the bots to index the page, the page will be indexed; if you prevent the spiders from following the links, the link path from this specific page to other pages will break. [6] Insert the following line of code into the head of your page:
<meta name="robots" content="index, nofollow">
-
Let the search engine spiders follow the links but not index the page. If you allow the bots to follow the links, the link path from this specific page to other pages will remain intact; if you restrict them from indexing the page, your web page will not appear in the index. [7] Insert the following line of code into the head of your page:
<meta name="robots" content="noindex, follow">
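The content values in these steps act as two independent switches, one for indexing and one for link following. The sketch below reads a content string and reports both settings. The function name interpret_robots_content is hypothetical, and treating a missing directive as permission is an assumption based on the behavior described in the steps above.

```python
def interpret_robots_content(content: str) -> dict:
    """Split a robots meta content value into index/follow permissions.

    A permission is treated as granted unless the matching "no..." directive
    appears (an assumption for this sketch).
    """
    tokens = {part.strip().lower() for part in content.split(",") if part.strip()}
    return {
        "index": "noindex" not in tokens,
        "follow": "nofollow" not in tokens,
    }


for value in ("noindex, nofollow", "index, nofollow", "noindex, follow", "index, follow"):
    print(f"{value:18} -> {interpret_robots_content(value)}")
```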
-
Block a single outgoing link. To hide a single link on a page, add a rel attribute to the <a> link tag. You may wish to use this attribute to block links on other pages that lead to the specific page you want to block. [8]
<a href="yourdomain.html" rel="nofollow">Insert Link to Blocked Page</a>
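To audit which links on a page carry the nofollow value, the page can be scanned with Python's standard html.parser module. This is a minimal sketch; the LinkCollector class and the sample markup, including about.html, are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sorts the href of each <a> tag by whether its rel includes nofollow."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return  # anchors without an href are not links
        # rel is a space-separated list of tokens, e.g. rel="nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical page fragment containing one blocked and one normal link.
page = (
    '<p><a href="yourdomain.html" rel="nofollow">Blocked page</a> '
    '<a href="about.html">About</a></p>'
)
collector = LinkCollector()
collector.feed(page)
print("nofollow:", collector.nofollowed)
print("followed:", collector.followed)
```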
-
Block a specific search engine spider. Instead of blocking all bots from your web page, you may wish to prevent one bot from crawling and indexing the page. To accomplish this, replace “robots” within the meta tag with the name of a specific bot. [9] Examples include: googlebot, googlebot-news, googlebot-image, bingbot, and teoma. [10]
<meta name="bingbot" content="noindex, nofollow">
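The effect of a bot-specific tag can be sketched by collecting a page's meta tags and checking only those addressed to "robots" or to the bot in question. The code below uses Python's standard html.parser. The MetaDirectives class, the may_index function, and the rule that a bot obeys both the generic and its own tag are assumptions for this sketch, not a description of any search engine's actual logic.

```python
from html.parser import HTMLParser


class MetaDirectives(HTMLParser):
    """Collects the directives of every <meta> tag, keyed by lowercased name."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
        if name:
            self.by_name.setdefault(name, set()).update(tokens)


def may_index(html: str, bot: str) -> bool:
    """Return False if a robots tag or a tag named after `bot` says noindex."""
    parser = MetaDirectives()
    parser.feed(html)
    applicable = parser.by_name.get("robots", set()) | parser.by_name.get(bot.lower(), set())
    return "noindex" not in applicable


head = '<head><meta name="bingbot" content="noindex, nofollow"></head>'
print("bingbot may index:", may_index(head, "bingbot"))
print("googlebot may index:", may_index(head, "googlebot"))
```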
-
Encourage bots to crawl and index your page. If you want to ensure that your page will be indexed and its links will be followed, you can insert an index-and-follow “robots” meta tag into the head of your page. [11] Use the following code:
<meta name="robots" content="index, follow">
Community Q&A
-
Question: What does the phrase 'Blocking Search Engines with Meta Tags' mean? It's ambiguous. Is it a) use meta tags to block search engines, or b) block search engines that have meta tags?
Answer (Pingu, Top Answerer): This article describes using meta tags to block search engines. You can use this method to block search engines regardless of whether they have meta tags or not.
References
- https://support.google.com/webmasters/answer/6062596?hl=en
- https://support.google.com/webmasters/answer/6062596?hl=en
- https://www.elegantthemes.com/blog/tips-tricks/how-to-stop-search-engines-from-indexing-specific-posts-and-pages-in-wordpress
- https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- https://searchenginewatch.com/sew/how-to/2067564/how-to-use-html-meta-tags
- https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/
- https://css-tricks.com/snippets/html/meta-tag-to-prevent-search-engine-bots/