Is it just me, or do the terms in the title – robots.txt and meta robots tag – sound scary? At the beginning of my career, I had my share of struggles getting used to these terminologies. Through this blog, I want to help you understand these concepts better and show you how to set them up.
I hope this blog helps you understand the why, what, and how of Robots.txt and meta robots tag.
First, let’s start with the basics.
Before discussing the terms robots.txt and meta robots tag, let’s talk about web crawlers. A web crawler (a.k.a. web spider or robot) is a bot that crawls and indexes different websites to collect information. Each search engine has its own web crawler: Google has Googlebot, Bing has Bingbot, and so on.
Now, let’s get started!
What Is robots.txt?
robots.txt instructs the crawlers on what to do, where to go, and where not to go. It is a component of the Robots Exclusion Protocol (REP), a set of guidelines that govern how robots crawl and index content on the Internet. It may appear complicated and technical, but creating a robots.txt file is a simple process.
Let’s take a domain example: https://digitalscholar.in/
Now, when the web crawlers try to reach your website, they’ll go to https://digitalscholar.in/robots.txt
If there’s no file available at this address, the crawler will go ahead and scan your entire website. However, if you enter instructions, the crawlers will follow them. (It is possible for web crawlers to ignore your instructions and access your content anyway. Big companies like Google and Microsoft respect these instructions, but smaller data aggregators and malicious bots may choose to ignore them.)
Here’s what some minimal robots.txt files look like. (In the examples below, "*" indicates that the instruction applies to all search engines.)
- The following content allows the web crawlers access to all the pages of your website, since you haven’t disallowed anything:
User-agent: *
Disallow:
- In the next example, since there’s a "/" after Disallow, you are forbidding the search engines from accessing your entire website:
User-agent: *
Disallow: /
- To block a particular search engine from accessing your content, substitute the * with its name. For example, if you want to block only Googlebot, this is what you’d do:
User-agent: Googlebot
Disallow: /
- If you want to block only the /images/ folder on your website, this is what you can do:
User-agent: *
Disallow: /images/
Best SEO Practices To Set Up robots.txt
- Research and practice: Using another website’s or business’s robots.txt file will not help you. You need to research what works best for your website, what content you want to block, and what you’d like to share. Personally, I block duplicate pages, dynamic product and service pages, account pages, admin pages, shopping carts, chats, and thank-you pages.
- Place the file in the root directory: If you fail to place robots.txt in your site’s main root directory, the web crawlers won’t find it.
- Check for mistakes: The only valid filename is robots.txt, written exactly like that in lowercase. Any capital letters or special characters will make it invalid.
- Use one rule per line: Another common mistake is trying to disallow multiple folders or files under one command. Each folder or file requires a separate "Disallow" line, and each bot you block needs its own "User-agent" line, as in the sample file below.
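For illustration, here is a hedged sketch of what such a file might look like. The paths (/cart/, /admin/, /thank-you/) and the choice of blocking Bingbot from /images/ are made up for this example, not recommendations for any real site:

User-agent: *
Disallow: /cart/
Disallow: /admin/
Disallow: /thank-you/

User-agent: Bingbot
Disallow: /images/

Notice that every blocked folder gets its own Disallow line, and each bot gets its own User-agent group.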
Online digital marketing courses help you understand the step-by-step process of setting up robots.txt in more depth, but I hope this gives you a fair idea of the basics.
Let’s dive into the concepts of Meta Robots Tag and understand it better!
Meta Robots Tag
What are Meta Robots tags?
Through meta robots tags, you can instruct search engines on how to index or crawl your page. Also called meta robots directives, they are HTML code snippets placed in the <head> section of your website.
There are two parts to a typical meta robots tag command:
- name="" (this is where you identify the search engine you wish to command, e.g. Googlebot)
- content="" (this is where you pass the instructions to the bots)
So, a typical meta robots tag would look like this:
<meta name="robots" content="noindex" />
There are two types of tags: the meta robots tag and the X-Robots-Tag.
- Meta robots tag: You can tell search engines how to index specific pages of your website through the meta robots tag. These are commonly used by SEO marketers and specialists. For example, if you want to stop Googlebot from indexing a page and following any of its links, this is what your tag would look like:
<meta name="googlebot" content="noindex,nofollow">
- X-Robots-Tag: The X-Robots-Tag allows for more functionality than the meta robots tag. It is sent as part of the HTTP response header rather than in the page’s HTML, so you can block a particular image, video, or other file instead of an entire page.
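As a hedged sketch of what this looks like, a server’s HTTP response for a file that should stay out of the index might include a header line like the one below. Exactly how the header gets set depends on your web server configuration (for example, Apache or Nginx rules), which is outside the scope of this post:

X-Robots-Tag: noindex, nofollow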
We’ve discussed what Meta robots tags are, and we’ve discussed their types. Now, let’s understand the different types of commands you can give under the Meta Robots tag.
- Index – To allow all search engines to index the page.
- Follow – To allow search engines to follow the links on your page.
- Noindex – To prevent all search engines from indexing the page.
- Nofollow – To prevent search engines from following the links on your page. This is not the same as the rel="nofollow" link attribute.
- Noarchive – To make sure that cached copies of the page don’t appear in the SERPs.
- Nosnippet – The page will not be cached, and descriptions will not appear underneath the page in the SERPs.
- NOODP – Prevents the page’s description in the SERPs from being replaced by the Open Directory Project description.
- Noimageindex – Stops the images on the page from being indexed by Google.
- Notranslate – Stops Google from offering a translated version of the page in the search results.
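These directives can also be combined in a single tag. For instance, an illustrative tag that keeps a page indexed but blocks cached copies and search-result snippets might look like this:

<meta name="robots" content="noarchive, nosnippet">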
Best SEO Practices To Set Up Meta Robots Tags
- Stick to lowercase: Search engines recognize both uppercase and lowercase attributes, values, and parameters, but I recommend using lowercase to make your code easier to read. If you’re an SEO marketer, it’s a habit worth building.
- Avoid multiple <meta> robots tags: To keep conflicts out of your code, don’t use multiple <meta> robots tags on the same page. If you need to give several instructions, combine them in a single tag, like this:
<meta name="robots" content="noindex, nofollow">
- Avoid incompatible directives: To avoid indexing errors, don’t combine conflicting meta tags. If your page has both <meta name="robots" content="follow"> and <meta name="robots" content="nofollow">, only "nofollow" will be considered, because robots prioritize the more restrictive value.
That’s it about Meta Robots Tag! Let’s see how you should work with these two together.
robots.txt And Meta Robots Tag: How to Work With Them Together?
We’ve already seen how robots.txt and the meta robots tag work on their own, but it’s just as important to know how to use them together. You’ll start noticing problems if your file and tags contradict each other: for example, if the robots.txt file blocks a page from being crawled while its meta robots tag asks for it to be indexed.
Google generally follows the robots.txt file in such a case, but it’s always better to check your file and tags thoroughly so they don’t send mixed signals.
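As a purely hypothetical illustration of such a conflict (the path /old-offers/ is made up for this example), the two snippets below send crawlers contradictory signals. The robots.txt rule stops them from crawling the page, so they may never even see the noindex tag placed on it:

In robots.txt:
User-agent: *
Disallow: /old-offers/

In the <head> of the page at /old-offers/:
<meta name="robots" content="noindex">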
Summing Up:
A robots.txt file is best for blocking an entire section of a site, such as a category, while a meta tag is better for blocking individual files and pages. Although neither the meta robots tag nor the robots.txt file has authority over the other, "noindex" always takes precedence over "index" requests.
In general, I recommend using a "noindex" meta tag rather than a robots.txt directive to deindex a page or directory from Google’s search results. This technique will deindex your page the next time your site is crawled, eliminating the need to make a URL removal request. Adding the "follow" directive to the meta robots tag also ensures that your link equity is not lost.
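For example, the tag below asks search engines to drop the page from their index while still following its links, which is exactly the combination described above:

<meta name="robots" content="noindex, follow">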
I hope this blog gives you more clarity on robots.txt and meta robots tag – and you can protect your website like a pro!