
How to Find and Use the Robots.TXT File on Your Website.

In today’s solo episode, Jesse explains the use of the robots.txt file and its value in helping you meet your SEO goals. Through this episode, Jesse explains what robots.txt can (and in some ways cannot) do to protect your website, while also giving useful tips on how using robots.txt can minimize the effect of web crawling bots on your analytics. Whether you’re a longtime listener or a first-timer checking us out, this episode has great insights that will help you improve your web presence! Watch or listen to it today!
If you find this show provides great value to you, leave us a review on your podcast platform of choice! We appreciate the feedback and we read a 5-star review every week on the show. Thanks for checking us out!

Don’t miss an episode – listen on iTunes, Google Podcasts, Spotify, Stitcher, Android Apps, or RSS!

What you’ll learn

  • What is in robots.txt that makes it relevant to SEO.
  • How to use robots.txt effectively.
  • How to check for your robots.txt file.

Transcript For What is a Robots.TXT File and Does it Matter for My SEO? – 111

Caleb Baumgartner: Welcome to Local SEO Tactics, where we bring you tips and tricks to get found online. I am producer Caleb Baumgartner, and in today’s episode, Jesse explains the robots.txt file and how it relates to crawler bots like those used by Google and Bing. You’ll discover what this file is, how to locate it, and how to effectively use it to improve your reporting and even your security. Find that our show provides great value and insights for your business? Leave us a five-star review on the platform of your choice. Thank you for listening and enjoy the show.

Jesse Dolan: Welcome back to Local SEO Tactics, where we bring you tips and tricks to get found online. I’m your host, Jesse Dolan, coming at you here today with another episode all by my lonesome, no partner, no Bob, no Sue here, but that’s just fine. We’ve got a topic that’s probably not super exciting for them to chime in on anyways here. We’re going to be talking today about the robots text file, or robots.txt file, on your website. What is this thing? Why is it important for you? What do you need to know about it? And what can it do to impact your SEO? Before we get into that, I want to mention our free instant SEO audit tool. Go to our website, localseotactics.com, and click on the yellow button in the top corner. That is our tool to provide a free instant SEO audit on your website.

This tool runs page by page. You type in the keyword you want to optimize for, or in this case kind of grade your page against. Type in the keyword, type in the page, and the tool is going to run and produce a PDF for you. And it’s going to send you an email with that. That’s going to be a great punch list of what you need to attack to optimize that page for that keyword. And it’s a pretty easy-to-follow guide if you want to implement some changes to improve your SEO. That is a great place to start. And of course, totally free. Use it as many times as you want. Check it out, localseotactics.com.

All right. So we’re talking about the robots.txt file. First things first, let’s talk about what this thing is. Most websites out there are going to have this automatically, especially if you’re using a WordPress website. It’s something that’s probably already created. Now, WordPress will create this virtually, meaning it’s not a page you can just pull open and edit like any other page when you log into your WordPress dashboard. There is the ability to edit this, which we’re not going to get into in this episode. This is more of an overview: what it is, how to leverage it, and the things it can do. As for how to edit the robots.txt file, how to create one, or how to load it onto your site, if you need help in any of those areas, reach out to us. We can get you some resources or some guidance on that. And if there are enough questions there, maybe we’ll do another episode later that gets a little bit more into the weeds on those technical details.

So first, what is this? It’s a plain text file, literally with the .txt extension. It’s just a plain file. If you’re on Windows and you open up the Notepad that comes with your computer by default, type something, and save it, it’s going to be the same thing, a .txt file. So it’s that type of file, and it’s part of your website. It’s added to the root of your website, meaning for Local SEO Tactics, it’ll be localseotactics.com/robots.txt, robots with an S, plural. The purpose of this file is to communicate to bots, crawlers, things like that, about what they should and shouldn’t access on your website.

Now this is important to have on your website for multiple reasons, but it is important to know that this is kind of a voluntary thing, right? So you have this text file that’s on your website that crawlers like Googlebot and Bingbot are supposed to read. And we’re going to link, on the show page for this episode, to some Google developer docs about the stuff we’re talking about here today. What is robots.txt? What are some of the things you can put in there, and how do you use it? But I thought it was pretty funny that right on that page from Google, they make it clear that this is kind of voluntary, right? Robots, crawlers, things like that can follow this, and a given bot may do this thing or may do that thing. It’s completely voluntary. Right. There’s no protocol that is mandated for these things to follow.

So everything we’re going to talk about here with how it works and what it does is voluntary. Okay. That being said, you should have the expectation that it does work like this. I don’t mean to mislead you or make it sound like that’s a false truth. It’s not. And there are some very purposeful and intentional things that we’re going to do with it.

So this file helps to control which files crawlers and bots may access on your website. Not all robots cooperate with that. It’s important to know that malicious things, right, email scraping bots, spam bots, things looking to inject malware, they’re probably going to disregard it. If you say to the bots, hey, don’t go to this part of my website, they’re probably going to ignore that. And in some cases they may even use that as a list of places to seek out, right, to try to attack.

So this is a good spot to take a pause and give you a reminder for some best practices, especially if you’re on WordPress or any kind of online content management system, make sure you have a good username and password, and that you’re not sharing that with 17 other people. That wherever possible, everybody has their own unique username and password to log in. And that these passwords are challenging, right? It’s not password one or one, two, three, four, three, two, one. Did I say that right backwards? Hope so. Because the easier it is to crack these things or obtain these things, anybody with malicious intent out there, any of these bots can compromise your website and gain access.

Okay. Disclaimer aside, reminder aside, these friendly bots, if you will, right? Googlebot, Bingbot do abide by this. This is a standard protocol out there. The robots exclusion standard is what it’s called. Again, we’ll link to this in the show notes. I believe there’s a Wikipedia page, if memory serves, that talks about what that is. So they do honor this. They do abide by it. And it helps Googlebot, helps Bingbot, and your website for the same reasons, which I’ll get to here in just a minute. The other thing I want to mention is there’s the polar opposite, which is a robot inclusion standard.

And that is more akin to what your site map is on your website, right? So the robots.txt file acts mainly as a list of here’s what we want you to avoid if you’re crawling our website, and then here are the areas to allow. Whereas a site map, for robot inclusion, is very explicitly a list of, hey, here are the pages we want you to explore and we want you to visit. Here’s a dang list and an index, right, a table of contents of these pages. So at its basic core, that’s what the robots.txt file is. Again, it lives on your website. It’s at the root of your website. It’s just a basic text file. And it contains some phraseology and some encoding that helps direct the crawlers and the bots on what content to access and discover on your site and what to avoid.

Now, what this is not: the robots.txt file should not be relied upon to exclude content on your website. You may have some pages on your website that you don’t want indexed in Google. Maybe it’s your thank you pages; that’s something we look for a lot. A thank you page would be like this: somebody comes to your website, you have a call to action, register for this, sign up for that, get a quote, whatever, and they fill out a contact form. After that, you push them to a thank you page. Thank you. This has been received. We’ll be in touch shortly, or whatever your message is. Usually you don’t want that thank you page discoverable within Google, because often people are tracking their analytics. Like, I got a visit to my homepage, they filled out a contact form, boom, they went to the thank you page. That can be a goal, right? Or a conversion.

If you’re getting people that are visiting your thank you page, it can kind of throw those stats off, right? 100 people hit my home page, some opted in, and X number hit my thank you page. It kind of needs to make sense. If your thank you page is being found in Google organically, and people are clicking on it and visiting it, that’s just going to get things out of whack. So that’s an example of a type of content you might not want to have indexed and showing up in Google search. So with your robots.txt file, you would be telling bots to not crawl this page, not crawl this content area. Don’t include it in your index. Just ignore this part of my website.
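To make the thank you page example concrete, here is a minimal sketch in Python of how a well-behaved crawler might read those rules. The robots.txt contents, the /thank-you/ and /wp-admin/ paths, and the example.com domain are all assumptions made up for illustration, not anything taken from a real site. It uses the robots.txt parser that ships with Python’s standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative assumptions.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a cooperative crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask, on behalf of a crawler named "Googlebot", which paths may be fetched.
for path in ("/", "/services/", "/thank-you/", "/wp-admin/options.php"):
    allowed = parser.can_fetch("Googlebot", "https://example.com" + path)
    print(path, "allowed" if allowed else "disallowed")
```

With these sample rules, the first two paths print as allowed and the last two as disallowed. Keep in mind this only models a bot that chooses to cooperate; nothing in the file forces a bot to run a check like this.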

But it’s important to note that even though you can do that with your robots.txt file, again, there’s no guarantee. It can still be crawled. To be clear, Google and Bing, things like that, they’re going to abide by this, but that does not mean all robots will abide by it, right? So you shouldn’t rely on it as the method to exclude content. What you’re going to want to do is make sure you’re using something like a noindex or nofollow within the HTML of those pages to call out specifically that you do not want this page indexed, right? Or things like that.
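As a rough illustration of what that looks like inside the HTML, the sketch below checks a page for a robots meta tag carrying noindex. The sample page markup and the helper names (RobotsMetaFinder, has_noindex) are hypothetical and exist only for this example; the parsing relies on Python’s built-in html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical thank you page markup, for illustration only.
SAMPLE_PAGE = """\
<html>
  <head>
    <title>Thank you</title>
    <meta name="robots" content="noindex, nofollow">
  </head>
  <body><p>Thank you. This has been received.</p></body>
</html>
"""


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute is a comma-separated list such as "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True when the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


print(has_noindex(SAMPLE_PAGE))
```

One thing worth knowing from Google’s own documentation: a crawler has to be able to fetch the page to see this tag, so a page that is also disallowed in robots.txt may never have its noindex read.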

And then there is one more caveat there. Even though you do that, call it out in your robots.txt file to avoid this area, disallow this area, that page may be linked to from some other property on the web, right? If I am linking to your thank you page for some reason off of my website and Google crawls my website, it will discover your thank you page, right? And it will now be aware of your thank you page and possibly index your thank you page.

So again, I’m trying to underscore here that even though you may provide guidance in your robots.txt file to not explore or access this page, you shouldn’t operate without a safety net. You shouldn’t trust that that’s the end-all be-all, that this page will never be discovered. That’s not true. There are other ways for it to be discovered. Now that being said, you still want to leverage it for that, because a big part of the SEO benefit in using a robots.txt file is to control what we call the crawl budget. So Googlebot, which visits your website, that’s what its job is. It exists to scour the internet, to land on a webpage, to discover the links on that webpage, follow those links, and just continue. Right. It just crawls, and crawls, and crawls to discover pages and content that can then be added to the Google index to show up in Google search results.

Now your website has a finite number of pages. Googlebot takes time to crawl those pages. Googlebot will not sit on your website forever and discover all of your content. It has a certain amount of time it may spend on your website, and a certain frequency at which it may visit your website. So what we want to do with a robots.txt file is give Googlebot and other search bots guidance on what areas to avoid, to thus funnel it and focus it on the areas you want it to include. If you have a certain chunk of your website that you just don’t want in search, if it needs to exist but you don’t want Googlebot wasting its time exploring it because it’s not important to your search results, you want to make sure Googlebot understands that so it can avoid those areas. And then in turn, it can focus on the content within your website that you do want it to explore and crawl. So when it comes back to your website, that’s what it’s looking for. That’s what it’s checking. That’s what it’s discovering, finding the changes and things like that.

So one of the key uses of your robots.txt file is to call out the content on your website that you want Googlebot to avoid, to thus manage your crawl budget. It’s a very good use of it. And that helps a lot. The other reason that we leverage it a lot is redundancy in communicating your site map to Google. We’ve had other episodes that talked about your site map, how to create it, things like that. But again, like I said on the front side of this episode, your site map is basically your table of contents or your index for all the pages, and posts, and media, and content on your website.

That is something you want Google to find. You want Google to understand this is my site map. This is my content. Please find it. Please digest it. Please add it to your search index so that it can be presented to clients and end users that are doing Google searches, right? That’s something I think we all understand that we need and want. So what we like to do as a level of redundancy is add a link to your site map in your robots.txt file. And this basically ensures that Google will find and discover your site map, even though maybe you have it linked somewhere else on your website. If you’re using Google Search Console, one of the main things that you want to do is submit your site map within Google Search Console. And again, we have some previous episodes that talk about that, how to create that site map, how to submit that site map, things like that.

One of the areas of redundancy, again, is to drop a link to that within your robots.txt file. If there’s one area that I promote a lot of redundancy on, it’s having your site map discovered, right? If we’re doing SEO, that means we’re trying to get our content found in Google search. There’s not another area that you really should be as redundant on as making sure Google understands what content you have on your website, right? Because if it’s not able to discover your content, if it’s not able to crawl and index your content, it’s not going to show up in search. Right. It’s virtually impossible. So for that reason, we also leverage that robots.txt file to include a link to the site map. We also recommend putting a link to that in the footer of your website, by the way, as another little pro tip.
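For anyone wondering what that link looks like in practice, it is a single Sitemap line in the file. The sketch below, in Python, parses a made-up robots.txt and pulls out the site map URLs it lists. The file contents, the example.com URL, and the sitemaps_listed helper are assumptions for illustration; site_maps() is part of Python’s standard robots.txt parser in version 3.8 and later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a Sitemap line; the URL is an assumption.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
"""


def sitemaps_listed(robots_txt: str) -> list:
    """Return the site map URLs declared in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


print(sitemaps_listed(SAMPLE_ROBOTS_TXT))
```

An empty list from a check like this would be a hint that the file does not point crawlers at a site map yet.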

So those are kind of the main points on why you do need to use it. And I should say too, in addition to blocking content for your crawl budget, again, you can block that content for visibility, so it doesn’t show up in Google search, and to keep bots and other programs away from certain areas of your website. For example, the backend of WordPress, right? Again, if it’s information, thank you pages, PDFs, things like that, stuff you don’t want showing up in search. Another use: we’ve had some clients that will have maybe duplicate content on their website, which is not necessarily a good thing, but maybe it’s needed there for, say, a show registration or some kind of event. Right. Where you don’t want those redundant duplicate pages being shown in search, you can block those from being accessed too.

So definitely lots of reasons. It’s a simple thing, a simple file, a plain text file, but definitely a powerful thing for your SEO and for your website in general. Take advantage of it and leverage it. How to create it? I should also touch on that real quick. We’re going to link to some documentation from Google on that. Again, if you have a WordPress website, it’s probably created automatically for you. You can do a quick test yourself right now: pop in your URL, slash robots.txt, robots with an S, plural. See if it pops up. If it doesn’t, if it says 404 not found or something like that, then you’re going to need to create one. Again, go to our show page and check out the resources there, or just do a quick Google search: how to create a robots.txt file, right? Depending on whether you’re using WordPress or not and some other factors, you may have to take some different routes to do that. As with anything else, if you get stuck on this and you need some professional help, reach out to us. Of course, we can help you take care of this on your website.
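The quick browser test described above can also be scripted. This is a small sketch in Python, not a full site checker: the robots_txt_status helper and its opener parameter are hypothetical names introduced here, and the 10 second timeout is an arbitrary choice. It requests the robots.txt URL at the root of a site and reports whether the file was found or came back as a 404.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def robots_txt_status(site_root: str, opener=urlopen) -> str:
    """Report whether a site serves a robots.txt file at its root.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without making a real network request.
    """
    url = site_root.rstrip("/") + "/robots.txt"
    try:
        with opener(url, timeout=10) as response:
            if response.status == 200:
                return "found"
            return f"unexpected status {response.status}"
    except HTTPError as err:
        # A 404 is the "not found" case mentioned in the episode.
        return "missing" if err.code == 404 else f"HTTP error {err.code}"
    except URLError as err:
        return f"could not connect: {err.reason}"
```

Calling print(robots_txt_status("https://localseotactics.com")) would run the check against the show’s own site; the https scheme there is an assumption, and the result depends on what the server returns at the time.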

So I hope that helps everybody understand what that is. We get asked this question quite a bit by our clients. One of the things we usually cover in our technical SEO audit when we engage with a new client is, do you have a robots.txt file? Are you linking to your site map? And some other things like that. And more often than not, the question comes up, like, what is this thing you’re even talking about on my website? I don’t have this file. So we wanted to address that here in an episode to make you all aware of it. Whether you’re a business owner or a marketer yourself, again, check your website out to see if you’ve got that robots.txt file. See what information is on there, see if it needs any upkeep or modifications, and just ensure that you have it.

And again, if nothing else, make sure you’re linking to your site map on that thing. That’s just a great way to get that level of redundancy. So hopefully that helps you guys out and empowers you in this area, and maybe demystifies it if you’ve heard this terminology thrown about before and wondered, what the heck is this thing, and why is it important to me? Hopefully that helps. Going to reread, excuse me, going to read a review here. If you have not left us a review yet, and if you are enjoying the show, enjoying the content, and getting some value out of this, we would love for you to return that favor and throw us some value by leaving us a review.

Go out to localseotactics.com, scroll down to the bottom, and click the button for reviews. We’ve got, I think, Google My Business, Facebook, Apple, iTunes, and some other popular portals there. Wherever you want to leave a review, we would really appreciate it. You can just drop in five stars, that’s cool. Even better is if you leave some kind of commentary or testimonial. If you do that, we’re going to read it on the show here. And I’ll give you a quick shout out.

This episode here, we’ve got one from Thea T Voss. Hope I’m saying it right, Thea. Says, “I listened to many marketing podcasts and you guys are the most informative and useful for me and my clients. Great topics and information. Thank you. So glad you’re back with new episodes.” This one’s a little bit old. We had a bit of a hiatus there, and yeah, we’ve been back for quite some time, and actually we’ve even upped our level of content to multiple episodes per week. So Thea, I hope you’re still tuning in and enjoying these. And I appreciate the kind words and the good feedback there. I think that wraps it up for this episode, everybody. Again, hopefully you got some good value. We’d love to hear your review and read it on the show. And until then, we’ll catch you next time. Thank you.

To share your thoughts:

  • Send us a comment or question in the section below.
  • Share this show on Facebook.

To help out the show:

Your ratings and reviews really help and we read each one.

LINKS

MP3 Audio DOWNLOAD THE MP3 AUDIO FILE

Listen to the episode however you like with the audio file.