One of the best things about using Google Search Console is the valuable SEO feedback it can provide to those who know how to use it.
Google Search Console can tell you which pages on your website are working well, which pages Google ignores, and which pages can be improved. It even goes as far as telling you which errors need to be fixed, and how to fix them.
A new Google Search Console feature highlights this functionality in-depth: the Index Coverage report. The Index Coverage report shows a list of all the pages that Google has crawled and indexed, as well as the issues and errors that were encountered along the way.
Let’s dive deeper into what your Google index statuses mean:
How to Make Sense of the Google Index Status of Your Pages
When you navigate to the Index Coverage Report, you’ll notice a table in the top section of the page. This is called the top-level report, and it shows the Google index status of all pages that Google has attempted to crawl on your website, grouped by status and reason.
The Google index status of each page will be returned as one of the four following options:
- Valid: the page was indexed.
- Error: the page was not indexed.
- Warning (or Valid with Warnings): the page was indexed, but had an issue.
- Excluded: the page was excluded from the Google index, often for reasons outside of your control (perhaps it’s in an intermediate stage of indexing, such as “discovered, currently not indexed” or “crawled, currently not indexed”).
On the Index Coverage Report, the Google index status of each page is displayed with an accompanying reason, for easy troubleshooting.
If you notice a warning from an issue you’ve already fixed, check the crawl date. Google may not have recrawled your site yet. You can click on the ‘start fixing’ button if it is displayed as an option. Otherwise? You’ll just have to wait.
The 4 Major Reasons for a Google Index Status Message
The following represent possible reasons that you’ve been notified of a status message:
Server Error (5XX)
This means that your server returned a 500-level error when the page was requested. A 500-level error signifies that something is wrong with the server, preventing Google from accessing the page to fulfill the request.
How to fix it: Access the page to see if it loads. If it does, the issue may have resolved itself. But, you’ll still want to confirm uptime with your server, IT team, or hosting provider to see if there were any issues over the past few days, or if there is a configuration that prevented search engine crawlers from accessing the website.
Redirect Error
A redirect error can be caused by any of these things:
- A redirect chain that was too long
- Redirect loop
- Redirect URL that exceeded the maximum URL length
- Bad or empty URL in the redirect chain
How to fix it: Fix your redirect loops, and shorten long chains by pointing each old URL directly at its final destination. Google has a lot of pages to crawl, and if it encounters one buried under too many redirects, it won't waste its time trying to make sense of it.
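As a minimal sketch of auditing for these problems, the snippet below walks a redirect map and flags loops and over-long chains. The `redirects` dictionary (old URL to target URL) is a hypothetical example; in practice you'd build it from your server configuration or a crawl export.

```python
MAX_HOPS = 5  # keep chains short; crawlers give up on long ones

def trace_redirects(start, redirects, max_hops=MAX_HOPS):
    """Follow redirects from `start`; return (final_url, hop_count).

    Raises ValueError on a redirect loop or a chain longer than `max_hops`.
    """
    seen = [start]
    url = start
    while url in redirects:
        url = redirects[url]
        if url in seen:
            raise ValueError("redirect loop: " + " -> ".join(seen + [url]))
        seen.append(url)
        if len(seen) - 1 > max_hops:
            raise ValueError(f"chain exceeds {max_hops} hops: {start}")
    return url, len(seen) - 1

# Hypothetical redirect map for illustration:
redirects = {
    "/old-page": "/interim-page",   # chain: /old-page -> /interim-page -> /new-page
    "/interim-page": "/new-page",
    "/a": "/b",                     # loop: /a -> /b -> /a
    "/b": "/a",
}
```

Here the fix for the first chain would be to point `/old-page` straight at `/new-page`, and the loop between `/a` and `/b` needs one of its two rules removed.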
Submitted URL Blocked by Robots.txt
This happens when you submit your page for indexing but it is blocked by the robots.txt file.
The robots.txt file is a text file that instructs web crawlers as to which pages should be scanned or excluded. The problem arises when that file contains a rule telling Google not to crawl the very page you submitted.
How to fix it: Remove the line in robots.txt that prevents the page from being crawled. You can use Google Search Console’s robots.txt tester tool to identify it. If, on the other hand, you do intend to block the page, check your XML sitemap to see if the URL in question is listed there; if it is, remove it from the sitemap.
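For illustration, a rule like this hypothetical one would trigger the error for any blog post you submit:

```text
User-agent: *
Disallow: /blog/    # blocks everything under /blog/, including submitted posts
```

Narrowing the rule (for example, `Disallow: /blog/drafts/`) or deleting it lets Googlebot crawl the page again.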
Submitted URL Marked "noindex"
This happens when you submit your page for indexing but indexing is blocked by a "noindex" directive from either an HTTP response or meta tag.
How to fix it: Getting a noindex tag when you submit your URL for indexing sends a mixed signal: you’re telling Google you want the page indexed while simultaneously driving its crawlers away. To fix this, you must remove the noindex directive.
One suggestion is to check the page’s source code for "noindex". If you see it, you can remove it through your site’s CMS or by editing the page’s source code directly. If it isn’t in the page source, check the HTTP response headers for an X-Robots-Tag directive instead.
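For reference, the meta-tag form of the directive looks like this in the page’s head section:

```html
<!-- Remove this line (or change its content) to allow indexing -->
<meta name="robots" content="noindex">
```

The header form appears in the HTTP response instead, as `X-Robots-Tag: noindex`; if the tag isn’t in the page source, look at your server or CMS configuration.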
Submitted URL That Seems to Be a Soft 404
This happens when you submit a page for indexing but it encounters a soft 404 error.
A soft 404 error occurs when a page tells the user it doesn’t exist but returns a success code (200), or the response otherwise doesn’t contain the expected content. This is bad because returning a success code (200) tells search engines that there’s a real page at that URL when, in fact, there’s none.
Google sees this as a waste of its resources, so it takes note of the difference between actual 404 pages and soft 404 pages.
How to fix it: Configure your pages to return a 410 (Gone) or 404 (Not Found) response code if they are no longer available. You can also opt to create a custom 404 page. Technically, your server can return a standard 404 page, but it’s not ideal. A custom 404 page is preferable because you can use it to include links to your most popular content and pages.
You might also choose to return a 301 (permanent redirect) when your page has moved. Use the Fetch as Google tool to determine if your page is returning the correct HTTP response code.
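As an illustration, server rules like these hypothetical nginx directives (the paths are made up; adapt them to your own configuration) return honest status codes for removed or moved pages:

```nginx
location = /discontinued-product {
    return 410;              # Gone: the page was removed permanently
}
location = /old-guide {
    return 301 /new-guide;   # the content moved: redirect permanently
}
```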
Submitted URL Returns Unauthorized Request (401)
This usually happens when you submit a page that Googlebot is not authorized to crawl, such as a password-protected page.
How to fix it: To get rid of the error message, you must remove authorization requirements for the page, or allow Googlebot to access the page by verifying its identity.
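As a sketch of the second option, Google’s documented way to verify Googlebot’s identity is a reverse DNS lookup on the requesting IP, followed by a forward lookup to confirm the hostname maps back to the same IP. The function names below are my own, and the full check requires network access:

```python
import socket

# Hostname suffixes used by Google's documented crawler verification.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_hostname(host):
    """True if a reverse-DNS hostname belongs to Google's crawler domains."""
    return host.rstrip(".").endswith(GOOGLE_SUFFIXES)

def verify_googlebot(ip):
    """Verify a claimed Googlebot IP: reverse DNS, suffix check, then a
    forward lookup to confirm the hostname resolves back to the same IP."""
    host = socket.gethostbyaddr(ip)[0]             # reverse DNS lookup
    if not is_google_hostname(host):
        return False
    return ip in socket.gethostbyname_ex(host)[2]  # forward confirmation
```

The suffix check alone is not enough, since anyone can name a host `googlebot.com.evil.example`; the forward confirmation is what makes the verification trustworthy.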
Submitted URL Not Found (404)
This often occurs when you remove a webpage from your site but do not update your sitemap. To site crawlers, the URL does not exist for indexing.
How to fix it: Maintain your sitemap regularly, removing the URLs of pages you’ve deleted.
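A simple way to keep on top of this is to compare the URLs listed in your sitemap against the pages that actually exist. The sketch below does exactly that; the sitemap XML and the `live` set are hypothetical examples standing in for your real sitemap and page inventory.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap for illustration:
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/removed-page</loc></url>
</urlset>"""

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def stale_sitemap_urls(sitemap_xml, live_urls):
    """Return sitemap URLs that are not in the set of live pages."""
    root = ET.fromstring(sitemap_xml)
    listed = [loc.text for loc in root.findall(".//sm:loc", NS)]
    return [url for url in listed if url not in live_urls]

# Hypothetical set of pages that still exist on the site:
live = {"https://example.com/"}
```

Any URL the function returns should be removed from the sitemap (or restored on the site, if it was deleted by mistake).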
Submitted URL Has Crawl Issue
This happens when you submit a URL that has crawl issues not covered elsewhere in this list.
How to fix it: Use the Fetch as Google tool to check for discrepancies between what you and Google see on the page.
Warnings
The difference between warnings and errors is the severity of the issue (errors require more attention). Pages with warnings may or may not be indexed but, to be safe, it’s ideal to resolve the warning so that your page will be indexed.
Indexed, but Blocked by Robots.txt
As stated in the warning, the page was indexed despite being blocked by robots.txt (which can happen when the page is linked to from other pages). A warning was issued because, although Google respects robots.txt, it still wants to make sure you intended to block the page from the search results.
How to fix it: If you really want to keep the page out of search results, robots.txt is not the way to go; instead, use a “noindex” directive, or require authentication so anonymous visitors (and crawlers) can’t access the page.
Excluded
These refer to pages that were found but not indexed. They fall into two major categories: pages that you’ve explicitly told Google to exclude, and pages that you wanted indexed but that Google didn’t find valuable enough.
A page could be excluded because it’s blocked by a “noindex” tag, the page removal tool, robots.txt, or an unauthorized request (401). If you want Google to change the index status of the page, follow the relevant tips under the “Errors” section.
- Crawl anomaly: An unspecified anomaly occurred when Google tried to fetch the URL. It could be caused by your server; the page may be part of a redirect chain; it could be a page that redirects to a page that returns a 404 error; or a page that no longer exists and is returning a 404. Use Fetch as Google to see if it encounters any fetch issues.
- Crawled, currently not indexed: the page was crawled, but Google chose not to index it. It may or may not be indexed in the future. There’s no need to resubmit the page for indexing; Google will get to it.
- Discovered, currently not indexed: the page was found by Google but has not been crawled yet (the website could have been overloaded, so Googlebot wasn’t able to access the page in time and had to move on, or Google may not have found the page important enough to crawl). To fix this, ensure the page receives enough internal links from other pages on your website. External links to that page also help.
- Queued for crawling: Check back in a few days.
- Page removed because of legal complaint: the page was removed from the index in response to a legal complaint.
- Soft 404: the page returned what looks like a soft 404, so it was not indexed.
- Page with redirect: the URL is a redirect, so it was not indexed.
Duplicates and Canonicals
These refer to pages that Google recognizes as duplicate but may or may not have canonical tags. For more in-depth analysis of these kinds of pages, check Google’s official documentation.
Valid URLs
Google also puts out index status messages for valid URLs. You’d think a valid status would be enough, but Google offers some suggestions to make these pages even better.
- Indexed, but not submitted in sitemap: the URL was discovered by Google and indexed, but it did not appear in the sitemap. Google suggests updating the sitemap to include it.
- Indexed, consider marking as canonical: The URL was indexed, but because it has duplicate URLs, it is recommended that it be marked as canonical.
- Indexed and submitted: The ideal situation.
Final Thoughts: A Guide to Common Google Index Status Response Codes & DIY Fixes
One of the best things about Google Search Console is that it gives feedback about your URLs when it crawls your site. Thanks to the new Index Coverage report, you can see Google index status issues at a glance, or get confirmation that you’re on the right track!
How do you use Google Search Console to help with SEO tasks? Tweet at @PathfinderSEO and we’ll share your best insights!
Maddy Osman creates engaging content with SEO best practices for marketing thought leaders and agencies that have their hands full with clients and projects. Learn more about her process and experience on her website, www.The-Blogsmith.com and read her latest articles on Twitter: @MaddyOsman.