3 Tips for Crawling Errors
2024-12-13 · en-j3PyPqV-e1s manual
MARTIN SPLITT: Pay attention to the responses your server gave to Googlebot, especially a high number of 500 responses, fetch errors, timeouts, DNS problems, and other things. [MUSIC PLAYING]

You might have been wondering how Google Search interacts with your website, a process generally referred to as crawling. Let's dive into troubleshooting pages not getting into Google Search from this perspective. If you watched our How Search Works video series or read through our documentation on this topic, you already know that the first stage of getting your pages into Google Search is crawling. But if pages aren't getting into Search, how can you troubleshoot it starting at the crawling stage?

Here's my first tip. It's relatively well known, but still often forgotten: just because you can access a page in your browser doesn't mean that Googlebot can access it. There can be a bunch of reasons for this. Robots.txt might prevent a crawler from accessing a URL, or there might be a firewall or bot protection blocking Googlebot. There might also be networking or routing issues between Google's data centers and your web server, and many more. So opening the URL in the browser isn't a good test. Use the URL Inspection tool in Google Search Console instead, or the Rich Results Test, to see if Googlebot can access a page. It shows you the rendered HTML of that page. If you search for bits of the content in the rendered HTML and can find them there, that's fine, and it's not a crawling problem. Otherwise, something didn't work out.

Tip number two is to use the Crawl Stats report, more specifically the response section in that report, to see how your server responds to crawl requests. Pay attention to the responses your server gave to Googlebot, especially a high number of 500 responses, fetch errors, timeouts, DNS problems, and other things. These errors will sometimes happen transiently, so they go away without any need for intervention.
But if they are pretty frequent or they spike, you might want to investigate further. If your site is particularly large, with millions of pages or so, errors in the 500 range might also slow down crawling. When you spot errors here, like 500 errors or fetch errors, you can check some sample URLs and see if they still produce these errors when you fetch them through the URL Inspection tool in a live test. If Googlebot can now reach these URLs, there is no need to do anything else. But if the problem persists, you can use the URL Inspection tool to find out more and dig deeper.

The last step with regard to crawling issues is an advanced one, and you might need someone from your hosting company or development department to help you with it. Looking at your web server logs is not a basic thing to do, but it is a powerful way to get a better understanding of what's happening on your server. There you can see patterns, the number and timing of requests, and how your web server responded. Be mindful, though, that not everyone who claims to be Googlebot actually is Googlebot, so don't worry about the odd request. It might be coming from a third-party scraper pretending to be Googlebot.

So to sum it up, check the URL Inspection tool and take a look at the Crawl Stats report to find out what's going on with crawling on your website. Also, don't forget that the logs of your web server can be super useful for finding out how your server responded to requests, but be aware that there are many "Googlebots" that aren't actual Googlebots.

Leave us a comment if you want more technical content on Google Search Central and which topics we should cover in the future. Thanks a lot for watching, and see you soon. [MUSIC PLAYING]

Ho, ho, ho. This is a core update. No, no, we're not doing that. No worries. [MUSIC PLAYING]
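
[EDITOR'S SKETCH] The log-checking step from the third tip can be roughed out in a few lines of Python. This is only a minimal sketch under stated assumptions, not something shown in the video. It assumes the server writes the Apache/Nginx "combined" log format. It also assumes a reverse-then-forward DNS check is an acceptable way to tell real Googlebot requests from impostors, with hostname suffixes taken from our understanding of Google's crawler verification guidance. The function names (`claimed_googlebot_hits`, `is_real_googlebot`, `status_summary`) are hypothetical.

```python
import re
import socket
from collections import Counter

# Assumption: Apache/Nginx "combined" log format. Adjust the pattern
# if your server logs in a different layout.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: hostnames of genuine Google crawlers end in one of these.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def claimed_googlebot_hits(lines):
    """Yield (ip, status) for parseable lines whose user agent mentions Googlebot."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "Googlebot" in match.group("agent"):
            yield match.group("ip"), int(match.group("status"))


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the hostname suffix, then confirm
    that the hostname resolves back to the same IP.

    The default forward lookup (gethostbyname_ex) handles IPv4 only.
    """
    try:
        hostname = reverse(ip)[0]
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(hostname)[2]
    except OSError:  # covers socket.herror and socket.gaierror
        return False


def status_summary(lines, verify=is_real_googlebot):
    """Count HTTP status codes separately for verified and unverified
    requests that claim to be Googlebot."""
    verified, unverified = Counter(), Counter()
    verdicts = {}  # cache so each IP is looked up only once
    for ip, status in claimed_googlebot_hits(lines):
        if ip not in verdicts:
            verdicts[ip] = verify(ip)
        (verified if verdicts[ip] else unverified)[status] += 1
    return verified, unverified
```

To try it, something like `verified, unverified = status_summary(open("access.log"))` should work, where `access.log` is a placeholder path. A large share of 5xx codes in `verified` is the pattern the second tip says to watch for, while entries in `unverified` are probably the scrapers mentioned above and can usually be ignored.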