3 Tips for Crawling Errors

2024-12-13 · en-j3PyPqV-e1s manual
MARTIN SPLITT: Pay attention to the responses your server gave
to Googlebot, especially a high number of 500 responses,
fetch errors, timeouts, DNS problems, and other things.
[MUSIC PLAYING]
You might have been wondering how Google Search interacts
with your website, a process generally
referred to as crawling.
Let's dive into troubleshooting pages
not getting into Google Search from this perspective.
If you watched our How Search Works video series
or read through our documentation on this topic,
you already know that the first stage of getting your pages
into Google Search is crawling.
But if pages aren't getting into search, how can
you troubleshoot it starting at the crawling stage?
Here's my first tip.
It's relatively well known, but still
often forgotten, just because you
can access a page in your browser
doesn't mean that Googlebot can access it.
There can be a bunch of reasons for this.
Robots.txt might prevent a crawler from accessing a URL,
or there might be a firewall or bot
protection blocking Googlebot.
There might also be networking or routing
issues between Google's data centers
and your web server, and many more.
So opening the URL in the browser isn't quite a good test.
Use the URL inspection tool in Google Search Console instead
or the rich results test to see if Googlebot can access a page.
It shows you the rendered HTML of that page.
If you search for bits of your content in the rendered HTML
and find them there, that's fine,
and it's not a crawling problem.
Otherwise, something didn't work out.
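
One of the causes mentioned in this first tip, a robots.txt rule, can be roughly checked on your own machine before turning to Search Console. The sketch below uses Python's standard-library `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the helper name `is_allowed` are hypothetical and only for illustration. Python's parser may not match Googlebot's robots.txt handling in every detail, so treat the result as a hint and the URL inspection tool as the real test.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/drafts/post.html", "/blog/post.html"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A rule like this is only one of the possible causes. Firewalls, bot protection, and network issues are invisible to a local check like this one.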
Tip number two is to use the Crawl Stats
report, more specifically the response section in that report,
to see how your server responds to crawl requests.
Pay attention to the responses your server gave to Googlebot,
especially a high number of 500 responses,
fetch errors, timeouts, DNS problems, and other things.
These errors will sometimes happen transiently,
so they go away without any need for intervention.
But if they are pretty frequent or they spike up,
you might want to investigate further.
If your site is particularly large, millions
of pages or more, errors in the 500 range
might also slow down crawling.
When you spot errors here, like 500 errors or fetch errors,
you can check some sample URLs and see if they still
produce these errors when you fetch them
through the URL inspection tool in a live test.
If Googlebot can now reach these URLs,
there is no need to do anything else.
But if the problem persists, you can use the URL inspection tool
to find out more and dig deeper.
The last tip with regard to crawling issues
is an advanced one, and you might need someone
from your hosting company or development department
to help you with this.
Looking at your web server logs is not a basic thing to do,
but it is a powerful way to get a better understanding of what's
happening on your server.
There you can see patterns, the number and timing
of requests, and how your web server responded.
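
As a rough illustration of what looking at the logs can mean in practice, the sketch below counts response classes for requests whose user agent mentions Googlebot. It assumes the common "combined" access log layout, which your server may not use. The sample lines, IP addresses, and the function name `googlebot_status_counts` are made up for the example.

```python
import re
from collections import Counter

# Assumes the "combined" access log layout; adjust the pattern if your
# server writes its logs differently.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Hypothetical log lines using documentation-only IP ranges.
SAMPLE_LINES = [
    f'192.0.2.10 - - [13/Dec/2024:10:00:01 +0000] "GET / HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [13/Dec/2024:10:00:05 +0000] "GET /shop HTTP/1.1" 500 312 "-" "{GOOGLEBOT_UA}"',
    '198.51.100.7 - - [13/Dec/2024:10:00:09 +0000] "GET /shop HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    f'192.0.2.10 - - [13/Dec/2024:10:00:12 +0000] "GET /cart HTTP/1.1" 503 0 "-" "{GOOGLEBOT_UA}"',
]


def googlebot_status_counts(lines):
    """Count status classes (2xx, 4xx, 5xx, ...) for self-declared Googlebot requests."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Lines that do not fit the assumed layout are skipped silently.
        if match and "Googlebot" in match.group("agent"):
            counts[match.group("status")[0] + "xx"] += 1
    return counts


if __name__ == "__main__":
    for status_class, count in sorted(googlebot_status_counts(SAMPLE_LINES).items()):
        print(status_class, count)
```

The filter only looks at the user agent string, which anyone can set, so these counts include requests that merely claim to be Googlebot.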
Be mindful, though, that not everyone who
claims to be Googlebot actually is Googlebot, so don't
worry about the odd requests.
They might be coming from some third-party scrapers that
pretend to be Googlebot.
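
If you do want to check whether a request in your logs really came from Googlebot, one commonly described approach is a reverse DNS lookup on the IP address followed by a forward lookup on the returned hostname. The sketch below is a minimal version of that idea, not an official implementation. The hostname suffixes in `GOOGLE_SUFFIXES` are an assumption to confirm against Google's current documentation, and the function names are hypothetical.

```python
import socket

# Assumption: hostname suffixes used by Google's crawlers. Confirm the
# current list in Google's documentation before relying on it.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the reverse DNS hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def looks_like_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Reverse-then-forward DNS check; resolvers are injectable for testing."""
    hostname = reverse(ip)
    if not hostname:
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # The forward lookup must lead back to the original address.
    return ip in forward(hostname)
```

Calling `looks_like_googlebot("192.0.2.10")` with the default resolvers performs real DNS queries, so run it on a handful of suspicious IPs, not on every log line.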
So to sum it up, check the URL inspection tool
and take a look at the crawl stats report
to find out what's going on with crawling on your website.
Also, don't forget that the logs of your web server
can be super useful to find out how your server
responded to requests, but be aware
that many crawlers claiming to be Googlebot aren't actually Googlebot.
Leave us a comment if you want more technical content on Google
Search Central, and tell us what topics we should cover in the future.
Thanks a lot for watching, and see you soon.
[MUSIC PLAYING]
Ho, ho, ho.
This is a core update.
No, no, we're not doing that.
No worries.
[MUSIC PLAYING]