Crawling is the process by which search engine bots or spiders scan your website's web pages to determine their content and relevance.
These crawlers follow links on your website to find new pages to crawl and index.
Why is crawling important?
Crawling is essential because it enables search engines to understand your website's content and relevance. If a web page isn't crawled, search engines can't read its content, so it is unlikely to appear in search engine results pages, making it difficult for potential customers to find you.
“[Our crawlers] go from page to page and store information about what they find on these pages and other publicly accessible content in Google's Search index.”—Google
How does crawling work?
“Google’s crawlers start at your homepage. They then follow every single link on your homepage, effectively clicking on them. From each of those pages, they open every link again, and so on until they’ve visited every page they can.”—Ben Goodey, SEO Consultant
Google’s algorithm determines how frequently the search engine crawls your website, how many pages it crawls at a time, and how it prioritizes which pages to crawl.
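To make the link-following process concrete, here is a minimal sketch of a breadth-first crawl over a tiny in-memory site. The page paths and the SITE_LINKS structure are hypothetical, and real crawlers such as Googlebot use far more sophisticated scheduling and prioritization than this.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/pricing", "/blog"],
    "/pricing": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/old-landing-page": ["/"],  # no page links here
}


def crawl(links, start="/"):
    """Return pages in the order a breadth-first crawler would discover them."""
    discovered = [start]
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # visit each page only once
                seen.add(target)
                discovered.append(target)
                queue.append(target)
    return discovered


crawl_order = crawl(SITE_LINKS)
print(crawl_order)
# ['/', '/pricing', '/blog', '/blog/post-1', '/blog/post-2']
```

Note that "/old-landing-page" never shows up in the result. Nothing links to it, so a crawler that only follows links from the homepage has no way to reach it.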
Ensure your page is crawled by:
- Submitting it to Google via Google Search Console
- Adding links from your existing content (ensuring you avoid orphan pages)
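If you have an export of your internal links, spotting orphan pages is straightforward. The sketch below assumes a hypothetical site_links dictionary that maps each page to the pages it links to; in practice you would build this from a site crawl or your CMS.

```python
def find_orphan_pages(links, homepage="/"):
    """Return pages that no other page links to (the homepage is exempt)."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked_to.add(target)
    return sorted(
        page for page in links
        if page not in linked_to and page != homepage
    )


# Hypothetical internal link map.
site_links = {
    "/": ["/features", "/blog"],
    "/features": ["/"],
    "/blog": ["/blog/crawling-guide"],
    "/blog/crawling-guide": ["/features"],
    "/blog/unlinked-draft": ["/blog"],
}

orphans = find_orphan_pages(site_links)
print(orphans)  # ['/blog/unlinked-draft']
```

Any page this returns needs at least one link from existing content before crawlers can reach it by following links.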
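Keep a page out of Google by:
- Adding a noindex meta tag to the page's HTML (Google can still crawl the page, but won't index it)
- Updating your robots.txt file to disallow the URL (Google won't crawl the page, although the URL can still be indexed if other pages link to it)
Don't combine the two on the same URL: if robots.txt blocks the page, Google can't crawl it and never sees the noindex tag.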
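To check whether a page actually carries the noindex directive, you can parse its HTML and look for a robots meta tag. This is a minimal sketch using Python's standard library; the two sample pages are made up, and it only inspects the generic "robots" meta tag, not crawler-specific tags or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages for illustration.
page_hidden = '<html><head><meta name="robots" content="noindex, follow"></head><body>Thanks!</body></html>'
page_visible = "<html><head><title>Pricing</title></head><body>Plans</body></html>"

print(has_noindex(page_hidden))   # True
print(has_noindex(page_visible))  # False
```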
What do most people get wrong about crawling in SEO?
Here are five things you need to know about crawling (that most people don’t know when starting out):
- Crawling is not the same as indexing
Crawling is the process of a search engine sending out bots, also known as spiders or crawlers, to discover and scan web pages. Indexing is the process of adding those pages to the search engine's database. Just because a page has been crawled doesn't mean it has been indexed.
- Crawling doesn't guarantee ranking
Just because a page has been crawled and indexed doesn't mean it will rank well in search engine results. There are many factors that influence ranking, including the content and structure of the page, the authority of the website, and the relevance of the page to the search query.
- Robots.txt can block crawling
If a website owner doesn't want certain pages to be crawled, they can use a robots.txt file to block search engine bots from accessing them. However, it's important to use robots.txt carefully and only block pages that you don't want crawled. Blocking a page in robots.txt doesn't guarantee it stays out of the index; use a noindex tag for that.
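Before publishing robots.txt changes, it helps to test which URLs the rules block. Here is a small sketch using Python's built-in urllib.robotparser; the rules and the example.com URLs are hypothetical. Keep in mind that this parser does simple prefix matching and may not treat wildcard patterns the way Google does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules above let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(is_crawlable("https://example.com/blog/crawling"))   # True
print(is_crawlable("https://example.com/admin/settings"))  # False
```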
- Broken links can hurt crawling
If a website has broken links, it can make it more difficult for search engine bots to crawl all the pages on the site. This can lead to some pages being left out of the index, hurting the site's overall visibility.
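As a rough illustration, you can find broken internal links by comparing every link target against the pages that actually exist. The existing_pages set and links dictionary below are hypothetical stand-ins for data you would get from a site crawl.

```python
def find_broken_links(links, existing_pages):
    """Return (source, target) pairs where the target page does not exist."""
    broken = []
    for source, targets in sorted(links.items()):
        for target in targets:
            if target not in existing_pages:
                broken.append((source, target))
    return broken


# Hypothetical site data.
existing_pages = {"/", "/blog", "/blog/crawling-guide"}
links = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/crawling-guide", "/blog/deleted-post"],
    "/blog/crawling-guide": ["/"],
}

broken_links = find_broken_links(links, existing_pages)
print(broken_links)
# [('/', '/pricing'), ('/blog', '/blog/deleted-post')]
```

Each pair tells you which page to edit and which link to fix or remove.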
- Regular updates can help crawling
Regularly updating a website with fresh content can help search engine bots find and crawl new pages. This can also signal to search engines that the website is active and relevant, which can improve its overall visibility in search results.
Tips for optimizing crawling
“Internal linking is the best way to ensure your pages are regularly crawled by Google. The more internal links a page has, the more you tell Google ‘this page is important. Assess this page more often’. It goes a long way to improve a page's ranking.”—Ben Goodey, SEO Consultant
Now that you understand what crawling is and why it's important, let's look at some tips for optimizing crawling on your website:
- Create a sitemap: A sitemap is a file that lists all the pages on your website, making it easier for search engine crawlers to find and index your pages.
- Optimize your website structure: Ensure that your website structure is well-organized and easy to navigate. This will help search engine crawlers find and crawl all the pages on your website.
- Use internal linking: Linking between your website pages makes it easier for search engine crawlers to discover and index new pages on your website.
- Monitor your crawl rate: Use the Crawl Stats report in Google Search Console to see how often Google crawls your website and to identify any crawl errors that may be hindering your website's visibility.
- Use title tags and meta descriptions: Give each page a unique title tag and meta description to provide search engines with more information about the page.
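For the sitemap tip above, here is a minimal sketch that builds a basic XML sitemap with Python's standard library. The example.com URLs are placeholders, and the output only includes the required loc element; many CMS platforms and SEO plugins generate a sitemap for you, so treat this as an illustration of the format rather than a recommended workflow.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs for illustration.
sitemap_xml = build_sitemap([
    "https://example.com/",
    "https://example.com/blog/crawling-guide",
])
print(sitemap_xml)
```

You would typically save the result as sitemap.xml at the root of your site and submit it in Google Search Console.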
Related guide: Internal Linking Case Study (How Typeform Ranked #2 For "Form Builder")
FAQs for crawling in SEO
How often do search engine bots crawl my website?
A: It varies depending on several factors, including the size of your website and the frequency of updates. You can monitor your website's crawl rate using Google Search Console.
What are crawl errors?
A: Crawl errors occur when search engine bots encounter problems crawling your website's pages. They can include broken links, server errors, and other issues.
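As a rough sketch of how these errors can be grouped, the function below sorts HTTP status codes into coarse categories. The category names and the sample responses are assumptions made for illustration; they are not the labels Google Search Console uses.

```python
def classify_crawl_result(status_code):
    """Group an HTTP status code into a coarse crawl outcome."""
    if 200 <= status_code < 300:
        return "ok"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client error"  # e.g. 404 from a broken link
    if 500 <= status_code < 600:
        return "server error"
    return "unknown"


# Hypothetical responses a crawler might receive.
responses = {"/": 200, "/blog-old": 301, "/old-page": 404, "/checkout": 500}

crawl_errors = {
    url: classify_crawl_result(code)
    for url, code in responses.items()
    if classify_crawl_result(code) in ("client error", "server error")
}
print(crawl_errors)
# {'/old-page': 'client error', '/checkout': 'server error'}
```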
Can I block search engine bots from crawling my website?
A: Yes, you can use a robots.txt file to block search engine bots from crawling specific pages or sections of your website.
How long does it take for search engine bots to crawl my website?
A: It varies depending on the size of your website and the crawl rate set by the search engine. However, it typically takes a few days to a few weeks for search engine bots to crawl all the pages on your website.