What crawl budget actually is
Crawl budget is the rate at which Googlebot is willing to fetch URLs from your site within a given period. It is shaped by two underlying numbers Google calculates separately. The first is the crawl capacity limit: how many simultaneous connections your server can handle without slowing down. The second is the crawl demand: how much Google actually wants to crawl, based on the popularity of your URLs and how often the content changes.
Together, those two determine your effective crawl budget: the set of URLs Googlebot both can and wants to crawl. If your server is fast and your content updates often, the budget is generous. If your server is slow or your content rarely changes, the budget is leaner. Google does not publish an exact number. You see the approximate shape of it in Search Console's Crawl Stats report.
What crawl budget is not: a ranking factor. Google has been clear about this. Crawling a page more often does not improve its ranking. The link between crawling and ranking is one of opportunity, not influence. A page that does not get crawled cannot get indexed. A page that does not get indexed cannot rank. But beyond the binary "crawled or not", crawl frequency has no effect on where a page ranks. It only affects how quickly Google picks up your edits.
When crawl budget actually matters
For most Australian businesses, the honest answer is: it does not. Google's own guidance is that sites with fewer than a few thousand URLs are usually crawled efficiently, and its crawl-budget documentation is written for sites with roughly 10,000 or more frequently changing pages. Below that threshold, Google can usually crawl your entire site without trying.
It starts mattering above 10,000 URLs, and the threshold for "this is your dominant SEO problem" is somewhere around 100,000 URLs. The kinds of sites where it bites:
- E-commerce with faceted navigation. Every filter combination (size x colour x brand x price) generates a URL. A 500-product catalogue can balloon to 200,000 unique parameter URLs if nothing controls the filter combinations.
- Large publisher archives with paginated category pages, tag pages, author pages, date archives and search-results pages all indexable.
- Job boards, real-estate listings and marketplace sites where each listing has its own URL and there are tens of thousands of listings.
- Mining-services catalogues and industrial-supply sites with thousands of SKUs and variants.
- WordPress sites where Yoast or Rank Math is generating indexable URLs for every tag, every category, every author and every attachment.
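The faceted-navigation arithmetic in the first item above is easy to check for your own catalogue. A minimal sketch, assuming every facet is optional and every combination is linkable. The facet counts and the number of category pages are made-up figures, not data from a real store.

```python
from math import prod


def facet_url_count(facets: dict[str, int], category_pages: int) -> int:
    """Count crawlable URLs when every filter combination gets its own URL.

    Each facet contributes (options + 1) states: one per option,
    plus one for "filter not applied".
    """
    combos_per_category = prod(options + 1 for options in facets.values())
    return category_pages * combos_per_category


# Hypothetical store: 8 sizes, 12 colours, 15 brands, 5 price bands,
# spread across 10 category pages.
facets = {"size": 8, "colour": 12, "brand": 15, "price": 5}
total = facet_url_count(facets, category_pages=10)
print(total)  # 112320
```

Adding a "sort by" parameter with four options would multiply that figure by five again, which is why a small catalogue can still produce a six-figure URL count.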
If you are a Perth tradie with 30 pages, or a Mandurah accountant with 80, or a Fremantle cafe with a 12-page site, crawl budget is not your problem. The chapter you should read first is the technical audit, and then probably internal linking. Crawl budget is the wrong battle for you.
How Google sets your budget
The crawl capacity limit moves up and down based on how your server responds. Google's own documentation puts it this way: it crawls more when your server is fast and healthy, and backs off when latency rises or 5xx errors appear. The mechanism is deliberately conservative because Google does not want to take your site down with its own crawler.
The practical implications. If your server typically responds in 200ms with 200 status codes, Googlebot will crawl aggressively. If response times creep up to 2 seconds or you start returning a lot of 5xx errors during peak traffic, Googlebot will back off and your effective crawl budget drops. We have seen this happen to Australian e-commerce sites during Black Friday: traffic spikes, response time degrades, Googlebot slows down, and the post-sale catalogue updates take an extra week to get re-crawled.
Crawl demand is the other side. Google estimates how often your URLs change and how popular they are. Frequently updated and frequently linked URLs get crawled often. Stale, never-linked URLs might get visited once a month or once a quarter. This is why orphan pages (pages with no internal links pointing in) often sit uncrawled for months: Google's demand model has marked them as low priority.
Two side effects of how this works that matter for fixes. First, speeding up your site genuinely does increase crawl budget. The same Googlebot resource budget gets you more URLs per minute. Second, building strong internal links to your highest-value pages raises their crawl demand, which means Google re-crawls them faster after edits. Both are levers worth pulling.
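The first lever is simple arithmetic. A rough sketch, assuming Googlebot holds a fixed number of parallel connections and fetches back to back on each one. That model is a simplification, and the connection count is a hypothetical figure because Google does not publish one.

```python
def fetches_per_minute(parallel_connections: int, avg_response_ms: int) -> int:
    """Upper bound on fetches per minute if each connection fetches back to back."""
    return parallel_connections * 60_000 // avg_response_ms


CONNECTIONS = 2  # hypothetical; not a published Google figure

healthy = fetches_per_minute(CONNECTIONS, avg_response_ms=200)
degraded = fetches_per_minute(CONNECTIONS, avg_response_ms=2000)
print(healthy, degraded)  # 600 60
```

In this model a tenfold slowdown in response time cuts the ceiling tenfold, before Googlebot has even started backing off.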
Four ways to stop wasting crawl budget
If you genuinely do have a crawl-budget problem, the fix is almost never to ask Google to crawl more. Google decides that. The fix is to stop wasting what Google already gives you. Four levers:
1. Block the URLs you do not want crawled
Search filters, parameter URLs, internal search results, "sort by price" variants, calendar archives, paginated comment threads. Each of these is a URL Googlebot will visit if it can find it. Block them in robots.txt if you definitely do not want them crawled. Do not look for a parameter setting in Search Console: Google retired the URL Parameters tool in 2022 and did not replace it. Parameter URLs are now handled with robots.txt rules, canonical tags and consistent internal linking.
Beware: robots.txt blocks crawl, not indexing. If a parameter URL has external links, it can still appear in the index as a "URL only" result. To remove from the index, use a meta robots noindex tag, which means Googlebot has to be allowed to crawl the URL to see the tag. The two tools serve different purposes.
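Before shipping a robots.txt change, it helps to test which URLs the rules would actually block. A minimal sketch of Google-style pattern matching ("*" as a wildcard, "$" as an end anchor, longest matching rule wins, ties go to allow). It is an illustration, not Google's parser, and the rules and paths are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """rules is a list of ("allow" | "disallow", pattern) pairs.

    The longest matching pattern wins; on a tie, allow wins.
    A path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for kind, pattern in rules:
        if pattern and pattern_to_regex(pattern).match(path):
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len, allowed = length, kind == "allow"
    return allowed


# Hypothetical rules for a store with internal search and filter parameters.
RULES = [
    ("disallow", "/search"),
    ("disallow", "/*?*sort="),
    ("disallow", "/*?*colour="),
]

for path in ["/products/blue-kettle", "/search?q=kettle",
             "/kettles?sort=price", "/kettles?page=2&colour=blue"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Remember that "blocked" here only means not crawled. A blocked URL that needs to leave the index still needs noindex, and Googlebot has to be able to crawl it to see that.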
2. Fix the redirects
Every internal link that points to a redirect costs you a crawl. Googlebot has to fetch the redirect, follow it, then fetch the destination. Two fetches for one page. If your site has internal links to 4,000 redirected URLs, that is 4,000 extra crawls per cycle going nowhere. Update the internal links to point directly at the destination. Full guide in 301 vs 302 redirects.
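If you can export your internal links and your redirect rules from a crawler, finding the links to fix is a small lookup job. A sketch under that assumption: the redirect map and link list below are hypothetical, and the function names are ours, not part of any tool.

```python
def resolve(url: str, redirects: dict[str, str], max_hops: int = 10) -> tuple[str, int]:
    """Follow a redirect map to its end. Returns (final_url, hops_taken)."""
    hops = 0
    seen = {url}
    while url in redirects and hops < max_hops:
        url = redirects[url]
        hops += 1
        if url in seen:  # redirect loop: stop following
            break
        seen.add(url)
    return url, hops


def links_to_fix(internal_links: list[str], redirects: dict[str, str]) -> dict[str, str]:
    """Map every internal link that hits a redirect to its final destination."""
    fixes = {}
    for link in internal_links:
        final, hops = resolve(link, redirects)
        if hops > 0:
            fixes[link] = final
    return fixes


# Hypothetical data: a two-hop chain and two clean links.
redirects = {"/old-kettles": "/kettles-2023", "/kettles-2023": "/kettles"}
internal_links = ["/old-kettles", "/kettles", "/about"]

print(links_to_fix(internal_links, redirects))  # {'/old-kettles': '/kettles'}
```

Resolving to the end of the chain matters. Pointing the link at the middle hop still costs an extra fetch.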
3. Clean the sitemap
Your XML sitemap is the list of URLs you are explicitly asking Google to crawl. If it includes thank-you pages, draft pages, redirected URLs and parameter URLs, every one of those is a wasted crawl request when Google honours your list. The rule: only canonical, indexable, 200-OK URLs in the sitemap. Nothing else.
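The sitemap rule translates directly into a filter over crawl data. A minimal sketch, assuming you have the status code, canonical target and noindex flag for each URL from a site crawl. The Page record and the example URLs are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    status: int      # HTTP status returned when the URL was crawled
    canonical: str   # URL named in the page's rel=canonical tag
    noindex: bool    # True if meta robots or X-Robots-Tag says noindex


def belongs_in_sitemap(page: Page) -> bool:
    """Keep only 200-OK, self-canonical, indexable URLs."""
    return page.status == 200 and page.canonical == page.url and not page.noindex


# Hypothetical crawl export.
pages = [
    Page("/kettles", 200, "/kettles", False),
    Page("/kettles?sort=price", 200, "/kettles", False),  # canonicalised elsewhere
    Page("/old-kettles", 301, "", False),                 # redirect
    Page("/thank-you", 200, "/thank-you", True),          # noindex
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)  # ['/kettles']
```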
4. Speed up the server
Faster responses raise the crawl capacity limit. Under-provisioned infrastructure, slow database queries, heavy server-side rendering and weak caching all raise the average response time and lower your budget. The work overlaps with Core Web Vitals and is covered in our speed optimisation service when it gets serious.
Good practice
- Check Crawl Stats in Search Console quarterly. Note any sudden drops in pages crawled per day.
- Only put URLs you want indexed into the XML sitemap.
- Block faceted navigation in robots.txt if you have an e-commerce site with thousands of filter combinations.
- Update internal links to skip the redirect hop.
- Use log file analysis on large sites to see exactly which URLs Googlebot is wasting time on.
Common mistakes
- Worrying about crawl budget on a 50-page site. It is not your problem.
- Asking Google to "crawl more". That is not how the lever works.
- Blocking URLs in robots.txt and assuming they will fall out of the index. They might not.
- Indexing internal search-results pages. Almost never the right call.
- Letting WordPress generate indexable URLs for every tag, every attachment and every paginated archive. Audit your SEO plugin defaults.
Tools and checklists
- Search Console Crawl Stats. Free. Settings menu inside Search Console. Shows total crawl requests per day, average response time, and host status. The first place to look.
- Screaming Frog. Free up to 500 URLs. Lets you see how many indexable URLs exist on the site. If the crawl count is wildly larger than the sitemap, you have a parameter or duplication problem.
- Server logs. The honest record of every Googlebot fetch. The chapter on log file analysis for SEO walks through how to read them.
- Our free SEO audit tool. Pulls Lighthouse and indexability data for any URL. A useful first pass to see whether your site has the broader hygiene issues that lead to crawl waste. Run a free audit.
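For the server-log option, a first pass does not need a dedicated tool. A minimal sketch that counts Googlebot requests by URL type and status, assuming your logs use the common "combined" access-log format. The sample lines are made up. It matches on the user-agent string only, which can be spoofed, so verify real Googlebot traffic with a reverse DNS lookup before acting on the numbers.

```python
import re
from collections import Counter

# Matches the request, status and final quoted field (user agent) of a combined-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)


def googlebot_hits(lines) -> Counter:
    """Count Googlebot fetches, bucketed as (parameter | clean, status code)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if not match or "Googlebot" not in match.group("agent"):
            continue
        bucket = "parameter" if "?" in match.group("path") else "clean"
        counts[(bucket, match.group("status"))] += 1
    return counts


# Made-up sample lines for illustration.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [10/Nov/2024:06:25:13 +0800] "GET /kettles?sort=price HTTP/1.1" 200 5123 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Nov/2024:06:25:14 +0800] "GET /kettles HTTP/1.1" 200 5301 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [10/Nov/2024:06:25:15 +0800] "GET /kettles HTTP/1.1" 200 5301 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = googlebot_hits(sample)
for key, count in sorted(hits.items()):
    print(key, count)
```

If the "parameter" bucket dwarfs the "clean" one, that is the crawl waste the four levers above are meant to remove.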
Perth and WA context
The biggest crawl-budget problems we see in WA sit in three places.
First, Perth-based e-commerce running on Shopify, BigCommerce or WooCommerce with faceted navigation enabled by default. A boutique homewares store with 400 products can easily generate 30,000 parameter URLs through colour, size, price and "sort by" combinations. None of those parameter URLs deserve to be indexed. All of them get crawled until someone tells Googlebot to stop.
Second, WA mining-services catalogues and industrial-supply sites running on legacy CMS platforms. SKUs in the thousands, variant URLs for every product line, and a sitemap that lists everything. The mining SEO industry page covers the pattern. The Karratha and Port Hedland client work we have done usually starts with a brutal sitemap cleanup.
Third, large real-estate and property sites serving Perth metro. Every listing is a URL, every suburb is a URL, every price band is a URL, every property type is a URL. The combinations multiply and the canonical strategy decides whether Google sees a coherent catalogue or a soup of duplicates.
One pattern crosses all three. The site was set up by a developer or platform whose defaults expose every URL to crawlers, which does not suit your specific catalogue. Every audit finds the same handful of robots.txt and canonical fixes. None of them take more than a day to ship. All of them lift indexation of the URLs that actually matter.
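The canonical side of those fixes usually comes down to one decision: which parameters change the content and which only re-sort or re-slice it. A sketch of that decision as code, using Python's standard urllib.parse. The parameter names and the example.com URLs are hypothetical, so swap in the ones your platform generates.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that re-order or track, but do not change what is listed.
NON_CANONICAL_PARAMS = {"sort", "view", "utm_source", "utm_medium"}


def canonical_url(url: str) -> str:
    """Drop non-canonical parameters and put the rest in a stable order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


variants = [
    "https://example.com/perth/houses?sort=price&beds=3",
    "https://example.com/perth/houses?beds=3&view=map",
    "https://example.com/perth/houses?beds=3",
]
print({canonical_url(u) for u in variants})
# {'https://example.com/perth/houses?beds=3'}
```

Every variant that collapses to the same canonical URL is one page in Google's eyes rather than three, provided the rel=canonical tag on each variant points at that URL.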
Related guides
- Back to the Technical SEO pillar for the 12-chapter index.
- Robots.txt and meta robots. The primary tool for blocking crawl waste.
- XML sitemaps explained. The other side of the crawl-direction coin.
- Canonical tags explained. How to handle parameter URLs without blocking them.
- Log file analysis for SEO. The advanced diagnostic that shows exactly where the waste is.
- 301 vs 302 redirects. Redirect hygiene as a crawl-budget lever.
- Crawling, indexing and ranking. The conceptual prequel.
- Internal linking strategy. Internal links shape crawl demand.