Technical SEO · Advanced · 10 min read

Crawl budget explained. The metric most Perth sites do not need to worry about.

Crawl budget is the most over-discussed and under-understood metric in SEO. Most sites do not have a problem with it. The ones that do are usually wasting it on URLs nobody wanted indexed. Here is the honest version.

By Oliver Wood · Founder & Managing Director

Reviewed by Harrison Sharrett · 20 May 2026 · Last reviewed 20 May 2026

What crawl budget actually is

Crawl budget is the rate at which Googlebot is willing to fetch URLs from your site within a given period. It is shaped by two underlying numbers Google calculates separately. The first is the crawl capacity limit: how many simultaneous connections your server can handle without slowing down. The second is the crawl demand: how much Google actually wants to crawl, based on the popularity of your URLs and how often the content changes.

Taken together, those two determine your effective crawl budget: the set of URLs Googlebot both can and wants to crawl. If your server is fast and your content updates often, the budget is generous. If your server is slow or your content rarely changes, the budget is leaner. Google does not publish an exact number. You see the approximate shape of it in Search Console's Crawl Stats report.

What crawl budget is not: a ranking factor. Google has been clear about this. Crawling a page more often does not improve its ranking. The link between crawling and ranking is one of opportunity, not influence. A page that does not get crawled cannot get indexed. A page that does not get indexed cannot rank. But beyond the binary "crawled or not", the frequency does not matter.

When crawl budget actually matters

For most Australian businesses, the honest answer is: it does not. Google's own crawl-budget guidance is aimed at sites with roughly 10,000 or more URLs that change daily, or around a million that change weekly. Below that scale, sites almost never have a crawl-budget bottleneck, and Google can usually crawl the entire site every few days without trying.

It starts mattering above 10,000 URLs, and the threshold for "this is your dominant SEO problem" is somewhere around 100,000 URLs. The kinds of sites where it bites:

E-commerce with faceted navigation. Every filter combination (size x colour x brand x price) generates a URL. A 500-product catalogue can balloon to 200,000 unique parameter URLs if nothing controls the filter combinations.
Large publisher archives with paginated category pages, tag pages, author pages, date archives and search-results pages all indexable.
Job boards, real-estate listings and marketplace sites where each listing has its own URL and there are tens of thousands of listings.
Mining-services catalogues and industrial-supply sites with thousands of SKUs and variants.
WordPress sites where Yoast or Rank Math is generating indexable URLs for every tag, every category, every author and every attachment.
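
To make the faceted-navigation arithmetic concrete, here is a rough sketch of how filter combinations multiply. The facet sizes and the number of listing pages are hypothetical, picked only to show how a small catalogue lands in the same ballpark as the 200,000 figure above. It assumes each facet is either unset or set to exactly one value.

```python
from math import prod

def filter_url_count(facet_options: dict[str, int], listing_pages: int = 1) -> int:
    """Count distinct filtered URLs when each facet is unset or set to one value."""
    # Each facet contributes (options + 1) states: one per value, plus "not applied".
    states = prod(n + 1 for n in facet_options.values())
    # Drop the single state where no facet is applied (the clean listing URL).
    return (states - 1) * listing_pages

# Hypothetical facet sizes, not taken from any real catalogue.
facets = {"size": 6, "colour": 10, "brand": 20, "price": 5}
print(filter_url_count(facets, listing_pages=20))  # 194020
```

Add a "sort by" parameter or allow multi-select filters and the count climbs further, which is why the product count alone tells you very little about the URL count.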

If you are a Perth tradie with 30 pages, or a Mandurah accountant with 80, or a Fremantle cafe with a 12-page site, crawl budget is not your problem. The chapter you should read first is the technical audit, and then probably internal linking. Crawl budget is the wrong battle for you.

How Google sets your budget

The crawl capacity limit moves up and down based on how your server responds. Google's own documentation puts it this way: it crawls more when your server is fast and healthy, and backs off when latency rises or 5xx errors appear. The mechanism is deliberately conservative because Google does not want to take your site down with its own crawler.

The practical implications. If your server typically responds in 200ms with 200 status codes, Googlebot will crawl aggressively. If response times creep up to 2 seconds or you start returning a lot of 5xx errors during peak traffic, Googlebot will back off and your effective crawl budget drops. We have seen this happen to Australian e-commerce sites during Black Friday: traffic spikes, response time degrades, Googlebot slows down, and the post-sale catalogue updates take an extra week to get re-crawled.

Crawl demand is the other side. Google estimates how often your URLs change and how popular they are. Frequently updated and frequently linked URLs get crawled often. Stale, never-linked URLs might get visited once a month or once a quarter. This is why orphan pages (pages with no internal links pointing in) often sit uncrawled for months: Google's demand model has marked them as low priority.

Two side effects of how this works that matter for fixes. First, speeding up your site genuinely does increase crawl budget. The same Googlebot resource budget gets you more URLs per minute. Second, building strong internal links to your highest-value pages raises their crawl demand, which means Google re-crawls them faster after edits. Both are levers worth pulling.
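
A back-of-envelope way to see the first lever. This is a toy model, not Google's formula: it assumes a fixed number of parallel connections (a made-up figure here) that are always kept busy, and uses the 200ms and 2 second response times mentioned earlier.

```python
def fetches_per_minute(parallel_connections: int, avg_response_seconds: float) -> float:
    """Upper bound on fetches per minute if every connection is kept busy."""
    if avg_response_seconds <= 0:
        raise ValueError("avg_response_seconds must be positive")
    return parallel_connections * 60 / avg_response_seconds

# Two parallel connections is an assumption for illustration only.
fast = fetches_per_minute(parallel_connections=2, avg_response_seconds=0.2)
slow = fetches_per_minute(parallel_connections=2, avg_response_seconds=2.0)
print(f"fast server: {fast:.0f}/min, slow server: {slow:.0f}/min")
```

In practice the gap is usually wider than this model suggests, because Googlebot also backs off when latency rises rather than holding its connection count steady.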

Four ways to stop wasting crawl budget

If you genuinely do have a crawl-budget problem, the fix is almost never to ask Google to crawl more. Google decides that. The fix is to stop wasting what Google already gives you. Four levers:

1. Block the URLs you do not want crawled

Search filters, parameter URLs, internal search results, "sort by price" variants, calendar archives, paginated comment threads. Each of these is a URL Googlebot will visit if it can find it. Block them in robots.txt if you definitely do not want them crawled. Note that Search Console's old URL Parameters tool was retired in 2022 and has no direct replacement, so parameter URLs now have to be controlled on your own site with robots.txt rules, canonical tags and tidy internal linking.

Beware: robots.txt blocks crawl, not indexing. If a parameter URL has external links, it can still appear in the index as a "URL only" result. To remove from the index, use a meta robots noindex tag, which means Googlebot has to be allowed to crawl the URL to see the tag. The two tools serve different purposes.
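
Before shipping robots.txt rules, it helps to test them against real paths. The sketch below is a simplified matcher, not Google's parser: it handles the `*` wildcard and a trailing `$`, picks the longest matching rule, and lets allow win a tie. The rules and paths are hypothetical examples for a shop with faceted navigation.

```python
import re

def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern ('*' wildcard, optional trailing '$') to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Longest matching rule wins; on a tie in length, 'allow' wins."""
    best_kind, best_pattern = "allow", ""  # unmatched paths are crawlable
    for kind, pattern in rules:
        if pattern and rule_to_regex(pattern).match(path):
            longer = len(pattern) > len(best_pattern)
            tie_for_allow = len(pattern) == len(best_pattern) and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_pattern = kind, pattern
    return best_kind == "allow"

# Hypothetical rules, equivalent to these robots.txt lines:
#   Disallow: /search
#   Disallow: /*?*sort=
#   Disallow: /*?*colour=
RULES = [
    ("disallow", "/search"),
    ("disallow", "/*?*sort="),
    ("disallow", "/*?*colour="),
]

for path in ["/products/chairs", "/products/chairs?sort=price", "/search?q=lamp"]:
    print(path, "->", "crawl" if is_allowed(path, RULES) else "blocked")
```

Remember what this does and does not prove. A "blocked" result means Googlebot will not fetch the URL. It says nothing about whether the URL drops out of the index.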

2. Fix the redirects

Every internal link that points to a redirect costs you a crawl. Googlebot has to fetch the redirect, follow it, then fetch the destination. Two fetches for one page. If your site has internal links to 4,000 redirected URLs, that is 4,000 extra crawls per cycle going nowhere. Update the internal links to point directly at the destination. Full guide in 301 vs 302 redirects.
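
Once you have a crawl export, the fix list is mechanical. A minimal sketch, assuming you already have a map of redirecting URLs and a list of internal links as (source page, link target) pairs. Both data structures and all paths are hypothetical, not the output format of any particular crawler.

```python
def final_destination(url: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow a redirect map until reaching a URL that no longer redirects."""
    seen = {url}
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
        if url in seen:
            raise ValueError(f"redirect loop at {url}")
        seen.add(url)
    raise ValueError("too many redirect hops")

def links_to_update(internal_links, redirects):
    """Return (source_page, old_target, new_target) for links that hit a redirect."""
    return [
        (page, target, final_destination(target, redirects))
        for page, target in internal_links
        if target in redirects
    ]

# Hypothetical sample data, including a two-hop redirect chain.
redirects = {"/old-services": "/services-2023", "/services-2023": "/services"}
links = [("/", "/old-services"), ("/about", "/services"), ("/blog/post", "/services-2023")]

for page, old, new in links_to_update(links, redirects):
    print(f"on {page}: change {old} to {new}")
```

Resolving to the final destination matters because chains are common: pointing a link at the middle of a chain still leaves a wasted hop.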

3. Clean the sitemap

Your XML sitemap is the list of URLs you are explicitly asking Google to crawl. If it includes thank-you pages, draft pages, redirected URLs and parameter URLs, every one of those is a wasted crawl request when Google honours your list. The rule: only canonical, indexable, 200-OK URLs in the sitemap. Nothing else.
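
The rule is simple enough to enforce in code. Here is a minimal sketch, assuming each page record carries its status code, its canonical URL and a noindex flag. The `Page` record and the example.com URLs are hypothetical; your CMS or crawler will expose these fields differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    status: int
    canonical: str
    noindex: bool = False

def sitemap_worthy(page: Page) -> bool:
    """Only 200-OK, self-canonical, indexable URLs belong in the sitemap."""
    return page.status == 200 and page.canonical == page.url and not page.noindex

def build_sitemap(pages: list[Page]) -> str:
    """Serialise the qualifying pages as a sitemaps.org <urlset> document."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        if sitemap_worthy(page):
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = page.url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages: one good, one redirected, one parameter URL, one thank-you page.
pages = [
    Page("https://example.com/services", 200, "https://example.com/services"),
    Page("https://example.com/old-services", 301, "https://example.com/services"),
    Page("https://example.com/lamps?sort=price", 200, "https://example.com/lamps"),
    Page("https://example.com/thank-you", 200, "https://example.com/thank-you", noindex=True),
]
print(build_sitemap(pages))
```

Only the first page survives the filter. Running a check like this on every deploy stops the sitemap drifting back into a dumping ground.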

4. Speed up the server

Faster responses raise the crawl capacity limit. Lazy infrastructure, slow database queries, heavy server-side rendering, stingy caching: all of these slow the average response time and lower your budget. The work overlaps with Core Web Vitals and is covered in our speed optimisation service when it gets serious.

Common mistakes

What to do

Check Crawl Stats in Search Console quarterly. Note any sudden drops in pages crawled per day.
Only put URLs you want indexed into the XML sitemap.
Block faceted navigation in robots.txt if you have an e-commerce site with thousands of filter combinations.
Update internal links to skip the redirect hop.
Use log file analysis on large sites to see exactly which URLs Googlebot is wasting time on.

What to drop

Worrying about crawl budget on a 50-page site. It is not your problem.
Asking Google to "crawl more". That is not how the lever works.
Blocking URLs in robots.txt and assuming they will fall out of the index. They might not.
Indexing internal search-results pages. Almost never the right call.
Letting WordPress generate indexable URLs for every tag, every attachment and every paginated archive. Audit your SEO plugin defaults.

Tools and checklists

Search Console Crawl Stats. Free. Settings menu inside Search Console. Shows total crawl requests per day, average response time, and host status. The first place to look.
Screaming Frog. Free up to 500 URLs. Lets you see how many indexable URLs exist on the site. If the crawl count is wildly larger than the sitemap, you have a parameter or duplication problem.
Server logs. The honest record of every Googlebot fetch. The chapter on log file analysis for SEO walks through how to read them.
Our free SEO audit tool. Pulls Lighthouse and indexability data for any URL. A useful first-pass to see whether your site has the broader hygiene issues that lead to crawl waste. Run a free audit.
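
The crawl-versus-sitemap comparison mentioned for Screaming Frog can be sketched in a few lines. This assumes you have exported two plain lists of URLs, one from the crawler and one from the sitemap. The function name and the example.com URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def crawl_vs_sitemap(crawled_urls, sitemap_urls):
    """Return (number of crawlable URLs missing from the sitemap, parameter-name counts)."""
    extra = set(crawled_urls) - set(sitemap_urls)
    param_counts = Counter()
    for url in extra:
        # Count each parameter name once per URL, even if it repeats in the query.
        names = {name for name, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)}
        param_counts.update(names)
    return len(extra), param_counts

# Hypothetical exports.
crawled = [
    "https://example.com/lamps",
    "https://example.com/lamps?colour=red",
    "https://example.com/lamps?colour=red&sort=price",
    "https://example.com/lamps?sort=price",
]
sitemap = ["https://example.com/lamps"]

extra_count, params = crawl_vs_sitemap(crawled, sitemap)
print(extra_count, params.most_common())
```

The parameter names at the top of that count are usually the first candidates for a robots.txt rule or a canonical fix.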

Perth and WA context

The biggest crawl-budget problems we see in WA sit in three places.

First, Perth-based e-commerce running on Shopify, BigCommerce or WooCommerce with faceted navigation enabled by default. A boutique homewares store with 400 products can easily generate 30,000 parameter URLs through colour, size, price and "sort by" combinations. None of those parameter URLs deserve to be indexed. All of them get crawled until someone tells Googlebot to stop.

Second, WA mining-services catalogues and industrial-supply sites running on legacy CMS platforms. SKUs in the thousands, variant URLs for every product line, and a sitemap that lists everything. The mining SEO industry page covers the pattern. The Karratha and Port Hedland client work we have done usually starts with a brutal sitemap cleanup.

Third, large real-estate and property sites serving Perth metro. Every listing is a URL, every suburb is a URL, every price band is a URL, every property type is a URL. The combinations multiply and the canonical strategy decides whether Google sees a coherent catalogue or a soup of duplicates.

One pattern crosses all three. The site was set up by a developer or platform that ships with crawl-friendly defaults that are not friendly to your specific catalogue. Every audit finds the same handful of robots.txt and canonical fixes. None of them take more than a day to ship. All of them lift indexation of the URLs that actually matter.
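
The canonical side of those fixes often comes down to one decision: which query parameters change the content, and which only change its presentation. A minimal sketch of that decision, assuming you pass in the parameters worth keeping. The function name, the `beds` parameter and the example.com URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def canonical_url(url: str, keep_params: frozenset[str] = frozenset()) -> str:
    """Drop every query parameter not in keep_params and sort the rest."""
    parts = urlsplit(url)
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k in keep_params)
    # Rebuild without the fragment; an empty query produces no trailing "?".
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Hypothetical listing URL with a content filter, a sort order and a tracking tag.
url = "https://example.com/perth/houses?sort=price&beds=3&utm_source=newsletter"
print(canonical_url(url, frozenset({"beds"})))
print(canonical_url(url))
```

The output is the value that would go in the page's rel="canonical" link. Sorting the kept parameters means two URLs with the same filters in a different order resolve to the same canonical.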

Related guides

Back to the Technical SEO pillar for the 12-chapter index.
Robots.txt and meta robots. The primary tool for blocking crawl waste.
XML sitemaps explained. The other side of the crawl-direction coin.
Canonical tags explained. How to handle parameter URLs without blocking them.
Log file analysis for SEO. The advanced diagnostic that shows exactly where the waste is.
301 vs 302 redirects. Redirect hygiene as a crawl-budget lever.
Crawling, indexing and ranking. The conceptual prequel.
Internal linking strategy. Internal links shape crawl demand.

Frequently asked

What is crawl budget in simple terms?

Crawl budget is the number of URLs Googlebot is willing to crawl on your site within a given time. It is set by Google based on how fast your server responds and how much demand there is for your content. For most small business sites it is effectively unlimited. For large sites with thousands of URLs it can be the bottleneck that decides which pages get crawled and indexed at all.

Does my site have a crawl-budget problem?

Probably not, unless you have more than 10,000 URLs. The rough threshold Google itself flags: under 10,000 URLs and a healthy site speed, crawl budget is not your bottleneck. Above that, especially with faceted navigation or parameter URLs, it can be. Check Search Console's Crawl Stats and compare the URLs crawled to the URLs you actually want indexed.

How do I increase my crawl budget?

You do not really increase it directly. You stop wasting it. Block low-value URLs from being crawled, remove duplicate paths, fix internal links that point to redirected URLs, and speed up your server so each crawl request finishes faster. Google rewards faster sites with more crawling, not less.

Will robots.txt stop crawl-budget waste?

Yes, for crawl. Robots.txt prevents Googlebot from fetching the blocked URLs at all. But it does not stop the URLs from being indexed if other sites link to them. For full removal from the index, use a meta robots noindex tag and let Google crawl the URL to see it (which means you cannot also block it in robots.txt).

How can I see what Googlebot is actually crawling?

Search Console's Crawl Stats report gives you the high-level numbers. For the real detail, you need server logs filtered to Googlebot user agent. The log file analysis chapter covers how to extract and interpret this data.
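
As a first pass at that filtering, here is a minimal sketch that assumes your server writes the common "combined" access-log format. The regular expression and the sample lines are illustrative, not taken from a real log. It matches on the user-agent string only, which can be spoofed, so treat the counts as a starting point and verify suspicious IPs with a reverse DNS lookup.

```python
import re
from collections import Counter

LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines):
    """Count Googlebot requests by (clean or parameter URL, status code)."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m["agent"]:
            kind = "parameter" if "?" in m["path"] else "clean"
            hits[(kind, int(m["status"]))] += 1
    return hits

# Made-up sample lines for illustration.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    f'66.249.66.1 - - [20/May/2026:10:00:00 +0800] "GET /lamps?sort=price HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [20/May/2026:10:00:02 +0800] "GET /old-services HTTP/1.1" 301 0 "-" "{GOOGLEBOT}"',
    '203.0.113.9 - - [20/May/2026:10:00:03 +0800] "GET /lamps HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
for key, count in googlebot_hits(sample).items():
    print(key, count)
```

A high share of "parameter" hits or 3xx statuses in that output is the signature of crawl waste.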

