Crawling, indexing and ranking explained.

Three gates. Every page on your site has to clear all three before a single visitor lands from Google. Most "we are not ranking" problems are actually crawl or index problems wearing a ranking costume.

Three gates, in order

Most people think SEO is about Google's ranking algorithm. It is not. It is about three sequential decisions Google has to make about your page, and each one is its own gate.

  1. Can I find this page? (Crawling.)
  2. Is this page worth keeping in my library? (Indexing.)
  3. For this query, which kept pages do I show? (Ranking.)

Skip a gate, lose the game. A page that is not crawled cannot be indexed. A page that is not indexed cannot be ranked. A page that is not ranked never gets seen. The order is fixed.

This sounds obvious, but it is also why so much SEO work gets wasted. Agencies write 2,000-word articles for pages Google quietly dropped from its index three months ago. Owners spend money on backlinks for pages Googlebot has not crawled since 2024. The dollars vanish because the work is happening at gate three when the problem is at gate one.
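A minimal sketch of that ordering, assuming a hypothetical `PageStatus` record you fill in by hand from Search Console. None of these names come from a Google API; the point is only that you look for the earliest failed gate before spending money on a later one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageStatus:
    """Hypothetical record of where one URL stands, filled in manually."""
    crawled: bool
    indexed: bool
    ranked: bool


def first_failed_gate(page: PageStatus) -> Optional[str]:
    """Return the earliest gate the page has not cleared, or None if it cleared all three."""
    # Order matters: a later gate is meaningless until the earlier one passes.
    if not page.crawled:
        return "crawling"
    if not page.indexed:
        return "indexing"
    if not page.ranked:
        return "ranking"
    return None
```

A page that was crawled but dropped from the index comes back as "indexing", which is the signal to stop buying backlinks for it.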

If you have not yet read the parent pillar, start at what is SEO. For the mechanics of the search engine itself, how search engines work sits next to this guide and covers the technical pipeline in more depth.

Stage 1: Crawling, how Google finds your pages

Crawling is the simplest of the three stages and the easiest to break. Google sends an automated program called Googlebot to your site. It requests a URL, downloads the HTML, scans it for links, then queues up any new URLs it finds. The cycle repeats roughly forever.

How Googlebot discovers a URL

Three discovery channels feed Google's crawl queue. In order of strength:

  1. Internal links from pages it already crawls. If your homepage links to your new blog post, Googlebot will find it next time it visits the homepage.
  2. External backlinks from other sites. If someone else links to your URL, Google's crawler usually follows that link too. This is why a single mention in the local news section often gets a new page indexed within days.
  3. Your sitemap and Search Console submissions. The XML sitemap at /sitemap.xml is your formal request list. The URL Inspection tool's Request Indexing button is a one-off nudge.

Orphan pages, the ones with no internal links and no backlinks, are the silent killer here. Google may know they exist via your sitemap, but it deprioritises them because nothing inside or outside the site is pointing at them. We have walked into Perth sites with 200 product pages and found 60 of them orphaned from the main navigation. None of those 60 ranked, because most of them barely got crawled.
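If you want to find orphans yourself, the check is a set difference: everything in the sitemap, minus everything some other page links to. The sketch below assumes you already have the sitemap XML as a string and a link map exported from a crawler; `find_orphans` and the `internal_links` shape are our own illustrative names, and it only looks at internal links, not backlinks.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract every <loc> value from a standard XML sitemap string."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def find_orphans(sitemap_xml: str, internal_links: dict[str, set[str]]) -> set[str]:
    """Return sitemap URLs that no other page links to.

    internal_links maps a source URL to the set of URLs it links to
    (hypothetical input, for example exported from a site crawler).
    """
    linked: set[str] = set()
    for source, targets in internal_links.items():
        linked |= targets - {source}  # a page linking to itself does not count
    return sitemap_urls(sitemap_xml) - linked


EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services</loc></url>
  <url><loc>https://example.com/old-product</loc></url>
</urlset>"""

EXAMPLE_LINKS = {
    "https://example.com/": {"https://example.com/services"},
    "https://example.com/services": {"https://example.com/"},
}

orphans = find_orphans(EXAMPLE_SITEMAP, EXAMPLE_LINKS)
```

In this made-up example only the old product URL is orphaned. On a real site the homepage can show up too if nothing links back to it, so read the output before acting on it.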

What stops Googlebot mid-crawl

A handful of technical settings can shut Googlebot down before it even reads your page:

  • A blocking robots.txt. A Disallow: / line tells Googlebot to leave. We see this most often on freshly migrated sites where the staging robots.txt got copied across.
  • A server returning 5xx errors. If your hosting times out or returns a 500 status repeatedly, Google slows its crawl and eventually backs off entirely.
  • JavaScript that hides the content. Googlebot does render JavaScript, but it has a budget. If the main content only appears after a five-second client-side fetch, Google may give up before it sees anything worth indexing.
  • Login walls. Anything behind authentication is invisible to Googlebot.
  • Infinite URL spaces. Faceted filters, calendar widgets and search results pages that generate unlimited unique URLs eat your crawl budget on rubbish.
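The robots.txt case is the easiest one to test yourself. Here is a rough sketch using Python's standard-library robots.txt parser; the function name `is_blocked` and the sample rules are ours for illustration. Python's parser may not replicate every Googlebot matching rule (wildcard patterns, for one), so treat the answer as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# The classic staging leftover: everything is disallowed for every crawler.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

# A narrower rule that only fences off the cart.
LIVE_ROBOTS = "User-agent: *\nDisallow: /cart/\n"

staging_blocks_home = is_blocked(STAGING_ROBOTS, "https://example.com/")
live_blocks_services = is_blocked(LIVE_ROBOTS, "https://example.com/services")
live_blocks_cart = is_blocked(LIVE_ROBOTS, "https://example.com/cart/checkout")
```

Paste in the contents of your own /robots.txt and the URL you care about. If the staging file made it to production, every URL will come back blocked.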

Crawl frequency, and why it varies

Not all sites get crawled equally. A news domain that publishes hourly might get visited every few minutes. A small business homepage might get crawled once a week. A buried PDF on a low-authority site might get crawled twice a year. Crawl frequency is a function of two things: how often Google has historically found new content on the URL, and how much it trusts the rest of the site.

You can see your own crawl rate in Search Console under Settings → Crawl Stats. For most Perth sites we audit, a healthy crawl rate is between a few hundred and a few thousand requests per day. If you suddenly see it halve, something is wrong.
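A simple way to catch that halving, assuming you copy the daily request counts out of the Crawl Stats report into a list yourself. The seven-day window and the name `crawl_rate_halved` are our assumptions, not anything Search Console provides.

```python
from statistics import mean


def crawl_rate_halved(daily_requests: list[int], window: int = 7) -> bool:
    """Return True if the latest window averages at most half of the window before it."""
    if len(daily_requests) < 2 * window:
        raise ValueError("need at least two full windows of data")
    previous = mean(daily_requests[-2 * window:-window])
    recent = mean(daily_requests[-window:])
    # Guard against a zero baseline, where "half" has no meaning.
    return previous > 0 and recent <= previous / 2
```

Averaging over a week smooths out the normal day-to-day wobble, so a single quiet Sunday does not trigger a false alarm.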

Stage 2: Indexing, the editorial decision Google makes

Once a page is downloaded, Google does not automatically keep it. It runs through a quality pass first. This is the bit people forget: Google is making an editorial decision about your page every single time it visits.

What happens during indexing

  1. Render. Google's renderer (a headless Chrome) executes the JavaScript, builds the final DOM, takes a snapshot. The rendered page is what gets evaluated, not the raw source.
  2. Parse. Natural-language processing reads the headings, the body copy, any schema markup, the image alt text, the internal link anchor text.
  3. Canonicalise. If multiple URLs look similar, Google picks one as the canonical and folds the others into it. You can suggest the canonical with a tag; Google decides whether to honour it.
  4. Quality threshold. Pages that look thin, duplicate, scraped, low-effort or off-topic for the rest of the site get dropped. The same judgement sits behind what Google used to call the Helpful Content system, which it folded into its core ranking systems in March 2024.
  5. Store. Surviving pages get added to the index along with a list of features Google extracts from them.

If you want a concrete number: industry estimates suggest Google indexes about 60 to 80 percent of the URLs it crawls on a typical website. The remaining 20 to 40 percent get judged not worth keeping. On a poorly-built site that ratio can flip the other way.

Why pages get dropped from the index

The Index Coverage report in Search Console (now called Pages) gives you the exact reasons. The common ones for Australian sites:

  • Discovered, currently not indexed. Google knows the URL exists but has decided not to crawl it yet. Usually a signal that the site looks low-quality or the URL has no inbound links.
  • Crawled, currently not indexed. Google read the page and chose not to keep it. The harshest verdict. Almost always a content quality call.
  • Duplicate without user-selected canonical. Two or more pages look too similar. Google picked one to keep and dropped the rest. Common with parameter URLs, www vs non-www, http vs https mix-ups.
  • Alternate page with proper canonical tag. You told Google another URL was the master, and Google obeyed. Not a problem unless you got the canonical wrong.
  • Soft 404. The page returned a 200 OK status but the content looked empty or like an error page. Often caused by JavaScript rendering failures.
  • Excluded by noindex tag. Self-inflicted. Usually a leftover from staging.

The most painful failure mode is crawled, currently not indexed. There is no error to fix. The page simply did not earn its place. The remedy is to make the page genuinely better: deeper content, stronger internal links, real expertise. Republishing the same thin page does not help.
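The duplicate cases in the list above (parameter URLs, www vs non-www, http vs https) are the ones you can spot mechanically. A minimal sketch: reduce each URL to a comparison key and group the URLs that collapse to the same key. The `TRACKING_PARAMS` list is a hypothetical starting point, and the https, non-www key is only a grouping device; it says nothing about which version Google will pick as canonical.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of parameters that do not change what the page shows.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


def normalise(url: str) -> str:
    """Reduce a URL to a key that ignores scheme, www, trailing slash and tracking parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def duplicate_groups(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs that share a key, keeping only groups with more than one member."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalise(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


EXAMPLE_URLS = [
    "http://www.example.com/shoes/",
    "https://example.com/shoes",
    "https://example.com/shoes?utm_source=news",
    "https://example.com/hats",
]

groups = duplicate_groups(EXAMPLE_URLS)
```

Each group is a set of URLs that should probably share one canonical tag or a redirect. Filter parameters that genuinely change the content (size, colour) are deliberately left alone here.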

Stage 3: Ranking, where the SERP gets built

Now you are indexed. Now Google has to decide whether to show your page when someone searches. This is the bit most articles obsess about, and it is also the bit you can do least about until the first two gates are sound.

What happens in the ~300 milliseconds after a query

  1. Query understanding. Google parses the query for entities, intent, location modifiers, freshness signals. "best plumber near me" carries five different intents bundled into four words.
  2. Candidate retrieval. Google pulls a few hundred to a few thousand pages from its index that could plausibly answer the query.
  3. Initial scoring. Each candidate gets scored on relevance, content quality, link authority, freshness, page experience and dozens of other signals.
  4. Machine learning re-rank. Systems like RankBrain, BERT and neural matching look for semantic relevance, not just keyword matches.
  5. Personalisation. Your location, language, search history and device tweak the order.
  6. SERP feature assembly. Local pack, AI Overview, featured snippet, video carousel, ads. Google chooses which features to surface for this specific query.
  7. Render. The final SERP is built and sent to the user.

Why an indexed page might still not rank

Being in the index is necessary but not sufficient. The most common reasons an indexed page sits at position 47 forever:

  • Intent mismatch. Your page targets the keyword but does not answer the intent. A query like "best CRM for tradies" is commercial-comparison intent. A 400-word product description page will not win it. The content depth guide unpacks how to match intent without padding for word count.
  • No internal links. Your homepage and other pages are not pointing at the URL with sensible anchor text. Google has no signal that this is an important page to you.
  • Topical authority gap. Your site is known for one thing, and the page is about something else. A Perth plumber's page about kitchen design will struggle no matter how well written.
  • Slow page experience. Core Web Vitals in the red zone (Google rates an LCP above 4 seconds as poor, and anything above 2.5 seconds as needing improvement), layout shifts on load. These are tiebreaker signals that decide which of two otherwise-equal pages wins.
  • No backlink authority. The wider web does not recognise the site as a credible voice on the topic.
  • Cannibalisation. You have three pages targeting the same query and Google can't pick one. Keyword cannibalisation dilutes the ranking signals across all three.
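Cannibalisation is the one item in that list you can check with a spreadsheet export. A rough sketch, assuming you have pulled (query, page) pairs out of the Search Console Performance report; the function name and the two-page cut-off are our own illustrative choices.

```python
from collections import defaultdict


def cannibalised_queries(rows: list[tuple[str, str]], min_pages: int = 2) -> dict[str, list[str]]:
    """Return each query that has min_pages or more distinct pages showing for it.

    rows is a list of (query, page_url) pairs, one per page that appeared for the query.
    """
    pages_by_query: dict[str, set[str]] = defaultdict(set)
    for query, page in rows:
        # Fold case and stray whitespace so near-identical queries group together.
        pages_by_query[query.strip().lower()].add(page)
    return {
        query: sorted(pages)
        for query, pages in pages_by_query.items()
        if len(pages) >= min_pages
    }
```

Two pages showing for the same query is not always a problem, so use the output as a shortlist to review by hand, not as a list of pages to delete.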

A 6-step diagnostic for any underperforming page

This is the exact sequence we run when a Perth client says "this page used to rank, what happened". Work it top to bottom. Stop at the first failure and fix that before moving on.

  1. Is the URL alive? Open the URL in an incognito window. If it 404s, redirects somewhere unexpected, or shows an error page, you have your answer.
  2. Is robots.txt allowing it? Append /robots.txt to your domain and check for any Disallow rule that catches the URL. The robots.txt report in Search Console (under Settings) shows the file Google last fetched, and URL Inspection tells you whether a specific URL is blocked by robots.txt.
  3. Is there a noindex tag? View source on the page. Search for noindex. If it is there, that is why it is gone. Remove it, request reindexing.
  4. Is it crawlable? In Search Console, run the URL Inspection tool. Hit "Test live URL". Read the screenshot Google rendered. If the body content is missing or the page looks broken, you have a rendering problem.
  5. Is it indexed? In the same tool, the indexing status appears at the top. "URL is on Google" means yes. Any other status, follow the explanation Google gives.
  6. Is the content competitive? Search the target query in incognito. Compare your page against the top three results for depth, structure, freshness, schema, internal links and brand authority. If yours is thinner than theirs, that is your ranking problem.

Nine out of ten "ranking issues" we see resolve at step 2, 3 or 5. The actual algorithm-level ranking problem is rarer than it feels.
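Step 3 says to search the source for noindex. A plain text search also matches the word when it only appears in body copy, so a slightly safer sketch reads the robots meta tag itself, plus the X-Robots-Tag response header if you pass it in. The names `RobotsMetaParser` and `has_noindex` are ours, and header values with a user-agent prefix (such as "googlebot: noindex") are not handled.

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow".
BLOCKING_DIRECTIVES = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            content = attr.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """Return True if the page source or the X-Robots-Tag header value asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    header = {d.strip().lower() for d in x_robots_tag.split(",")}
    found = set(parser.directives) | header
    return bool(found & BLOCKING_DIRECTIVES)
```

Feed it the raw page source. Because it only reads the meta tag and the header, an article that merely mentions the word noindex will not trip it.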

Common mistakes at each stage

Help the pipeline
  • Keep a single canonical sitemap submitted in Search Console and updated on publish.
  • Link every important page from the main navigation or a relevant hub page. No orphans.
  • Server-render the main content where possible. Use JavaScript for enhancements, not for the body copy.
  • Use canonical tags on parameter pages, paginated archives and duplicate-prone URLs.
  • Monitor the Pages report in Search Console weekly and act on new errors fast.

Sabotage the pipeline
  • Push a site live with a staging robots.txt that blocks the whole domain. We see this monthly.
  • Hide blog posts behind a "load more" button that requires a JavaScript click for Googlebot to see anything new.
  • Republish identical product descriptions on 200 SKU pages and wonder why none of them index.
  • Block image and CSS folders in robots.txt. Google needs them to render the page.
  • Rebuild the site every two years without 301 redirects from the old URLs.

Tools to check each stage

One free toolkit covers all three gates. You do not need a paid stack for this.

Stage | What to check | Tool
Crawl | Is Googlebot reaching the URL? Any blocking rules? | Search Console URL Inspection + robots.txt report
Render | Does Google see the body content after JavaScript runs? | URL Inspection → Test live URL → View screenshot
Index | Is the URL stored? Which reason was given for any drops? | Search Console → Pages report
Site-wide crawl | Where are the orphans, the 4xx, the duplicate titles? | Screaming Frog (free to 500 URLs) or Sitebulb
Rank | What queries does the page actually rank for, and where? | Search Console Performance report

For a one-click overview that pulls all of these into one report on your domain, run our free SEO audit tool.

If your site is bigger than 500 pages and you want a full crawl, indexing and rendering audit done properly, the website audit service is the human version. Same checks, with a real person reading the output.

Perth and WA context

A few patterns specific to Australian and Perth sites:

  • Tradies with multi-location pages. A Perth electrician spinning up 30 suburb pages ("electrician Joondalup", "electrician Fremantle", "electrician Cockburn") almost always trips the quality threshold at the indexing gate. Google looks at the pages, sees the same 80 percent boilerplate with only the suburb name swapped, and drops most of them as duplicates. The fix is real differentiation: a local job example, a suburb-specific call-out box, a unique heading. Our Joondalup, Fremantle and Cockburn service pages show what enough differentiation looks like for a service-area business.
  • WooCommerce stores. Faceted filters on category pages generate hundreds of crawl-eligible URLs that Google then drops as duplicates. The fix is canonical tags pointing every filter URL back to the main category, or noindexing the filters entirely. The ecommerce SEO guide goes deeper.
  • Mining services landing pages. Lots of niche, low-volume queries. The risk here is the opposite: tiny but useful pages get categorised as "thin content" because they are short. The fix is depth around the entity, not padding for word count. Mining SEO covers it.
  • Government and regulated industries. If you operate in a regulated space (legal, financial planning, medical), JavaScript-heavy compliance widgets often hide your actual content. If the compliance disclaimer blocks rendering, Google may never see the content behind it. Legal SEO and healthcare SEO address this in more detail.
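For the suburb-page pattern in the first bullet, you can put a rough number on "same boilerplate" before Google does. The sketch below compares the visible text of each pair of pages word by word using Python's standard difflib. The 0.8 cut-off simply echoes the 80 percent figure above; it is our assumption, not a threshold Google publishes, and the page texts are invented.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(text_a: str, text_b: str) -> float:
    """Word-level similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    return SequenceMatcher(None, text_a.lower().split(), text_b.lower().split()).ratio()


def near_duplicates(pages: dict[str, str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, ratio) for every pair of pages at or above the threshold."""
    flagged = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        ratio = similarity(text_a, text_b)
        if ratio >= threshold:
            flagged.append((url_a, url_b, round(ratio, 2)))
    return flagged


# Invented page copy: two template pages and one with a real local job example.
TEMPLATE = (
    "Need a licensed electrician in {suburb}? We offer fast callouts "
    "and fixed prices across {suburb} and nearby suburbs."
)
EXAMPLE_PAGES = {
    "/electrician-joondalup": TEMPLATE.format(suburb="Joondalup"),
    "/electrician-fremantle": TEMPLATE.format(suburb="Fremantle"),
    "/electrician-cockburn": (
        "Our Cockburn team rewired a 1970s brick home last month "
        "and upgraded the switchboard."
    ),
}

flagged = near_duplicates(EXAMPLE_PAGES)
```

Only the two template pages are flagged here. The Cockburn page, with its own job example, falls well under the cut-off, which is the kind of differentiation the bullet is asking for.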

For the Perth-specific case on why investing in any of this is worth the effort, see why SEO matters for Australian businesses. For the full technical-SEO breakdown of crawl budget and rendering, the technical SEO pillar is where this trail continues.

Frequently asked

What is the difference between crawling and indexing?
Crawling is Google discovering and downloading your page. Indexing is Google deciding the page is worth keeping, processing it and storing a version in its searchable database. A page can be crawled and still never get indexed if Google judges it as duplicate, thin or low value.

How do I know if my page is indexed?
Open Google Search Console, paste the URL into the URL Inspection tool at the top of the screen, and read the status. If it says "URL is on Google", you are indexed. If it says "URL is not on Google", click "Test live URL" to see what Google sees right now and why it was skipped.

Why is my page indexed but not ranking?
Being indexed is the entry ticket, not the prize. If the page is indexed but ranks beyond page three, the usual culprits are weak content depth, no internal links pointing to the page, slow page speed, a mismatched search intent, or a missing topical authority signal across the rest of the site. Read our what an SEO actually does guide for the fix list.

What is crawl budget and should I worry about it?
Crawl budget is the soft limit Google sets on how many URLs from your site it will fetch in a given window. For sites under about 1,000 URLs it almost never matters. For larger sites, it matters a lot, and the fix is usually killing duplicate URLs, parameter pages and orphan pages so Googlebot spends its budget on the pages that should rank.

Can I force Google to index a page?
You can request indexing inside Search Console using URL Inspection, and you can resubmit your sitemap there (Google retired its old sitemap ping endpoint in 2023). Neither guarantees indexing. If Google still refuses, the page itself has a quality problem. Improve the content, add internal links and try again.

How long does it take Google to crawl a new page?
On an established site with a healthy crawl rate, expect new pages to be crawled within a few days and indexed within one to two weeks. On a new domain it can take four to eight weeks for Google to start crawling regularly. Submitting the URL in Search Console speeds the first visit but does not guarantee indexing.