What's the difference between crawling and indexing?

Crawling is when a search engine discovers and fetches a page. Indexing is when it decides to store and consider that page for search results. A page can be crawled but not indexed — for example if it's marked noindex or judged too thin to be useful.

Does my site need to work without JavaScript?

The important content should be present in the initial HTML so it doesn't depend on JavaScript running. Search engines can render JavaScript, but rendering is slower and less reliable than reading content straight from the HTML, and the dependency is entirely avoidable. Server-rendering your core content removes the risk.

Crawlability 101: how search engines actually read your site

Before a page can rank, a search engine has to find it, read it and file it. Here's how that actually works — and where it goes wrong.

Technical SEO · 3 March 2026 · 5 min read · By Tony Dhawan

Ranking gets all the attention, but it's the last step in a chain. Before a search engine can rank a page, it has to do three quieter things first: crawl it, render it, and index it. When a site won't show up in search, the failure is almost always somewhere in that chain — not in the ranking itself. Understanding it removes most of the mystery from SEO.

How a crawler moves through your site

A search engine crawler is, at heart, a very patient reader that follows links. It arrives at a page, reads the HTML, notes every link it finds, and adds those to a queue of pages to visit next. Then it repeats, endlessly. This has one enormous implication: if there's no path of links to a page, a crawler may never find it. Pages that aren't linked from anywhere — orphan pages — are effectively invisible, no matter how good they are.
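
The queue-and-repeat loop described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular search engine works: the SITE dictionary is a made-up link graph standing in for real pages, and crawl is a hypothetical helper name.

```python
from collections import deque


def crawl(start, links):
    """Follow links breadth-first and return pages in the order they are found."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        # Note every link on the page and queue the ones not seen before.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site: each key is a page, each value lists the links in its HTML.
SITE = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo", "/"],
    "/blog": ["/blog/crawlability-101"],
    "/services/seo": [],
    "/blog/crawlability-101": ["/services/seo"],
    "/old-landing-page": ["/"],  # links out, but nothing links in
}

found = crawl("/", SITE)
orphans = sorted(set(SITE) - set(found))
print(orphans)  # ['/old-landing-page']
```

Starting from the homepage, the loop never reaches /old-landing-page because no page links to it, which is exactly what makes it an orphan.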

Robots, sitemaps and the signals you control

You aren't a passive participant in this. You get to send the crawler clear signals:

robots.txt tells crawlers which parts of the site they may and may not request. It's powerful and easy to misuse — one wrong line can hide an entire site.
An XML sitemap hands the crawler a tidy list of the URLs you care about, so discovery doesn't rely on luck. Every site we build ships one automatically.
Meta robots tags let you say, page by page, "index this" or "leave this out" — which is how you keep thin and duplicate pages out of search.
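
To see how little it takes for robots.txt to hide a site, here is a small check using Python's standard urllib.robotparser module. The two rule sets and the example.com URLs are invented for illustration, and allowed is a hypothetical helper. The only difference between the rule sets is the path on the Disallow line.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` request `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rules: block only the admin area.
INTENDED = """User-agent: *
Disallow: /admin/
"""

# Hypothetical mistake: the path was shortened to "/", which matches every URL.
MISTAKE = """User-agent: *
Disallow: /
"""

PAGE = "https://example.com/services/seo"

print(allowed(INTENDED, PAGE))  # True
print(allowed(MISTAKE, PAGE))   # False
```

Running a check like this against a handful of important URLs before a launch is one way to catch an over-broad Disallow early.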

Common things that block crawlers

When we audit a site that won't index, the culprit is nearly always on this list: a leftover noindex from the staging site, a robots.txt that disallows too much, important pages buried so deep nothing links to them, or content that simply isn't in the HTML. None of these are exotic. They're the boring, preventable mistakes that quietly cost businesses their visibility.
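
The leftover noindex is the easiest of these to test for. Below is a minimal sketch using Python's built-in html.parser. It assumes the directive sits in a meta robots tag in the page's HTML, and the two sample pages are invented. A noindex can also be sent in an X-Robots-Tag HTTP header, which this sketch does not look at.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.append(directive.strip().lower())


def is_noindexed(html):
    """Return True if the HTML carries a meta robots noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical pages: one with a staging leftover, one set up for launch.
STAGING_LEFTOVER = '<head><meta name="robots" content="noindex, nofollow"></head>'
LIVE = '<head><meta name="robots" content="index, follow"></head>'

print(is_noindexed(STAGING_LEFTOVER))  # True
print(is_noindexed(LIVE))              # False
```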

Rendering: the JavaScript trap

Here's the one that catches modern sites. Many websites send the browser a near-empty page and then build the actual content with JavaScript. A person doesn't notice. A crawler very much does. Rendering JavaScript is slow and resource-intensive, and not every crawler does it reliably or immediately.

This is the cardinal rule we build by, and it's worth stating plainly: your important content should be readable with JavaScript switched off. If you disable JS and your page is blank, you've made your content optional, and search engines are within their rights to skip it. We build sites that server-render their content for exactly this reason — it's the foundation of how we approach web development, and it's why our builds and our technical SEO work are really the same discipline.
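
A rough way to approximate the "JavaScript switched off" test is to read the raw HTML and ignore everything inside script and style tags. The sketch below does that with Python's built-in html.parser. Both sample pages and the text_without_js helper are hypothetical, and a real check would fetch the page source first.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Gather text from the HTML, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_without_js(html):
    """Return the text a reader would get from the HTML alone."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages with the same headline, delivered two different ways.
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script>document.getElementById("root").textContent = '
    '"Emergency plumber in Leeds";</script></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Emergency plumber in Leeds</h1>"
    '<p>Call us any time.</p><script src="/app.js"></script></body></html>'
)

print(repr(text_without_js(CLIENT_RENDERED)))  # ''
print(text_without_js(SERVER_RENDERED))        # Emergency plumber in Leeds Call us any time.
```

The client-rendered page only produces its headline once the script runs, so the HTML on its own contains no readable content at all.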

Internal links are crawl paths

Because crawlers move along links, your internal linking is your crawl structure. A clear, logical set of links between related pages does two jobs at once: it helps people navigate, and it shows search engines which pages matter and how they relate. Pages you link to often, from relevant places, read as important. Pages you link to from nowhere read as unimportant — or never get read at all.
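
One simple way to look at internal linking is to count how many other pages link to each page. The sketch below does this for a made-up link graph. The SITE data and the inbound_link_counts helper are hypothetical, and a raw count is only a crude stand-in for how search engines actually weigh links.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count, for every page, how many other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        # Count each linking page once and ignore links a page makes to itself.
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE = {
    "/": ["/services", "/blog", "/contact"],
    "/services": ["/services/seo", "/contact"],
    "/services/seo": ["/contact", "/services"],
    "/blog": ["/services/seo"],
    "/contact": [],
    "/old-landing-page": ["/"],
}

counts = inbound_link_counts(SITE)
unlinked = [page for page, n in counts.items() if n == 0]

print(counts["/contact"])  # 3
print(unlinked)            # ['/old-landing-page']
```

Pages with a count of zero are the ones a crawler following links cannot reach, so they are worth either linking to or retiring.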

Get crawling, rendering and indexing right and ranking becomes a fair fight you can actually win. Get them wrong and the best content in the world sits in the dark. If you suspect something in that chain is broken, our guide to invisible websites is a good next read — or ask us to take a look.

Written by

Tony Dhawan · Founder & Director

Tony Dhawan founded The Marketing District in 2016 and stays close to every build. He has spent the best part of a decade designing, developing and ranking fast websites for UK e-commerce and service businesses — and writes here to explain, in plain English, how search actually works.
