What Web Crawlers See
Web crawlers are also called automatic indexers, bots, web spiders or web robots.
How does a web crawler work?
Many people seem to be under the impression that a web crawler or web spider crawls along the Internet, installs itself on the web servers it finds and reads the contents of their hard disks to find what it wants. That is what a virus would do: establish itself on a machine and then execute various actions. Most web servers, ideally all, wouldn't allow such a thing to happen, because it would breach every basic security principle. So, how does a web crawler work?
Very simple: the crawler's executable code never leaves the machine it runs on. It sends requests to web servers, and the servers respond with web pages or other resources. If a page contains links, the crawler sends requests for the pages those links point to. That's it. The page served to the web crawler is the same as the one served to your browser. It doesn't matter whether the data on the page is static or was loaded into the page from a database by the server before it was sent: the web crawler sees it.
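To make the request-and-follow loop concrete, here is a minimal sketch in Python using only the standard library. The names LinkParser, extract_links, fetch_over_http and crawl are made up for this illustration, and the fetch parameter exists only so the loop can be tried without touching the network. It is not how any particular search engine does it.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for the links found in html."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    # Resolve relative links and drop #fragments, which point into the same page.
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def fetch_over_http(url):
    """Request a page the way a browser would: a plain HTTP GET."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_over_http, max_pages=10):
    """Breadth-first crawl; returns {url: html} for the pages visited."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # unreachable or missing page: skip it
        pages[url] = html
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Everything here runs on the crawler's own machine; the only contact with a web server is the request made inside the fetch function. A real crawler would also honour robots.txt and pause between requests, which this sketch leaves out.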
Does the web crawler see what I see in my browser?
So what does the web crawler see?
It sees plain old text, whether delivered from a database or as the static contents of a web page. This includes the alt attributes of images and the metadata elements in the page's head. To see a web page much as a crawler would see it, download the Lynx browser, which is a text-only browser, and use it to look at your website.
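If you would rather not install Lynx, a rough approximation of that text-only view can be sketched with Python's standard html.parser module. The class name CrawlerView and the helper crawler_view are invented for this example, and skipping script and style contents is an assumption about what counts as readable text, not a statement about any specific crawler.

```python
from html.parser import HTMLParser


class CrawlerView(HTMLParser):
    """Gathers roughly what a text-only client gets from a page."""

    SKIPPED = {"script", "style"}  # assumption: their contents are not page text

    def __init__(self):
        super().__init__()
        self.text = []        # text found between tags
        self.alt_texts = []   # alt attributes of images
        self.meta = {}        # <meta name="..." content="..."> pairs
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img" and attrs.get("alt"):
            self.alt_texts.append(attrs["alt"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script and style blocks.
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def crawler_view(html):
    """Parse html and return the populated CrawlerView."""
    view = CrawlerView()
    view.feed(html)
    view.close()
    return view
```

Feed it the HTML of one of your own pages and look at text, alt_texts and meta: whatever does not show up there was not in the served page as text.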
What do I take from this?
You want your site to feature on search engines, so you do some Search Engine Optimisation. What you take from this is that anything you want the search engine to know about must be present as text in the page as it is served to the web crawler. You don't have to be afraid that text loaded from a database won't be visible to the web crawler.
Let us know if you have anything to add or any remarks. There is much I haven't said about web crawlers, but the idea was to let you know what they see and take into account, and what they don't see.