Factors of Crawl Allocation


by Andrew Hallinan - Date: 2010-06-07

Google, Yahoo!, and Bing all have heavy-duty systems for crawling every website on the internet. This is a huge job that takes up tons of resources - which is why they want to make sure they don't "over" crawl any one website. They simply don't want to overburden their already resource-intensive crawls. For that reason, most search engines spend only a limited amount of time crawling any one website. Here are some factors that can influence your crawl allocation:

1. Server response times
The search engines, with Google leading the pack, are trying now more than ever to increase the speed of the internet as a whole. If your server is slower than your competition's and your website responds to requests too slowly, the search engine spiders may slow down their crawls of your site to make sure they are not overloading the server.

2. Page load times
It's simple, really - the faster your individual pages load, the more pages of your website the spiders can crawl in the time they give you. If you have a 100,000 page website and the crawler takes a full second per page, that works out to more than a day of continuous crawling - way too long. You can monitor your own page load times in your Google Webmaster Tools account.
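If you want a rough feel for your response times without waiting for Webmaster Tools to update, a quick script along these lines can help. This is only a sketch in Python - the URLs are placeholders, and it measures just the raw HTML response, not images, CSS, or scripts:

import time
import urllib.request

def time_page(url):
    # Time how long the server takes to return the raw HTML.
    # This ignores images, CSS, and JavaScript, so treat it as a
    # rough lower bound on what a crawler experiences per page.
    start = time.time()
    with urllib.request.urlopen(url, timeout=30) as response:
        response.read()
    return time.time() - start

# Placeholder URLs - swap in your own pages; slowest are printed first.
urls = ["http://www.example.com/", "http://www.example.com/products"]
for seconds, url in sorted(((time_page(u), u) for u in urls), reverse=True):
    print("%.2fs  %s" % (seconds, url))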

3. Your content
Your content MUST be unique. Autoblogs, automatic RSS feeds, and other forms of dynamically syndicated content are great - but if they are the only thing feeding your website, you'll never dominate the search engines. You must have unique content that is relevant to the searcher and the search phrase. If there is too much duplicate content, or you have too many pages with thin content, the search engines won't crawl your website very often.

4. URLs, redirects, and missing pages
For whatever reason, the crawler can run into issues on your website. It can get stuck in a redirect loop or hit any number of other problems while crawling. You can view your crawl report and diagnose/troubleshoot problems from your Google Webmaster Tools account, and trace redirect chains yourself with a script like the sketch below. Chances are pretty good that if Google has had problems crawling your website, Yahoo! and Bing will have problems as well.
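The following Python sketch follows redirects one hop at a time and flags loops before a bot wastes its crawl allocation on them; the hop limit of 10 and the example URL are just illustrative:

import urllib.error
import urllib.parse
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Refuse to follow redirects automatically so each hop stays visible.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def trace_redirects(url, max_hops=10):
    opener = urllib.request.build_opener(NoRedirect())
    chain = []
    for _ in range(max_hops):
        try:
            response = opener.open(url)
        except urllib.error.HTTPError as err:
            response = err  # 3xx responses raise once redirects are disabled
        chain.append((response.getcode(), url))
        location = response.headers.get("Location")
        if response.getcode() not in (301, 302, 303, 307, 308) or not location:
            return chain  # final page (or an error) - no more hops
        next_url = urllib.parse.urljoin(url, location)
        if any(next_url == seen for _, seen in chain):
            chain.append(("LOOP", next_url))  # we've been here before
            return chain
        url = next_url
    chain.append(("TOO MANY HOPS", url))
    return chain

print(trace_redirects("http://www.example.com/old-page"))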

5. Server efficiency
You can reduce the server resources that the spiders consume by serving compressed files and by supporting If-Modified-Since requests on the server. This is a great way to reduce your bandwidth. It isn't a problem for small websites, but for a website with 100,000 pages or unique products, the bandwidth can be very costly. If your server honors If-Modified-Since, it will return a 304 (Not Modified) response when the bot requests a web page whose contents have not changed since the last time they were indexed. You can find out much more about this by visiting http://janeandrobot.com/library/managing-robots-access-to-your-website
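To make the idea concrete, here is a minimal sketch of what honoring If-Modified-Since looks like, written as a Python WSGI handler with a made-up page and last-modified date; on a real site you would normally let the web server or CMS handle this for you:

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from wsgiref.handlers import format_date_time

# Made-up values for the sketch: one static page and when it last changed.
PAGE_LAST_MODIFIED = datetime(2010, 6, 1, tzinfo=timezone.utc)
PAGE_BODY = b"<html><body>Example page</body></html>"

def application(environ, start_response):
    last_modified = format_date_time(PAGE_LAST_MODIFIED.timestamp())
    header_value = environ.get("HTTP_IF_MODIFIED_SINCE")
    if header_value:
        try:
            since = parsedate_to_datetime(header_value)
        except (TypeError, ValueError):
            since = None
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # HTTP dates are GMT
        if since is not None and PAGE_LAST_MODIFIED <= since:
            # Nothing changed since the bot's last visit: no body, no bandwidth.
            start_response("304 Not Modified",
                           [("Last-Modified", last_modified)])
            return [b""]
    # The page is new to the bot or has changed: send it in full.
    start_response("200 OK",
                   [("Content-Type", "text/html"),
                    ("Last-Modified", last_modified)])
    return [PAGE_BODY]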

6. Bot efficiency
You can adjust how often Bing's and Yahoo!'s bots crawl your site by adding a Crawl-delay setting to your robots.txt file. If either of those crawlers seems too slow, check whether such an entry already exists and is throttling them. Another good way to tell whether the other bots are crawling too slowly is to check Google's own crawl speed - it can be a good indicator as well.
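As an example, robots.txt entries along these lines ask those bots to wait a number of seconds between requests. The user-agent tokens and the 10-second value are just illustrative - check each engine's documentation for the exact name it honors:

User-agent: Slurp
Crawl-delay: 10

User-agent: bingbot
Crawl-delay: 10

A larger value slows the bot down; removing the entry lets it crawl at its own pace.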


Andrew Hallinan is the owner of a Tampa Search Engine Optimization company and is Tampa Bay's leading Search Marketing Specialist. He has more free tips and advice at his blog.

© The article above is copyrighted by its author. You're allowed to distribute this work according to the Creative Commons Attribution-NoDerivs license.
 
