Online Business - How Search Engines Work
- Date: 2007-04-26 - Word Count: 688
Internet search engines are special sites on the Web that are designed to help people find information stored on other sites. There are differences in the ways various search engines work, but they all perform three basic tasks:
- They search the Internet, or select pieces of the Internet, based on important words.
- They keep an index of the words they find, and where they find them.
- They allow users to look for words or combinations of words found in that index.
Early search engines held an index of a few hundred thousand pages and documents, and received maybe one or two thousand inquiries each day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.
Spidering
Before a search engine can tell you where a file or document is, that file or document must first be found. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites.
When a spider is building its lists, the process is called Web crawling.
In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages. How does any spider start its travels over the Web? The usual starting points are lists of heavily used servers and very popular pages. The spider will begin with a popular site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel, spreading out across the most widely used portions of the Web.
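The crawl described above can be sketched as a breadth-first walk over links. This is only an illustration under stated assumptions: the `PAGES` dictionary is a made-up, in-memory stand-in for the Web (no real network requests are made), and the URLs and the `crawl` function name are hypothetical.

```python
from collections import deque

# Hypothetical miniature "Web": each URL maps to (page text, outgoing links).
PAGES = {
    "popular.example/home": ("search engines index the web",
                             ["popular.example/news", "other.example/"]),
    "popular.example/news": ("spiders crawl popular pages",
                             ["popular.example/home"]),
    "other.example/": ("an obscure page about spiders", []),
    "unlinked.example/": ("nobody links here", []),
}

def crawl(seeds, pages):
    """Start from the seed URLs and follow every link, visiting each page once."""
    queue = deque(seeds)
    seen = set(seeds)
    word_lists = {}
    while queue:
        url = queue.popleft()
        if url not in pages:
            continue  # dead link: nothing to read
        text, links = pages[url]
        word_lists[url] = text.split()  # the list of words found on this page
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return word_lists

crawled = crawl(["popular.example/home"], PAGES)
print(sorted(crawled))
```

Starting from one popular page, the sketch reaches every page that is linked from it, directly or indirectly. The page that nothing links to is never found, which is why the choice of starting points matters.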
Indexing
Once the spiders have completed the task of finding information on Web pages, the search engine must store the information in a way that makes it useful. There are two key components involved in making the gathered data accessible to users:
- The information stored with the data
- The method by which the information is indexed
In the simplest case, a search engine could just store the word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether the word was used once or many times, or whether the page contained links to other pages containing the word. In other words, there would be no way of building the ranking list that tries to present the most useful pages at the top of the list of search results.
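As a rough sketch of that simplest case, the index below stores nothing but each word and the URLs where it was found. The function name `build_simple_index` and the sample pages are invented for illustration.

```python
from collections import defaultdict

def build_simple_index(word_lists):
    """Map each word to the set of URLs on which it appears, and nothing more."""
    index = defaultdict(set)
    for url, words in word_lists.items():
        for word in words:
            index[word.lower()].add(url)
    return index

# Hypothetical crawl output: URL -> words found on the page.
word_lists = {
    "a.example/": ["Spiders", "crawl", "the", "web"],
    "b.example/": ["web", "web", "web", "design"],
}

simple_index = build_simple_index(word_lists)
print(sorted(simple_index["web"]))
```

Both pages come back for "web", but the index cannot say that the second page uses the word three times, so there is nothing to rank the results by.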
To make for more useful results, most search engines store more than just the word and URL. An engine might store the number of times that the word appears on a page. The engine might assign a weight to each entry, with increasing values assigned to words as they appear near the top of the document, in sub-headings, in links, in the meta tags or in the title of the page. Each commercial search engine has a different formula for assigning weight to the words in its index. This is one of the reasons that a search for the same word on different search engines will produce different lists, with the pages presented in different orders.
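One way to picture such a weighting scheme is shown below. The weights in `FIELD_WEIGHTS` are invented for this sketch and do not reflect any real search engine's formula. Each occurrence of a word adds the weight of the part of the page it appears in.

```python
from collections import defaultdict

# Assumed, illustrative weights: words in the title count most, body text least.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}

def build_weighted_index(pages):
    """Map word -> {url: score}, adding the field weight for every occurrence."""
    index = defaultdict(lambda: defaultdict(float))
    for url, fields in pages.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word][url] += FIELD_WEIGHTS[field]
    return index

# Hypothetical pages, already split into title, sub-heading and body text.
pages = {
    "a.example/": {
        "title": "Spider care",
        "heading": "Feeding",
        "body": "a spider eats insects",
    },
    "b.example/": {
        "title": "Web crawling",
        "heading": "How a spider works",
        "body": "the spider follows links and the spider indexes words",
    },
}

weighted = build_weighted_index(pages)
print(dict(weighted["spider"]))
```

With these assumed weights, the first page scores higher for "spider" because the word is in its title, even though the second page uses it more often. A different set of weights would reverse the order, which matches the point that different engines return different lists.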
An index has a single purpose: it allows information to be found as quickly as possible. There are quite a few ways to build an index, but one of the most effective is a hash table. In hashing, a formula is applied to attach a numerical value to each word. The formula is designed to distribute the entries evenly across a predetermined number of divisions. This numerical distribution differs from the distribution of words across the alphabet, where some letters begin far more words than others, and that is the key to a hash table's effectiveness.
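A minimal sketch of the idea follows. The hashing formula (a simple polynomial string hash) and the bucket count of 8 are assumptions chosen for illustration, not what any particular search engine uses, and the `HashIndex` class is hypothetical.

```python
NUM_BUCKETS = 8  # the "predetermined number of divisions"; the value is arbitrary

def bucket_for(word, num_buckets=NUM_BUCKETS):
    """Turn a word into a number, then reduce it to a bucket position."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % 2**32
    return value % num_buckets

class HashIndex:
    """Word -> URLs index stored in a fixed number of buckets."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(num_buckets)]

    def add(self, word, url):
        bucket = self.buckets[bucket_for(word, len(self.buckets))]
        for entry_word, urls in bucket:
            if entry_word == word:
                urls.add(url)
                return
        bucket.append((word, {url}))

    def lookup(self, word):
        # Only one bucket is examined, however many words the index holds.
        bucket = self.buckets[bucket_for(word, len(self.buckets))]
        for entry_word, urls in bucket:
            if entry_word == word:
                return urls
        return set()

idx = HashIndex()
idx.add("spider", "a.example/")
idx.add("spider", "b.example/")
idx.add("search", "a.example/")
print(sorted(idx.lookup("spider")))
```

A lookup computes the word's bucket directly and scans only that bucket, rather than searching through every entry filed under the word's first letter.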
The Search Engine Program
The search engine software is the final part of the system. When a person requests a search on a keyword or phrase, the software searches the index for relevant information. It then reports back to the searcher, listing the most relevant Web pages first.
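The lookup-and-rank step might be sketched as follows. The sample `index` and its scores are made up, and the rule used here (a page must contain every query word, and its score is the sum of the per-word scores) is one simple assumption among many possible ranking rules.

```python
def search(index, query):
    """Return URLs containing every query word, highest total score first."""
    words = query.lower().split()
    if not words:
        return []
    # Keep only the pages that contain all of the query words.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    scored = [(sum(index[w][url] for w in words), url) for url in candidates]
    # Sort by descending score; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

# Hypothetical weighted index: word -> {url: score}.
index = {
    "spider": {"a.example/": 6.0, "b.example/": 5.0},
    "web": {"b.example/": 5.0, "c.example/": 1.0},
}

print(search(index, "spider"))
print(search(index, "spider web"))
```

A single-word query returns every page that contains the word, best score first. A two-word query returns only the pages that contain both words.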
Related Tags: money, marketing, internet marketing, affiliate marketing, free, business, home based, ppc, working at home, earn money online, tim yu, maillinglist
Tim Yu is the owner of www.Online2Biz.com, which provides information on creating your own business online. To get the free "Internet Marketing Guide" and "49 Ways To Find A Profitable Niche Market Instantly!" course, go to www.Online2Biz.com.