What is Latent Semantic Indexing (LSI): A Rethink


by Peter Nisbet - Date: 2007-01-13 - Word Count: 1503

What is latent semantic indexing? How often have I heard that question! It would be easier to state what latent semantic indexing is not than to try to explain what it is. Even a statistical mathematician would find it extremely difficult to explain the concept of LSI, sometimes referred to as latent semantic analysis, to a layman in just a few words!

I have written previously on LSI and made the same mistake as most people involved with SEO make: confusing a concept with an entity that can be designed into a website. My scientific background compels me to correct what I have written in previous articles, having completed the deeper research into LSI that I should have carried out prior to commenting on the concept. Research, incidentally, which those charging others for SEO work involving 'LSI' should also be carrying out!

LSI is not what most SEO experts claim it to be. It is certainly not a concept that can be used by the average web designer or webmaster to improve their search engine listings, and it is not what many people, including myself, have written it to be. However, first some background.

The term 'semantics' is applied to the science and study of meaning in language, and the meaning of characters, character strings and words: not just the language and words themselves, but the true meaning being conveyed in the context in which they are being used.

In 2002 a company called Applied Semantics, an innovator in the use of semantics in text processing, launched a program known as AdSense, which was a form of contextual advertising whereby adverts were placed on website pages which contained text that was relevant to the subject of the adverts.

The matching up of text and adverts was carried out by software in the form of mathematical formulae known as algorithms. It was claimed that these formulae used semantics to analyze the meaning of the text within the web page. In fact, what it initially seemed to do was to match keywords within the page with keywords used in the adverts, though some further interpretation of meaning was evident in the way that some relevant adverts were correctly placed without containing the same keyword character string as used on the web page.
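As a rough illustration of the plain keyword matching described above, here is a minimal Python sketch. It is not the actual AdSense software, whose workings were never published in this form; the function names, the sample page and the advert keyword lists are all made up for the example.

```python
import re

def tokens(text):
    """Lower-case the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_advert(page_text, adverts):
    """Score each advert by how many of its keywords appear on the page.

    `adverts` maps a hypothetical advert name to its keyword string.
    Returns the best-scoring advert name and the full score table.
    """
    page_words = tokens(page_text)
    scores = {name: len(page_words & tokens(keywords))
              for name, keywords in adverts.items()}
    return max(scores, key=scores.get), scores

# Hypothetical page and adverts, invented for this example.
page = "Tips for buying used cars and checking the engine"
adverts = {
    "car_dealer": "used cars finance",
    "toy_shop": "toy cars kids",
    "bakery": "bread cakes",
}

winner, scores = best_advert(page, adverts)
print(winner, scores)
```

Note that the toy shop advert still scores a point simply because it shares the string 'cars' with the page. Matching of this kind sees character strings, not meaning.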

Google launched its own contextual advertising system in March 2003, and acquired Applied Semantics just over a month later. AdSense as we know it was launched, and webmasters could make considerable sums of money by attracting visitors to web pages specifically designed for the purpose. Every click on an advert earned cash from Google for the owner of the website displaying it.

It became commonplace for websites to comprise hundreds, and even thousands, of software-generated pages containing repetitions of keywords and long-tailed key phrases, but little else. Thousands of pages could be generated, the only difference between them being the keyword or phrase used, with no content whatsoever for the visitor. Such software is still being sold on the internet in spite of all the attention given to the so-called LSI algorithm.

Google searched each webpage that was registered for the AdSense system and determined the theme of the page by means of semantic analysis. At this time there was no differentiation made in the analysis between sites using only the same keyword repeatedly and those with genuine content relevant to the theme. Adverts related to this theme were then added to the page by Google.

These pages were ranked highly on the search engines due to their high keyword density, since search engine listings were based on the density of the search phrase used (keyword) rather than any associated content. There were so many websites generated by the software that only a small proportion needed to become visible in the listings for their owners to make a lot of money from the adverts that Google placed on them. These sites could generate several thousands of dollars for their owners every single day without contributing any worth to the internet at all.
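Keyword density, as the term is used above, is simply the share of a page's words taken up by the search phrase. The following Python sketch shows one common way of computing it; the exact formula any search engine used is not known to me, and the function name and sample texts are invented for illustration.

```python
import re

def keyword_density(text, phrase):
    """Fraction of the words in `text` accounted for by occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as a run of consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

# Invented sample texts: a keyword-stuffed page and a page with real content.
stuffed = "cheap cars cheap cars buy cheap cars now cheap cars"
genuine = "We review family cars, compare running costs and list local dealers"

print(round(keyword_density(stuffed, "cheap cars"), 2))  # 0.8
print(round(keyword_density(genuine, "cars"), 2))        # 0.09
```

A ranking based on this number alone rewards the stuffed page, which is exactly the weakness the software-generated sites exploited.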

In order to control this 'spamming' of its search indices with worthless websites, Google decided to add what it termed LSI, or latent semantic indexing, to its indexing algorithm, very similar to what it was using to determine the theme of AdSense pages. What this claims to do is to analyze the semantic content of websites and determine the true value of the site to any visitor using a specific search term.

This value was analyzed by searching for words and phrases similar in meaning to the keywords used, rather than only the keywords themselves. In this way, pages containing keywords with little other contextually similar content were rooted out and the pages either de-listed or demoted down Google's search index for these keywords.

For example, on a website devoted to cars, using 'cars' throughout the page would have been liable to secure a high listing prior to so-called LSI, but no longer will. The use of associated terms such as 'vehicles', 'motors', and 'autos' will be given more value than simply using 'cars' alone. This analysis can distinguish between the use of 'cars' in reference to racing and to toy 'cars', whereas previously it could not: in 'racing cars' and 'toy cars' only the word 'cars' was recognized.

LSI is now regarded as being a major means of optimizing webpages to conform to the requirements of the Google algorithms. Minimal use of keywords, and more use of synonyms and phrases relevant to the contextual meaning of the keyword relating to the page, became the way to use LSI to achieve higher listings. Or so the SEO experts informed us. In fact, the concept of latent semantic indexing has been known in statistical analysis for decades, and is not something that can be 'used' as such on a website.

There are many SEO websites suggesting that they can provide a service to make our website LSI friendly, or meet 'LSI requirements'. One way of doing this, it is suggested, is to stuff the page full of synonyms and other related terms. I have written articles myself about how this can be done, and tried to suggest the correct way to use LSI. Although my suggested 'use' of LSI was erroneous in scientific terms, the ideas introduced are nevertheless good practice and will help you to produce webpages containing genuine content.

Having taken the time to do some research into what latent semantic indexing, or analysis, really means, I now know that webmasters cannot use LSI as such; to suggest otherwise is blatant nonsense. To date, I have not seen any explanation from SEO experts as to what latent semantic indexing truly is. I have read several LSI papers and reviews, written by mathematical statisticians, that attempt to explain the subject to the layman. This was achieved with extreme difficulty, and I doubt that anyone who is not an expert on semantics or statistical analysis fully understands what the term means.

It appears to be commonly used in SEO as a general definition for the way that the mathematical detection of synonyms, and how certain words are related to others in a piece of text, is applied to the indexing of webpages by search engines. It has little to do with 'latency', more to do with the actual usage of semantics within a text.
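For readers who want a feel for the statistical technique itself: in the information retrieval literature, latent semantic indexing counts how often each term occurs in each document, then keeps only the strongest few components of a singular value decomposition of that table. Documents that share context end up close together even when they use different words. The Python sketch below illustrates this on four tiny made-up documents. It is a toy under stated assumptions, not a description of what Google or any other search engine runs. The helper names are invented, and the decomposition is obtained by simple power iteration on the document-by-document product matrix rather than by a numerical library, purely to keep the example self-contained.

```python
import math

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw word counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    return vocab, [[d.split().count(t) for d in docs] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_eigenpairs(gram, k, iterations=500):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation."""
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _component in range(k):
        # Fixed, normalized start vector keeps the demo deterministic.
        v = [float(i + 1) for i in range(n)]
        scale = math.sqrt(sum(x * x for x in v))
        v = [x / scale for x in v]
        for _step in range(iterations):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before looking for the next.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs

def latent_doc_vectors(matrix, k):
    """Coordinates of each document in a k-dimensional latent space.

    Equivalent to a rank-k truncated SVD: the eigenvalues of A^T A are the
    squared singular values, and its eigenvectors are the right singular vectors.
    """
    n_docs = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * vec[j] for lam, vec in pairs]
            for j in range(n_docs)]

# Four invented documents: two about motoring, two about toys.
docs = ["cars engine", "autos engine", "toy doll", "toy doll house"]
vocab, matrix = term_document_matrix(docs)
raw = [[row[j] for row in matrix] for j in range(len(docs))]
latent = latent_doc_vectors(matrix, k=2)

print(round(cosine(raw[0], raw[1]), 2))        # 0.5
print(round(cosine(latent[0], latent[1]), 2))  # 1.0
```

On raw word counts the 'cars' and 'autos' documents are only half alike, because they share just the word 'engine'. In the two-dimensional latent space they coincide, while both remain unrelated to the toy documents. Notice that nothing here is something a webmaster can 'add' to a page: the analysis is performed by whoever holds the whole collection of documents.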

Too many people, me included, have professed to understand its use by Google and other search engines, without fully understanding what the term itself means. While it may be necessary for an SEO expert to be able to explain to clients what the concept of LSI means to them, it is difficult to see to what practical use it can be put.

It is far better for people to forget about trying to manipulate the use of language, and to concentrate on writing honest and relevant content, while spending more time on building an intelligent and useful marketing campaign. One of the better ways to achieve this is to use article directories to promote their website through the publication of well written and relevant articles.

Stop trying to beat the search engines unless you get a kick out of it. You will never succeed because you will always have to follow them, and by the time you think you have caught up, they will be a further step ahead.

Write content that you, yourself, would find interesting to read and use article directories to publish your thoughts to the world. Wait a day or two after writing prior to submission, and make sure that you enjoy reading what you have written. This is what will ultimately bring you results.


What is latent semantic indexing? Forget it - just write!
------------------------

Peter is a professional freelance writer, currently ghostwriting website articles which are much in demand. He operates from his website Article Writing Services and writes on any topic, uniquely for you. Not only is no one else sold your article, but you can get two versions on request - one for your site and one for submission to directories. He is a busy man but approachable for advice from

© The article above is copyrighted by its author. You're allowed to distribute this work according to the Creative Commons Attribution-NoDerivs license.