Duplicate Content: What You Ought to Know About
Penalties for duplicate content can be really harmful. The damage is not just a drop in rankings but a move to the supplemental results, which most web users hardly ever see. Normally one would expect Google to select one URL to display in the SERPs and relegate the duplicates to the supplemental results. Unfortunately, this is not always so. In the thread "Duplicate content observation" on the WebmasterWorld.com forum you can read about a case in which an original, high-quality, authoritative page was removed from Google's index together with its duplicates. Since this can happen even to the most honest webmaster, it is easy to imagine how much attention the issue gets on any SEO forum.
Types of Duplicate Content
Duplicate content has a wider definition than 'copy-paste' plagiarism; it is not just content scraped from a competitor's site, a SERP or an RSS feed. A few more situations are also generally referred to as duplicate content.
Circular Navigation
Jake Baille from TrueLocal vaguely defines circular navigation as having multiple paths across a website. In practice this means the same content being accessible via different URLs. For example, the same article might be retrieved by all of these links:
– mysite.com/articles/1/
– mysite.com/article1/
– mysite.com/articles.php?id=1
Forum threads are another legitimate source of multiple URLs. Each thread can be accessible by a link like myforum.com/index.php/topic.1201.html, and each message within the thread has a URL like myforum.com/index.php/topic.1201.msg.01.html. In the eyes of a search engine, these links lead to different pages with identical content. The solution: link to each thread in one consistent way, or apply robots.txt exclusion rules to the alternative URLs.
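Before relying on a robots.txt exclusion rule, it helps to check which URLs the rule actually catches. The sketch below makes several assumptions:

- The crawler honors `*` wildcards in Disallow lines. This syntax is part of RFC 9309 and is supported by Google, but not every robot supports it.
- The rule itself, `Disallow: /index.php/*.msg.`, is hypothetical and written for the forum URLs above.
- Only Disallow lines are modeled. Allow lines and user-agent groups are left out.

```python
import re

# Hypothetical Disallow rule for the forum example (an assumption, not a
# recommendation for any particular forum software): thread pages stay
# crawlable, per-message URLs are excluded.
DISALLOW_PATTERNS = ["/index.php/*.msg."]


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the path (the wildcard syntax of RFC 9309); everything else is literal.
    Without '$', the pattern only has to match a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns=DISALLOW_PATTERNS) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    # re.match anchors at the beginning of the string, which gives the
    # prefix semantics robots.txt rules use.
    return any(pattern_to_regex(p).match(path) for p in patterns)


for path in ["/index.php/topic.1201.html", "/index.php/topic.1201.msg.01.html"]:
    print(path, "->", "blocked" if is_disallowed(path) else "crawlable")
```

Both URLs show the same thread, but only the per-message form is excluded. The thread URL therefore remains the single crawlable copy.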
Multiple URLs also appear when other people link to you using different-looking URLs. Since these external links are out of your control, you should set up a 301 redirect to the canonical URL you have chosen to be displayed.
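How you set up the redirect depends on your server, but the logic is the same everywhere. Here is a minimal sketch using Python's standard http.server. It assumes that /articles/1/ is the hypothetical canonical URL for the article example and that the other two paths are its known aliases. Every alias gets a 301 pointing at the canonical path:

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional, Tuple

# Hypothetical canonical URL and aliases for the article example (assumption):
# every alias answers with a 301 pointing at the canonical path.
CANONICAL = "/articles/1/"
ALIASES = {"/article1/", "/articles.php?id=1"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a requested path."""
    if path == CANONICAL:
        return 200, None
    if path in ALIASES:
        return 301, CANONICAL
    return 404, None


class CanonicalRedirectHandler(BaseHTTPRequestHandler):
    """Serves the canonical page and permanently redirects its aliases."""

    def do_GET(self):
        # self.path is the raw request target, query string included.
        status, location = resolve(self.path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.end_headers()
        if status == 200:
            self.wfile.write(b"Article 1\n")
```

To try it locally, pass the handler to `http.server.HTTPServer(("localhost", 8000), CanonicalRedirectHandler)` and call `serve_forever()`. A request for /article1/ should then come back as a 301 with `Location: /articles/1/`. On a production site the same mapping would usually live in the web server or application configuration rather than in a standalone Python server.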
Printer-Friendly Versions
Making a printer-friendly version is a common practice, and it adds value for visitors. But a printer-friendly version is also a prominent example of duplicate content! Fortunately, a simple fix solves the issue: add a 'noindex' robots meta tag to your print pages.
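The tag in question is `<meta name="robots" content="noindex">` in the page's head. A quick way to confirm that your print templates actually emit it is to parse the rendered HTML. The sketch below does this with Python's standard html.parser; the sample print page is hypothetical.

```python
from html.parser import HTMLParser

# The robots meta tag to place in the <head> of every printer-friendly page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


# Hypothetical rendered print page.
print_page = (
    f"<html><head>{NOINDEX_TAG}<title>Article 1 (print)</title></head>"
    "<body>...</body></html>"
)
print(has_noindex(print_page))
```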
Product-Only Pages
Similar-looking product pages are common among online stores, since they are typically created from a single template. Often two different product pages share a description that varies in just a few words or numbers, which causes them to be filtered out as duplicate content. This issue has no easy solution. One option is to use robots.txt to allow only one product description to be crawled, and lose search engine traffic to the rest of them. The other is to roll up your sleeves and add something different to each product page, such as testimonials. That is time-consuming, or nearly impossible, depending on the number of products in your stock.
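Before deciding which product pages to exclude or rewrite, it helps to know which descriptions are near-identical. The sketch below runs Python's difflib on word sequences as a rough in-house check. It is not how any search engine measures similarity. The catalog and the 0.9 threshold are made-up assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, Iterator, Tuple

# Hypothetical cut-off (assumption): pairs at or above it deserve a rewrite.
THRESHOLD = 0.9


def near_duplicates(
    descriptions: Dict[str, str], threshold: float = THRESHOLD
) -> Iterator[Tuple[str, str, float]]:
    """Yield (sku_a, sku_b, ratio) for every pair of descriptions whose
    word-level similarity ratio is at or above the threshold."""
    for (sku_a, text_a), (sku_b, text_b) in combinations(descriptions.items(), 2):
        ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
        if ratio >= threshold:
            yield sku_a, sku_b, round(ratio, 3)


# Hypothetical catalog: two variants differ only in capacity.
catalog = {
    "A-100": "Stainless steel water bottle, 500 ml, keeps drinks cold for 24 hours",
    "A-101": "Stainless steel water bottle, 750 ml, keeps drinks cold for 24 hours",
    "B-200": "Cotton tote bag with reinforced handles and an inner pocket",
}

for pair in near_duplicates(catalog):
    print(pair)
```

Here A-100 and A-101 are flagged because their descriptions differ in a single number, which is exactly the case described above. Comparing every pair is quadratic in the number of products. That is fine for a modest catalog, but a large one would need a smarter index.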
How Do Duplicate Content Filters Work?
Data mining offers several algorithms for detecting similar text passages. The one claimed to be used by search engines is w-shingling. Each document is represented by a fingerprint made of its shingles, the contiguous subsequences of w tokens (blocks of text). The resemblance of two documents can then be measured as the size of the intersection of their shingle sets divided by the size of their union. Another algorithm that can be used for duplicate detection is the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another.
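To make these measures concrete, here is a minimal sketch of w-shingling with the resemblance score described above, plus the classic dynamic-programming Levenshtein distance. The whitespace tokenizer and the default w = 4 are assumptions for illustration. The parameters search engines actually use are not public, and at web scale the shingles would be hashed and sampled rather than compared as full sets.

```python
from typing import Set, Tuple


def shingles(text: str, w: int = 4) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous w-token sequences (shingles) in text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: str, b: str, w: int = 4) -> float:
    """Size of the shingle intersection divided by the size of the union."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0  # both texts are shorter than w tokens: nothing to compare
    return len(sa & sb) / len(sa | sb)


def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


original = "duplicate content filters compare documents by their shingles"
copy = "duplicate content filters compare pages by their shingles"
print(round(resemblance(original, copy, w=2), 3))
print(levenshtein("kitten", "sitting"))
```

Swapping one word cuts the bigram resemblance of these two sentences to 5/9. One substituted word breaks every shingle that contains it, so longer shingles make the score more sensitive to small edits.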
It is natural to expect a duplicate content filter to be able to discover the original and rank it higher. The simplest way to detect the original would be to compare indexing dates, assuming that the original is uploaded and crawled earlier than its copies. But with the advent of RSS feeds, new content can be distributed instantly, so this approach is no longer reliable.
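The date-based heuristic fits in a few lines, which also makes its weakness easy to see. In this sketch the URLs and crawl times are invented. A scraper that republishes the RSS feed gets crawled three minutes before the original page, and the heuristic duly picks the copy.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class IndexedCopy:
    url: str
    first_crawled: datetime  # when the crawler first saw this copy


def guess_original(copies: List[IndexedCopy]) -> IndexedCopy:
    """Naive heuristic: treat the earliest-crawled copy as the original."""
    return min(copies, key=lambda c: c.first_crawled)


# Hypothetical duplicate cluster for one article.
cluster = [
    IndexedCopy("http://author.example/article", datetime(2006, 1, 10, 10, 5)),
    IndexedCopy("http://scraper.example/feed-copy", datetime(2006, 1, 10, 10, 2)),
    IndexedCopy("http://directory.example/article-123", datetime(2006, 1, 12, 8, 0)),
]

print(guess_original(cluster).url)  # the feed scraper wins
```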
As for the original's right to be ranked higher, this is not always implemented. In her article 'Duplicate Content Penalties: Problems with Google's Filter', published at SEOChat.com, J.S. Cassidy describes an article distribution experiment. An article was syndicated twice, producing as many as 19,000 copies. After some time Google, Yahoo and MSN purged their indices, leaving just a few of the duplicates. MSN's filter managed not only to discover the original but also to put it at the top of the search results. Yahoo also discovered the original. However, on the results page for the article's title, the original's position fluctuated, apparently reflecting the way Yahoo weighs relevancy and authority.
To the tester's amusement, Google's refined index did not include the original at all! Evidently Google kept only those copies of the article that it considered relevant and authoritative, with no regard to the original source of the content. I have already mentioned a WebmasterWorld thread where a similar problem is discussed. Both stories took place in 2005 and early 2006, and so far I have found no evidence that this issue has been resolved.
Oleg Ishenko, MCSE, MCDBA, BSc