Duplicate Content

What is Duplicate Content

Duplicate content is content that appears at more than one URL on the internet. When the same content is replicated, search engines cannot easily decide which version is most relevant to a given search query. As a result, site owners can suffer ranking problems and lose page traffic, and searchers may be shown less relevant results.

The three biggest issues with duplicate content are as follows:

  • Search engines don't know which version to include/exclude from their indices
  • Search engines don't know whether to direct the link metrics to one page or keep them split between multiple versions
  • Search engines don't know which version to rank for query results

Causes of duplicate content

Content is commonly duplicated for the following reasons:

  1. URL Parameters:
    URL parameters, such as those used for click tracking and some analytics code, can create multiple URLs that serve the same content.
  2. Printer-Friendly:
    When printer-friendly versions of pages are indexed alongside the originals, the same content appears under more than one URL.
  3. Session IDs:
    Session IDs are a common cause of duplicate content. Duplication occurs when each user who visits a website is assigned a different session ID that is stored in the URL.
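
The three causes above all produce different URLs for the same page, so one way to see the effect is to normalize the URLs and check whether they collapse to a single address. The following is a minimal Python sketch, not a definitive implementation: the parameter names in IGNORED_PARAMS (utm_source, sessionid, print, and so on) are assumptions chosen for illustration, and a real site would need its own list.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that do not change the page content.
# These are assumptions for illustration; adjust them for a real site.
IGNORED_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",  # click tracking
    "sessionid", "sid",                          # session IDs
    "print",                                     # printer-friendly flag
}


def normalize_url(url: str) -> str:
    """Return the URL without ignored parameters, with the rest sorted."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # a stable order makes equivalent URLs compare equal
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


if __name__ == "__main__":
    variants = [
        "http://example.com/shoes?color=red&utm_source=mail",
        "http://example.com/shoes?sessionid=abc123&color=red",
        "http://example.com/shoes?color=red&print=1",
    ]
    for variant in variants:
        print(normalize_url(variant))
```

In this sketch all three variants reduce to the same address, which is the version a site owner would normally want search engines to treat as the single page.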

Some duplicate content may cause pages to be filtered out when a search engine serves results to the user, so there is no guarantee which version will be displayed in the result list and which will not. Duplicate content may also keep some pages or sites out of the index: when a search engine finds multiple copies of the same page under different URLs, its crawling program may be instructed to stop indexing those pages.

Repetitive or duplicate content also degrades the performance of search engines, because they spend resources indexing copies of the same page. If search engines don't want to show similar content in the result list, they have to filter it, which consumes a significant amount of time.
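
To illustrate what "multiple copies of the same page under different URLs" means in practice, the sketch below groups pages whose text is identical after whitespace and case are normalized. It is an assumption-laden illustration in Python, not a description of how any search engine actually works; the function names and the sample pages are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Hash the page text after collapsing whitespace and lowercasing."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: Dict[str, str]) -> List[List[str]]:
    """Return lists of URLs whose page text has the same fingerprint."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[fingerprint(body)].append(url)
    # Only groups with more than one URL are duplicates.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical sample data: URL mapped to extracted page text.
    sample = {
        "http://example.com/shoes": "Red running shoes, size 9.",
        "http://example.com/shoes?print=1": "Red  running shoes,  size 9.",
        "http://example.com/hats": "Blue wool hat.",
    }
    print(group_duplicates(sample))
```

An exact hash only catches identical copies; pages that merely resemble each other need a similarity measure instead.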

Where search engines see duplicate content

Search engines find duplicate content under the following circumstances:

  • When product descriptions from manufacturers, publishers, and producers are reproduced by a number of different distributors on large ecommerce sites
  • Alternative print pages
  • When pages reproduce syndicated RSS feeds through a server-side script
  • Canonicalization issues, where a search engine may see the same page as different pages with different URLs
  • When pages share too many common elements, including titles, meta descriptions, headings, navigation, and text, or closely resemble each other
  • Copyright infringement
  • Use of the same or very similar pages on different subdomains or different country top level domains (TLDs)
  • Article syndication
  • Mirrored sites
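
Several items in the list above involve pages that are similar rather than identical. One common way to quantify how closely two texts resemble each other is Jaccard similarity over word shingles (overlapping runs of words). The Python sketch below is an illustration only: the shingle size of 3 is an assumed default, and the source does not say which measure search engines use.

```python
def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word runs of the given size."""
    words = text.lower().split()
    if len(words) < size:
        # Texts shorter than one shingle are treated as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles two texts have in common, from 0.0 to 1.0."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 1.0  # two empty texts are considered identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "Lightweight red running shoes with a cushioned sole"
    reworded = "Lightweight red running shoes with a padded sole"
    print(round(jaccard(original, reworded), 2))
```

A score near 1.0 suggests the two pages are near duplicates, while a score near 0.0 suggests they are distinct; where to draw the line is a judgment call for each site.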