PageTraffic SEO Blog


Getting Rid of Duplicate Content Issues Once and For All: PubCon Las Vegas 2008, Day 3

November 13th, 2008




Duplicate content is becoming a major issue not only for search engines but also for webmasters all around the world. In this session, representatives of the big three, namely Google, Yahoo! Search and MSN Live, explained some of their strategies for dealing with duplicate content.

Moderator:

  • Rand Fishkin

Speakers:

  • Ben D'Angelo, Software Engineer, Google
  • Derrick Wheeler, Senior Search Engine Optimization Architect, Microsoft
  • Priyank Garg, Director of Product Management, Yahoo! Search

The session was opened by Ben D'Angelo, who started off by pointing out the central duplicate content issue: multiple URLs leading to the same page or to very similar pages. Duplicate content also appears across other websites in the form of syndicated or scraped content. Ideally, each piece of content would be reachable at exactly one URL.

Common sources of duplicates include www vs. non-www hostnames, session IDs, URL parameters, print versions of pages, CNAMEs and so on. There is also similar content on different URLs, as well as sites in different countries carrying the same content.
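To make the list above concrete, here is a minimal Python sketch of how several of these URL variants collapse onto a single page. It assumes a hypothetical site at `example.com`, treats `sessionid` and `sid` as session-ID parameters and `print` as a print-version flag; none of these names come from the session, and a real site would need its own rules. It is an illustration of the idea, not how any search engine actually normalizes URLs.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: adjust these to whatever your site uses.
SESSION_PARAMS = {"sessionid", "sid"}
PRINT_PARAMS = {"print"}
DROP_PARAMS = SESSION_PARAMS | PRINT_PARAMS


def normalize(url: str) -> str:
    """Map a URL variant to one normalized key (a sketch, not any engine's algorithm)."""
    parts = urlsplit(url)
    # .hostname is already lowercased; any port is ignored in this sketch.
    host = parts.hostname or ""
    # Treat www and non-www hostnames as the same site.
    if host.startswith("www."):
        host = host[len("www."):]
    # Drop session-ID and print-version parameters, then sort the rest
    # so that parameter order no longer matters.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in DROP_PARAMS
    ]
    query = urlencode(sorted(params))
    return urlunsplit((parts.scheme, host, parts.path or "/", query, ""))


def group_variants(urls):
    """Group URLs whose normalized keys are identical."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


if __name__ == "__main__":
    variants = [
        "http://www.example.com/widgets?color=red&size=10",
        "http://example.com/widgets?size=10&color=red",
        "http://www.example.com/widgets?color=red&size=10&sessionid=abc123",
        "http://www.example.com/widgets?color=red&size=10&print=1",
    ]
    for key, members in group_variants(variants).items():
        print(key, "<-", len(members), "variants")
```

All four example URLs end up under one key, which is exactly the "one URL, one piece of content" situation Ben described as ideal.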

Ben went on to explain how Google handles duplicate content. Google basically clusters duplicate pages together and chooses the one page from each cluster that best fits the search. Google employs different filters for different kinds of duplicate content, but these are simply filters, not any kind of penalty.
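Google did not describe its actual algorithm, so the following is only a rough illustration of "cluster, then pick one". This Python sketch uses word shingles and Jaccard similarity, a common textbook near-duplicate technique, and then picks the shortest URL in each cluster as its representative. The shingle size, the 0.8 threshold and the "shortest URL wins" rule are arbitrary assumptions chosen for the example.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages: dict, threshold: float = 0.8) -> list:
    """Greedy single-pass clustering of {url: text} into near-duplicate groups.

    Each page joins the first cluster whose founding page is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (founding page's shingles, [member urls])
    for url, text in pages.items():
        page_shingles = shingles(text)
        for founder_shingles, members in clusters:
            if jaccard(page_shingles, founder_shingles) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((page_shingles, [url]))
    return [members for _, members in clusters]


def pick_representative(members: list) -> str:
    """Hypothetical rule: prefer the shortest URL, breaking ties alphabetically."""
    return min(members, key=lambda u: (len(u), u))
```

With a page and its print version in the input, `cluster_pages` puts both URLs in one cluster and `pick_representative` returns the plain URL; the other copy is simply filtered out of results rather than penalized.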

So how can you prevent this from happening to your site? You can take the following measures:

  • To prevent exact duplicates, one could use a 301 redirect.
  • To prevent near duplicates, one could use robots.txt.
  • Content in a different language is not a duplicate. For sites in different countries, use unique content specific to each country.
  • Don't put extraneous parameters in the URLs.
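For exact duplicates such as www vs. non-www, the 301 redirect recommended above can be done at the web server or in the application. Here is a minimal Python sketch as WSGI middleware; the canonical host `www.example.com` and the demo app `hello_app` are hypothetical, and it assumes the app is mounted at the site root (it ignores `SCRIPT_NAME`).

```python
from urllib.parse import quote

CANONICAL_HOST = "www.example.com"  # hypothetical canonical hostname


def canonical_host_redirect(app):
    """Wrap a WSGI app so requests for any other host get a 301 to CANONICAL_HOST."""
    def wrapper(environ, start_response):
        # Strip any port and compare hostnames case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host and host != CANONICAL_HOST:
            path = environ.get("PATH_INFO", "/") or "/"
            location = "http://" + CANONICAL_HOST + quote(path)
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # 301 tells engines the move is permanent, so the duplicate URL
            # should be consolidated into the canonical one.
            start_response("301 Moved Permanently", [("Location", location)])
            return [b""]
        return app(environ, start_response)
    return wrapper


def hello_app(environ, start_response):
    """Hypothetical application standing in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = canonical_host_redirect(hello_app)
```

For a local try-out, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` serves it; in production the same rule is more often written in the web server's own configuration.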

Other sites can also cause duplicate content. If you syndicate your content, make sure each syndicated copy links back to your original article. You could also syndicate only a short summary instead of the full text. If you republish someone else's syndicated content, do the reverse and link back to their original.
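Because the link back is what ties a syndicated copy to your original, it can be worth checking that partners actually keep it. A minimal Python sketch using the standard library's `html.parser`: it collects every `href` on a syndicated page and reports whether one points at the original article. The URLs are made up, and the comparison only ignores a trailing slash; a real check would normalize URLs more thoroughly.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_back_to(html: str, original_url: str) -> bool:
    """Return True if the page links to original_url (trailing slash ignored)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    target = original_url.rstrip("/")
    return any(href.rstrip("/") == target for href in parser.hrefs)
```

Running this over each partner's copy of an article gives a quick list of syndicated pages that have dropped the attribution link.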

Scrapers will only rarely affect you or your content. However, the possibility can't be ruled out, and if it does happen, you should file a DMCA notice and/or a spam report.

Ben was followed by Priyank Garg, who explained how Yahoo! Search deals with the problem. Yahoo! applies duplicate filters at every step of its pipeline. He showed some examples and stated that most duplicate content is accidental. A large number of duplicates come from soft 404s (error pages served with a 200 OK status) rather than real 404s. Many duplicates are also abusive, such as those created by scrapers.
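A soft 404 turns every mistyped or stale URL into what looks like a separate page with identical "not found" content, which is why it produces so many duplicates. The fix is to keep the friendly error page but send a real 404 status. A minimal Python WSGI sketch; the `PAGES` dictionary is a hypothetical stand-in for however your site actually looks up content.

```python
# Hypothetical content store standing in for a real CMS or database.
PAGES = {
    "/": b"Home page",
    "/about": b"About us",
}

NOT_FOUND_BODY = b"Sorry, that page does not exist."


def site_app(environ, start_response):
    """Serve known pages with 200 OK and everything else with a real 404."""
    path = environ.get("PATH_INFO", "/") or "/"
    body = PAGES.get(path)
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if body is None:
        # Same friendly message a soft-404 site would show, but with the
        # correct status code so engines don't treat it as a real page.
        start_response("404 Not Found", headers)
        return [NOT_FOUND_BODY]
    start_response("200 OK", headers)
    return [body]
```

A quick way to spot a soft 404 on an existing site is to request an obviously nonexistent URL and check whether the server answers with 200 instead of 404.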

The final speaker of the session was Derrick Wheeler of Microsoft. Derrick made no bones about it: duplicate content is Microsoft's worst nightmare. Microsoft follows the CIRTA methodology:

  • C= Crawl
  • I= Index
  • R= Rank
  • T= Traffic
  • A= Action

He offered the following tips on how to handle the problem of duplicate content:

  • Try to detect when an engine comes to your site.
  • This can sometimes be helpful, for example when dealing with session IDs.
  • Be fully aware of your parameters.
  • Make sure that your links always list URL parameters in a consistent order.
  • Exclude duplicates using robots.txt, noindex, nofollow and similar mechanisms.
  • Never assume that search engines can't find links in your JavaScript.
  • Use a tool that can crawl your site. This lets you see your site the way an engine will.
  • Always focus on the strongest URLs of your website first.
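Derrick's suggestion to crawl your own site can be illustrated with a very small breadth-first crawler that honors robots.txt (via the standard library's `urllib.robotparser`) and reports URLs serving byte-identical content. This is a sketch under heavy assumptions: the in-memory `site` dictionary stands in for real HTTP fetches, the user-agent name `MyCrawler` is made up, and hashing only catches exact duplicates, not near duplicates.

```python
import hashlib
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.robotparser import RobotFileParser


class _Links(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(site: dict, start: str, robots_txt: str, agent: str = "MyCrawler") -> dict:
    """Breadth-first crawl of an in-memory site ({url: html}).

    Skips URLs disallowed by robots_txt and returns {sha256: [urls]} for
    every content hash shared by more than one crawled URL.
    """
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    seen, queue = {start}, deque([start])
    by_hash = defaultdict(list)
    while queue:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # excluded, just as a well-behaved engine would skip it
        html = site.get(url)
        if html is None:
            continue  # stand-in for a 404
        by_hash[hashlib.sha256(html.encode("utf-8")).hexdigest()].append(url)
        parser = _Links()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before queueing.
            link, _fragment = urldefrag(urljoin(url, href))
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return {h: urls for h, urls in by_hash.items() if len(urls) > 1}
```

If a home page links to the same listing with its parameters in two different orders, plus a print version under a disallowed `/print/` path, the report shows the two parameter orderings as duplicates and leaves the excluded print page out.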



Copyright © 2006-2009 PageTraffic SEO Blog. All rights reserved.
