How and Why to Check for Duplicate Content

Posted by Emery Pearson on Jun 11, 2020 3:27:53 PM


You already know by now that the content on your website is one of the most -- if not the most -- important components of search engine optimization (SEO). Google relies heavily on fresh, interesting content that responds to user intent and fits its definition of E-A-T (expertise, authoritativeness, and trustworthiness).

This also means that sites with a lot of duplicate content often struggle to rank well or attract much traffic. That's because Google does not look kindly upon plagiarized or duplicated content. Today we'll take a look at what duplicate content is, how it affects your site, and what you can do to find duplicate content that needs to be updated.

What is duplicate content?

There are a few different types of content that we call "duplicate" when it comes to web pages. These include:

  • The same content on multiple pages on your site
  • Content that is copied (plagiarized) or scraped from other sites
  • Content that is "spun" from other sites

The first is a common mistake that often comes from URL variations such as blog tag pages, HTTP and HTTPS versions of the same page, or other URL parameter issues.
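
To make the URL-variation problem concrete, here is a minimal Python sketch that groups URLs which likely serve the same page. It is an illustration under stated assumptions, not a description of how Google or any particular tool handles this: the list of ignored parameters is hypothetical, and treating "www" and non-"www" hosts as the same site is an assumption you should confirm for your own setup. It only covers protocol and parameter variants; tag pages that repeat post content live at genuinely different URLs and need a content comparison instead.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical examples of parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize_url(url: str) -> str:
    """Reduce a URL to one comparable form (assumed rules, adjust per site)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: www and non-www serve the same site
    path = parts.path.rstrip("/") or "/"
    # Drop ignored parameters and sort the rest so their order does not matter.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in IGNORED_PARAMS
    )
    # Force one scheme so HTTP and HTTPS versions compare as equal.
    return urlunsplit(("https", host, path, urlencode(kept), ""))


def group_variants(urls):
    """Return only the normalized URLs that more than one input maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Feeding this a list of crawled URLs returns each normalized address alongside the variants that collapse into it, which gives you a starting list of pages to redirect or canonicalize.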

The latter two involve copying other sites' content. Sometimes this is flat-out stealing; sometimes writers use other sites to research their own content and don't do enough to put everything into their own words. Article spinning is an outdated, black-hat method that some sites still employ. It is a process (usually done through software) that takes existing articles or other content and "spins" it so that it's slightly different. This type of content is typically hard to understand because the software does things like swapping out words for synonyms. It's the same content, offering the same information, and it's a terrible strategy.

But sometimes, duplicate content is simply a definition, a product description, or another small amount of content that is used across multiple sites for various reasons. 

The problem is that no human at Google is reading through all these pages to figure out why content might be similar to, or the same as, content on another site or page. It's therefore in everyone's best interest that all content is unique and that any pagination or URL parameter issues affecting your site are resolved.

With all that in mind, let's take a look at what actually happens with duplicate content.  

How does duplicate content impact your site's rankings?

It's very difficult to have a large site that's completely free of duplicate content, and making every page entirely unique isn't a realistic goal. Many of the tools you can use to check will pick up similar sentences or word strings and call them "duplicate." Essentially, you want to have as much unique and useful content as possible, but you shouldn't stress yourself out if you have a bit of duplicate content.
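
To show why checkers flag "similar sentences or word strings," here is a small Python sketch of one common approach: split each text into overlapping runs of words (often called shingles) and measure how many runs two texts share. This is an illustration only; the tools named in this post do not publish their exact methods, and the three-word window is an assumed setting.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word runs of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # too short to compare at this window size
    return len(sa & sb) / len(sa | sb)
```

With this measure, a boilerplate sentence shared by two otherwise different pages nudges the score up slightly, while a spun article that keeps most of the original word order still scores high. That is why a small amount of overlap is normal and a high score is worth a closer look.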

However, if your site has a history of scraping or copying, if you have outsourced content writing and have never checked on their work, or if you aren't up to date on the history of your site, it's a good idea to take a look at the content. 

You'll see a lot of talk on SEO blogs or websites about duplicate content not being that big of a deal, or dismissing the idea that sites can take a hit if they have a lot of duplicate content. There are a few reasons why that's not accurate, which we'll talk about next. 

Is there a duplicate content penalty?

Technically, no duplicate content penalty will show up in Search Console. However, Google says:

"Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don't follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.

"However, if our review indicated that you engaged in deceptive practices and your site has been removed from our search results, review your site carefully."

This implies that manipulative behaviors will result in action taken by Google -- even if there isn't a specific manual action called "duplicate content." 

In most cases, however, the loss of rankings or traffic tied to duplicate content comes from other outcomes: de-indexing, where Google removes a page or a site from the index, or Google choosing a canonical (the version it treats as the original) that isn't your page. In the latter case, Google decides which copy of the duplicated content is the original and leaves the other copies out of the index.
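
If you want to see which version of a page your own site asks search engines to treat as the original, you can read the rel="canonical" link from the page's HTML. The Python sketch below does that with the standard library. It only reports the preference the page declares, which Google may or may not follow; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values; a bare attribute is None.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

A page that returns None here, or one whose declared canonical points somewhere unexpected, is a good candidate to review before assuming Google picked the wrong version on its own.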

While there is no penalty per se, plenty of SEOs have seen sites with lots of duplicate content rank poorly and lose traffic. We have also seen sites address duplicate content issues and start to rank better.

Google algorithms continue to improve and to take into account Google's ideals, including E-A-T, as we described at the beginning of this post. That means that Google wants the best content served to users depending on their query, and a site or writer with a good amount of E-A-T that has unique, relevant content is going to do better overall. 

How to check for duplicate content

If you're not sure how much of your site's content is fresh, there are several tools that can help you assess it for duplicate content. Here are just a few. 

  • Siteliner: Plug in a URL and see duplicate content as well as broken links and more. The free version will run up to 250 pages, with a premium version available if you need more. 
  • Grammarly: Grammarly's plagiarism checker is favored by teachers, but you can use it for websites too, as the app checks billions of web pages in addition to academic work. 
  • Screaming Frog and SEMrush: Both are options for checking for duplicate internal pages.
  • Google Search Console: Search Console is your best bet to see exactly which pages Google isn't indexing because of duplicate content. Open the Coverage report and select Excluded to get a complete list.
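
For a quick first pass on your own site before reaching for one of these tools, exact copies are easy to spot by hashing each page's text. The Python sketch below assumes you already have a dictionary mapping each URL to its extracted body text (how you collect that is up to you, and the function names are hypothetical). It only finds identical text after whitespace and letter case are evened out, so it will not catch near-duplicates or spun content.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict) -> list:
    """Return groups of URLs whose normalized text is identical."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]
```

Each group in the result is a set of pages carrying the same text, which you can then consolidate, redirect, or rewrite.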

Make sure new content is always fresh. It should be unique and add value, and it shouldn't be borrowed, stolen, or spun from someone else's content. Google is likely to notice, and your rankings can suffer for it.



Written by Emery Pearson

Emery is the content strategist at Tribute Media. She has an MA in rhetoric and composition from Boise State University, and she is currently an MFA candidate in creative writing at Antioch University. She lives in southern California with a bunch of creatures and many plants.