Duplicate content is not plagiarism as you might think of it because Google’s focus is on providing relevant answers to users’ search queries.
What is duplicate content?
The title of this post is a riff on the infamous “Not Your Father’s Oldsmobile” ads of the 1980s. In the ’80s, duplicate content was synonymous with plagiarism. In the eyes of Google, and because of the way the Internet works, that is not so today. Moz defines duplicate content as “content that appears on the Internet in more than one place.” Google views content as duplicate when it is “appreciably similar.” In general, duplicate content is repetition of certain types of content (in contrast to plagiarism, which is outright appropriation). But the problem with “appreciably similar” content is that it complicates Google’s eternal quest to return the most relevant search results possible to users’ queries.
Risks of duplicate content
Any lawyer engaged in creating web content should be aware of the risks, which include:
- Failing to appear in results for users’ search queries
- Being blacklisted by Google, effectively rendering you a non-entity
- Marketing agencies making false claims that your site fails the “Copyscape test”
- Loss of reputation (and possible litigation) for engaging in plagiarism
Teen busted for beer, Mom searches for a defense lawyer
Imagine the anxious mother of a wayward teen, sleepless at night in front of the screen, plugging “best criminal defense lawyers underage drinking” into the search bar. Whose content does she find? In an ideal world, Google returns search results that prominently feature the local trial veteran, who just so happens to be the author of a preeminent treatise on underage drinking law. This trial veteran has excerpted her treatise and integrated portions of it into her marketing copy. The copy demonstrates her undeniable expertise. There are other tangible benefits as well, such as authoritative inbound links leading to this content, in large part because of its quality and this lawyer’s excellent reputation. The trial veteran, however, is dismayed to find that Google also serves the “appreciably similar” content of the greenhorn who just last week began growing a beard to look more grizzled than his half year in practice would suggest. In some cases, the greenhorn may appear above the trial veteran, prompting her to do some brief research on this young chap and poke around his site. She finds some of her appreciably similar content smattered across his site, cries plagiarism and begins to wonder whether she is losing business from it.
Risk: Failing to appear prominently in search results
[Threat level: Minimal] In most cases – at least, in the vast majority of those I’ve reviewed as part of my duplicate content analysis – it’s not plagiarism. Forget about the irritating fact that the perpetrator is a greenhorn with a questionable beard; it’s not relevant. What is relevant is a lawyer’s ability to provide satisfactory answers to potential clients’ questions. Relevance is Google’s focus. In all likelihood, Google would not spotlight the greenhorn’s content because Google already knows that it is duplicate content. If the greenhorn nonetheless shows up on page one for some of the anxious mother’s searches, it’s not because of cribbed content alone. Google is sophisticated enough to distinguish between common marketing techniques and overt appropriation. That Google defines duplicate content as “appreciably similar,” rather than “plagiarized,” gives us a clue as to its viewpoint. Ultimately, Google rests its case on the “good job” it does in serving the right version of content to users’ search queries.
Take-away: Even if the greenhorn has shamelessly cribbed substantial portions of the trial veteran’s preeminent treatise and used them in his own marketing copy, Google usually knows what’s what. In an apples-to-apples comparison, Google would favor the trial veteran’s content in search results.
Risk: Being blacklisted by Google
[Threat level: Minimal] As Google says, “Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.” Here are three examples of common duplicate content that is not deceptive:
- Multiple URLs that all contain the same content, such as those built for tracking purposes, which dilute “link juice” and page authority
- Two pages with the same content: one with “www” in the domain name, the other without
- Printer versions of pages
Take-away: Duplicate content exists everywhere on the Internet. Google does not default to blacklisting sites that contain duplicate content, unless you’re trying to manipulate search results.
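To make the first two examples concrete, here is a minimal sketch in Python of how a site owner might spot URL variants that point at the same page. It is an illustration only, not a description of how Google consolidates URLs. The function names, the list of tracking parameters and the example.com addresses are all assumptions made up for this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical tracking parameters; adjust these for your own analytics setup.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def normalize_url(url: str) -> str:
    """Collapse common variants of one page's URL into a single form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the "www" and non-"www" versions of a domain as the same site.
    if host.startswith("www."):
        host = host[len("www."):]
    # Keep only the query parameters that are not used for tracking.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


def group_duplicates(urls):
    """Group URLs that normalize to the same address."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return groups
```

Calling `group_duplicates` on a list of a site's URLs returns a dictionary in which any entry with more than one URL is a candidate for cleanup. None of those entries signals an attempt to manipulate search results.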
Risk: False claims that your site fails the ‘Copyscape test’
[Threat level: Minimal] Copyscape is a third-party tool that helps content owners locate instances (and assess the scope and severity) of duplicate content. Copyscape runs an algorithm that searches and returns results on potential duplicate content issues, but these results still require an actual human being to analyze. There is no such thing as a “Copyscape test” that a website either passes or fails. As we’ve seen with other online tools, like those that measure page speed, marketing agencies may make this claim to drive a competitive wedge and win business. The reality is that if you’ve been on the web for any appreciable length of time, you will have some duplicate content. Just the act of producing content – especially in competitive practice areas like criminal defense (“underage drinking may result in such-and-such charges and this-or-that penalty”) – will nearly always result in some degree of repetition, such as common branding and calls to action. But this does not excuse outright appropriation and adjustment of a competitor’s marketing copy as though it were a game of Mad Libs.
Take-away: A Copyscape report, by its mere existence, is not evidence of a duplicate content problem. In lawyer marketing, most duplicate content is of the permissible kind: portions of statutes, case law, publicly available legal information and common navigational elements on websites, among others. These have a negligible impact on search results.
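To see why a similarity report needs a human reader, consider a minimal sketch in Python of one common way to score overlap between two passages: break each into short runs of words (“shingles”) and compare the sets. This is not Copyscape’s algorithm or Google’s, neither of which is described here. The function names and the default shingle size of three words are assumptions chosen for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Score overlap from 0.0 (no shared shingles) to 1.0 (identical sets)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0  # Both passages are too short to compare.
    return len(set_a & set_b) / len(set_a | set_b)
```

A score like this only says how much wording two passages share. It cannot tell a quoted statute or a stock call to action from a lifted treatise excerpt, which is why the numbers in any report still need someone to read the matches in context.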