Why is it so important to have unique content?
In the early days of the internet, web developers discovered that it was very easy to copy content from other websites using automated scripts. This became a huge problem for search engines, because a search for a term could return literally millions of websites with exactly the same content.
From a searcher’s point of view there was no value in finding millions of pages with exactly the same content. Many search engines that could not deal with this problem fast enough quickly faded out of existence.
Many techniques were developed to filter duplicate content, but Google’s system is by far the most advanced. Google uses several methods in combination to detect duplicate content. However, there are many pitfalls in detecting duplicate content, and they can lead to the wrong site being dropped or penalized.
Problems search engines have when trying to filter duplicate content
Deciding which site to keep and which site to drop in cases of duplicate content.
How to detect the originator of content. Who created it first? (Impossible to know for certain, but it can be guessed.)
How to deal with very similar but not identical text.
How to find text that was machine translated (copied) from another language.
How to deal with plagiarized text.
First, Google will strip all consistent design elements from the page: anything that appears on multiple pages across the site gets stripped out. That includes navigation links, headers, footers, block statements, etc.
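To make the stripping step concrete, here is a minimal sketch in Python. It is not Google's implementation, which is not public. It assumes each page has already been split into text blocks, and the `min_share` cutoff and the example site are our own hypothetical choices.

```python
from collections import Counter

def strip_repeated_blocks(pages, min_share=0.5):
    """pages maps URL -> list of text blocks.

    Returns the same mapping with site-wide blocks removed.
    min_share is a hypothetical cutoff: a block seen on more than
    this share of pages is treated as a design element.
    """
    # Count on how many pages each block appears (once per page).
    seen_on = Counter()
    for blocks in pages.values():
        seen_on.update(set(blocks))
    cutoff = min_share * len(pages)
    return {
        url: [block for block in blocks if seen_on[block] <= cutoff]
        for url, blocks in pages.items()
    }

# Hypothetical three-page site with a shared navigation bar and footer.
site = {
    "/home": ["Home | Jobs | Contact", "Welcome to our agency.", "Copyright Example Ltd"],
    "/jobs": ["Home | Jobs | Contact", "Locum doctor roles available now.", "Copyright Example Ltd"],
    "/contact": ["Home | Jobs | Contact", "Call us on weekdays.", "Copyright Example Ltd"],
}
unique_blocks = strip_repeated_blocks(site)
```

Only the block that is specific to each page survives, which is the text the later filters would then compare.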
Exact content filtering
Exact content filtering is the most obvious and widely used method. If two or more sites contain exactly the same data, then only one site is kept and the others are dropped.
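A simple way to picture exact content filtering is to hash each page's text and keep one URL per hash. The sketch below is an illustration only: the whitespace and case normalisation, the "keep the first page crawled" rule and the example URLs are all assumptions of ours, not a description of how any search engine chooses the survivor.

```python
import hashlib
import re

def content_hash(text):
    # Collapse whitespace and ignore case so trivial differences do not matter.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def drop_exact_duplicates(pages):
    """pages is a list of (url, text) in crawl order.

    Keeps the first URL seen for each distinct piece of content
    (an arbitrary rule chosen for this sketch).
    """
    kept = {}
    for url, text in pages:
        kept.setdefault(content_hash(text), url)
    return list(kept.values())

# Hypothetical crawl results.
crawled = [
    ("site-a.example/post", "Unique content matters."),
    ("site-b.example/copy", "Unique   content matters.\n"),
    ("site-c.example/other", "Something else entirely."),
]
kept_urls = drop_exact_duplicates(crawled)
```

Here the copy on site-b is dropped because it hashes to the same value as the page on site-a.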
Fingerprinting
Google will split the text up into paragraphs and sentences, then create a “fingerprint image” based on the proximity and semantics of each block of text. This is to counter template spammers and content plagiarism.
Are you a Doctor looking for locum work?? Do you believe in having yourself represented to your skills and attitude to work and not at your grade? Currently we are looking for a SHO, Acute Medicine Doctor to start ASAP.
Will have the same fingerprint as:
Proximity content (shuffled sentences):
Are you a Doctor looking for locum work?? Currently we are looking for a SHO, Acute Medicine Doctor to start ASAP. Do you believe in having yourself represented to your skills and attitude to work and not at your grade?
Template content (word or phrase replacement):
Are you a Nurse looking for locum work?? Do you believe in having yourself represented to your skills and attitude to work and not at your grade? Currently we are looking for a Obs & Gynae Nurse to start ASAP.
Plagiarized content (semantically obvious rewording):
Are you a Doctor looking for locum work?? Are your skills and attitude above your current grade? Currently we are looking for a SHO, Acute Medicine Doctor to start ASAP.
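Google's actual fingerprinting method is not public, so the sketch below swaps in two common near-duplicate techniques to show the idea: an order-independent set of sentences (which catches shuffling) and Jaccard similarity over three-word shingles (which catches word or phrase replacement). All function names and the shingle size are our own assumptions, and the sketch does not attempt the semantic matching needed for reworded text.

```python
import re

def sentences(text):
    # Split on runs of ., ! or ?, then normalise case and whitespace.
    parts = re.split(r"[.!?]+", text.lower())
    return [" ".join(part.split()) for part in parts if part.strip()]

def sentence_fingerprint(text):
    # A set ignores order, so shuffled sentences give the same fingerprint.
    return frozenset(sentences(text))

def shingles(text, size=3):
    # Overlapping runs of `size` words; punctuation other than & is ignored.
    words = re.findall(r"[a-z0-9&]+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b):
    # Jaccard similarity: shared shingles divided by all shingles.
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = ("Are you a Doctor looking for locum work?? Do you believe in having "
            "yourself represented to your skills and attitude to work and not at "
            "your grade? Currently we are looking for a SHO, Acute Medicine "
            "Doctor to start ASAP.")
shuffled = ("Are you a Doctor looking for locum work?? Currently we are looking "
            "for a SHO, Acute Medicine Doctor to start ASAP. Do you believe in "
            "having yourself represented to your skills and attitude to work and "
            "not at your grade?")
template = ("Are you a Nurse looking for locum work?? Do you believe in having "
            "yourself represented to your skills and attitude to work and not at "
            "your grade? Currently we are looking for a Obs & Gynae Nurse to "
            "start ASAP.")

same_after_shuffle = sentence_fingerprint(original) == sentence_fingerprint(shuffled)
template_score = similarity(original, template)
```

The shuffled version produces an identical sentence fingerprint, and the template version still shares most of its shingles with the original, so a filter with a reasonable threshold would treat both as copies.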
BackLink filtering
If two or more pages carry the same duplicated content, Google will check the number of backlinks pointing to each page, then keep the page with the highest number of backlinks. The reason is that the page most other webmasters link to is likely to be the most useful.
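As a rough illustration of this rule, the sketch below picks the page with the most backlinks from a group of pages already known to be duplicates. The URLs and counts are made up, and a real system would almost certainly weigh link quality as well as quantity.

```python
def pick_page_to_keep(duplicates):
    """duplicates maps URL -> backlink count for pages with the same content.

    Returns the URL with the most backlinks; on a tie, the first one listed wins.
    """
    return max(duplicates, key=duplicates.get)

# Hypothetical group of pages carrying the same content.
group = {"a.example/page": 12, "b.example/page": 340, "c.example/page": 57}
winner = pick_page_to_keep(group)
```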
Site wide filters
Any site that contains a disproportionate amount of copied pages relative to unique content will have a penalty applied to it. This penalty will affect the whole website, including the unique content as well as the duplicate content.
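The idea of a "disproportionate amount" can be sketched as a simple ratio check. The threshold used here is purely hypothetical, since no real figure is published, and the example site is invented.

```python
def site_wide_penalty(page_is_duplicate, max_duplicate_share=0.3):
    """page_is_duplicate maps URL -> True if the page was judged a copy.

    Returns True when the share of copied pages exceeds the
    (hypothetical) max_duplicate_share, flagging the whole site.
    """
    if not page_is_duplicate:
        return False
    share = sum(page_is_duplicate.values()) / len(page_is_duplicate)
    return share > max_duplicate_share

# Hypothetical site: three of four pages are copies.
mostly_copied = {"/a": True, "/b": True, "/c": True, "/d": False}
penalised = site_wide_penalty(mostly_copied)
```

Note that the one unique page, "/d", is caught by the penalty along with the copies, which is the point the paragraph above makes.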
Historical profile
When a site is first launched, Google begins to build a historical profile of that site. A site that starts out spammy with duplicate content will find it much harder to rank in the future with clean unique content, due to its previous history of spammy behaviour. The first 6 months of a new site are critical in building a nice clean history.
Hand check
Google will occasionally send a human evaluator to your site to check for duplication that the algorithm may have missed. If this happens and the site gets marked as a spam site, it’s near impossible to fix for at least 90 days, and it can completely destroy any future progress on the website.
As a rule, it’s best to assume a human will be looking at your site: if you can spot duplicate content, then a trained Google evaluator will spot it too.
Authority FreePass
Authority sites are trusted websites that have many links from other authorities pointing to them and a very good historical record of clean, non-spammy behaviour. Authority websites can get away with copying data and outranking the original content due to the trust they have built up over time in Google.
Sites like the BBC and MSN, and .GOV (government) and .EDU (education) domains, are good examples of strong authority websites.
We try to build all our sites up over time to gain authority status. This takes careful link and content management over a year or two.
*Notes
Google is fairly lenient with copied data. You will sometimes see multiple sites ranking with the same content; this is because Google could not decide who originally created the content, or because the sites listed are all authorities in their own way. However, you never want to rely on this, as it’s unpredictable and can go badly wrong.
Write clean unique content for your visitors.
Don’t copy from elsewhere or use obvious templates to create content.
Never post your own content to other authority sites; they will outrank you and damage your future traffic.
Get as many backlinks as you can to each page.
Get authority sites linking to your site.
Asif