Content Similarity Checker
Detect near-duplicate pages within and across your domains.
Document A
Document B
Similarity
Word-shingle overlap. 8-grams catch verbatim copying.
Verbatim sentence matches
Sentences appearing in both documents.
Longest common substring
The longest run of characters the two documents share.
Start here · What is content similarity?
Content similarity measures how much two pages or drafts overlap. High similarity can signal duplicate content, copied sections, boilerplate, or pages that should be merged.
This checker compares two documents using word shingles, exact sentence matches, and the longest common substring.
Use the verdict as a review signal. Similar pages are not always bad, but high overlap should make you ask whether both URLs deserve to exist.
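As a rough illustration of the word-shingle comparison, the sketch below builds word n-grams for each document and scores them with Jaccard overlap (shared shingles divided by all distinct shingles). The page does not say how the checker computes its percentage, so the Jaccard formula, the tokenizer, and the function names `shingles` and `shingle_overlap` are assumptions made for this example.

```python
import re


def shingles(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Return the set of n-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text shorter than n words yields an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shingle_overlap(doc_a: str, doc_b: str, n: int = 8) -> float:
    """Jaccard overlap of the two shingle sets, from 0.0 to 1.0 (assumed metric)."""
    a, b = shingles(doc_a, n), shingles(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


doc_a = "We fix leaks fast in Dallas and offer same day service"
doc_b = "We fix leaks fast in Fort Worth and offer same day service"

# 6 shared 3-word shingles out of 13 distinct ones.
print(round(shingle_overlap(doc_a, doc_b, n=3), 2))  # prints 0.46
```

Short shingles (3-grams here, to suit the tiny sample) pick up shared phrasing, while longer ones such as 8-grams only match when long runs of words are copied intact.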
When to use this tool
- Duplicate page review
Compare two location pages, product pages, or blog posts that look too similar.
- Rewrite QA
Check whether a rewrite is distinct enough from the old version or source material.
- Syndication checks
Compare partner content with your original before deciding whether to canonicalize, rewrite, or decline it.
- Consolidation planning
Use similarity evidence before merging near-duplicate articles.
Examples
Walk through these with the form above — they are practice scenarios, not live data.
Two city service pages
Try this
Fetch or paste the copy from your Dallas plumber and Fort Worth plumber pages.
What to look for
High 5-gram or 8-gram overlap suggests boilerplate. Add unique local proof or consolidate if the pages serve the same intent.
Guest post originality
Try this
Paste a submitted draft in Document A and a suspected source article in Document B.
What to look for
Exact sentence matches and long shared substrings deserve manual editorial review.
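For readers who want to see what the two exact-match signals look like in practice, here is a minimal sketch using Python's standard library. The sentence splitter is deliberately naive, and the helper names (`split_sentences`, `verbatim_sentences`, `longest_common_substring`) are hypothetical; the checker itself may split and match text differently.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def verbatim_sentences(doc_a: str, doc_b: str) -> list[str]:
    """Sentences from doc_a that also appear, character for character, in doc_b."""
    in_b = set(split_sentences(doc_b))
    return [s for s in split_sentences(doc_a) if s in in_b]


def longest_common_substring(doc_a: str, doc_b: str) -> str:
    """Longest run of characters shared by both documents."""
    matcher = SequenceMatcher(None, doc_a, doc_b, autojunk=False)
    m = matcher.find_longest_match(0, len(doc_a), 0, len(doc_b))
    return doc_a[m.a:m.a + m.size]


draft = "Our team answers calls day and night. Pricing is posted up front."
source = "Pricing is posted up front. We answer calls around the clock."

print(verbatim_sentences(draft, source))
print(len(longest_common_substring(draft, source)), "characters in common")
```

Exact string comparison misses sentences that differ only in spacing or capitalization, so a real check would likely normalize text first.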
Short tutorial
Follow in order the first time you use the tool; later you can skip to the step you need.
- Step 1 - Add Document A
Paste text or fetch a page. When possible, use the main body copy rather than navigation or footer boilerplate.
- Step 2 - Add Document B
Use the comparison page, old draft, competitor sample, or syndication source.
- Step 3 - Read the verdict
Near-duplicate and high-overlap results need closer review. Moderate overlap may be normal for templates.
- Step 4 - Inspect exact matches
Verbatim sentences and long shared substrings are stronger signals than a single summary percentage.
- Step 5 - Choose an action
Rewrite, add unique value, canonicalize, merge, redirect, or leave alone based on page purpose.
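Steps 3 and 4 can be pictured as a small decision rule: a summary score picks a verdict label, and any verbatim sentence match flags the pair for review regardless of the score. The cutoffs below (0.80, 0.50, 0.20), the "distinct" label, and the function name `review_signal` are hypothetical values chosen only for illustration; the tool's real thresholds are not documented on this page.

```python
def review_signal(shingle_overlap: float, verbatim_matches: int) -> tuple[str, bool]:
    """Return (verdict label, needs closer review) for a pair of documents."""
    # Hypothetical cutoffs, not the checker's actual thresholds.
    if shingle_overlap >= 0.80:
        label = "near-duplicate"
    elif shingle_overlap >= 0.50:
        label = "high overlap"
    elif shingle_overlap >= 0.20:
        label = "moderate overlap"
    else:
        label = "distinct"

    # Exact sentence matches are treated as a stronger signal than the score alone.
    needs_review = label in ("near-duplicate", "high overlap") or verbatim_matches > 0
    return label, needs_review


print(review_signal(0.62, 0))
print(review_signal(0.25, 3))
```

The second call shows why the percentage should not be read alone: a moderate score still warrants review when whole sentences match.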
More detail
New here? Skim Start here first, then run one Examples scenario in the form above.
Content Similarity Checker does one job: it detects near-duplicate pages within and across your domains. It is part of the Content & Writing tools on SEOToolkits. The underlying idea of content SEO is simple: make each page useful, clear, and complete enough to satisfy a searcher.
FAQ
- Is duplicate content always a penalty?
- No. The bigger issue is confusion and wasted crawl attention. Search engines may choose one version and ignore the others.
- What are word shingles?
- Word shingles are short sequences of neighboring words. Matching longer shingles often means copied or very similar phrasing.
- Should templates count as duplicate content?
- Shared navigation, legal copy, and template text are normal. Focus on whether the main content provides unique value.
- Can this prove plagiarism?
- No. It provides overlap evidence for review. Human context and source history still matter.