Log File Analyzer
Parse server logs to see what Googlebot actually crawls.
Start here · Why parse log files here?
Server logs show what actually fetched your URLs, which matters when crawl diagnostics conflict with crawl simulations.
This analyzer expects Apache/nginx-style CLF lines: IP, identity, user, timestamp, request verb + path + protocol, status, bytes, referrer, user-agent.
It classifies user-agents into major bots, aggregates status code totals, surfaces the busiest paths, and lists individual error hits for triage.
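To make the expected field order concrete, here is a minimal TypeScript sketch of how a CLF line can be parsed and tallied. The regex, field names, and summarize helper are illustrative assumptions, not the analyzer's actual code.

```typescript
// Minimal sketch, not the analyzer's actual code. Field order mirrors the
// description above: IP, identity, user, [timestamp], "verb path protocol",
// status, bytes, "referrer", "user-agent".

interface LogHit {
  ip: string;
  timestamp: string;
  method: string;
  path: string;
  status: number;
  bytes: number;
  referrer: string;
  userAgent: string;
}

const CLF_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

function parseLine(line: string): LogHit | null {
  const m = CLF_PATTERN.exec(line.trim());
  if (!m) return null; // unmatched lines are skipped silently, as in the FAQ
  return {
    ip: m[1],
    timestamp: m[4],
    method: m[5],
    path: m[6],
    status: Number(m[7]),
    bytes: m[8] === "-" ? 0 : Number(m[8]),
    referrer: m[9],
    userAgent: m[10],
  };
}

// Aggregate status totals and busiest paths across pasted lines.
function summarize(lines: string[]) {
  const byStatus = new Map<number, number>();
  const byPath = new Map<string, number>();
  for (const line of lines) {
    const hit = parseLine(line);
    if (!hit) continue;
    byStatus.set(hit.status, (byStatus.get(hit.status) ?? 0) + 1);
    byPath.set(hit.path, (byPath.get(hit.path) ?? 0) + 1);
  }
  return { byStatus, byPath };
}

const sample =
  '66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] "GET /sock-guide HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';
console.log(parseLine(sample)?.path); // "/sock-guide"
```

Anything the regex cannot match is dropped, which mirrors the silent-skip behavior noted in the FAQ below.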
When to use this tool
- Crawl waste detection
See whether Googlebot repeatedly requests thin faceted paths or 404s before you tune robots.txt or faceting rules.
- Launch monitoring
Paste a slice from launch day to confirm bots see mostly 200 responses.
- Third-party bot noise
AhrefsBot or others may spike; compare bot counts before blaming Google alone.
- Education
Use the bundled sample lines to teach how raw logs differ from UI crawl reports.
Examples
Walk through these with the form above — they are practice scenarios, not live data.
404 cluster
Try this
Include the sample /sock-guide 404 lines, then rerun after fixing the route.
What to look for
The Errors stat should fall, and the Top crawled paths card highlights any recurring bad URLs.
Custom paste
Try this
Paste fifty lines from your CDN log download.
What to look for
If parsing yields zero hits, verify quoting and field order match CLF expectations.
Short tutorial
Follow the steps in order the first time you use the tool; later you can skip straight to the step you need.
- Step 1 — Export logs
Grab plain-text CLF, or translate JSON logs into the classic pattern before pasting (see the converter sketch after this tutorial).
- Step 2 — Paste a representative window
Cover hours or days depending on traffic volume. Huge files may slow the browser, so paste sampled slices instead.
- Step 3 — Read bot and status cards
Confirm Googlebot volume looks sane relative to total hits.
- Step 4 — Inspect top paths
Look for parameter storms, accidental admin paths, or assets incorrectly returning 404.
- Step 5 — Feed findings into fixes
When waste is structural, pair the findings with Crawl Budget Optimizer analysis or redirect tickets.
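For Step 1, a small converter can turn JSON logs back into the classic pattern before pasting. This is a hedged sketch: the JSON field names (remoteAddr, time, uri, and so on) are assumptions about a typical exporter schema, so rename them to match your own logs.

```typescript
// JSON-to-CLF converter sketch. The input field names below are assumptions
// about a typical JSON log schema; adjust them for your exporter.

interface JsonLogEntry {
  remoteAddr: string;   // e.g. "66.249.66.1"
  time: string;         // e.g. "10/Oct/2024:13:55:36 +0000"
  method: string;       // e.g. "GET"
  uri: string;          // e.g. "/sock-guide"
  protocol: string;     // e.g. "HTTP/1.1"
  status: number;
  bytesSent: number;
  referrer?: string;
  userAgent?: string;
}

function toClf(e: JsonLogEntry): string {
  return [
    e.remoteAddr,
    "-", // identity (rarely logged)
    "-", // authenticated user (rarely logged)
    `[${e.time}]`,
    `"${e.method} ${e.uri} ${e.protocol}"`,
    e.status,
    e.bytesSent,
    `"${e.referrer ?? "-"}"`,
    `"${e.userAgent ?? "-"}"`,
  ].join(" ");
}

// Convert a JSON-lines export into pasteable CLF text:
const jsonl =
  '{"remoteAddr":"66.249.66.1","time":"10/Oct/2024:13:55:36 +0000","method":"GET","uri":"/sock-guide","protocol":"HTTP/1.1","status":404,"bytesSent":512,"userAgent":"Googlebot/2.1"}';
const clfText = jsonl
  .split("\n")
  .filter(Boolean)
  .map((line) => toClf(JSON.parse(line) as JsonLogEntry))
  .join("\n");
console.log(clfText);
```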
More detail
New here? Skim Start here first, then run one Examples scenario in the form above.
Log File Analyzer does one job: parse server logs to see what Googlebot actually crawls. It lives under Technical SEO on SEOToolkits, where the core idea is simple: Technical SEO keeps pages crawlable, indexable, fast enough, and understandable to search engines.
FAQ
- Does it support LTSV or JSON logs?
Not yet. Only regex-matched CLF-style lines parse today; convert other formats externally or extend your exporter.
- Can I trust bot names?
Classification is based on user-agent substrings (see the sketch after this FAQ). Bots spoofing a browser user-agent fall under human/other.
- Why zero hits?
- Lines that do not match the parser regex are skipped silently. Check quoting around the request.
- Is data uploaded?
No. Parsing runs entirely in your browser; pasted log lines never leave local memory.
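As a rough illustration of the substring matching mentioned above, the sketch below classifies user-agents against a small bot list. The substrings and labels are assumptions, not the analyzer's exact list.

```typescript
// Sketch of substring-based bot classification. The substrings and labels are
// illustrative; they are not the analyzer's exact list.

const BOT_SUBSTRINGS: Array<[substring: string, label: string]> = [
  ["Googlebot", "Googlebot"],
  ["bingbot", "Bingbot"],
  ["AhrefsBot", "AhrefsBot"],
  ["SemrushBot", "SemrushBot"],
  ["DuckDuckBot", "DuckDuckBot"],
];

function classifyUserAgent(ua: string): string {
  const lower = ua.toLowerCase();
  for (const [substring, label] of BOT_SUBSTRINGS) {
    if (lower.includes(substring.toLowerCase())) return label;
  }
  return "human/other"; // anything unmatched, including spoofed browser UAs
}

console.log(classifyUserAgent("Mozilla/5.0 (compatible; Googlebot/2.1; ...)")); // "Googlebot"
console.log(classifyUserAgent("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"));   // "human/other"
```

Because the match trusts whatever the user-agent claims, a scraper spoofing a plain browser string lands in human/other, and nothing here verifies that a Googlebot claim is genuine.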
Related tools
Same workflow cluster on SEOToolkits — open another module without leaving context.
Crawl Budget Optimizer
Identify wasted crawl on low-value or duplicate URLs.
Broken Link Checker
Crawl a site for 4xx/5xx links across pages and assets.
Indexability Checker
Determine why specific URLs aren't getting indexed.
Robots.txt Analyzer
Test directives against URLs and user-agents at scale.