Jul 16, 2020

How to get list of all 404 pages for a website

We are developing an app. We are trying to get all 404 pages already indexed by Google.
Is there any API to get the list of all 404 pages for a given website?
This question is locked and replying has been disabled.
All Replies (4)
Jul 16, 2020
Google doesn't index pages that return 404. If it finds a 404 when crawling, it won't index the page.
 
... so if Google has a page indexed, and it now returns 404, it means Google crawled and indexed it BEFORE it became a 404.
 
 
So you can't get a list of pages that have become 404 since Google indexed them, because it doesn't KNOW about the 404 yet. As soon as it does, the page won't be indexed!
 
 
So you would have to get a list of indexed URLs and cross-reference it with some OTHER list of URLs that are now 404.
 
You can't directly get a list of ALL indexed URLs, but if you have < 1000 you can possibly get them from the Coverage Report, i.e. it will list up to 1000 in each status.
... if you have over 1000 URLs indexed, then you only get a sample from the console. Sitemaps can be used to filter the coverage report, though. (But if you have a list in sitemaps, you probably don't need the list from Google!)
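
The cross-referencing step can be sketched roughly as below. This is only an illustration, not anything Google provides: `fetch_status` and `find_dead_urls` are made-up names, and the list of indexed URLs is assumed to come from your own source (a sitemap or a console export). Each URL is requested again and the ones that now answer 404 are kept.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, using a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code


def find_dead_urls(
    indexed_urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> List[str]:
    """Return the URLs from indexed_urls that currently respond with 404."""
    return [url for url in indexed_urls if get_status(url) == 404]
```

Connection failures (DNS errors, timeouts) are not handled here and will raise. Passing a different `get_status` makes it possible to try the logic without touching the network.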
 
Jul 16, 2020
Can I get the 404s from the Coverage Report using any API? Or do I need to get them manually?
Jul 16, 2020
I'd look at your log files over the last few months. They would record all 404s during that time period, for users and bots.
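
A minimal sketch of pulling 404s out of access logs, assuming the server writes the common/combined log format (request line in quotes, followed by the status code). The function name `count_404_paths` and the regular expression are illustrative, so adjust them to whatever format your server actually uses.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /old-page HTTP/1.1" 404
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_404_paths(lines: Iterable[str]) -> Counter:
    """Count how often each requested path returned 404."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

With a real log, the lines can come straight from the file, for example `count_404_paths(open("access.log", encoding="utf-8", errors="replace"))`, where `access.log` stands in for your own log path.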
Jul 16, 2020
OK, so you want to get the list of URLs Google HAS found to be 404?
As noted, they are NOT indexed, i.e. Google treats 404 as 'Excluded' and doesn't index them :)
 
... but no, there is NO API for URLs from the coverage report.
The closest is to use the manual 'Export' function in the Console UI. 
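
Once the export has been downloaded by hand, reading the URLs out of it could look something like this. It assumes the export was saved as CSV with a header row and a column named `URL`. That column name is a guess, so check your own file. `load_exported_urls` is just an illustrative helper.

```python
import csv
from typing import List, TextIO


def load_exported_urls(csv_file: TextIO, column: str = "URL") -> List[str]:
    """Read one column of URLs from an exported CSV, skipping blank cells."""
    reader = csv.DictReader(csv_file)
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls
```

Typical use would be `with open("export.csv", newline="", encoding="utf-8") as f: urls = load_exported_urls(f)`, where `export.csv` is a placeholder file name.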
 
 
 
 