While they’re both called 404, they are very different.
Consequently, it’s essential to understand the difference between the errors to fix them.
A webpage accessed by a browser responds with a status code that communicates whether the request was successful and, if not, why it wasn’t.
These responses are commonly referred to as HTTP response codes; officially, they are called HTTP status codes.
A server provides five categories of response codes; this article is specifically about one response, the 404 page not found status code.
All codes within the 4xx series of responses indicate a client error, meaning the request could not be fulfilled as sent. The 404 code is the specific member of that series that means the page was not found.
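As a small illustration of the five categories, the first digit of a status code determines its class. The sketch below is only a convenience helper; the function name `status_category` is an assumption made for this example and is not part of any standard library.

```python
from http import HTTPStatus


def status_category(code: int) -> str:
    """Return the category of an HTTP status code based on its first digit."""
    categories = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return categories.get(code // 100, "unknown")


# The standard library already knows the official reason phrase for 404.
not_found_phrase = HTTPStatus(404).phrase

print(status_category(404), "/", not_found_phrase)
```

Note that a 403 or a 410 also falls in the client error category, which is why the category alone does not tell you that a page is missing.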
The official definition is:
The 404 response is ambiguous as to whether the webpage might return.
The official documentation is clear about the ambiguity of whether a page is temporarily or permanently gone:
To summarize, the 404 page not found code means there was an error in the browser request because the requested page could not be found.
A soft 404 error is not an official status code. The server does not send a soft 404 response to a browser because there is no such thing as a soft 404 status code.
Soft 404 describes a situation when the server presents a webpage and responds with a 200 OK status code, indicating success when the webpage or content is actually missing.
A webpage is missing, and a server sends 200 OK status.
This kind of soft 404 happens when a page is missing, but the server configuration redirects the missing page to the home page or a custom URL.
The page is gone, but the publisher has done something to fulfill the request for the missing page.
When content is completely missing, or there’s very little of it (a.k.a. thin content), the server will respond with a 200 status code, which means the request for the page was successful.
But for indexing purposes, these pages are not genuinely successful webpage requests, so search engines label them soft 404s.
The missing page redirects to the home page.
Some mistakenly believe that there’s something wrong with a 404 error response.
So, to stop the 404 error responses, a publisher may redirect the missing page to the homepage, even though the homepage is not what was requested.
Sometimes, missing pages redirect to a custom-made webpage that serves a 200 status code, which results in Google labeling these pages as soft 404s.
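The situations described above can be approximated with a rough heuristic. The following is a minimal sketch, not how Google actually detects soft 404s (that logic is not public). The phrase list, the 50-word threshold, and the function name `classify_response` are all assumptions chosen for illustration.

```python
# Hypothetical phrases that suggest an error page served with a 200 status.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "does not exist")


def classify_response(status, body, requested_path, final_path, min_words=50):
    """Label a response as a real 404, a likely soft 404, or an apparently fine page."""
    if status == 404:
        return "404"
    if status != 200:
        return "other"
    # A request for a deep URL that ends up on the home page is suspicious.
    if requested_path != final_path and final_path in ("", "/"):
        return "soft 404 (redirected to home page)"
    text = body.lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return "soft 404 (error message with 200 status)"
    # Assumed threshold: very short pages are treated as thin content.
    if len(text.split()) < min_words:
        return "soft 404 (thin or empty content)"
    return "ok"
```

A heuristic like this will produce false positives (for example, a legitimate article that discusses the phrase "page not found"), so treat its output as a list of pages to review by hand rather than a verdict.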
The concept of a soft 404 may have originated in a 2004 research paper titled, Towards an Understanding of the Web’s Decay (PDF).
The missing pages that are improperly substituted present a problem to search engines that are trying to index real pages.
Here is how the research paper frames soft 404s:
There are cases where the page isn’t missing, but specific problems (like coding errors) have triggered Google to categorize it as a missing page.
It is essential to investigate soft 404s because they could signal broken code.
If the cause of the 404 is a linking error, you have to fix the links.
The tricky part of this task is finding all the broken links on a site. This is especially challenging on large, complex sites with thousands or millions of pages.
In instances like this, crawling tools come in handy.
There are many site crawlers to choose from: free tools like Xenu and Greenflare, and paid software like Screaming Frog, DeepCrawl, Botify, Sitebulb, and OnCrawl. Several of the paid tools offer free trials or free versions with limited features.
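To make concrete what these crawlers do for a single page, here is a minimal sketch using only the Python standard library. It is not a replacement for a real crawler: it checks one page's links, does not follow links recursively, and does not handle network failures such as timeouts. The names `LinkExtractor`, `extract_links`, `http_status`, and `find_broken_links` are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for all links found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = [urljoin(base_url, href) for href in parser.links]
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def http_status(url):
    """Send a HEAD request and return the final status code (redirects are followed)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:
        return error.code


def find_broken_links(html, base_url, get_status=http_status):
    """Return the linked URLs that respond with 404."""
    return [url for url in extract_links(html, base_url) if get_status(url) == 404]
```

The `get_status` parameter is there so the link-checking logic can be exercised with a fake lookup instead of real network requests. Some servers do not answer HEAD requests correctly, in which case a GET request may be needed.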
When a page no longer exists, you have two options:
First, you have to locate all the linking errors on the site. As with finding linking errors on any large-scale website, crawling tools can do this for you.
However, crawling tools may not find orphaned pages: pages that are not linked from the site navigation or from any other page on the site.
Orphaned pages often appear after a website redesign. The page used to be part of the website, the internal link pointing to it disappeared in the redesign, and external links from other websites may still point to it.
To double-check if these kinds of pages exist on your site, you can use various tools.
The Search Console will report 404 pages as Google crawls through all the pages it can find. This can include links from other sites to a page that used to exist on your website.
You won’t find a missing page report in Google Analytics by default. However, you can track them in different ways.
For one, you can create a custom report and segment out pages with a page title mentioning Error 404 – Page Not Found.
Another way to find orphaned pages within Google Analytics is to create custom content groupings and assign all 404 pages to a content group.
One cannot use the site: search command to find 404 errors because Google doesn’t index 404 webpages or soft 404 webpages.
Google’s site: search operator is useful for finding webpages on a site that contain a specific keyword phrase in the content of the webpages.
Google’s Search Console is the best source for identifying a list of soft 404s and regular 404s.
The website traffic error logs are a useful source for identifying 404 error responses.
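As a rough sketch of how logs can be mined for 404s, the snippet below counts the requested paths that returned a 404. It assumes the server writes access logs in the Common or Combined Log Format; other formats will need a different pattern. The name `count_404_paths` is an assumption made for this example.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /old-page HTTP/1.1" 404
REQUEST_PATTERN = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b'
)


def count_404_paths(log_lines):
    """Count how often each requested path produced a 404 response."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match and match.group("status") == "404":
            counts[match.group("path")] += 1
    return counts
```

Calling `count_404_paths(open("access.log"))` (the file name is hypothetical) and then `.most_common(20)` on the result gives the most frequently requested missing URLs, which are usually the ones worth fixing first.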
Backlink research tools like Majestic, Ahrefs, Moz Open Site Explorer, Sistrix, Semrush, LinkResearchTools, and CognitiveSEO can also help.
Most of these tools will export a list of backlinks linking to your domain. From there, you can check all the linked pages and look for 404 errors.
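The check described above could look something like the following sketch. Export formats differ between tools, so the column name `target_url` is purely hypothetical and must be replaced with whatever your tool uses. The `get_status` argument is any function you supply that takes a URL and returns its HTTP status code.

```python
import csv
import io


def missing_backlink_targets(csv_text, get_status, url_column="target_url"):
    """Return the unique linked-to URLs from a backlink export that respond with 404."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # De-duplicate targets so each URL is checked only once.
    targets = sorted({row[url_column] for row in reader if row.get(url_column)})
    return [url for url in targets if get_status(url) == 404]
```

Because many backlinks usually point at the same few pages, de-duplicating before checking keeps the number of requests small.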
Crawling tools won’t detect a soft 404 because it isn’t a 404 error. But you can use crawling tools to catch something else.
Here are a few things to find:
Aside from the crawling tools, you can also use Google Search Console and check under crawl errors to find pages listed under soft 404s.
Crawling an entire site to find issues that cause soft 404s allows you to locate and correct problems before Google detects them.
After detecting these soft 404 issues, you will need to correct them.
Most of the time, the solutions are common sense. They can include simple things like expanding pages with thin content or replacing duplicate content with new, unique content.
Throughout this process, here are a few things to consider:
Sometimes, thin content is caused by being too specific with the page topic, leaving you with little to say.
Merging several thin pages into one page can be more appropriate if the topics are related. Not only does this solve thin content issues, but it can fix duplicate content issues as well.
For example, an ecommerce site selling shoes in different colors and sizes may have a different URL for each size and color combination. This leaves a large number of pages with content that is thin and relatively identical.
The more effective approach is to put this all on one page instead and enumerate the options available.
Using even the most straightforward web crawling tool like Xenu (which doesn’t look at content but only URLs, response codes, and title tags), you can still find duplicate content issues by looking at URLs.
This includes www vs. non-www URLs, HTTP and HTTPS, with index.html and without, with tracking parameters and without, etc.
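One way to spot these variants in a crawler's URL export is to normalize each URL and group the ones that collapse to the same form. The sketch below is illustrative only. The list of tracking parameters is an assumption (extend it for your own site), and treating `http` and `https` or `www` and non-`www` as equivalent is a choice that only makes sense when both variants really serve the same content. The names `normalize_url` and `group_duplicates` are made up for this example.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; adjust to match the ones your site receives.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"gclid", "fbclid"}


def normalize_url(url):
    """Collapse common URL variants into a single comparable form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    if not path:
        path = "/"
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES) and key not in TRACKING_PARAMS
    ]
    # Use one scheme for comparison and drop the fragment.
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Return groups of URLs that normalize to the same form."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize_url(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Any group returned by `group_duplicates` is a candidate for a redirect or a canonical tag, after confirming that the variants do in fact serve the same page.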
The most important thing to remember about 404 errors is that if the pages are truly missing, then there is nothing to fix. It’s okay to show a 404 response for requests for pages that do not exist.
But if the pages exist on a different URL, then that is something to fix. The fix can be correcting the broken link so it points to the actual URL, restoring the missing page, or redirecting the old URL to the new page that replaced it.
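The decision between these outcomes can be sketched as a simple lookup. This is a toy model rather than server configuration: the redirect map, the paths in it, and the function name `resolve` are hypothetical, and the use of a 301 (permanent) redirect is an assumption that fits pages that have moved for good.

```python
# Hypothetical map of old URLs to the pages that replaced them.
REDIRECTS = {
    "/old-shoes": "/shoes",
}


def resolve(path, existing_paths, redirects=REDIRECTS):
    """Return (status, location) for a requested path."""
    if path in existing_paths:
        return (200, None)
    if path in redirects:
        # The page moved: send visitors and crawlers to its replacement.
        return (301, redirects[path])
    # The page is truly gone and has no replacement: a 404 is the honest answer.
    return (404, None)
```

The important property is the last branch: a path with no real replacement gets a 404 rather than a redirect to the home page, which avoids creating a soft 404.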
A soft 404 is always the result of a problem that must be diagnosed and fixed.
Understanding the difference between the 404s is essential to keeping a website operating at peak performance.