Redirect / 404 /501/301/ Canonical Implementation
Introduction
A 404 error means "not found". This is usually the page you get when you
misspell a page name on a site, or when the page has been deleted or moved. The
problem is that the standard 404 page is ugly and unhelpful.
Many people have figured out that if you use a custom 404 page you can
present a much more helpful page to your visitors. Others have taken it a step
further and made that custom page a redirect to the home page, so that any links
(and PageRank, or PR) pointing to pages that have been deleted (or misspelled)
will be passed on to the website.
Sounds great, right? Well, there is a problem (there is almost always a
problem with things that sound too good to be true...). The problem is that if
you use a redirect to pass PR from an error page to a normal page, the
redirecting page will usually return a "200 OK" or "302 Redirect" code, rather
than a proper 404. This messes up search engines and can result in a whole bunch
of indexed URLs that all look to the search engine like duplicates of your home
page (when there is no redirect code at all, it's a pure 200 OK).
"410 Gone" Error - It's Gone, Dammit!
If you are really, really insistent on a page being removed, you can send a "410 Gone" error for the location, which means that the page is not there, will never be there, and there is no forwarding address.
This usually isn't necessary, but can be useful if you are trying to remove all traces of a page you no longer want associated with your site (e.g. one you were sued over). It says that the page is missing on purpose, and is not an accident or temporary problem.
In this case a URL removal request to Google followed by a 410 on the page location itself should do it. You can also use robots.txt and the robots meta tag as backup.
Returning a 200 OK or a 302 for a missing page is bad for your site.
Additionally, there are a LOT of indexed "error" pages in search engines
(especially Yahoo) that should not be there.
The proper behaviour for an error page is to return a 404 error code. The
best result for your visitors is an error page that is either helpful by itself
or redirects to a helpful page. The best result from an SEO viewpoint is for any
link popularity from broken links to be passed on to the page of your choice.
Naturally, the best result overall would be something that accomplishes all
of the above. Unfortunately, this is not directly possible. As soon as
the search engine is sent the error code, it treats it as a dead page and will
eventually remove it.
PR and link weight are only passed on if a page is not a 404. But your site
logs will not report errors if it responds as a 200, and your site will not
verify (for example, if you use Google Sitemaps) if you don't have a valid 404
page.
There are 4 possible scenarios with custom pages:
- 404 - Responds with an error, but shows a custom page to help your visitors
- 200 - If a page is missing, it's replaced with the custom error page
- 302 - If the page is missing, it's replaced with a temporary redirect to a custom error page
- 301 - Redirects errors to either a custom error page, or some other page in the site (e.g. sitemap, homepage or best guess)
Each has benefits and drawbacks. You have to choose - "Red or Blue":
Custom Error Page Types
404 Not Found Response | 200 OK (or 302, or 301) Response |
Properly Defines the result - a missing page. | Tricks the search engine into thinking all is well. |
Validates. | Does not validate, but won't break your site. |
Shows up in logs so you can fix it. | Does not show up as an error - harder to find. |
Does not pass on PR or link weight. | Passes on PR to final page. |
No duplication issues. | Can result in a duplication penalty. |
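As a rough illustration of the left-hand column, here is a minimal Python sketch of a handler that shows a helpful custom page while still answering with a real 404. The status line and the page body are independent, which is the whole point. It uses the standard library's WSGI helpers rather than Apache or IIS, and `KNOWN_PAGES`, `ERROR_BODY` and `call` are hypothetical names made up for this example.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical set of paths that really exist on the site.
KNOWN_PAGES = {"/": b"<h1>Home</h1>"}

# Links are absolute because the error page can be served for any URL.
ERROR_BODY = (
    b"<h1>Sorry, that page is missing</h1>"
    b'<p><a href="http://www.mysite.com/">Home page</a></p>'
)


def app(environ, start_response):
    """Serve known pages with 200; anything else gets the custom page AND a 404."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PAGES:
        body, status = KNOWN_PAGES[path], "200 OK"
    else:
        body, status = ERROR_BODY, "404 Not Found"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


def call(path):
    """Run the app once without a network socket and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

Calling `call("/no-such-page.htm")` returns the friendly page together with a "404 Not Found" status, so the visitor gets help and the search engine gets the truth.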
Custom Error Page Link Issue
One thing I'd like to make sure everyone is aware of - a custom error page
can be served for any URL in your site. This means that if you put any links on
that page to help the visitor find their way, you cannot make them relative -
since you don't know what they would be relative to.
You must make them either absolute (recommended) or set the base HREF using
a tag like this in the head of the page (with your own domain in place of
www.mysite.com) and make sure all your links are relative to it:
<base href="http://www.mysite.com/">
Nifty Misuse of the Error Page
Sometimes you won't have access to the .htaccess of a site, but do have
access to a custom error page. Let's say you have a dynamic site on such a host
but due to security issues (e.g. PHP "safe" mode) you can't write pages
dynamically to disk, and therefore, without .htaccess to do it on the fly or PHP
permissions to write static pages, you can't have a CMS with "SEO friendly"
URLs. Or can you?
Normally, I'd suggest switching hosts in this case. Really. But let's say you
want to stay with them.
You can write a script on your custom error page to parse the requested
URL, e.g. yourdomain.com/content/blue.htm, into a database query that is
actually yourdomain.com/?content=blue and then put the results of that
query into the error page, thus "faking" .htaccess.
In reality, you are using the server's custom error page setting, just not in
the way it was intended. Naturally, this technique is not standard and your
mileage may vary depending on the server setup. It also results in a 200 OK.
Make sure that you program in error capturing so that if someone genuinely types
in the wrong URL, it results in a 404.
Naturally, this works with IIS and an ASP error page as well.
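The parsing step described above can be sketched in Python as follows. This is only an illustration of the idea, not the PHP or ASP you would actually put on the error page. It assumes URLs of exactly the form /key/name.htm, and the `CONTENT` dictionary is a hypothetical stand-in for the database query.

```python
from urllib.parse import urlsplit


def friendly_to_query(requested_url):
    """Map '/content/blue.htm' to ('content', 'blue'); return None if it does not fit."""
    path = urlsplit(requested_url).path
    parts = [p for p in path.split("/") if p]
    if len(parts) != 2:
        return None
    key, page = parts
    name, dot, ext = page.rpartition(".")
    if not dot or ext != "htm" or not name:
        return None
    return key, name


# Hypothetical stand-in for the database lookup.
CONTENT = {("content", "blue"): "<h1>Blue</h1>"}


def handle_missing(requested_url):
    """Return (status, body): 200 with content when the lookup succeeds, else a real 404."""
    lookup = friendly_to_query(requested_url)
    if lookup in CONTENT:
        return "200 OK", CONTENT[lookup]
    return "404 Not Found", "<h1>Page not found</h1>"
```

The important design choice is the last line: anything the lookup cannot resolve still falls through to a genuine 404, so real typos are not hidden behind a 200 OK.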
Server Issues
Apache and IIS handle custom error pages differently. Usually, I've noticed
that the custom error pages on IIS are more likely to be wrong than the ones on
Apache, but they can both have issues.
First things first - you need to issue a 404 error code at the server level
in order for it to work consistently. Attempting to write it into the page
content will not work. If the page is dynamic and the error code is set at the
server level before the page is served, that will usually work. Once you are at
the ISAPI level, it's too late to send an error code.
Important!
Normally, it's a good idea to define pages in a server as absolute URLs or
files. Not for custom error pages. The path MUST BE RELATIVE TO ROOT or
it will return a 200 OK. This applies to all servers I'm aware of,
including both Apache and IIS.
Of course, if you are trying to get a 200 OK status in an attempt to pass on
PR, then you would use the full URL, not the relative one.
Apache Custom 404
This one is easy. Just go to your .htaccess file (or control panel) and type
in the following:
ErrorDocument 404 /404.php
Change the name "404.php" to whatever the name for your custom error page is.
You might be tempted to type in:
ErrorDocument 404 http://www.mysite.com/404.php
But it won't work. Given a full URL, Apache redirects the visitor to the error
page instead of serving it in place, so the 404 status is lost and the error
page itself answers with a 200 OK. Once again, the path must be relative to
root or it won't respond with a 404 error code properly.
IIS Custom 404
Dynamic Error Page
If you are running IIS and you are using .asp or .aspx custom
error pages, then you can set the status in the page itself:
Response.status = "404 Not Found"
In code, this usually looks like:
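<%
' IIS passes the failed request to the error page as "404;http://host/path".
Dim pageRequested
With Request
    ' Keep everything after the first semicolon: the URL that was asked for.
    pageRequested = Mid(.QueryString, InStr(.QueryString, ";") + 1)
End With
' Send the real error status before any page content is written.
Response.Status = "404 Not Found"
%>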
Put this at the very top of the page. Then, create the rest of the page to do
and say what you want it to.
Static Error Page
There is nothing special you need to do to static error pages, just make sure
you connect to them using the full file name (e.g. c:/www/404.htm).
Setting the Custom Error Page in IIS
Go to IIS Administration and choose the web that you want to set the custom
error page for (each web may have its own). Right-click it, open its properties
and go to the "Custom Errors" tab.
For dynamic error pages, make sure that you are pointing to the custom page
using the URL choice, not the File choice. If you use File, it will not
pre-process the page, and will simply treat it as static. Remember to use the
PATH RELATIVE TO ROOT or it will return a 200 OK instead of the 404
Not Found.
If you are using a static page (e.g. .htm) then you use the File
choice to connect to it, using the physical drive location (e.g. c:/www/404.htm).
You can then test it using a Header Viewer. If the first line of the Header
section comes back as:
HTTP/1.1 200 OK
Then it's not working, but if it comes back and shows:
HTTP/1.1 404 Object Not Found
Then it is.
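If you don't have a header viewer handy, the same check can be sketched in a few lines of Python. This is a minimal example using the standard library's `http.client`, which does not follow redirects, so you see exactly what the server sends first. The function names `status_line` and `verdict` are made up for this sketch, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client


def status_line(host, path, port=80):
    """Send a HEAD request and return (status_code, reason) without following redirects."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.reason
    finally:
        conn.close()


def verdict(status):
    """Interpret the status returned for a URL that should not exist."""
    if status == 404:
        return "working: the server reports a missing page"
    if status in (301, 302):
        return "not working: the error is being redirected"
    if status == 200:
        return "not working: the error page is served as a normal page"
    return "unexpected status: check the server configuration"
```

To use it, ask for a page you know does not exist, for example `status_line("www.mysite.com", "/this-page-does-not-exist.htm")`, and pass the returned code to `verdict`.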
The Metarefresh Problem
In some cases, people will create a custom error page that displays the
error, then "helpfully" uses a metarefresh
to forward the visitor to the site map, home page or some other page.
The problem with this is that each search engine treats metarefreshes
differently. Yahoo, for example, treats a metarefresh of 0 as a 301, and
anything larger as a 302. Most of the time, this works great. But in the case of
a 404 error page with a metarefresh - what is it being treated as?
The other search engines vary widely in how they handle these. Since the server
tells the search engine that the page is a 404, I believe it would normally
treat the page as a 404 and not look at the metarefresh. I'm not certain that's
the case, though, because on other pages the metarefresh overrides the initial
200 OK in order to create the effect of a 302. There is no standardized method
of dealing with this from a search engine perspective.
Bottom line, don't use a metarefresh on an error page, especially if you are
hoping for 404 behaviour (i.e. avoiding duplication issues). If the page used
to exist but is now somewhere else, then that's a legitimate redirect, not an
error.
You could use a JavaScript refresh/forward instead, since search engines
generally do not execute those.
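To check whether an existing error page contains a metarefresh, a small script can scan its HTML. Here is a minimal Python sketch using the standard library's `html.parser`. The names `MetaRefreshFinder` and `find_meta_refresh` are made up for this example, and it only looks at the markup, not at anything a script on the page might add later.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").strip().lower() == "refresh":
            self.refreshes.append(attributes.get("content", ""))


def find_meta_refresh(html):
    """Return the list of refresh instructions found in the page (empty if none)."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    finder.close()
    return finder.refreshes
```

An empty list means the page has no metarefresh. Anything else is the delay and target the page is asking for, which is what you want to remove from a 404 page.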
Holy Grail: Best Practice for Capturing PR and Still Validating
In general, you want a custom error page to respond with a 404 Not Found.
However, if you have a lot of broken incoming links to non-existent pages, you
may be tempted to capture the PR for them by setting up a custom error page
that does not respond with a 404 or full-fledged redirect.
The problem is that this can result in a duplication error, and will mess up
validation of your site. There is another option.
What you can do is set up a custom 404 Error page that returns a proper 404
code, then watch your error logs. If you see visits to a bad URL, either create
a page at that URL that 301s to your home page or some other page, or use
.htaccess to 301 those specific requests.
This way, genuine on-the-fly misspellings are sent to an error page, but
existing broken links to your site are redirected using a 301 and therefore the
PR is passed on to your site. Win both ways :)
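Watching the logs by hand gets tedious, so here is a rough Python sketch of how the process could be automated. It assumes an Apache access log in the common "combined" format (with a referrer field), counts 404s that arrived from other sites, and prints a mod_alias `Redirect 301` line for each. The function names and the sample domain are made up for this example, and you would still want to review each rule and pick a sensible target before pasting it into .htaccess.

```python
import re
from collections import Counter

# Matches the request, status, size and referrer fields of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def broken_inbound_links(log_lines, own_host):
    """Count 404 hits per path that arrived from a referrer on another site."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match["status"] != "404":
            continue
        referrer = match["referrer"]
        # No referrer usually means a typed URL; our own host means an internal link.
        if referrer in ("", "-") or own_host in referrer:
            continue
        hits[match["path"]] += 1
    return hits


def redirect_rules(hits, target, minimum=1):
    """Emit one mod_alias 'Redirect 301' line per path, most-requested first."""
    return [
        f"Redirect 301 {path} {target}"
        for path, count in hits.most_common()
        if count >= minimum
    ]
```

Requests with no referrer are left alone on purpose: those are most likely the on-the-fly misspellings that should keep getting the 404 page.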
Conclusion
It's very common for people to use redirects while attempting to deal with
error pages and broken links. Hopefully this has provided some guidance on how
to deal with this properly.
Have Fun