Introduction
One of the best “Oh crap!” moments I experienced early in my career was when a client told me that a customer had made a purchase on their staging website.
Two events combined allowed this to happen:
- A developer “temporarily” removed the VPN restriction to allow a payment gateway to be tested
- The SEO company was testing a huge number of improvements on their staging website
So Google did what Google does. It found the staging website, then started ranking it above the production website!
And what was the best response I could cobble together for the client?
- “Good news! Your staging site SEO improvements are working great!” … (awkward silence)
This was a jaw-dropping mistake, but one I still see countless digital agencies making today.
The Solution
The solution is simple: add a noindex directive to your staging website so Google does not index it.
In this article we will look at what noindex and nofollow mean, and explore different ways of adding them to our staging websites.
What does Noindex actually mean?
But what do these terms mean?
- noindex: Do not show this page, media, or resource in search results.
- nofollow: Do not follow the links on this page.
These definitions are taken straight from the source at Google.
So when we combine these directives, they basically tell Google to move along, there’s nothing to see here.
Security Groups and Firewalls!
As I alluded to earlier, your firewall is your first line of defense. Always restrict access so customers cannot reach your staging website.
Simple right? Rarely!
There are a tonne of reasons that a staging website needs to be exposed publicly. Even if it’s hidden behind a firewall today, all it takes is for some non-technical C-level employee to demand the firewall is removed so they can show off the shiny new staging website to their friend on some dodgy hotel wifi connection.
My defense-in-depth advice:
- Always assume your staging website will become publicly accessible in future.
Using the robots meta tag
Now to add these tags! Without doubt the easiest method is to add a robots meta tag to the <head> of your pages.
Simply add a meta tag to the pages you would like hidden:
<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex,nofollow">
<!-- header content -->
</head>
<body>
<!-- website content -->
</body>
</html>
However, this is also the hardest to scale. You will need to add this tag to every page you would like hidden.
You then need to make sure this tag is never deployed onto your production website or Google will delist every page.
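One way to guard against this is a check that runs before a production deploy and fails if any page still carries the tag. Here is a minimal Python sketch of such a check using only the standard library. The names RobotsMetaFinder and has_noindex_meta are my own, and it assumes your built pages are available as HTML strings.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex_meta(html_text):
    """Return True if the page carries a robots meta tag that blocks indexing."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    # "none" is shorthand for "noindex, nofollow".
    return bool(finder.directives & {"noindex", "none"})
```

Calling has_noindex_meta on each built page and aborting the deploy when it returns True would catch a staging tag before it reaches production. It only looks at the generic robots meta tag, not crawler-specific ones such as googlebot.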
Using the X-Robots-Tag header
This brings us to my favourite solution: adding a single header to your staging website. Google treats the X-Robots-Tag HTTP header like a robots meta tag and, where the two conflict, applies the more restrictive directive, so a noindex header takes effect regardless of what the page's meta tags say.
It’s common for digital agencies in the SME market to use a three-tiered development structure:
Local -> Staging -> Production.
This means we can add a single header to our Staging environment at the Nginx or Apache level and ensure that our staging site is never indexed.
Then we no longer have to worry about accidentally committing a noindex tag into our production environment and destroying our SEO ranking.
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)
Apache Noindex Example
You can utilise the mod_headers module to add a header in Apache.
Let it fail gracefully by wrapping it in an IfModule.
Here is an example snippet to add to the virtual host file:
<IfModule mod_headers.c>
Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
If mod_headers has not been enabled, you can enable it as follows:
# Enable mod_headers
sudo a2enmod headers
# Check config is valid before restarting:
sudo apachectl configtest
# Restart apache for changes to take effect:
sudo service apache2 restart
Nginx Noindex Example
Similarly, here is an example snippet to add the header using nginx:
location / {
# "always" also sends the header on error responses (nginx 1.7.5+)
add_header X-Robots-Tag "noindex, nofollow" always;
}
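Just add it to the relevant location block in your nginx config file for your staging website. Bear in mind that nginx only inherits add_header directives from an outer level when the current block defines none of its own, so repeat the header in any location block that sets other headers.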
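Whichever web server you use, it is worth confirming that the header is actually being sent on staging and, just as importantly, that it is not being sent on production. The following is a minimal Python sketch of such a check using only the standard library. The function names and the example.com URLs are hypothetical, and it assumes the header carries plain directives rather than crawler-specific values such as "googlebot: noindex".

```python
import urllib.request


def blocks_indexing(header_values):
    """Return True if any X-Robots-Tag value contains noindex (or none)."""
    for value in header_values:
        directives = {part.strip().lower() for part in value.split(",")}
        # "none" is shorthand for "noindex, nofollow".
        if directives & {"noindex", "none"}:
            return True
    return False


def fetch_robots_headers(url):
    """Send a HEAD request and return every X-Robots-Tag header value.

    urlopen raises HTTPError for 4xx/5xx responses, so this only
    inspects successful (or redirected) responses.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []


def check_environment(url, expect_noindex, fetch=fetch_robots_headers):
    """Return True if the URL's indexing state matches what we expect."""
    return blocks_indexing(fetch(url)) == expect_noindex


# Hypothetical usage (replace with your own hostnames):
# check_environment("https://staging.example.com/", expect_noindex=True)
# check_environment("https://www.example.com/", expect_noindex=False)
```

The fetch parameter exists so the check can be exercised without a network connection. In a deploy pipeline you would leave it at its default and fail the build when check_environment returns False for either environment.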
Summary
While lots of CMS platforms have their own way of configuring noindex tags, I believe the best approach is to enable it at the Apache/Nginx level.
Your developers are less likely to override this. Your admin staff are less likely to override this. And your SEO company can focus on adding real value to your company rather than reminding you of junior level mistakes.
And finally, always consult your SEO expert/company before making changes to noindex/nofollow! Accidentally getting this wrong in production can have devastating consequences for your search rankings.