Building DevSecOps solutions using AWS, Terraform and Kubernetes

How to Hide a Staging Website From Google

  • 12th September 2022

Introduction

One of the best “Oh crap!” moments I experienced early in my career was when a client told me that a customer had made a purchase on their staging website.

Two events combined allowed this to happen:

  • A developer “temporarily” removed the VPN restriction to allow a payment gateway to be tested
  • The SEO company was testing a huge number of improvements on their staging website

So Google did what Google does. It found the staging website, then started ranking it above the production website!

And what was the best response I could cobble together for the client?

  • “Good news! Your staging site SEO improvements are working great!” … (awkward silence)

This was a jaw-dropping mistake, but one I still see countless digital agencies making today.

The Solution

The solution is simple: add a noindex tag to your staging website so Google does not index it.

In this article we will look at noindex tags and explore different ways of adding noindex and nofollow tags to our staging websites.

What does Noindex actually mean?

Before adding them, here is what each directive means:

  • noindex: Do not show this page, media, or resource in search results.
  • nofollow: Do not follow the links on this page.

These definitions are taken straight from the source at Google.

So when we combine these tags together it basically tells Google to move along, there’s nothing to see here.

Security Groups and Firewalls!

As I alluded to earlier, your firewall is your first line of defense. Always restrict access so customers cannot reach your staging website.

Simple right? Rarely!

There are a tonne of reasons why a staging website might need to be exposed publicly. Even if it’s hidden behind a firewall today, all it takes is for some non-technical C-level employee to demand the firewall is removed so they can show off the shiny new staging website to a friend on some dodgy hotel wifi connection.

My defense-in-depth advice:

  • Always assume your staging website will become publicly accessible in future.

Using the robots meta tag

Now to add these tags! Without doubt the easiest method is to add a robots meta tag to your header.

Simply add a meta tag to the pages you would like hidden:

<!DOCTYPE html>
<html>
    <head>
        <meta name="robots" content="noindex,nofollow">
        <!-- header content -->
    </head>
    <body>
        <!-- website content -->
    </body>
</html>

However, this is also the hardest to scale. You will need to add this tag to every page you would like hidden.

You then need to make sure this tag is never deployed onto your production website or Google will delist every page.
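
One way to guard against that is a small check in your deployment pipeline. The sketch below is my own illustration rather than part of any particular CMS or CI tool: it uses Python's standard html.parser module to look for a robots meta tag containing noindex (or none, which Google treats as noindex plus nofollow). The names RobotsMetaParser and has_noindex_meta are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex_meta(html: str) -> bool:
    """Return True if the page has a robots meta tag with noindex or none."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return bool(parser.directives & {"noindex", "none"})
```

You could run has_noindex_meta over the rendered HTML of a few key production pages and fail the deployment if it ever returns True. It only inspects the generic robots meta tag, so crawler-specific tags such as name="googlebot" would need an extra check.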

Using the X-Robots-Tag header

This brings us onto my favourite solution: adding a single header to your staging website. When the X-Robots-Tag HTTP header and a meta robots tag disagree, Google applies the more restrictive rule, so a noindex header wins over any meta robots tag that has been set on the page.

It’s common for digital agencies in the SME market to use a three-tiered development structure.

So Local -> Staging -> Production.

This means we can add a single header to our Staging environment at the Nginx or Apache level and keep our staging site out of Google's index.

Then we no longer have to worry about accidentally committing a noindex tag into our production environment and destroying our SEO ranking.

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
(…)
X-Robots-Tag: noindex
(…)

Apache Noindex Example

You can utilise the mod_headers module to add a header in Apache.

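Wrap the directive in an IfModule block so Apache still starts if mod_headers is not loaded. Bear in mind that in that case the header is silently skipped, so check that the module is enabled.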

Here is an example snippet to add to the virtual host file:

<IfModule mod_headers.c>
  Header set X-Robots-Tag "noindex, nofollow"
</IfModule>

If mod_headers has not been enabled, you can enable it as follows:

# Enable mod_headers
sudo a2enmod headers

# Check config is valid before restarting:
sudo apachectl configtest 

# Restart apache for changes to take effect:
sudo service apache2 restart

Nginx Noindex Example

Similarly, here is an example snippet to add the header using nginx:

location / {
  add_header X-Robots-Tag "noindex, nofollow";
}

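Just add it to the relevant location block in your nginx config file for your staging website. Note that nginx only inherits add_header directives from the enclosing level when the current block defines none of its own, so repeat the header in any nested block that sets other headers.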
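
Whichever web server you use, it is worth confirming that the header is actually being sent. Here is a minimal sketch using only Python's standard library. The function names and the staging URL in the usage note are hypothetical, and the parsing is deliberately simple: it does not handle crawler-specific values such as "googlebot: noindex".

```python
from urllib.request import Request, urlopen


def robots_header_blocks_indexing(header_values):
    """Return True if any X-Robots-Tag value contains noindex or none."""
    for value in header_values:
        directives = {part.strip().lower() for part in value.split(",")}
        if directives & {"noindex", "none"}:
            return True
    return False


def staging_is_hidden(url, timeout=10):
    """Send a HEAD request and inspect the X-Robots-Tag response headers.

    urlopen raises HTTPError for 4xx/5xx responses, so a failing site
    surfaces as an exception rather than a False result.
    """
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # A response may carry the header more than once; collect them all.
        values = response.headers.get_all("X-Robots-Tag") or []
    return robots_header_blocks_indexing(values)
```

Calling staging_is_hidden("https://staging.example.com") after a deployment should return True if the Apache or Nginx snippet above is in place. Running the same check against production and expecting False is a cheap way to catch the header leaking into the wrong environment.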

Summary

While lots of CMS platforms have their own way of configuring noindex tags, I believe the best approach is to enable it at the Apache/Nginx level.

Your developers are less likely to override this. Your admin staff are less likely to override this. And your SEO company can focus on adding real value to your company rather than reminding you of junior level mistakes.

And finally, always consult your SEO expert/company before making changes to noindex/nofollow! Accidentally getting this wrong in production can have devastating consequences for your search rankings.

Rhuaridh

Please get in touch through my socials if you would like to ask any questions - I am always happy to speak tech!