Introduction
When you work in a heavily regulated environment, data protection is critical. GDPR brought a lot of attention to how people’s personal data is handled, and in the UK the Information Commissioner’s Office (ICO) has the power to fine companies that fall foul of the legislation.
In this article we will explore a new AWS tool that helps identify and prevent data leakage.
SNS message data protection is a new feature that launched in preview in September 2022. It allows us to audit and block messages flowing through SNS if they match one of the data identifiers we select.
We will look at implementing this in SNS to identify cases where full names or IP addresses are leaked.
Prerequisites
This article assumes you have already followed my last article and configured its demo infrastructure: a Lambda that publishes messages to an SNS topic, which delivers them to a queue.
The Lambda in that article gives us an easy way to publish messages to test our data protection policy.
Creating a Data Protection Policy
Now for the fun bit! Creating a data protection policy.
Our data protection policy will intercept and audit the messages we publish before they arrive in our queue.
Next, let’s look at the components that make up a policy.
Audit or Block
There are two operations you can add to the policy:
- Deny – Blocks the Amazon SNS publish request or fails the message delivery.
- Audit – Emits metrics and finding logs without interrupting message publishing or delivery.
We will start with the Audit operation. It is the lower-risk option, because it cannot overreach and accidentally block valid data flowing through our service.
Sample Rate
The data protection policy lets you set a sample rate. If you publish a low volume of messages, you might want this to be close to 99%. However, if you publish millions of SNS messages, a lower sample rate of 5% could be enough to catch potential breaches.
The sample rate can never be 100%, so this feature was never designed to be a catch-all solution to your data protection needs. Think of it as an extra safeguard that helps keep you from accidentally leaking personal data.
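Why can a low sample rate still be useful at high volume? A data leak caused by a bug usually repeats across many messages, so it only needs to be sampled once to show up. The sketch below puts numbers on that. It assumes each message is sampled independently with probability equal to the sample rate; AWS does not describe the sampling mechanism in this detail, so treat it as a rough model rather than a guarantee.

```python
def chance_of_sampling_at_least_one(leaking_messages: int, sample_rate_percent: float) -> float:
    """Probability that at least one of `leaking_messages` messages is sampled.

    ASSUMPTION: each message is sampled independently with probability
    sample_rate_percent / 100. This is a model, not documented AWS behaviour.
    """
    # The article notes the sample rate can never be 100%.
    if not 0 <= sample_rate_percent < 100:
        raise ValueError("sample rate must be at least 0 and below 100")
    p = sample_rate_percent / 100
    # Complement of "every leaking message was skipped".
    return 1 - (1 - p) ** leaking_messages


if __name__ == "__main__":
    for rate in (5, 99):
        for leaks in (1, 10, 100):
            chance = chance_of_sampling_at_least_one(leaks, rate)
            print(f"sample rate {rate}%, {leaks} leaking messages: {chance:.3f}")
```

Under this model, a 5% sample rate catches a one-off leak in a single message only 1 time in 20, but a bug that leaks data in 100 messages is sampled at least once about 99.4% of the time.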
SNS Data Protection Pricing
It’s worth noting that the sample rate also affects pricing. For full details, see the Amazon SNS pricing page.
At the time of writing, pricing in the eu-west-1 region was:
- Publish and delivery message scanning is $0.08 per GB of payload data
- Audit reporting is $0.21 per GB of audit reporting data generated
Note: Each message that is scanned will be billed for a minimum of 1KB of message scanning.
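To get a feel for these numbers, here is a rough cost sketch using the eu-west-1 prices quoted above and the 1 KB minimum per scanned message. Assumptions: KB and GB are treated as binary units (1,024 and 1,024³ bytes), you pass in only the messages you expect to be scanned, and the sketch does not model how the sample rate feeds into the bill. Check the SNS pricing page before relying on any figure it produces.

```python
import math

KB = 1024          # ASSUMPTION: binary units; confirm against your own bill
GB = 1024 ** 3
SCAN_PRICE_PER_GB = 0.08   # eu-west-1 scanning price quoted in the article
AUDIT_PRICE_PER_GB = 0.21  # eu-west-1 audit reporting price quoted in the article


def scanning_cost(message_sizes_bytes, price_per_gb=SCAN_PRICE_PER_GB):
    """USD cost of scanning the given messages, billing each one for at least 1 KB."""
    billable_bytes = sum(max(size, KB) for size in message_sizes_bytes)
    return billable_bytes / GB * price_per_gb


def audit_cost(audit_bytes, price_per_gb=AUDIT_PRICE_PER_GB):
    """USD cost of the audit reporting data generated."""
    return audit_bytes / GB * price_per_gb


if __name__ == "__main__":
    # One million small (200-byte) messages, each billed as 1 KB.
    print(f"${scanning_cost([200] * 1_000_000):.4f}")
```

Under these assumptions, a million 200-byte messages are each billed as 1 KB, roughly 0.95 GB in total, which comes to about $0.08 of scanning. Small messages cost the same as 1 KB ones, so the minimum dominates for chatty, low-payload topics.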
Data Identifiers
The data protection policy expects a list of data identifiers to scan for. These fall into categories such as credentials, device identifiers, financial information, health information and personal information.
The full list of data identifiers is available in the Amazon SNS documentation.
In this article we will focus on two of them:
- arn:aws:dataprotection::aws:data-identifier/IpAddress
- arn:aws:dataprotection::aws:data-identifier/Name
We want to check that we are not accidentally exposing our users’ names or IP addresses.
Amazon SNS can detect full names only. For example, it would flag “John Smith” but not “John”.
Amazon SNS uses a combination of criteria and techniques, including machine learning and pattern matching, to detect sensitive data. Running your own tests with mock data is important to understand which scenarios it will flag.
Adding the Policy
Before we add our policy, we need to create an S3 bucket to store the audit findings in.
I have called my bucket demo-dataprotection-example, but you will need to pick your own name as these are globally unique.
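If you prefer to script the bucket instead of clicking through the console, here is a minimal sketch using boto3. The name check covers only a subset of the S3 bucket naming rules (length, allowed characters, first and last character, no consecutive dots, not an IP address); see the S3 documentation for the full list. The `region` default of eu-west-1 matches the region used in this article and is an assumption for your setup.

```python
import re

# A subset of the S3 general-purpose bucket naming rules.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")  # 3 to 63 characters
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the subset of naming rules checked here."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Consecutive dots and IP-address-shaped names are not allowed.
    if ".." in name or _IP_RE.fullmatch(name):
        return False
    return True


def create_findings_bucket(name: str, region: str = "eu-west-1") -> None:
    """Create the findings bucket with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the name check works without boto3 installed

    if not is_valid_bucket_name(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default location and rejects an explicit constraint.
        s3.create_bucket(Bucket=name)
    else:
        s3.create_bucket(
            Bucket=name,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
```

For example, `create_findings_bucket("demo-dataprotection-example")` would create the bucket used in this article, but as the name is already taken you will need your own.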
Next, go to the AWS console, find your SNS topic, then click on Data Protection.

I have already added a policy to my topic. To add your own, select Edit and open the Data Protection section.
You can then either use the Basic builder or switch to the Advanced option and paste in your own JSON policy.

Here is the policy I created earlier:
{
  "Name": "__default_data_protection_policy",
  "Description": "Default data protection policy",
  "Version": "2021-06-01",
  "Statement": [
    {
      "Sid": "__audit_statement_19f47753",
      "DataDirection": "Inbound",
      "Principal": [
        "*"
      ],
      "DataIdentifier": [
        "arn:aws:dataprotection::aws:data-identifier/IpAddress",
        "arn:aws:dataprotection::aws:data-identifier/Name"
      ],
      "Operation": {
        "Audit": {
          "SampleRate": "99",
          "FindingsDestination": {
            "S3": {
              "Bucket": "demo-dataprotection-example"
            }
          }
        }
      }
    }
  ]
}
IMPORTANT: replace demo-dataprotection-example with the name of your own bucket.
This policy adds our two data identifier checks to inbound (published) messages, samples 99% of them and exports the findings to our bucket.
You can choose CloudWatch Logs, S3 or Kinesis Data Firehose as the findings destination. For simplicity, we will stick with S3 in this article.
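The same policy can be applied without the console. The sketch below builds the policy shown above as a Python dict and sends it with boto3's SNS `put_data_protection_policy` call (available in boto3 releases since the feature launched). The `sid` default is a hypothetical placeholder; the console generates its own, such as `__audit_statement_19f47753`. The topic ARN is the one from the findings example later in this article, with your account ID substituted.

```python
import json


def build_audit_policy(bucket, identifiers, sample_rate="99", sid="__audit_statement_demo"):
    """Build an Audit-only data protection policy that writes findings to S3.

    `sid` is a hypothetical placeholder statement ID.
    """
    return {
        "Name": "__default_data_protection_policy",
        "Description": "Default data protection policy",
        "Version": "2021-06-01",
        "Statement": [
            {
                "Sid": sid,
                "DataDirection": "Inbound",  # scan messages as they are published
                "Principal": ["*"],
                "DataIdentifier": list(identifiers),
                "Operation": {
                    "Audit": {
                        "SampleRate": sample_rate,
                        "FindingsDestination": {"S3": {"Bucket": bucket}},
                    }
                },
            }
        ],
    }


def apply_policy(topic_arn, policy, region="eu-west-1"):
    """Attach the policy to an SNS topic (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    sns = boto3.client("sns", region_name=region)
    sns.put_data_protection_policy(
        ResourceArn=topic_arn,
        DataProtectionPolicy=json.dumps(policy),  # the API expects a JSON string
    )


IDENTIFIERS = [
    "arn:aws:dataprotection::aws:data-identifier/IpAddress",
    "arn:aws:dataprotection::aws:data-identifier/Name",
]
policy = build_audit_policy("demo-dataprotection-example", IDENTIFIERS)
print(json.dumps(policy, indent=2))
# apply_policy("arn:aws:sns:eu-west-1:YOUR_AWS_ACCOUNT_ID:user-updates-topic", policy)
```

Keeping the policy in code makes it easy to review changes and apply the same checks to every topic, rather than re-entering them in the console.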
Publish Test Message
Now to publish a test message with some juicy PII!
First, make sure you have followed the previous article. We will use the Lambda we already configured to publish test messages to SNS.
Log in to the console and test the Lambda using this input event:
{
  "SnsMessage": "Mocked PII: John Smith. 8.8.8.8"
}
As you can see, we are providing a full name and an IP address. Both should trigger the data protection policy we configured above.
You can easily tweak this message input and try different scenarios.
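To try several scenarios at once, the sketch below defines a handful of hypothetical mock messages, turns them into Lambda test events with the same `SnsMessage` shape as the input above, and can optionally publish them straight to the topic with boto3's `sns.publish`. Publishing directly, rather than through the Lambda, means the findings will show your own IAM identity as the caller instead of the Lambda role. All values are fake test data.

```python
import json

# Hypothetical mock scenarios; every value here is fake test data.
SCENARIOS = {
    "full_name_and_ip": "Mocked PII: John Smith. 8.8.8.8",
    "first_name_only": "Mocked PII: John",  # SNS detects full names only
    "ip_only": "Mocked PII: 8.8.8.8",
    "no_pii": "Order 1234 shipped",
}


def lambda_test_events(scenarios):
    """Build Lambda test events shaped like the article's input event."""
    return {name: {"SnsMessage": message} for name, message in scenarios.items()}


def publish_scenarios(topic_arn, scenarios, region="eu-west-1"):
    """Publish each scenario directly to the topic (requires AWS credentials)."""
    import boto3  # imported lazily so the event builder works without boto3

    sns = boto3.client("sns", region_name=region)
    for name, message in scenarios.items():
        response = sns.publish(TopicArn=topic_arn, Message=message)
        print(name, response["MessageId"])


for name, event in lambda_test_events(SCENARIOS).items():
    print(name, json.dumps(event))
```

Comparing which message IDs appear in the findings against the scenario names is a quick way to see what SNS does and does not flag.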
Audit Findings
If you trigger the Lambda a few times and wait a few minutes, you will see that our test messages have produced some PII audit findings.
You will find them in the S3 bucket you created earlier.
Here is an example output:
{
  "messageId": "284ae24e-74d2-587b-bff2-b9ec1f846cae",
  "auditTimestamp": "2022-09-19T13:05:26Z",
  "callerPrincipal": "arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/demo-lambda-role",
  "resourceArn": "arn:aws:sns:eu-west-1:YOUR_AWS_ACCOUNT_ID:user-updates-topic",
  "dataIdentifiers": [
    {
      "name": "IpAddress",
      "count": 1,
      "detections": [
        {
          "start": 12,
          "end": 24
        }
      ]
    },
    {
      "name": "Name",
      "count": 1,
      "detections": [
        {
          "start": 0,
          "end": 9
        }
      ]
    }
  ]
}
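Once findings start to pile up, it helps to summarise them rather than read each one. The sketch below parses a single finding record like the example above and counts detections per data identifier. It starts from one record as a JSON string: how SNS names and packages the objects it writes to the bucket is not covered here, so downloading and splitting those files is left as an assumption about your setup.

```python
import json
from collections import Counter

# Example finding copied from the article (account ID placeholder kept as-is).
EXAMPLE_FINDING = """{
  "messageId": "284ae24e-74d2-587b-bff2-b9ec1f846cae",
  "auditTimestamp": "2022-09-19T13:05:26Z",
  "callerPrincipal": "arn:aws:iam::YOUR_AWS_ACCOUNT_ID:role/demo-lambda-role",
  "resourceArn": "arn:aws:sns:eu-west-1:YOUR_AWS_ACCOUNT_ID:user-updates-topic",
  "dataIdentifiers": [
    {"name": "IpAddress", "count": 1, "detections": [{"start": 12, "end": 24}]},
    {"name": "Name", "count": 1, "detections": [{"start": 0, "end": 9}]}
  ]
}"""


def summarise_finding(record):
    """Return (messageId, topic ARN, detections per data identifier) for one finding."""
    finding = json.loads(record)
    counts = Counter()
    for identifier in finding.get("dataIdentifiers", []):
        counts[identifier["name"]] += identifier["count"]
    return finding["messageId"], finding["resourceArn"], counts


message_id, topic_arn, counts = summarise_finding(EXAMPLE_FINDING)
print(message_id, topic_arn, dict(counts))
```

Summing the returned counters across many records gives a per-topic view of which kinds of personal data are leaking most often.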
Next Steps
Once you’re comfortable with the audit results, you could switch to a Deny operation so that SNS automatically blocks matching messages, further reducing the blast radius of an accidental personal data leak.
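As a sketch of that switch, the function below adds an Inbound Deny statement next to the existing Audit statement, so you keep the findings while blocking matching publishes. The `"Deny": {}` shape follows the examples in the SNS data protection documentation as we understand them; verify it against the current docs before applying. The `sid` value is a hypothetical placeholder.

```python
import copy

# The audit policy from this article, trimmed to what this sketch needs.
IDENTIFIERS = [
    "arn:aws:dataprotection::aws:data-identifier/IpAddress",
    "arn:aws:dataprotection::aws:data-identifier/Name",
]
AUDIT_POLICY = {
    "Name": "__default_data_protection_policy",
    "Description": "Default data protection policy",
    "Version": "2021-06-01",
    "Statement": [
        {
            "Sid": "__audit_statement_19f47753",
            "DataDirection": "Inbound",
            "Principal": ["*"],
            "DataIdentifier": IDENTIFIERS,
            "Operation": {
                "Audit": {
                    "SampleRate": "99",
                    "FindingsDestination": {"S3": {"Bucket": "demo-dataprotection-example"}},
                }
            },
        }
    ],
}


def with_deny_statement(policy, identifiers, sid="deny_statement_demo"):
    """Return a copy of `policy` with an extra Inbound Deny statement.

    ASSUMPTION: Deny takes an empty object; `sid` is a hypothetical placeholder.
    """
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    updated["Statement"].append(
        {
            "Sid": sid,
            "DataDirection": "Inbound",  # block the publish request itself
            "Principal": ["*"],
            "DataIdentifier": list(identifiers),
            "Operation": {"Deny": {}},
        }
    )
    return updated


deny_policy = with_deny_statement(AUDIT_POLICY, IDENTIFIERS)
```

Once a Deny statement is live, publishers such as the demo Lambda should expect matching publish requests to fail, so it is worth handling that error explicitly rather than letting it surface as a generic failure.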
Alternatively, you could send findings to CloudWatch Logs instead of S3 and configure alerting, so your security team gets a head start on potential data leaks before they affect thousands of customers.
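Changing the destination is a small edit to the policy. The sketch below swaps the S3 destination for CloudWatch Logs in every Audit statement, using the `"CloudWatchLogs": {"LogGroup": ...}` key as we understand it from the SNS data protection documentation. The log group name is a hypothetical placeholder, and SNS may impose naming or resource-policy requirements on the log group, so check the docs before applying it. Alerting itself (for example a metric filter and alarm on that log group) is not covered here.

```python
import copy

POLICY = {
    "Name": "__default_data_protection_policy",
    "Description": "Default data protection policy",
    "Version": "2021-06-01",
    "Statement": [
        {
            "Sid": "__audit_statement_19f47753",
            "DataDirection": "Inbound",
            "Principal": ["*"],
            "DataIdentifier": [
                "arn:aws:dataprotection::aws:data-identifier/IpAddress",
                "arn:aws:dataprotection::aws:data-identifier/Name",
            ],
            "Operation": {
                "Audit": {
                    "SampleRate": "99",
                    "FindingsDestination": {"S3": {"Bucket": "demo-dataprotection-example"}},
                }
            },
        }
    ],
}


def use_cloudwatch_logs(policy, log_group):
    """Return a copy of `policy` whose Audit statements write to a CloudWatch Logs group."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    for statement in updated["Statement"]:
        audit = statement.get("Operation", {}).get("Audit")
        if audit is not None:
            audit["FindingsDestination"] = {"CloudWatchLogs": {"LogGroup": log_group}}
    return updated


# "demo-dataprotection-findings" is a hypothetical log group name.
cw_policy = use_cloudwatch_logs(POLICY, "demo-dataprotection-findings")
```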
Summary
And that’s it! In just a few minutes we have a way to audit for potential data breaches in SNS. Better still, if you already use SNS, this is a very low-effort feature to adopt.
As with anything, this is not a silver bullet and it won’t catch 100% of cases, but it is a valuable tool to add to your collection.