All the way back with SQL Server 2016 Microsoft released the Dynamic Data Masking feature in the database engine. It seemed like a huge step forward and promised so much, but there were severe limitations around the way that we could control who sees what masked data. It was a case of you either got to see masked data wherever it was configured, or you saw clear data, there was no granularity. I wrote about this and a few other things to do with Dynamic Data Masking all the way back in August of 2016 when I was at SentryOne. You can check that post out here. Also, back then I created several Connect items (blast from the past there), one of which was pulled over to the user voice replacement where I was asking for the UNMASK securable to be made more granular, you can check that out here.
So, why I am I writing this post? Well, it seems that our (my?) request has been granted. At least in Azure SQL Database. On March the 17th this year a little announcement slipped out stating “General availability: Dynamic data masking granular permissions for Azure SQL and Azure Synapse Analytics“. So, has this delivered on what we wanted, to really help this feature live up to its promise?
Prepare The Environment
Lets start by using the AdventureWorks sample database and configure some basic masking rules. I’m going to stick with T-SQL here even though we could use PowerShell in Azure SQL Database. Mainly because I am of the opinion that all of these schema elements are controlled by development and should be in source control as part of the database. We will also setup some database users without logins so that we can use EXECUTE AS to switch context rather than multiple connections etc.
In the AdventureWorks database we will create four users which we will be using for the rest of this post. This will allow us to look at the differing options in isolation.
CREATE USER MaskedUser1 WITHOUT LOGIN; CREATE USER MaskedUser2 WITHOUT LOGIN; CREATE USER MaskedUser3 WITHOUT LOGIN; CREATE USER MaskedUser4 WITHOUT LOGIN; GO
Now we need to look at defining the masks on the columns within the database which we will use to compare and contrast the capabilities. First of all we need to figure out the data we want to mask and how. We will base our work off the following query which presents several sensitive data elements over multiple tables.
CREATE VIEW dbo.CustomerContactList AS SELECT p.Title, p.FirstName, p.MiddleName, p.LastName, p.Suffix, ea.EmailAddress, pp.PhoneNumber, ppt.Name AS PhoneNumberType, c.CustomerID, c.AccountNumber FROM Person.Person AS p JOIN Person.EmailAddress AS ea ON ea.BusinessEntityID = p.BusinessEntityID JOIN Person.PersonPhone AS pp ON pp.BusinessEntityID = p.BusinessEntityID JOIN Person.PhoneNumberType AS ppt ON ppt.PhoneNumberTypeID = pp.PhoneNumberTypeID JOIN sales.Customer AS c ON c.PersonID = p.BusinessEntityID ; GO GRANT SELECT ON OBJECT::dbo.CustomerContactList TO MaskedUser1; GRANT SELECT ON OBJECT::dbo.CustomerContactList TO MaskedUser2; GRANT SELECT ON OBJECT::dbo.CustomerContactList TO MaskedUser3; GRANT SELECT ON OBJECT::dbo.CustomerContactList TO MaskedUser4; GO
Because we are good developers we will create a view based on this query and then grant our users access to that rather than the underlying tables. This will also make the next batch of queries a little easier to understand as we test things out.
Anyway, back to the masks.
Key things that jump out and us here are Email Address, Phone Number, Customer ID, and Account Number. Names are an interesting topic, for the purposes of this we will look to mask the first and middle name, working on the premise that this could be used by a call centre to contact their customers.
ALTER TABLE Person.EmailAddress ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION ='email()') ; GO ALTER TABLE Person.PersonPhone ALTER COLUMN PhoneNumber ADD MASKED WITH (FUNCTION = 'partial(0,"xxx-",4)') ; GO ALTER TABLE Person.Person ALTER COLUMN FirstName ADD MASKED WITH (FUNCTION = 'default()') ; GO ALTER TABLE Person.Person ALTER COLUMN MiddleName ADD MASKED WITH (FUNCTION = 'default()') ; GO
Here we have defined our masks using the built-in functions in Dynamic Data masking, nice and easy. Now onto the Account Number…
In AdventureWorks this is a computed column, this poses some issues for us as we cannot apply a mask to the computed column (I feel some feedback coming on). So, we need to look at the underlying column which is used, in this case CustomerID. Ideally, we would like to expose the last three characters of the Account Number, which would mean we want to use the partial() function. But, this is not allowed on the INT datatype, so we are going to go with default. This means we will get the default mask for the data type in CustomerID and also have this pulled into the Account Number computed column.
DROP INDEX AK_Customer_AccountNumber ON Sales.Customer ; GO ALTER TABLE Sales.Customer DROP COLUMN AccountNumber ; GO ALTER TABLE Sales.Customer ALTER COLUMN CustomerID ADD MASKED WITH (FUNCTION = 'default()') ; GO ALTER TABLE Sales.Customer ADD AccountNumber AS (isnull('AW'+[dbo].[ufnLeadingZeros]([CustomerID]),'')) ; GO CREATE UNIQUE NONCLUSTERED INDEX [AK_Customer_AccountNumber] ON [Sales].[Customer] ([AccountNumber] ASC) ; GO
So, now we have our important columns masked which we can see when we query our view. Remember, we now need to be executing our query in the context of the masked users because as admin will will see clear data.
EXECUTE AS USER = 'MaskedUser1'; SELECT Title, FirstName, MiddleName, LastName, Suffix, EmailAddress, PhoneNumber, PhoneNumberType, CustomerID, AccountNumber FROM dbo.CustomerContactList ; REVERT;
Ideally, we would like to have had the last three characters of the CustomerID/Account Number visible. The reason being that an agent would then be able to help check with a customer that they were speaking to the right person. I’ll look at some options for how we might do this in a future blog post.
Anyway, we are all set and ready to go for seeing what these new Dynamic Data Masking capabilities will let us do.
So, now we have our four users who need to see varying amounts of the data in the system. Let’s set the rules out here before we go any further.
- Masked User 1 is only permitted to see masked data where it is configured.
- Masked User 2 is allowed to see clear data for CustomerID and Account Number, all other masking must remain in place.
- Masked User 3 is permitted to see First Name and Phone Number but not other masked columns.
- Masked User 4 can see all data in the clear for objects they have access to.
What We Had
Prior to this release of new capability, we would only be able to service the ruleset for Masked Users 1 & 4. We would not be able to use Dynamic Data Masking to service the rules defined for Masked Users 2 & 3. The reason being is that the UNMASK permission was all or nothing for the user in the database. So we could only use the following permissions configuration.
GRANT UNMASK TO MaskedUser4; GO
This would then apply to all masks defined in the scope of the database. This would rely on ensuring that very granular permissions are in place to limit what the user has access to and make sure that we don’t grant rights which are too permissive. Not a great situation, and somewhat limiting to really make use of what could be a great feature and capability.
What We Have Now
So, what have we got now? From what I can see the documentation has not caught up with the announcement and capabilities in the platform. This is always going to be the case with something as fast moving as Azure SQL Database. Assuming that it works the same way as granting permissions in SQL Server we should be able to figure it out.
Let’s start by checking what the new capability it, as-per the announcement “grant and deny UNMASK permission at the schema-level, the table-level, and the column-level”.
Let’s start with the scenario for Masked User 2 where they can only see clear data for the CustomerID and Account Number. In this situation we will look to grant UMASK on the column CustomerID and this will open up both for us. Based on the syntax for granting object level permissions we just need to specify the column of the underlying table to give the UNMASK rights to the user.
GRANT UNMASK ON OBJECT::Sales.Customer(CustomerID) TO MaskedUser2; GO
Once we have done this we can then call the view as MaskedUser2 and let’s see what we get.
EXECUTE AS USER = 'MaskedUser2'; SELECT Title, FirstName, MiddleName, LastName, Suffix, EmailAddress, PhoneNumber, PhoneNumberType, CustomerID, AccountNumber FROM dbo.CustomerContactList ; REVERT;
This is a massive leap forward for us in making Dynamic Data Masking usable for us now. Based on this we can then look to grant UNMASK to MaskedUser3 to implement the rules for data visibility there.
GRANT UNMASK ON OBJECT::Person.Person(FirstName) TO MaskedUser3; GO GRANT UNMASK ON OBJECT::Person.PersonPhone(PhoneNumber) TO MaskedUser3; GO
Now we can test our view again as MaskedUser3 and let’s see what we get.
EXECUTE AS USER = 'MaskedUser3'; SELECT Title, FirstName, MiddleName, LastName, Suffix, EmailAddress, PhoneNumber, PhoneNumberType, CustomerID, AccountNumber FROM dbo.CustomerContactList ; REVERT;
Again, we have a subset of the masked columns un-masked for our user so that they can only see the data they are permitted to see.
Now, for the final ruleset it gets a little more complex. We could grant UNMASK on the tables underpinning the view however, the rule states that its only unmasked data in the objects they have access to. Permissions are granted at the view level not the underlying table and there could be additional columns with sensitive data the user is not permitted to see. Personally, I would advocate for the granting of UNMASK for each column which is in the view, this way if something goes awry in the access to objects the other sensitive data is still going to be protected. For the purposes of this demo we will use a bit of both just to test-drive the new feature capability.
First we need to revoke the general UNMASK permission, then implement the more granular configuration.
REVOKE UNMASK TO MaskedUser4; GO GRANT UNMASK ON OBJECT::Person.Person(FirstName) TO MaskedUser4; GO GRANT UNMASK ON OBJECT::Person.Person(MiddleName) TO MaskedUser4; GO GRANT UNMASK ON OBJECT::Sales.Customer(CustomerID) TO MaskedUser4; GO GRANT UNMASK ON OBJECT::Person.PersonPhone TO MaskedUser4; GO GRANT UNMASK ON OBJECT::Person.EmailAddress TO MaskedUser4; GO
Now we can query the view and see what we get.
EXECUTE AS USER = 'MaskedUser4'; SELECT Title, FirstName, MiddleName, LastName, Suffix, EmailAddress, PhoneNumber, PhoneNumberType, CustomerID, AccountNumber FROM dbo.CustomerContactList ; REVERT;
As we can see below the use has access to clear-data via the view.
So, this has shown a great step forward in capability for the Dynamic Data Masking feature by making the permissions structure more granular. This is now something that has to be considered a feature well worth using to help protect data in our database systems. By building this into the engine it means that a lot of the work that we would have had to do coding around the gaps in capability are gone and any reduction in complexity is a massive bonus.
Is it perfect? No. The issue we encountered around the implementation of Dynamic Data Masking on computed columns is not great. It is outside of the scope of this but frustrating none the less. Additionally, this currently only appears to be available in the Azure PaaS based versions of SQL Server (Azure SQL DB, Managed Instance, Synapse). Will this be coming to on-premises? I sincerely hope so as this would have a significant impact to the way we manage access to systems we run.
In short, I have to say a massive thanks to the Azure SQL Database team for doing this work and I’m going to keep my fingers crossed to see this brought down into current retail SQL Server versions.
However, while these improvements are a big step forward we have to remember that this is only one piece of the puzzle. This does not, and should not, replace good design and security best practices such as applying the principle of least privilege and additional protections at each point of the data platform lifecycle. Additionally, Dynamic Data Masking is still susceptible to inference attacks if users have direct access to the database to perform ad-hoc queries. There is a really good post here by Ronen Ariely (B | T) on this topic which highlights some of the pitfalls.