MySQL Blog Archive
For the latest blogs go to blogs.oracle.com/mysql
Exporting Masked and De-Identified Data from MySQL
In all likelihood your MySQL database contains valuable and sensitive information. Within that database, MySQL protects that data using features such as encryption, access controls, auditing, views, and more. However in many cases you may need to share some of this data, but must at the same time protect that sensitive information.  That is where data masking and de-identification techniques are needed.

Masking Data

Sensitive data can be any number of things. For example, your data may contain PII (personally identifiable information), PHI (Protected Health Information), PCI (Payment Card Information), financial data, intellectual property, or education related information. In many cases, handling and sharing of this sensitive data is regulated.
  • PCI DSS
  • HIPAA
  • European Union’s General Data Protection Regulation (GDPR)
  • ITAR
  • EAR
Of course there are many more regulations covering data protectsion. Plus, corporations have their own internal controls defining handling of confidential information.
And there are cases where data needs to be shared outside of the production MySQL database. Users of this shared data – analysts, developers, business partners, regulators – need some form of this data for various reasons. However you as a DBA need to protect the data and provide just enough to meet their needs . These users need partial data to perform tasks such as:
  • Data Analytics
  • Application Development
  • Testing
  • Auditing
  • Training
When sharing and exporting data from the MySQL database sensitive data needs to be cleansed of sensitive information, using techniques such as
  • Data masking
  • Data de-identification
  • Obfuscation
  • Randomization
  • Random Formatted Data

How do you perform these tasks?

What follows shows steps towards exporting masked and de-identified data from your MySQL database.

7 Steps to Mask and Export Data

  1. Install or upgrade
  2. Load the MySQL masking plugin and functions
  3. Create a select statement on the columns of interest
  4. Determine which columns are sensitive
  5. Chose an appropriate mask type for each column
  6. Add masking functions to columns with sensitive data
  7. Export the data from the select in desirable format

Now lets drill down into the “how to” for each step

Step 1 – Install/Upgrade to MySQL EE

Installation and Upgrade Instructions to MySQL EE. You can install 8.0 or 5.7
Optional but recommended (and used in this blog)
Installation and Upgrade Instructions to (use the 8.0 versions of MySQL Workbench and Shell as they support both MySQL Server 8.0 and 5.7)

Note: You could use the mysql client tool as well, however your output formats are more limited as MySQL Workbench and Shell provide you many more result format options.

Step 2 – Load the masking plugin and functions

FYI no server restarts are needed.
Installing MySQL Masking Plugin and Functions using MySQL Workbench
MySQL Enterprise Masking and De-identification – Installation

Step 3 – Create a select statement on the columns of interest

An easy way to do this is using MySQL Workbench
MySQL Workbench - Select ALL Statement makes it easy to create SQL statement
MySQL Workbench – Select ALL Statement

Step 4 – Run the Select and Determine which columns are sensitive

For example – here we see a select on the table “users” – with many columns of sensitive data shown. Salary, SSN, etc. – you would not want to share much of this information as-is.
MySQL Workbench - Select ALL Statement makes it easy to create SQL statement
MySQL Workbench – Example Showing Sensitve Data

Step 5 – Chose an appropriate mask type for each column

In this example selected columns are showing sensitive data. To protect this data I’ll show how to
  • Partially Obfuscate names
  • Obscure by planing range on salaries
  • Use standard SSN masking
  • Mask the Credit Card PAN data

Your tasks for step 5

  1. Entirely omit columns from your select statement that are unnecessary for the targeted data export
    • The fewer, the better
  2. Determine which columns are sensitive
  3. For those sensitive columns determine the appropriate masking, obfuscation or random data substitution needed to meet security requirements.
    1. Overview of MySQL Masking and De-identification
    2. Documentation on MySQL Masking and De-identification

Step 6 – Add masking functions to columns with sensitive data

Create the masked data using SELECT – for example

Here you can see the statement run in Workbench
Masking on Sensitive Columns - without Casting results in binary blog data without a type
Masking on Sensitive Columns – without Casting
In Workbench you’ll notice the masked columns are shown as BLOBs – and if you view the BLOB and select text, you will see the text. This is because MySQL UDFs do not return the type information, thus its treated as binary data type.
So next I’ve modify the masked columns selections and add CAST. Now you can see the results data as expected.
Create the masked data using SELECT with CAST added – for example

How you can see the masked results within the Workbench Columns
MySQL Workbench Showing Data Masking using enterprise masking functions in SELECT
MySQL Workbench Showing Data Masking

Step 7 – Export the data from the select in desirable format

If you using MySQL Workbench, with number of results in the thousands, you can easily export your masked data from the select statement.
MySQL Workbench - Export Masked Data export button and selecting - csv, tab, json, ...
MySQL Workbench – Export Masked Data
However, for much larger results sets its likely you will want to run the export of this masked data in batch mode. For this the MySQL Shell is best.
First create a file with your masked select statement.
Using VI editor to create a file with the SELECT SQL with masking functions
MySQL Shell first create a SQL File to use Export Masked Data
  • Save this SQL to a file.
Next – determine desired results format –

Options are Tabbed, Table, Vertical, JSON (formatted – easy to read) , JSON (raw)

  • Choosing the results format
    • –result-format options are: tabbed, table, vertical, json or json/raw
  • Choosing a protocol – Classic or X Protocol.(for 5.7 use –sqlc, 8.0 either)
    • –sqlc for classic (3306)
    • –sqlx for new X Protocol (33060)

Examples

  • Run the command – for tabbed
    • mysqlsh –result-format=tabbed –user=root –password –sqlc –file=maskeduser.sql | head -20
Shown MySQL Shell - Masked Data - Tabbed Format as output from batch execution
MySQL Shell – Masked Data – Tabbed Format
  • Run the command – for json
    • mysqlsh –result-format=json/raw –user=root –password –sqlc –file=maskeduser.sql
Batch output from MySQL Shell using the JSON forma
MySQL Shell – Masked Data – Raw JSON Format
  • Run the command – for formatted “pretty” json
    • mysqlsh –result-format=json –user=root –password –sqlc –file=maskeduser.sql
MySQL Shell - Masked Data - Pretty JSON Format
MySQL Shell – Masked Data – Pretty JSON Format
If the results are very large – then optionally compress by piping to zip.
  • Run the command – for compressing to zip
    • mysqlsh –result-format=tabbed –user=root –password –sqlc –file=maskeduser.sql | zip f.zip –
  • Check the zip file –
    • unzip -l f.zip
Often results of an export can be large - compression MySQL Results via ZIP
MySQL Results compressed via pipe to ZIP

Conclusion

This blog has shown how easy it can be to protect data using data masking and de-identification. Its highly likely you’ll need tools such as these to obtain and maintain compliance with regulations like PCI, GDPR, HIPAA and others. It just takes a few simple steps, thanks to MySQL EE Masking, MySQL Workbench, and MySQL Shell.