Enabling SSE-KMS

Server-side encryption with AWS KMS-managed keys (SSE-KMS) is one of the types of server-side encryption that AWS supports.

For details on client-side KMS encryption, see Enabling the Client-side Encryption (AWS).

Enabling SSE in the QDS Control Plane

The QDS Control Plane comprises all QDS components except the clusters. Understanding the Qubole Folders in the Default Location on S3 (AWS) lists the folders in the account’s default location to which QDS has write access.

Currently, QDS lets you enable SSE-KMS only through a REST API call, as described in Enable SSE on the QDS Control Plane.

Note

When SSE-KMS is enabled in QDS, any command running with these settings may not be able to fetch the result data. As such, these settings must only be used when results are irrelevant (for example, populating data into a directory in S3 using a Spark or a Hive job).

Enabling SSE in the Hadoop and Spark Clusters

To enable SSE-KMS, perform these steps:

  1. Navigate to the Clusters page, and click Edit to edit an existing cluster or New to create a new cluster.
  2. In the cluster’s Advanced Configuration tab, under Override Hadoop Configuration Variables, add this line: fs.s3a.server-side-encryption-algorithm=SSE-KMS

Note

When SSE-KMS is enabled in QDS, any command running with these settings may not be able to fetch the result data. As such, these settings must only be used when results are irrelevant (for example, populating data into a directory in S3 using a Spark or a Hive job).

The same property can be applied to Hive commands. It is set per command, in the same command session as the command it applies to.

For example,

SET fs.s3a.server-side-encryption-algorithm=SSE-KMS;
CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING) PARTITIONED BY (`20100102` STRING,`IN` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';

Enabling the Encryption Key

Set the following properties to use SSE-KMS on the S3A filesystem:

  1. fs.s3a.server-side-encryption-algorithm=SSE-KMS: It selects SSE-KMS as the server-side encryption algorithm.
  2. fs.s3a.server-side-encryption.key=<key>: It is the encryption key used to encrypt the data. If you leave this property empty, the default S3 KMS key is used. To use a different key, set this property to that key's KMS key ID.
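As a minimal sketch, the two properties above might be entered together as follows, assuming the key property is supplied the same way as the algorithm property (for example, under Override Hadoop Configuration Variables). The <key> placeholder is kept from the text above and must be replaced with your own KMS key ID. Omit the second line to fall back to the default S3 KMS key.

```properties
fs.s3a.server-side-encryption-algorithm=SSE-KMS
fs.s3a.server-side-encryption.key=<key>
```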

Enabling SSE-KMS while using Hadoop DistCp

While using Hadoop DistCp, these parameters can be set for server-side encryption along with the other parameters:

  • s3ServerSideEncryption: It enables encryption of data at the object level as S3 writes it to disk.
  • s3SSEAlgorithm: It is the algorithm used for encryption. Specify SSE-KMS as its value. If you do not specify it but s3ServerSideEncryption is enabled, the AES256 algorithm is used by default.
  • encryptionKey: It is the key used to encrypt the data. If the algorithm is SSE-KMS, the key is not mandatory because AWS KMS is used.

Enabling SSE-KMS in the Presto Cluster

Note

When SSE-KMS is enabled in QDS, any command running with these settings may not be able to fetch the result data. As such, these settings must only be used when results are irrelevant (for example, populating data into a directory in S3 using a Spark or a Hive job).

Perform these steps to enable SSE-KMS in Presto:

  1. In the Presto catalog/hive.properties settings, set hive.s3.sse.enabled=true.
  2. Set the type of encryption to KMS: set hive.s3.sse.type=kms if the Presto version is 0.180 or 0.193.
  3. Optionally, set the KMS key by using the hive.s3.sse.kms-key-id property. For example, set hive.s3.sse.kms-key-id=<KMS Key ID>. If you do not set the KMS key, the default key is used.
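Taken together, the steps above could yield settings like the following sketch in catalog/hive.properties. The <KMS Key ID> placeholder comes from step 3 and must be replaced with a real key ID. The last line can be dropped to use the default key.

```properties
hive.s3.sse.enabled=true
hive.s3.sse.type=kms
hive.s3.sse.kms-key-id=<KMS Key ID>
```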

For more information, see catalog/hive.properties.

Note

The results of SELECT queries with a LIMIT clause are not encrypted, because the LIMIT clause bypasses the map/reduce flow.

Results of SELECT queries without a LIMIT clause are encrypted. In other words, standard Hadoop map/reduce output is encrypted, whereas Presto output, which does not use map/reduce, is not encrypted.