Enabling SSE-C

AWS supports server-side encryption with customer-provided encryption keys (SSE-C), which lets you supply your own encryption keys.

Enabling SSE-C in Hadoop and Spark Clusters

To enable SSE-C, perform these steps:

  1. Navigate to the Clusters page. Click Edit to edit an existing cluster, or click New to create a new cluster.
  2. In the cluster’s Advanced Configuration tab, under Override Hadoop Configuration Variables, add fs.s3a.server-side-encryption-algorithm=SSE-C.

Note

When SSE-C is enabled in QDS, any command running with these settings may not be able to fetch the result data. As such, these settings must only be used when results are irrelevant (for example, populating data into a directory in S3 using a Spark or a Hive job).

The same setting can also be applied to Hive commands. Set it per command, in the same command session as the command it applies to.

For example,

CREATE EXTERNAL TABLE New2 (`Col0` STRING, `Col1` STRING, `Col2` STRING)
PARTITIONED BY (`20100102` STRING, `IN` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3://ap-dev-qubole/common/hive/30day_1/30daysmall';
set fs.s3n.sse=SSE-C;

Enabling the Encryption Key

Set the following properties to use SSE-C on the S3A filesystem:

  1. fs.s3a.server-side-encryption-algorithm=SSE-C
  2. fs.s3a.server-side-encryption.key=<key>: The encryption key used to encrypt the data. For the SSE-C algorithm, the value of this property must be the Base64-encoded key.
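
As an illustration of the Base64 requirement, the following Python sketch generates a random key and renders the two properties above as key=value lines. It assumes a 256-bit key, which is what SSE-C uses with AES-256. The function names generate_sse_c_key and s3a_sse_c_overrides are hypothetical helpers introduced for this example and are not part of QDS or Hadoop.

```python
import base64
import os
from typing import List


def generate_sse_c_key() -> str:
    """Return a random 256-bit key, Base64-encoded (hypothetical helper)."""
    raw_key = os.urandom(32)  # 32 bytes = 256 bits, assumed AES-256 key size
    return base64.b64encode(raw_key).decode("ascii")


def s3a_sse_c_overrides(encoded_key: str) -> List[str]:
    """Render the two S3A properties from this section as key=value lines."""
    return [
        "fs.s3a.server-side-encryption-algorithm=SSE-C",
        "fs.s3a.server-side-encryption.key=" + encoded_key,
    ]


if __name__ == "__main__":
    # Print lines that could be pasted into the Hadoop override field.
    for line in s3a_sse_c_overrides(generate_sse_c_key()):
        print(line)
```

Keep the generated key in a safe place. With SSE-C, S3 does not store the key, so objects written with it cannot be read back without it.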

Enabling SSE-C while using Hadoop DistCp

While using Hadoop DistCp, these parameters can be set for server-side encryption along with the other parameters:

  • s3ServerSideEncryption: Enables object-level encryption of data as S3 writes it to disk.
  • s3SSEAlgorithm: The algorithm used for encryption. Specify SSE-C as its value. If you do not specify it but s3ServerSideEncryption is enabled, the AES256 algorithm is used by default.
  • encryptionKey: The key used to encrypt the data. With SSE-C, you must specify it; otherwise, the job fails.
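
The way these three parameters interact can be modeled with a short Python sketch. This is only a model of the rules described above, not DistCp's own code. The function name resolve_distcp_sse is hypothetical, and returning an empty result when encryption is disabled is an assumption made for the example.

```python
from typing import Optional


def resolve_distcp_sse(enabled: bool,
                       algorithm: Optional[str] = None,
                       encryption_key: Optional[str] = None) -> dict:
    """Model the parameter rules described above (hypothetical helper)."""
    if not enabled:
        # Assumption: with encryption disabled, no SSE settings apply.
        return {}
    # AES256 is the default when no algorithm is specified.
    effective = algorithm or "AES256"
    if effective == "SSE-C" and not encryption_key:
        # Mirrors the documented job failure when the key is missing.
        raise ValueError("encryptionKey is required when s3SSEAlgorithm is SSE-C")
    settings = {"s3ServerSideEncryption": True, "s3SSEAlgorithm": effective}
    if encryption_key:
        settings["encryptionKey"] = encryption_key
    return settings
```

For example, resolve_distcp_sse(True) falls back to AES256, while resolve_distcp_sse(True, "SSE-C") raises an error because no key was supplied.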

Enabling SSE-C in Presto

SSE-C encryption is supported only in Presto version 0.157. It is not supported in Presto 0.180 or later versions.

Note

When SSE-C is enabled in QDS, any command running with these settings may not be able to fetch the result data. As such, these settings must only be used when results are irrelevant (for example, populating data into a directory in S3 using a Spark or a Hive job).

Perform these steps to enable SSE-C in Presto:

  1. In the Presto catalog/hive.properties settings, set hive.s3.sse.enabled=true.
  2. Set the encryption type to SSE-C by using the following setting:
    • Set hive.s3.encryption-materials-provider=<custom encryption provider> (if the Presto version is 0.157).

For more information, see catalog/hive.properties.

Note

The results of SELECT queries that use a LIMIT clause are not encrypted, because the LIMIT clause bypasses the map/reduce flow.

The results of SELECT queries without a LIMIT clause are encrypted. In other words, standard Hadoop map/reduce output is encrypted, whereas Presto output, which does not use map/reduce, is not encrypted.