Ad-hoc Log Queries with Amazon Athena

Overview

One of the primary benefits of AWS Control Tower is the centralization of logs across all regions of all managed accounts. This lab demonstrates using Amazon Athena to quickly query against the centralized data. Ad-hoc queries against the central log archive are a cost-effective serverless way to gain specific insights into CloudTrail and Config data.

If you are interested in more robust, near-real-time searches, see the Centralized Logging with ElasticSearch lab.

First you will configure an SSO Permission Set to allow use of Amazon Athena in the Log Archive account. You will authorize members of the AWSLogArchiveViewers group by assigning them the new permission set.

After this initial configuration, you will use the Athena query editor to create an external table that can be used to query CloudTrail information for a specific AWS Account.

The elective guardrail Disallow Changes to Bucket Policy for Amazon S3 Buckets effectively blocks adding cross-account access to the Control Tower logs buckets. AWS Control Tower enables this guardrail by default, but it can be disabled if you want to use Athena from a different account; disabling it would allow a cross-account bucket policy to be added to the logs bucket.
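
For reference, the sketch below shows the kind of cross-account statement the guardrail blocks. This is an illustration only: the auditor account ID 111122223333 and the bucket name are hypothetical placeholders, not values from this lab.

    # Illustration only: a cross-account read grant of the kind the guardrail
    # blocks. The account ID and bucket name below are hypothetical placeholders.
    import json
    import boto3

    logs_bucket = "aws-controltower-logs-123423453456-us-east-1"  # placeholder
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "AllowAuditorAccountRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{logs_bucket}/*",
        }],
    }

    # With the guardrail enabled this call is denied; with it disabled, it
    # would grant the other account read access to the log objects.
    boto3.client("s3").put_bucket_policy(Bucket=logs_bucket, Policy=json.dumps(policy))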

Prerequisites

You must have access to the Control Tower Management account with authority to modify AWS SSO configurations

Steps

1. Configure Permission Set in AWS Single Sign-On

  1. Log in to the Management Account as a user authorized to administer SSO.
  2. Open the AWS SSO service via the Services search box.
  3. Click Accounts, then Create Permission Set.
  4. Select Create a custom permission set, then click Next.
  5. Fill in a name of AdHocAuditor and select both Attach AWS managed policies and Create a custom permissions policy.
  6. Attach the AWS managed policy AmazonAthenaFullAccess.
  7. Under Permissions Policy, copy and paste the policy below.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject"
          ],
          "Resource": "arn:aws:s3:::aws-controltower-logs-*/*"
        }
      ]
    }
    
  8. Skip Tags and click Create
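
If you prefer to script this step, the boto3 sketch below creates an equivalent permission set. It is a minimal sketch, assuming the default SSO instance; all calls are standard sso-admin APIs.

    # Sketch: create the AdHocAuditor permission set programmatically.
    import json
    import boto3

    sso = boto3.client("sso-admin")
    instance_arn = sso.list_instances()["Instances"][0]["InstanceArn"]

    ps_arn = sso.create_permission_set(
        Name="AdHocAuditor",
        InstanceArn=instance_arn,
    )["PermissionSet"]["PermissionSetArn"]

    # Attach the AWS managed policy AmazonAthenaFullAccess (step 6).
    sso.attach_managed_policy_to_permission_set(
        InstanceArn=instance_arn,
        PermissionSetArn=ps_arn,
        ManagedPolicyArn="arn:aws:iam::aws:policy/AmazonAthenaFullAccess",
    )

    # Add the inline policy from step 7.
    inline_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::aws-controltower-logs-*/*",
        }],
    }
    sso.put_inline_policy_to_permission_set(
        InstanceArn=instance_arn,
        PermissionSetArn=ps_arn,
        InlinePolicy=json.dumps(inline_policy),
    )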

2. Apply new Permission Set to AWSLogArchiveViewers group

  1. On the AWS SSO homepage, click AWS accounts and drill into the Log Archive account by clicking its name (not the selection button).
  2. Click Change permission sets to the right of AWSLogArchiveViewers.
  3. Select AdHocAuditor (created in the previous step) and click the Save Changes button.
  4. You should return to the Log archive Account Details screen shown in step 1 above. Now drill into AWSLogArchiveViewers to see the details of the group.
  5. Under Group members, click Add users and select the user you want to use to continue.

If you would prefer to create a new user instead, you can follow the walkthrough in the AWS Single Sign-On user guide.
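
The assignment itself can also be scripted. The sketch below looks up the group in the Identity Store and assigns the permission set to it on the Log Archive account; the account ID and permission set ARN are hypothetical placeholders.

    # Sketch: assign the AdHocAuditor permission set to the AWSLogArchiveViewers
    # group on the Log Archive account. IDs and ARNs below are placeholders.
    import boto3

    sso = boto3.client("sso-admin")
    idstore = boto3.client("identitystore")

    instance = sso.list_instances()["Instances"][0]
    group_id = idstore.list_groups(
        IdentityStoreId=instance["IdentityStoreId"],
        Filters=[{"AttributePath": "DisplayName",
                  "AttributeValue": "AWSLogArchiveViewers"}],
    )["Groups"][0]["GroupId"]

    sso.create_account_assignment(
        InstanceArn=instance["InstanceArn"],
        TargetId="111111111111",  # placeholder: Log Archive account ID
        TargetType="AWS_ACCOUNT",
        PermissionSetArn="arn:aws:sso:::permissionSet/ssoins-EXAMPLE/ps-EXAMPLE",
        PrincipalType="GROUP",
        PrincipalId=group_id,
    )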

3. Configure S3 storage for Amazon Athena query results.

Start this step and the next by accessing the Log Archive account as the same user you used in Step 2 above

  1. In the Search for services… bar at the screen top, enter “S3” to bring up the Amazon S3 service console.
  2. Click Create Bucket
  3. Enter a Bucket name of aws-athena-query-results-{accountNumber} in the General configuration section, where {accountNumber} is the twelve-digit number associated with your Log Archive account, found at the top of the screen. Using the account number as a suffix should satisfy the requirement for a globally unique S3 bucket name.
  4. Leave the rest of the fields unchanged, scroll to the bottom, and click Create Bucket

The IAM policy AmazonAthenaFullAccess allows creation of S3 buckets that follow this specific naming convention. If you deviate from the aws-athena-query-results prefix, you will get an AccessDenied error.

Upon bucket creation, you will likely see errors related to the failure of the s3:PutBucketPublicAccessBlock operation. This is due to the elective guardrail Disallow Changes to Bucket Policy for Amazon S3 Buckets described above; the bucket is still created in a secure default configuration.
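
The same bucket can be created with boto3, as in this minimal sketch. The account ID is looked up via STS; note that us-east-1 is the one region where an explicit LocationConstraint must be omitted.

    # Sketch: create the Athena query-results bucket. The prefix
    # aws-athena-query-results- is what AmazonAthenaFullAccess permits.
    import boto3

    region = "us-east-1"  # adjust to your home region
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    bucket = f"aws-athena-query-results-{account_id}"

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 rejects an explicit LocationConstraint
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )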

4. Find path to log data

Amazon S3 buckets can be thought of, and navigated, much like a traditional hierarchical file system. Next you will descend through the structure to choose a specific folder to query with Athena.

  1. Still in the Amazon S3 service page, click Buckets on the left bar to return to the list of buckets in the account
  2. Find the bucket starting with aws-controltower-logs- and click the name to open the Objects view
  3. The first level of storage will be the AWS Organization ID, in the format o-xxxxxxxxxx/. Click it to drill in.
  4. Click AWSLogs/ to continue the drilldown.
  5. The next folder should have multiple objects, each representing a Control Tower managed account. Choose one to work with, and drill in.
  6. The final level should have three folders for the different types of logs. Click CloudTrail/ as we will be using these in the lab.
  7. Finally, click the Copy S3 URI button towards the top of the page. You should see a green message indicating the S3 URI was copied.
  8. Paste the S3 URI into a text file for the next step. It should look similar to the example below, with the s3:// prefix and a trailing slash.

    s3://aws-controltower-logs-123423453456-us-east-1/o-s45wwvdgeb/AWSLogs/012323453456/CloudTrail/
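
If you prefer to discover the path programmatically, the sketch below walks the same hierarchy with delimiter-based listings; the bucket name is a placeholder.

    # Sketch: walk the logs bucket one "folder" level at a time, the same way
    # the S3 console does. The bucket name below is a placeholder.
    import boto3

    s3 = boto3.client("s3")
    bucket = "aws-controltower-logs-123423453456-us-east-1"  # placeholder

    def list_prefixes(prefix=""):
        """Return the immediate child 'folders' under a prefix."""
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
        return [p["Prefix"] for p in resp.get("CommonPrefixes", [])]

    org = list_prefixes()[0]                # o-xxxxxxxxxx/
    awslogs = list_prefixes(org)[0]         # o-xxxxxxxxxx/AWSLogs/
    for account in list_prefixes(awslogs):  # one prefix per managed account
        print(f"s3://{bucket}/{account}CloudTrail/")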
    

5. Use Amazon Athena to query the S3 Log Data

Amazon Athena provides a serverless SQL query interface against the log file data collected by Control Tower.

  1. In the Search for services… bar at the screen top, enter “Athena” to bring up the Amazon Athena home page. Click the Get Started button.

    If the Athena query editor is shown instead of the home page, Athena may already be configured. Open the Settings menu at the top right to change the bucket if desired, then proceed to step 3.

  2. The Query Editor will prompt you to set up a query result location in Amazon S3. Click the prompt.

  3. Click the folder icon to the left of the Query result location field.
  4. Find the S3 bucket you created and click the small arrow to the right of it to select it.
  5. Click Select and Save to continue.
  6. In the query editor, copy and paste the SQL below to create a table to use for the query.

CREATE EXTERNAL TABLE ct_lab_logs (
    eventversion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalid: STRING,
        arn: STRING,
        accountid: STRING,
        invokedby: STRING,
        accesskeyid: STRING,
        userName: STRING,
        sessioncontext: STRUCT<
            attributes: STRUCT<
                mfaauthenticated: STRING,
                creationdate: STRING>,
            sessionIssuer: STRUCT<
                type: STRING,
                principalId: STRING,
                arn: STRING,
                accountId: STRING,
                userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        ARN: STRING,
        accountId: STRING,
        type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    readOnly STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING
)
PARTITIONED BY (region string, timestamp string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '%%YOUR_S3_URI%%'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.timestamp.format'='yyyy/MM/dd',
  'projection.timestamp.interval'='1',
  'projection.timestamp.interval.unit'='DAYS',
  'projection.timestamp.range'='2021/01/01,NOW',
  'projection.timestamp.type'='date',
  'projection.region.type'='enum',
  'projection.region.values'='us-east-1,us-east-2,us-west-1,us-west-2',
  'storage.location.template'='%%YOUR_S3_URI%%${region}/${timestamp}'
);
  7. Replace both occurrences of %%YOUR_S3_URI%% (in LOCATION and in storage.location.template) with the URI you saved in the previous step. Make sure to include the trailing slash. Your storage.location.template should look similar to this: s3://aws-controltower-logs-123423453456-us-east-1/o-s45wwvdgeb/AWSLogs/012323453456/CloudTrail/${region}/${timestamp}

The projection.region.values property restricts the query to US regions only; you may need to modify it if your accounts use different regions. Also note that projection.timestamp.range restricts the dates. These properties enable the Partition Projection feature of Athena, which avoids having to load partitions manually.

  8. Paste the modified SQL into the textbox under New query 1 and click Run query. The Results section should indicate Query successful after a few seconds. (A scripted alternative is sketched below.)
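
As a scripted alternative to pasting the DDL into the console, the sketch below substitutes the URI into a saved copy of the statement and submits it via the Athena API. The file name ct_lab_ddl.sql and the output bucket are hypothetical placeholders.

    # Sketch: substitute %%YOUR_S3_URI%% in a local copy of the DDL above and
    # submit it to Athena. File and bucket names are placeholders.
    import boto3

    s3_uri = "s3://aws-controltower-logs-123423453456-us-east-1/o-s45wwvdgeb/AWSLogs/012323453456/CloudTrail/"
    ddl = open("ct_lab_ddl.sql").read().replace("%%YOUR_S3_URI%%", s3_uri)

    boto3.client("athena").start_query_execution(
        QueryString=ddl,
        ResultConfiguration={
            "OutputLocation": "s3://aws-athena-query-results-111111111111/"  # placeholder
        },
    )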

Results

Now you can execute queries against the table. For example, try the following:

SELECT
eventsource,
eventname,
useridentity.sessioncontext.attributes.creationdate,
useridentity.sessioncontext.sessionissuer.arn
FROM ct_lab_logs
WHERE useridentity.sessioncontext.sessionissuer.arn IS NOT NULL
ORDER BY eventsource, eventname
LIMIT 10
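
Because region and timestamp are projected partition columns, filtering on them in the WHERE clause prunes the S3 data Athena scans. The sketch below runs such a query programmatically and polls for the result; the output bucket is a placeholder.

    # Sketch: run a partition-pruned query and poll for completion. Filtering on
    # the projected region/timestamp columns limits the S3 data scanned.
    import time
    import boto3

    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString="""
            SELECT eventsource, eventname, count(*) AS calls
            FROM ct_lab_logs
            WHERE region = 'us-east-1'
              AND timestamp BETWEEN '2021/06/01' AND '2021/06/30'
            GROUP BY eventsource, eventname
            ORDER BY calls DESC
            LIMIT 20
        """,
        ResultConfiguration={
            "OutputLocation": "s3://aws-athena-query-results-111111111111/"  # placeholder
        },
    )["QueryExecutionId"]

    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
        for row in rows:
            print([col.get("VarCharValue") for col in row["Data"]])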

To learn more about using Amazon Athena to query CloudTrail logs, see the relevant section of the Amazon Athena User Guide.