AWS Data Exchange makes it easy to find, subscribe to, and use third-party data in the cloud. It simplifies access to data and gives you a secure, easy-to-use way of consuming third-party data products. With data sets from AWS Data Exchange, you can make more data-driven decisions. AWS Data Exchange has hundreds of data products from multiple data providers. After creating a subscription, you can export data into your Amazon S3 bucket and then analyze it with services such as Amazon Athena and Amazon Redshift, build machine learning models using Amazon SageMaker, transform and process data with Amazon EMR and AWS Glue, or add it to a data lake with AWS Lake Formation.
Watch the following video to learn about AWS Data Exchange.
Here are the important AWS Data Exchange terms:
An Asset is a data object (a single file). Each asset is uniquely identifiable via an ID. One or more assets are grouped together to create a revision.
A Revision is a point-in-time view or update of a data set. Each revision is uniquely identifiable via an ID. One or more revisions are grouped together to form a data set.
A Data Set is a logical grouping of data that you subscribe to. Each data set is uniquely identifiable via an ID.
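To make the data set, revision, and asset hierarchy concrete, here is a minimal Python sketch that walks it from the top down. It assumes it is handed a boto3 AWS Data Exchange client and uses the `list_data_sets`, `list_data_set_revisions`, and `list_revision_assets` calls. The function name `summarize_entitled_data_sets` and the shape of the returned summary are illustrative choices, not part of the workshop, and for brevity the sketch reads only the first page of each response.

```python
from typing import Any, Dict, List


def summarize_entitled_data_sets(client: Any) -> List[Dict[str, Any]]:
    """Walk data set -> revision -> asset and return a nested summary.

    `client` is assumed to behave like boto3.client("dataexchange").
    Only the first page of each list call is read (NextToken is ignored).
    """
    summary: List[Dict[str, Any]] = []

    # Data sets you are subscribed to are listed with Origin="ENTITLED".
    data_sets = client.list_data_sets(Origin="ENTITLED")["DataSets"]

    for data_set in data_sets:
        revision_entries = []

        # Each data set groups one or more revisions.
        revisions = client.list_data_set_revisions(
            DataSetId=data_set["Id"]
        )["Revisions"]

        for revision in revisions:
            # Each revision groups one or more assets (single files).
            assets = client.list_revision_assets(
                DataSetId=data_set["Id"],
                RevisionId=revision["Id"],
            )["Assets"]
            revision_entries.append(
                {
                    "revision_id": revision["Id"],
                    "asset_ids": [asset["Id"] for asset in assets],
                }
            )

        summary.append(
            {
                "data_set_id": data_set["Id"],
                "name": data_set["Name"],
                "revisions": revision_entries,
            }
        )

    return summary
```

Once you have a subscription, you could call this with `summarize_entitled_data_sets(boto3.client("dataexchange"))`, assuming boto3 is installed and your AWS credentials and Region are configured. Passing the client in as a parameter keeps the function easy to try out with a stand-in object before you have subscribed to anything.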
This is how you use AWS Data Exchange as a subscriber:
In today’s workshop, you will subscribe to a sample data set (500 Image & Metadata free sample) from Shutterstock, which contains a single revision. As you will see, the data set contains camera footage of a workplace that allows pets, as well as some images from Whole Foods, a grocery store chain.
Step 1: Subscribe to an AWS Data Exchange product
Congratulations, you have just completed step 1 from the following architecture diagram.