Qlik Integration with Amazon Redshift

Introduction

This is a hands-on guide to running Qlik Sense in the cloud with Amazon Redshift with Control Tower setup. To provide you with some context, we provide some background information on Amazon Web Services, Control Tower, Redshift, and Qlik. The main part is a step-by-step guide to getting you started.

About Amazon Web Services

Amazon Web Services is a collection of web services that collectively make up a cloud computing platform. Compared to buying and building a physical server farm, the three key benefits of Amazon’s cloud platform are:

  • Ease of use – a platform can be constructed in hours, unlike a physical server which may take days

  • Flexibility – capacity can be grown or shrunk on demand

  • Cost matching – the cost of a platform can be easily matched to the benefits gained.

Under the AWS banner, Amazon offers a number of webs services, including:

  • DynamoDB – NoSQL database

  • EC2 – cloud-based servers running software

  • RDS – relational database service

  • Redshift – data warehouse as a service

  • S3 – scalable cloud storage

  • EMR – elastic map reduce(Hadoop as Service)

About Amazon Redshift

AWS Control Tower offers the easiest way to set up and govern a new, secure, multi-account AWS environment. It establishes a landing zone that is based on best-practices blueprints, and enables governance using guardrails you can choose from a pre-packaged list. The landing zone is a well-architected, multi-account baseline that follows AWS best practices. Guardrails implement governance rules for security, compliance, and operations.

Benefits:

  • Quickly setup and configure a new AWS environment

  • Automate ongoing policy management

  • View policy-level summaries of your AWS environment

About Amazon Redshift

From the Amazon website: “Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.”

http://aws.amazon.com/redshift/

Here are some of the benefits of using a Redshift as opposed to physical hardware:

  • Data Warehouse as a Service (DaaS) - no physical hardware needed and a pay-as-you-go model.

  • Fast, effective and low-cost data warehouse – columnar database built for analytical workloads.

  • Easy to use – one click deployment, easy to back up, easy to manage

  • Scalability – allows resizing and clustering

  • Fully managed – hardware and software upgrades are all managed by AWS.

Qlik on AWS

Since 2011, Qlik and AWS have been partners, providing cloud-based business intelligence using cloud-based data. Qlik works well as a web service and we have a large number of cloud-based customers and partners. Some of our partners base entire business lines around Qlik in the cloud, including CONTEXT, IPC Global, IFR Monitoring, Rosslyn Analytics, and SynerTrade.

Amazon Redshift was release in 2013, adding the flexibility of a massively scalable cloud-based data warehouse to Qlik’s data analysis capabilities

The diagram below gives an overview of how Qlik works with Amazon’s web services.

drawing

Amazon released Redshift in 2013, adding the flexibility of a massively scalable cloud-based database to Qlik’s data analysis capabilities.

Why use Qlik and AWS Redshift together?

  • Redshift is certified for Qlik 11 SR5.

    • Redshift was certified by the Qlik Partner Engineering team in the 2nd half of 2013.
  • Qlik Server is certified on AWS EC2 servers.

    • Since 2011, all Instances of EC2 running Microsoft Windows Server have been tested with Qlik
  • Redshift is one of Qlik’s preferred Big Data Platforms for Direct Discovery (in-database processing)

    • Qlik 11 SR5 has been tested by extracting 100 million rows into Qlik’s associative in –memory data store in the cloud

    • Using Qlik’s Direct Discovery platform with data sourced from Redshift, Qlik 11 SR5 has been tested with 1 billion rows of data

    • Qlik has shown consistent performance in running inside AWS Environment.

  • Many System Integrator Partners are experienced in deploying Qlik and AWS.

We recommend the currently certified Redshift/Qlik platform, namely Qlik Server and Publisher running inside an Amazon EC2 instance as shown in the figure below.

Platform

We do not recommend Qlik running on premises with Redshift. This is because of bandwidth variability issues that can degrade performance. We recommend using AWS Direct Connect when using Qlik running on premises with Redshift. AWS Direct Connect provides dedicated bandwidth that removes variability and ensures a good end user experience.

Qlik accesses data through the Redshift leader node via ODBC data connectors (see the figure below).

Picture3

Due to distribution of data happening inside AWS Platform; Qlik recommend users follow Redshift best practices and place organize data that will be optimal for query performance.

Lab Steps

This lab requires an account with Administrator Privileges and Control Tower setup.

  1. Record your AWS SSO AWSReservedSSO_AWSServiceCatalogEndUserAccess role name. In the next steps we will grant access for this role to launch Qlik products from service catalog.

    a. Login to your Control Tower master with Administrator role

    b. Navigate to IAM console

    c. Select Roles

    d. Search for role with name started with AWSReservedSSO_AWSServiceCatalogEndUserAccess

    e. Click on the Role and record the Role ARN, you will need it for the next step.

  2. Deploy the Service Catalog porfolio into Control Tower.

    a. Login to your Control Tower master with Administrator role

    b. Navigate to CloudFormation console

    c. Choose the Create stack button

    d. Enter the template URL: https://qlik-aws-public.s3.amazonaws.com/sc-portfolio-mpdemoQlik.json

    e. Enter the Stack name qlik-control-tower

    f. For parameter LinkedRole1 enter the Role ARN from previous step

    g. Leave the remaining parameters as default

    h. Click Next twice

    i. Enable checkbox for I acknowledge that AWS CloudFormation might create IAM resources with custom names. and I acknowledge that AWS CloudFormation might require the following capability: CAPABILITY_AUTO_EXPAND

    j. Click Create Stack

  3. Login to Control Tower Master as Service Catalog End User

    a. The stack you deployed on the previous step will create Qlik service catalog porfolio that is shared to AWS SSO group AWSServiceCatalogEndUserAccess. To access this service catalog you need to re-login via AWS SSO using the right role.

    b. From your AWS SSO URL, and login to your master account using AWSServiceCatalogEndUserAccess role.

  4. Deploying your Service Catalog Product into Control Tower.

    a. Login to your Service Catalog and You should see something similar to this.

    Picture4

    b. Create a Qlik Instance and Amazon Redshift Cluster from AWS Service Catalog. Select the product Qlik Sense Enterprise Instance with Redshift.

    Picture5

    c. Click Launch Product to Launch the Demo Qlik Sense and Redshift Instance to your Control Tower Instance.

    Picture6

    d. Enter Name as Identifier.

    Picture7

    e. Set Primary Availability Zone to correct availability zone.

    Picture8

    f. Set Inbound IPv4 CIDR to your external IP with /32 using www.whatismyip.com

    g. Set Master Password to Welc0me$ko2020

    Picture9

    h. Click Next.

    i. Review and click Next to deploy.

    Picture10

    j. Once completed, AWS Service Catalog will show the following:

    Picture11

    Please note the Value will open up CloudFormation window for you to track the progress

    Picture12

    k. If you go back to Service Catalog and Look at Provisioned product list. You see the status as Available and all needed information for this product will be listed.

    Picture13

  5. Connecting to Qlik Sense Server and Conncting Redshift to Clik

    Due to issues with certificates we will not be configuring Qlik Sense be accessible from other browser. For purposes of this lab we will log into the server and make a connection to AWS Redshift.

    a. Open RDP Tool Connect to Qlik Sense Server by using user: qlik and password Welc0me$ko2020:

    Note: For this workshop all of Qlik Sense Servers were imaged with above credentials and loaded partner licenses

    Picture14

    Picture15

    Picture16

    Note: please click Continue if you get challenged by certificates

    b. Once connected open Qlik Sense Hub.

    Picture17

    c. Qlik Sense will launch Firefox and auto authenticate with saved credentials. Once in the hub double click and open “AWS Redshift” Icon inside Qlik Sense The Firefox browser has credentials qlik/ Welc0me$ko2020 pre-loaded.

    Picture18

    Picture19

    Picture20

    d. Now We will connect Redshift to Qlik Sense Application. First Create ODBC connection to Redshift. Find ODBC Data Sources (64-bit) in Windows

    e. Add Redshift ODBC Driver to User/System DSN

    f. Add Credentials from Step 1k

    g. For Redshift Database “qlik_aws_sko” is the database name

    h. Validate Connection

    Picture21

    Picture22

    Picture23

    Picture24

    Picture25

    Picture26

    i. Now back to hub Click on “Add data from files and other sources”. Refer to Step 2c.

    Picture27

    j. Click on ODBC tile from Qlik Sense Connectors and select the ODBC Connection created. Select created DSN and Enter credentials (qlik/ Welc0me$ko2020).

    Picture28

    Picture29

    k. At this point you have your Qlik Sense Server connected Redshift Instance.

    Picture30

  6. (Optional) Loading Sample Data to Redshift and Visualizing in Qlik

    a. For Final Part of the Work we will attempt to load a few sample rows from NYC Taxi and Limousine Commission Data. The original data can be found here https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page. For purpose of this lab the Data has been loaded to a public bucket folder s3://qlik-aws-public/taxi_sample

    b. Login to Redshift and Open Query Editor.

    Picture31

    c. Run the Following Code. Create Trips Table.

    CREATE TABLE trips (
      vendorid              VARCHAR(10) ENCODE LZO,
      tpep_pickup_datetime         TIMESTAMP NOT NULL,
      tpep_dropoff_datetime        TIMESTAMP NOT NULL ENCODE LZO,
      passenger_count  DECIMAL(6,0) ,
      trip_distance           DECIMAL(6,3) DEFAULT '0.0' ENCODE MOSTLY8,
      PULocationID  VARCHAR(2) ,
      DOLocationID VARCHAR(2) ,
      rate_code_id            SMALLINT NOT NULL ENCODE LZO,
      store_and_fwd_flag      VARCHAR(5),
     payment_type            VARCHAR(3),
     fare_amount             DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      extra                   DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      mta_tax                 DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      tip_amount              DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      tolls_amount            DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      improvement_surcharge   DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
      total_amount            DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8,
       congestion_surcharge   DECIMAL(6,2) DEFAULT '0.0' ENCODE MOSTLY8
    ) sortkey(tpep_dropoff_datetime)
    

    d. Copy Files into Trips Tables.

    copy trips from 's3://qlik-aws-public/taxi_sample/xaa'
    credentials 'aws_access_key_id=MY_ACCESS ;aws_secret_access_key=MYSECRET
    delimiter ',' region 'us-east-1'
    IGNOREHEADER as 1;
    
    copy trips from 's3://qlik-aws-public/taxi_sample/xab'
    credentials credentials 'aws_access_key_id=MY_ACCESS ;aws_secret_access_key=MYSECRET
    delimiter ',' region 'us-east-1'
    

    e. Validate Trips Data in Redshift Query window. At this point you should have 200K lines in Redshift and you can validate.

    Picture34

    f. Use Qlik Sense to Connect data. Click on created ODBC Connection inside Redshift.

    Picture35

    g. If you set the owner to public and check the checkmark next to trips you should see your data from Redshift.

    Picture36

    h. Press the green “Add data” Button and you should see Qlik connecting to Redshift and caching and aggregating data (Approximately 24 seconds).

    Picture37

    i. At this Point Qlik Indexing and cognitive engine has inspected the data and metadata. Click on “Generate Insight” for Qlik to give suggestion on insights found on data.

    Picture38

    j. Qlik will Display suggested insights and you can press “Add to Sheet” to create you first Qlik Dashboard.

    Picture39

    Picture40

    k. Click on My new sheet and Explore by Clicking on different charts and fields interacting with Redshift.

    Picture41

In This Tutorial you have seen Service Catalog, Qlik, Redshift. Please reach out to your Qlik AWS Partner Team for more information.

Copyright 2020, Amazon Web Services, All Rights Reserved.