Redshift

Integrating Amazon Redshift with Upriver

Amazon Redshift is a fully managed, petabyte-scale data warehouse solution provided by Amazon Web Services (AWS). By integrating Redshift with Upriver, you can streamline data governance and maintain high data quality directly within your data warehouse. This integration enables efficient data monitoring, traceability, and consistency, ensuring your analytics and reporting pipelines remain accurate and reliable.

Prerequisites

To monitor Amazon Redshift with Upriver, complete the following:

Provision AWS for Upriver operation.
Configure a Redshift integration in Upriver:
1. Create a Redshift user for Upriver.
2. Grant Upriver's user access to your Redshift.

Step 1: Provision AWS

Before connecting Redshift to Upriver, make sure your AWS account is properly set up. Follow the guidelines provided on this page to ensure correct setup.

Provisioning AWS for Redshift access is needed only for the Hybrid/Outpost deployment methods.

Step 2: Configure Redshift integration in Upriver's Platform

Once your AWS account is set up, configure a Redshift integration in the Upriver platform by following these steps:

Navigate to Settings -> Data Integrations and click "+ Add" in the top-right corner.
Fill in basic settings:
- Name - A user-defined name.
- Type - The type of integration. Choose Redshift.
In the Redshift console, grant Upriver's user access to the Redshift instance:

SaaS

Select the region of the AWS account.
In your Redshift account under the desired cluster or workgroup:
1. Press "Grant access".
2. Enter the provided Upriver Account ID and the VPC IDs.
Fill in the connection details (from step 3):
1. Username: Upriver's Redshift username.
2. Password: Upriver's Redshift user's password.

Hybrid

Set up a Redshift-managed VPC endpoint according to the following guide by AWS. Alternatively, make Redshift publicly accessible and use a security group or firewall to limit access as needed. Connect to the VPC created by the CloudFormation script (details are available in the CloudFormation outputs). Note that you must create a security group that allows Redshift access for that VPC yourself.

Fill in the connection details:

Host: The hostname of your Redshift endpoint. If you have set up a VPC endpoint, use the host of that endpoint (for SaaS deployments, you will receive the host from your Upriver representative). If Redshift is publicly accessible, use the workgroup host for serverless Redshift or the cluster endpoint for provisioned Redshift.
Port: The port used in the connection. By default, this is 5439.
User: Upriver's Redshift username, as created in step 3.
Password: Upriver's Redshift user's password.
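
Before saving the connection details, it can help to confirm that the host and port are reachable from a machine inside the same network. The following is a minimal sketch, not part of Upriver: the function name `can_reach` and the placeholder host are assumptions made for illustration. It only checks that a TCP connection opens and does not validate the username or password.

```python
import socket


def can_reach(host, port=5439, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder host (assumption): replace with your own Redshift endpoint.
    print(can_reach("<your-redshift-host>"))
```

A result of False usually points to a security group, firewall, or VPC routing issue rather than a credentials problem, since no login is attempted.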

In the Redshift console, create a Redshift user for Upriver: To connect to Redshift, Upriver needs a database user with sufficient permissions. Run the following script to create a user called upriver:
1. Replace <password> with a password of your choice.
2. Replace <schema_name> with any schema you want to grant access to (can be more than one).
  1. For multiple schemas, duplicate the two GRANT rows that reference <schema_name> and change the <schema_name> accordingly.

  -- Create user and grant schema/table permissions
  CREATE USER upriver PASSWORD '<password>';
  GRANT USAGE ON SCHEMA <schema_name> TO upriver;
  GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO upriver;

  -- Create a view for functions and procedures
  CREATE SCHEMA IF NOT EXISTS upriver_meta;
  CREATE VIEW upriver_meta.routine_definitions AS
  SELECT
    n.nspname                         AS schema_name,
    p.proname                         AS routine_name,
    p.prokind                         AS routine_kind,
    p.proargtypes                     AS argtypes,
    p.prosrc                          AS body
  FROM pg_proc_info p
  JOIN pg_namespace n ON n.oid = p.pronamespace
  WHERE n.nspname NOT IN ('pg_catalog','information_schema');

  -- Grant read-only access to the routine definitions view
  GRANT USAGE ON SCHEMA upriver_meta TO upriver;
  GRANT SELECT ON upriver_meta.routine_definitions TO upriver;

  -- [Optional] Allow for lineage analysis permissions 
  ALTER USER upriver WITH SYSLOG ACCESS UNRESTRICTED;
  GRANT SELECT ON pg_catalog.svv_redshift_databases TO upriver;  -- Database information and properties
  GRANT SELECT ON pg_catalog.svv_redshift_schemas TO upriver;    -- Schema information within databases
  GRANT SELECT ON pg_catalog.svv_external_schemas TO upriver;    -- External schemas (Spectrum, federated)
  GRANT SELECT ON pg_catalog.svv_table_info TO upriver;          -- Table metadata, statistics, and properties
  GRANT SELECT ON pg_catalog.svv_external_tables TO upriver;     -- External table definitions (Spectrum)
  GRANT SELECT ON pg_catalog.svv_external_columns TO upriver;    -- External table column information
  GRANT SELECT ON pg_catalog.pg_class_info TO upriver;           -- Table creation timestamps and basic info
  GRANT SELECT ON pg_catalog.pg_class TO upriver;                -- Table and view definitions
  GRANT SELECT ON pg_catalog.pg_namespace TO upriver;            -- Schema namespace information
  GRANT SELECT ON pg_catalog.pg_description TO upriver;          -- Table and column descriptions/comments
  GRANT SELECT ON pg_catalog.pg_database TO upriver;             -- Database catalog information
  GRANT SELECT ON pg_catalog.pg_attribute TO upriver;            -- Column definitions and properties
  GRANT SELECT ON pg_catalog.pg_attrdef TO upriver;              -- Column default values
  GRANT SELECT ON pg_catalog.svl_user_info TO upriver;           -- User information for ownership
  GRANT SELECT ON pg_catalog.svv_datashares TO upriver;          -- Cross-cluster datashare information
  GRANT SELECT ON pg_catalog.stv_mv_info TO upriver;             -- Materialized view information (provisioned)
  GRANT SELECT ON pg_catalog.svv_user_info TO upriver;           -- User information (serverless alternative)
  GRANT SELECT ON pg_catalog.svv_mv_info TO upriver;             -- Materialized view information (serverless)
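
When access is needed for several schemas, the two per-schema GRANT rows in the script have to be repeated once per schema. As a minimal sketch, assuming you prefer to generate those rows rather than copy them by hand, the helper below prints them for a list of schema names. The function name `build_schema_grants` and the example schema names are hypothetical and not part of Upriver or Redshift.

```python
def build_schema_grants(schemas, user="upriver"):
    """Return the two per-schema GRANT statements from the script for each schema."""
    statements = []
    for schema in schemas:
        # Accept only plain identifiers (letters, digits, underscores) so that
        # a typo or stray character does not end up in the generated SQL.
        if not schema.replace("_", "").isalnum():
            raise ValueError(f"unexpected schema name: {schema!r}")
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")
    return statements


if __name__ == "__main__":
    # Hypothetical schema names, for illustration only.
    print("\n".join(build_schema_grants(["sales", "marketing"])))
```

The generated statements can be pasted into the script in place of the two <schema_name> rows; the rest of the script is unchanged.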

Upriver supports different users for different hosts, but it does not support different users for different databases or schemas accessed via the same host.

Enter the password in the Upriver platform.
1. For Hybrid: enter the host and port as well.

Step 3: Configure the Data Source in Upriver's Platform

Now that your Amazon Redshift integration is set up, you can configure Data Sources to be monitored.

To do this:

Follow the instructions provided in the Data Source Configuration section of the documentation.
When configuring an Amazon Redshift Data Source, choose the correct integration in the Connection step. The integration chosen should point to the Redshift instance that holds the table you wish to monitor.
Provide the required Database, Schema and Table.
Continue with the rest of the configuration as needed.

Monitor and Manage Your Data

After you configure a Redshift Data Source, Upriver monitors it automatically. You can track data issues and enforce governance policies, helping keep your data accurate and trustworthy throughout the pipeline.


