Overview
The custom schema is unique in that you may create your own tables and freely write data into them. Although you can use a tool like DBeaver's data editor, that is often impractical for adding any significant amount of data. Since DWA is powered by Amazon Redshift, it is possible to upload files to Amazon S3 (Simple Storage Service) and import them into Redshift.
Once data is added to the custom schema, you can use it like any other DWA data source. This includes Tableau reports, custom SQL queries combining multiple data sources, and Helix Golden Records.
Note: Users of KORE CRM-integrated products may have access to a custom table uploader. That feature is deprecated and may cease to work in the future. Support for it is no longer available.
Prerequisites
First, you must have an AWS (Amazon Web Services) account with S3 access. Choose from:
- Your own account—Amazon offers new users an S3 plan free for 12 months with 5GB storage (subject to change by Amazon without notice). You may be charged by Amazon if your usage exceeds the free tier's limits. -or-
- KORE-provided account—Request access from your KORE Success Manager.
You also need to know which Amazon region your KORE data warehouse is stored in. Your KORE Success Manager can provide this to you.
To copy your data into Redshift, the file must conform to Amazon's requirements. Amazon supports a variety of data formats including CSV and JSON.
Upload data to S3
First, create an S3 bucket in the same region that your KORE data warehouse is in. We recommend using Amazon's S3 console (web interface) for this unless you are already familiar with another method such as the AWS CLI (command line interface).
Next, turn on default bucket encryption to ensure that the file(s) you upload will be encrypted while stored on disk.
Finally, select the file(s) containing your data and upload them to S3 as objects. Again, we recommend using Amazon's S3 console for this step.
Obtain AWS credentials
Caution: Do not share credentials for your own AWS account with any third party, including KORE.
You may optionally follow Amazon's directions to configure an IAM user, but these instructions assume you will act as the root user to avoid extra steps.
To access your S3 data from Redshift during the copy operation, you must provide two credentials: aws_access_key_id and aws_secret_access_key.
If you are using your own AWS account, sign in to the AWS Management Console, open the Security Credentials page, and follow Amazon's directions to create a new access key and download the key file.
If you are using a KORE-provided AWS account, your Success Manager will provide these credentials to you.
Create a new table
Next, connect to your KORE data warehouse.
Before you can import your data, there must be a destination table for it in the custom schema. Creating a new table requires a SQL query (command). You can do this using your preferred database tool (such as DBeaver) or via the web with the Redshift query editor.
First, examine the data you uploaded to S3 to determine the fields and data types to use. For example, if a column of data contains "123 Any Street", "456 Any Avenue", etc., you might name the field street_address_1 and assign a data type of varchar(200). Or, if a field contains a count of something, you might name the field quantity_sold and use the data type int. If you are unsure what data type to use, refer to Amazon's table of supported data types.
Next, construct and execute a CREATE TABLE query. There are several optional parameters available for advanced users. For example, the NOT NULL parameter makes a field mandatory; Amazon's CREATE TABLE page lists many more.
Here is an example query to create a new table:
CREATE TABLE IF NOT EXISTS custom.example (
user_id int NOT NULL,
user_name varchar(200),
street_address_1 varchar(200),
street_address_2 varchar(200),
city varchar(100),
state char(2),
postcode varchar(10),
country_code char(2),
current_member boolean,
total_purchases int,
total_spent decimal(12,2),
first_purchase timestamp,
latest_purchase timestamp
);
Copy data into Redshift
Important: Redshift allows duplicate rows within a table. If you are updating an existing table, we recommend either:
(a) dropping (deleting) the existing table and re-creating it, or
(b) creating a temporary table for the S3 copy process, then inserting only the unique rows into the "real" table (a sketch of this approach follows the COPY example below).
While connected to your KORE data warehouse, create and execute another SQL query, this time using the COPY command to load data from S3. The example below assumes that you did not create an IAM user; change the credentials format if you did.
Depending on your file's format, you may need to specify data format parameters. By default, Redshift uses the pipe character (|) to identify the break between two fields. Use CSV for comma-separated values, JSON for JSON (JavaScript Object Notation) files, or DELIMITER '\t' for tab-separated fields in a text file. See Amazon's data format parameters guide for additional possibilities.
The COPY command also supports other parameters. For example, the BLANKSASNULL parameter instructs Redshift to treat a field as NULL if the source file contains only whitespace (spaces or tabs) for that field, instead of copying the whitespace into the field. Many of these parameters may be useful, but they generally are not required.
Here is an example query to import data from a CSV file (note that the parameters are not separated by commas):
COPY custom.example FROM 's3://example_bucket/example_file.csv'
ACCESS_KEY_ID '1234567890'
SECRET_ACCESS_KEY 'abcdefghijk'
CSV
BLANKSASNULL
EMPTYASNULL
IGNOREBLANKLINES
IGNOREHEADER 1 --use if the file has a header row with field names
TRIMBLANKS;
If the destination table already contains data, the COPY command appends new rows even if a new row is identical to an older one. The command will not overwrite any existing rows.
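If you want to avoid loading rows that already exist in the table (option (b) from the note above), one approach is to copy the file into a temporary table first and then insert only the rows that are not already present. A minimal sketch, assuming the custom.example table and the S3 file from the examples above:
-- load into a temporary table with the same columns as the real table
CREATE TEMP TABLE example_load (LIKE custom.example);

COPY example_load FROM 's3://example_bucket/example_file.csv'
ACCESS_KEY_ID '1234567890'
SECRET_ACCESS_KEY 'abcdefghijk'
CSV
IGNOREHEADER 1;

-- insert only the rows that do not already exist in the real table
INSERT INTO custom.example
SELECT * FROM example_load
EXCEPT
SELECT * FROM custom.example;

-- the temporary table is dropped automatically at the end of the session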
Depending on the size of your file(s), this operation may take a while to finish. If the copy operation fails, the STL_LOAD_ERRORS table may help you identify why (see Troubleshooting below).
Review the results
If the copy operation completes without errors, look at a few rows in the table to ensure the data is mapped correctly. You can do this using your database tool or with another SQL query:
SELECT * FROM custom.example LIMIT 10;
If the results are satisfactory, you may optionally delete your file from S3 to save storage space.
Finally, use the ANALYZE command to help the system execute queries more efficiently:
ANALYZE custom.example;
Potential duplicate records
Because Redshift does not enforce uniqueness or primary key constraints, it is possible to have exact duplicate rows within a table. It's also possible to have multiple rows with different data for the same primary key.
To test whether your table contains more rows than primary keys, use a SQL query like this one:
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT user_id) AS unique_ids
FROM custom.example;
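To see which keys are affected, you can group by the key column. For example, the following lists each user_id that appears on more than one row:
SELECT user_id, COUNT(*) AS row_count
FROM custom.example
GROUP BY user_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC;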
KORE does not provide any deduplication services within the custom schema. If deduplication is desired, you will need to write custom SQL queries that perform the correct functions for your use case. The KORE Support team cannot help with writing these queries. Instead, consider engaging with KORE Planning & Insights services if you need assistance. (KPI services may incur additional fees, depending on your organization's contract with KORE.)
Create a view (requires "WITH NO SCHEMA BINDING")
Warning: Failure to use the WITH NO SCHEMA BINDING clause may cause KORE-provided DWA integrations to fail with errors, sometimes long after the view is configured.
Creating views is optional, but if you create a view in the custom schema that references any table outside of custom, you must use the WITH NO SCHEMA BINDING clause. If you omit it, any table used in the view cannot be dropped or modified. This prevents KORE's integrations from changing a column's data type or adding more columns to the table in the future. It also prevents other changes to a table's structure when we update an integration to maintain compatibility with the data provider's feed.
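For reference, here is a minimal example of the clause. The view name and column list are illustrative, and the same syntax applies when the view joins tables from other schemas (note that a late-binding view must reference every table by its schema-qualified name):
CREATE VIEW custom.example_view AS
SELECT user_id, total_spent, latest_purchase
FROM custom.example
WITH NO SCHEMA BINDING;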
You can use the following query to identify any views that do not include the WITH NO SCHEMA BINDING clause:
SELECT pg.viewowner, pg.viewname, pg.definition
FROM pg_views pg
WHERE UPPER(pg.definition) NOT LIKE '%WITH NO SCHEMA BINDING%'
AND pg.viewowner != 'rdsdb';
Update the table
To make bulk updates to the table in the future, upload your new data into S3 as described above. For most file formats (including CSV), each line must provide data in the same order as your actual table. (That is, the data must match the existing columns, even for null values.) Use JSON files if you need more flexibility.
Then follow the same steps to create a new table, but treat this one as a "staging" table.
CREATE TABLE IF NOT EXISTS custom.example_staging (
user_id int NOT NULL,
[...]
);
Next, use UPDATE commands to perform any necessary data cleansing on the staging table (such as updating timestamps or changing email addresses to all-lowercase). For efficiency, only run UPDATE commands on your staging table, not on your actual table after merging.
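For instance, here is a sketch of one such cleansing step, run against the staging table only (the choice of column and transformation is illustrative):
-- normalize a text column before merging
UPDATE custom.example_staging
SET user_name = LOWER(TRIM(user_name));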
Finally, follow Amazon's instructions to merge the staging table into your actual table. It is usually best to replace existing rows, but check and follow Amazon's guidance.
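Here is a minimal sketch of the replace-existing-rows approach, assuming user_id uniquely identifies a row and using the staging table shown above; adjust it to match Amazon's current guidance and your own key column(s):
BEGIN;

-- remove the rows that are about to be replaced
DELETE FROM custom.example
USING custom.example_staging
WHERE example.user_id = example_staging.user_id;

-- add all rows (new and updated) from the staging table
INSERT INTO custom.example
SELECT * FROM custom.example_staging;

COMMIT;

-- drop the staging table once you are satisfied with the results
DROP TABLE custom.example_staging;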
Note: In some cases, you may want to delete all of the existing data in a table and completely replace it with new data. For best performance, carefully use the TRUNCATE command on the table to immediately delete ALL of its rows. We still recommend using a staging table to ensure your new data is ready before deleting the old data.
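A sketch of that full-replacement approach, again assuming the staging table above (keep in mind that TRUNCATE commits immediately, so the deleted rows cannot be rolled back):
-- delete ALL rows from the real table, then reload it from staging
TRUNCATE custom.example;

INSERT INTO custom.example
SELECT * FROM custom.example_staging;

DROP TABLE custom.example_staging;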
Troubleshooting
If you encounter an error similar to "SQL Error [XX000]: ERROR: Load into table 'example' failed. Check 'stl_load_errors' system table for details.", this usually indicates a data type mismatch. Use the following query to review the errors:
SELECT err_reason, line_number, raw_field_value, colname, type, filename
FROM stl_load_errors
WHERE starttime > current_date - 1
ORDER BY starttime DESC;
Common problems include:
- Dollar signs ($) or commas (,) in numeric values
- Alpha characters in numeric values
- Invalid date or time values/formats
To correct these, change the column to the varchar type using an ALTER COLUMN command and try loading the data again. Then query the table to find and correct the invalid values. We do not recommend leaving such columns as type varchar, as this affects operations like ORDER BY and could prevent you from noticing invalid data again in the future.
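As an illustration, suppose total_spent had to be reloaded as varchar. A query along these lines surfaces values that will not convert cleanly back to a numeric type (the pattern is a simple sketch and may need adjusting for your data, for example to allow negative values):
SELECT total_spent
FROM custom.example
WHERE total_spent !~ '^[0-9]+([.][0-9]+)?$';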
Automate updates (advanced users)
It is possible to automate the uploading and copying of data, essentially creating your own file-drop DWA integration. The basic steps are:
- Install the AWS CLI (command line interface).
- Create a script or batch file to upload your file to S3 and run SQL commands using the AWS CLI.
- Schedule the script to run via Task Scheduler or cron.
We strongly recommend using IAM roles instead of storing your root S3 credentials in the plain-text script file.
Do not use this technique to create "real-time" custom integrations. This is for Redshift efficiency reasons: frequently inserting a small number of records at a time consumes far more compute resources than inserting the same data in a single batch, and may cause performance degradation throughout your data warehouse. We recommend a daily update for most use cases, and no more than hourly for situations that warrant frequent updates.
The KORE Support team cannot help with creating custom automations. Instead, consider engaging with KORE Planning & Insights services if you need assistance. (KPI services may incur additional fees, depending on your organization's contract with KORE.)