The custom schema is unique in that you may create your own tables and freely write data into them. Although you can use a tool like DBeaver's data editor, that is often impractical for adding any significant amount of data. Since DWA is powered by Amazon Redshift, you can instead upload files to Amazon S3 (Simple Storage Service) and import them into Redshift.
Note: Users of KORE CRM-integrated products may have access to a custom table uploader. That feature is deprecated and may cease to work in the future. Support for it is no longer available.
First, you must have an AWS (Amazon Web Services) account with S3 access. Choose from:
- Your own account—Amazon offers new users an S3 plan free for 12 months with 5 GB of storage (subject to change without notice). You may be charged by Amazon if your usage exceeds the free tier's limits. -or-
- KORE-provided account—Request access from your KORE Success Manager.
You also need to know which Amazon region your KORE data warehouse is stored in. Your KORE Success Manager can provide this to you.
Upload data to S3
First, create an S3 bucket in the same region that your KORE data warehouse is in. We recommend using Amazon's S3 console (web interface) for this unless you are already familiar with another method such as the AWS CLI (command line interface).
Next, turn on default bucket encryption to ensure that the file(s) you upload will be encrypted while stored on disk.
Finally, select the file(s) containing your data and upload them to S3 as objects. Again, we recommend using Amazon's S3 console for this step.
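If you prefer the AWS CLI to the console, the bucket creation, encryption, and upload steps above can be sketched as follows. This is only an illustration—the bucket name, region, and file name are placeholders, and your AWS CLI must already be configured with appropriate credentials:

```shell
# Create a bucket in the same region as your KORE data warehouse
aws s3api create-bucket --bucket example-bucket --region us-east-2 \
    --create-bucket-configuration LocationConstraint=us-east-2

# Turn on default bucket encryption (SSE-S3) so objects are encrypted at rest
aws s3api put-bucket-encryption --bucket example-bucket \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# Upload the data file as an object
aws s3 cp example_file.csv s3://example-bucket/example_file.csv
```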
Obtain AWS credentials
Caution: Do not share credentials for your own AWS account with any third party, including KORE.
You may optionally follow Amazon's directions to configure an IAM user, but these instructions assume you will act as the root user to avoid extra steps.
To access your S3 data from Redshift during the copy operation, you must provide two credentials: an access key ID and a secret access key.
If you are using your own AWS account, sign in to the Security Credentials page of the AWS Management Console and then create a new access key and download the key file by following Amazon's directions.
If you are using a KORE-provided AWS account, your Success Manager will provide these credentials to you.
Create a new table
Before you can import your data, there must be a destination table for it in the
custom schema. Creating a new table requires a SQL query (command). You can do this using your preferred database tool (such as DBeaver) or via the web with the Redshift query editor.
First, examine the data you uploaded to S3 to determine the fields and data types to use. For example, if a column of data contains "123 Any Street", "456 Any Avenue", etc., you might name the field
street_address_1 and assign a data type of
varchar(200). Or, if a field contains a count of something, you might name the field
quantity_sold and use the data type
int. If you are unsure what data type to use, refer to Amazon's table of supported data types.
Next, construct and execute a
CREATE TABLE query. There are several optional parameters available for advanced users. For example, the
NOT NULL parameter makes a field mandatory. Amazon's
CREATE TABLE page lists many more.
Here is an example query to create a new table:
CREATE TABLE IF NOT EXISTS custom.example (
    user_id int NOT NULL,
    user_name varchar(200),
    street_address_1 varchar(200),
    street_address_2 varchar(200),
    city varchar(100),
    state char(2),
    postcode varchar(10),
    country_code char(2),
    current_member boolean,
    total_purchases int,
    total_spent decimal,
    first_purchase timestamp,
    latest_purchase timestamp
);
Copy data into Redshift
Important: Redshift allows duplicate rows within a table. If you are updating an existing table, we recommend either:
(a) dropping (deleting) the existing table and re-creating it, or
(b) creating a temporary table for the S3 copy process, then inserting the unique rows into the "real" table.
The example assumes that you did not create an IAM user. Change the credentials format if you did.
Depending on your file's format, you may need to specify data format parameters. By default, Redshift uses the pipe character (
|) to identify the break between two fields. Use
CSV for comma-separated values, or
DELIMITER '\t' for tab-separated fields in a text file. See Amazon's data format parameters guide for additional possibilities.
The COPY command also supports other parameters. For example, the
BLANKSASNULL parameter instructs Redshift to treat a field as
NULL if the source file only contains whitespace (spaces or tabs) for that field, instead of copying the whitespace into the field. Many of these parameters may be useful, but they generally are not required.
Here is an example query to import data from a CSV file—note the absence of commas:
COPY custom.example
FROM 's3://example_bucket/example_file.csv'
ACCESS_KEY_ID '1234567890'
SECRET_ACCESS_KEY 'abcdefghijk'
CSV
BLANKSASNULL
EMPTYASNULL
IGNOREBLANKLINES
IGNOREHEADER 1 -- use if the file has a header row with field names
TRIMBLANKS;
If the destination table already contains data, the
COPY command appends new rows even if a new row is identical to an older one. The command will not overwrite any existing rows.
Review the results
If the copy operation completes without errors, look at a few rows in the table to ensure the data is mapped correctly. You can do this using your database tool or with another SQL query:
SELECT * FROM custom.example LIMIT 10;
If the results are satisfactory, you may optionally delete your file from S3 to save storage space.
Finally, run the
ANALYZE command to update the table's statistics and help the system execute queries more efficiently.
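For example, using the table created earlier:

```sql
ANALYZE custom.example;
```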
Potential duplicate records
Because Redshift does not enforce uniqueness or primary key constraints, it is possible to have exact duplicate rows within a table. It's also possible to have multiple rows with different data for the same primary key.
To test whether your table contains more rows than primary keys, use a SQL query like this one:
SELECT COUNT(*) AS total_rows, COUNT(DISTINCT user_id) AS unique_ids FROM custom.example;
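If the counts differ, a query like this one (a sketch using the example table) lists which key values are duplicated and how many rows share each:

```sql
SELECT user_id, COUNT(*) AS row_count
FROM custom.example
GROUP BY user_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC;
```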
KORE does not provide any deduplication services within the
custom schema. If deduplication is desired, you will need to write custom SQL queries that perform the correct functions for your use case. The KORE Support team cannot help with writing these queries. Instead, consider engaging with KORE Planning & Insights services if you need assistance. (KPI services may incur additional fees, depending on your organization's contract with KORE.)
Create a view (requires "WITH NO SCHEMA BINDING")
Warning: Failure to use the
WITH NO SCHEMA BINDING clause may cause KORE-provided DWA integrations to fail with errors, sometimes long after the view is configured.
Creating views is optional, but if you create a view for the
custom schema which references any table outside of
custom, you must use the
WITH NO SCHEMA BINDING clause. If you omit it, any table used in the view cannot be dropped or modified. This prevents KORE's integrations from changing a column's data type or adding more columns to the table in the future. It also prevents making other changes to a table's structure when we update an integration to maintain compatibility with the data provider's feed.
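For example, a view that joins the example table to a table outside the custom schema might look like this sketch (the some_schema.tickets table and its columns are hypothetical; note that late-binding views must fully qualify every table name):

```sql
CREATE VIEW custom.example_view AS
SELECT e.user_id, e.user_name, t.ticket_count
FROM custom.example e
JOIN some_schema.tickets t ON t.user_id = e.user_id
WITH NO SCHEMA BINDING;
```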
You can use the following query to identify if you have any views that don't include the
WITH NO SCHEMA BINDING clause.
SELECT pg.viewowner, pg.viewname, pg.definition
FROM pg_views pg
WHERE upper(pg.definition) NOT LIKE '%WITH NO SCHEMA BINDING%'
AND pg.viewowner != 'rdsdb';
Update the table
To make bulk updates to the table in the future, upload your new data into S3 as described above. For most file formats (including CSV), each line must provide data in the same order as your actual table. (That is, the data must match the existing columns, even for null values.) Use JSON files if you need more flexibility.
Then follow the same steps to create a new table, but treat this one as a "staging" table.
CREATE TABLE IF NOT EXISTS custom.example_staging ( user_id int NOT NULL, [...] );
Then use UPDATE commands to perform any necessary data cleansing on the staging table (such as updating timestamps or changing email addresses to all-lowercase). For efficiency, only run
UPDATE commands on your staging table—not on your actual table after merging.
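One way to merge the staging table into the actual table is to delete the rows being replaced and then insert the staged rows, all within one transaction. This is a sketch that assumes user_id identifies a row in the example table:

```sql
BEGIN;

-- Remove existing rows that will be replaced by staged data
DELETE FROM custom.example
USING custom.example_staging s
WHERE custom.example.user_id = s.user_id;

-- Insert the cleansed staging rows
INSERT INTO custom.example
SELECT * FROM custom.example_staging;

COMMIT;

-- Optionally drop the staging table when finished
DROP TABLE custom.example_staging;
```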
Note: In some cases, you may want to delete all of the existing data in a table and completely replace it with new data. For best performance, carefully use the
TRUNCATE command on the table to immediately delete ALL of its rows. We still recommend using a staging table to ensure your new data is ready before deleting the old data.
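A sketch of a full replacement, assuming the staged data has already been verified. (Note that in Redshift, TRUNCATE commits immediately and cannot be rolled back.)

```sql
-- Deletes ALL existing rows; commits immediately
TRUNCATE custom.example;

-- Replace with the verified staging data
INSERT INTO custom.example
SELECT * FROM custom.example_staging;
```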
If you encounter an error similar to
SQL Error [XX000]: ERROR: Load into table
'example' failed. Check 'stl_load_errors' system table for details., this usually indicates a data type mismatch. Use the following query to review the errors:
SELECT err_reason, line_number, raw_field_value, colname, type, filename
FROM stl_load_errors
WHERE starttime > current_date - 1
ORDER BY starttime DESC;
Common problems include:
- Dollar signs ($) or commas (,) in numeric values
- Alpha characters in numeric values
- Invalid date or time values/formats
To correct these, change the column to the
varchar type using an
ALTER COLUMN command and try loading the data again. Then query the table to find and correct invalid values. We do not recommend leaving such columns as type
varchar as this affects operations like
ORDER BY and could prevent you from noticing invalid data again in the future.
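After loading the column as varchar, a query like this sketch (assuming the quantity_sold column from the earlier example) can surface values that are not purely numeric, using Redshift's POSIX regular expression operator:

```sql
SELECT quantity_sold, COUNT(*) AS occurrences
FROM custom.example
WHERE quantity_sold !~ '^[0-9]+$'  -- rows that are not all digits
GROUP BY quantity_sold;
```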
Automate updates (advanced users)
It is possible to automate the uploading and copying of data, essentially creating your own file-drop DWA integration. The basic steps are:
- Install the AWS CLI (command line interface).
- Create a script or batch file to upload your file to S3 and run SQL commands using the AWS CLI.
- Schedule the script to run via Task Scheduler (Windows), cron (Linux/macOS), or a similar tool.
We strongly recommend using IAM roles instead of storing your root S3 credentials in the plain-text script file.
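As a rough sketch, a daily update script might upload the file and then issue the COPY through the Redshift Data API. The bucket, cluster, database, secret, and role names below are placeholders, and this assumes the AWS CLI is configured with an IAM role or profile that has the necessary permissions:

```shell
#!/bin/sh
# Upload today's export to S3
aws s3 cp /path/to/example_file.csv s3://example-bucket/example_file.csv

# Run the COPY via the Redshift Data API
aws redshift-data execute-statement \
    --cluster-identifier example-cluster \
    --database example_db \
    --secret-arn "arn:aws:secretsmanager:us-east-2:111122223333:secret:example" \
    --sql "COPY custom.example FROM 's3://example-bucket/example_file.csv' IAM_ROLE 'arn:aws:iam::111122223333:role/example-role' CSV IGNOREHEADER 1;"
```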
Do not use this technique to create "real-time" custom integrations. For Redshift efficiency reasons, frequently inserting a small number of records at a time consumes far more compute resources than inserting the same data in a single batch, and may degrade performance throughout your data warehouse. We recommend a daily update for most use cases, and no more than hourly for situations that warrant frequent updates.
The KORE Support team cannot help with creating custom automations. Instead, consider engaging with KORE Planning & Insights services if you need assistance. (KPI services may incur additional fees, depending on your organization's contract with KORE.)