Overview
Most DWA integrations refresh data one or more times per day. Depending on how the vendor's systems work, this is either by making incremental updates to the data in DWA (adding new records and updating existing records) or by totally replacing all records in DWA with a "full pull". Although DWA integrations generally provide historical data, this does not include the prior states of a record which has changed over time. In some cases, you may want to preserve a snapshot of certain data as it is now for future comparisons.
Upon request, KORE can automatically create daily snapshots of selected tables and/or views. To avoid increasing your Redshift storage costs, we store the snapshots in Amazon S3 instead of Redshift. You can access the snapshots as external tables in the snapshotx
external schema for analysis.
Note: KORE provides a large storage allowance for data snapshots, but excessive storage use may result in additional charges—contact your Success Manager for specific details. You can delete unneeded data from S3 at any time.
Example use cases
Marketing—Compare past snapshots with current data to measure the results of a multichannel campaign.
Sponsorship—Use snapshots to observe how a deal moves through different stages throughout its life and the effects on revenue, or search for year-over-year and month-over-month trends.
Secondary ticket market—Use snapshots to aggregate seat listings by section, then measure pricing fluctuations in the days or weeks leading up to an event to reveal trends in market demand.
Setup
To begin, connect to your data warehouse and create a view in the custom
schema. We recommend naming it custom.v_snapshot_data_set_name
. Use the view's SQL query to select a list of columns representing the data you wish to snapshot. This data may optionally come from more than one table or schema.
We discourage using the star (*
) operator in this query. If new columns are later added to the referenced table, this view will not include them. This could cause confusion in the future.
For example, suppose you have a data source named texzt
and you want to snapshot specific details from one of its tables:
CREATE OR REPLACE VIEW custom.v_snapshot_texzt_user_details AS (
SELECT user_id, given_name, family_name, cell_phone, response_rate
FROM texzt.user_details;
);
Verify that the view displays the desired results, then ask your KORE Success Manager to enable data snapshots for the view. You may create as many views and snapshots as you wish.
Warning: Do not delete this view after configuring snapshots. The system cannot take snapshots without it.
Analysis
Once the first snapshot is taken, you can access your snapshots in the snapshotx
schema. This schema contains external tables which will be loaded from Amazon S3. You can write SQL queries to analyze the data from your snapshots.
All of the daily snapshots for a particular view are stored in the same external table. The external table is partitioned by date, and the partition_date
field contains a timestamp indicating the date when the snapshot was taken. (Currently, the time of day portion is always set to midnight. However, this could change in the future.)
Important: It is very important to limit queries using the partition_date
field. As the number of partitions to search increases, performance will greatly slow. For example, suppose you want to see how a record has changed over the past year—it would be dramatically faster to search for the record in one snapshot per month (12 partitions) instead of all the daily snapshots (365 partitions).
Example query
Suppose we want to observe how the record of a particular user has changed over the past three years. To limit the number of partitions to search, we can use the modulo operator (%
) to choose partition dates 30 days apart. (If the number of days between partition_date
and today is divisible by 30 with no remainder, we use that partition_date
.)
SELECT * FROM snapshotx.texzt_user_details
WHERE user_id = '12345abc'
AND (DATEDIFF(DAY, GETDATE(), partition_date) < (365*3)) --search within past 3 years
AND ((DATEDIFF(DAY, GETDATE(), partition_date) % 30) = 0); --one partition per 30 days
Configuration changes
Change which data to capture
Every time KORE's system takes a snapshot, it refers to the view you created in the custom
schema. You can make changes to that view and thus change the data to be included in your snapshots without KORE's intervention. However, this has implications for your previous snapshots because a table can only have one definition.
If you change your view to include additional columns, those columns will be added to the table definition and your future snapshots will capture this data. Your existing snapshots will also gain this column, but the field will be NULL
in every record since we can't know what the value would have been when the snapshot was created.
If you change your view to remove columns, those columns will also be removed from the table definition during the next snapshot. This means that the data in that column will be inaccessible from Redshift although it will still be present in the S3 files. We recommend against doing this. Instead, please open a support ticket if you need a column to be deleted from existing snapshots.
To keep the existing data from a column but stop collecting it in future snapshots, we suggest creating a new view in the custom
schema for future snapshots. Once enabled, ask your KORE Success Manager to disable snapshots for the older view.
Discontinue snapshots
Contact your KORE Success Manager if you wish to stop taking snapshots of a view. We can deactivate the feature either temporarily or permanently. Please also specify if we should preserve or delete the existing snapshots.
If you instead delete the associated view in the custom
schema, the daily snapshot function will fail with errors. Your existing snapshots will remain accessible from Redshift but no additional snapshots will be taken. KORE will not be able to restore the deleted view if you change your mind, so we strongly advise against deleting the view.