Overview
Helix can support any data set available in your DWA warehouse (via our integrations or the custom
schema). To do so, first add the data set via a SQL query and then map the data set's fields to the matching Helix fields. Only an Admin or Power User can perform these actions.
How to add a Helix data set
Select Golden Record Management > Add Data Set & Map Fields to open the interface. Here you can view existing data sets and configurations. At the bottom of the list of data sets, there are two options for adding additional data sets:
- Add a new data set: You must provide the required details, including a SQL query.
- Add from template library: Select a data set in the first column and click Add to pre-populate the required details. (You may then edit the details if you wish.)
The required details are:
- Display name: will be shown in the user interface and can be changed later.
- API name: the name to use for this data set within the Helix database, which cannot contain spaces and cannot be changed later.
- Connection: the system to pull this data from, usually DWA.
- Schedule: how often this data set should be refreshed. Choose "Paused" to stop refreshing the data while leaving the previously synced data available in Helix.
- SQL: a query that selects fields from your data set to be used in Golden Records—each row it returns must be unique.
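For example, a minimal SQL definition might look like the following sketch. The table and column names are hypothetical, and the ROW_NUMBER deduplication shown is just one way to guarantee each returned row is unique:
select
customer_id
,lower(email_addr) as email
,first_name
,last_name
from (
-- keep only the most recent row per customer so every row is unique
select *, row_number() over (partition by customer_id order by updated_at desc) as rn
from crm_contacts
) t
where rn = 1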
Click the Analyze button to verify the data set is configured correctly and proceed to field mapping.
Best practices for creating a Helix data set
It is important to review your data quality and accuracy before setting up and configuring Helix. KORE recommends testing and reviewing the results of your Helix data set SQL definitions before finalizing them within Helix. Review your data for bad data entry, relevance, and standardization.
There are a few main benefits to this. First, poor-quality or generic values create poor golden record matches, which then flow into your downstream systems (such as your CRM or warehouse) and reduce the value of Helix's data sets. Cleansed source data also produces fewer data stewardship tasks, which means less management within Helix and your downstream systems. Improved data quality translates directly into less maintenance in Helix and more accurate customer groups.
This review should be done by querying your database directly (outside of Helix). When reviewing your data, it is a good idea to look over the top N values for each field you are returning in the query or, if possible, review all of the unique values for the personally identifying fields you plan to use in your matching rules. Additionally, it can be helpful to combine some of the PII fields, such as first name and last name, and then review the more common combinations of values.
SQL template for reviewing your data:
select top N <insert field>, count(*) from <insert table> group by <insert field> order by count(*) desc
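To review combinations of PII fields, a variant of the same pattern can be used. The || concatenation operator is warehouse-dependent (Snowflake and PostgreSQL support it, for example), and the field names are placeholders:
-- most common first name + last name combinations
select <first name> || ' ' || <last name> as full_name, count(*)
from <insert table>
group by <first name>, <last name>
order by count(*) desc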
While you can always clean and standardize your data by updating the SQL definition of each data source within Helix’s UI, it is often more efficient to do this at the database level prior to Helix querying the data set. One way you can do this is to create a table or view within the database using the following data prep best practices.
SQL template for reviewing unique values:
select distinct <insert field> from <insert table>
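As a sketch of the table-or-view approach mentioned above, the cleanup logic can live in a view so that the SQL definition inside Helix stays a simple select. The view, table, and column names here are hypothetical:
-- pre-cleaned view that Helix can query directly
create view helix_customers_prepped as
select
customer_id
,case when upper(email_addr) in ('','NAN','NULL') then null else lower(email_addr) end as email
,case when len(first_name) <= 1 then null else first_name end as first_name
,case when len(last_name) <= 1 then null else last_name end as last_name
from crm_contacts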
Clean up “bad” data entry
Review your data and make sure you are replacing “bad” data entry with NULL values so that the incorrect entries won't negatively impact the matching results or persist through the tiebreaker rules onto the golden record. Below are examples of “bad” data to look out for and resolve in the SQL definition of each Helix data source.
- Replace blank values with NULL
  - i.e., CASE WHEN email ilike '' THEN NULL ELSE email END
- Replace string values of 'NULL' and 'NAN' with NULL
  - i.e., CASE WHEN upper(email) in ('NULL','NAN') THEN NULL ELSE email END
- Review your data for all PII fields:
  - Do you have records with bad data where the values are just punctuation, symbols, or special characters?
    - i.e., . ? / - *
  - Do you have records with bad data where the values are random strings or placeholder values?
    - i.e., al;lkas;fk
    - i.e., 8888888888
    - i.e., 8675309
    - i.e., unknown
    - i.e., will call
    - i.e., <team name> fan
    - i.e., !Transcription
  - Do you have records with “bad” data due to business practices that require specific fields to be filled out to create a customer record? A common example is requiring the email field to be filled out: when the sales rep doesn't know the customer's email, they enter their own work email to meet the requirement.
  - Do you have customers that use generic work emails or phone numbers for large organizations? i.e., multiple KORE employees using support@koresoftware.com as their email address when submitting information to organizations. (See the sketch following the SQL template below.)
  - Is this field used for customer information or business information?
    - An example of customer information would be the customer's email.
    - An example of business information is a field used only for the sales rep's email.
    - Or does this field contain a mix of customer and business information? i.e., sometimes the sales rep's email is populated in the secondary email field and sometimes another valid email for the customer is populated there.
    - Cleaning out business information will result in more accurate matching.
SQL template:
select
case
when upper(<insert field>) in ('','NAN','NULL','<insert value>','<insert value>') then NULL
when len(<insert field>) <= 1 then NULL
else <insert field>
end as <insert field name>
from <insert table>
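For the business-information cases called out above (a sales rep's work email or a shared inbox entered to satisfy a required field), a sketch along these lines can null those values out; the domain and address shown are examples only:
case
when lower(<email>) like '%@<company domain>' then null -- staff work emails
when lower(<email>) in ('support@koresoftware.com') then null -- shared inboxes
else lower(<email>)
end as email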
Name template:
CASE
WHEN UPPER([first_name]) IN ('UNKNOWN') THEN NULL
WHEN LEN([last_name]) <= 1 AND LEN([first_name]) <= 1 THEN NULL
ELSE [first_name]
END AS [first_name]
Email template:
CASE
WHEN UPPER([email_addr]) IN ('INVALID@INVALID.COM') THEN NULL
WHEN LEN([email_addr]) < 5 THEN NULL
WHEN [email_addr] NOT ILIKE '%@%' THEN NULL
ELSE LOWER([email_addr])
END AS [email]
Address template:
CASE
WHEN UPPER([mailing_address]) IN ('NULL', 'UNKNOWN', 'NOT AVAILABLE') THEN NULL
WHEN LEN([mailing_address]) <= 5 THEN NULL
ELSE TRIM([mailing_address])
END AS [address_line_1]
Phone number template:
-- remove repeated-digit sequences, values with no digits, and phone numbers shorter than 7 characters
case
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%0000000%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%1111111%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%2222222%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%3333333%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%4444444%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%5555555%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%6666666%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%7777777%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%8888888%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%9999999%' then null
when REGEXP_REPLACE(phone_primary, '[^0-9]', '') ilike '%1234567%' then null
when phone_primary !~ '[0-9]' then null
when len(phone_primary) < 7 then NULL
else phone_primary
end phone_primary,
Complex SQL templates:
-- remove unwanted values from a list
case
-- known values
when last_name in ('""','NULL') then null
-- wildcard values
when last_name ilike '%unknown%' then null
-- invalid length
when len(last_name) <= 1 then NULL
else INITCAP(last_name)
end last_name
-- remove values with multiple conditions
case when lower(firstname) = 'new' and lower(lastname) = 'prospect' then null else firstname end AS first_name
-- remove values with a wildcard condition
case when email like '%{VALUE1}%' then null else email end AS email
Standardize field values
Review your data and make sure you are submitting standardized data for the PII fields into Helix. For example, convert all values for the STATE field into either full names or two-letter abbreviations; the key is not to have a mixture of value types. It is also recommended to standardize your data into all uppercase, all lowercase, or proper case values.
SQL template:
select
case
when upper(<insert field>) in ('<insert value>','<insert value>','<insert value>') then '<insert value>'
else <insert field>
end as <insert field name>
from <insert table>
Example SQL:
-- replace values
CASE
WHEN lower(c.state) = 'ma' OR c.state = 'MASSA' THEN 'MA'
WHEN lower(c.state) = 'ct' OR c.state = 'CONNE' THEN 'CT'
ELSE UPPER(c.state)
END AS state
-- lower, upper, or proper case values
select
upper(first_name) first_name,
lower(last_name) last_name,
initcap(full_name) full_name
from <insert table>
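When a field has many variants (state is a common case), an alternative to a long CASE expression is a small lookup table maintained in the warehouse and joined in the prep view. The state_lookup table and its columns here are hypothetical:
-- state_lookup maps raw variants (e.g., 'MASSA', 'massachusetts') to a standard value
select
coalesce(s.standard_value, upper(c.state)) as state
from customers c
left join state_lookup s on upper(c.state) = s.raw_value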
Which records are relevant?
Not every record needs to run through Helix. Some records may not have enough valid PII data to match to another record via your matching rules. For example, if a record only has values for last name and phone number, but all of your matching rules require more than just those two fields, then this record will never be able to match with another record. KORE recommends removing any records that do not meet the minimum level of PII data needed by the matching rules you have configured. Remember to apply the data cleaning rules from the section above to get the best results.
Sample SQL
Example matching rules for the sample SQL statement:
- Email exact match
- Last name and phone number
select
<insert field>
,<insert field>
,<insert field>
from <insert table>
where
( case when upper(<email>) in ('','NAN','NULL','<insert value>','<insert value>') then NULL
else <email> end ) is not null
OR
(
( case when upper(<last name>) in ('','NAN','NULL','<insert value>','<insert value>') then NULL
else <last name> end ) is not null
AND
( case when upper(<phone number>) in ('','NAN','NULL','<insert value>','<insert value>') then NULL
else <phone number> end ) is not null
)
Additionally, not all customer records may be relevant for your business use case with Helix. For example, if your primary reason to use Helix is to add business context (Helix Tags and Attributes) to your marketable customer database and then use the audience builder to create customer segments for targeted marketing campaigns, then you should first define who is a marketable customer and only bring records that meet these criteria into Helix. For example, a marketable customer may need to be someone who lives within the local market and could be sold a ticket.
There are a lot of business use cases for Helix, so KORE recommends being intentional: define your key use cases and then configure the solution to best achieve them.
Sample SQL template
select
<insert field>
,<insert field>
,<insert field>
from <insert table>
where <insert field> in ('<insert value>' , '<insert value>' , '<insert value>')
Example SQL:
-- remove records that are not identifiable
where <email> not in ('<insert value>' , '<insert value>' , '<insert value>')
-- remove records that are null
where <email> is not null
-- remove records with a wildcard condition
where email not like '%{VALUE1}%'
-- remove records that are not in market
where <zip code> in ('<insert value>' , '<insert value>' , '<insert value>')
How to map data set fields to Golden Records
Note: It is not mandatory to map every Golden Record field to your new data set. If you leave a Golden Record field unmapped, Helix will ignore this data set for that particular field.
There are two ways to map fields: automatic and manual.
- Automatic: If a column name in your SQL query results matches a Golden Record field's name, the system will automatically map it for you. You can use the AS keyword in your SQL query to control this, as shown in the prior examples.
- Manual: For each column name from your SQL query results, choose a Golden Record field from the drop-down list.
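As an illustration of the automatic approach, aliases in the SQL definition can be set to match the Golden Record field names; the source column names below are hypothetical:
select
cust_email as email
,fname as first_name
,lname as last_name
from <insert table>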
Click the Save button to save the new data set and field mappings. Once you complete this, you can edit your matching rules to take advantage of the new data set.
Records from the new data set will be processed during the next update unless it is paused. You can click the Processing History button to view the logs and verify whether the data has been processed.