7.19. Advanced transformations
7.19.1. Divide a table
The transformation will divide the record in the input table into three records:
input table:
A
B1,B2,B3
C
output table:
A
B1
C
A
B2
C
A
B3
C
The transformation will apply to just one field.
The transformation will use the active filter(s).
7.19.2. Set of business rules
A set of business rules allows you, without having to write any code, to:
Make calculations for the current record.
Add new fields whose values will be calculated based on the fields in the current record.
Calculation business rules are structured as follows:
IF: <Add conditions separated by AND/OR>.
Then: <Add the calculations required if condition IF has been met>.
Else: <Add the calculations required if condition IF has not been met>.
Click Ajouter to create a new calculation rule:
The *business rules editor* will show the following zones:
To add a condition, click “Add an Expression” in the zone . A dialogue window will open.
In the left section of a condition, you can add:
As the screen capture below shows, you must then add the right section of the condition by clicking Add an Expression to the right of the operator:
In the right section of a condition you can enter:
A literal (a number, text, date, etc.).
The value of another field (e.g. IF field1 = field2).
The result of a function.
To add a calculation to a THEN or ELSE clause, click + in the zone or . A dialogue window similar to the condition editing window will let you assign the result of a calculation to fields. The calculation may be:
A literal (e.g. field1 = 12).
The value of another field (e.g. field1 = field2).
The result of a function (e.g. field1 = first 3 characters of field2).
An arithmetic expression (e.g. field1 = field2 * field3 + 10).
7.19.3. Generate UUIDs
Generate a UUID (universally unique identifier) in a new field for each current dataset record.
The transformation will not apply to any field.
The transformation will not use the active filter(s).
7.19.4. Invert Boolean values
Invert (true/false) Boolean values in selected fields:
True values will become false.
False values will become true.
Other values will be unchanged.
The transformation will apply to one or more fields.
The transformation will use the active filter(s).
7.19.5. Reconcile/enrich with master data
Go to the section Using repositories.
7.19.6. Script-based transformations
7.19.6.1. Description
Open-source scripting language (Groovy [16]) gives read or write access to all the cells in a record.
You can create new fields.
To do this, click New Column to enter the name and type of field to be added:
All basic arithmetic operations will then become available together with hundreds of functions for manipulating text, numbers and dates (the scripting language therefore allows you to create potentially complex business rules for your data).
Groovy will be available in its entirety: loops, controls, collection manipulation, variable creation, etc. (http://www.groovy-lang.org/groovy-dev-kit.html).
7.19.6.2. Field selector tab
7.19.6.3. Function Selector tab (alternative to autocomplete)
7.19.6.4. Syntax validation, Run preview and Run
Click Check Script to check that the syntax is valid (essential if you want to run the script).
Click Preview to preview script run (at this stage, script-type transformation has not yet been applied).
The coloured areas of the preview table show the fields with values read by the script.
The coloured areas of the preview table show the fields with values changed by the script.
Example:
The following script will create a field (C) and will allot to it the contents of fields A and B.
\_row['C'] = \_row['A'] + \_row['B'] ou bien \_row.C = \_row.A + \_row.C
The second syntax is only possible if field names contain no spaces.
_row
is a word used only to access an existing cell (in read or write mode).
The transformation will apply to one or more fields.
The transformation will use the active filter(s).
7.19.7. Point & click transformation
Select a piece of text: click the cell you want (left button on the mouse, depress and then release immediately). Then once the cell (and row) has been selected, drag the mouse, while keeping the left button depressed, over the text you want.
Tale of Data will then offer several options based on the portion of the text that you have selected.
One or more columns with a blue background will appear showing what the result of the transformation will be if you decide to apply it (preview).
Also called Smart Selection, this functionality lets the system suggest one or more actions when you select all or part of the content of a cell.
For example:
If you select the year in a date, the system will suggest extracting the number for the year and putting it into a new field.
If you select a character (e.g. ‘,’ or « ; », etc.), the system will suggest dividing the content of the cell into several different fields, separating them with the character(s) you have selected.
If you select a word in a sentence, the system will suggest creating a field that indicates whether the word is present in the cell (true/false).
The transformation will apply to just one field.
The transformation will use the active filter(s).
7.19.8. Adding a quality indicator
Create a quality indicator based on several criteria: missing or invalid values, invalid rows (violations of column and/or row validation rules).
Choose a name for the indicator.The name will be used to create the column(s) containing information on the quality of each record (row).
Select an indicator calculation method. Six options will be offered:
Boolean: ‘true’ means there is a quality problem with the row (concerning the types of control specified in the table areas |image505| *and ).
Boolean: ‘true’ means there is no quality problem.
Full (0 or 1): 1 means there is a quality problem.
Full (0 or 1): 1 means there is no quality problem.
Number of control fails (0 = OK). the indicator value for each row will be the total number of control fails specified in the table areas and .
Number of control fails per type of anomaly. Calculation is the same as above but in this case 5 columns (one per type of anomaly) will be added to the column total (that carries the name of the indicator) :
blank values
incorrect values (= type verification failure)
invalid form (e.g. invalid telephone or email)
values that violate a column validation rule
record that violates a row validation rule
Number of control fails per type of anomaly calculation method: six additional columns. Five columns will be prefixed with the indicator name (KPI_1_XXX). The column bearing the name of the indicator will contain the total number of anomalies for a given row.
Two types of control can be used to calculate the quality indicator:
Control by field (Règles de validation de champs tab ). Performed for each selected field:
Type:
in a date field, for example, all cells containing a value that is not recognized as a valid date will be treated as incorrect during indicator calculation.
Nature:
in a field identified as containing emails, for example, all cells containing a value that is not recognized as a valid email will be treated as incorrect during indicator calculation.
Presence:
all cells that do not contain a value will be treated as incorrect during indicator calculation.
- Field validation rules will be controlled in an upstream preparation node.
All cells in the specified field that contain a value that is invalid in rule terms (e.g. list of permitted values or validity interval) will be treated as incorrect during indicator calculation.
Control by record (Règles de validation de lignes tab ). Record validity will be checked for each ticked rule. All invalid records will be treated as incorrect during indicator calculation.