7.19. Advanced transformations

7.19.1. Divide a table

The transformation will divide the record in the input table into three records:

  • input table:

    A

    B1,B2,B3

    C

  • output table:

    A

    B1

    C

    A

    B2

    C

    A

    B3

    C

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.19.2. Set of business rules

A set of business rules allows you, without having to write any code, to:

  • Make calculations for the current record.

  • Add new fields whose values will be calculated based on the fields in the current record.

Calculation business rules are structured as follows:

  • IF: <Add conditions separated by AND/OR>.

  • Then: <Add the calculations required if condition IF has been met>.

  • Else: <Add the calculations required if condition IF has not been met>.

Click Ajouter to create a new calculation rule:

image454

The *business rules editor* will show the following zones:

image455

  • Zone for creating new calculated fields image456.

  • Zone for defining condition IF image457.

  • Zone for THEN clause calculations image458.

  • Zone for ELSE clause calculations image459.

  • Zone for previewing calculation results image460.

To add a condition, click “Add an Expression” in the zone image461. A dialogue window will open.

image462

In the left section of a condition, you can add:

  • A field image463. Just double-click the name of the field.

  • An arithmetic expression image464, e.g. field1 + field2.

  • A function image465, e.g. contains, « starts with, etc.

As the screen capture below shows, you must then add the right section of the condition by clicking Add an Expression image466 to the right of the operator:

image467

In the right section of a condition you can enter:

  • A literal (a number, text, date, etc.).

  • The value of another field (e.g. IF field1 = field2).

  • The result of a function.

To add a calculation to a THEN or ELSE clause, click + in the zone image468 or image469. A dialogue window similar to the condition editing window will let you assign the result of a calculation to fields. The calculation may be:

  • A literal (e.g. field1 = 12).

  • The value of another field (e.g. field1 = field2).

  • The result of a function (e.g. field1 = first 3 characters of field2).

  • An arithmetic expression (e.g. field1 = field2 * field3 + 10).

7.19.3. Generate UUIDs

Generate a UUID (universally unique identifier) in a new field for each current dataset record.

  • The transformation will not apply to any field.

  • The transformation will not use the active filter(s).

7.19.4. Invert Boolean values

Invert (true/false) Boolean values in selected fields:

  • True values will become false.

  • False values will become true.

  • Other values will be unchanged.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.19.5. Reconcile/enrich with master data

Go to the section Using repositories.

7.19.6. Script-based transformations

7.19.6.1. Description

Open-source scripting language (Groovy [16]) gives read or write access to all the cells in a record.

You can create new fields.

To do this, click New Column image471 to enter the name and type of field to be added:

image472

All basic arithmetic operations will then become available together with hundreds of functions for manipulating text, numbers and dates (the scripting language therefore allows you to create potentially complex business rules for your data).

Groovy will be available in its entirety: loops, controls, collection manipulation, variable creation, etc. (http://www.groovy-lang.org/groovy-dev-kit.html).

image473

  • Help zone image474: this offers syntax information for reading and writing in existing fields and for creating new fields.

  • Information to be provided on the script image475 : name, comment, etc.

  • Two additional tabs image476: a field selector and a function selector (see below for further details).

  • Zone for entering script text image477.

  • Autocomplete image478 (Ctrl + Space).

  • Help with the selected completion option image479.

7.19.6.2. Field selector tab

image480

  • Select the Column Selector tab image481.

  • Double-click a field image482.

  • … to input the appropriate syntax into the script zone image483.

7.19.6.3. Function Selector tab (alternative to autocomplete)

image484

  • Select the Function Selector tab image485.

  • Double-click a function image486.

  • … to input the appropriate syntax into the script zone image487.

  • This zone offers help image488 with the selected function.

7.19.6.4. Syntax validation, Run preview and Run

image489

  • Click Check Script image490 to check that the syntax is valid (essential if you want to run the script).

  • Click Preview image491 to preview script run (at this stage, script-type transformation has not yet been applied).

  • The coloured areas image492 image493 of the preview table show the fields with values read by the script.

  • The coloured areas image494 image495 of the preview table show the fields with values changed by the script.

  • Appliquer image496: applies the script to the dataset.

Example:

The following script will create a field (C) and will allot to it the contents of fields A and B.

\_row['C'] = \_row['A'] + \_row['B'] ou bien \_row.C = \_row.A + \_row.C

The second syntax is only possible if field names contain no spaces.

_row is a word used only to access an existing cell (in read or write mode).

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.19.7. Point & click transformation

image497

  • image498 Select a piece of text: click the cell you want (left button on the mouse, depress and then release immediately). Then once the cell (and row) has been selected, drag the mouse, while keeping the left button depressed, over the text you want.

  • image499 Tale of Data will then offer several options based on the portion of the text that you have selected.

  • image500 One or more columns with a blue background will appear showing what the result of the transformation will be if you decide to apply it (preview).

  • Appliquer image501: apply the selected transformation.

Also called Smart Selection, this functionality lets the system suggest one or more actions when you select all or part of the content of a cell.

For example:

  • If you select the year in a date, the system will suggest extracting the number for the year and putting it into a new field.

  • If you select a character (e.g. ‘,’ or « ; », etc.), the system will suggest dividing the content of the cell into several different fields, separating them with the character(s) you have selected.

  • If you select a word in a sentence, the system will suggest creating a field that indicates whether the word is present in the cell (true/false).

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.19.8. Adding a quality indicator

Create a quality indicator based on several criteria: missing or invalid values, invalid rows (violations of column and/or row validation rules).

image502

  • image503 Choose a name for the indicator.The name will be used to create the column(s) containing information on the quality of each record (row).

  • image504 Select an indicator calculation method. Six options will be offered:

    • Boolean: ‘true’ means there is a quality problem with the row (concerning the types of control specified in the table areas |image505| *and image506).

    • Boolean: ‘true’ means there is no quality problem.

    • Full (0 or 1): 1 means there is a quality problem.

    • Full (0 or 1): 1 means there is no quality problem.

    • Number of control fails (0 = OK). the indicator value for each row will be the total number of control fails specified in the table areas image507 and image508.

    • Number of control fails per type of anomaly. Calculation is the same as above but in this case 5 columns (one per type of anomaly) will be added to the column total (that carries the name of the indicator) :

      • blank values

      • incorrect values (= type verification failure)

      • invalid form (e.g. invalid telephone or email)

      • values that violate a column validation rule

      • record that violates a row validation rule

image509

Number of control fails per type of anomaly calculation method: six additional columns. Five columns will be prefixed with the indicator name (KPI_1_XXX). The column bearing the name of the indicator will contain the total number of anomalies for a given row.

Two types of control can be used to calculate the quality indicator:

  • Control by field (Règles de validation de champs tab image510). Performed for each selected field:

    • Type:

      in a date field, for example, all cells containing a value that is not recognized as a valid date will be treated as incorrect during indicator calculation.

    • Nature:

      in a field identified as containing emails, for example, all cells containing a value that is not recognized as a valid email will be treated as incorrect during indicator calculation.

    • Presence:

      all cells that do not contain a value will be treated as incorrect during indicator calculation.

    • Field validation rules will be controlled in an upstream preparation node.

      All cells in the specified field that contain a value that is invalid in rule terms (e.g. list of permitted values or validity interval) will be treated as incorrect during indicator calculation.

  • Control by record (Règles de validation de lignes tab image511). Record validity will be checked for each ticked rule. All invalid records will be treated as incorrect during indicator calculation.