7.3. Data quality

image339

  • Data quality zone displays when you double-click a field image340. Double-click again to hide the zone.

  • Transformation suggestion zone image341.

  • Box presenting statistics for the selected field image342.

  • Box for distributing values for the selected field image343.

  • Box for managing the nature of the selected field and viewing its validity statistics image344.

  • Box for managing the field validation rules for the selected field and viewing its validity statistics image345.

  • TNV (Type – Nature – Validation Rules) zone image346 showing in chart format the data quality status of every field in the dataset.

Tip

You can click any chart in Data Quality to filter the dataset.

7.3.1. Managing natures

A nature is an item of business data that is governed by common rules known to everyone, e.g. telephone numbers, emails, etc.

image347

If no nature has been attributed to a field, this view will display.

image348

If a nature has already been attributed to a field, this view will display.

Just one nature can be attributed per field.

7.3.2. Available natures

The following natures are available:

  • Email.

  • Telephone number: FR / US (international and national); Formats: E164, International.

  • Postcode, INSEE municipality code.

  • GTIN : GTIN-8, GTIN-12, GTIN-13, GTIN-14.

  • ISBN-10.

  • IPv4.

  • SIREN, SIRET.

  • Gender, title.

  • Surname, first name, full name.

  • RIB, IBAN.

  • French Social Security number.

  • Pôle-emploi login.

  • URL.

  • Country code (ISO-3166-1 alpha2, ISO-3166-1 alpha3).

7.3.3. Add Field validation rules

A field validation rule is a business constraint that is usually known only to business experts and serves to ensure that there are no inconsistencies or, if there are inconsistencies, to measure them.

image349

If no field validation rule has been attributed, the box above will become available. Click Add a validation rule to add a new field validation rule to the selected field.

image350

The list of available field validation rules will vary, depending on the type of data in the selected field.

  • Text fields:

    • Pattern (regular expressions).

    • Minimum length.

    • Maximum length.

    • Precise length.

    • Enumeration by value.

    • Not empty.

  • Numeric fields:

    • Range of values.

    • Not empty.

  • Date fields:

    • Range of values.

    • Not empty.

Just one type of field validation rule can be attributed per field. In other words, you cannot e.g. set two minimum length field validation rules for the same field.

image351

  • List image352 of field validation rules for the selected field.

  • Editing image353 a field validation rule.

  • Deleting image354 a field validation rule.

  • The V in the TNV zone image355 shows the average validity of the field validation rules added to each field.

Note

Practical example : Add Field Validation Rule

Before Transformation:

ID

Name

Email

Status

1

Alice

alice@email.com

Active

2

Bob

bob@email.com

Pending

3

Carol

carol@email.com

Cancelled

Transformation Configuration:

Rule: add an Enumeration by value constraint to the status field. Allowed values are : Active, Pending.

After Transformation:

ID

Name

Email

Status

1

Alice

alice@email.com

Active

2

Bob

bob@email.com

Pending

3

Carol

carol@email.com

Cancelled*

*This example shows that after adding the Field Validation Rule, the value Cancelled will be considered invalid (in the preparation editor, a purple bar will appear on the right of the cell).

7.3.4. Add a Record validation rules

Record validation rules let you check a particular record (or row) without having to write any code. The checks can involve combinations of conditions across several fields (e.g. fieldA < fieldB).

Structure of record validation rules:

  • IF : <Add conditions separated by AND/OR>.

  • Then: tick one of the two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.

Click Add to create a new record validation rule:

image365

The record validation rules editor will display the following fields:

image366

  • Zone for naming and describing the rule image367 .

  • Zone for defining condition IF image368.

  • Zone for setting the rule valuation mode image369. Two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.

  • Zone for previewing the result of rule evaluation image370.

To add a condition, click “Add an Expression” in the zone image371. A dialogue window will open:

image372

In the left section of a condition, you can add:

  • A field image373. Just double-click the name of the field.

  • An arithmetic expression image374, e.g. field1 + field2.

  • A function image375, e.g. contains, starts with, etc.

As the screen capture below shows, you must then add the right section of the condition by clicking Add an Expression image376 to the right of the operator:

image377

In the right section of a condition you can enter:

  • A literal (a number, text, date, etc.).

  • The value of a different field (e.g. IF field1= field2).

  • The result of a function.

The screen capture below shows an example of rule creation:

IF the Body Type is in HATCHBACK, SUV or COUPE

AND the Quantity is greater than 35

THEN the record is valid.

A preview of the evaluation will be displayed in the zone image378.

image379

When the record validation rule has been saved, it will appear in the “Validation Rules” tab.

image380

You can then apply filters (image381) to the record validation rules to apply transformations only to the rows for which the rule is valid or invalid.

Note

A record validation rule can have one of three statuses: valid, invalid or error. Error status may apply if the rule has not been evaluated because e.g. an item of data is missing or malformed.

Record validation rules can be re-used in the flow via a validation processor downstream of the preparation. In the Row validation rules tab image382 of the validation processor’s configurations you can tick the Verify validity option to send records that are not valid for the rule in question to the second processor output.

image383

Note

Practical example : Add Record Validation Rule

Before Transformation:

ID

Order

Order Date

Delivery Date

Delivery Mode

1

Order #1

15/09/2024

18/09/2024

Express

2

Order #2

17/09/2024

21/09/2024

Normal

Transformation Configuration:

Rule: If the delivery method is Express, then the time between the order date and the delivery date must not exceed one day

After Transformation:

ID

Order

Order Date

Delivery Date

Delivery Mode

1

Order #1

15/09/2024

18/09/2024

Express

2

Order #2

17/09/2024

21/09/2024

Normal

In the example above, the order with ID 1 will have an invalid marker because 3 days have elapsed between the order date and the delivery date. This marker can be used in further processing to isolate rows for which the delivery service did not work correctly.