7.3. Data quality

image339

  • Data quality zone displays when you double-click a field image340. Double-click again to hide the zone.

  • Transformation suggestion zone image341.

  • Box presenting statistics for the selected field image342.

  • Box for distributing values for the selected field image343.

  • Box for managing the nature of the selected field and viewing its validity statistics image344.

  • Box for managing the field validation rules for the selected field and viewing its validity statistics image345.

  • TNV (Type – Nature – Validation Rules) zone image346 showing in chart format the data quality status of every field in the dataset.

Tip

You can click any chart in Data Quality to filter the dataset.

7.3.1. Managing natures

A nature is an item of business data that is governed by common rules known to everyone, e.g. telephone numbers, emails, etc.

image347

If no nature has been attributed to a field, this view will display.

image348

If a nature has already been attributed to a field, this view will display.

Just one nature can be attributed per field.

7.3.2. Available natures

The following natures are available:

  • Email.

  • Telephone number: FR / US (international and national); Formats: E164, International.

  • Postcode, INSEE municipality code.

  • GTIN : GTIN-8, GTIN-12, GTIN-13, GTIN-14.

  • ISBN-10.

  • IPv4.

  • SIREN, SIRET.

  • Gender, title.

  • Surname, first name, full name.

  • RIB, IBAN.

  • French Social Security number.

  • Pôle-emploi login.

  • URL.

  • Country code (ISO-3166-1 alpha2, ISO-3166-1 alpha3).

7.3.3. Field validation rules

A field vallidation rule is a business constraint that is usually known only to business experts and serves to ensure that there are no inconsistencies or, if there are inconsistencies, to measure them.

image349

If no field validation rule has been attributed, the box above will become available. Click Add a validation rule to add a new field validation rule to the selected field.

image350

The list of available field validation rules will vary, depending on the type of data in the selected field.

  • Text fields:

    • Pattern (regular expressions).

    • Minimum length.

    • Maximum length.

    • Precise length.

    • Enumeration by value.

    • Not empty.

  • Numeric fields:

    • Range of values.

    • Not empty.

  • Date fields:

    • Range of values.

    • Not empty.

Just one type of field validation rule can be attributed per field. In other words, you cannot e.g. set two minimum length field validation rules for the same field.

image351

  • List image352 of field validation rules for the selected field.

  • Editing image353 a field validation rule.

  • Deleting image354 a field validation rule.

  • The V in the TNV zone image355 shows the average validity of the field validation rules added to each field.

7.3.4. Record validation rules

Record validation rules let you check a particular record (or row) without having to write any code. The checks can involve combinations of conditions across several fields (e.g. fieldA < fieldB).

Structure of record validation rules:

  • IF : <Add conditions separated by AND/OR>.

  • Then: tick one of the two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.

Click Add to create a new record validation rule:

image365

The record validation rules editor will display the following fields:

image366

  • Zone for naming and describing the rule image367 .

  • Zone for defining condition IF image368.

  • Zone for setting the rule valuation mode image369. Two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.

  • Zone for previewing the result of rule evaluation image370.

To add a condition, click “Add an Expression” in the zone image371. A dialogue window will open:

image372

In the left section of a condition, you can add:

  • A field image373. Just double-click the name of the field.

  • An arithmetic expression image374, e.g. field1 + field2.

  • A function image375, e.g. contains, starts with, etc.

As the screen capture below shows, you must then add the right section of the condition by clicking Add an Expression image376 to the right of the operator:

image377

In the right section of a condition you can enter:

  • A literal (a number, text, date, etc.).

  • The value of a different field (e.g. IF field1= field2).

  • The result of a function.

The screen capture below shows an example of rule creation:

SI Le type du Véhicule est soit BERLINE, soit TS TERRAINS/CHEMINS, soit COUPE

ET que la Quantité est strictement supérieure à 35

ALORS l'enregistrement est valide.

A preview of the evaluation will be displayed in the zone image378.

image379

When the record validation rule has been saved, it will appear in the “Validation Rules” tab.

image380

You can then apply filters (image381) to the record validation rules to apply transformations only to the rows for which the rule is valid or invalid.

Note

A record validation rule can have one of three statuses: valid, invalid or error. Error status may apply if the rule has not been evaluated because e.g. an item of data is missing or malformed.

Record validation rules can be re-used in the flow via a validation processor downstream of the preparation. In the Row validation rules tab image382 of the validation processor configurator you can tick the Verify validity option to send records that are not valid for the rule in question to the second processor output.

image383