7.3. Data quality
Data quality zone displays when you double-click a field . Double-click again to hide the zone.
Box for managing the nature of the selected field and viewing its validity statistics .
Box for managing the field validation rules for the selected field and viewing its validity statistics .
TNV (Type – Nature – Validation Rules) zone showing in chart format the data quality status of every field in the dataset.
Tip
You can click any chart in Data Quality to filter the dataset.
7.3.1. Managing natures
A nature is an item of business data that is governed by common rules known to everyone, e.g. telephone numbers, emails, etc.
If no nature has been attributed to a field, this view will display.
If a nature has already been attributed to a field, this view will display.
Just one nature can be attributed per field.
7.3.2. Available natures
The following natures are available:
Email.
Telephone number: FR / US (international and national); Formats: E164, International.
Postcode, INSEE municipality code.
GTIN : GTIN-8, GTIN-12, GTIN-13, GTIN-14.
ISBN-10.
IPv4.
SIREN, SIRET.
Gender, title.
Surname, first name, full name.
RIB, IBAN.
French Social Security number.
Pôle-emploi login.
URL.
Country code (ISO-3166-1 alpha2, ISO-3166-1 alpha3).
7.3.3. Field validation rules
A field vallidation rule is a business constraint that is usually known only to business experts and serves to ensure that there are no inconsistencies or, if there are inconsistencies, to measure them.
If no field validation rule has been attributed, the box above will become available. Click Add a validation rule to add a new field validation rule to the selected field.
The list of available field validation rules will vary, depending on the type of data in the selected field.
Text fields:
Pattern (regular expressions).
Minimum length.
Maximum length.
Precise length.
Enumeration by value.
Not empty.
Numeric fields:
Range of values.
Not empty.
Date fields:
Range of values.
Not empty.
Just one type of field validation rule can be attributed per field. In other words, you cannot e.g. set two minimum length field validation rules for the same field.
7.3.4. Record validation rules
Record validation rules let you check a particular record (or row) without having to write any code. The checks can involve combinations of conditions across several fields (e.g. fieldA < fieldB).
Structure of record validation rules:
IF : <Add conditions separated by AND/OR>.
Then: tick one of the two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.
Click Add to create a new record validation rule:
The record validation rules editor will display the following fields:
Zone for setting the rule valuation mode . Two options: The row is valid if condition IF is true or The row is invalid if condition IF is true.
To add a condition, click “Add an Expression” in the zone . A dialogue window will open:
In the left section of a condition, you can add:
As the screen capture below shows, you must then add the right section of the condition by clicking Add an Expression to the right of the operator:
In the right section of a condition you can enter:
A literal (a number, text, date, etc.).
The value of a different field (e.g. IF field1= field2).
The result of a function.
The screen capture below shows an example of rule creation:
SI Le type du Véhicule est soit BERLINE, soit TS TERRAINS/CHEMINS, soit COUPE
ET que la Quantité est strictement supérieure à 35
ALORS l'enregistrement est valide.
A preview of the evaluation will be displayed in the zone .
When the record validation rule has been saved, it will appear in the “Validation Rules” tab.
You can then apply filters () to the record validation rules to apply transformations only to the rows for which the rule is valid or invalid.
Note
A record validation rule can have one of three statuses: valid, invalid or error. Error status may apply if the rule has not been evaluated because e.g. an item of data is missing or malformed.
Record validation rules can be re-used in the flow via a validation processor downstream of the preparation. In the Row validation rules tab of the validation processor configurator you can tick the Verify validity option to send records that are not valid for the rule in question to the second processor output.