7.5. Transformation configuration and use

image402

  • Transformations image403.

  • Palette of available transformations image404. Click to select a transformation.

Caution

If filters are added during a transformation, that transformation will apply only to the dataset records that have been accepted by the filter.

image405

  • Help zone image406 describing what the transformation does (hide it by clicking ‘-‘ in the top left of the zone).

  • List of fields image407 to which the transformation will apply (you can select multiple fields using Shift or Ctrl) and of all the options available for the current transformation.

  • Apply image408 applies the current transformation to the dataset. Preview image409 shows the changes the current transformation will make.

7.6. Quick access to transformations

Field Browser gives fast access to some frequently used transformations. To call these up, right-click the selected field(s) in Field Browser:

image410

Tip

The Field Browser pop-up menu also lets you add filters to selected fields

7.7. Viewing the changes made by a transformation

The Change View Window of a transformation presents as shown below:

image411

Changes are organized using 5 criteria:

  • Input field (white).

  • Modified field (beige).

  • Modified value (yellow).

  • New field (blue).

  • Deleted field (gray).

You can view the changes made by a transformation in 2 locations:

  • In the configuration zone of any transformation, i.e. before the transformation is applied by clicking Preview:

    image412

  • In Transformation History, i.e. after you have clicked to apply the transformation image413:

    image414

7.8. Field transformations

7.8.1. Add a Column

Create a new field. By default, this will be empty. If you enter the default value, you can allocate the same value to all the cells.

  • The transformation will use the active filter(s).

Note

Practical example : Add a Column

Before Transformation:

ID

Name

Age

1

Alice

30

2

Bob

25

Transformation Configuration:

  • New Column Name: “Location”

  • Column Type: “Text”

  • Default Value: “Unknown”

After Transformation:

ID

Name

Age

Location

1

Alice

30

Unknown

2

Bob

25

Unknown

In this example, the transformation adds a new column named “Location” to the dataset. Each row in the dataset is updated to include the default value “Unknown” in the new column.

7.8.2. Concat columns

Create a new field by concatenating multiple existing fields.

  • The transformation will apply to multiple fields.

  • The transformation will use the active filter(s).

Note

Practical example : Concat Columns

Before Transformation:

First name

Name

Date of Birth

John

Doe

1990-01-01

Jane

Smith

1985-05-23

Transformation Configuration:

  • Target Columns: FirstName, LastName, DateOfBirth

  • New column name: FullName

  • Join Text: ” “

  • Concat missing values: true

  • Delete concatenated columns: true

After Transformation:

FullName

John Doe 1990-01-01

Jane Smith 1985-05-23

In this example, the transformation takes the values from the ‘FirstName’, ‘LastName’, and ‘DateOfBirth’ columns and merges them into a new ‘FullName’ column. Each piece of data is separated by a space, providing a clear, readable format that combines critical information into a single column. As expected, the concatenated columns have been deleted.

7.8.3. Copy a Column

Create a new field that is a copy of an existing field.

  • The transformation will apply to just one field.

  • The transformation will not use the active filter(s).

Note

Practical example : Copy a Column

Before Transformation:

ID

Name

Age

1

Alice

30

2

Bob

25

3

Carol

28

Transformation Configuration:

  • Target Column: Age

  • Target Column: Age_Copy

After Transformation:

ID

Name

Age

Age_Copy

1

Alice

30

30

2

Bob

25

25

3

Carol

28

28

In this example, the transformation has successfully created a new column named Age_Copy and duplicated the data from the Age column into it for each record. This allows for data manipulation or analysis on Age_Copy while keeping the original Age data intact.

7.8.4. Split Column

Divide a field into multiple fields by using one or more separators.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.8.5. Merge columns

Fill in the missing values in the first field with the first value found in the other selected fields.

  • The transformation will apply to multiple fields.

  • The transformation will use the active filter(s).

7.8.6. Change Column type

Change or override field type (e.g. change field type from numeric to text).

  • The transformation will apply to one or more fields.

  • The transformation will not use the active filter(s).

Note

Practical example : Change Column type

Before Transformation:

ID

Name

Date of Birth

1

Alice

“1985-04-12”

2

Bob

“1990-08-23”

Transformation Configuration:

  • Target Column: City

  • New Data Type: Date (from String)

After Transformation:

ID

Name

Date of Birth

1

Alice

1985-04-12T00:00:00

2

Bob

1990-08-23T00:00:00

In this example, the “Date of Birth” column was originally stored as strings. The transformation configuration specifies that this column should be converted to a Date type. After applying the transformation, the “Date of Birth” column in each row reflects the new data type: Dates instead of Strings.

7.8.7. Fill the missing values

Fill in the missing values in one or more fields with the last non-blank value found. The table below provides an example:

Date

Forecast

11/05/2019

800

12/05/2019

13/05/2019

21/05/2019

950

22/05/2019

23/05/2019

After the transformation, the following table will be produced:

Date

Forecast

11/05/2019

800

12/05/2019

800

13/05/2019

800

21/05/2019

950

22/05/2019

950

23/05/2019

950

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.8.8. Rename Column

Change the name of a field.

  • The transformation will apply to a single field.

  • The transformation will not use the active filter(s).

7.8.9. Reorder Columns

Reorder the fields in a dataset.

  • The transformation will apply to multiple fields.

  • The transformation will not use the active filter(s).

7.8.10. Remove Column

Delete one or more fields.

  • The transformation will apply to one or more fields.

  • The transformation will not use the active filter(s).

7.8.11. Transpose columns to rows

Transpose the values of selected fields in one record per field, e.g. in the dataset below:

Country Name

1960

1961

Afghanistan

414.371

491.378

Angola

550.05

454.708

Albania

2024.184

2280.874

Using the following parameters to make the transformation:

  • List of fields: 1960 and 1961.

  • Header sink field: Year.

  • Value sink field: Level.

Result:

Country Name

Year

Level

Afghanistan

1960

414.371

Angola

1960

550.05

Albania

1960

2024.184

Afghanistan

1961

491.378

Angola

1961

454.708

Albania

1961

2280.874

  • The transformation will apply to multiple fields.

  • The transformation will not use the active filter(s).

7.8.12. Mask sensitive data

Hide sensitive data to protect its confidentiality.

Several masking strategies are available:

  • Semantic masking

  • Deterministic masking

  • Mask everything

  • Mask the first n characters

  • Keep the first n characters

  • Mask the last n characters

  • Mask the characters between two positions

  • Keep the characters between two positions

  • Mask digits

Use Preview for an advance view of masking results.

  • The transformation will apply to multiple text fields.

  • The transformation will use the active filter(s).

7.9. Record transformations

7.9.1. Delete records

Delete Filtered Rows

  • The transformation will use the active filter(s).

7.9.2. Delete Empty Rows

Delete blank records. In a blank record all the fields will be blank.

  • The transformation will not use the active filter(s).

7.10. Text transformations

7.10.1. Add a prefix

Add a prefix to the values in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

Note

Practical example : Add a prefix

Before Transformation:

ID

Name

Phone Number

1

John Doe

123456789

2

Jane Smith

3

Bob Brown

987654321

Transformation Configuration:

  • Target Column: Phone Number

  • Prefix: +1-

  • Add to Missing Values: Yes

After Transformation:

ID

Name

Phone Number

1

John Doe

+1-123456789

2

Jane Smith

+1-

3

Bob Brown

+1-987654321

In this example, the transformation adds the prefix “+1-” to each phone number. For Jane Smith, who originally had an empty phone number, the prefix is added because the ‘Add to Missing Values’ option is enabled, resulting in “+1-“.

7.10.2. Add a suffix

Add a suffix to the values in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

Note

Practical example : Add a suffix

Before Transformation:

ID

Name

City

1

Alice

New York

2

Bob

3

Charlie

Los Angeles

Transformation Configuration:

  • Target Column: City

  • Suffix: ” - USA”

  • Add to Missing Values: Yes

After Transformation:

ID

Name

City

1

Alice

New York - USA

2

Bob

- USA

3

Charlie

Los Angeles - USA

In this example, the transformation appends “- USA” to each entry in the ‘City’ column. Notice that even the blank entry for Bob has been updated to include the suffix, demonstrating the effect of enabling the addition of the suffix to missing values.

7.10.3. Attribute a value to cells

Attribute the same value to all the cells in a field. If the new value is blank, the content of the cells will be deleted.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.10.4. Capitalize

Capitalize the first letter of each word in each cell in the selected field(s). Can be applied to the first word only by selecting the First word only option.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

Note

Practical example : Capitalize a selection

Before Transformation:

ID

Description

Charlie

1

APPLE PIE FROM STORE

bakery items

2

fresh orange JUICE

beverages

3

chocolate Bar

confectionery

Transformation Configuration:

  • Selected Columns: Description, Category

  • Capitalization Type: Capitalize each word

After Transformation:

ID

Description

Charlie

1

Apple Pie From Store

Bakery Items

2

Fresh Orange Juice

Beverages

3

Chocolate Bar

Confectionery

In this example, the transformation is configured to capitalize each word in the ‘Description’ and ‘Category’ columns. After applying the transformation, all the words in these columns start with an uppercase letter, enhancing the text’s appearance and making it more formal and readable.

7.10.5. Right Pad

Complete to the right, with a given string of characters, all cells in the selected field.

For example:

  • A cell bcd, a completion string a and a length 4, will give bcda.

  • A cell bcda, a completion string a and a length 4, will give bcda.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.10.6. Left Pad

Complete to the left, with a given string of characters, all cells in the selected field.

For example:

  • A cell bcd, a completion string a and a length 4, will give bcda.

  • A cell abcd, a completion string a and a length 4, will give bcda.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.10.7. Convert to lowercase

Convert all the characters in each cell of the selected field to lower case

For example: “CONTENT of a Cell” will become “content of a cell”.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.8. Convert to uppercase

Convert all the characters in each cell of the selected field to upper case.

For example: “CONTENT of a Cell” will become “content of a cell”.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.9. Convert from HTML entities

Convert the HTML entities of each cell in the selected field to characters.

For example:

  • Entité HTML will become Entité HTML.

  • Entité HTML devient Entité HTML.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.10. Escape HTML entities

Convert the eligible characters in each cell in the selected field into their HTML entity format.

For example: Entité HTML will become Entité HTML.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.11. Unescape Unicode

Convert Unicode [7] code points into characters (glyphs) in each selected field.

For example:

  • Biblioth\\u00E8que will become Bibliothèque.

  • Mus\\  u00E9e will become Musée.

For further information :

  • Unicode on Wikipedia.

  • Unicode code point in Wikipedia.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.12. Normalize spaces

Delete consecutive spaces (and any character similar to a space: tabs, returns, etc.) and replace them with a single space (ASCII + 10).

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.13. Find / Replace

Search/replace using either plain text or a regular expression. This can be done for all types of field and unrecognised values.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.10.14. Infer Language

Recognize the language of each cell in the selected field. The language recognized will be stored in a new field. Available language formats are:

  • 2-letter language code, e.g. en, fr, de, etc. [8]

  • 3-letter language code (ISO 639-2), e.g. eng, fra, deu, etc. [9]

  • Localized language name, e.g. français, anglais for localized French, or French, English for localized English.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.10.15. Remove HTML code

Delete HTML tags, replace HTML and Unicode entities with their text equivalents in the selected fields.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.16. Remove diacritics

Delete all diacritics [10] from each cell in the selected field. Accents too are diacritics.

The characters in the following Unicode categories are also considered diacritics:

  • LM LETTER, Modifier.

  • SK SYMBOL, Modifier.

  • MN MARK, Nonspacing.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.10.17. Remove spaces from both ends

Delete spaces and and all characters similar to spaces (tabs, returns, etc.) at the start or end of a text.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.11. Numeric transformations

7.11.1. Round a Decimal

Decimal rounding:

  • If the strategy is to the nearest whole number , the decimal will be rounded up or down to the nearest whole number (2.6 will become 3 and 2.4 will become 22).

  • If the strategy is to the next whole number above, the decimal will be rounded up to the nearest whole number (2.1 will become 3).

  • If the strategy is to the next whole number below, the decimal will be rounded down to the nearest whole number (2.6 will become 2)

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.11.2. Create intervals of fixed-size

Create fixed-size intervals from a numeric column.

image415

Example:

  • [0, 10[

  • [10, 20[

  • [20, 30[ etc.

This transformation can be used to create e.g. histograms (i.e. bar charts) using continuous values.

image416

  • image417 Choose the name of the numeric column that will be used to create the intervals.

  • image418 Choose the name of the new column that will contain the intervals.

  • image419 Set the size of the intervals.

  • image420 Set a lower limit for the first interval and an upper limit for the last interval (this is optional). If a lower limit B is set, all values below that limit will fall into interval ]-∞, B[. If an upper limit B’**is set |image421|, all values at or above that limit will fall into interval **[B’, +∞[.

  • image422 Tick the option to get a text column with [X, Y[ intervals, meaning that the value is >= X and < Y.

  • image423 Tick the option to get two numeric columns. The first will contain the lower interval limits (inclusive) and the second will contain the upper interval limits (exclusive).

Result preview:

image424

7.11.3. Convert measurement units

Convert one unit of measurement into another (e.g. convert kilometers into miles).

The source unit of measurement can also be detected automatically.

image425

This transformation can be done in 2 ways:

  • Simple conversion: from one numeric field to another, possibly the same, numeric field.

  • Conversion with automatic unit recognition: from one text field to a new numeric field.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.11.4. Extract Numbers

Extract the numbers from a text field and enter them in new fields.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.12. Date transformations

7.12.1. Add Current Date Time

The goal of the “Add Current Date Time” transformation is to add a new column to a dataset that records the current date and time when the transformation is applied. This can be particularly useful for tracking when data was processed or updated.

How it works:

To configure this transformation, you need to specify the name of the new column where the current time will be recorded. This is done by setting a parameter that determines the target column name. The transformation will automatically add this new column to each row in your dataset, populating it with the exact date and time of the transformation’s execution.

Note

Practical example : Add Current Date Time

Before Transformation:

OrderID

CustomerName

Product

Quantity

001

John Doe

Laptop

1

002

Jane Smith

Tablet

2

Transformation Configuration:

You decide to add a column named “ProcessedTime” to track when each record was processed.

After Transformation:

OrderID

CustomerName

Product

Quantity

ProcessedTime

001

John Doe

Laptop

1

07/12/2021 10:00 AM

002

Jane Smith

Tablet

2

07/12/2021 10:00 AM

In this example, “07/12/2021 10:00 AM” represents the date and time when the transformation was executed, and this timestamp is added to each row in the new “ProcessedTime” column. Important: after application of the transformation, all the rows in the dataset will have the same value in the column ProcessedTime.

7.12.2. Convert an integer to date

Convert a whole number into a date.

The input number may be a literal representation of the date (e.g. 20190715 = 15 July 2019) or it may represent the number of days or milliseconds since a start date. For example, if you indicate that your number is taken from Microsoft Excel, 39448 will be converted to 1 January 2008 because there are exactly 39 448 days between 1 January 1900 and 1 January 2008.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.12.3. Extract Date Part

Extract part of a date field (Year, Quarter, Month, Day ) and enter it in a new field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.12.4. Format a date using a custom format

Format the date in a date field using a custom format and then enter it in a new field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.12.5. Format a date using a predefined pattern

Format the date in a date field using a preset format and then enter it in a new field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.12.6. Dates interval

Calculate the date interval between two date fields. A new field will be created with the result of the calculation, expressed in Weeks, Days, Hours, Minutes, Seconds or Milliseconds, as you prefer.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.12.7. Date recognition

At import, the system will attempt to decrypt all not recognized cells within a date field as dates by trying various formats and countries (e.g. 2nd March 2015, 2015.03.02, etc.). The selected formats must be validated by the user.

image426

  1. The first date field in the dataset will be automatically selected image427 (when the field has been processed, select the next one manually).

  2. The system will try to discover the format (e.g. YYY-MM-dd) image428 that recognizes the greatest number of dates in the field (that number, and the percentage it represents, will be shown in the drop-down list). If no such format can be found or if you prefer not to use automatic recognition, you can enter your own format in the zone image429.

  3. Zone offering help image430 with building your own date formats.

  4. Results preview image431: cells are green if the format shown in the zone image432 recognizes the cell as a date (left column shows the raw value that was not recognized at import; the right column shows the date as recognized using the format specified in the zone image433).

  5. Apply image434.

Note

Repeat steps 1-5 until the zone image435 is completely empty (i.e. all the cells have been recognized).

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

Hint

For custom formats, in the vast majority of cases you must use yyyy rather than capitals for four-figure years. This is because * y represents the calendar year (which is widely used in practice), while * Y represents the “Week Year”. Depending on whether the new year starts in a new week, the last few days of the year may fall into the following year.

For more information, see: https://en.wikipedia.org/wiki/ISO_8601#Week_dates

7.13. Clustering transformations

Group words/texts with similar spelling together in a given field. Algorithms will create these clusters but the decision whether to standardize the text in the clustered cells is left up to the user.

image436

  • Selection of the field image437 you want to explore for potential clustering.

  • Open preview image438. Clusters will have been calculated to enable you to make a decision.

  • The zone image439 will display the clusters found: 1 row = 1 cluster (e.g. in the fig. above the system indicates that it has found 3 similar spellings for MONOSPACE and that there are 48 rows in this cluster).

  • Click Display cluster details image440 (or double-click the zone image441) to go to Details and make a decision on each cluster.

image442

  • Cluster of similarly spelled words image443. Deselect one or more options to exclude them from the final cluster (= false positives).

  • When you select an option in this zone image444, the system will display the relevant records in the zone image445 (making it easier to e.g. identify false positives).

  • Select a replacement for all similar spellings (or double-click an option in the zone image446 to select it automatically as the replacement).

  • Use Accept then Next image447 to confirm with a single click all the decisions you have made about the current cluster (i.e. selections made in the zone image448 + replacement text in the zone image449) and move on to the next cluster. Click Next to move on to the next cluster without validating the current cluster (i.e. without making any replacements).

Apply: click to apply a replacement in all clusters validated during the 4 steps described above.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.1. Phonetic clustering

Clusters of cells with similar pronunciations, e.g. telephone and telefones.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.2. Consonant-first fingerprint

Cluster of character-strings, based on the consonant sequence in the text.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.3. Exact match fingerprint

Exact value match clusters

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.4. Universal fingerprint

Match similar texts using a combination of strategies:

  • Removal of non-meaningful words (articles, pronouns, coordinating conjunctions etc.).

  • Removal of double vowels and double consonants.

  • Removal of common abbreviations.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.5. Vowels-first fingerprint

Cluster of character-strings, based on the vowel sequence in the text.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.13.6. N-Gram clustering

Cell clustering based on common letter pairs (or triplets, quadruplets, etc. depending on the value of N).

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.14. Deduplication transformations: multi-algorithm deduplication

Tale of Data allows multi-field (user selects the fields to be used when deciding whether to match two rows) and multi-algorithm (user selects the match algorithm for each field: phonetic, N-Gram, etc.) deduplication:

image450

  • Select the deduplication fields image451.

  • Select the algorithm image452 to be used for each of the columns.

  • Click Validate image453 to start deduplication.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.15. Nature transformations

7.15.1. Define Column Nature

Change or override the nature of a field (e.g. override ‘telephone number’ as the nature of a numeric column).

  • The transformation will apply to one or more fields.

  • The transformation will not use the active filter(s).

7.15.2. Remove Column Nature

Delete the nature of one or more fields.

  • The transformation will apply to one or more fields.

  • The transformation will not use the active filter(s).

7.15.3. Split URL

From a URL, extract the: scheme (http, ftp, etc.), domain name (www.taleofdata.com, www.google.fr, etc.), port, path or query from the URLs of the selected field to one or more new fields.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.4. Decode URLs

Decode each URL in the selected field into application/x-www-form-urlencoded format. For example:

  • http%3A%2F%2Fwww.taleofdata.com will become http://www.taleofdata.com.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.15.5. Encode URL

Encode each URL in the selected field into application/x-www-form-urlencoded format. For example:

  • http://www.taleofdata.com will become http%3A%2F%2Fwww.taleofdata.com.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.15.6. Shape First Names

Format first names (standardize spaces, capitalization) in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.7. Extract Full Names parts

Extract first names, surnames or titles from a field containing full names. This will also deduce titles and genders.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.8. Shape Proper Nouns

Format proper nouns in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.9. Shape Genders

Format genders by localized country/language, using a long/short format.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.10. Shape Civilities

Format titles by localized country/language, using a long/short format.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.11. Shape Nationalities

Format nationalities (standardize spaces, capitalization) in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.15.12. Extract SIREN code

Extract the SIREN (Système d’Identification du Répertoire des ENtreprises) code from a field with SIRET (Système d’Identification du Répertoire des ETablissements) numbers to a new field. Extraction will only work if the SIRET codes are valid.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.16. Geographic transformations

7.16.1. Parse Address

Analyze one or more fields containing addresses, then extract each part to new fields. If the address is organized across several fields, you must select them in the correct order.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.16.2. Convert country codes

In the selected field, convert country codes from/to ISO 3166-1 [13] in the following formats:

  • 2-letter country code: ISO 3166-1 alpha-2.

  • 3-letter country code: ISO 3166-1 alpha-3.

  • Country name.

For example:

  • For codes with 2 letters, such as FR, conversion into ISO 3166-1 alpha-3 format will produce FRA.

  • For codes with 3 letters, such as USA, conversion into country name format using French as the language, will produce États-Unis.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

Note

Practical example : Convert country codes

Before Transformation:

ID

Name

CountryCode

1

John Doe

USA

2

Jane Doe

DE

Transformation Configuration:

  • Target Column: CountryCode

  • Desired Standard: ISO 3166-1 alpha-3

After Transformation:

ID

Name

CountryCode

1

John Doe

USA

2

Jane Doe

DEU

In this example, the transformation was configured to convert the country codes in the ‘CountryCode’ column to the ISO 3166-1 alpha-3 standard. As a result, ‘DE’ was converted to ‘DEU’, while ‘USA’, already in the correct format, remained unchanged.

7.16.3. Fix deprecated INSEE Commune Codes (France)

Replace obsolete INSEE municipality codes with the new codes. INSEE municipality codes can become obsolete for a number of reasons, including:

  • Municipality has been associated.

  • Municipality has been abolished.

  • Municipality has become delegated.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.16.4. Deduce geographic data from ZipCodes (France)

Enrich the current dataset with geographic data (department name, department code, region name, region code) deduced from the postcode in the selected field.

For example, postcode 75001 produces the following new fields:

  • Department code: 75.

  • Department name: Paris.

  • Region code: 11.

  • Region name: Île-de-France.

Tip

The same postcode may apply to several municipalities.

Some post codes can belong to a department that is different from the department of the code.

For better results, we advise using INSEE municipality codes or combining the postcode with the name of the town (delivery address).

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.16.5. Enrich from INSEE Commune Codes (France)

Enrich the current dataset with geographic data from the Official Geographic Code [14] (name of municipality, name of department, code of department, name of region, code of region) based on the INSEE municipality code in the selected field.

For example, INSEE municipality code 75056 produces the following new fields:

  • Name of municipality: Paris.

  • Department code: 75.

  • Department name: Paris.

  • Region code: 11.

  • Region name: Île-de-France.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.16.6. Enrich from ZipCodes (France)

Enrich the current dataset with geographic data (INSEE municipality code, name of municipality, name of department, code of department, name of region, code of region) deduced from the postcode and town name (delivery address) in the selected field.

For example, post code 75001 plus town Paris will produce the following new fields:

  • INSEE municipality code: 75056.

  • Name of municipality: Paris.

  • Department code: 75.

  • Department name: Paris.

  • Region code: 11.

  • Region name: Île-de-France.

  • The transformation will apply to two fields.

  • The transformation will use the active filter(s).

7.16.7. Enrich with Country Codes

Add country codes or country names to a new field, using the ISO SO 3166-1 [15] country codes or country names in the selected field, in the following formats:

  • 2-letter country code: ISO 3166-1 alpha-2.

  • 3-letter country code: ISO 3166-1 alpha-3.

  • Country name.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.16.8. Geocode Address

Geocode one or more fields with addresses. Two fields will be added for latitude and longitude. If the address is divided over several fields, you must select them in the correct order (i.e. number, street, postcode, then town).

To retrieve address fields in the repository (e.g. number, street name, postcode, etc.) in addition to latitude and longitude, select Standardize addresses.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.16.9. Find the country from latitude and longitude

This transformation allows retrieving the country based on latitude and longitude. It is also possible to enrich the dataset with the country’s capital, its geographical area, and the continent to which it belongs.

Warning

For performance reasons, this transformation uses polygons representing slightly simplified outlines of the world’s countries. In some cases, points very close to a border may not be assigned to the correct country. The success rate is 98.9% on the GeoNames ‘All Countries’ dataset, which contains latitude, longitude, and country information for 1,552,935 locations (the allCountries.zip file can be downloaded here: https://download.geonames.org/export/zip/).

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

Note

Practical example : Find the country from latitude and longitude

Before Transformation:

Record ID

Latitude

Longitude

1

48.858844

2.294351

2

51.507351

-0.127758

Transformation Configuration:

  • Latitude Column: Latitude

  • Longitude Column: Longitude

  • Language: English

  • Additional Fields:

    • ISO 2-letter code

    • Capital city

After Transformation:

Record ID

Latitude

Longitude

Country

ISO2

Capital city

1

48.858844

2.294351

France

FR

Paris

2

51.507351

-0.127758

United Kingdom

GB

London

In this example, the transformation takes geographic coordinates and appends the country name, ISO 2-letter code, and capital city for each record based on the specified configuration.

7.16.10. Standardize French postal addresses (AFNOR NF Z 10-011)

Standardize French postal addresses in accordance with AFNOR NF Z 10-011 of 19 January 2013 simplifying postal delivery. The regulation organizes French postal addresses into 6 fields:

  1. TITLE or STATUS - FIRST NAME - SURNAME.

  2. APT OR POSTBOX NO. - FLOOR - CORRIDOR - STAIR

  3. HALL - WING - BUILDING - RESIDENCE.

  4. NUMBER - STREET NAME.

  5. PLACE CALLED or POST COLLECTION OFFICE.

  6. POSTCODE and DESTINATION LOCALITY or CEDEX CODE and CEDEX NAME.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.16.11. Standardize French postal addresses (S42)

Standardize French postal addresses as per regulation S42, which organizes French postal addresses into 10 fields:

  1. Given name.

  2. Surname.

  3. Street No.

  4. Street Name.

  5. Street type.

  6. Floor.

  7. Town.

  8. Region.

  9. Postcode.

  10. Country.

  • The transformation will apply to multiple fields.

  • The transformation will use the active filter(s).

7.17. Phone number transformations

7.17.1. Extract Country from International Phone Number

Extract the country from the international telephone numbers in the selected field and enter them in a new field.

Requirements:

  • A field with an international telephone number nature.

  • Only valid international telephone numbers can be used.

Outgoing format for countries:

  • 2-letter country code: ISO 3166-1 alpha-2.

  • 3-letter country code: ISO 3166-1 alpha-3.

  • Country name (localized).

For example:

Format

Language

Telephone number (incoming)

Country (outgoing)

2-letter country code

N/A

+33 1 02 03 04 05

FR

3-letter code

N/A

+32 2 333 44 55

BEL

Country name

English

+41 22 333 44 55

Switzerland

For further information :

  • International format (E.123) in Wikipedia.

  • ISO 3166-1 alpha-2 in Wikipedia.

  • ISO 3166-1 alpha-3 in Wikipedia.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.17.2. Shape Phone Numbers

Format telephone numbers to a pattern:

  • International format: +33 9 67 13 07 93.

  • National format: 0967130793.

  • E.164 format: +33967130793.

  • The transformation will apply to one or more fields.

  • The transformation will use the active filter(s).

7.17.3. Correct Phone Numbers to International

Within a field containing telephone numbers, adjust those numbers to become international numbers by using a field containing country codes (ISO 3166-1 alpha-2), country codes (ISO 3166-1 alpha-3) or countries. Telephone numbers will be adjusted in a new field.

Requirements:

  • A field with an international telephone number nature.

  • A field with country code (ISO 3166-1 alpha-2, alpha-3) or country name nature.

For example:

  • 33 (010) 203-0405 with country code FR and international outgoing format will become +33 1 02 03 04 05 when adjusted.

  • 0102030405 with country code FR and international outgoing format will become +33 1 02 03 04 05 when adjusted.

  • +33 (010) 203-0405 with country code FR and E.164 outgoing format will become +33 1 02 03 04 05 when adjusted

Possible values for the diagnostic column:

  • VALID: correction OK.

  • INVALID_FOR_REGION: telephone number not valid for country code.

  • INVALID_COUNTRY_CODE: invalid country code.

  • INVALID_INPUT: not a telephone number.

For further information :

  • International format (E.123) in Wikipedia.

  • E.164 format in Wikipedia.

  • ISO 3166-1 alpha-2 in Wikipedia.

  • ISO 3166-1 alpha-3 in Wikipedia.

  • The transformation will apply to two fields.

  • The transformation will use the active filter(s).

7.18. Email address transformations

7.18.1. Split Email address

Extract from an e-mail address: the domain part (contact@taleofdata.com, contact@gmail.com, etc.), the local part (contact@taleofdata.com, contact@gmail.com, etc.) or the domain and local parts from e-mails in the selected field and enter them in one or more new fields.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.18.2. Shape Emails

Format the e-mails (standardize spaces, delete accents, make characters lower case) in the selected field.

  • The transformation will apply to just one field.

  • The transformation will use the active filter(s).

7.18.3. Check E-mail consistency

Check for faults in e-mail addresses against fields containing the first names and surnames.

Requirements:

  • A field containing the e-mail addresses to be checked.

  • A field containing the relevant first names.

  • A field containing the relevant surnames.

Exemple visuel :

email

surname

first name

email_consistency

first name.surname@domain.com

Name

First name

OK

fist namefist name.surname@domain.com

Name

First name

FIRST_NAME_MISSPELLED

first name-surnam@domain.com

Surname

First name

LAST_NAME_MISSPELLED

  • The transformation will apply to three fields.

  • The transformation will use the active filter(s).

Note

Practical example : Check E-mail consistency

Before Transformation:

email

First name

Country name.

john.doe@xyz.com

John

Doe

jan.smith@abc.co

Jane

Smith

Transformation Configuration:

  • Target Email Column: Email

  • First Name Column: First Name

  • Last Name Column: Last Name

  • Email Consistency Column: Email Status

  • Email Pattern Column: Email Pattern

  • Email Corrected Column: Corrected Email

After Transformation:

email

First name

Country name.

Email.email_consistency

Email.email_pattern

Email.email_corrected

john.doe@xyz.com

John

Doe

OK

FIRSTNAME__SEPARATOR

john.doe@xyz.com

jan.smith@abc.com

Jane

Smith

FIRST_NAME_MISSPELLED

FIRSTNAME__SEPARATOR

jane.smith@abc.com

In this example, the transformation checks each email for consistency with the associated first and last names. The first email is found to be valid and matches a standard pattern. The second email is flagged as invalid (FIRST_NAME_MISSPELLED) because the spelling of the first name in the email does not match the spelling of the first name in the First Name column.