10. Generative AI features

This section documents the configuration and use of generative AI for data transformation in the preparation editor.

10.1. Introduction

This transformation lets you use natural language to:

  • Obtain a quality diagnosis of the state of the dataset before transformation.

  • Transform the rows of the dataset.

The transformation is available here:

image1

Although the transformation is designed to work with any type of generative AI (ChatGPT, Mistral AI, Llama-2, Falcon LLM, …), only ChatGPT 4 Turbo (“gpt-4-1106-preview”) is supported for the moment, as it is the only one whose natural language understanding is compatible with the use of Tale of Data by people who are not computer scientists.

To use this feature, you need an OpenAI API key that authorizes access to ChatGPT 4 Turbo (LLM model code name: “gpt-4-1106-preview”).

10.2. Setting your API key

Select the Security and Authorizations item from the user drop-down menu at the top left:

image2

In the dialog window that opens, go to the API ChatGPT tab and enter:

  1. The URL of the API access point (e.g. https://api.openai.com/v1/)

  2. Your API key

image3

Important

Click on the Save button before closing the window to activate encrypted storage of your API key.
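
Before entering the key, you may want to check outside Tale of Data that it really gives access to the required model. The following is a minimal sketch, assuming the access point is the public OpenAI API and using its "retrieve model" endpoint; the function names are hypothetical and are not part of Tale of Data.

```python
import json
import urllib.request

MODEL = "gpt-4-1106-preview"  # model code name given in this section


def build_model_check_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for <base_url>/models/<MODEL> with bearer authentication."""
    url = base_url.rstrip("/") + "/models/" + MODEL
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def key_can_access_model(base_url: str, api_key: str) -> bool:
    """Return True if the API describes MODEL for this key (requires network access)."""
    request = build_model_check_request(base_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response).get("id") == MODEL
    except (OSError, ValueError):
        # Network failure, HTTP error (e.g. 401 or 404) or a non-JSON answer.
        return False
```

Calling key_can_access_model("https://api.openai.com/v1/", your_key) returns False when the key is rejected or the model is not available to it, in which case the transformation is unlikely to work either.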

10.3. Obtaining a quality diagnosis

image4

Click on the Quality Audit button in the top right-hand corner to generate a quality diagnosis for the sample data displayed.

Note

For both quality audits and data transformations, Tale of Data sends the generative AI only a sample of the data displayed in the preparation editor. This sample contains no more than one hundred rows.

Once configured and validated by the user on the sample, the data transformation will be applied to the entire dataset when the flow is executed.

For greater effectiveness, use the Renew sample button (callout 3 in the screenshot) to draw a new sample of one hundred rows to supply to the generative AI.

The generative AI creates a list of the anomalies detected in the data sample currently being edited (before transformation). Once generated, this list is displayed in the Quality audit before transformation tab:
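
To make the idea of a capped sample concrete, here is a minimal sketch of drawing at most one hundred rows from a dataset. How Tale of Data actually selects the rows is not documented in this section; uniform random selection and the names used below are assumptions made for illustration only.

```python
import random

SAMPLE_SIZE = 100  # upper bound stated in this section


def draw_sample(rows, seed=None):
    """Return at most SAMPLE_SIZE rows, picked at random without replacement.

    Random selection is an assumption; the real sampling strategy may differ.
    """
    if len(rows) <= SAMPLE_SIZE:
        return list(rows)  # small datasets are returned whole
    return random.Random(seed).sample(rows, SAMPLE_SIZE)
```

Renewing the sample corresponds to calling draw_sample again with a different seed, so that the generative AI sees other rows of the same dataset.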

image5

Hint

Depending on the bandwidth available for calls to the generative AI service you’re using, the Quality Audit may take from a few seconds to just over a minute to complete.

10.4. Transforming data

Following the instructions you type in the prompt area, the generative AI performs the transformations you request on each row of the dataset.

Warning

Rows are processed independently of each other. In other words, you cannot use data from one row to transform another row. Use window functions if you need to include the neighboring rows of a record in your calculation.
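
The difference between a row-by-row transformation and a window calculation can be sketched as follows. This is an illustration only: the function and column names are hypothetical, and the code is not what Tale of Data generates.

```python
def apply_per_row(rows, transform):
    """Apply `transform` to a copy of each row; it never sees any other row."""
    return [transform(dict(row)) for row in rows]


def add_full_name(row):
    """Row-level transformation: uses only values found in the row itself."""
    row["full_name"] = f'{row["first"]} {row["last"]}'
    return row


def delta_from_previous(rows, column):
    """Window-style calculation: each result depends on the previous row,
    so it cannot be written as a function of a single row."""
    result, previous = [], None
    for row in rows:
        new_row = dict(row)
        new_row["delta"] = None if previous is None else row[column] - previous
        previous = row[column]
        result.append(new_row)
    return result
```

A request such as "build the full name" fits the first pattern. A request such as "compute the difference with the previous row" fits the second and is therefore out of scope for this transformation.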

10.4.1. Prompt examples

  • Separate first and last names

  • Separate address components into 3 columns: “Street”, “Postcode” and “City”.

  • Format phone numbers

  • Capitalize the first letter of each word

  • Delete the “XX” prefix from column C

  • Delete words of less than 3 letters in column C

  • etc.

Make your prompts as precise as possible.

You can also use this transformation to create or move columns.

However, you cannot add or delete rows with this transformation.
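
To see why precision matters, here is a sketch of what three of the prompts above amount to once written as code. The language and form of the code that Tale of Data actually generates are not described here, so Python and the function names are assumptions.

```python
def remove_xx_prefix(value: str) -> str:
    """'Delete the "XX" prefix': only a leading XX is removed."""
    return value[2:] if value.startswith("XX") else value


def drop_short_words(value: str, min_length: int = 3) -> str:
    """'Delete words of less than 3 letters': keeps words of 3 characters or more.

    Simplification: digits and punctuation are counted as letters here.
    """
    return " ".join(word for word in value.split() if len(word) >= min_length)


def capitalize_words(value: str) -> str:
    """'Capitalize the first letter of each word', leaving other letters untouched."""
    return " ".join(word[:1].upper() + word[1:] for word in value.split())
```

Each function had to settle a detail the prompt left open: is "xx" in lower case also a prefix, do digits count as letters, should the remaining letters be lowered? Stating these points in the prompt avoids leaving the choice to the AI.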

10.4.2. Best practices

  1. To improve robustness and maintainability, prefer a sequence of simple, readable transformations to a single transformation containing several complex instructions.

  2. Don’t hesitate to give concrete examples of what you’re trying to achieve. If you don’t, the AI will decide for you, which may not be what you want.

Note

This transformation only sends requests to the generative AI when your prompt is analyzed. Code is then generated, and it is this code that is executed when the flow is run. This has two advantages:

  • The flow is fast.

  • Flow execution, whatever the volume of data, does not consume any API tokens, and therefore entails no financial cost.
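
The "analyze once, execute many times" principle described in this note can be sketched as follows. The class, the stand-in generator and the counter are all hypothetical; they only illustrate why running the flow needs no further calls to the generative AI.

```python
calls = {"count": 0}  # counts simulated requests to the generative AI


def fake_generate_code(prompt):
    """Stand-in for the generative AI: returns a row function for the prompt."""
    calls["count"] += 1

    def upper_city(row):
        row["city"] = row["city"].upper()
        return row

    return upper_city


class PromptBackedTransform:
    """Keeps the code generated at analysis time and reuses it at run time."""

    def __init__(self, prompt, generate_code):
        self.prompt = prompt
        self._generate_code = generate_code
        self._row_function = None

    def analyze(self):
        """The only step that contacts the (simulated) generative AI."""
        self._row_function = self._generate_code(self.prompt)

    def run(self, rows):
        """Apply the stored code to every row, without any API call."""
        if self._row_function is None:
            raise RuntimeError("analyze() must be called before run()")
        return [self._row_function(dict(row)) for row in rows]
```

After a single analyze(), run() can be applied to ten rows or ten million rows: the counter stays at one, which is the reason token consumption does not depend on data volume.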

10.4.3. Configuration

image6

  1. Type your instructions in the prompt box.

  2. The Sample data before transformation tab displays the data (before transformation) in the bottom table (maximum 100 rows).

  3. Click on the Transform data button: the generative AI will return a sequence of instructions to correct and/or transform each row. Tale of Data will check these instructions, then apply them to the sample so that you can preview the results.

10.4.4. Analysis of results

Once the transformation has been applied to the sample, the transformed data is displayed in the Sample data after transformation tab.

image7

By default, only columns involved in the transformation are displayed. If the transformation has created new columns, these are displayed on a light blue background (click on the Show all columns checkbox to display all columns in the transformed dataset).
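
One way to picture which columns are "involved" is to compare the sample before and after transformation. The sketch below does this by comparing values row by row, which is possible because the transformation never adds or deletes rows. How Tale of Data really determines the columns to display is not documented here, so this approach and the function name are assumptions.

```python
def columns_involved(before, after):
    """Return (modified, created) column names.

    `before` and `after` are lists of dicts with the same number of rows,
    in the same order.
    """
    before_columns = list(before[0]) if before else []
    after_columns = list(after[0]) if after else []
    created = [c for c in after_columns if c not in before_columns]
    modified = [
        c
        for c in after_columns
        if c in before_columns
        and any(b.get(c) != a.get(c) for b, a in zip(before, after))
    ]
    return modified, created
```

In this picture, the created columns are the ones shown on a light blue background, and columns that are neither modified nor created are the ones hidden until Show all columns is checked.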

The Sample data after transformation tab allows the user to check that the result of the transformation corresponds to their expectations.

A more advanced, but more technical, check is possible: simply click on the Generated code for transformation tab to access the generated code (this is the code that will be invoked when the flow is executed).

The generated code is readable, concise and easy to check by anyone familiar with scripts or macros:

image8

Click on the red Apply button at the bottom left to apply the preparation and integrate it into the flow:

image9

10.4.5. Editing the transformation

Like most transformations in the preparation editor, the generative AI transformation is easy to edit.

Click on the pencil icon associated with the transformation in the sliding pane of the preparation history:

image10

Simply modify your prompt (callout 1 in the screenshot) to your liking and reapply the transformation:

image11