10. Generative AI features
This section documents the configuration and use of generative AI for data transformation in the preparation editor.
10.1. Introduction
This transformation lets you use natural language to:
Obtain a quality diagnosis of the state of the dataset before transformation.
Transform the rows of the dataset.
The transformation is available here:
Although the transformation is designed to work with any type of generative AI (ChatGPT, Mistral AI, Llama-2, Falcon LLM, …), only GPT-4 Turbo (“gpt-4-1106-preview”) is currently supported, because it is the only model whose natural language understanding is suited to the use of Tale of Data by non-technical users.
To use this feature, you need an OpenAI API key that authorizes access to GPT-4 Turbo (LLM model code name: “gpt-4-1106-preview”).
10.2. Setting your API key
Select the Security and Authorizations item from the user drop-down menu at the top left:
In the dialog window that opens, go to the API ChatGPT tab and enter:
The URL of the API access point (e.g. https://api.openai.com/v1/)
Your API key
Important
Click the Save button before closing the window. Saving is what activates encrypted storage of your API key.
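Before entering the key, it can help to see what a request authenticated with it looks like. The sketch below is not part of Tale of Data. It assumes the usual OpenAI convention of sending the key as a Bearer token and targets the model-listing endpoint under the access point URL given above. The helper name build_models_request and the placeholder key are hypothetical, and the request is only built, not sent.

```python
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists the models visible to the key."""
    # Normalise the access point so that ".../v1" and ".../v1/" behave the same.
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


# Placeholder key for illustration only; never hard-code a real key.
request = build_models_request("https://api.openai.com/v1/", "sk-your-key-here")
print(request.full_url)
```

Sending this request with urllib.request.urlopen and finding “gpt-4-1106-preview” among the returned model identifiers would suggest that the key grants access to the required model.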
10.3. Obtaining a quality diagnosis
Click on the Quality Audit button in the top right-hand corner to generate a quality diagnosis of the sample data displayed.
Note
Whether it’s a quality audit or a data transformation, Tale of Data only sends the generative AI a sample of the data displayed in the preparation editor. This sample contains no more than one hundred rows.
Once configured and validated by the user on the sample, the data transformation will be applied to the entire dataset when the flow is executed.
To check the result against different data, use the Renew sample button: it draws a new sample of one hundred rows to be supplied to the generative AI.
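The idea of a bounded sample can be sketched as follows. How Tale of Data actually selects its sample is not described here, so uniform random sampling without repetition is an assumption, and the names draw_sample and MAX_SAMPLE_ROWS are hypothetical.

```python
import random

MAX_SAMPLE_ROWS = 100  # limit stated in the note above


def draw_sample(rows, max_rows=MAX_SAMPLE_ROWS, seed=None):
    """Return at most max_rows rows chosen at random, without repetition."""
    rng = random.Random(seed)
    if len(rows) <= max_rows:
        # Small datasets are sent whole.
        return list(rows)
    return rng.sample(rows, max_rows)


dataset = [{"id": i} for i in range(1000)]
first = draw_sample(dataset, seed=1)    # initial sample
renewed = draw_sample(dataset, seed=2)  # what renewing the sample would amount to
print(len(first), len(renewed))
```

Whatever the size of the dataset, the amount of data leaving the editor stays bounded by the sample size.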
The generative AI creates a list of the anomalies detected in the data sample currently being edited (before transformation). Once generated, this list is displayed in the Quality audit before transformation tab:
Hint
Depending on the bandwidth available for calls to the generative AI service you’re using, the Quality Audit may take from a few seconds to just over a minute to complete.
10.4. Transforming data
Following the instructions you type in the prompt area, the generative AI performs the transformations you request on each row of the dataset.
Warning
Rows are processed independently of each other. In other words, you cannot use data from one row to transform another row. Use window functions if you need to include the neighboring rows of a record in your calculation.
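The constraint in the warning above can be pictured as a function that receives a single row and nothing else. This is an illustrative sketch only: the function names and the “name” column are hypothetical, and the code Tale of Data generates may look different.

```python
def transform_row(row: dict) -> dict:
    """Row-level transformation: the result depends only on the row passed in."""
    out = dict(row)
    out["name"] = row["name"].strip().title()
    return out


def transform_rows(rows):
    # Each row is handled on its own; no row can read values from another one.
    return [transform_row(r) for r in rows]


rows = [{"name": "  ada LOVELACE "}, {"name": "alan turing"}]
result = transform_rows(rows)
print(result)
```

A calculation such as “difference from the previous row” cannot be written as transform_row, because that function never sees the previous row. This is the case where window functions are needed.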
10.4.1. Prompt examples
Separate first and last names
Separate address components into 3 columns: “Street”, “Postcode” and “City”.
Format phone numbers
Capitalize the first letter of each word
Delete the “XX” prefix from column C
Delete words of less than 3 letters in column C
etc.
Make your prompts as precise as possible.
You can also use this transformation to create or move columns.
However, you cannot add or delete rows with this transformation.
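To give an idea of what row-level logic for some of the prompts above could look like, here is a hedged sketch in Python. The functions are hypothetical illustrations written for this example. They are not the code that Tale of Data generates, whose language and format are not shown in this section.

```python
def remove_xx_prefix(value: str) -> str:
    """Prompt: Delete the "XX" prefix from column C."""
    return value[2:] if value.startswith("XX") else value


def drop_short_words(value: str, min_length: int = 3) -> str:
    """Prompt: Delete words of less than 3 letters in column C."""
    return " ".join(w for w in value.split() if len(w) >= min_length)


def capitalize_words(value: str) -> str:
    """Prompt: Capitalize the first letter of each word."""
    return " ".join(w[:1].upper() + w[1:] for w in value.split())


print(remove_xx_prefix("XX1234"))
print(drop_short_words("a map of the old city"))
print(capitalize_words("rue de la paix"))
```

Even these short functions embed decisions the prompts leave open, for example whether a lower-case “xx” counts as the prefix, or whether digits count as letters. A precise prompt removes that ambiguity.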
10.4.2. Best practices
To improve robustness and maintainability, opt for a sequence of simple, readable transformations rather than trying to make a single transformation containing several complex instructions.
Don’t hesitate to give concrete examples of what you’re trying to achieve.
If you don’t, the AI will fill in the gaps on its own, and the result may not be what you want.
Note
This transformation only sends requests to the generative AI when your prompt is analyzed. Code is then generated, and it is this code that is executed when the flow is run. This has two advantages:
The flow is fast.
Flow execution, whatever the volume of data, does not consume any API tokens, and therefore entails no financial cost.
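The “generate once, execute many times” principle described in this note can be sketched as follows. The CountingGenerator class is a hypothetical stand-in for the generative AI service and makes no network call. It only serves to show that the number of calls does not grow with the number of rows.

```python
class CountingGenerator:
    """Stand-in for the generative AI service; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def generate(self, prompt: str):
        self.calls += 1
        # A real service would return code derived from the prompt; this stub
        # always returns the same row-level function.
        return lambda row: {**row, "city": row["city"].upper()}


generator = CountingGenerator()

# Design time: one call while the prompt is analyzed.
transform = generator.generate("Put the city in upper case")

# Run time: the stored function is applied to every row, with no further calls.
dataset = [{"city": "paris"}] * 10_000
output = [transform(row) for row in dataset]
print(generator.calls, len(output))
```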
10.4.3. Configuration
Type your instructions in the prompt box.
The Sample data before transformation tab displays data (before transformation) in the bottom table (maximum 100 rows).
Click on the Transform data button: the generative AI will return a sequence of instructions to correct and/or transform each row. Tale of Data will check these instructions, then apply them to the sample so that the user can preview the results.
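The checks Tale of Data performs on the returned instructions are not detailed here. As a rough illustration of the preview step only, the sketch below applies a row-level function to a sample and records the rows on which it fails, so that the preview always contains as many rows as the sample. The names preview_on_sample and add_spaces are hypothetical.

```python
def preview_on_sample(transform, sample):
    """Apply a row-level transform to the sample, collecting failures per row."""
    results, errors = [], []
    for index, row in enumerate(sample):
        try:
            new_row = transform(dict(row))  # copy so the input sample is untouched
            if not isinstance(new_row, dict):
                raise TypeError("transform must return exactly one row (a dict)")
            results.append(new_row)
        except Exception as exc:
            results.append(row)  # keep the original row in the preview
            errors.append((index, str(exc)))
    return results, errors


def add_spaces(row):
    """Format a phone number in groups of two digits."""
    phone = row["phone"]
    groups = [phone[i:i + 2] for i in range(0, len(phone), 2)]
    return {**row, "phone": " ".join(groups)}


sample = [{"phone": "0612345678"}, {"phone": None}]
results, errors = preview_on_sample(add_spaces, sample)
print(results[0], len(errors))
```

Here the second row has no phone number, so it is reported as an error and left unchanged instead of interrupting the preview.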
10.4.4. Analysis of results
Once the transformation has been applied to the sample, the transformed data is displayed in the Sample data after transformation tab.
By default, only columns involved in the transformation are displayed. If the transformation has created new columns, these are displayed on a light blue background (click on the Show all columns checkbox to display all columns in the transformed dataset).
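One way to picture how created and modified columns can be told apart is to compare the sample before and after transformation. The sketch below is an assumption about the principle, not a description of the editor's implementation. It detects only created and modified columns, whereas the editor may also display columns that the transformation merely read. The name column_changes is hypothetical.

```python
def column_changes(before, after):
    """Return (new_columns, modified_columns) for two row-aligned samples."""
    before_cols = list(before[0]) if before else []
    after_cols = list(after[0]) if after else []
    # Columns that exist only after the transformation.
    new_cols = [c for c in after_cols if c not in before_cols]
    # Existing columns whose value differs in at least one row.
    modified = [
        c for c in after_cols
        if c in before_cols
        and any(b.get(c) != a.get(c) for b, a in zip(before, after))
    ]
    return new_cols, modified


before = [{"address": "1 rue de la Paix 75002 Paris", "name": "ada"}]
after = [{
    "address": "1 rue de la Paix 75002 Paris",
    "name": "Ada",
    "Street": "1 rue de la Paix",
    "Postcode": "75002",
    "City": "Paris",
}]
new_cols, modified = column_changes(before, after)
print(new_cols, modified)
```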
The Sample data after transformation tab allows users to check that the result of the transformation matches their expectations.
A more advanced, but more technical, check is possible: simply click on the Generated code for transformation tab to access the generated code (this is the code that will be invoked when the flow is executed).
The generated code is readable, concise and easy for anyone familiar with scripts or macros to check:
Click on the red Apply button at the bottom left to apply the preparation and integrate it into the flow:
10.4.5. Editing the transformation
Like most transformations in the preparation editor, the generative AI transformation is easy to edit:
Click on the pencil icon associated with the transformation in the sliding pane of the preparation history:
Simply modify your prompt to your liking and reapply the transformation: