Getting Started Guide

Note

A video tutorial for this section of the documentation is available here.

Tale of Data is a software program designed for business users to restore their confidence in their data and therefore enable them to make good decisions.

Tale of Data offers the following functions that do not require coding:

  • Intelligent data reliability,

  • Data compliance control,

  • Discovery of the datasets disseminated in the Information System,

  • Semantic mapping (nature of data) and anomaly mapping,

  • Combination of heterogeneous data sources, data augmentation,

  • Automated monitoring, notifications and remediation.

Basic principles

Audited, business-verified data for informed decision-making that maximises results and minimises risk!

starter-image1

Definitions

Flow

A flow is a graphically designed form of processing made up of:

  • Input data,

  • A set of transformations,

  • Output data.

Node

A node is represented by an icon in a flow.

Nodes are used to represent, for example:

  • starter-image2 Input data, called a ‘source node’,

  • starter-image3 Output data called a ‘sink node’,

  • Processing to transform, correct or supplement data. A filter node starter-image4 or enrichment node starter-image5 can be used, for example, when processing our flow in this section.

Tale of Data icons

Tale of Data icons are there to help you.

starter-image6 An asterisk means that field completion is mandatory.

starter-image7 A tooltip containing an exclamation mark indicates that a parameter is missing. The same icon will also help you fix this when you hover your mouse over it.

Home page

Enter your login and password to display the home page.

starter-image8

The Tale of Data main menu is always visible on the left of the screen.

starter-image9

Click your user profile (top left) to open the menu, where you can:

  • Set your preferences,

  • Download PDF documents

  • Log out.

starter-image10

Catalog

Access the catalog from the main Tale of Data menu to:

  • access datasets

  • access the list of repositories

  • add new data sources

Creating your first flow

Aim

As part of this Getting Started Guide, we are going to create a simple flow. We will take data from a file, filter some rows and write the results to a different file.

To do this, you must create and configure:

starter-image11 A source node,

starter-image12 A sink node,

starter-image13 A filter node.
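For readers who think in code, the same three-node flow can be sketched in a few lines of Python. This is only an illustration of the concept, not a description of how Tale of Data works internally. The function names (`read_source`, `keep_rows`, `write_sink`) and the sample rows are hypothetical, and the sink produces CSV text rather than an Excel file so that the sketch needs no extra libraries.

```python
import csv
import io

# Hypothetical sample standing in for the demonstration file.
SAMPLE_CSV = "Name,Country\nAlice,France\nBob,Germany\nChloe,France\n"


def read_source(text):
    """Source node: read records from CSV text as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def keep_rows(records, column, value):
    """Filter node: keep records whose column equals value exactly."""
    return [r for r in records if r[column] == value]


def write_sink(records, fieldnames):
    """Sink node: write records out as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Source -> filter -> sink, in the same order as the flow on the canvas.
records = read_source(SAMPLE_CSV)
filtered = keep_rows(records, "Country", "France")
result = write_sink(filtered, ["Name", "Country"])
print(result)
```

Each function plays the role of one node, and the three calls at the end correspond to the links between nodes.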

Hint

The flow will familiarise new users with Tale of Data concepts and help them get started. The full potential of Tale of Data will become clear with practice.

Creating the flow

In the home page:

  1. Click New Flow

starter-image14

  2. Name the flow, e.g. “My first flow”

starter-image15

  3. Click OK

  4. You are now in the Flow Designer interface.

starter-image16

Flow Designer

starter-image21

starter-image17 Toolbar

(from which you can drag and drop nodes onto the canvas).

starter-image18 Canvas

(on which you can compose your flow).

starter-image19 Configuration zone

(for entering and selecting settings for the nodes used in the flow).

starter-image20 Preview zone

(for viewing input and output data for the selected node in the flow).

Note

The preparation node starter-image25 is the only node that will be configured in a different interface.

Hint

This is a partial preview and only covers a sample of rows.

Adding a data source

Drag and drop the source node icon starter-image27 from the toolbar to the canvas.

starter-image28

A source node starter-image29 lets you retrieve records (by connecting to files or databases).

Configuring the source node

  1. Optionally, name your source node starter-image30.

    For example: “My data source”

    If you do not do this, the node will automatically take the name of the selected file or table.

  2. In “Type of Data Source”, select Tale of Data file system.

    starter-image31

    Note

    The Existing Data Sources field is filled in automatically.

  3. Click Select

    starter-image32

    A new window will open

    starter-image33

  4. In “Upload Files”, click “Select Files”.

    starter-image34

  5. Import the demonstration file provided with the starter guide (My_Data.csv).

    The file will now appear in “File Selection”.

    starter-image35

  6. Click “My_Data.csv” to select the file.

    starter-image36

    The Select button will light up to show it has been activated.

  7. Click Select

    The Flow Designer interface will reappear with your source node starter-image37 now configured.

    starter-image38

Note

The icon starter-image39 will indicate that the source is not yet ready for use. It needs to be linked to a sink node starter-image40.

Note

A preview of the data in the imported file will be displayed in the preview zone at the bottom of the screen.

Adding a data sink

Drag and drop the sink node icon starter-image41 from the toolbar to the canvas beside the source node starter-image42.

starter-image43

A sink node starter-image44 lets you send records to a storage system.

Configuring the sink node

  1. Optionally, name your sink node starter-image45.

    For example: “Processed data”

    If you do not do this, the node will automatically take the name of the selected file or table.

  2. In “Type of Data Source”, select Tale of Data file system.

    starter-image46

    Note

    The Existing Data Sources field is filled in automatically.

  3. From the drop-down list in “File Types”, select “XLSX (in Excel 2007)”.

    starter-image47

    Important

    An orange bar means that you must click Apply to save the node configuration.

    You will lose the node configuration if you do not click Apply.

A new area will appear in the configuration zone.

starter-image48

  1. In Excel (XLSX) Options, click Select.

    A new window will open

  2. In Output File, complete the following two fields:

    • File Name: the name to be given to the Excel file,

      For example: “Processed data”

    • Sheet Name: the name of the worksheet in which the data will be stored.

      For example, “Sheet_1”

    starter-image49

    Important

    To activate Select, exit the data entry field.

    When you have completed and exited the Sheet Name field, the button will activate and light up.

  3. Click Select

    Flow Designer will reappear with your configured sink node starter-image50.

    starter-image51

Note

The icon starter-image52 will indicate that the sink is not yet ready for use. It needs to be linked to a source node starter-image53.

Save mode

Overwrite

to overwrite the table or file whenever the flow is run.

Append

to add records to the end of an existing table or file.

Create

to create a new table or file. An error will be flagged if the table (or file) already exists.

Note

Other modes may exist, depending on the type of connector used. The three modes listed here are common to almost all Tale of Data connectors and are the most commonly used.
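As a rough analogy, the three save modes behave much like the file-opening modes of a programming language. The sketch below uses Python's built-in `open` modes (`"w"`, `"a"`, `"x"`) to mimic them; the `save` helper and the file name are hypothetical, and the mapping is an assumption made for illustration rather than a description of how Tale of Data connectors are implemented.

```python
import os
import tempfile

# Map each save mode to the closest Python file-open mode (an analogy only).
OPEN_MODES = {"overwrite": "w", "append": "a", "create": "x"}


def save(path, text, mode):
    """Write text to path using the semantics of the given save mode."""
    with open(path, OPEN_MODES[mode]) as handle:
        handle.write(text)


def read(path):
    """Return the full content of the file at path."""
    with open(path) as handle:
        return handle.read()


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "processed.txt")

save(path, "run 1\n", "create")      # file does not exist yet: it is created
save(path, "run 2\n", "append")      # added after the existing content
after_append = read(path)
save(path, "run 3\n", "overwrite")   # previous content is replaced
after_overwrite = read(path)

try:
    save(path, "run 4\n", "create")  # file already exists: an error is raised
    create_failed = False
except FileExistsError:
    create_failed = True
```

The last call fails because the file already exists, which mirrors the error flagged by the Create mode.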

Linking nodes

To link two items in the flow:

  1. Click the source node starter-image54.

  2. Drag the mouse to the sink node starter-image55, holding the left mouse button down.

    Note

    An arrow icon will appear starter-image56.

  3. Release the left mouse button when you are over the sink node starter-image57.

    starter-image58

    You have now created a link between two nodes.

    starter-image59

Note

The tooltip with the exclamation mark has disappeared: the link is now valid and the nodes are ready for use.

Adding a filter node

We want to filter the rows from the input file and keep only those that have France in the Country column. To do this, we are going to add a filter node starter-image60 to our flow to act as the intermediary between input and output data.

  1. Drag and drop the filter node icon starter-image60 onto the link between the source node starter-image61 and the sink node starter-image62.

    starter-image63

    Filter nodes starter-image64 let you filter your data.

    The filter node starter-image65 is added to the flow.

    starter-image66

    Note

    A flow can also be constructed from right to left: sink node starter-image67, then filter node starter-image68 and finally source node starter-image69.

Configuring the filter node

  1. In “Predicates”, click on the arrow to access the drop-down list.

    starter-image71

    The content of the drop-down list depends on which file was uploaded.

  2. Select Country. Two new fields will appear.

    starter-image72

  3. In the blank field, enter France. With the condition “equal to”, the filter is case-sensitive: only values with exactly the same upper and lower case as the value you entered will match.

    starter-image73

  4. Click Apply.

    A preview of the filtered data will display in the data preview zone at the bottom of the screen.

    starter-image74
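To make the case-sensitivity of the “equal to” condition concrete, here is a minimal Python sketch. It assumes the condition behaves like a plain string comparison; the function names and the sample values are hypothetical, and the case-insensitive variant is shown only for contrast, not as a feature described in this guide.

```python
def equal_to(value, expected):
    """Case-sensitive comparison, as described for the "equal to" condition."""
    return value == expected


def equal_to_ignoring_case(value, expected):
    """Hypothetical case-insensitive variant, shown only for contrast."""
    return value.casefold() == expected.casefold()


# Hypothetical values that could appear in the Country column.
countries = ["France", "france", "FRANCE", "Germany"]

strict = [c for c in countries if equal_to(c, "France")]
relaxed = [c for c in countries if equal_to_ignoring_case(c, "France")]

print(strict)   # ['France']
print(relaxed)  # ['France', 'france', 'FRANCE']
```

If the source data mixes spellings such as "france" and "FRANCE", a case-sensitive filter on "France" will leave those rows out.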

Note

When you click the sink node starter-image75, the preview appears to show all the results, but it actually shows only a sample of them.

The flow must be run to ensure the entire dataset has been filtered.

Hint

You can name the filter in the Name field.

This helps document the flow. If you choose a particular name, this will appear in the PDF flow documents when they are produced (clearly detailing how the flow works).

starter-image70

Running the Flow

We will run the flow to obtain its results.

  1. Click Run on the right of the toolbar.

    starter-image76

  2. The scheduling window will open

    Click Run Now

    starter-image77

    A message will confirm that the flow has been successfully scheduled starter-image78 and run starter-image79.

    starter-image80

starter-image81

Congratulations on creating and running your first flow!

Downloading results

We now want to retrieve the flow results.

  1. Click the sink node starter-image82 to select it.

    starter-image83

    The Download button starter-image84 will appear.

  2. Click Download starter-image85 to download the file.

Compare the files before and after processing

You can compare the source and sink files.

My_Data source file

starter-image86

“My processed data” sink file

starter-image87

Note

This first flow is just an introductory example. Far more complex (but easy to use) processes are also available.

For example, you can:

  • Run the same process on other data without having to recreate the flow (= reusing flows),

  • Schedule (daily, weekly, etc.) runs

  • Process huge amounts of data (billions of rows)

  • Launch configurable notifications for all types of anomalies in the data.

Moving on

Managing your flows

starter-image88

You can now see your flow in the My Recent Flows section of the home page.

  1. Click See all my Flows

    starter-image89

  2. Select the flow you want to alter (by checking the box).

starter-image95

starter-image90 Open the flow

starter-image91 Rename the flow

starter-image92 Duplicate the flow

starter-image93 Delete the flow

starter-image94 Share the flow

Tip

For example, you can rename or delete flows from this interface using the sidebar.

Summary of the palette of processing nodes on the toolbar

Use the reference guide to find detailed information on all Tale of Data processing tools. The following is a brief summary:

node-list-s-image25 Preparation function

A preparation function applies a series of transformations to input data, drawing on about a hundred types of operations, such as formatting, correcting, deduplicating, harmonizing, enriching, and defining and applying validation rules.

node-list-s-image4 Filter

A filter node lets you select the fields and records to be sent to each of its outputs.

node-list-s-image102 Validation

A validation node sends valid records to its first output and invalid records to its second output, if it has one.

node-list-s-image96 Dissemination

A dissemination (broadcast) node duplicates each input record across all of its outputs.

node-list-s-image98 Joining

A join node adds information to a dataset by adding columns (SQL-style join).

node-list-s-image99 Enrichment

An enrichment node adds new fields to a dataset (the dataset to be enriched, or dataset no. 1) from an enrichment dataset (dataset no. 2, connected by a blue link), notably by using fuzzy matching.

node-list-s-image97 Union

A union node stacks multiple input datasets, which corresponds to adding rows (SQL-style union).

node-list-s-image101 Sorting

A sort node sorts input records based on various criteria.

node-list-s-image100 Aggregation

An aggregation node enables the creation of cross-tabulations.

node-list-i512 Window function

A window function performs, for each row of the input dataset, one or more calculations on a set of records that are linked to the current record of the input dataset.

node-list-i159 Repository

Repositories let you repair or enrich datasets using sophisticated matching algorithms.
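The behaviour of some of these nodes may be easier to picture with a small Python sketch. The functions below are simplified stand-ins for the validation, union, join and window function nodes, written under the assumption that a dataset is a list of records. All function names and sample data are hypothetical, and none of this reflects how Tale of Data implements its nodes.

```python
def validation(records, is_valid):
    """Validation: valid records to the first output, invalid to the second."""
    valid = [r for r in records if is_valid(r)]
    invalid = [r for r in records if not is_valid(r)]
    return valid, invalid


def union(*datasets):
    """Union: stack datasets on top of each other (adds rows)."""
    return [record for dataset in datasets for record in dataset]


def join(left, right, key):
    """Join: add the columns of the matching right-hand record (inner join).

    Assumes each key value appears at most once in the right-hand dataset.
    """
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


def running_total(records, group_key, value_key):
    """Window function: per row, sum the values seen so far in its group."""
    totals = {}
    result = []
    for r in records:
        totals[r[group_key]] = totals.get(r[group_key], 0) + r[value_key]
        result.append({**r, "running_total": totals[r[group_key]]})
    return result


# Hypothetical sample data.
sales_2023 = [{"country": "France", "amount": 10}, {"country": "Spain", "amount": -1}]
sales_2024 = [{"country": "France", "amount": 5}]
capitals = [{"country": "France", "capital": "Paris"}]

stacked = union(sales_2023, sales_2024)                            # 3 rows
valid, invalid = validation(stacked, lambda r: r["amount"] >= 0)   # 2 valid, 1 invalid
joined = join(valid, capitals, "country")                          # adds "capital"
windowed = running_total(valid, "country", "amount")               # 10, then 15
```

Note how the union changes the number of rows, the join changes the number of columns, and the window function keeps every row while adding a value computed from related rows.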