4. Data processing: flow tools

In the following sections we will start with the source and sink nodes, which allow you to configure input and target data for flows.

We will then review each type of node in the list below. Like sources and sinks, these can be dropped onto the canvas from the Tale of Data flow editor toolbar:

Preparation function

A preparation function applies a series of transformations to input data, drawing on about a hundred types of operations, such as formatting, correcting, deduplicating, harmonizing, enriching, and defining and applying validation rules.

Filter

A filter node selects which fields and records are sent to each of its outputs.

Validation

A validation node sends valid records to its first output and invalid records to its second output, if it has one.
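The two-output behavior described above can be sketched in a few lines of Python. This is only an illustration of the routing idea, not how Tale of Data implements it: the function `split_by_validity`, the rule `has_email`, and the sample records are all hypothetical.

```python
def split_by_validity(records, is_valid):
    """Route each record to the valid or the invalid output."""
    valid, invalid = [], []
    for record in records:
        (valid if is_valid(record) else invalid).append(record)
    return valid, invalid


def has_email(record):
    # Hypothetical validation rule: the email must contain exactly one "@".
    return record.get("email", "").count("@") == 1


customers = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "email": "bob.example.com"},
]

# First output: valid records. Second output: invalid records.
valid_output, invalid_output = split_by_validity(customers, has_email)
```

If the node has no second output, the invalid list would simply be discarded.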

Broadcast

A broadcast node duplicates each input record to all of its outputs.

Joining

A join node adds information to a dataset by adding columns from another dataset (SQL-style join).
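As a rough illustration of what "adding columns" means, here is a minimal left join in plain Python. It assumes the join key is unique in the right-hand dataset. The function name `left_join` and the sample data are hypothetical and do not reflect the product's configuration options.

```python
def left_join(left, right, key):
    """Add the right-hand columns to every left-hand row that shares the key."""
    # Index the right-hand dataset by key (assumes keys are unique on the right).
    index = {row[key]: row for row in right}
    extra_columns = [c for c in (right[0] if right else {}) if c != key]
    joined = []
    for row in left:
        match = index.get(row[key])
        # Unmatched rows keep their data; the added columns are left empty (None).
        extras = {c: (match[c] if match else None) for c in extra_columns}
        joined.append({**row, **extras})
    return joined


orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C9"},
]
customers = [{"customer_id": "C1", "city": "Paris"}]

joined_rows = left_join(orders, customers, "customer_id")
```

The row count is unchanged; only the number of columns grows.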

Enrichment

An enrichment node adds new fields to a dataset (referred to as the dataset to be enriched, or dataset no. 1) from an enrichment dataset (dataset no. 2, connected by a blue link), notably by using fuzzy matching.
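To give an intuition for fuzzy-matching enrichment, the sketch below uses the similarity ratio from Python's standard `difflib` module as a stand-in. This is not the matching algorithm used by Tale of Data. The function `enrich`, the `cutoff` threshold, and the sample data are assumptions made for the example.

```python
import difflib


def enrich(to_enrich, enrichment, key, new_field, cutoff=0.8):
    """Copy new_field from the closest-matching enrichment row, if any."""
    candidates = {row[key]: row for row in enrichment}
    enriched = []
    for row in to_enrich:
        # Keep only the single best candidate above the similarity cutoff.
        best = difflib.get_close_matches(row[key], list(candidates), n=1, cutoff=cutoff)
        value = candidates[best[0]][new_field] if best else None
        enriched.append({**row, new_field: value})
    return enriched


# Dataset no. 1: the dataset to be enriched.
companies = [{"company": "Acme Corp."}, {"company": "Globex"}]
# Dataset no. 2: the enrichment dataset.
reference = [
    {"company": "Acme Corp", "country": "US"},
    {"company": "Initech", "country": "UK"},
]

enriched_rows = enrich(companies, reference, "company", "country")
```

Here "Acme Corp." matches "Acme Corp" despite the trailing period, while "Globex" finds no sufficiently close candidate and its new field stays empty.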

Union

A union node stacks multiple input datasets into one, which amounts to adding rows (SQL-style union).
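Stacking can be pictured as follows. The sketch assumes that duplicate rows are kept (as with SQL `UNION ALL`) and that columns missing from one input are filled with empty values. Both points are assumptions for the example, not documented behavior, and `union_all` is a hypothetical name.

```python
def union_all(*datasets):
    """Stack the rows of all input datasets into a single dataset."""
    # Collect every column seen in any input, preserving first-seen order.
    columns = []
    for dataset in datasets:
        for row in dataset:
            for column in row:
                if column not in columns:
                    columns.append(column)
    # Stack the rows; columns missing from a row are filled with None.
    return [{c: row.get(c) for c in columns} for dataset in datasets for row in dataset]


january = [{"id": 1, "amount": 10}]
february = [{"id": 2, "amount": 20, "note": "late"}]

stacked = union_all(january, february)
```

Unlike a join, the column set stays essentially the same while the number of rows grows.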

Sorting

A sort node sorts input records by one or more criteria.

Aggregation

An aggregation node enables the creation of cross-tabulations.
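A cross-tabulation groups records along two fields and aggregates a value for each combination. The following sketch sums an amount by region and year. The function `cross_tab` and the sales data are hypothetical, and summing is only one of the aggregations such a node might offer.

```python
from collections import defaultdict


def cross_tab(records, row_field, column_field, value_field):
    """Sum value_field for each (row_field, column_field) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for record in records:
        table[record[row_field]][record[column_field]] += record[value_field]
    # Convert the nested defaultdicts into plain dictionaries.
    return {row: dict(columns) for row, columns in table.items()}


sales = [
    {"region": "North", "year": 2023, "amount": 100.0},
    {"region": "North", "year": 2024, "amount": 150.0},
    {"region": "South", "year": 2023, "amount": 80.0},
    {"region": "North", "year": 2023, "amount": 20.0},
]

summary = cross_tab(sales, "region", "year", "amount")
```

Each key of `summary` is a row of the cross-tabulation, and each inner key is a column.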

Window function

A window function performs, for each row of the input dataset, one or more calculations on a set of records that are linked to the current record of the input dataset.
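The key difference from an aggregation is that a window function keeps one output row per input row. A minimal sketch, assuming a running total computed per customer and ordered by day (the function `running_total` and the field names are hypothetical):

```python
def running_total(records, partition_field, order_field, value_field):
    """Add a running total per partition, keeping one output row per input row."""
    totals = {}
    result = []
    ordered = sorted(records, key=lambda r: (r[partition_field], r[order_field]))
    for record in ordered:
        partition = record[partition_field]
        # The window for this row is every earlier row of the same partition, plus itself.
        totals[partition] = totals.get(partition, 0) + record[value_field]
        result.append({**record, "running_total": totals[partition]})
    return result


payments = [
    {"customer": "A", "day": 2, "amount": 30},
    {"customer": "B", "day": 1, "amount": 5},
    {"customer": "A", "day": 1, "amount": 10},
]

windowed = running_total(payments, "customer", "day", "amount")
```

For each row, the set of "linked" records is the rows of the same customer up to and including the current day.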

Repository

Repositories allow repairing or enriching datasets with sophisticated matching algorithms.