4. Data processing: flow tools
The previous section covered sources and sinks, which allow you to configure the input and target data for flows. We will now review each type of node in the list below. Like sources and sinks, these nodes can be dropped onto the canvas from the Tale of Data flow editor toolbar:
A preparation node applies a series of transformations to the input data, choosing from roughly one hundred types of operations such as formatting, correction, deduplication, harmonization, enrichment, and the definition and application of validation rules.
A filter node selects the fields and records to be sent to each of its outputs.
A validation node sends valid records to its first output and, if it has one, invalid records to its second output.
A broadcast node duplicates each input record across all of its outputs.
A join node adds information to the data, which corresponds to adding columns (SQL-style join).
An enrichment node adds new fields to a dataset (referred to as the dataset to be enriched, or dataset no. 1) from an enrichment dataset (dataset no. 2, connected by a blue link), notably by using fuzzy matching (see the fuzzy-matching sketch after this list).
A union node stacks multiple datasets at its input, which corresponds to adding rows (SQL-style union).
A sort node sorts the input records according to various criteria.
An aggregation node creates cross-tabulations; a pandas analogy of these SQL-style nodes (filter, join, union, sort and aggregation) is sketched after this list.
A window function node performs, for each record of the input dataset, one or more calculations over a set of records related to that record (see the window-function sketch after this list).
Repositories repair or enrich datasets using sophisticated matching algorithms.
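
To make the vocabulary above concrete, here is a rough analogy, in Python with pandas, of what the filter, join, union, sort and aggregation nodes compute. It is only a sketch: Tale of Data applies these operations graphically, and the datasets and column names (orders, customers, amount, ...) are hypothetical.

    import pandas as pd

    # Hypothetical input datasets standing in for a flow's sources.
    orders = pd.DataFrame({
        "order_id": [1, 2, 3, 4],
        "customer_id": ["C1", "C2", "C1", "C3"],
        "amount": [120.0, 75.5, 40.0, 210.0],
    })
    customers = pd.DataFrame({
        "customer_id": ["C1", "C2"],
        "country": ["FR", "BE"],
    })

    # Filter node: select the records (and fields) sent to one output.
    large_orders = orders[orders["amount"] > 100][["order_id", "amount"]]

    # Join node: add columns from a second dataset (SQL-style join).
    joined = orders.merge(customers, on="customer_id", how="left")

    # Union node: stack datasets, i.e. add rows (SQL-style union).
    more_orders = pd.DataFrame({
        "order_id": [5],
        "customer_id": ["C2"],
        "amount": [15.0],
    })
    unioned = pd.concat([orders, more_orders], ignore_index=True)

    # Sort node: order the records according to one or more criteria.
    sorted_orders = unioned.sort_values(["customer_id", "amount"],
                                        ascending=[True, False])

    # Aggregation node: a cross-tabulation of amounts per customer.
    per_customer = unioned.groupby("customer_id", as_index=False)["amount"].sum()
    print(per_customer)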
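The window function node deserves its own analogy: unlike an aggregation, it returns one value per input record, computed over the set of records related to that record (the "window") rather than over the whole dataset. A minimal pandas sketch, again with hypothetical region/month/revenue columns:

    import pandas as pd

    # Hypothetical sales records; column names are illustrative only.
    sales = pd.DataFrame({
        "region": ["North", "North", "South", "South", "South"],
        "month": [1, 2, 1, 2, 3],
        "revenue": [100.0, 120.0, 80.0, 90.0, 110.0],
    })

    # For each record, compute values over its related records
    # (here: all rows in the same region), without collapsing the rows.
    sales["region_total"] = sales.groupby("region")["revenue"].transform("sum")
    sales["share_of_region"] = sales["revenue"] / sales["region_total"]

    # A running total ordered by month within each region.
    sales = sales.sort_values(["region", "month"])
    sales["running_revenue"] = sales.groupby("region")["revenue"].cumsum()
    print(sales)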
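Finally, the enrichment and repository nodes both rely on approximate matching between a dataset to be repaired or enriched and a reference dataset. The sketch below only illustrates the general idea using Python's standard difflib; it is not Tale of Data's own matching algorithm, and the city/postal_code columns are hypothetical.

    import difflib
    import pandas as pd

    # Dataset no. 1 (to be enriched) with slightly misspelled city names,
    # and dataset no. 2 (the enrichment / reference dataset).
    to_enrich = pd.DataFrame({"city": ["Marseile", "Lyon", "Bordeau"]})
    reference = pd.DataFrame({
        "city": ["Marseille", "Lyon", "Bordeaux"],
        "postal_code": ["13000", "69000", "33000"],
    })

    def best_match(name, candidates):
        # Return the closest candidate, or None if nothing is similar enough.
        matches = difflib.get_close_matches(name, candidates, n=1, cutoff=0.8)
        return matches[0] if matches else None

    # Fuzzy matching: map each dirty value to its closest reference value ...
    to_enrich["matched_city"] = to_enrich["city"].apply(
        lambda c: best_match(c, reference["city"].tolist())
    )
    # ... then pull the new fields in from the enrichment dataset.
    enriched = to_enrich.merge(
        reference.rename(columns={"city": "matched_city"}),
        on="matched_city",
        how="left",
    )
    print(enriched)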