4. Data processing: flow tools
The previous section covered sources and sinks, which allow you to configure the input and target data for flows. We will now review each type of node in the list below. Like sources and sinks, these nodes can be dropped onto the canvas from the Tale of Data flow editor toolbar:
A preparation node applies a series of transformations to the input data, choosing from roughly one hundred types of operations such as formatting, correction, deduplication, harmonization, enrichment, and the definition and application of validation rules.
A filter node selects the fields and records to be sent to each of its outputs.
A validation node sends valid records to its first output and, if it has one, invalid records to its second output.
A broadcast node duplicates each input record across all of its outputs.
A join node adds information to the data, which corresponds to adding columns (SQL-style join).
An enrichment node adds new fields to a dataset (referred to as the dataset to be enriched, or dataset no. 1) from an enrichment dataset (dataset no. 2, connected by a blue link), notably by using fuzzy matching (see the fuzzy-matching sketch after this list).
A union node stacks multiple datasets at its input, which corresponds to adding rows (SQL-style union).
A sort node sorts the input records according to various criteria.
An aggregation node creates cross-tabulations; a pandas analogy of these SQL-style nodes (filter, join, union, sort and aggregation) is sketched after this list.
A window function node performs, for each record of the input dataset, one or more calculations over a set of records related to that record (see the window-function sketch after this list).
Repositories repair or enrich datasets using sophisticated matching algorithms.
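
To make the vocabulary above concrete, here is a rough analogy, in Python with pandas, of what the filter, join, union, sort and aggregation nodes compute. It is only a sketch: Tale of Data applies these operations graphically, and the datasets and column names (orders, customers, amount, ...) are hypothetical.

    import pandas as pd

    # Hypothetical input datasets standing in for a flow's sources.
    orders = pd.DataFrame({
        "order_id": [1, 2, 3, 4],
        "customer_id": ["C1", "C2", "C1", "C3"],
        "amount": [120.0, 75.5, 40.0, 210.0],
    })
    customers = pd.DataFrame({
        "customer_id": ["C1", "C2"],
        "country": ["FR", "BE"],
    })

    # Filter node: select the records (and fields) sent to one output.
    large_orders = orders[orders["amount"] > 100][["order_id", "amount"]]

    # Join node: add columns from a second dataset (SQL-style join).
    joined = orders.merge(customers, on="customer_id", how="left")

    # Union node: stack datasets, i.e. add rows (SQL-style union).
    more_orders = pd.DataFrame({
        "order_id": [5],
        "customer_id": ["C2"],
        "amount": [15.0],
    })
    unioned = pd.concat([orders, more_orders], ignore_index=True)

    # Sort node: order the records according to one or more criteria.
    sorted_orders = unioned.sort_values(["customer_id", "amount"],
                                        ascending=[True, False])

    # Aggregation node: a cross-tabulation of amounts per customer.
    per_customer = unioned.groupby("customer_id", as_index=False)["amount"].sum()
    print(per_customer)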
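The window function node deserves its own analogy: unlike an aggregation, it returns one value per input record, computed over the set of records related to that record (the "window") rather than over the whole dataset. A minimal pandas sketch, again with hypothetical region/month/revenue columns:

    import pandas as pd

    # Hypothetical sales records; column names are illustrative only.
    sales = pd.DataFrame({
        "region": ["North", "North", "South", "South", "South"],
        "month": [1, 2, 1, 2, 3],
        "revenue": [100.0, 120.0, 80.0, 90.0, 110.0],
    })

    # For each record, compute values over its related records
    # (here: all rows in the same region), without collapsing the rows.
    sales["region_total"] = sales.groupby("region")["revenue"].transform("sum")
    sales["share_of_region"] = sales["revenue"] / sales["region_total"]

    # A running total ordered by month within each region.
    sales = sales.sort_values(["region", "month"])
    sales["running_revenue"] = sales.groupby("region")["revenue"].cumsum()
    print(sales)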
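Finally, the enrichment and repository nodes both rely on approximate matching between a dataset to be repaired or enriched and a reference dataset. The sketch below only illustrates the general idea using Python's standard difflib; it is not Tale of Data's own matching algorithm, and the city/postal_code columns are hypothetical.

    import difflib
    import pandas as pd

    # Dataset no. 1 (to be enriched) with slightly misspelled city names,
    # and dataset no. 2 (the enrichment / reference dataset).
    to_enrich = pd.DataFrame({"city": ["Marseile", "Lyon", "Bordeau"]})
    reference = pd.DataFrame({
        "city": ["Marseille", "Lyon", "Bordeaux"],
        "postal_code": ["13000", "69000", "33000"],
    })

    def best_match(name, candidates):
        # Return the closest candidate, or None if nothing is similar enough.
        matches = difflib.get_close_matches(name, candidates, n=1, cutoff=0.8)
        return matches[0] if matches else None

    # Fuzzy matching: map each dirty value to its closest reference value ...
    to_enrich["matched_city"] = to_enrich["city"].apply(
        lambda c: best_match(c, reference["city"].tolist())
    )
    # ... then pull the new fields in from the enrichment dataset.
    enriched = to_enrich.merge(
        reference.rename(columns={"city": "matched_city"}),
        on="matched_city",
        how="left",
    )
    print(enriched)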