4.5. Filter node

If you want to learn more about this feature, an e-learning tutorial is available below:

Using Filter Nodes

../../_images/N_EN_14.png

Filter nodes are awesome in what you can do with them. Did you know you could set up several outputs with a single one?

Advanced usage of the preparation editor : all about filtering

../../_images/N_EN_1.png

This video presents how to use filters in the preparation editor. These can be extremely useful for looking at your data and exploring it. Furthermore it can also help in cases where you would quickly like to apply transformations only on a subset of your data. Both uses are illustrated and should help you become very proficient with this tool.

4.5.1. Description

Icon: image102

  • Number of inputs: 1.

  • Number of outputs: 1 or more.

Definition

A filter node lets you select the fields and records to be sent to each of its outputs.

Configuration

For each filter node output, you can:

  • Select a subset of fields and rearrange them (by dragging and dropping in the columns list) image103.

  • Select a subset of records by setting conditions in the predicate zone image104.

image105

Tip

If no condition is assigned to the last output link of a filter node, that last link will retrieve all the records that failed to meet any of the previous output conditions.

Visual example

image106

This example shows how conditions are added to configure filtering.

This filter node will retrieve values equal to United Kingdom from the Country column.

Practical examples

Note

Example 1: Basic Filtering by Column Value

Before operation:

ID

Name

Age

Country

1

Alice

30

USA

2

Bob

25

Canada

3

Charlie

35

USA

4

Diana

40

United Kingdom

In the configuration zone of the filter node:

  • Predicate: Country = “USA”

  • Selected Columns: ID, Name, Age

  • Select All: True

After operation

ID

Name

Age

1

Alice

30

3

Charlie

35

Note

Example 2: Multiple Output Configurations with Fallback

Before operation:

ID

Name

Age

Country

1

Alice

30

USA

2

Bob

25

Canada

3

Charlie

35

USA

4

Diana

40

United Kingdom

In the configuration zone of the filter node:

  • Output 1:

    • Predicate: Age > 30

    • Selected Columns: ID, Name

  • Output 2:

    • Predicate: Country = “Canada”

    • Selected Columns: Name, Age

  • Output 3 (Fallback):

    • Predicate: (No explicit predicate; falls back to rows not meeting other predicates)

After operation

  • Output 1:

    ID

    Name

    3

    Charlie

    4

    Diana

  • Output 2:

    Name

    Age

    Bob

    25

  • Output 3:

    ID

    Name

    Age

    Country

    1

    Alice

    30

    USA

4.5.2. Configuration examples

4.5.2.1. Is the greatest or smallest in the column

You may wish to filter a dataset by the greatest or smallest value in a column.

A classic example of this is retrieval of the latest rows added to a dataset. By applying the ‘is the greatest in the column’ filter to the Date of Purchase column, only recently imported rows will be retrieved.

image107

In the configuration zone of the filter node:

  • image108 Click the label to open the window you want.

  • image109 Under Comparison, select ‘is the greatest in the column’ or ‘is the smallest in the column’.

4.5.2.2. Lookup function (search in the column of a table outside the flow)

You can also use the filter node to search (lookup) a column in another dataset to check a condition:

Click the predicates operator image110.

image111

You can select a predicates operator in the dialog window that will open.

There are four operators for looking up condition values in the column of another table image112 :

  • Is in the column.

  • Is not in the column.

  • Contains at least one column item.

  • Contains no column items.

Note

The Ignore character case option* image113 is available for the four operators.

image114

Once one of the four operators has been selected, click the button to the right of the text box image115:

image116

Now just select the column to be searched for data for the right section of the condition:

image117

The example above shows the retrieval of rows in our dataset whose status is not on the reference list (us_states.csv) clearly because of spelling mistakes:

image118