4.1.2. Record sampling and limits

Two parameters are particularly important for working comfortably with flows:

Sampling (%)

Use this option when reading only the first N rows of data is not enough because you need a representative sample for later processing.

This is a randomized sample of records, expressed as a percentage of the total size of the dataset. For performance reasons, the percentage is approximate.

A value of 100 means no sampling.
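
To see why the percentage can only be an estimate, consider one inexpensive way of drawing a random sample: keep each record independently with a probability equal to the percentage. The Python sketch below illustrates that idea. It is an assumption made for illustration, not a description of how the product samples internally; `sample_records` and its `seed` parameter are hypothetical names.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_records(
    records: Iterable[T], percent: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield each record with probability percent/100 (hypothetical helper)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if percent == 100:
        # A value of 100 means no sampling: pass every record through.
        yield from records
        return
    rng = random.Random(seed)
    for record in records:
        # Each record is kept or dropped independently of the others,
        # so the size of the result is only approximately percent% of the input.
        if rng.random() * 100 < percent:
            yield record
```

With this approach, a 10% sample of 1,000 records contains roughly 100 records rather than exactly 100, and the source is read in a single pass without having to count it first.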

Note

Sampling occurs only during flow design. During production, all records from the source will be used.

When a data source is modified, the changes will be applied automatically as soon as the source configuration becomes valid.

Record limit

This is the number of records to be read from the source. The number MUST be positive.

If you do not enter a value, all records are read. We strongly advise against this at the flow design stage: if the dataset is very large, configuring processing and transformations becomes less responsive. We therefore recommend limiting the number of records while designing the flow.

Note

The limit on the number of records will be taken into account after sampling.

The limit on the number of records is taken into account only during the flow design stage. During production, all records from the source will be used.
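
The interaction between the two parameters can be summarized with a short Python sketch. It assumes per-record random sampling, and `read_source`, `design_mode` and the other names are hypothetical. Only the ordering (sampling first, then the limit, and neither of them in production) comes from the notes above.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def read_source(
    records: Iterable[T],
    sampling_percent: float = 100.0,
    record_limit: Optional[int] = None,
    design_mode: bool = True,
    seed: Optional[int] = None,
) -> List[T]:
    """Hypothetical reader: sampling first, then the record limit."""
    if not 0 < sampling_percent <= 100:
        raise ValueError("sampling_percent must be in (0, 100]")
    if record_limit is not None and record_limit <= 0:
        raise ValueError("record_limit must be positive")

    # In production, both settings are ignored and every record is read.
    if not design_mode:
        return list(records)

    stream = iter(records)
    if sampling_percent < 100:
        rng = random.Random(seed)
        # Assumed mechanism: keep each record with probability percent/100.
        stream = (r for r in stream if rng.random() * 100 < sampling_percent)

    # The limit is applied to the sampled stream; None means "no limit".
    return list(islice(stream, record_limit))
```

For example, `read_source(rows, sampling_percent=50, record_limit=10)` returns the first 10 records that survive sampling, not a 50% sample of the first 10 rows.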