Using the field mapping control
When merging several input datasets into a single output dataset, the field structure and contents are a consideration. Each input dataset will contain fields that also exist in other input datasets, as well as fields that are unique to only that dataset. How these fields are managed determines the field structure and content in the output dataset. The Field Mapping control allows you to define this output dataset field structure.
All input dataset fields will be mapped to the output dataset. When there is field duplication (based solely on name) between all the inputs, the output dataset field will be a combination of each occurrence. All unique input dataset fields (those not found in other input datasets) are also mapped to the output dataset.
It is possible for a field map's subfields to be of varying data types. In this case, the output field's data type is set to the data type of the first input dataset, and all other subfields are cast to this type. For example, the first input field is text, and the second input field (of the same name) is double. The output data type will be text, and the values in the second input field will be converted to this type. There may be instances when conversion is not possible, and errors will be raised during execution that state where the problem is. For instance, a BLOB field can't be converted to any data type other than BLOB. Also, an alphanumeric field would have to be truncated (start point and end point stated) to remove the alphabetic portion of its values if the field is to be converted as numeric type.
Field mapping is displayed graphically in a catalog tree. All root (top-level) entries are the output dataset fields. You will see their names and their default output data types. The default output field data type is the same as the first input's data type.
Each root-level entry is expandable. When expanded, it will display all subfields from which data will be gathered to populate the output field. For each occurrence of a field (by name) in the input dataset, a subfield entry will appear, showing its source and its data type. The first input dataset's occurrence of a field will be the first to appear in the subfield list.
This is the default structure of the output dataset fields. You may, at any time, add or delete input datasets; add, delete, or rename output dataset fields; and add, delete, rename, or custom-format subfields.
Modifying the default field mappings
The default field mapping structure can be modified at any time. Output fields can be added, deleted, or renamed. Subfields can also be added, deleted, or renamed, and in the case of text output fields, they may be formatted. In the field mapping tree view, shortcut menus (accessed by right-clicking a selected field, subfield, or in the white space) can be used to make these modifications.
The shortcut menu for output fields has the following options: Add Input Field, Delete, Rename, and Properties.
- Add Input Field opens a dialog box that lists all input dataset fields. Choosing an available field adds it to the selected output field's subfield list.
- Delete removes the selected field from the output dataset.
- Rename allows you to alter the output field's name.
- Properties allows you to alter the selected field's properties. You can change the field's name, type, type properties, merge rule, and delimiter (if the Join merge rule is selected).
The shortcut menu for subfields has the following options: Delete and Format (text fields only).
- Delete removes the selected subfield. If the first subfield is the one deleted, the output field's data type will not change to reflect the new first position subfield. If you want to change the output field's data type, you must do so manually (using the Properties dialog box), or use the Reset option on the field mapping windows context menu.
- Format allows you to alter the field width, start position, and end position, then have the changes applied to all the other subfields of the same name.
If the field width is decreased, the output row values will be truncated to fit. Truncation also occurs if the start point and endpoint values are altered. The Start Position value is the start point from which row values will be truncated. Row value widths start at zero.
Values that are longer than the End Position value will be truncated at the specified point. For example, a text field with a row value of "environment", a width of 12, a start position of 3, and an end position of 7 would appear in the output field as "ironm".
The shortcut menu of the field mapping pane (white space) has the following options: Add Output Field and Reset.
- Add Output Field allows you to add a new field. A secondary dialog box opens that allows you to alter the name, data type, data type properties, merge rule, and a delimiter (if the Join merge rule is selected).
- The Reset option will set all output field properties back to the default. All modifications will be removed.
Merge rules
Each output field allows you to set certain properties, such as name, type, and merge rule. The merge rules allow you to specify how values from two or more input fields (subfields) are merged into a single output value. Null values are excluded from all statistical calculations. There are several merge rules that you can use:
- First—Use the first subfield's values to populate the rows of the output field.
- Last—Use the last subfield's values to populate the rows of the output field.
- Join—Concatenate (join) all subfield values and use this value to populate the rows of the output field. You can use a delimiter to separate the various input values. If no delimiter is used, all values will be joined into one continuous string.
- Sum—Calculate the total of all subfield values, then divide by the number of input subfields.
- Mean—Calculate the mean (average) of all subfield values, and use the result to populate the rows of the output field.
- Median—Calculate the median (middle) value, and use the value to populate the rows of the output field. The median is the middle value, which separates the total set of values into lower and upper halves. If there is an even number of values, the average of the two middle values will be used. For example, the median of 1, 6, 7, 100 is 6.5.
- Mode—Use the value with the highest frequency (most common). If there is more than one value with the highest frequency (i.e., a tie), then the list of Mode values will be sorted (smallest first for numeric, alphabetical for text), and the first one will be used.
- Min—Find the minimum value of all subfields and use it to populate the rows of the output field.
- Max—Find the maximum value of all subfields and use it to populate the rows of the output field.
- Standard deviation—Use the standard deviation classification method on all subfield values.
Standard deviation should not be performed on a single input, because values can't be divided by zero, so standard deviation is not a valid option for single inputs. The Merge tool will not fail if standard deviation is chosen for an output field that only has a single subfield. In this instance, a value of zero will be entered in the rows of the output field.
- Count—Find the number of values included in statistical calculations. This counts each value except null values.
Setting the Field Map parameter in scripting
In scripting, field mapping can be done in one of the following ways:
- Entering a string value for a field map parameter—this is recommended only when the number of fields is small and little (if any) alteration of the fields is required.
- Create and use a FieldMappings object.