c) Study your Data

Once you have specified a table to analyse, the MinedTree object scans the data source to collect information on the data fields within the table. Because it reads the data source, this scan may take some time. Examining this information is your first step towards understanding the data. The Field Usage tab displays a table of the fields. For each field it shows the field name, type, values/range, groupings of values, usage and rank (see the Mined Tree object – Field Usage tab section).

By default, the field names shown are the ones used in the table. You can rename the data fields within the MinedTree object, but note that this does not alter the original names in the table.

The type of each field is taken from the data as numeric, date/time or discrete (i.e. text). For numeric and date/time fields you can see the minimum, maximum, average and standard deviation values. For discrete fields you can see the number of discrete values and the frequency of each value.
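To make the per-field summary concrete, the following Python sketch computes the same kind of information for a single column of values. It illustrates the idea only and is not the MinedTree object's own code: the function name summarise_field is hypothetical, date/time fields are left out for brevity, and the use of the population standard deviation is an assumption, since the text does not say which form is reported.

```python
from collections import Counter
from statistics import mean, pstdev


def summarise_field(values):
    """Summarise one column of values (hypothetical helper, not a product API)."""
    # Ignore missing entries before summarising.
    present = [v for v in values if v is not None]

    # Treat the field as numeric only if every present value is a number.
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        return {
            "type": "numeric",
            "min": min(present),
            "max": max(present),
            "average": mean(present),
            # Assumption: population standard deviation.
            "std_dev": pstdev(present),
        }

    # Otherwise treat it as discrete: count the distinct values and how often each occurs.
    counts = Counter(present)
    return {
        "type": "discrete",
        "distinct": len(counts),
        "frequencies": dict(counts),
    }


age = summarise_field([20, 30, 40])
region = summarise_field(["North", "South", "North"])
```

Here age is summarised as a numeric field with a minimum, maximum, average and standard deviation, while region is summarised as a discrete field with two distinct values and a frequency for each.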

Having studied the basic information about the data, you can now decide which field will be the outcome of your analysis and which fields will be its attributes. Remember that the MinedTree object generates a tree that classifies the outcome field using the attribute fields. You can also decide at this stage whether any fields are to be excluded from the current analysis. Change the usage of one field to Outcome and the others to Attribute or Excluded as required.

Excluding fields from the target table

By default, all the fields in your target table will be made available on the Data Window for analysis. However, when you first open up a table you are given the option to exclude fields by changing the usage.

Excluding fields speeds up processing, so you should exclude any field that you are sure will have no use in, or influence on, your analysis project. For example:

·Fields that are unique index identifiers. In general, a field that holds a 'customer account code' or a 'product code' would be of no predictive value.

·Telephone numbers or area codes. Fields like these need more thought: in some instances they may be needed, but rarely in their original 'as is' form. For example, you may want to use them to identify geographic groups. It is more likely, however, that only the main exchange number or the first part of an area code would be suitable, so the field would need pre-processing to extract it. A derived field can also be more useful than the original: on a credit loan application form, for example, a 'telephone given' Yes/No field would be useful, whereas the actual number would not.
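The kind of pre-processing described in the examples above can be sketched in Python, run on the data before it is loaded for analysis. This is a minimal illustration under stated assumptions, not part of the MinedTree object: the helper names phone_features and looks_like_identifier are hypothetical, and the three-digit prefix length is an arbitrary choice that depends on the numbering plan in your data.

```python
def phone_features(raw_number, prefix_length=3):
    """Derive two replacement fields from a raw telephone number (hypothetical helper)."""
    # Keep only the digits, so spaces, brackets and dashes do not matter.
    digits = "".join(ch for ch in (raw_number or "") if ch.isdigit())
    return {
        # A Yes/No flag recording whether any number was supplied.
        "telephone_given": "Yes" if digits else "No",
        # Assumption: the first `prefix_length` digits identify a geographic group.
        "area_prefix": digits[:prefix_length] if digits else None,
    }


def looks_like_identifier(values):
    """Return True if every present value is unique, as with an account code."""
    present = [v for v in values if v is not None]
    return len(present) > 0 and len(set(present)) == len(present)


with_number = phone_features("(020) 7946-0123")
without_number = phone_features("")
id_candidate = looks_like_identifier(["A001", "A002", "A003"])
```

A field for which looks_like_identifier returns True is only a candidate for exclusion. Check what the field means before changing its usage, because a small sample of a genuine attribute can also happen to contain no repeated values.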