For discrete outcomes when using tree induction: although your outcome can have many discrete values, we recommend that you keep each individual analysis focused on a small number (<4) of discrete values. This helps you understand and interpret the resulting trees. If your outcome field has a large number of discrete values, group these values into a small number of categories, for example by grouping occupations into categories. Use subjective criteria and/or the field information (frequencies of discrete values) to help you define outcome groups. Alternatively, isolate the outcome value(s) of interest in one group against the rest of the values. That is, you effectively profile one outcome value against all others.
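The two grouping strategies described above can be sketched in plain Python. This is only an illustration, not the tool's own grouping facility: the occupation values, the category names, and the helper names `group_outcome` and `one_vs_rest` are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from raw occupation values to a few outcome groups.
OCCUPATION_GROUPS = {
    "manager": "management",
    "director": "management",
    "engineer": "technical",
    "technician": "technical",
}


def group_outcome(value, mapping=OCCUPATION_GROUPS, default="other"):
    """Map a raw discrete value to its outcome group; unmapped values go to `default`."""
    return mapping.get(value, default)


def one_vs_rest(value, targets, target_label="target", rest_label="rest"):
    """Isolate the value(s) of interest in one group against all remaining values."""
    return target_label if value in targets else rest_label


# Made-up sample of raw outcome values.
occupations = ["manager", "clerk", "engineer", "director", "nurse", "technician"]

# Strategy 1: collapse many discrete values into a small number of categories.
grouped = [group_outcome(v) for v in occupations]

# Strategy 2: profile 'manager or director' against every other value.
profiled = [one_vs_rest(v, {"manager", "director"}) for v in occupations]

# Frequencies of the resulting groups help check that no group is too sparse.
group_counts = Counter(grouped)
```

Checking `group_counts` before running the analysis is a simple way to apply the "frequencies of discrete values" advice: a group with very few records may be worth merging into another.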
If your outcome field is numeric or date/time, you can either leave it 'as is' or treat it as you would a discrete outcome with too many values, by defining outcome groups based on value thresholds. For example, if your outcome field is age, you can define age groups such as <31, 31-45, and >45. Choose the ranges according to the value thresholds of interest. As a rule, leave the outcome field as numeric if you have no particular numeric ranges of interest or focus.
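As a minimal sketch of threshold-based grouping, the age example above could be expressed as follows. It assumes ages are recorded in whole years; the names `AGE_LOWER_BOUNDS`, `AGE_LABELS`, and `age_group` are hypothetical and not part of the tool.

```python
from bisect import bisect_right

# Lower bounds of the second and third groups (assumes whole-year ages).
AGE_LOWER_BOUNDS = [31, 46]
AGE_LABELS = ["<31", "31-45", ">45"]


def age_group(age):
    """Return the label of the age group that `age` falls into."""
    # bisect_right counts how many lower bounds are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_LOWER_BOUNDS, age)]


# Made-up sample ages, including the boundary values.
ages = [22, 30, 31, 45, 46, 67]
age_groups = [age_group(a) for a in ages]
```

Keeping the thresholds in one list makes it easy to try different ranges of interest without changing the grouping logic.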
Your outcome groups should define conditions of interest that you want to relate to other data fields (attributes). For example, you might examine how 'age < 31' or 'occupation = manager or director' is related to other attributes.