DTREG is the ideal tool for modeling business and
medical data with categorical variables such as sex, race and marital status.
Decision trees present a clear, logical model
that can be understood easily by people who are not mathematically inclined.
If you have a need for linear or nonlinear regression
analysis, check out the NLREG program.
You also should check out the News Rover program that automatically scans Usenet newsgroups, downloads messages of interest to you, decodes binary file attachments, reconstructs files split across multiple messages, and eliminates spam and duplicate files. News Rover also has a built-in MP3 music search engine and can quickly locate music files on any Usenet newsgroup.
Software For Predictive Modeling and Forecasting
DTREG also can perform time series analysis and forecasting.
DTREG includes Correlation, Factor Analysis, Principal Components Analysis, and PCA Transformations of variables.
The process of extracting useful information from a set of data values is
called “data mining”. This data can be used to create
models to make predictions.
Many techniques have been developed for predictive modeling,
and there is an art to selecting and applying
the best method for a particular situation.
DTREG implements the most powerful predictive
modeling methods that have been developed, including
Decision Tree Forests,
Support Vector Machines,
Gene Expression Programming and Symbolic Regression,
Linear Discriminant Analysis,
Linear Regression models and
Logistic Regression models.
Benchmarks have shown these methods to be
highly effective for analyzing and modeling many types of data.
Predictive modeling has great commercial and scientific value.
Consider these cases:
- A company has collected data showing how much of their product consumers buy.
For each consumer, the company has demographic and economic information such as age, gender,
education, hobbies, income and occupation. Since the company has a limited advertising budget,
they want to determine how to use the demographic data to predict which people are the most likely
buyers of their product so they can focus their advertising on that group.
A predictive model is an excellent tool for this type of analysis because it shows which combination
of attributes best predicts the purchase of the product.
A predictive model can also be used to
“score” a set of individuals and
rank them by the probability
that they will respond positively to a marketing effort. Lift and Gain
tables and charts generated by DTREG are an ideal way to gauge the potential marketing benefits
from a predictive model. Predictive models are a valuable tool for customer
relationship management (CRM).
- A political campaign wants to maximize the turnout of their supporters on Election Day.
Exit polling has been done during previous elections giving a breakdown of voting patterns by precinct,
race, gender, age and other factors.
DTREG can analyze this data and generate a predictive model
identifying which sets of voters should be targeted for get-out-the-vote
efforts for upcoming elections.
- A bank wants to reduce the default rate on personal loans.
Using historical data collected for previous borrowers, the bank can use DTREG to generate
a predictive model that can then be used to
“score” candidate borrowers
to predict the likelihood that they will default on their loans.
- An emergency room treats patients with chest pain. Based on factors such as blood pressure,
age, gender, severity of pain, location of pain, and other measurements, the caregiver must decide
whether the pain indicates a heart attack or some less critical problem.
A predictive model can be generated to decide which patients require
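The scoring and ranking described in the marketing case above can be illustrated with a small gain and lift calculation. This is only a sketch in plain Python, not DTREG's own report format: the function name `gain_table`, the bin count, and the sample scores are all assumptions made for illustration.

```python
def gain_table(scores, responded, bins=5):
    """Rank cases by model score and report cumulative gain and lift per bin.

    scores    -- predicted probability of a positive response for each case
    responded -- 1 if the case actually responded, 0 otherwise
    """
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    total_positive = sum(flag for _, flag in ranked)
    if total_positive == 0:
        raise ValueError("at least one positive response is required")
    n = len(ranked)
    rows = []
    for b in range(1, bins + 1):
        cutoff = round(n * b / bins)            # number of top-ranked cases contacted
        captured = sum(flag for _, flag in ranked[:cutoff])
        gain = captured / total_positive        # share of all responders reached
        lift = gain / (cutoff / n)              # improvement over random targeting
        rows.append((cutoff, gain, lift))
    return rows


# Hypothetical scores and outcomes for ten consumers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for cutoff, gain, lift in gain_table(scores, responded):
    print(f"top {cutoff:2d} cases: gain {gain:.2f}, lift {lift:.2f}")
```

A lift above 1 in the first bins means that contacting only the highest-scored cases reaches more responders than contacting the same number of cases chosen at random.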
Features of Decision Tree Based Models:
- Decision trees are easy to build.
Just feed a dataset into DTREG, and it will do all the work
of building a decision tree, support vector machine (SVM),
gene expression programming, K-Means clustering,
linear discriminant function, linear regression or logistic regression model.
- Decision trees are easy to understand.
Decision trees provide a clear, logical representation of the data model.
They can be understood and used by people who are not mathematically gifted.
- Decision trees handle both continuous and categorical variables.
Categorical variables such as gender, race, religion, marital status and geographic region are difficult to model
using numerically-oriented techniques such as regression.
In contrast, categorical variables are handled easily by decision trees.
- Decision trees can perform classification as well as regression.
The predicted value from a decision tree is not simply a numerical value but can be a predicted
category such as male/female, malignant/benign, frequent buyer/occasional buyer, etc.
- Decision trees automatically handle interactions between variables.
There may be significant differences between men/women, people living in the North and the South, etc.;
these effects are known as variable interactions.
Decision trees automatically deal with these interactions by partitioning the cases and then
analyzing each group separately.
- Highly accurate "ensemble" tree models.
DTREG provides classical, single-tree models and also
Decision Tree Forest models.
For many applications these "ensemble" tree methods produce the most
accurate results of any modeling method.
- Decision trees identify important variables.
By examining which variables are used to split nodes near the top of the tree, you can
quickly determine the most important variables. DTREG carries this further by analyzing all
of the splits generated by each variable and the selection of surrogate splitters.
A table ranking overall variable importance is included in the analysis report.
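To make the idea of partitioning cases and analyzing each group separately more concrete, here is a minimal sketch of how a single regression-tree split can be chosen. It is a generic illustration, not DTREG's algorithm: the function names `sse` and `best_split` and the sample data are hypothetical, and a real tree builder repeats this search over every predictor and every node.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) for the split of x that
    minimizes the combined squared error of y, or None if x is constant."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal predictor values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]


# Hypothetical data: consumer age and amount spent.
age = [22, 25, 30, 41, 47, 52]
spend = [10, 12, 11, 30, 32, 31]

print(best_split(age, spend))
```

Each side of the chosen threshold becomes a child node with its own predicted value, and the same search can then be applied within each child.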
Features of Neural Network models:
- Wide applicability.
Neural networks have been successfully applied to a wide variety of
classification and regression problems.
Neural networks have the theoretical capability of modeling any type of function.
Probabilistic neural networks are extremely accurate and fast to train.
- DTREG variety. DTREG supports 3- and 4-layer
perceptron network models, Radial Basis Function (RBF) neural networks,
self-organizing Cascade Correlation neural networks, GMDH polynomial networks,
Probabilistic neural networks and General Regression neural networks.
- Automated architecture.
DTREG includes an automated search for the optimal number of hidden neurons.
Features of Support Vector Machine (SVM) models:
- SVM is a modern outgrowth of artificial neural networks.
Support Vector Machine models are close cousins to neural networks.
In fact, an SVM model using a sigmoid kernel function is equivalent
to a two-layer, feed-forward neural network.
- Highly accurate models.
- Classification and Regression analyses.
The DTREG implementation of SVM models supports binary and multi-class
classification problems as well as regression. DTREG implements the most
popular kernel functions including radial basis functions, sigmoid,
polynomial and linear.
- Automatic grid search and pattern search for optimal parameters.
The accuracy of SVM models depends on selecting appropriate parameter
values. DTREG provides an automatic grid and pattern search facility that allows it
to iterate through ranges of parameters and perform cross-validation to
find the optimal parameter values.
- Model building performance.
The DTREG implementation of SVM is capable of handling very large problems.
Kernel matrix row caching, shrinking heuristics to eliminate outlying vectors
and an SMO-type algorithm are used to boost the speed of modeling.
- Continuous, categorical and non-numeric variables.
DTREG supports continuous and categorical (nominal) variables. Categorical
variables can have symbolic values such as "Male"/"Female", "Live"/"Die", etc.
- Missing value substitution. If there are scattered missing values
for predictor variables, DTREG can replace those missing values with
median values so that the case can be salvaged and the other, non-missing
variable values used to the maximum extent.
- V-fold cross validation.
DTREG provides V-fold cross validation both during the search process
to select the optimal parameters and as a verification method for the
final model. You also have the option of using a hold-back sample for
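The combination of a parameter grid search with V-fold cross validation described in the list above can be sketched as follows. To keep the example self-contained, a simple RBF kernel-weighted average stands in for an SVM; this substitution, the function names, the candidate `gamma` values and the synthetic data are all assumptions for illustration and do not reflect DTREG's implementation. Pattern search is not shown.

```python
import math


def rbf_predict(train, x, gamma):
    """Kernel-weighted average of the training targets (a stand-in model)."""
    weights = [math.exp(-gamma * (x - xi) ** 2) for xi, _ in train]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the plain mean of the targets.
        return sum(yi for _, yi in train) / len(train)
    return sum(w * yi for w, (_, yi) in zip(weights, train)) / total


def v_fold_error(data, gamma, v=5):
    """Mean squared prediction error over V held-out folds."""
    squared_errors = []
    for fold in range(v):
        held_out = data[fold::v]  # every v-th row, starting at index `fold`
        train = [row for i, row in enumerate(data) if i % v != fold]
        squared_errors += [(rbf_predict(train, x, gamma) - y) ** 2 for x, y in held_out]
    return sum(squared_errors) / len(squared_errors)


def grid_search(data, gammas, v=5):
    """Return the candidate gamma with the lowest cross-validated error."""
    return min(gammas, key=lambda g: v_fold_error(data, g, v))


# Synthetic data: a sine curve sampled at 60 points.
data = [(i / 10, math.sin(i / 10)) for i in range(60)]
gammas = [0.01, 0.1, 1.0, 10.0, 100.0]
best_gamma = grid_search(data, gammas)
print("selected gamma:", best_gamma)
```

Because every candidate value is judged only on rows that were held out of training, the selected parameter is less likely to reflect overfitting to the training data.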
Features of Gene Expression Programming - Symbolic Regression models:
- Gene Expression Programming is a new, highly efficient genetic
algorithm that evolves symbolic expressions to fit data.
- GEP expressions are usually very compact and ideal for implementation
in real-time control systems with embedded processors.
- DTREG can evolve both mathematical and logical expressions.
- DTREG fully supports categorical target and predictor variables.
- Parsimony pressure and post-training simplification can be used to
- Random constants are supported and nonlinear regression is used to
optimize their final values.
Time Series Analysis and Forecasting
The Enterprise Version of DTREG includes a full time series modeling and forecasting facility.
Some of the features are:
- Automatic generation of lag, moving average, slope and trend variables.
- Intervention variables
- Automatic trend removal and variance stabilization
- Autocorrelation calculation
- Validation using hold-out rows at the end of the series
- Several charts showing actual, validation, predicted, trend and residual values.
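As a rough sketch of what generating lag and moving average variables and validating on hold-out rows at the end of the series can look like, the following plain Python example builds predictor rows from a series. The function names, the column naming scheme (`lag1`, `ma3`) and the sample series are hypothetical and are not DTREG's conventions.

```python
def make_lag_rows(series, lags=(1, 2), window=3):
    """Turn a series into rows holding the target value, lagged values,
    and a moving average of the `window` values preceding the target."""
    start = max(max(lags), window)  # first index with enough history
    rows = []
    for t in range(start, len(series)):
        row = {"target": series[t]}
        for k in lags:
            row[f"lag{k}"] = series[t - k]
        row[f"ma{window}"] = sum(series[t - window:t]) / window
        rows.append(row)
    return rows


def split_holdout(rows, n_holdout):
    """Reserve the last n_holdout rows of the series for validation."""
    return rows[:-n_holdout], rows[-n_holdout:]


# Hypothetical monthly values.
series = [10, 12, 13, 15, 18, 21]
rows = make_lag_rows(series)
train_rows, validation_rows = split_holdout(rows, 1)

for row in rows:
    print(row)
```

Holding out the final rows rather than a random sample keeps the validation data strictly later in time than the training data.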
- Ease of use. DTREG is a robust application that is installed easily on any Windows system.
DTREG reads Comma Separated Value (CSV) data files that are easily created from almost any data source.
Once you create your data file, just feed it into DTREG, and let DTREG do all of the work of creating
a decision tree, Support Vector Machine, K-Means clustering,
Linear Discriminant Function, Linear Regression or Logistic Regression model.
Even complex analyses can be set up in minutes.
- Classification and Regression Trees. DTREG can build Classification Trees where the
target variable being predicted is categorical and Regression Trees where the target variable is
continuous like income or sales volume.
- Single-tree, TreeBoost, Decision Tree Forests, Support Vector Machine,
K-Means clustering, Linear Discriminant Analysis, Linear Regression and
By simply checking a button, you can
direct DTREG to build a classic single-tree model,
a TreeBoost model consisting of a series of trees,
a Decision Tree Forest,
a Neural Network,
a Support Vector Machine,
a Gene Expression Programming model,
a K-Means Clustering model,
a Linear Discriminant Analysis function,
a Linear Regression model
or a Logistic Regression model.
- Automatic tree pruning. DTREG uses V-fold cross-validation to determine the optimal tree
size. This procedure avoids the problem of "overfitting" where the generated tree fits the training
data well but does not provide accurate predictions of new data.
- Surrogate variables for missing data.
DTREG uses a sophisticated technique involving
"surrogate variables" to handle cases
with missing values. This allows cases with some available
values and some missing values to be utilized to the maximum extent when building the model.
It also enables DTREG to predict the values of cases that have missing values.
- Visual display of the tree. DTREG can display the generated decision tree on the screen,
write it to a .jpg or .png disk file or print it. When printed, DTREG uses a sophisticated technique
for paginating trees that cross multiple pages.
- DTREG accepts text data as well as numeric data.
If you have categorical variables with data values such as “Male”, “Female”, “Married”,
“Protestant”, etc., there is no need to code them as numeric values.
- Data Transformation Language (DTL). DTREG includes a full
Data Transformation Language (DTL) for
transforming variables, creating new variables and selecting which cases are to be
included in the analysis.
- Project files for saving analyses. DTREG saves all of the information about variables
and analysis parameters, as well as the generated report and tree, in a project file. You can later
open the project file, alter parameters or rerun it with a different dataset.
- Scoring to predict values. Once a decision tree has been built, you can use DTREG to
"score" a new dataset and predict values for the target variable.
- Generated scoring source code. The "Translate" function in DTREG
generates C, C++ and SAS® source code to compute predicted values. This source
code can be included in application programs to perform high performance scoring
of large volumes of data.
- Heavy duty capability. The Enterprise Version of DTREG can
handle an unlimited number of
data rows -- hundreds of thousands or millions are no problem. DTREG can build classification trees
with predictor variables that have hundreds of categories by using an efficient clustering algorithm.
Many other decision tree programs limit predictor variables to 16 or fewer categories.
- DTREG .NET Class Library. The DTREG .NET Class Library can be called
from application programs to generate models and compute predicted target values using a model generated by
The author of DTREG is available for consulting on data modeling and
data mining projects.
Contact via e-mail for information.