DTREG
  • DTREG is the ideal tool for modeling business and medical data with categorical variables such as sex, race and marital status.

  • Decision trees present a clear, logical model that can be understood easily by people who are not mathematically inclined.

  • If you have a need for linear or nonlinear regression analysis, check out the NLREG program.

  • You also should check out the News Rover program that automatically scans Usenet newsgroups, downloads messages of interest to you, decodes binary file attachments, reconstructs files split across multiple messages, and eliminates spam and duplicate files. News Rover also has a built-in MP3 music search engine and can quickly locate music files on any Usenet newsgroup.

    DTREG
    Software For Predictive Modeling and Forecasting

    DTREG offers the most powerful predictive modeling methods.

    DTREG also can perform time series analysis and forecasting.

    DTREG includes Correlation, Factor Analysis, Principal Components Analysis, and PCA Transformations of variables.

    The process of extracting useful information from a set of data values is called “data mining”. This data can be used to create models to make predictions. Many techniques have been developed for predictive modeling, and there is an art to selecting and applying the best method for a particular situation. DTREG implements the most powerful predictive modeling methods that have been developed, including TreeBoost and Decision Tree Forests as well as Neural Networks, Support Vector Machines, Gene Expression Programming and Symbolic Regression, K-Means Clustering, Linear Discriminant Analysis, Linear Regression and Logistic Regression models.

    Benchmarks have shown these methods to be highly effective for analyzing and modeling many types of data.

    DTREG Training Tutorials

    Predictive modeling has great commercial and scientific value. Consider these cases:
    • A company has collected data showing how much of its product consumers buy. For each consumer, the company has demographic and economic information such as age, gender, education, hobbies, income and occupation. Since the company has a limited advertising budget, it wants to determine how to use the demographic data to predict which people are the most likely buyers of its product so it can focus its advertising on that group. A predictive model is an excellent tool for this type of analysis because it shows which combination of attributes best predicts the purchase of the product. A predictive model can also be used to “score” a set of individuals and rank them by the probability that they will respond positively to a marketing effort. Lift and Gain tables and charts generated by DTREG are an ideal way to gauge the potential marketing benefits from a predictive model. Predictive models are a valuable tool for customer relationship management (CRM).

    • A political campaign wants to maximize the turnout of their supporters on Election Day. Exit polling has been done during previous elections giving a breakdown of voting patterns by precinct, race, gender, age and other factors. DTREG can analyze this data and generate a predictive model identifying which sets of voters should be targeted for get-out-the-vote efforts for upcoming elections.

    • A bank wants to reduce the default rate on personal loans. Using historical data collected for previous borrowers, the bank can use DTREG to generate a predictive model that can then be used to “score” candidate borrowers to predict the likelihood that they will default on their loans.

    • An emergency room treats patients with chest pain. Based on factors such as blood pressure, age, gender, severity of pain, location of pain, and other measurements, the caregiver must decide whether the pain indicates a heart attack or some less critical problem. A predictive model can be generated to decide which patients require immediate attention.
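
    The scoring and ranking idea in the marketing case above can be sketched in a few lines of Python. This is a minimal illustration of how cumulative gain and lift are commonly computed, not DTREG's own report format. The function name `gain_lift_table`, the bin count and the sample scores are all assumptions made for the example.

```python
def gain_lift_table(scores, outcomes, n_bins=4):
    """Rank cases by predicted score and return (bin, cumulative gain, cumulative lift).

    scores   -- predicted probability of a positive response (hypothetical model output)
    outcomes -- 1 if the case actually responded, 0 otherwise
    Assumes at least one positive outcome.
    """
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    total_pos = sum(y for _, y in ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = round(total * b / n_bins)            # number of top-ranked cases contacted
        captured = sum(y for _, y in ranked[:cutoff])
        gain = captured / total_pos                   # share of all responders captured so far
        lift = gain / (cutoff / total)                # improvement over contacting at random
        rows.append((b, gain, lift))
    return rows


# Made-up scores and outcomes for eight consumers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]
table = gain_lift_table(scores, outcomes, n_bins=4)
for bin_no, gain, lift in table:
    print(f"bin {bin_no}: gain={gain:.2f} lift={lift:.2f}")
```

    A lift above 1 in the first bins means that contacting only the highest-scored individuals reaches more responders than contacting the same number of people at random.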

    Features of Decision Tree Based Models:

    • Decision trees are easy to build. Just feed a dataset into DTREG, and it will do all the work of building a decision tree, support vector machine (SVM), gene expression programming, K-Means clustering, linear discriminant function, linear regression or logistic regression model.
    • Decision trees are easy to understand. Decision trees provide a clear, logical representation of the data model. They can be understood and used by people who are not mathematically gifted.
    • Decision trees handle both continuous and categorical variables. Categorical variables such as gender, race, religion, marital status and geographic region are difficult to model using numerically-oriented techniques such as regression. In contrast, categorical variables are handled easily by decision trees.
    • Decision trees can perform classification as well as regression. The predicted value from a decision tree is not simply a numerical value but can be a predicted category such as male/female, malignant/benign, frequent buyer/occasional buyer, etc.
    • Decision trees automatically handle interactions between variables. There may be significant differences between men/women, people living in the North and the South, etc.; these effects are known as variable interactions. Decision trees automatically deal with these interactions by partitioning the cases and then analyzing each group separately.
    • Highly accurate "ensemble" tree models. DTREG provides classical, single-tree models and also TreeBoost and Decision Tree Forest models. For many applications these "ensemble" tree methods produce the most accurate results of any modeling method.
    • Decision trees identify important variables. By examining which variables are used to split nodes near the top of the tree, you can quickly determine the most important variables. DTREG carries this further by analyzing all of the splits generated by each variable and the selection of surrogate splitters. A table ranking overall variable importance is included in the analysis report.
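
    As a rough illustration of how a tree can pick a split on categorical predictors and thereby indicate which variable matters most, the sketch below scores each predictor by the reduction in Gini impurity it achieves. It assumes a simple multiway partition on every category, which is a simplification. It is not a description of DTREG's splitting or importance algorithm, and the data rows are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(rows, target, predictor):
    """Impurity reduction obtained by partitioning rows on each category of predictor."""
    parent = gini([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[predictor], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


# Invented cases with text-valued categorical variables; no numeric coding is needed.
rows = [
    {"sex": "M", "region": "North", "buyer": "yes"},
    {"sex": "M", "region": "South", "buyer": "yes"},
    {"sex": "F", "region": "North", "buyer": "no"},
    {"sex": "F", "region": "South", "buyer": "no"},
]
gains = {p: split_gain(rows, "buyer", p) for p in ("sex", "region")}
best = max(gains, key=gains.get)
print(gains, "-> split on", best)
```

    In this toy dataset the `sex` variable separates buyers perfectly while `region` carries no information, so `sex` would be chosen for the top split.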

    Features of Neural Network models:

    • Wide applicability. Neural networks have been successfully applied to a wide variety of classification and regression problems. Neural networks have the theoretical capability of modeling any type of function.
    • Accuracy. Probabilistic neural networks are extremely accurate and fast to train.
    • DTREG variety. DTREG supports 3- and 4-layer perceptron network models, Radial Basis Function (RBF) neural networks, self-organizing Cascade Correlation neural networks, GMDH polynomial networks, Probabilistic neural networks and General Regression neural networks.
    • Automated architecture. DTREG includes an automated search for the optimal number of hidden neurons.

    Features of Support Vector Machine (SVM) models:

    • SVM is a modern outgrowth of artificial neural networks. Support Vector Machine models are close cousins to neural networks. In fact, an SVM model using a sigmoid kernel function is equivalent to a two-layer, feed-forward neural network.
    • Highly accurate models.
    • Classification and Regression analyses. The DTREG implementation of SVM models supports binary and multi-class classification problems as well as regression. DTREG implements the most popular kernel functions including radial basis functions, sigmoid, polynomial and linear.
    • Automatic grid search and pattern search for optimal parameters. The accuracy of SVM models depends on selecting appropriate parameter values. DTREG provides an automatic grid and pattern search facility that allows it to iterate through ranges of parameters and perform cross-validation to find the optimal parameter values.
    • Model building performance. The DTREG implementation of SVM is capable of handling very large problems. Kernel matrix row caching, shrinking heuristics to eliminate outlying vectors and an SMO-type algorithm are used to boost the speed of modeling.
    • Continuous, categorical and non-numeric variables. DTREG supports continuous and categorical (nominal) variables. Categorical variables can have symbolic values such as "Male"/"Female", "Live"/"Die", etc.
    • Missing value substitution. If there are scattered missing values for predictor variables, DTREG can replace those missing values with median values so that the case can be salvaged and the other, non-missing variable values used to the maximum extent.
    • V-fold cross validation. DTREG provides V-fold cross validation both during the search process to select the optimal parameters and as a verification method for the final model. You also have the option of using a hold-back sample for verification.
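
    The median substitution, V-fold cross validation and grid search described above follow a general pattern that can be sketched as below. This is a generic outline under stated assumptions, not DTREG's implementation: the function names are hypothetical, folds are assigned by simple interleaving rather than random or stratified sampling, and `stand_in_error` is a placeholder for "train an SVM on each training fold and average the error on the test folds". Pattern search is not shown.

```python
from itertools import product
from statistics import median


def impute_median(column):
    """Replace missing values (None) in one predictor column with the column median."""
    known = [v for v in column if v is not None]
    m = median(known)
    return [m if v is None else v for v in column]


def v_fold_splits(n_rows, v):
    """Yield (train_indices, test_indices) so that each row is tested exactly once."""
    for fold in range(v):
        test = [i for i in range(n_rows) if i % v == fold]
        train = [i for i in range(n_rows) if i % v != fold]
        yield train, test


def grid_search(grid, cv_error):
    """Evaluate every parameter combination and keep the one with the lowest error."""
    names = list(grid)
    best_params, best_err = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(params)
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err


def stand_in_error(params):
    """Placeholder error surface; a real run would fit and score a model per fold."""
    return (params["C"] - 10) ** 2 + (params["gamma"] - 0.1) ** 2


filled = impute_median([1.0, None, 3.0, 10.0])
splits = list(v_fold_splits(6, 3))
best_params, best_err = grid_search({"C": [1, 10, 100], "gamma": [0.01, 0.1, 1]}, stand_in_error)
print(filled, splits[0], best_params)
```

    The parameter names `C` and `gamma` are the usual cost and RBF kernel width parameters of an SVM and are used here only as an example grid.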

    Features of Gene Expression Programming - Symbolic Regression models:

    • Gene Expression Programming is a new, highly efficient genetic algorithm that evolves symbolic expressions to fit data.
    • GEP expressions are usually very compact and ideal for implementation in real-time control systems with embedded processors.
    • DTREG can evolve both mathematical and logical expressions.
    • DTREG fully supports categorical target and predictor variables.
    • Parsimony pressure and post-training simplification can be used to simplify expressions.
    • Random constants are supported and nonlinear regression is used to optimize their final values.
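
    To show what parsimony pressure means in practice, the sketch below scores symbolic expressions by their squared error plus a small penalty per node, so that of two expressions that fit equally well the shorter one wins. The expression encoding (nested tuples), the `pressure` value and the penalty formula are assumptions for illustration only. The source does not state which formula DTREG uses, and no evolutionary search is performed here.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, x):
    """Evaluate an expression given as 'x', a numeric constant, or (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))


def size(expr):
    """Number of nodes (operators, variables and constants) in the expression."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def penalized_error(expr, data, pressure=0.01):
    """Sum of squared errors plus a parsimony penalty proportional to expression size."""
    sse = sum((evaluate(expr, x) - y) ** 2 for x, y in data)
    return sse + pressure * size(expr)


data = [(0, 1), (1, 2), (2, 5)]                      # samples of y = x*x + 1
compact = ("+", ("*", "x", "x"), 1)                   # x*x + 1
bloated = ("+", ("*", "x", "x"), ("+", 1, ("-", "x", "x")))  # x*x + (1 + (x - x))
print(penalized_error(compact, data), penalized_error(bloated, data))
```

    Both expressions reproduce the data exactly, but the penalty ranks the compact form ahead of the bloated one, which is the effect parsimony pressure is meant to have during training.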

    Time Series Analysis and Forecasting

    The Enterprise Version of DTREG includes a full time series modeling and forecasting facility. Some of the features are:

    • Automatic generation of lag, moving average, slope and trend variables.
    • Intervention variables
    • Automatic trend removal and variance stabilization
    • Autocorrelation calculation
    • Validation using hold-out rows at the end of the series
    • Several charts showing actual, validation, predicted, trend and residual values.
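
    The generated lag, moving average and slope variables turn a single series into ordinary predictor columns. The following sketch shows one plausible way to build them and to hold out the final rows for validation. The function name, the window sizes and the definition of slope as the most recent one-step change are assumptions, not DTREG's definitions.

```python
def add_lag_features(series, n_lags=2, ma_window=3):
    """Build one row per time step with lag, moving-average and slope predictors.

    Only values strictly before time t are used, so no row sees its own target.
    """
    rows = []
    start = max(n_lags, ma_window, 2)          # first t with enough history
    for t in range(start, len(series)):
        window = series[t - ma_window:t]
        rows.append({
            "target": series[t],
            "lags": [series[t - k] for k in range(1, n_lags + 1)],
            "moving_avg": sum(window) / ma_window,
            "slope": series[t - 1] - series[t - 2],
        })
    return rows


series = [10, 12, 14, 16, 18, 20]              # invented values
rows = add_lag_features(series)
train, holdout = rows[:-1], rows[-1:]          # validate on rows at the end of the series
print(rows[0], len(train), len(holdout))
```

    Holding out the last rows rather than a random sample keeps the validation honest, because the model is always tested on values that come after everything it was trained on.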

    DTREG Features

    • Ease of use. DTREG is a robust application that is installed easily on any Windows system. DTREG reads Comma Separated Value (CSV) data files that are easily created from almost any data source. Once you create your data file, just feed it into DTREG, and let DTREG do all of the work of creating a decision tree, Support Vector Machine, K-Means clustering, Linear Discriminant Function, Linear Regression or Logistic Regression model. Even complex analyses can be set up in minutes.
    • Classification and Regression Trees. DTREG can build Classification Trees where the target variable being predicted is categorical and Regression Trees where the target variable is continuous like income or sales volume.
    • Single-tree, TreeBoost, Decision Tree Forests, Support Vector Machine, K-Means clustering, Linear Discriminant Analysis, Linear Regression and Logistic Regression. By simply checking a button, you can direct DTREG to build a classic single-tree model, a TreeBoost model consisting of a series of trees, a Decision Tree Forest, a Neural Network, a Support Vector Machine, a Gene Expression Programming model, a K-Means Clustering model, a Linear Discriminant Analysis function, a Linear Regression model or a Logistic Regression model.
    • Automatic tree pruning. DTREG uses V-fold cross-validation to determine the optimal tree size. This procedure avoids the problem of "overfitting" where the generated tree fits the training data well but does not provide accurate predictions of new data.
    • Surrogate variables for missing data. DTREG uses a sophisticated technique involving "surrogate variables" to handle cases with missing values. This allows cases with some available values and some missing values to be utilized to the maximum extent when building the model. It also enables DTREG to predict the values of cases that have missing values.
    • Visual display of the tree. DTREG can display the generated decision tree on the screen, write it to a .jpg or .png disk file or print it. When printed, DTREG uses a sophisticated technique for paginating trees that cross multiple pages.
    • DTREG accepts text data as well as numeric data. If you have categorical variables with data values such as “Male”, “Female”, “Married”, “Protestant”, etc., there is no need to code them as numeric values.
    • Data Transformation Language (DTL). DTREG includes a full Data Transformation Language (DTL) programming language for transforming variables, creating new variables and selecting which cases are to be included in the analysis.
    • Project files for saving analyses. DTREG saves all of the information about variables and analysis parameters, as well as the generated report and tree, in a project file. You can later open the project file, alter parameters or rerun it with a different dataset.
    • Scoring to predict values. Once a decision tree has been built, you can use DTREG to "score" a new dataset and predict values for the target variable.
    • Generated scoring source code. The "Translate" function in DTREG generates C, C++ and SAS® source code to compute predicted values. This source code can be included in application programs to perform high performance scoring of large volumes of data.
    • Heavy duty capability. The Enterprise Version of DTREG can handle an unlimited number of data rows -- hundreds of thousands or millions are no problem. DTREG can build classification trees with predictor variables that have hundreds of categories by using an efficient clustering algorithm. Many other decision tree programs limit predictor variables to 16 or fewer categories.
    • DTREG .NET Class Library. The DTREG .NET Class Library can be called from application programs to generate models and compute predicted target values using a model generated by DTREG.
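
    The scoring and generated-source-code features rest on the fact that a fitted decision tree is just a set of nested comparisons. The sketch below scores a case by walking a small hand-written tree and also emits equivalent C-style source text. The tree structure, variable names and output layout are hypothetical. The actual code produced by DTREG's "Translate" function is not shown in this document and will differ.

```python
def score(node, case):
    """Walk the tree from the root until a leaf is reached and return its prediction."""
    while "predict" not in node:
        go_left = case[node["var"]] <= node["split"]
        node = node["left"] if go_left else node["right"]
    return node["predict"]


def to_c(node, depth=1):
    """Render the tree as nested C if/else statements (illustrative layout only)."""
    pad = "    " * depth
    if "predict" in node:
        return f"{pad}return {node['predict']};\n"
    return (
        f"{pad}if ({node['var']} <= {node['split']}) {{\n"
        + to_c(node["left"], depth + 1)
        + f"{pad}}} else {{\n"
        + to_c(node["right"], depth + 1)
        + f"{pad}}}\n"
    )


# Hand-written example tree predicting a probability from two continuous predictors.
tree = {
    "var": "age", "split": 40.0,
    "left": {"predict": 0.2},
    "right": {
        "var": "income", "split": 50000.0,
        "left": {"predict": 0.5},
        "right": {"predict": 0.9},
    },
}
c_source = "double score(double age, double income) {\n" + to_c(tree) + "}\n"
print(score(tree, {"age": 50, "income": 60000}))
print(c_source)
```

    Because the emitted function contains only comparisons and returns, it can be compiled into an application and evaluated for each row without loading the modeling software.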

    Roadmap to Understanding and Building Predictive Models

    Download demonstration copy of DTREG.

    Download manual for DTREG.

    Download manual for DTREG .NET Class Library.

    Order DTREG.

    Google Scholar search for published articles citing DTREG.

    The author of DTREG is available for consulting on data modeling and data mining projects.
    Contact via e-mail for information.