Software

Model Monitor: A Toolkit for evaluating, comparing, and monitoring the effectiveness of classification models under distribution shift

Model Monitor is a Java toolkit for the systematic evaluation of classifiers under changes in distribution. It provides methods for detecting distribution shifts in data, comparing the performance of multiple classifiers under shifts in distribution, and evaluating the robustness of individual classifiers to distribution change. As such, it allows users to determine the best model (or models) for their data under a number of potential scenarios. Additionally, Model Monitor is fully integrated with the WEKA machine learning environment, so that a variety of commodity classifiers can be used if desired.

Techniques implemented in this package come primarily from the following sources:

  • D.A. Cieslak and N.V. Chawla, "Detecting Fracture Points in Classifier Performance", 7th IEEE International Conference on Data Mining (ICDM), pp. 123-132, 2007.
  • D.A. Cieslak and N.V. Chawla, "A Framework for Monitoring Classifiers' Performance: When and Why Failure Occurs?", Knowledge and Information Systems, 2008.
Download | Manual | Paper

Condor Grid Analysis Software Package (GASP)

Whether you are a first-time Condor user or an advanced system administrator, job failure on the grid is inevitable. In a submission batch of 1000 jobs, one might observe 500 failures, leaving the user with several questions: Why are some jobs evicted multiple times? Why do some jobs produce Shadow Exceptions? Is a group of machines incapable of running a particular submission? These questions are difficult to answer because of the size of the machine pool and the number of jobs submitted. Failure may appear to occur at random, but there is often a pattern, and the Condor Grid Analysis Software Package (GASP) is designed to help you find it.
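GASP applies data mining to job logs, as described in the publications below. To make the question "Is a group of machines incapable of running a particular submission?" concrete, the following C sketch applies a much simpler heuristic: it flags a machine whose failure rate is well above the batch-wide rate. This is a stand-in for illustration, not GASP's method. The `job_record` type, both functions, and the `min_runs`/`factor` thresholds are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record distilled from a job log: where a job ran and
   whether it ended badly (evicted, Shadow Exception, and so on). */
struct job_record {
    const char *machine;
    int failed;                 /* 1 = failed, 0 = completed normally */
};

/* Fraction of all jobs in the batch that failed; 0.0 for an empty batch. */
double batch_failure_rate(const struct job_record *jobs, size_t njobs)
{
    size_t failures = 0;
    for (size_t i = 0; i < njobs; i++)
        failures += jobs[i].failed ? 1 : 0;
    return njobs ? (double)failures / (double)njobs : 0.0;
}

/* Flag `machine` when at least `min_runs` jobs ran there and its failure
   rate exceeds `factor` times the batch-wide rate. Both thresholds are
   illustrative tuning knobs, not GASP parameters. */
int machine_is_suspicious(const struct job_record *jobs, size_t njobs,
                          const char *machine, size_t min_runs, double factor)
{
    size_t runs = 0, failures = 0;
    for (size_t i = 0; i < njobs; i++) {
        if (strcmp(jobs[i].machine, machine) == 0) {
            runs++;
            failures += jobs[i].failed ? 1 : 0;
        }
    }
    if (runs == 0 || runs < min_runs)
        return 0;               /* too little evidence to judge this machine */
    double rate = (double)failures / (double)runs;
    return rate > factor * batch_failure_rate(jobs, njobs);
}
```

A per-machine rate only captures one attribute at a time. Patterns that depend on combinations of job and machine attributes are the kind of structure a learned model can expose and a simple rate comparison cannot.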

This software implements work from the following publications:

  • David Cieslak, Nitesh Chawla, and Douglas Thain, "Troubleshooting Thousands of Jobs on Production Grids Using Data Mining Techniques", IEEE Grid Computing, September 2008.
  • David Cieslak, Douglas Thain, and Nitesh Chawla, "Short Paper: Troubleshooting Distributed Systems via Data Mining", IEEE Symposium on High Performance Distributed Computing (HPDC), Paris, France, June 2006.
Download | Instructions | Paper

WEKA SMOTE Implementation

This is a SMOTE (Synthetic Minority Over-sampling Technique) supervised instance filter for WEKA, implemented in Java.
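SMOTE creates synthetic minority examples by interpolating between a minority example and one of its k nearest minority-class neighbors. The following plain C sketch shows that core step. It is not the WEKA filter's code. It assumes brute-force neighbor search and a small xorshift generator so that runs are reproducible. The names `smote` and `SMOTE_MAX_K` and the `per_sample` parameter (N/100 for an oversampling amount of N%, with N a multiple of 100) are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SMOTE_MAX_K 32   /* hypothetical cap so neighbor lists fit on the stack */

/* xorshift32: tiny deterministic PRNG; the state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Uniform double in [0, 1) built from the top 24 bits of the generator. */
static double uniform01(uint32_t *state)
{
    return (double)(xorshift32(state) >> 8) / 16777216.0;
}

static double sq_dist(const double *a, const double *b, size_t d)
{
    double s = 0.0;
    for (size_t j = 0; j < d; j++) {
        double t = a[j] - b[j];
        s += t * t;
    }
    return s;
}

/* Find the k nearest minority neighbors of row i by brute force, then return
   one of them chosen uniformly at random. Requires 1 <= k <= n - 1. */
static size_t random_neighbor(const double *X, size_t n, size_t d, size_t i,
                              size_t k, uint32_t *state)
{
    size_t best[SMOTE_MAX_K];
    double best_d[SMOTE_MAX_K];
    size_t found = 0;

    for (size_t c = 0; c < n; c++) {
        if (c == i)
            continue;
        double dc = sq_dist(X + i * d, X + c * d, d);
        size_t pos;
        if (found < k)
            pos = found++;              /* list not full yet: append */
        else if (dc < best_d[k - 1])
            pos = k - 1;                /* replace the current farthest */
        else
            continue;
        while (pos > 0 && best_d[pos - 1] > dc) {   /* keep list sorted */
            best_d[pos] = best_d[pos - 1];
            best[pos] = best[pos - 1];
            pos--;
        }
        best_d[pos] = dc;
        best[pos] = c;
    }
    return best[xorshift32(state) % found];
}

/* Generate per_sample synthetic rows for each of the n minority rows of X
   (row-major, n x d) into out, which must hold n * per_sample * d doubles.
   Returns the number of rows written, or 0 if the arguments are unusable. */
size_t smote(const double *X, size_t n, size_t d, size_t k, size_t per_sample,
             double *out, uint32_t seed)
{
    if (n < 2 || k == 0 || k > SMOTE_MAX_K)
        return 0;
    if (k > n - 1)
        k = n - 1;                      /* cannot exceed the number of other rows */
    uint32_t state = seed ? seed : 1u;
    size_t rows = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t s = 0; s < per_sample; s++) {
            size_t nb = random_neighbor(X, n, d, i, k, &state);
            double gap = uniform01(&state);     /* position along the segment */
            for (size_t j = 0; j < d; j++)
                out[rows * d + j] = X[i * d + j] + gap * (X[nb * d + j] - X[i * d + j]);
            rows++;
        }
    }
    return rows;
}
```

Each synthetic row lies on the line segment between a minority example and one of its neighbors. As a result, SMOTE enlarges the minority region rather than duplicating points, as plain random oversampling would.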

Download

Perl/C SMOTE+Undersample Wrapper Implementation

Learning from imbalanced data sets is difficult from both the modeling and the cost standpoints. When a class is of great interest but occurs relatively rarely, as with fraud, instances of disease, or regions of interest in large-scale simulations, misclassifying those rare events carries a correspondingly high cost. In such settings, we need models with high minority-class accuracy and low total misclassification cost, so it becomes important to apply resampling and/or cost-based reweighting to improve prediction of the minority class. The question remains how to apply a sampling strategy effectively, and in particular how much to resample.

To that end, we provide a wrapper that automatically discovers the amounts of undersampling and SMOTE oversampling to apply to a given dataset. This method has produced favorable results compared with other imbalance methods and with some cost-sensitive learning methods (MetaCost and Cost-Sensitive Classifier). In addition, we obtain the lowest cost per test example of any result we are aware of on the KDD Cup-99 intrusion detection dataset.
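One way to picture the wrapper's search is as a greedy two-stage loop. The following C sketch assumes the following design:

- the undersampling level is chosen first, then SMOTE with undersampling fixed;
- each stage stops at the first step that does not improve the score;
- scoring is delegated to a caller-supplied callback.

These choices are assumptions for illustration. The package's actual step sizes, stopping rule, and evaluation measure may differ. The names `wrapper_search`, `eval_fn`, `resample_plan`, and `toy_eval` are hypothetical.

```c
#include <stddef.h>

/* Caller-supplied evaluation (hypothetical signature): resample the training
   data with the given percentages, train, and return a validation score where
   higher is better, e.g. minority-class F-measure or negated cost. */
typedef double (*eval_fn)(int undersample_pct, int smote_pct, void *ctx);

struct resample_plan {
    int undersample_pct;   /* percentage of the majority class removed */
    int smote_pct;         /* SMOTE oversampling amount, in percent */
    double score;          /* validation score of this plan */
};

/* Greedy two-stage search. Stage 1 raises the undersampling percentage in
   under_step increments (up to under_max) while the score keeps improving;
   stage 2 does the same for the SMOTE percentage with undersampling fixed.
   Non-positive step sizes skip the corresponding stage. */
struct resample_plan wrapper_search(eval_fn eval, void *ctx,
                                    int under_step, int under_max,
                                    int smote_step, int smote_max)
{
    struct resample_plan best = {0, 0, eval(0, 0, ctx)};

    for (int u = under_step; under_step > 0 && u <= under_max; u += under_step) {
        double s = eval(u, 0, ctx);
        if (s <= best.score)
            break;                      /* stop at the first non-improving step */
        best.undersample_pct = u;
        best.score = s;
    }
    for (int p = smote_step; smote_step > 0 && p <= smote_max; p += smote_step) {
        double s = eval(best.undersample_pct, p, ctx);
        if (s <= best.score)
            break;
        best.smote_pct = p;
        best.score = s;
    }
    return best;
}

/* Toy stand-in for a real train-and-validate step, used only to exercise the
   search: its score peaks at 30% undersampling and 200% SMOTE. */
double toy_eval(int undersample_pct, int smote_pct, void *ctx)
{
    (void)ctx;
    double du = undersample_pct - 30;
    double dp = smote_pct - 200;
    return -(du * du) - dp * dp;
}
```

With `toy_eval`, the call `wrapper_search(toy_eval, NULL, 10, 100, 100, 500)` settles on 30% undersampling and 200% SMOTE. A real callback would score each candidate on held-out folds of the training data, so that the test set never influences the chosen amounts.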

Download

C Hellinger Distance Decision Tree Implementation

This is software, written in C, for training Hellinger Distance Decision Trees (HDDTs): decision trees that use the Hellinger distance as their splitting criterion.

Download