Software
|   |
| Model Monitor: A Toolkit for evaluating, comparing, and monitoring the effectiveness of classification models under distribution shift |
Model Monitor is a Java toolkit for the systematic evaluation of classifiers under changes in distribution. It provides methods for detecting distribution shifts in data, comparing the performance of multiple classifiers under shifts in distribution, and evaluating the robustness of individual classifiers to distribution change. As such, it allows users to determine the best model (or models) for their data under a number of potential scenarios. Additionally, Model Monitor is fully integrated with the WEKA machine learning environment, so that a variety of commodity classifiers can be used if desired.
Techniques implemented in this package come primarily from the following sources:
- D.A. Cieslak and N.V. Chawla, "Detecting Fracture Points in Classifier Performance", 7th IEEE Conference on Data Mining, pp. 123-132, 2007.
- D.A. Cieslak and N.V. Chawla, "A Framework for Monitoring Classifiers' Performance: When and Why Failure Occurs?", Knowledge and Information Systems, 2008.
|
Download |
Manual |
Paper |
|
|
| Condor Grid Analysis Software Package (GASP) |
Whether you are a first-time Condor user or an advanced system administrator, job failure on the grid is inevitable. In a submission batch of 1000 jobs, one might observe 500 job failures, leaving the user with several questions: Why are some jobs evicted multiple times? Why do some jobs create Shadow Exceptions? Is a group of machines incapable of running a particular submission? All of these are difficult to answer due to the scale of the machine pool and the number of jobs submitted. Failure may appear to occur at random, but often there is a pattern, and the Condor Grid Analysis Software Package (GASP) is the tool to help you find it.
This software implements work from the following publications:
- Troubleshooting Thousands of Jobs on Production Grids Using Data Mining Techniques, David Cieslak, Nitesh Chawla, and Douglas Thain, IEEE Grid Computing, September 2008.
- Short Paper: Troubleshooting Distributed Systems via Data Mining, David Cieslak, Douglas Thain, Nitesh Chawla, IEEE Symposium on High Performance Distributed Computing (HPDC), Paris, France, June 2006.
|
Download |
Instructions |
Paper |
|
|
| WEKA SMOTE Implementation |
Located here is a SMOTE (Synthetic Minority Over-sampling Technique) supervised instance filter for WEKA, implemented in Java.
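For readers unfamiliar with the technique, the core idea of SMOTE can be sketched in a few lines. The following is a minimal Python illustration, not the code of the WEKA filter offered here: the function name `smote` and its parameters are hypothetical, and it picks base points at random rather than generating a fixed number of synthetic points per minority instance.

```python
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority points.

    minority: list of numeric feature vectors (lists of floats).
    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority instances")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        # New point lies at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Because each synthetic point is an interpolation between two existing minority instances, the new points stay inside the region already occupied by the minority class instead of duplicating existing instances.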
|
Download |
- |
- |
|
|
| Perl/C SMOTE+Undersample Wrapper Implementation |
Learning from imbalanced data sets presents a complex problem from both the modeling and the cost standpoints. In particular, when a class is of great interest but occurs relatively rarely, as in cases of fraud, instances of disease, and regions of interest in large-scale simulations, misclassifying a rare event carries a correspondingly high cost. Under such circumstances, it is necessary to generate models with high minority-class accuracy and lower total misclassification cost, so it becomes important to apply resampling and/or cost-based reweighting to improve the prediction of the minority class. However, the question remains of how to apply the sampling strategy effectively.
To that end, we provide a wrapper paradigm that discovers the amount of resampling appropriate for a dataset. This method has produced favorable results compared to other imbalance methods and to some cost-sensitive learning methods, namely MetaCost and Cost-Sensitive Classifier. In addition, it obtains the lowest cost per test example of any result we are aware of for the KDD Cup-99 intrusion detection dataset.
|
Download |
- |
- |
|
|
| C Hellinger Distance Decision Tree Implementation |
Located here is software, written in C, for training Hellinger Distance Decision Trees.
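As a rough illustration of the splitting criterion that gives these trees their name, the sketch below computes the Hellinger distance between the positive-class and negative-class distributions over the branches of a candidate split, for a two-class problem. It is a Python illustration under stated assumptions, not the C implementation offered here; the function name `hellinger_split_value` and its input format are hypothetical.

```python
from math import sqrt

def hellinger_split_value(branches):
    """Hellinger distance between class distributions over a split.

    branches: list of (positive_count, negative_count) pairs, one per
    branch of the candidate split. Hypothetical helper for illustration.
    """
    total_pos = sum(p for p, _ in branches)
    total_neg = sum(n for _, n in branches)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # only one class present: nothing to separate
    # Compare the fraction of each class that falls into each branch
    return sqrt(sum(
        (sqrt(p / total_pos) - sqrt(n / total_neg)) ** 2
        for p, n in branches
    ))
```

The value depends only on how each class is distributed across the branches, not on the class priors, which is why the criterion is attractive for imbalanced data. It ranges from 0 for a split that separates nothing to the square root of 2 for a split that separates the two classes completely.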
|
Download |
- |
- |
|
|
|