CUBIST

Data mining extracts patterns from an organization's stored or warehoused data. These patterns can be used to gain insight into aspects of the organization's operations and to predict outcomes for future situations as an aid to decision-making.
Consultants/PTSAs
Faculty
Researchers
Staff
Students

Linux installation instructions

Use modules to load the applications available in the scientific application stack.
Please contact IT Linux Support or open a ticket with the IT Help Desk to install a new version of this application.

Loading CUBIST from the scientific application stack

$ module load cubist
$ cubist

Cubist builds rule-based predictive models that output numeric values, complementing See5/C5.0, which predicts categories. For instance, See5/C5.0 might classify the percentage yield from some process as "high", "medium", or "low", whereas Cubist would output a number such as "7.3".

Cubist is a powerful tool for generating rule-based models that balance the need for accurate prediction against the requirements of intelligibility. Cubist models generally give better results than those produced by simple techniques such as multivariate linear regression, while also being easier to understand than neural networks.

Some important features:

  • Cubist has been designed to analyze substantial databases containing hundreds of thousands to millions of records and tens to thousands of numeric or nominal fields. If you have used neural networks or similar modeling tools, you'll be surprised by Cubist's speed! To speed up model-building, Cubist also takes advantage of processors with up to eight cores in one or more CPUs, including Intel Hyper-Threading.
  • To maximize interpretability, Cubist models are expressed as collections of rules, where each rule has an associated multivariate linear model. Whenever a situation matches a rule's conditions, the associated model is used to calculate the predicted value.
  • Cubist is available for Windows 7/8/10 and Linux.
  • Cubist is easy to use and does not presume advanced knowledge of statistics or machine learning (although these don't hurt, either!).
  • RuleQuest provides C source code so that models constructed by Cubist can be embedded in your organization's own systems.
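
The rule-plus-linear-model idea described in the feature list can be illustrated with a small Python sketch. This is not Cubist's own code or file format, and it is unrelated to the C source that RuleQuest provides. The names `Rule`, `predict`, `temperature`, `pressure`, and the `default` fallback are hypothetical, and averaging the predictions when several rules match is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A case is a mapping from field name to numeric value (hypothetical layout).
Case = Dict[str, float]


@dataclass
class Rule:
    """One rule: a condition plus an associated multivariate linear model."""
    condition: Callable[[Case], bool]
    intercept: float
    coefficients: Dict[str, float]

    def matches(self, case: Case) -> bool:
        return self.condition(case)

    def value(self, case: Case) -> float:
        # Linear model: intercept + sum(coefficient * field value)
        return self.intercept + sum(
            coef * case[name] for name, coef in self.coefficients.items()
        )


def predict(rules: List[Rule], case: Case, default: float) -> float:
    """Evaluate the linear models of all matching rules.

    Assumption of this sketch: predictions from several matching rules
    are averaged, and `default` is returned when no rule matches.
    """
    values = [rule.value(case) for rule in rules if rule.matches(case)]
    if not values:
        return default
    return sum(values) / len(values)


# Two made-up rules predicting a process yield from temperature and pressure.
rules = [
    Rule(lambda c: c["temperature"] <= 50.0, 2.0, {"temperature": 0.25}),
    Rule(lambda c: c["temperature"] > 50.0, 20.0,
         {"temperature": -0.125, "pressure": 0.5}),
]

low_temp_yield = predict(rules, {"temperature": 40.0, "pressure": 1.0}, 0.0)
high_temp_yield = predict(rules, {"temperature": 80.0, "pressure": 4.0}, 0.0)
```

Each case is matched against the rule conditions first, and only the linear models of matching rules contribute to the numeric prediction. Because every prediction can be traced to a readable condition and a short linear formula, models of this kind remain easy to interpret.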