1.4 What kinds of patterns can be mined?

Data Mining for Generalisation

  • Data warehousing
    • Multidimensional data model
    • Data cube technology
    • OLAP (online analytical processing)
  • Multidimensional class or concept description: Characterization and discrimination
  • Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet region
  • Data characterised by a user-specified class description can be retrieved from a warehouse
  • Data discrimination can compare features of selected classes
  • OLAP operations can be used to summarise data along user-specified dimensions

Frequent Patterns, Association and Correlation Analysis

  • Frequent patterns (or frequent itemsets)
    • What prescribed drugs are frequently taken together? What welfare payments are frequently received together?
  • Association, correlation vs. causality
    • A typical association rule: Tertiary Education -> Atheist [support = 10%, confidence = 20%]
    • Are strongly associated items also strongly correlated?
  • How to mine such patterns and rules efficiently in large datasets?
  • How to use such patterns for classification, clustering, and other applications?
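
To make the (support, confidence) pair concrete, here is a minimal sketch that computes both for a rule A -> B over a list of transactions. The item names and the ten transactions are hypothetical, chosen so that the rule comes out at 10% support and 20% confidence like the example above. Lift is included as one common correlation measure; it is an addition here, not something these notes define.

```python
# Hypothetical transactions: each is the set of items observed together.
transactions = (
    [{"A", "B"}]      # 1 transaction with both A and B
    + [{"A"}] * 4     # 4 with A only
    + [{"B"}] * 3     # 3 with B only
    + [{"C"}] * 2     # 2 with neither
)

def support(itemset, txns):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(lhs, rhs, txns):
    """Of the transactions containing `lhs`, the fraction that also contain `rhs`."""
    return support(lhs | rhs, txns) / support(lhs, txns)

def lift(lhs, rhs, txns):
    """Confidence relative to the baseline frequency of `rhs` (1.0 means independent)."""
    return confidence(lhs, rhs, txns) / support(rhs, txns)

rule_support = support({"A", "B"}, transactions)
rule_confidence = confidence({"A"}, {"B"}, transactions)
rule_lift = lift({"A"}, {"B"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

In this made-up data the lift is below 1, so A and B co-occur less often than independence would predict even though the rule A -> B exists. An associated pair need not be positively correlated.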

Classification and Regression for Prediction

  • Classification
    • Construct models (functions) based on some training examples (supervised learning)
    • Describe and distinguish classes or concepts for future prediction
    • Predict unknown discrete class labels
    • e.g., classify countries based on climate, or classify cars based on fuel efficiency
  • Regression (also called numerical prediction)
    • Construct models based on some training examples (supervised learning)
    • Predict unknown continuous values
    • e.g., predict weight from height and age, or predict precipitation based on geo-location and cloud patterns
  • Typical methods: decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression, …
  • Typical applications: credit card or taxation fraud detection, direct marketing, classifying stars, diseases, web-pages, …
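
The difference between the two tasks can be sketched in a few lines of plain Python: both learn a function from training examples, but one returns a discrete label and the other a continuous value. The classifier below is a one-level decision tree (a "stump"); the regressor is an ordinary least-squares line, which is not in the list of methods above and is used only because it is short. All data, labels, and function names are hypothetical, and the regression uses height alone rather than height and age.

```python
from collections import Counter

# Hypothetical training data; every number is made up for illustration.
cars = [(8.0, "thirsty"), (9.5, "thirsty"),          # (km per litre, class label)
        (14.0, "economical"), (17.5, "economical")]
people = [(150.0, 50.0), (160.0, 58.0),              # (height in cm, weight in kg)
          (170.0, 66.0), (180.0, 74.0)]

def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(examples):
    """Pick the single threshold that makes the fewest training errors."""
    xs = sorted({x for x, _ in examples})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split halfway between neighbouring values
        left = majority([y for x, y in examples if x <= t])
        right = majority([y for x, y in examples if x > t])
        errors = sum(y != (left if x <= t else right) for x, y in examples)
        if best is None or errors < best[0]:
            best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

classify = fit_stump(cars)         # classification: predicts a discrete label
predict_weight = fit_line(people)  # regression: predicts a continuous value

print(classify(12.0))
print(predict_weight(175.0))
```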


Clustering / Cluster Analysis