Data Warehousing and Data Mining (DWDM)

Objectives: Students will be able to understand and implement classical models and algorithms in data warehousing and data mining. They will learn how to analyze data, identify problems, and choose the relevant models and algorithms to apply. They will also be able to assess the strengths and weaknesses of various methods and algorithms and to analyze their behavior.
Syllabus

  • Data warehousing
  • SQL OLAP extensions
  • Multi-dimensional Join
  • Data warehouse performance
  • Data Analysis and Uncertainty
  • Classification and Prediction
  • Cluster Analysis
  • Association rules
Organization
The course is divided into two parts that are taught in parallel: a data warehousing part and a data mining part. The exercises consist of a project done alone or in groups of 2-3 students (more details below).
Textbooks

Data Warehousing
  • M. Golfarelli, S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009. (recommended!)
  • R. Kimball, "The Data Warehouse Toolkit", 2nd edition.
  • W. H. Inmon, "Building the Data Warehouse", 3rd edition.
  • Selected papers
Data Mining

Lectures and Lecture Notes

The lecture notes for this course will be updated as we progress through the semester. The lecture notes of the DM part can be found in the syllabus of the data mining web page, while the lecture notes of the DW part can be found in the syllabus of the data warehousing web page.

Data Warehousing Part
1. WE, 03.10.2012 Data warehousing: introduction, business intelligence, data integration, OLTP vs. OLAP, methodological framework, DW definition [slides]
2. WE, 10.10.2012 Data warehousing: multidimensional modeling, cubes, facts, dimensions, DW design [slides]
3. WE, 17.10.2012 Data warehousing: more about dimensions, star schema, snowflake schema, DW implementation, DW applications [previous lecture]
4. WE, 24.10.2012 Data warehousing: case studies [slides]
5. WE, 31.10.2012 SQL OLAP extensions: SQL query expression, crosstabs, group by extensions, rollup, cube, grouping sets [slides] [sql]
6. WE, 07.11.2012 SQL OLAP extensions: analytic/window functions, ranking, moving window aggregates, densification [slides]
7. WE, 14.11.2012 Generalized multi-dimensional join: GMDJ definition, evaluation algorithms [slides] [Akinde et al. 11] [Chatziantoniou et al. 01] [Akinde et al. 02] [sql]
8. WE, 21.11.2012 Generalized multi-dimensional join: subqueries, optimization rules, reducing range to point queries, late initialization of result table, distributed evaluation [slides]
9. WE, 28.11.2012 DW performance: pre-aggregation, lattice framework, view selection [slides] [Harinarayan et al. 96] [Wu and Buchmann 98]
10. WE, 05.12.2012 DW performance: view selection, view maintenance, bitmap indexing [previous lecture]
11. WE, 12.12.2012 Extract-Transform-Load: ETL process, building dimensions and fact tables, extract, transform, load. [slides]
12. WE, 19.12.2012 Advanced modeling: changing dimensions, large-scale dimensional modeling, project management. [slides]
Data Mining Part
1. Tuesday, 09.10.2012 Data Mining: Introduction [slides]
2. Tuesday, 16.10.2012 Data Mining: Getting to know your data [slides]
3. Tuesday, 23.10.2012 Data Mining: Statistics [slides]
4-5. Tuesday, 06.11.2012 and 13.11.2012 Data Mining: Pattern Mining [slides]
6. Tuesday, 20.11.2012 Data Mining: Clustering: Partitioning Methods [slides]
7. Tuesday, 27.11.2012 Data Mining: Clustering: Hierarchical Methods [slides]
8. Tuesday, 04.12.2012 Data Mining: Density-based Methods and High Dimensional Clustering [slides]
9. Tuesday, 11.12.2012 Data Mining: Classification: Decision Trees [slides]
10. Tuesday, 08.01.2013 Data Mining: Classification: Bayes Classifier [slides]
11-12. Tuesday, 15.01.2013 Data Mining: Classification: Rule-based Classification, Lazy Learners, Prediction, Evaluation (to be updated next week) [slides]
