Objectives: Students will gain the ability to understand and
implement classical models and algorithms in data warehousing and data
mining. They will learn how to analyze data, identify the
problems, and choose the relevant models and algorithms to apply.
They will further be able to assess the strengths and weaknesses of
various methods and algorithms and to analyze their behavior.
Syllabus
- Data warehousing
- SQL OLAP extensions
- Multi-dimensional Join
- Data warehouse performance
- Data Analysis and Uncertainty
- Classification and Prediction
- Cluster Analysis
- Association rules
The course is divided into two parts that are taught in parallel: a data warehousing part and a data mining part. The exercises consist of a project done alone or in groups of 2-3 students (more details below).
Textbooks
Data Warehousing
- M. Golfarelli, S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009. (recommended!)
- R. Kimball, "The Data Warehouse Toolkit", 2nd edition.
- W. H. Inmon, "Building the Data Warehouse", 3rd edition.
- Selected papers
Data Mining
- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Second Edition, 2006
- Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics", Prentice Hall, 2003, ISBN: 0-13-088892-3
- Simon Haykin, "Neural Networks: A Comprehensive Foundation", Prentice Hall, 2005, ISBN: 0-13-147139-2
- Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, "Introduction to Data Mining", Pearson Addison Wesley, 2005, ISBN: 0-32-132136-7
Lectures and Lecture Notes
The lecture notes for this course will be updated as we progress through the semester. The lecture notes for the DM part can be found in the syllabus on the data mining web page, and those for the DW part in the syllabus on the data warehousing web page.
Data Warehousing Part
1. | WE, 03.10.2012 | Data warehousing: introduction, business intelligence, data integration, OLTP vs. OLAP, methodological framework, DW definition [slides] |
2. | WE, 10.10.2012 | Data warehousing: multidimensional modeling, cubes, facts, dimensions, DW design [slides] |
3. | WE, 17.10.2012 | Data warehousing: more about dimensions, star schema, snowflake schema, DW implementation, DW applications [previous lecture] |
4. | WE, 24.10.2012 | Data warehousing: case studies [slides] |
5. | WE, 31.10.2012 | SQL OLAP extensions: SQL query expression, crosstabs, group by extensions, rollup, cube, grouping sets [slides] [sql] |
6. | WE, 07.11.2012 | SQL OLAP extensions: analytic/window functions, ranking, moving window aggregates, densification [slides] |
7. | WE, 14.11.2012 | Generalized multi-dimensional join: GMDJ definition, evaluation algorithms [slides] [Akinde et al. 11] [Chatziantoniou et al. 01] [Akinde et al. 02] [sql] |
8. | WE, 21.11.2012 | Generalized multi-dimensional join: subqueries, optimization rules, reducing range to point queries, late initialization of result table, distributed evaluation [slides] |
9. | WE, 28.11.2012 | DW performance: pre-aggregation, lattice framework, view selection [slides] [Harinarayan et al. 96] [Wu and Buchmann 98] |
10. | WE, 05.12.2012 | DW performance: view selection, view maintenance, bitmap indexing [previous lecture] |
11. | WE, 12.12.2012 | Extract-Transform-Load: ETL process, building dimensions and fact tables, extract, transform, load [slides] |
12. | WE, 19.12.2012 | Advanced modeling: changing dimensions, large-scale dimensional modeling, project management [slides] |
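As a rough illustration of the group by extensions in lecture 5 (and of the cube lattice that view selection in lecture 9 works over), the following Python sketch computes SUM over the grouping sets generated by ROLLUP and CUBE. It is a minimal in-memory sketch, not how a DBMS evaluates these operators; the function names and the sample sales rows are hypothetical, and None stands in for the NULL that SQL places in rolled-up columns (SQL distinguishes it from a real NULL with GROUPING()).

```python
from collections import defaultdict
from itertools import combinations

def cube_sets(dims):
    # CUBE(d1, ..., dn): all 2^n subsets of the grouping columns.
    return [s for k in range(len(dims), -1, -1) for s in combinations(dims, k)]

def rollup_sets(dims):
    # ROLLUP(d1, ..., dn): only the n+1 prefixes (d1..dn), (d1..dn-1), ..., ().
    return [tuple(dims[:k]) for k in range(len(dims), -1, -1)]

def grouping_sets_sum(rows, dims, sets, measure):
    """SUM(measure) per group for each (distinct) grouping set.

    Keys are tuples aligned with dims; None marks a rolled-up column,
    playing the role of the NULL that SQL puts in subtotal rows.
    """
    result = defaultdict(int)
    for gs in sets:
        for row in rows:
            key = tuple(row[d] if d in gs else None for d in dims)
            result[key] += row[measure]
    return dict(result)

# Hypothetical fact rows: one measure (amount), two dimensions (region, year).
sales = [
    {"region": "north", "year": 2012, "amount": 10},
    {"region": "north", "year": 2013, "amount": 5},
    {"region": "south", "year": 2012, "amount": 7},
]
dims = ["region", "year"]
cube = grouping_sets_sum(sales, dims, cube_sets(dims), "amount")
rollup = grouping_sets_sum(sales, dims, rollup_sets(dims), "amount")
print(cube[(None, None)])     # grand total: 22
print(cube[("north", None)])  # subtotal for region = north: 15
```

For two dimensions, CUBE produces the 4 grouping sets (region, year), (region), (year), () while ROLLUP produces only the 3 prefixes, so the per-year subtotals such as (None, 2012) appear in the cube result but not in the rollup result.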
Data Mining Part
1. | Tuesday, 09.10.2012 | Data Mining: Introduction [slides] |
2. | Tuesday, 16.10.2012 | Data Mining: Getting to know your data [slides] |
3. | Tuesday, 23.10.2012 | Data Mining: Statistics [slides] |
4-5. | Tuesday, 06.11.2012 and 13.11.2012 | Data Mining: Pattern Mining [slides] |
6. | Tuesday, 20.11.2012 | Data Mining: Clustering: Partitioning Methods [slides] |
7. | Tuesday, 27.11.2012 | Data Mining: Clustering: Hierarchical Methods [slides] |
8. | Tuesday, 04.12.2012 | Data Mining: Density-based Methods and High Dimensional Clustering [slides] |
9. | Tuesday, 11.12.2012 | Data Mining: Classification: Decision Trees [slides] |
10. | Tuesday, 08.01.2013 | Data Mining: Classification: Bayes Classifier [slides] |
11-12. | Tuesday, 15.01.2013 | Data Mining: Classification: Rule-based Classification, Lazy Learners, Prediction, Evaluation (to be updated next week) [slides] |
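For the partitioning methods of lecture 6, here is a minimal Python sketch of k-means using the standard Lloyd iteration: alternate an assignment step and a mean-update step until no centroid moves. The initial centroids are passed in explicitly for reproducibility, which is an assumption of this sketch (common variants pick them randomly or with k-means++); the function name and the sample points are hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """k-means (Lloyd iteration) on points given as equal-length tuples."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:  # no centroid moved, so assignments are stable
            break
        centroids = updated
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups and hand-picked initial centroids.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

On this data the first iteration already separates the two groups, the centroids move to the group means (4/3, 4/3) and (25/3, 25/3), and the second iteration confirms convergence.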