Reading Notes on Data Mining: Concepts and Techniques, 3rd Edition (1)
程序员文章站
2022-03-09 17:43:32
=============Chapter 1: Introduction to DM=================
Scope of data mining:
- data collection and database creation
- data management (including data storage and retrieval, and database transaction processing)
- advanced data analysis (involving data warehousing and data mining).
Steps of data mining:
- Data cleaning (to remove noise and inconsistent data)
- Data integration (where multiple data sources may be combined)
- Data selection (where data relevant to the analysis task are retrieved from the database)
- Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
- Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
- Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
- Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
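The seven steps above can be sketched as a chain of small functions. This is only a toy illustration: the records, the function names, and the "above-average spender" pattern are all assumptions made up for the example, not something taken from the book.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "item": "milk", "amount": 3.0},
           {"customer": "c2", "item": "bread", "amount": None},
           {"customer": "c1", "item": "bread", "amount": 2.0}]
store_b = [{"customer": "c3", "item": "milk", "amount": 4.0},
           {"customer": "c3", "item": "milk", "amount": -1.0}]  # negative amount: noise

def clean(records):
    """Data cleaning: drop records with missing or impossible amounts."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def integrate(*sources):
    """Data integration: combine multiple sources into one list."""
    return [r for source in sources for r in source]

def select(records, item):
    """Data selection: keep only the records relevant to the task."""
    return [r for r in records if r["item"] == item]

def transform(records):
    """Data transformation: aggregate amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining: a trivial 'pattern', customers spending above average."""
    avg = mean(totals.values())
    return {c: t for c, t in totals.items() if t > avg}

def evaluate(patterns, min_total):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {c: t for c, t in patterns.items() if t >= min_total}

def present(patterns):
    """Knowledge presentation: render the result as readable text."""
    return [f"{c} spent {t:.2f} on the selected item"
            for c, t in sorted(patterns.items())]

data = integrate(clean(store_a), clean(store_b))
milk_totals = transform(select(data, "milk"))
report = present(evaluate(mine(milk_totals), min_total=3.5))
print(report)
```

In a real project each step is far heavier, and the process is usually iterative rather than a single pass.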
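Data sources: databases (DB); data warehouses (DW); transactional data; text; multimedia data; stream data; Web data.
Types of data mining, in two broad categories (descriptive mining and predictive mining):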
- Concept/Class Description: Characterization and Discrimination
- Mining Frequent Patterns, Associations, and Correlations
- Classification and Prediction
- Cluster Analysis
- Outlier Analysis
- Evolution Analysis
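To make the descriptive/predictive split concrete, here is a minimal sketch with one task of each kind. The transactions, the training points, and the choice of a 1-nearest-neighbor rule are assumptions for illustration only. Frequent pair counting stands in for descriptive mining, and a nearest-neighbor classifier stands in for predictive mining.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (sets of purchased items).
transactions = [{"milk", "bread"}, {"milk", "bread", "eggs"},
                {"bread", "eggs"}, {"milk", "bread"}]

def frequent_pairs(transactions, min_count):
    """Descriptive: item pairs that co-occur in at least min_count transactions."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical labeled points: (feature value, class label).
training = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]

def predict_1nn(training, x):
    """Predictive: label a new value with the class of its nearest training point."""
    nearest = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

pairs = frequent_pairs(transactions, min_count=3)
label = predict_1nn(training, 7.2)
print(pairs)   # summarizes the data that already exists
print(label)   # infers a label for data not seen before
```

The first function only summarizes properties of the existing data, while the second uses the existing data to make an inference about a new value.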
Interesting patterns:
- easily understood by humans
- valid on new or test data with some degree of certainty
- potentially useful
- novel
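Criteria such as validity and usefulness are often backed by objective measures. For association rules, the two usual ones are support and confidence. The sketch below assumes a rule of the form antecedent -> consequent over a made-up transaction list. The threshold values and the `is_interesting` helper are hypothetical.

```python
# Hypothetical transactions (sets of purchased items).
transactions = [{"milk", "bread"}, {"milk", "bread", "eggs"},
                {"bread", "eggs"}, {"milk", "bread"}]

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never fires, so treat its confidence as zero
    return support(transactions, antecedent | consequent) / base

def is_interesting(transactions, antecedent, consequent,
                   min_support=0.5, min_confidence=0.8):
    """Hypothetical filter: a rule must pass both thresholds."""
    return (support(transactions, antecedent | consequent) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

milk_bread = is_interesting(transactions, {"milk"}, {"bread"})
bread_eggs = is_interesting(transactions, {"bread"}, {"eggs"})
print(milk_bread, bread_eggs)
```

Objective measures like these cover only part of the list. Whether a pattern is novel or easy to understand still depends on the user.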
Elements of a DM task (the book uses DMQL to describe these elements)
- The set of task-relevant data to be mined
- The kind of knowledge to be mined
- The background knowledge to be used in the discovery process
- The interestingness measures and thresholds for pattern evaluation
- The expected representation for visualizing the discovered patterns
Major issues in DM:
- Mining methodology and user interaction issues
- Performance issues
- Issues relating to the diversity of database types
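The first five elements listed above can be thought of as the fields of a task specification. The sketch below is not DMQL syntax. The `MiningTask` class, its field names, and the example values are hypothetical, chosen only to show how the five elements fit together.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    """Hypothetical container for the five elements of a mining task."""
    task_relevant_data: str                                   # which data to mine
    knowledge_kind: str                                       # e.g. "association", "classification"
    background_knowledge: list = field(default_factory=list)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # measure name -> threshold
    presentation: str = "table"                               # how to show the patterns

    def accepts(self, measures):
        """Return True if a pattern's measures meet every threshold."""
        return all(measures.get(name, 0.0) >= threshold
                   for name, threshold in self.interestingness.items())

task = MiningTask(
    task_relevant_data="sales records for 2021",
    knowledge_kind="association",
    background_knowledge=["item -> category hierarchy"],
    interestingness={"support": 0.5, "confidence": 0.8},
    presentation="rules",
)

kept = task.accepts({"support": 0.75, "confidence": 1.0})
dropped = task.accepts({"support": 0.5, "confidence": 0.5})
print(kept, dropped)
```

Keeping the thresholds inside the task description means pattern evaluation can be driven by the specification instead of hard-coded values.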