Data Mining: Concepts and Techniques, 3rd Edition: Reading Notes (1)

程序员文章站 2022-03-09 17:43:32

=============Chapter 1: Introduction to DM=================

Scope of data mining:

  • data collection and database creation
  • data management (including data storage and retrieval, and database transaction processing)
  • advanced data analysis (involving data warehousing and data mining)


Steps of data mining:

  1. Data cleaning (to remove noise and inconsistent data)
  2. Data integration (where multiple data sources may be combined)
  3. Data selection (where data relevant to the analysis task are retrieved from the database)
  4. Data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance)
  5. Data mining (an essential process where intelligent methods are applied in order to extract data patterns)
  6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
  7. Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user)
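
The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the book's implementation: the two toy "stores", the function names, and the 0.5 support threshold are all assumptions made for the example, and "mining" here is reduced to counting item pairs that occur together.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources; None and "" stand in for noise.
store_a = [["milk", "bread"], ["milk", None, "eggs"], ["bread", "eggs", "milk"]]
store_b = [["Milk ", "bread"], ["", "eggs"], ["bread", "milk"]]

def clean(records):
    """Step 1: drop missing items and normalize inconsistent spelling."""
    return [[item.strip().lower() for item in r if item and item.strip()]
            for r in records]

def integrate(*sources):
    """Step 2: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, relevant):
    """Step 3: keep only the items relevant to the analysis task."""
    return [[item for item in r if item in relevant] for r in records]

def transform(records):
    """Step 4: consolidate each record into a sorted tuple of distinct items."""
    return [tuple(sorted(set(r))) for r in records if r]

def mine(transactions):
    """Step 5: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(t, 2))
    return counts

def evaluate(counts, n_transactions, min_support):
    """Step 6: keep only pairs whose support meets the threshold."""
    return {pair: c / n_transactions for pair, c in counts.items()
            if c / n_transactions >= min_support}

def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]

transactions = transform(select(integrate(clean(store_a), clean(store_b)),
                                relevant={"milk", "bread", "eggs"}))
patterns = evaluate(mine(transactions), len(transactions), min_support=0.5)
report = present(patterns)
print("\n".join(report))
```

In this sketch each source is cleaned before the sources are merged; real projects often interleave cleaning and integration, and the steps are usually iterated rather than run once.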


Data sources: databases; data warehouses; transactional data; text; multimedia data; stream data; web data.

Categories of data mining, which fall into two broad classes (descriptive mining and predictive mining):

  • Concept/Class Description: Characterization and Discrimination
  • Mining Frequent Patterns, Associations, and Correlations
  • Classification and Prediction
  • Cluster Analysis
  • Outlier Analysis
  • Evolution Analysis
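
To make the descriptive/predictive split concrete, here is a minimal sketch under stated assumptions: the customer data, the `characterize` and `predict` helpers, and the nearest-mean rule are all hypothetical and chosen only for brevity. The first function summarizes what is already in the data (descriptive); the second uses that summary to label an unseen value (predictive).

```python
from statistics import mean

# Hypothetical labeled data: (annual_spend, customer_class).
customers = [(120.0, "budget"), (150.0, "budget"),
             (900.0, "premium"), (1100.0, "premium")]

def characterize(rows):
    """Descriptive: summarize each class by its mean spend."""
    by_class = {}
    for spend, label in rows:
        by_class.setdefault(label, []).append(spend)
    return {label: mean(values) for label, values in by_class.items()}

def predict(summary, spend):
    """Predictive: assign the class whose mean spend is closest."""
    return min(summary, key=lambda label: abs(summary[label] - spend))

summary = characterize(customers)
label = predict(summary, 300.0)
print(summary, label)
```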


A pattern is interesting if it is:

  • easily understood by humans
  • valid on new or test data with some degree of certainty
  • potentially useful
  • novel
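
Two common objective measures behind the criteria above are the support and confidence of an association rule. The sketch below computes both for a rule "computer => software" on a toy transaction set; the data and function names are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0

# Hypothetical purchase transactions.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(rule_support, rule_confidence)
```

Support speaks to how widely a pattern applies, and confidence to how certain it is; neither captures novelty or understandability, which remain subjective judgments.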

Elements of a DM task (the book uses DMQL to describe these elements)

  • The set of task-relevant data to be mined
  • The kind of knowledge to be mined
  • The background knowledge to be used in the discovery process
  • The interestingness measures and thresholds for pattern evaluation
  • The expected representation for visualizing the discovered patterns
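
The five elements above can be pictured as the fields of a task specification. The sketch below is not DMQL syntax; `MiningTask`, its field names, and the example values are hypothetical, and it only shows how the elements might fit together in one record.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    relevant_data: str                                        # which data to mine
    knowledge_kind: str                                       # e.g. "association"
    background_knowledge: dict = field(default_factory=dict)  # e.g. concept hierarchy
    thresholds: dict = field(default_factory=dict)            # interestingness minimums
    presentation: str = "table"                               # how to show patterns

    def accepts(self, measures):
        """Return True if a pattern's measures meet every threshold."""
        return all(measures.get(name, 0.0) >= minimum
                   for name, minimum in self.thresholds.items())

task = MiningTask(
    relevant_data="purchases of customers in one region",
    knowledge_kind="association",
    background_knowledge={"laptop": "computer", "desktop": "computer"},
    thresholds={"support": 0.05, "confidence": 0.7},
    presentation="rules",
)
print(task.accepts({"support": 0.1, "confidence": 0.8}))
```
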

Major issues currently facing DM:

  • Mining methodology and user interaction issues
  • Performance issues
  • Issues relating to the diversity of database types