Thursday, 1 October 2015

Data Mining

Before looking at the concept of data mining, we should go through the term data warehouse. So the question is: “What is a data warehouse?”

DATA WAREHOUSE

 A data warehouse integrates data from multiple data sources. It is a system used for reporting and data analysis. Data warehouses store current and historical data and are used to create analytical reports for knowledge workers throughout the enterprise. Examples of reports range from annual and quarterly comparisons and trends to detailed daily sales analyses. Because it serves the whole enterprise in this way, a data warehouse is also known as an enterprise data warehouse (EDW).
Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data.
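
As a rough illustration of the idea, here is a minimal sketch that loads records from two sources into one central table and then produces a quarterly comparison. It uses Python's built-in sqlite3 module as a stand-in for a real warehouse; the table name, the source names and the sample figures are all made up for this example.

```python
import sqlite3

# Hypothetical sample data from two separate source systems.
online_orders = [("2015-01-15", 250.0), ("2015-04-02", 400.0)]
store_orders = [("2015-02-20", 150.0), ("2015-05-11", 300.0)]

# An in-memory database plays the role of the central repository.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (source TEXT, sale_date TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES ('online', ?, ?)", online_orders)
warehouse.executemany("INSERT INTO sales VALUES ('store', ?, ?)", store_orders)

# Quarterly report: months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on.
quarterly = warehouse.execute(
    """
    SELECT (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
           SUM(amount) AS total
    FROM sales
    GROUP BY quarter
    ORDER BY quarter
    """
).fetchall()

for quarter, total in quarterly:
    print(f"Q{quarter}: {total}")
```

A real warehouse would also clean and reconcile the data while loading it, which this sketch leaves out.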


DATA MINING



It is rightfully said that data is money in today’s world. Along with the transition to an app-based world comes the exponential growth of data. However, most of this data is unstructured, so it takes a process and a method to extract useful information from it and transform it into an understandable and usable form. This is where data mining comes into the picture. Plenty of tools are available for data mining tasks; they use artificial intelligence, machine learning and other techniques to extract information from data.

Data mining is the process of discovering patterns in large data sets using methods from artificial intelligence, machine learning, statistics, and database systems. It is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.





Data mining is primarily used today by companies with a strong consumer focus - retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact of these factors on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

Data mining has been used to:
  • Identify unexpected shopping patterns in supermarkets.
  • Optimize website profitability by making appropriate offers to each visitor.
  • Predict customer response rates in marketing campaigns.
  • Define new customer groups for marketing purposes.
  • Predict customer defections: which customers are likely to switch to an alternative supplier in the near future.
  • Distinguish between profitable and unprofitable customers.
  • Improve yields in complex production processes by finding unexpected relationships between process parameters and defect rates.
  • Identify "wedge issues" and target political campaigns.
  • Identify suspicious (unusual) behavior, as part of a fraud detection process.

In short, data mining can be applied anywhere in your business or organization where you are interested in identifying and exploiting predictable outcomes.





Data Mining Tasks

Data mining involves six common classes of tasks:

Anomaly detection – The identification of unusual data records that might be interesting, or of data errors that require further investigation.
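
One simple way to flag unusual records is the z-score: how many standard deviations a value lies from the mean. The sketch below is only one of many possible approaches; the function name, the threshold of 2.0 and the sales figures are assumptions chosen for illustration.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical daily sales figures with one suspicious entry.
daily_sales = [102, 98, 105, 97, 101, 99, 103, 100, 480, 104]
anomalies = find_anomalies(daily_sales)
print(anomalies)
```

Whether the flagged record is an interesting event or a data entry error still has to be checked by a person.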

Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
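
The supermarket example can be sketched by counting how often items appear together. The code below computes two common measures for a rule "A implies B": support (the share of all baskets containing both items) and confidence (the share of baskets with A that also contain B). The baskets and function names are hypothetical, and real tools use more efficient algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "bread", "chips"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in transactions:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(item_a, item_b):
    """Fraction of all baskets that contain both items."""
    pair = tuple(sorted((item_a, item_b)))
    return pair_counts[pair] / len(transactions)

def confidence(antecedent, consequent):
    """Fraction of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]

print(support("bread", "milk"))
print(confidence("beer", "chips"))
```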

Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
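
A well-known clustering technique is k-means, shown here as a minimal one-dimensional sketch. It is only one of several clustering methods. The starting centroids are fixed by hand so the result is reproducible (real implementations usually pick them at random), and the spending figures are invented.

```python
def k_means_1d(points, centroids, iterations=10):
    """Group numbers around the given starting centroids using k-means."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 80, 85, 90]
centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print(clusters)
```

No labels such as "low spender" or "high spender" were supplied; the two groups come out of the data itself.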

Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
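
The e-mail example can be sketched with a tiny naive Bayes classifier, one common technique for text classification (not necessarily what any particular e-mail program uses). The four training messages are made up, and a real filter would need far more data.

```python
import math
from collections import Counter

# Hypothetical labelled training messages (the "known structure").
training = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]

word_counts = {"spam": Counter(), "legitimate": Counter()}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = set(word_counts["spam"]) | set(word_counts["legitimate"])

def classify(text):
    """Return the label with the highest naive Bayes log-probability."""
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Start from the prior probability of the label.
        score = math.log(label_counts[label] / len(training))
        for word in text.split():
            # Add-one smoothing so unseen words do not give a zero probability.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("cheap money now"))
print(classify("agenda for lunch"))
```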

Regression – Attempts to find a function that models the data with the least error.
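
The simplest case is fitting a straight line, where "least error" is usually taken to mean the smallest sum of squared differences between the line and the data (ordinary least squares). The sketch below assumes that definition; the function name and the figures are illustrative only.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical advertising spend and resulting sales.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

The fitted line can then be used to estimate sales for a level of spend that does not appear in the data.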

Summarization – Provides a more compact representation of the data set, including visualization and report generation.
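
As a small sketch of summarization, the code below reduces a list of individual sales records to one line of figures per region. The records, the field names and the choice of count, total and mean are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical detailed records: (region, sale amount).
sales = [
    ("north", 120),
    ("south", 80),
    ("north", 100),
    ("south", 90),
    ("east", 200),
]

def summarize(records):
    """Collapse individual records into count, total and mean per region."""
    groups = defaultdict(list)
    for region, amount in records:
        groups[region].append(amount)
    return {
        region: {"count": len(amounts), "total": sum(amounts), "mean": mean(amounts)}
        for region, amounts in groups.items()
    }

summary = summarize(sales)
for region, stats in summary.items():
    print(region, stats)
```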


