Before exploring data mining, it helps to understand the term data warehouse.
So the question is: what is a data warehouse?
DATA WAREHOUSE
A data warehouse integrates data from multiple data sources. It is a system used for
reporting and data analysis. It stores current and historical
data and is used to create analytical reports for knowledge workers
throughout the enterprise. Examples of
reports range from annual and quarterly comparisons and trends to
detailed daily sales analyses. Because it serves the whole enterprise, a data warehouse is also known
as an enterprise data warehouse (EDW).
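To make the idea of integrating sources and reporting on them concrete, here is a minimal sketch in Python. The two source systems (`store_sales`, `web_orders`), their record layouts, and all figures are invented for illustration; a real warehouse would sit on a database rather than in-memory lists.

```python
from collections import defaultdict
from datetime import date

# Two hypothetical source systems with different record layouts.
store_sales = [
    {"sold_on": date(2023, 1, 15), "amount": 120.0},
    {"sold_on": date(2023, 4, 2), "amount": 80.0},
]
web_orders = [
    ("2023-02-10", 200.0),
    ("2023-05-21", 50.0),
]


def quarter_of(day):
    """Return a label such as '2023-Q1' for a date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def build_warehouse(store_rows, web_rows):
    """Integrate both sources into one list of (date, amount, source) facts."""
    facts = [(row["sold_on"], row["amount"], "store") for row in store_rows]
    facts += [(date.fromisoformat(day), amount, "web") for day, amount in web_rows]
    return facts


def quarterly_report(facts):
    """Sum sales per quarter across all integrated sources."""
    totals = defaultdict(float)
    for sold_on, amount, _source in facts:
        totals[quarter_of(sold_on)] += amount
    return dict(totals)


warehouse = build_warehouse(store_sales, web_orders)
report = quarterly_report(warehouse)
print(report)
```

The point of the sketch is only the shape of the work: records from different systems are converted to one common layout first, and the reports are then computed from that single store.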
Data warehousing is defined as a process of centralized data
management and retrieval. Data warehousing, like data mining, is a relatively
new term although the concept itself has been around for years. Data
warehousing represents an ideal vision of maintaining a central repository of
all organizational data.
DATA MINING
It is rightly said that data is money in today’s world. Along with the transition to an app-based world
comes exponential growth in data. However, most of this data is unstructured,
so it takes a process and a method to extract useful information from the
data and transform it into an understandable and usable form. This is where data
mining comes into the picture. Plenty of tools are available for data mining tasks;
they use artificial intelligence, machine learning, and other techniques to extract
information from data.
Data mining is the process of discovering patterns in large
data sets using methods from artificial intelligence, machine learning, statistics, and
database systems. It is the process of analyzing data from
different perspectives and summarizing it into useful information - information
that can be used to increase revenue, cut costs, or both. Data mining software
is one of a number of analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize it, and
summarize the relationships identified. Technically, data mining is the process
of finding correlations or patterns among dozens of fields in large relational
databases.
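As a rough illustration of finding correlations among fields, the sketch below computes the Pearson correlation coefficient for every pair of numeric fields in a handful of records. The field names (`price`, `units_sold`, `staff`) and the values are made up for the example, and Pearson correlation is just one simple measure chosen here; real data mining tools use many others.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate_fields(records):
    """Return {(field_a, field_b): correlation} for every pair of fields."""
    fields = sorted(records[0])
    columns = {name: [row[name] for row in records] for name in fields}
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(fields, 2)
    }


# Hypothetical records, standing in for rows of a relational table.
records = [
    {"price": 10, "units_sold": 100, "staff": 3},
    {"price": 20, "units_sold": 80, "staff": 4},
    {"price": 30, "units_sold": 60, "staff": 3},
    {"price": 40, "units_sold": 40, "staff": 5},
]

correlations = correlate_fields(records)
for pair, value in correlations.items():
    print(pair, round(value, 3))
```

A value near -1 or 1 suggests a strong linear relationship between two fields, while a value near 0 suggests none. In this invented data, units sold fall steadily as price rises, so that pair comes out strongly negative.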
Data mining is primarily used today by companies with a
strong consumer focus - retail, financial, communication, and marketing
organizations. It enables these companies to determine relationships among
"internal" factors such as price, product positioning, or staff
skills, and "external" factors such as economic indicators, competition,
and customer demographics. It also enables them to determine the impact on
sales, customer satisfaction, and corporate profits. Finally, it enables them
to "drill down" into summary information to view detailed transactional
data.
Data mining has been used to:
- Identify unexpected shopping patterns in supermarkets.
- Optimize website profitability by making appropriate offers to each visitor.
- Predict customer response rates in marketing campaigns.
- Define new customer groups for marketing purposes.
- Predict customer defections: which customers are likely to switch to an alternative supplier in the near future.
- Distinguish between profitable and unprofitable customers.
- Improve yields in complex production processes by finding unexpected relationships between process parameters and defect rates.
- Identify "wedge issues" and target political campaigns.
- Identify suspicious (unusual) behavior, as part of a fraud detection process.
In short, data mining can be applied anywhere in your
business or organization where you are interested in identifying and exploiting
predictable outcomes.
Data Mining Tasks
Data mining involves six common classes of tasks:
Anomaly detection – The identification of unusual data records that might be interesting, or of data errors that require further investigation.
Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Clustering – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.
Classification – The task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
Regression – Attempts to find a function that models the data with the least error.
Summarization – Provides a more compact representation of the data set, including visualization and report generation.
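Three of the tasks above can be sketched in a few lines of Python. This is a simplified illustration, not how production tools work. The function names, the z-score threshold of 2.0, and the sample data are all assumptions made for the example. Anomaly detection is shown as a simple z-score test, association rule learning as the support and confidence of a single rule, and regression as a least-squares straight line.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def rule_stats(baskets, antecedent, consequent):
    """Association rules: support and confidence of the rule antecedent -> consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    support = len(with_both) / len(baskets)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = mean(xs), mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical sample data.
daily_sales = [100, 102, 98, 101, 99, 100, 250]
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

anomalies = find_anomalies(daily_sales)
support, confidence = rule_stats(baskets, "bread", "butter")
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

print("anomalies:", anomalies)
print("bread -> butter: support", support, "confidence", round(confidence, 2))
print("fitted line: y =", slope, "* x +", intercept)
```

In the market basket example, support is the share of all baskets that contain both products, and confidence is the share of bread baskets that also contain butter. A supermarket would look for rules where both numbers are high.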