Back to basics: What is Data Mining? - My Datafication

25 September, 2018



Definition

Data Mining is the process of discovering patterns and generating new, valuable information from large data sets in order to solve problems and support decision making. It draws on methods and techniques from the intersection of data management, statistics, and machine learning to identify previously unknown patterns, classify and group data, and summarize relationships that were not apparent before.

Most Useful Data Mining Techniques


1. Clustering
A descriptive data mining technique that groups data objects so that objects in the same cluster are similar to one another and dissimilar to objects in other clusters.
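
As an illustration, here is a minimal sketch of k-means, one common clustering algorithm (the post does not prescribe a particular one), written in plain Python. The function name `kmeans`, its parameters, and the sample points are hypothetical; library implementations such as scikit-learn's `KMeans` add smarter initialization and many more options.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Group 2-D points into k clusters (illustrative sketch, hypothetical API)."""
    rng = random.Random(seed)
    # Initialization: pick k distinct input points at random as starting centroids.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids (and therefore assignments) stopped changing
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

With well-separated data like this, the two clusters found should be the three points around (1, 1) and the three points around (8.5, 8.5). Note that no labels were supplied: the grouping comes purely from the similarity between points.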

2. Classification
A predictive data mining technique that assigns items of a collection to target classes, i.e. categories. The goal of classification is to accurately predict the target class for each case in the data.
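
A classifier learns from examples whose class is already known. The sketch below uses k-nearest neighbours, one simple classification method chosen here for illustration: a new case receives the majority class among the k labeled examples closest to it. The function name, the labels "small" and "large", and the data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Predict the class of `query` from labeled (features, label) pairs (illustrative sketch)."""
    # Rank the labeled examples by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Majority vote; on a tie, Counter.most_common keeps the label encountered first.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.2), "small"), ((0.8, 1.0), "small"), ((1.1, 0.9), "small"),
    ((5.0, 5.5), "large"), ((5.2, 4.8), "large"), ((4.9, 5.1), "large"),
]
print(knn_classify(train, (1.0, 1.0)))  # -> small
print(knn_classify(train, (5.0, 5.0)))  # -> large
```

The output is always one of the known categories, which is what distinguishes classification from regression.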

3. Regression
A predictive data mining technique that predicts a continuous variable, e.g. a stock price, from a given data set. Regression and classification are frequently confused because they are used to solve similar problems. Both are predictive data mining techniques, but regression predicts a numeric, continuous value, while classification assigns data to discrete categories, i.e. it predicts the bucket a data object falls into.
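
To contrast with classification, here is a minimal sketch of simple linear regression fitted by ordinary least squares, one standard regression method assumed here for illustration. Its output is a number on a continuous scale rather than a category. The function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (illustrative sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope is the ratio of summed cross-deviations to summed squared x-deviations,
    # i.e. cov(x, y) / var(x); the 1/n factors cancel out.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs)
    slope = cross / spread
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Return the continuous prediction of a fitted (intercept, slope) model."""
    intercept, slope = model
    return intercept + slope * x


model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(model)              # -> (1.0, 2.0)
print(predict(model, 5))  # -> 11.0
```

The prediction for an unseen input (here x = 5) is a value that never appeared in the training data, something a classifier, which can only return one of its known classes, would not produce.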

4. Association Rules
A rule-based descriptive data mining technique that explores a given data set and finds frequent patterns, correlations, associations, or causal structures. Given a set of transactions, association rule mining looks for rules that predict the occurrence of a specific item based on the occurrences of other items in the same transaction.
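
Association rules are usually judged by two measures: support (the fraction of transactions that contain all items of the rule) and confidence (how often the right-hand item appears in transactions that contain the left-hand item). The brute-force sketch below only considers rules between single items and uses hypothetical thresholds and shopping-basket data; practical tools rely on algorithms such as Apriori or FP-Growth to handle larger itemsets efficiently.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Find single-item rules lhs -> rhs (illustrative brute-force sketch, hypothetical API)."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]
    items = sorted(set().union(*baskets))

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for b in baskets if itemset <= b) / n

    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue  # the pair is not frequent enough to form a rule
        # Check both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
print(association_rules(transactions))  # -> [('butter', 'bread', 0.5, 1.0)]
```

Here butter -> bread holds with confidence 1.0 (every basket with butter also has bread), while bread -> butter, with a confidence of about 0.67, falls below the 0.7 threshold.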

The above techniques can be grouped into two broad categories, supervised learning (Classification and Regression) and unsupervised learning (Clustering and Association Rules), which differ in how they process data. Supervised learning algorithms are trained on labeled examples, i.e. inputs paired with known outputs, to learn a mapping function that produces the output from the input. Unsupervised learning algorithms, on the other hand, have only input data and no corresponding outputs. The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to identify hidden patterns and extract previously unknown knowledge.
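
The difference shows up directly in what each kind of algorithm receives. In the hypothetical sketch below, the supervised learner gets inputs together with their outputs and returns a prediction function (a nearest-centroid classifier), while the unsupervised one gets the same inputs without outputs and can only look for structure, here by splitting one-dimensional values at their largest gap (equivalent to single-linkage clustering stopped at two clusters). Both methods, the function names, and the data are illustrative choices.

```python
from statistics import mean


def train_supervised(inputs, outputs):
    """Supervised: learn a mapping input -> output from labeled examples (illustrative sketch)."""
    # Nearest-centroid classifier: remember the mean input seen for each label.
    centroids = {label: mean(x for x, y in zip(inputs, outputs) if y == label)
                 for label in set(outputs)}

    def predict(x):
        # Predict the label whose mean input is closest to x.
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict


def find_groups(inputs):
    """Unsupervised: no outputs are given, so only the structure of the inputs can be used.

    Splits at least two 1-D values into two groups at the largest gap between neighbours.
    """
    values = sorted(inputs)
    gaps = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return values[:cut], values[cut:]


inputs = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]
outputs = ["low", "low", "low", "high", "high", "high"]

predict = train_supervised(inputs, outputs)
print(predict(1.1), predict(4.9))  # -> low high
print(find_groups(inputs))         # -> ([0.9, 1.0, 1.2], [4.8, 5.1, 5.3])
```

The unsupervised function recovers the same two groups, but it cannot call them "low" and "high" because no outputs were provided; naming and interpreting the discovered structure is left to the analyst.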

Data Mining Tools

1. RapidMiner
2. R
3. Python
4. Weka
5. KNIME
6. Microsoft Analysis Services

and many more...

You can find more interesting definitions every data scientist should know here! If there is a topic or definition you would like to read about, just leave a comment below. If you like the blog, don't forget to "Like" the page on Facebook to keep up to date with new posts.
