Data mining

the process of extracting and discovering patterns in large data sets

Data mining is a term from computer science. Sometimes it is also called knowledge discovery in databases (KDD). Data mining is about finding new information in a lot of data.[1] The goal is for the information obtained from data mining to be both new and useful.

In many cases, data are stored so they can be used later. The data are saved with a goal. For example, a store wants to record what has been bought. It does this to know how much stock it should buy, so that it has enough to sell later. Saving this information creates a lot of data. The data are usually saved in a database. The reason the data are saved is called the first use.

Later, the same data can also be used to get other information that was not needed for the first use. The store might now want to know which things people buy together. (For example, many people who buy pasta also buy mushrooms.) That kind of information is in the data and is useful, but it was not the reason the data were saved. This information is new and can be useful. It is a second use for the same data.
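The idea of a second use can be shown with a small sketch. It counts how often two items appear in the same purchase. The purchase records and the function name `count_pairs` are made up for this example. A real store would read its data from a database.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records: each set is one customer's basket.
baskets = [
    {"pasta", "mushrooms", "cream"},
    {"pasta", "mushrooms"},
    {"bread", "butter"},
    {"pasta", "tomatoes"},
    {"mushrooms", "pasta", "bread"},
]

def count_pairs(baskets):
    """Count how often each pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one fixed order, so
        # ("mushrooms", "pasta") is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

pair_counts = count_pairs(baskets)
most_common_pair, times = pair_counts.most_common(1)[0]
print(most_common_pair, times)
```

In this invented data, pasta and mushrooms are bought together more often than any other pair. The data were saved to track sales, but they also answer this new question.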

Finding new, useful information in existing data is called data mining.

Different kinds of data mining

There are many different kinds of data mining for getting new information from data. Usually, prediction is involved, and there is uncertainty in the predicted results. The examples below use data about apples. Each row in the data describes one apple, for example a small green apple.

Some kinds of data mining are:

  • Pattern recognition: Trying to find similarities between the rows in the database, in the form of rules. For example: small → green (small apples are often green).
  • Using a Bayesian network: Trying to build a model that shows how the different data attributes are connected and influence each other. For example, the size and the colour are related, so if you know something about the size, you can guess the colour.
  • Using a neural network: Trying to build a model that works a little like a brain. If we tell the computer that an apple is green, it can tell us that the apple has a higher chance of being sour. This is a black box model: it is hard to understand how it gets its answer, but it often works.
  • Using a classification tree: Using everything else that is known about a thing to predict one more property of it. For example: here is an apple with a size, a colour and a shininess. What will it taste like?
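A rule such as small → green can be checked by counting. The sketch below works out how often the rule holds in a small table of apples. The apple records and the function name `rule_confidence` are invented for this example. The result is a fraction, not a certainty, which shows the uncertainty in predictions.

```python
# Hypothetical apple records, invented for illustration only.
apples = [
    {"size": "small", "colour": "green", "taste": "sour"},
    {"size": "small", "colour": "green", "taste": "sour"},
    {"size": "small", "colour": "red", "taste": "sweet"},
    {"size": "large", "colour": "red", "taste": "sweet"},
    {"size": "large", "colour": "red", "taste": "sweet"},
    {"size": "large", "colour": "green", "taste": "sour"},
]

def rule_confidence(rows, if_attr, if_value, then_attr, then_value):
    """Return the fraction of rows matching the 'if' part that also match the 'then' part."""
    matching = [row for row in rows if row[if_attr] == if_value]
    if not matching:
        return None  # The rule cannot be checked without matching rows.
    hits = sum(1 for row in matching if row[then_attr] == then_value)
    return hits / len(matching)

# Rule: small -> green
confidence = rule_confidence(apples, "size", "small", "colour", "green")
print(f"small -> green holds for {confidence:.0%} of small apples")
```

With these invented records, two of the three small apples are green. The rule is often true, but not always. Real data mining tools use the same kind of counting on much larger tables.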

References

  1. "What is Data Mining? | Data Basecamp". 2022-04-06. Retrieved 2022-06-20.