Text data is ubiquitous and growing rapidly. Sources include the Internet, blogs, news, email, literature, Twitter, and more. This volume presents a challenge: no one can read and digest all of it, so tools are needed to help people make sense of the data. Text data can also be mined for knowledge that supports better decision making. For example, product managers today apply data mining techniques to customer feedback and sales reports to extract information that helps them grow their market.
The main techniques for harnessing big text data are Text Retrieval and Text Mining.
The terms Text Mining and Text Analytics mean roughly the same thing. Mining puts more emphasis on the process, while Analytics focuses more on the result. In both cases, we turn text data into high-quality information or actionable knowledge, so as to minimize human effort and supply knowledge for optimal decision making.
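Text retrieval is an essential component of any text mining system, where it acts as the preprocessor: it narrows a large collection down to the relevant documents before any mining takes place.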
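To make the relationship between retrieval and mining concrete, here is a minimal sketch in Python. It is only an illustration under simple assumptions: the toy `documents` list, the whitespace tokenizer, the term-overlap `retrieve` function, and the frequency-based `mine_top_terms` function are all hypothetical stand-ins, not the method of any particular system. Real retrieval engines use ranking models, and real mining goes well beyond counting words.

```python
from collections import Counter

# Hypothetical toy collection standing in for a large body of text data.
documents = [
    "The battery life of this phone is great",
    "Shipping was slow and the box was damaged",
    "Great phone but the battery drains fast",
    "The recipe needs more salt",
]


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def retrieve(query, docs):
    """Retrieval step: keep only documents sharing at least one term with the query."""
    query_terms = set(tokenize(query))
    return [doc for doc in docs if query_terms & set(tokenize(doc))]


def mine_top_terms(docs, stopwords, k=3):
    """Mining step: count non-stopword terms across the retrieved documents."""
    counts = Counter(
        term for doc in docs for term in tokenize(doc) if term not in stopwords
    )
    return counts.most_common(k)


# Retrieval first narrows the collection; mining then runs on the smaller set.
relevant = retrieve("phone battery", documents)
top_terms = mine_top_terms(relevant, stopwords={"the", "of", "this", "is", "but"})
print(relevant)
print(top_terms)
```

The point of the design is the ordering: the mining function never sees the unrelated documents, because retrieval has already filtered them out.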
Data can be broadly classified into two types:
- Text Data
- Non-Text Data (Numerical, Categorical, Relational, Video)
Non-text data is at times very important for extracting knowledge. The data mining software module contains a number of different kinds of mining algorithms, because different types of data call for different types of algorithms.
- Pattern Discovery in Data Mining
- Text Retrieval and Search Engines
- Cluster Analysis
- Data Visualization