Over the last month, I studied Data Mining. I learned a lot about the terminology used across various industries, as well as common processes, techniques, and business applications. The experience was both rewarding and challenging.
Over those weeks, I conducted research and contributed to group discussions and creative projects on Data Mining topics. I applied what I learned from academic mentors, scholarly articles and books, and online video repositories to research projects in the following areas:
Data Mining Tools and Vendors (Review/Analysis)
Technology Enabling Business Analytics
Model-based Decision Making (Basic Concepts)
Data Mining Trends, Research, and Applications
One of my favorite projects this past month used data from the last 28 seasons of Dancing With The Stars (DWTS). The DWTS project allowed me to get my hands dirty collecting, organizing, and cleaning data. I then used Microsoft’s Power BI to apply common Data Mining techniques and produce analytic insights through reports and visualizations.
Scholarly Book Review
I really enjoyed reading about Data Mining from the authors Han, Kamber, and Pei. Here’s an overview of their book, taken from the O’Reilly (2011) eBook outline:
“Data Mining: Concepts and Techniques provides the concepts and techniques in processing gathered data or information, which will be used in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data. This book is referred as the knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques of large data sets. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss the outlier detection and the trends, applications, and research frontiers in data mining.
This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining.
Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects
Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields
Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data”
What I enjoyed most about Data Mining: Concepts and Techniques, 3rd Edition, was its modern approach to explaining both simple and complex topics, such as Data Mining (DM) techniques, advanced algorithms (e.g., clustering, regression, and Naive Bayes), and industry-standard DM process models (e.g., CRISP-DM and SEMMA).