Automating Data Exploration with R
4.4 (172 ratings)
What you’ll learn
- Build a pipeline to automate the processing of raw data for discovery and modeling
- Know the main steps to prepare data for modeling
- Know how to handle the different data types in R
- Understand data imputation
- Treat categorical data properly with binarization (making dummy columns)
- Apply feature engineering to dates, integers and real numbers
- Apply variable selection, correlation and significance tests
- Model and measure prepared data using both supervised and unsupervised modeling
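As a rough illustration of three of the steps listed above (imputation, binarization into dummy columns, and feature engineering on dates), here is a minimal base-R sketch. The helper names `impute_numeric()`, `binarize()` and `expand_dates()` are hypothetical. Its choices (median imputation, one dummy column per level, year/month/weekday date features) are assumptions for illustration, not necessarily the approach taught in the course.

```r
# Hypothetical helpers for illustration only; not functions from the course or any package.

# Imputation (assumed strategy): replace NA in numeric columns with the column median
impute_numeric <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

# Binarization: turn each character/factor column into 0/1 dummy columns, one per level
binarize <- function(df) {
  is_cat <- vapply(df, function(x) is.character(x) || is.factor(x), logical(1))
  for (col in names(df)[is_cat]) {
    values <- as.character(df[[col]])
    for (lvl in sort(unique(values[!is.na(values)]))) {
      df[[paste0(col, "_", lvl)]] <- as.integer(!is.na(values) & values == lvl)
    }
    df[[col]] <- NULL  # drop the original categorical column
  }
  df
}

# Date feature engineering (assumed features): year, month and weekday (0 = Sunday)
expand_dates <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  for (col in names(df)[is_date]) {
    lt <- as.POSIXlt(df[[col]])
    df[[paste0(col, "_year")]] <- lt$year + 1900L
    df[[paste0(col, "_month")]] <- lt$mon + 1L
    df[[paste0(col, "_wday")]] <- lt$wday
    df[[col]] <- NULL  # drop the original date column
  }
  df
}

# Usage on a tiny made-up data set
raw <- data.frame(
  price = c(10, NA, 30),
  color = c("red", "blue", "red"),
  sold  = as.Date(c("2024-01-15", "2024-02-20", "2024-03-05"))
)
prepared <- expand_dates(binarize(impute_numeric(raw)))
str(prepared)
```

Keeping every level as its own dummy column is convenient for exploration. For linear models you would typically drop one level per variable to avoid perfect collinearity.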
As data scientists and analysts, we constantly face repetitive tasks when approaching new data sets. This class aims to automate many of these tasks so that we can get to the actual analysis as quickly as possible. Of course, there will always be exceptions to the rule, and some manual work and customization will still be required. But overall, a large swath of that work can be automated by building a smart pipeline, and that is what we’ll do here. This is especially important in the era of big data, where handling every variable by hand isn’t always possible.
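One way to picture such a pipeline is an ordered list of stages folded over the data with `Reduce()`. This sketch assumes every stage is a function that takes a data frame and returns a data frame. The stages `drop_mostly_missing()`, `drop_constant()` and `fill_numeric_na()`, and the 50% missing-value threshold, are hypothetical examples, not the course’s actual pipeline.

```r
# Hypothetical pipeline stages; each takes a data frame and returns a data frame.

# Drop columns whose share of missing values exceeds max_na (assumed threshold: 50%)
drop_mostly_missing <- function(df, max_na = 0.5) {
  keep <- vapply(df, function(x) mean(is.na(x)) <= max_na, logical(1))
  df[, keep, drop = FALSE]
}

# Drop columns that hold a single value and therefore carry no information
drop_constant <- function(df) {
  keep <- vapply(df, function(x) length(unique(x)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

# Fill remaining numeric NAs with the column mean
fill_numeric_na <- function(df) {
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df
}

# Apply the stages in order: stage_n(...stage_2(stage_1(df)))
run_pipeline <- function(df, stages) {
  Reduce(function(acc, stage) stage(acc), stages, init = df)
}

raw <- data.frame(
  id       = 1:4,
  constant = rep("A", 4),
  sparse   = c(NA, NA, NA, 1),
  score    = c(2, NA, 4, 6)
)
clean <- run_pipeline(raw, list(drop_mostly_missing, drop_constant, fill_numeric_na))
print(clean)
```

Because every stage shares the same data-frame-in, data-frame-out contract, stages can be reordered, removed or tested on their own.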
It is also a great learning strategy to think in terms of a processing pipeline and to understand, design and build each stage as a separate, independent unit.
Who this course is for:
- Anyone who wants or needs to process raw data for exploration and modeling in R
Automate Data Exploration and Treatment
Automated data exploration process for analytic tasks and predictive modeling, so that users can focus on understanding data and extracting insights. The package scans and analyzes each variable and visualizes it with typical graphical techniques. Data processing methods are also available to treat and format data.
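Here is a minimal base-R sketch of the "scan each variable, then visualize it" idea. It builds one summary row per column (class, missing values, distinct values), then draws a histogram for numeric columns and a bar chart for everything else. The function names `scan_variables()` and `plot_variables()` are hypothetical and do not reproduce the package’s own API.

```r
# Hypothetical helpers for illustration; not the API of the package described above.

# One summary row per variable: class, missing count/percentage, distinct values
scan_variables <- function(df) {
  rows <- lapply(names(df), function(col) {
    x <- df[[col]]
    data.frame(
      variable    = col,
      class       = class(x)[1],
      n_missing   = sum(is.na(x)),
      pct_missing = round(100 * mean(is.na(x)), 1),
      n_distinct  = length(unique(x[!is.na(x)])),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# A typical chart per type: histogram for numeric columns, bar chart otherwise
plot_variables <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x)) {
      hist(x, main = col, xlab = col)  # hist() drops non-finite values itself
    } else {
      barplot(table(x, useNA = "ifany"), main = col)
    }
  }
  invisible(NULL)
}

example <- data.frame(
  age  = c(23, 35, NA, 41, 35),
  city = c("Paris", "Lyon", "Paris", NA, "Nice")
)
summary_table <- scan_variables(example)
print(summary_table)

pdf(NULL)  # null graphics device so this sketch runs non-interactively
plot_variables(example)
invisible(dev.off())
```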