PySpark for Data Science - III: Data Cleaning and Analysis

  • Dive into the world of big data processing with PySpark, the Python library for Apache Spark.
  • Learn how to process, analyze, and derive insights from massive datasets using Python’s user-friendly
    interface.
  • Elevate your data skills with PySpark. Dive deep into distributed data processing, machine learning,
    streaming, and more to navigate the vast oceans of big data.

Created by Selva Prabhakaran

  • English

  • English Captions

What you will learn

01 Identifying Variable Types
02 Outlier Detection and Treatment
03 Identifying and Removing Duplicates
04 Feature Encoding with PySpark
05 Missing Value Imputation with PySpark
06 Feature Scaling with PySpark
07 Feature Extraction / Dimensionality Reduction

Course Curriculum

Requirements

  • Basics of Python
  • Foundational knowledge of Data Science
  • High school maths

Who should attend this course?

  • Data Science Aspirants

  • Data Science Professionals

  • Software/Data engineers interested in quantitative analysis

  • Professionals working with large datasets

  • Data analysts, economists, researchers

About the course

You will learn the following skills by the end of the course:

  • LightGBM
  • XGBoost
  • Random Forest
  • Decision Tree
  • Logistic Regression
  • Hyperparameter Tuning
  • Feature Importance
  • Confusion Matrix
  • ROC AUC
  • Concordance and Discordance
  • Precision Recall Curve
  • Capture Rates and Gains
  • Feature Engineering
  • Label Encoding
  • Frequency Encoding
  • Chi-Square test
  • ANOVA test
  • Exploratory Data Analysis
  • Memory Optimization
  • Data Preprocessing

Instructor

Selva Prabhakaran, Principal Data Scientist

My name is Selva, and I am super excited to mentor you on this project!

I head the Data Science team at a global Fortune 500 company, and over my 10 years in data science I’ve deployed 20+ global products. I’m also the Founder & Chief Author of Machine Learning Plus, which has over 4M annual readers.

I specialize in covering the in-depth intuition and maths behind any concept or algorithm. Based on requests from my existing students, I’ve put together this series of courses and projects with detailed explanations, just like on-the-job experience. Hope you love it!

  • 4.8+ Instructor rating

  • 200+ reviews

  • 75K+ students

  • 40+ Courses

machinelearningplus 2024