1 - The Python Data Science Stack
Python Libraries and Packages
Using Pandas
Data Type Conversion
Aggregation and Grouping
Exporting Data from Pandas
Visualization with Pandas
2 - Statistical Visualizations
Types of Graphs and When to Use Them
Components of a Graph
Which Tool Should Be Used?
Types of Graphs
Pandas DataFrames and Grouped Data
Changing Plot Design: Modifying Graph Components
Exporting Graphs
3 - Working with Big Data Frameworks
Hadoop
Spark
Writing Parquet Files
Handling Unstructured Data
4 - Diving Deeper with Spark
Getting Started with Spark DataFrames
Writing Output from Spark DataFrames
Exploring Spark DataFrames
Data Manipulation with Spark DataFrames
Graphs in Spark
5 - Handling Missing Values and Correlation Analysis
Setting up the Jupyter Notebook
Missing Values
Handling Missing Values in Spark DataFrames
Correlation
6 - Exploratory Data Analysis
Defining a Business Problem
Translating a Business Problem into Measurable Metrics and Exploratory Data Analysis (EDA)
Structured Approach to the Data Science Project Life Cycle
7 - Reproducibility in Big Data Analysis
Reproducibility with Jupyter Notebooks
Gathering Data in a Reproducible Way
Code Practices and Standards
Avoiding Repetition
8 - Creating a Full Analysis Report
Reading Data in Spark from Different Data Sources
SQL Operations on a Spark DataFrame
Generating Statistical Measurements
Actual course outline may vary depending on offering center. Contact your sales representative for more information.
Who is it For?
Big Data Analysis with Python is designed for Python developers, data analysts, and data scientists who want hands-on experience with methods for managing data and transforming it into impactful insights. Basic knowledge of statistical measurements and relational databases will help in understanding the concepts covered in this course.