Chapter 4: Data Visualization / Lesson 18

Seaborn Introduction

Introduction to Seaborn

Seaborn is a statistical data visualization library built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics with less code than Matplotlib.

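Seaborn is particularly useful for exploring relationships in datasets, producing publication-quality plots, and showing simple statistical summaries such as group means with confidence intervals or fitted regression lines. It also handles many styling details automatically, including axis labels taken from column names, color palettes, and legends, which makes plots easier to read.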

Why Use Seaborn?

Because Seaborn builds on Matplotlib, its advantages are relative to using Matplotlib directly:

  • Better Defaults: More attractive default styles and color palettes
  • Statistical Plots: Built-in support for statistical visualizations
  • Less Code: Create complex plots with fewer lines
  • DataFrame Integration: Works seamlessly with Pandas DataFrames
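The "Less Code" and "DataFrame Integration" points are easiest to see side by side. The sketch below draws the same grouped scatter plot twice: once in plain Matplotlib, where you split the data and build the legend yourself, and once in Seaborn, where you only name the columns. The DataFrame and its `group` column are made up for illustration. The Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical example data
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 4, 6, 7],
    "group": ["A", "B", "A", "B", "A", "B"],
})

# Plain Matplotlib: split by group by hand, then label axes and legend yourself
fig1, ax1 = plt.subplots()
for name, sub in df.groupby("group"):
    ax1.scatter(sub["x"], sub["y"], label=name)
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.legend(title="group")

# Seaborn: pass column names; grouping, colors, axis labels and legend are automatic
fig2, ax2 = plt.subplots()
sns.scatterplot(data=df, x="x", y="y", hue="group", ax=ax2)

print(ax2.get_xlabel(), ax2.get_ylabel())  # labels come from the column names

plt.close("all")
```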

Basic Seaborn Plots

Seaborn makes it easy to create common statistical plots. Here are the basics:

seaborn_basics.py
# Basic Seaborn Visualizations
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Set style
sns.set_style("whitegrid")

# Sample data
data = {
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 6, 8, 10],
    'category': ['A', 'B', 'A', 'B', 'A']
}
df = pd.DataFrame(data)

print("Seaborn makes statistical plots easy:")
print(df)

# Line plot with Seaborn
# sns.lineplot(data=df, x='x', y='y')
# plt.title('Seaborn Line Plot')
# plt.show()

print("\nSeaborn automatically handles styling and makes plots more attractive!")

Distribution Plots

Seaborn excels at showing data distributions:

distribution_plots.py
# Distribution Plots with Seaborn
import seaborn as sns
import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 15, 1000)  # Mean=100, Std=15
df = pd.DataFrame({'values': data})

print("Distribution Plot Types:")
print(" 1. Histogram: sns.histplot(data=df, x='values')")
print(" 2. KDE Plot: sns.kdeplot(data=df, x='values')")
print(" 3. Box Plot: sns.boxplot(data=df, y='values')")
print(" 4. Violin Plot: sns.violinplot(data=df, y='values')")

print("\nDistribution statistics:")
print(f" Mean: {df['values'].mean():.2f}")
print(f" Median: {df['values'].median():.2f}")
print(f" Std Dev: {df['values'].std():.2f}")

Relationship Plots

Seaborn makes it easy to visualize relationships between variables:

relationship_plots.py
# Relationship Plots
import seaborn as sns
import pandas as pd

# Sample data with relationship
data = {
    'hours_studied': [10, 15, 20, 25, 30, 35, 40],
    'test_score': [60, 70, 75, 80, 85, 90, 95],
    'subject': ['Math', 'Math', 'Science', 'Science', 'Math', 'Science', 'Math']
}
df = pd.DataFrame(data)

print("Relationship Plot Types:")
print(" 1. Scatter Plot: sns.scatterplot(data=df, x='hours_studied', y='test_score')")
print(" 2. Line Plot: sns.lineplot(data=df, x='hours_studied', y='test_score')")
print(" 3. Reg Plot: sns.regplot(data=df, x='hours_studied', y='test_score')")
print(" (Shows regression line automatically)")

print("\nData:")
print(df)
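To check what `regplot` is drawing, the sketch below plots the study data twice. The left panel is a scatter plot colored by `subject` through `hue`, and the right panel is a `regplot`. It then compares the slope of the drawn line with NumPy's ordinary least-squares fit (`np.polyfit` with degree 1). This assumes regplot's default linear fit. The figure layout is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "hours_studied": [10, 15, 20, 25, 30, 35, 40],
    "test_score": [60, 70, 75, 80, 85, 90, 95],
    "subject": ["Math", "Math", "Science", "Science", "Math", "Science", "Math"],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot: hue colors points by subject and adds a legend
sns.scatterplot(data=df, x="hours_studied", y="test_score", hue="subject", ax=ax1)

# regplot: scatter plus a fitted straight line with a confidence band
sns.regplot(data=df, x="hours_studied", y="test_score", ax=ax2)

# Least-squares line computed directly with NumPy for comparison
slope, intercept = np.polyfit(df["hours_studied"], df["test_score"], 1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")  # slope=1.107, intercept=51.607

plt.close(fig)
```

A slope of about 1.1 means each extra hour of study goes with roughly 1.1 more points in this small sample.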

Categorical Plots

Seaborn excels at visualizing categorical data:

categorical_plots.py
# Categorical Plots
import seaborn as sns
import pandas as pd

# Sample categorical data
data = {
    'product': ['A', 'B', 'C', 'A', 'B', 'C'],
    'sales': [100, 150, 120, 110, 160, 130],
    'region': ['North', 'South', 'North', 'South', 'North', 'South']
}
df = pd.DataFrame(data)

print("Categorical Plot Types:")
# barplot draws the mean of y for each category (with an error bar), not the sum
print(" 1. Bar Plot: sns.barplot(data=df, x='product', y='sales')")
print(" 2. Box Plot: sns.boxplot(data=df, x='product', y='sales')")
print(" 3. Violin Plot: sns.violinplot(data=df, x='product', y='sales')")
# countplot needs only x: each bar's height is the number of rows in that category
print(" 4. Count Plot: sns.countplot(data=df, x='product')")

print("\nWith grouping (hue parameter):")
print(" sns.barplot(data=df, x='product', y='sales', hue='region')")
print(" (Groups by region within each product)")

print("\nData:")
print(df)
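A common surprise with `sns.barplot` is that it aggregates. Each bar shows the mean of `y` for its category (the default estimator), while `sns.countplot` counts rows. The sketch below draws both on the same data and reads the bar heights back from the Axes. Reading `ax.patches` this way is only a quick check for plots without `hue`. The two-panel layout is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "product": ["A", "B", "C", "A", "B", "C"],
    "sales": [100, 150, 120, 110, 160, 130],
    "region": ["North", "South", "North", "South", "North", "South"],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Bar height = mean sales per product
sns.barplot(data=df, x="product", y="sales", ax=ax1)

# Bar height = number of rows per product (no y needed)
sns.countplot(data=df, x="product", ax=ax2)

bar_heights = sorted(p.get_height() for p in ax1.patches)
count_heights = sorted(p.get_height() for p in ax2.patches)

print(df.groupby("product")["sales"].mean())  # A 105.0, B 155.0, C 125.0
print(bar_heights)    # [105.0, 125.0, 155.0]
print(count_heights)  # each product appears twice

plt.close(fig)
```

If you want totals instead of averages, pass a different aggregation function through barplot's `estimator` parameter (for example `estimator=sum`).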

Heatmaps

Heatmaps are great for showing correlations or matrix data:

heatmap.py
# Creating Heatmaps
import seaborn as sns
import pandas as pd
import numpy as np

# Correlation matrix example
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)

# Calculate correlation
correlation = df.corr()
print("Correlation Matrix:")
print(correlation)

print("\nHeatmap:")
print(" sns.heatmap(correlation, annot=True, cmap='coolwarm')")
print(" - annot=True: Shows correlation values")
print(" - cmap: Color scheme")
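When drawing a correlation heatmap, it helps to pin the color scale to the full range of a correlation coefficient. Without that, the colormap stretches to whatever values happen to appear. The sketch below passes `vmin=-1` and `vmax=1` so that 0 falls at the neutral center of the diverging `coolwarm` map. It uses `fmt=".2f"` to format the annotations and `square=True` for square cells. These settings and the output filename are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "feature1": [1, 2, 3, 4, 5],
    "feature2": [2, 4, 6, 8, 10],
    "feature3": [5, 4, 3, 2, 1],
})
correlation = df.corr()  # Pearson correlation by default

# Fix the color range to [-1, 1] so colors mean the same thing across heatmaps
ax = sns.heatmap(correlation, annot=True, fmt=".2f", cmap="coolwarm",
                 vmin=-1, vmax=1, square=True)
ax.set_title("Feature Correlation")

plt.savefig("correlation_heatmap.png", bbox_inches="tight")
plt.close()
```

Here `feature2` is exactly twice `feature1`, so their correlation is 1. `feature3` decreases as `feature1` increases, giving -1.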

Exercise: Create Seaborn Visualizations

Complete the following exercise:

  • Task 1: Create a DataFrame with sample data (x, y, category columns)
  • Task 2: Create a distribution plot (histogram) of the data
  • Task 3: Create a relationship plot (scatter plot) showing x vs y
  • Task 4: Create a categorical plot (bar chart) grouped by category

Write your code to create these Seaborn visualizations!

💡 Learning Tip

Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!
