Lesson 18: Seaborn Introduction

Introduction to Seaborn

Seaborn is a statistical data visualization library built on top of Matplotlib. It provides a high-level interface for creating attractive and informative statistical graphics with less code than Matplotlib.

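Seaborn is particularly useful for exploring relationships in datasets and producing publication-quality plots. It can also fit and draw simple statistical models, such as linear regressions. Many styling details are handled automatically, including color palettes, axis labels taken from DataFrame column names, and legends for grouped data.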

Why Use Seaborn?

Seaborn offers several advantages over Matplotlib:

Better Defaults: More attractive default styles and color palettes
Statistical Plots: Built-in support for statistical visualizations
Less Code: Create complex plots with fewer lines
DataFrame Integration: Works seamlessly with Pandas DataFrames
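To make the "Less Code" and "DataFrame Integration" points concrete, here is a minimal side-by-side sketch that draws the same grouped scatter plot twice, using a small x/y/category DataFrame like the one in the next example. The Matplotlib version has to loop over the groups and label everything by hand. The Seaborn version only names the columns and passes a `hue` column. The figure size is an arbitrary choice, and `plt.show()` is left out so the sketch also runs without a display.

```python
# Seaborn vs. Matplotlib: the same grouped scatter plot
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 6, 8, 10],
    'category': ['A', 'B', 'A', 'B', 'A'],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: split the data into groups yourself and label everything by hand
for name, group in df.groupby('category'):
    ax_mpl.scatter(group['x'], group['y'], label=name)
ax_mpl.set_xlabel('x')
ax_mpl.set_ylabel('y')
ax_mpl.legend(title='category')
ax_mpl.set_title('Matplotlib')

# Seaborn: pass column names; grouping, colors, axis labels, and legend are automatic
sns.scatterplot(data=df, x='x', y='y', hue='category', ax=ax_sns)
ax_sns.set_title('Seaborn')
```

In a script, call `plt.show()` at the end to open the figure window.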

Basic Seaborn Plots

Seaborn makes it easy to create common statistical plots. Here are the basics:

seaborn_basics.py
# Basic Seaborn Visualizations
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Set style
sns.set_style("whitegrid")

# Sample data
data = {
    'x': [1, 2, 3, 4, 5],
    'y': [2, 4, 6, 8, 10],
    'category': ['A', 'B', 'A', 'B', 'A']
}
df = pd.DataFrame(data)

print("Seaborn makes statistical plots easy:")
print(df)

# Line plot with Seaborn (axis labels are taken from the column names)
sns.lineplot(data=df, x='x', y='y')
plt.title('Seaborn Line Plot')
plt.show()

print("\nSeaborn automatically handles styling and makes plots more attractive!")

Distribution Plots

Seaborn excels at showing data distributions:

distribution_plots.py
# Distribution Plots with Seaborn
import seaborn as sns
import pandas as pd
import numpy as np

# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 15, 1000)  # Mean=100, Std=15
df = pd.DataFrame({'values': data})

print("Distribution Plot Types:")
print("  1. Histogram: sns.histplot(data=df, x='values')")
print("  2. KDE Plot: sns.kdeplot(data=df, x='values')")
print("  3. Box Plot: sns.boxplot(data=df, y='values')")
print("  4. Violin Plot: sns.violinplot(data=df, y='values')")

print("\nDistribution statistics:")
print(f"  Mean: {df['values'].mean():.2f}")
print(f"  Median: {df['values'].median():.2f}")
print(f"  Std Dev: {df['values'].std():.2f}")

Relationship Plots

Seaborn makes it easy to visualize relationships between variables:

relationship_plots.py
# Relationship Plots
import seaborn as sns
import pandas as pd

# Sample data with relationship
data = {
    'hours_studied': [10, 15, 20, 25, 30, 35, 40],
    'test_score': [60, 70, 75, 80, 85, 90, 95],
    'subject': ['Math', 'Math', 'Science', 'Science', 'Math', 'Science', 'Math']
}
df = pd.DataFrame(data)

print("Relationship Plot Types:")
print("  1. Scatter Plot: sns.scatterplot(data=df, x='hours_studied', y='test_score')")
print("  2. Line Plot: sns.lineplot(data=df, x='hours_studied', y='test_score')")
print("  3. Reg Plot: sns.regplot(data=df, x='hours_studied', y='test_score')")
print("     (Adds a fitted linear regression line with a confidence band)")

print("\nData:")
print(df)
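The sketch below draws two of these plots side by side. `hue='subject'` colors each point by subject, and `regplot` fits a straight line by least squares. As a cross-check, the same line is fitted with NumPy's `np.polyfit`. That comparison is only for illustration; Seaborn does not need it.

```python
# Scatter plot with hue, plus a regression plot
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'hours_studied': [10, 15, 20, 25, 30, 35, 40],
    'test_score': [60, 70, 75, 80, 85, 90, 95],
    'subject': ['Math', 'Math', 'Science', 'Science', 'Math', 'Science', 'Math'],
})

fig, (ax_scatter, ax_reg) = plt.subplots(1, 2, figsize=(10, 4))

# Color each point by the 'subject' column
sns.scatterplot(data=df, x='hours_studied', y='test_score', hue='subject', ax=ax_scatter)

# Points plus a least-squares line and its confidence band
sns.regplot(data=df, x='hours_studied', y='test_score', ax=ax_reg)

# Fit the same straight line directly with NumPy to see its coefficients
slope, intercept = np.polyfit(df['hours_studied'], df['test_score'], deg=1)
print(f"test_score ~ {slope:.2f} * hours_studied + {intercept:.2f}")
```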

Categorical Plots

Categorical plots compare a numeric variable across groups, or count how many rows fall into each group:

categorical_plots.py
# Categorical Plots
import seaborn as sns
import pandas as pd

# Sample categorical data
data = {
    'product': ['A', 'B', 'C', 'A', 'B', 'C'],
    'sales': [100, 150, 120, 110, 160, 130],
    'region': ['North', 'South', 'North', 'South', 'North', 'South']
}
df = pd.DataFrame(data)

print("Categorical Plot Types:")
print("  1. Bar Plot: sns.barplot(data=df, x='product', y='sales')")
print("     (Bar height = mean sales for each product)")
print("  2. Box Plot: sns.boxplot(data=df, x='product', y='sales')")
print("  3. Violin Plot: sns.violinplot(data=df, x='product', y='sales')")
print("  4. Count Plot: sns.countplot(data=df, x='product')")
print("     (Bar height = number of rows for each product)")

print("\nWith grouping (hue parameter):")
print("  sns.barplot(data=df, x='product', y='sales', hue='region')")
print("  (Draws a separate bar for each region within each product)")

print("\nData:")
print(df)
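By default `sns.barplot` does not plot the raw values. Each bar shows the mean of its group, and an error bar shows a bootstrapped confidence interval for that mean. This sketch computes the same means with a Pandas `groupby` so the bar heights can be checked, then draws the plain and `hue`-grouped versions:

```python
# Bar plots show group means by default
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'product': ['A', 'B', 'C', 'A', 'B', 'C'],
    'sales': [100, 150, 120, 110, 160, 130],
    'region': ['North', 'South', 'North', 'South', 'North', 'South'],
})

# The heights of the bars in the first plot equal these group means
means = df.groupby('product')['sales'].mean()
print(means)

fig, (ax_bar, ax_hue) = plt.subplots(1, 2, figsize=(10, 4))

# One bar per product: mean sales, with an error bar
sns.barplot(data=df, x='product', y='sales', ax=ax_bar)

# One bar per (product, region) pair
sns.barplot(data=df, x='product', y='sales', hue='region', ax=ax_hue)
```

In this small dataset every (product, region) pair has exactly one row, so the bars in the `hue` version show single values and have no visible error bars.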

Heatmaps

Heatmaps display a matrix of values as a grid of colored cells, which makes them a natural fit for correlation matrices:

heatmap.py
# Creating Heatmaps
import seaborn as sns
import pandas as pd
import numpy as np

# Correlation matrix example
data = {
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1]
}
df = pd.DataFrame(data)

# Calculate correlation
correlation = df.corr()

print("Correlation Matrix:")
print(correlation)

print("\nHeatmap:")
print("  sns.heatmap(correlation, annot=True, cmap='coolwarm')")
print("  - annot=True: Writes each correlation value inside its cell")
print("  - cmap='coolwarm': Diverging colormap (blue for low values, red for high values)")
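A sketch that draws the heatmap itself. The `vmin=-1`, `vmax=1`, and `center=0` arguments are an added choice, not part of the listing above: they pin the color scale to the full range of possible correlations. `fmt='.2f'` rounds the annotations to two decimals.

```python
# Drawing a correlation heatmap
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'feature1': [1, 2, 3, 4, 5],
    'feature2': [2, 4, 6, 8, 10],
    'feature3': [5, 4, 3, 2, 1],
})
correlation = df.corr()  # Pearson correlation by default

fig, ax = plt.subplots(figsize=(5, 4))

# Fix the color limits to [-1, 1] and center them on 0, so positive and
# negative correlations always get opposite colors
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm',
            vmin=-1, vmax=1, center=0, square=True, ax=ax)
ax.set_title('Feature Correlations')
```

Without `vmin` and `vmax`, the color limits come from the data, so the same color can stand for different correlation values in different plots.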

Exercise: Create Seaborn Visualizations

Complete the following exercise:

Task 1: Create a DataFrame with sample data (x, y, category columns)
Task 2: Create a distribution plot (histogram) of the data
Task 3: Create a relationship plot (scatter plot) showing x vs y
Task 4: Create a categorical plot (bar chart) grouped by category

Write your code to create these Seaborn visualizations!
