🎯 Project: Complete Data Visualization Analysis
This project will help you apply everything you've learned about data visualization. You'll analyze a dataset, create multiple visualizations, and extract insights using Matplotlib and Seaborn.
Visualization is crucial for understanding data before building ML models. This project will give you hands-on experience with real visualization workflows.
Exploring Data with Visualizations
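Visualizations help you understand your data. A typical workflow starts by loading and inspecting the dataset, then deciding which plot type answers which question. The snippet below builds a small sales dataset and prints such a plan: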
# Only pandas is needed to build and inspect the dataset
import pandas as pd
data = {
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'],
    'sales': [100, 120, 140, 130, 150, 180],
    'region': ['North', 'South', 'North', 'South', 'North', 'South'],
    'product': ['A', 'B', 'A', 'B', 'A', 'B']
}
df = pd.DataFrame(data)
print("Dataset Overview:")
print(df)
print(f"\nDataset shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
print("\nVisualization Plan:")
print(" 1. Time series: Sales trend over months")
print(" 2. Comparison: Sales by region")
print(" 3. Distribution: Sales distribution")
print(" 4. Relationship: Sales vs other features")
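Before plotting, a few numeric checks can confirm what the planned charts should show. The sketch below is one possible way to do that with the same dataset. The names `sales_summary` and `region_means` and the `change` column are illustrative and not part of the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "sales": [100, 120, 140, 130, 150, 180],
    "region": ["North", "South", "North", "South", "North", "South"],
    "product": ["A", "B", "A", "B", "A", "B"],
})

# Count, mean, min, max, and quartiles of the sales column
sales_summary = df["sales"].describe()

# Average sales per region (what the comparison chart will display)
region_means = df.groupby("region")["sales"].mean()

# Month-over-month change (what the trend line will display); first row is NaN
df["change"] = df["sales"].diff()

print(sales_summary)
print(region_means)
print(df[["month", "sales", "change"]])
```

In this small dataset every North row is product A and every South row is product B, so grouping by product gives the same split as grouping by region.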
Creating Multiple Visualizations
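No single chart answers every question, so a complete analysis combines several visualization types. The snippet below prints the call for each type and what it shows: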
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.DataFrame({
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'sales': [100, 120, 140, 130, 150],
    'region': ['North', 'South', 'North', 'South', 'North']
})
print("Visualization 1: Time Series (Line Plot)")
print(" plt.plot(df['month'], df['sales'])")
print(" Shows trend over time")
print("\nVisualization 2: Comparison (Bar Chart)")
print(" sns.barplot(data=df, x='region', y='sales')")
print(" Compares sales by region")
print("\nVisualization 3: Distribution (Histogram)")
print(" sns.histplot(data=df, x='sales')")
print(" Shows sales distribution")
print("\nVisualization 4: Statistical Summary (Box Plot)")
print(" sns.boxplot(data=df, x='region', y='sales')")
print(" Shows quartiles and outliers")
Combining Visualizations
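A dashboard arranges several plots in a single figure so they can be read together. The snippet below creates the figure and prints a planned 2x2 layout: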
import matplotlib.pyplot as plt
import seaborn as sns
fig = plt.figure(figsize=(14, 10))
print("Dashboard Layout:")
print(" Top Row: Overview plots (trend, summary)")
print(" Bottom Row: Detailed analysis (distribution, comparison)")
print("\nSubplot Structure:")
print(" [0,0] - Time series line plot")
print(" [0,1] - Summary statistics bar chart")
print(" [1,0] - Distribution histogram")
print(" [1,1] - Comparison box plot")
print("\nBenefits of dashboard:")
print(" - See multiple perspectives at once")
print(" - Identify patterns and outliers")
print(" - Communicate insights effectively")
Exercise: Complete Visualization Project
Complete the following tasks:
- Task 1: Create a DataFrame with sales data (month, sales, region)
- Task 2: Create a line plot showing sales trend over months
- Task 3: Create a bar chart comparing sales by region
- Task 4: Calculate and print summary statistics (mean, max, min sales)
- Task 5: Create a dashboard with 2x2 subplots showing all visualizations
Write your code to complete this comprehensive visualization project!