Creating Visualizations with Matplotlib
Visualizations help you understand your data, identify patterns, and communicate insights. Matplotlib is one of the most widely used plotting libraries in Python, and a working knowledge of it is valuable for data analysis and machine learning.
Good visualizations can reveal trends, outliers, and relationships that numbers alone cannot show. They're crucial for exploratory data analysis before building ML models.
Line Plots
Line plots are well suited to showing trends over time or how one variable changes with another:
import matplotlib.pyplot as plt
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [100, 120, 140, 130, 150, 180]
plt.figure(figsize=(8, 5))
plt.plot(months, sales, marker='o', linewidth=2, color='#22d3ee')
plt.title('Monthly Sales Trend', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Line plot created showing sales trend over 6 months")
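A line plot is often used to compare more than one trend on the same axes. The sketch below redraws the sales example through Matplotlib's object-oriented interface (`fig, ax = plt.subplots()`) and adds a second series with a legend. The `sales_previous_year` values are hypothetical, invented only for illustration, and the `Agg` backend is chosen on the assumption that no display is available.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
sales = [100, 120, 140, 130, 150, 180]
# Hypothetical comparison series, invented for illustration only
sales_previous_year = [90, 95, 110, 125, 120, 140]

fig, ax = plt.subplots(figsize=(8, 5))
# Each call to ax.plot adds one line; the label is what the legend displays
ax.plot(months, sales, marker='o', linewidth=2, label='This year')
ax.plot(months, sales_previous_year, marker='s', linestyle='--',
        label='Previous year (hypothetical)')
ax.set_title('Monthly Sales Trend')
ax.set_xlabel('Month')
ax.set_ylabel('Sales ($)')
ax.grid(True, alpha=0.3)
ax.legend()
fig.tight_layout()

# Render to an in-memory PNG; in an interactive session, call plt.show() instead
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
```

Working with an explicit `ax` object makes it clear which axes each call modifies, which becomes useful once a figure holds several plots.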
Bar Charts
Bar charts are ideal for comparing categories or discrete values:
import matplotlib.pyplot as plt
categories = ['Product A', 'Product B', 'Product C', 'Product D']
revenue = [45000, 52000, 38000, 61000]
plt.figure(figsize=(8, 5))
plt.bar(categories, revenue, color=['#22d3ee', '#06b6d4', '#a855f7', '#8b5cf6'])
plt.title('Revenue by Product', fontsize=14, fontweight='bold')
plt.xlabel('Product')
plt.ylabel('Revenue ($)')
plt.xticks(rotation=45)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
print("Bar chart created comparing revenue across products")
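When the exact values matter, it can help to print each value above its bar. The following is a minimal sketch that assumes Matplotlib 3.4 or newer, where `Axes.bar_label` is available; the `Agg` backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ['Product A', 'Product B', 'Product C', 'Product D']
revenue = [45000, 52000, 38000, 61000]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(categories, revenue, color='#22d3ee')
# bar_label attaches one text label per bar; explicit labels keep the
# thousands separator independent of the Matplotlib version's fmt handling
value_labels = ax.bar_label(bars, labels=[f'${v:,}' for v in revenue], padding=3)
ax.set_title('Revenue by Product')
ax.set_xlabel('Product')
ax.set_ylabel('Revenue ($)')
fig.tight_layout()
# In an interactive session, remove the Agg line and call plt.show() to view it
```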
Scatter Plots
Scatter plots show relationships between two continuous variables:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
hours_studied = np.random.randint(10, 50, 30)
test_scores = hours_studied * 2 + np.random.randint(-10, 10, 30)
plt.figure(figsize=(8, 5))
plt.scatter(hours_studied, test_scores, alpha=0.6, s=100, color='#22d3ee')
plt.title('Study Hours vs Test Scores', fontsize=14, fontweight='bold')
plt.xlabel('Hours Studied')
plt.ylabel('Test Score')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
print("Scatter plot shows positive correlation between study hours and scores")
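The visual impression of a positive relationship can be checked numerically. As a small sketch, the Pearson correlation coefficient for the same synthetic data can be computed with `np.corrcoef`; values close to 1 indicate a strong positive linear relationship.

```python
import numpy as np

# Same synthetic data as the scatter plot example
np.random.seed(42)
hours_studied = np.random.randint(10, 50, 30)
test_scores = hours_studied * 2 + np.random.randint(-10, 10, 30)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(hours_studied, test_scores)[0, 1]
print(f"Pearson correlation coefficient: {r:.2f}")
```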
Histograms
Histograms show the distribution of a single variable:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)
ages = np.random.normal(35, 10, 1000)
plt.figure(figsize=(8, 5))
plt.hist(ages, bins=30, color='#22d3ee', edgecolor='black', alpha=0.7)
plt.title('Age Distribution', fontsize=14, fontweight='bold')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
print("Histogram shows the distribution of ages in the dataset")
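The `bins` argument controls how finely the data is grouped, and the choice can change how the distribution looks. `plt.hist` relies on `np.histogram` for the binning, so the effect can be inspected without drawing anything. The bin counts 5, 30, and 100 below are arbitrary choices for illustration.

```python
import numpy as np

# Same synthetic data as the histogram example
np.random.seed(42)
ages = np.random.normal(35, 10, 1000)

summary = {}
for bins in (5, 30, 100):
    # counts holds the number of values per bin; edges holds bins + 1 boundaries
    counts, edges = np.histogram(ages, bins=bins)
    summary[bins] = (counts, edges)
    width = edges[1] - edges[0]
    print(f"bins={bins:>3}: bin width = {width:.1f} years, "
          f"tallest bin holds {counts.max()} values")
```

Too few bins can hide the shape of the distribution, while too many can make it look noisy, so it is worth trying a few values.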
Multiple Subplots
You can create multiple plots in one figure using subplots:
import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
x = np.linspace(0, 10, 100)
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('Sine Wave')
categories = ['A', 'B', 'C']
values = [10, 20, 15]
axes[0, 1].bar(categories, values)
axes[0, 1].set_title('Bar Chart')
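# Seed the generator so the random panels are reproducible, as in the earlier examples
np.random.seed(42)
x_scatter = np.random.randn(50)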
y_scatter = np.random.randn(50)
axes[1, 0].scatter(x_scatter, y_scatter)
axes[1, 0].set_title('Scatter Plot')
data = np.random.normal(0, 1, 1000)
axes[1, 1].hist(data, bins=30)
axes[1, 1].set_title('Histogram')
plt.tight_layout()
plt.show()
print("Created 4 different plots in one figure using subplots")
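When every panel needs the same kind of plot and styling, indexing each axes by hand becomes repetitive. One possible approach, sketched below, is to loop over `axes.flat`, which visits the axes of the grid row by row. The four functions plotted here are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
# Panel title -> function to plot (arbitrary examples)
functions = {
    'sin(x)': np.sin,
    'cos(x)': np.cos,
    'sqrt(x)': np.sqrt,
    'log(1 + x)': np.log1p,
}

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# axes is a 2x2 array; axes.flat iterates over it in row-major order
for ax, (name, func) in zip(axes.flat, functions.items()):
    ax.plot(x, func(x))
    ax.set_title(name)
    ax.set_xlabel('x')
    ax.grid(True, alpha=0.3)
fig.tight_layout()
# In an interactive session, remove the Agg line and call plt.show() to view it
```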
Exercise: Create Visualizations
Complete the following tasks:
- Task 1: Create a line plot showing temperature over 7 days
- Task 2: Create a bar chart comparing sales for 4 products
- Task 3: Create a scatter plot showing the relationship between two variables
- Task 4: Add titles and labels to all plots
Write code to create these visualizations. Note: in this environment, plots are described in text output.
💡 Learning Tip
Practice is essential. Try modifying the code examples, experiment with different parameters, and see how changes affect the results. Hands-on experience is the best teacher!