DataFrames and Series

Pandas has two main data structures: Series (1D) and DataFrame (2D). A Series is like a single column, while a DataFrame is like a table with multiple columns. Understanding both is essential for data manipulation in ML.

Series are useful for single-variable operations, while DataFrames hold tabular data with several variables, which is what you'll work with in most ML projects.

Understanding Series

A Series is a one-dimensional labeled array. Think of it as a single column from a spreadsheet:

series_basics.py
import pandas as pd

# Create a Series from a list
ages = pd.Series([25, 30, 35, 28, 32])
print("Series:")
print(ages)
print("\nData type:", ages.dtype)
print("Mean:", ages.mean())
print("Max:", ages.max())

# Series with custom index
names = pd.Series(['Alice', 'Bob', 'Charlie'], index=['a', 'b', 'c'])
print("\nSeries with custom index:")
print(names)
print("Access by index:", names['a'])
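
Because a Series carries labels alongside its values, you can look items up either by label or by position. Here is a minimal sketch of the difference using `.loc` and `.iloc`; the data and variable names are illustrative, not part of the lesson's files:

```python
import pandas as pd

# Illustrative data: names labelled 'a', 'b', 'c'
names = pd.Series(['Alice', 'Bob', 'Charlie'], index=['a', 'b', 'c'])

by_label = names.loc['b']         # label-based lookup
by_position = names.iloc[1]       # position-based lookup (0-indexed)
label_slice = names.loc['a':'b']  # label slices include the end label
pos_slice = names.iloc[0:1]       # position slices exclude the end position

print(by_label, by_position)
print(label_slice.tolist())
print(pos_slice.tolist())
```

Being explicit with `.loc` and `.iloc` avoids ambiguity when the index labels are themselves integers.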

Working with DataFrames

DataFrames are 2D structures with rows and columns. They're the primary tool for working with structured data:

dataframe_operations.py
import pandas as pd

# Create DataFrame
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 70000, 55000]
})
print("DataFrame:")
print(df)

# Access columns (returns Series)
print("\nAge column (Series):")
print(df['age'])
print("Type:", type(df['age']))

# Access rows
print("\nFirst row:")
print(df.iloc[0])

# Select multiple columns
print("\nName and salary:")
print(df[['name', 'salary']])
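
Beyond picking rows by position, you will often want rows that meet a condition. The sketch below reuses the same sample data and filters it with a boolean mask; the variable names `over_28` and `names_over_28` are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 70000, 55000],
})

# A comparison on a column yields one True/False per row;
# indexing with that mask keeps only the True rows
over_28 = df[df['age'] > 28]

# .loc accepts a row selector and a column selector together
names_over_28 = df.loc[df['age'] > 28, 'name']

print(over_28)
print(names_over_28.tolist())
```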

Series Operations

Series support many of the same element-wise operations as NumPy arrays:

series_operations.py
import pandas as pd

# Create Series
s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30, 40, 50])
print("Series 1:", s1)
print("Series 2:", s2)

# Arithmetic operations
print("\nOperations:")
print("Addition:", s1 + s2)
print("Multiplication:", s1 * 2)
print("Sum:", s1.sum())
print("Mean:", s1.mean())

# Boolean indexing
print("\nValues > 3:", s1[s1 > 3])
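
One difference from NumPy is worth seeing in code: arithmetic between two Series matches values by index label rather than by position. A small sketch, assuming two Series whose labels only partly overlap (the labels here are made up for illustration):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Labels 'a' and 'd' have no partner in the other Series, so they become NaN
total = s1 + s2

# .add() with fill_value treats a missing partner as 0 instead
filled = s1.add(s2, fill_value=0)

print(total)
print(filled)
```

If you see unexpected NaN values after combining two Series, mismatched index labels are a likely cause.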

DataFrame Methods

DataFrames have many useful methods for data exploration and manipulation:

dataframe_methods.py
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000]
})

# info() prints its report directly and returns None,
# so call it without wrapping it in print()
print("DataFrame info:")
df.info()

print("\nSummary statistics:")
print(df.describe())

print("\nFirst 2 rows:")
print(df.head(2))

print("\nShape (rows, columns):", df.shape)
print("Column names:", df.columns.tolist())
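
The example above covers exploration. For the manipulation side, here is a short sketch that adds a derived column and sorts the rows; the column name `salary_k` is hypothetical and chosen only for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000, 60000, 70000],
})

# Assigning to a new column name adds a column computed element-wise
df['salary_k'] = df['salary'] / 1000

# sort_values returns a new DataFrame; the original row order is unchanged
sorted_df = df.sort_values('age', ascending=False)

# Column methods such as mean() reduce a column to a single value
mean_salary = df['salary'].mean()

print(sorted_df)
print("Mean salary:", mean_salary)
```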

Converting Between Series and DataFrame

You can easily convert between Series and DataFrames:

conversions.py
import pandas as pd

# Series to DataFrame
s = pd.Series([1, 2, 3, 4], name='values')
df_from_series = s.to_frame()
print("Series converted to DataFrame:")
print(df_from_series)

# DataFrame column to Series
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
series_from_df = df['col1']
print("\nDataFrame column as Series:")
print(series_from_df)
print("Type:", type(series_from_df))

💡 Key Insight

In pandas, a DataFrame is essentially a collection of Series (columns) that share one row index. Selecting a column gives you a Series, so column operations are Series operations. Understanding this relationship helps you work more effectively with pandas!
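
To make this relationship concrete, the sketch below builds a DataFrame from two Series and then pulls a column and a row back out. The labels `'alice'` and `'bob'` are made up for illustration:

```python
import pandas as pd

ages = pd.Series([25, 30], index=['alice', 'bob'])
salaries = pd.Series([50000, 60000], index=['alice', 'bob'])

# A dict of Series becomes a DataFrame: keys are column names,
# and the Series' index labels become the row labels
df = pd.DataFrame({'age': ages, 'salary': salaries})

column = df['age']      # a column comes back as a Series
row = df.loc['alice']   # a row is also a Series, labelled by column name

print(type(column).__name__, column.index.tolist())
print(type(row).__name__, row.index.tolist())
```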
