Introduction to NumPy
NumPy (Numerical Python) is the foundation of numerical computing in Python and is essential for machine learning. It provides an N-dimensional array type, ndarray, along with functions for working with these arrays efficiently.
Unlike Python lists, NumPy arrays are homogeneous (all elements have the same type) and are typically stored in one contiguous block of memory, which makes operations much faster than the equivalent list code. This is crucial when working with large datasets in machine learning.
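To make the homogeneity point concrete, here is a minimal sketch of how NumPy settles on a single element type for mixed input and how the memory layout can be inspected. The variable name mixed is illustrative only.

```python
import numpy as np

# Mixed ints and floats are upcast to one common element type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)      # float64

# Every element occupies the same number of bytes.
print(mixed.itemsize)   # 8 bytes per float64 element
print(mixed.nbytes)     # 24 = 3 elements * 8 bytes

# A freshly created array is laid out contiguously in memory.
print(mixed.flags["C_CONTIGUOUS"])  # True
```

Because every element has the same size and sits next to its neighbors, NumPy can process the whole buffer in compiled code without checking each element's type.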
Why NumPy for Machine Learning?
NumPy is the backbone of most ML libraries because:
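- Speed: Core operations are implemented in compiled C code, making them much faster than equivalent Python loops
- Memory Efficiency: Arrays store raw values in a typed buffer rather than as separate Python objects, so they typically use less memory than Python lists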
- Vectorization: Perform operations on entire arrays without explicit loops
- Integration: Works seamlessly with pandas, scikit-learn, and other ML libraries
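The speed, memory, and vectorization points above can be sketched in a few lines. This is a rough illustration rather than a benchmark: the data is made up, and the list size is only an estimate because CPython shares some small integer objects.

```python
import sys
import numpy as np

values = list(range(1_000))
arr = np.arange(1_000)

# Explicit Python loop: one interpreted multiplication per element.
squared_loop = [v * v for v in values]

# Vectorized: one expression, with the loop running in compiled code.
squared_vec = arr * arr

# Both approaches produce the same numbers.
print(np.array_equal(squared_vec, squared_loop))  # True

# Rough memory estimate: the list object plus every int object it references.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print("array bytes:", arr.nbytes)
print("list bytes (estimate):", list_bytes)
```

To compare speed on your own machine, the standard-library timeit module can time both versions; the gap usually widens as the arrays grow.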
Creating NumPy Arrays
The most common way to create arrays is using np.array(). You can create arrays from Python lists:
import numpy as np
# 1D array (vector)
arr1d = np.array([1, 2, 3, 4, 5])
print("1D Array:", arr1d)
print("Shape:", arr1d.shape)       # (5,)
print("Dimensions:", arr1d.ndim)   # 1
# 2D array (matrix): a list of equal-length rows
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(arr2d)
print("Shape:", arr2d.shape)       # (2, 3)
print("Dimensions:", arr2d.ndim)   # 2
# 3D array: a stack of 2D arrays
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D Array shape:", arr3d.shape)  # (2, 2, 2)
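np.array() infers the element type from the input, but it also accepts a dtype argument when a specific type is wanted. The sketch below assumes a small feature matrix stored as 32-bit floats; the names features and as_int are hypothetical.

```python
import numpy as np

# Request a specific element type instead of relying on inference.
features = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
print(features.dtype)   # float32
print(features.shape)   # (2, 3)
print(features.size)    # 6 elements in total

# astype returns a converted copy; the original array is left unchanged.
as_int = features.astype(np.int64)
print(as_int.dtype)     # int64
print(features.dtype)   # still float32
```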
Array Operations
NumPy allows you to perform mathematical operations on entire arrays efficiently:
import numpy as np
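arr = np.array([1, 2, 3, 4, 5])
print("Original:", arr)
# Arithmetic with a scalar applies to every element
print("Multiply by 2:", arr * 2)   # [ 2  4  6  8 10]
print("Add 10:", arr + 10)         # [11 12 13 14 15]
print("Square:", arr ** 2)         # [ 1  4  9 16 25]
# Arithmetic between arrays of the same shape is element-wise
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print("\nArray 1:", arr1)
print("Array 2:", arr2)
print("Sum:", arr1 + arr2)                 # [5 7 9]
print("Product:", arr1 * arr2)             # [ 4 10 18] (element-wise, not a matrix product)
print("Dot product:", np.dot(arr1, arr2))  # 32 = 1*4 + 2*5 + 3*6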
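The same element-wise rules extend to arrays of different but compatible shapes through broadcasting. As a sketch, the following standardizes the columns of a small, made-up feature matrix, a common preprocessing step in machine learning. The names X, X_scaled, and weights are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) with 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_mean = X.mean(axis=0)   # shape (2,), one mean per column
col_std = X.std(axis=0)     # shape (2,), one standard deviation per column

# The (2,) arrays are broadcast across all 3 rows of X.
X_scaled = (X - col_mean) / col_std

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column

# Matrix-vector product with the @ operator.
weights = np.array([0.5, 0.1])
print(X @ weights)            # one value per row: 1.5, 3.0, 4.5
```

Broadcasting avoids building a full-size copy of col_mean by hand, and the @ operator gives the same result as np.dot for these 1D and 2D inputs.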
Useful Array Functions
NumPy provides many useful functions for array manipulation:
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
print("Array:", arr)
# Summary statistics over all elements
print("Sum:", np.sum(arr))     # 31
print("Mean:", np.mean(arr))   # 3.875
print("Max:", np.max(arr))     # 9
print("Min:", np.min(arr))     # 1
print("Standard deviation:", np.std(arr))  # population standard deviation (ddof=0)
# Reshaping: the new shape must hold the same number of elements
arr = np.array([1, 2, 3, 4, 5, 6])
print("\nOriginal shape:", arr.shape)   # (6,)
reshaped = arr.reshape(2, 3)
print("Reshaped to (2, 3):")
print(reshaped)
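On 2D arrays, the aggregate functions shown above accept an axis argument, and reshape accepts -1 for a dimension it should infer. A minimal sketch, with scores as a hypothetical example array:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(np.sum(scores))           # 21, over all elements
print(np.sum(scores, axis=0))   # [5 7 9], one sum per column
print(np.sum(scores, axis=1))   # [ 6 15], one sum per row
print(np.mean(scores, axis=1))  # [2. 5.], one mean per row

# -1 asks NumPy to infer that dimension from the total element count.
flat = scores.reshape(-1)       # shape (6,)
column = scores.reshape(-1, 1)  # shape (6, 1)
print(flat.shape, column.shape)
```

A handy way to remember axis: it names the dimension that disappears, so axis=0 collapses the rows and leaves one value per column.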
Array Indexing and Slicing
NumPy arrays support powerful indexing and slicing operations:
import numpy as np
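arr = np.array([10, 20, 30, 40, 50, 60, 70, 80])
# Indexing is zero-based; negative indices count from the end
print("First element:", arr[0])      # 10
print("Last element:", arr[-1])      # 80
# Slices are start:stop, with the stop index excluded
print("First 3 elements:", arr[:3])  # [10 20 30]
print("Last 3 elements:", arr[-3:])  # [60 70 80]
print("Middle elements:", arr[2:5])  # [30 40 50]
# 2D indexing uses [row, column]
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\nMatrix:")
print(matrix)
print("Element at [1, 2]:", matrix[1, 2])  # 6
print("First row:", matrix[0, :])          # [1 2 3]
print("Second column:", matrix[:, 1])      # [2 5 8]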
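Two further indexing behaviors are worth knowing early: boolean masks select elements by condition, and a basic slice is a view of the original data rather than a copy. The sketch below illustrates both; the names view and independent are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60, 70, 80])

# Boolean mask: keep only the elements that satisfy a condition.
print(arr[arr > 40])   # [50 60 70 80]

# Step slicing: every second element.
print(arr[::2])        # [10 30 50 70]

# A basic slice is a view, so writing to it changes the original array.
view = arr[:3]
view[0] = -1
print(arr[0])          # -1

# Use .copy() when an independent array is needed.
independent = arr[:3].copy()
independent[1] = 999
print(arr[1])          # 20 (unchanged)
```

Views keep slicing cheap on large datasets, but they are a common source of accidental modification, so copy explicitly before editing a slice you want to keep separate.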
Creating Arrays with Built-in Functions
NumPy provides convenient functions to create arrays with specific patterns:
import numpy as np
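# Array of zeros with the given shape (float64 by default)
zeros = np.zeros((3, 4))
print("Zeros array (3x4):")
print(zeros)
# Array of ones with the given shape
ones = np.ones((2, 3))
print("\nOnes array (2x3):")
print(ones)
# Evenly spaced values with a fixed step; the stop value (10) is excluded
range_arr = np.arange(0, 10, 2)
print("\nRange array:", range_arr)      # [0 2 4 6 8]
# A fixed number of evenly spaced values; the endpoint (1) is included
linspace_arr = np.linspace(0, 1, 5)
print("Linspace array:", linspace_arr)  # [0.   0.25 0.5  0.75 1.  ]
# Uniform random values in [0, 1); the output differs on every run
random_arr = np.random.rand(3, 3)
print("\nRandom array:")
print(random_arr)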
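A few related constructors follow the same pattern. The sketch below uses np.full and np.eye, plus a seeded np.random.default_rng generator, which is one way to make random arrays repeatable across runs. These functions are not covered in the text above, and the seed value 42 is arbitrary.

```python
import numpy as np

# Constant-filled array and identity matrix.
sevens = np.full((2, 2), 7)
identity = np.eye(3)
print(sevens)
print(identity)

# Two generators created with the same seed produce the same values.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)
sample_a = rng_a.random((3, 3))   # uniform values in [0, 1)
sample_b = rng_b.random((3, 3))
print(np.array_equal(sample_a, sample_b))  # True
```

Fixing the seed is useful in machine learning experiments, where a train/test split or weight initialization should be reproducible.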
💡 Key Takeaway
NumPy arrays are the foundation of machine learning in Python. Libraries such as pandas and scikit-learn are built on NumPy arrays, and TensorFlow tensors convert to and from them. Mastering NumPy will make learning these libraries much easier!