pandas
- stands for Python Data Analysis Library
- High performance library, with easy to use data structures and data analysis tools
- similar to a spreadsheet but with ore flexibility
- Used in various fields, including data science, machine learning, NNLP, big data, statistics, finance and many more
pandas Features
- has grouping and aggregation functions
- can join merge and reshape data
- built on top of numpy and matplotlib
- can import data from CSV, JSON, Excel, and others
pandas Analysis Process
data retrieval (set up the data) » selection » cleaning (chekin types, checking the blanks, and changing values) » analysis » grouping (group data) » plotting data
pandas Installation
pip install pandas
pip3 install pandas
pandas Main Data Structures
- Series
- Index
- DataFrame
Series
- A one-dimensional array
- Items share one type
- Items are indexed
- Similar to a Python dictionary
Index
- an immutable sequence for indexing and alignment
- Automatically generated when creating Series or DataFrames
- Facilitates houdn between Series and/or DataFrames
DataFrame
- pandas’ main data structure
- Two-dimensional grid similar to a database table
- Columns have labels and rows are indexed
- Similar to a list of lists/dictionaries in Python
DataFrame Example
screenshot
DataFrame Column Labels
Data Types
- objects (generalized type) it is like a pointer type and used for string type
- string type is a recent type and may not work with older versions
- categorial (is a list of user defined values, fixed numbered list of possible values)
- Numeric: integers, floats, boolean
- Datetime: date, timedelta, interval, period