Pandas is a Python library for data manipulation and analysis. Its core data structures, Series (one-dimensional) and DataFrame (two-dimensional, tabular), make it straightforward to work with structured data. Pandas is commonly used for data wrangling, cleaning, and analysis.

Key Features

  • Data manipulation: Merge, filter, and transform data.
  • Data cleaning: Handle missing data and duplicate records.
  • Data visualization: Built-in plotting through Matplotlib (df.plot()); DataFrames can also be passed directly to Seaborn.
  • File I/O: Read/write CSV, Excel, JSON, and more.

Installation

To install Pandas, you can use pip:

pip install pandas

For Anaconda users, Pandas is already included; you can update it using:

conda update pandas

Typical Use Cases

  • Data cleaning: Handling missing or duplicate data.
  • Data analysis: Aggregating and summarizing large datasets.
  • Data transformation: Reshaping and pivoting data for easy exploration.
  • File I/O: Loading data from external sources such as CSV and Excel files.

Importing Pandas

Before using Pandas, import it as follows:

import pandas as pd
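
A quick way to confirm that the import worked is to print the installed version. This is a minimal sketch; the variable name `version` is only illustrative:

```python
import pandas as pd

# pd.__version__ holds the installed release as a string
version = pd.__version__
print(version)
```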

Reading Data

  • From CSV:

     df = pd.read_csv('filename.csv')
    
  • From Excel (.xlsx files require an engine such as openpyxl to be installed):

     df = pd.read_excel('filename.xlsx')
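
To try the readers above without a file on disk, the sketch below feeds read_csv an in-memory string. The CSV content and the name `csv_text` are made up for illustration; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'filename.csv'
csv_text = "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (rows, columns)
print(df.dtypes)   # column types inferred from the data
```

Parameters such as sep, usecols, and parse_dates (for read_csv) or sheet_name (for read_excel) are worth checking when the defaults do not fit the file.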
    

Basic Operations

  • Display first n rows:

     df.head(n)
    
  • Display last n rows:

     df.tail(n)
    
  • Summary statistics:

     df.describe()
    
  • Select a single column:

     df['column_name']
    
  • Select multiple columns:

     df[['column_name1', 'column_name2']]
    
  • Filter rows based on condition:

     df[df['column_name'] > value]
    
  • Sort by column:

     df.sort_values('column_name')
    
  • Group by column and average the numeric columns:

     df.groupby('column_name').mean(numeric_only=True)
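
The operations in this section can be combined on a small DataFrame. The following is a sketch under assumed data: the column names (team, player, score) and the threshold 15 are hypothetical.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "player": ["p1", "p2", "p3", "p4", "p5"],
    "score": [10, 25, 30, 5, 20],
})

first_two = df.head(2)                              # first 2 rows
subset = df[["team", "score"]]                      # select two columns
high = df[df["score"] > 15]                         # rows where score exceeds 15
ranked = df.sort_values("score", ascending=False)   # highest score first

# Mean score per team; selecting the numeric column first avoids
# errors from non-numeric columns such as "player"
team_mean = df.groupby("team")["score"].mean()
print(team_mean)
```

Each expression returns a new object, so df itself is left unchanged.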
    

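Data Cleaning

Each method below returns a new object and leaves df unchanged; assign the result (for example, df = df.dropna()) to keep it.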

  • Check for missing values (returns a Boolean mask; append .sum() to count them per column):

     df.isnull()
    
  • Drop rows with missing values:

     df.dropna()
    
  • Fill missing values:

     df.fillna(value)
    
  • Rename columns:

     df.rename(columns={'old_name': 'new_name'})
    
  • Drop a column:

     df.drop(columns='column_name')
    
  • Drop duplicate rows:

     df.drop_duplicates()
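
The cleaning steps above are often chained. A minimal sketch, assuming a hypothetical table with one missing price and one duplicated row; the fill value 0.0 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the id=2 row is duplicated, id=3 has no price
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "price": [9.5, 7.0, 7.0, np.nan],
    "old_name": ["x", "y", "y", "z"],
})

missing_per_column = df.isnull().sum()       # count of NaN per column

cleaned = (
    df.drop_duplicates()                     # drop the repeated id=2 row
      .fillna({"price": 0.0})                # fill missing prices with 0.0
      .rename(columns={"old_name": "label"})
)

print(cleaned)
print(len(df), len(cleaned))                 # the original df is unchanged
```

Whether to fill or drop missing values depends on the analysis; filling with 0.0 would distort an average, for example.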
    

Data Manipulation

  • Create a new column:

     df['new_column'] = df['column1'] + df['column2']
    
  • Apply a function to a column:

     df['column'] = df['column'].apply(function_name)
    
  • Replace values in a column:

     df['column'] = df['column'].replace(value1, value2)
    
  • Merge two DataFrames (inner join by default; pass how='left', 'right', or 'outer' to change it):

     merged_df = pd.merge(df1, df2, on='column_name')
    
  • Pivot a table (each index/column pair must be unique; use pd.pivot_table to aggregate duplicates):

     df.pivot(index='index_column', columns='column_to_pivot', values='values_to_show')
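
The manipulation steps above can be seen working together in the sketch below. The sales and products tables, their column names, and the values are all hypothetical.

```python
import pandas as pd

# Hypothetical sales and product tables
sales = pd.DataFrame({
    "product_id": [1, 1, 2, 2],
    "month": ["jan", "feb", "jan", "feb"],
    "units": [3, 5, 2, 4],
    "unit_price": [10.0, 10.0, 20.0, 20.0],
})
products = pd.DataFrame({"product_id": [1, 2], "name": ["pen", "ink"]})

# New column computed from two existing columns
sales["revenue"] = sales["units"] * sales["unit_price"]

# Apply a function to each value, then replace one exact value
sales["month"] = sales["month"].apply(str.upper).replace("JAN", "JANUARY")

# Inner join on the shared key column
merged = pd.merge(sales, products, on="product_id")

# One row per product name, one column per month
wide = merged.pivot(index="name", columns="month", values="revenue")
print(wide)
```

pivot works here because every (name, month) pair appears exactly once; with repeated pairs, pd.pivot_table with an aggregation function would be the safer choice.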