Pandas DataFrame Intro
September 15, 2020 by Jane
What is a pandas DataFrame?
A two-dimensional, labeled data structure in the pandas API
What is a DataFrame?
- stores data in cells
- usually contains named columns and rows
Import pandas module
import pandas as pd
Creating a DataFrame:
- create an N x M matrix (N rows, M columns)
- create a list of M strings that holds the column names
- dataframe = pd.DataFrame(data=matrix, columns=column_names)
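The steps above can be sketched as follows. This is a minimal example, assuming a NumPy array as the matrix; the values and the column names "id" and "score" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 matrix (N = 3 rows, M = 2 columns); values are illustrative.
matrix = np.array([[1, 10],
                   [2, 20],
                   [3, 30]])

# One name per column, so the list length must match M.
column_names = ["id", "score"]

dataframe = pd.DataFrame(data=matrix, columns=column_names)
print(dataframe.shape)  # (3, 2)
```

If the number of column names does not match the number of matrix columns, pandas raises an error instead of building the DataFrame.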
Adding a new column to DataFrame
- assign values to a new column name
- E.g. dataframe['new_column_name'] = 2
- E.g. dataframe['new_column_name'] = dataframe['existing_column'] + offset
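Both assignment forms can be tried on a small DataFrame. This is a sketch only: the starting data, the value of offset, and the column name "shifted" are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical starting data with a single existing column.
dataframe = pd.DataFrame({"existing_column": [1, 2, 3]})

# A scalar is broadcast to every row of the new column.
dataframe["new_column_name"] = 2

# A new column can also be derived from an existing one, element by element.
offset = 10  # illustrative value
dataframe["shifted"] = dataframe["existing_column"] + offset

print(dataframe)
```

Assigning to a column name that already exists overwrites that column rather than adding a second one.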
Selecting DataFrame rows and columns:
- dataframe.head(n) // select the first n rows of the dataframe
- dataframe.iloc[[n]] // select the single row at position n (iloc is indexed with square brackets, not called)
- dataframe[1:4] // select rows at positions 1 through 3 (the end position 4 is excluded)
- dataframe['column_name'] // select the column named 'column_name'
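The selections above can be compared side by side on a small DataFrame. This is a sketch with made-up data; the second column "other" is an assumption added so the row selections have more than one column to show. Note that iloc takes square brackets.

```python
import pandas as pd

# Hypothetical five-row DataFrame for illustration.
dataframe = pd.DataFrame({
    "column_name": [10, 20, 30, 40, 50],
    "other": ["a", "b", "c", "d", "e"],
})

first_two = dataframe.head(2)      # first 2 rows
single_row = dataframe.iloc[[2]]   # row at position 2, returned as a one-row DataFrame
rows_1_to_3 = dataframe[1:4]       # rows at positions 1, 2, 3
column = dataframe["column_name"]  # one column, returned as a Series

print(single_row.shape)  # (1, 2)
print(type(column).__name__)  # Series
```

Using a single pair of brackets, dataframe.iloc[2], returns the same row as a Series instead of a one-row DataFrame.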
Copying DataFrames
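- Referencing: assigning a DataFrame to a new variable does not copy it; both variables refer to the same object, so a change made through one is visible through the other
- Copying: call dataframe.copy() (pd.DataFrame.copy) to get an independent copy; the copy is deep by default, so changes to one do not affect the other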
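The difference between the two can be seen by modifying the original after assigning and after copying. This is a minimal sketch; the variable names original, reference, and independent are hypothetical.

```python
import pandas as pd

original = pd.DataFrame({"value": [1, 2, 3]})

reference = original           # no copy: a second name for the same object
independent = original.copy()  # deep copy by default (deep=True)

# Modify the original in place.
original.loc[0, "value"] = 99

print(reference is original)         # True
print(reference.loc[0, "value"])     # 99, the change is visible
print(independent.loc[0, "value"])   # 1, the copy is unaffected
```

Copying costs extra memory, so it is worth doing mainly when the original must stay untouched.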