Pandas DataFrame Intro
September 15, 2020 by Jane

What is a pandas DataFrame?

A two-dimensional, labeled data structure in the pandas API.


What does a DataFrame contain?

  • stores data in cells arranged in rows and columns
  • usually has named columns and labeled rows (the row labels form the index)
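As a minimal sketch of the structure described above, the following builds a tiny DataFrame with named columns and labeled rows, then reads one cell by row label and column name. The column names, row labels and values are made up for illustration.

```python
import pandas as pd

# Illustrative data: two named columns ("name", "age") and two row labels ("r1", "r2").
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 25]},
    index=["r1", "r2"],
)

print(df)
print(list(df.columns))     # the column names
print(list(df.index))       # the row labels (the index)
print(df.loc["r2", "age"])  # one cell, addressed by row label and column name
```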


Importing the pandas module

import pandas as pd


Creating a DataFrame:

  1. create an N×M matrix of values (N rows, M columns), e.g. a list of lists or a NumPy array
  2. create a list of M strings that holds the column names
  3. dataframe = pd.DataFrame(data=matrix, columns=column_names)
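The three steps above can be sketched as follows, assuming a small 3×2 NumPy matrix and two made-up column names ("a" and "b").

```python
import numpy as np
import pandas as pd

# Step 1: an N x M matrix (here N = 3 rows, M = 2 columns).
matrix = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

# Step 2: one column name per matrix column (M names).
column_names = ["a", "b"]

# Step 3: build the DataFrame from the matrix and the column names.
dataframe = pd.DataFrame(data=matrix, columns=column_names)
print(dataframe)
```

The number of column names has to match the number of matrix columns; if it does not, pandas raises a ValueError.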


Adding a new column to a DataFrame

  • assign values to a new column name
  • E.g. dataframe['new_column_name'] = 2 // a single value is repeated in every row
  • E.g. dataframe['new_column_name'] = dataframe['existing_column'] + offset // computed row by row from an existing column
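A minimal sketch of both kinds of assignment above. The new column names ("constant", "shifted") and the value of `offset` are hypothetical, chosen only for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({"existing_column": [10, 20, 30]})
offset = 5  # hypothetical offset value

# A scalar is broadcast to every row of the new column.
dataframe["constant"] = 2

# An expression over an existing column is evaluated element-wise.
dataframe["shifted"] = dataframe["existing_column"] + offset

print(dataframe)
```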


Selecting DataFrame rows and columns:

  • dataframe.head(n) // select the first n rows of the dataframe
  • dataframe.iloc[[n]] // select the single row at position n, as a one-row DataFrame (dataframe.iloc[n] returns it as a Series)
  • dataframe[1:4] // select rows at positions 1 to 4, end exclusive (i.e. rows 1, 2, 3)
  • dataframe['column_name'] // select the column named 'column_name'
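The selections listed above can be tried on a small example frame; the data and the second column name ("other") are made up for illustration.

```python
import pandas as pd

dataframe = pd.DataFrame({"column_name": [10, 20, 30, 40, 50],
                          "other": list("abcde")})

first_two = dataframe.head(2)       # first 2 rows, as a DataFrame
row_as_series = dataframe.iloc[3]   # row at position 3, as a Series
row_as_frame = dataframe.iloc[[3]]  # same row, kept as a one-row DataFrame
middle = dataframe[1:4]             # rows at positions 1, 2, 3 (stop is exclusive)
column = dataframe["column_name"]   # one column, as a Series

print(middle)
```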


Copying DataFrames

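  • Referencing: assigning a DataFrame to a new variable does not copy it; both variables refer to the same object, so changes made through either one are visible through the other
  • Copying: use pd.DataFrame.copy (e.g. dataframe.copy()) to create a deep copy; changes to the copy do not affect the original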
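A minimal sketch contrasting the two cases above; the variable names and values are illustrative only.

```python
import pandas as pd

original = pd.DataFrame({"x": [1, 2, 3]})

# Referencing: `alias` is just another name for the same object.
alias = original
alias.loc[0, "x"] = 100
print(original.loc[0, "x"])  # the change made through `alias` shows up here

# Copying: copy() uses deep=True by default and returns an independent object.
copied = original.copy()
copied.loc[1, "x"] = 200
print(original.loc[1, "x"])  # the original keeps its value
```

One caveat from the pandas documentation: even with deep=True, Python objects stored inside object-dtype columns are not copied recursively.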