6 Pandas Operations for Beginners
Pandas is an open-source Python library mainly
used for data manipulation and analysis. It's built on top of the NumPy library
and provides high-performance, easy-to-use data structures and data analysis
tools for the Python programming language.
In this article, you'll learn how to perform 6
basic operations using Pandas.
Using Pandas Examples
You can run the examples in this article using computational notebooks like
Jupyter Notebook or Google Colab. You can also run the examples by entering
the code directly into the Python interpreter in interactive mode.
If you want to have a look at the complete source code used in this article,
you can access the Python Notebook file from this GitHub repository.
1. How to Import Pandas as pd and Print the Version Number
You need to use the import keyword to import
any library in Python. Pandas is typically imported under the pd alias. With this
approach, you can refer to the Pandas package as pd instead
of pandas.
import pandas as pd
print(pd.__version__)
Output:
1.2.4
2. How to Create a Series in Pandas
A Pandas Series is a one-dimensional labeled array that can hold data of any
type. It's like a column in a table. You can create a series from NumPy arrays,
NumPy functions, lists, dictionaries, scalar values, etc.
Each value in a series is labeled with an index. By default, the first value
has index 0, the second value has index 1, and so on. To assign your own
labels, use the index argument.
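As a minimal sketch of how labels work, the following creates a series with custom labels and reads one value by label and by position. The variable names and the labels "a", "b", and "c" are illustrative choices, not taken from the article's notebook.

```python
import pandas as pd

# Illustrative data; the labels "a", "b", "c" are arbitrary for this sketch.
temps = pd.Series([20.5, 22.0, 19.8], index=["a", "b", "c"])

by_label = temps["b"]        # look up a value by its label
by_position = temps.iloc[1]  # look up a value by its integer position

print(by_label, by_position)
```

Both lookups refer to the same element, because "b" is the second label.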
How to Create an Empty Series
s = pd.Series(dtype='float64')
s
Output:
Series([], dtype: float64)
In the above example, an empty series with the float data type is
created.
How to Create a Series Using NumPy Array
import pandas as pd
import numpy as np
d = np.array([1, 2, 3, 4, 5])
s = pd.Series(d)
s
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int32
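The integer dtype in this output depends on the platform and the NumPy version: some Windows setups default to int32, while most other systems show int64. If a consistent dtype matters, one option is to request it explicitly, as in this small sketch:

```python
import numpy as np
import pandas as pd

d = np.array([1, 2, 3, 4, 5])

# Passing dtype explicitly gives the same result on every platform.
s = pd.Series(d, dtype="int64")
print(s.dtype)
```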
How to Create a Series Using List
d = [1, 2, 3, 4, 5]
s = pd.Series(d)
s
Output:
0 1
1 2
2 3
3 4
4 5
dtype: int64
How to Create a Series With Index
To create a series with custom labels, pass them through the index argument.
The number of labels must be equal to the number of elements in the series.
d = [1, 2, 3, 4, 5]
s = pd.Series(d, index=["one", "two", "three", "four", "five"])
s
Output:
one 1
two 2
three 3
four 4
five 5
dtype: int64
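To illustrate the length requirement, the sketch below deliberately passes too few labels and catches the resulting error. The variable name mismatch_error is introduced only for this example.

```python
import pandas as pd

d = [1, 2, 3, 4, 5]

try:
    # Only three labels for five values, so pandas rejects the input.
    pd.Series(d, index=["one", "two", "three"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(type(mismatch_error).__name__)
```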
How to Create a Series Using Dictionary
The keys of the dictionary become the labels of
the series.
d = {"one" : 1,
"two" : 2,
"three" : 3,
"four" : 4,
"five" : 5}
s = pd.Series(d)
s
Output:
one 1
two 2
three 3
four 4
five 5
dtype: int64
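You can also combine a dictionary with the index argument. In that case, pandas picks the values whose keys match the given labels, in the given order, and fills any label that is missing from the dictionary with NaN. A minimal sketch, where the label "ten" is a made-up key that is intentionally absent:

```python
import pandas as pd

d = {"one": 1, "two": 2, "three": 3}

# Select and reorder keys with index; "ten" is not in the dictionary.
s = pd.Series(d, index=["three", "one", "ten"])
print(s)
```

Because NaN is a floating-point value, the resulting series has a float dtype rather than an integer one.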
How to Create a Series Using Scalar Value
If you create a series from a scalar value and provide the index argument,
the value is repeated for every label. Without an index, the series holds
just a single element.
s = pd.Series(1, index = ["a", "b", "c", "d"])
s
Output:
a 1
b 1
c 1
d 1
dtype: int64
3. How to Create a DataFrame in Pandas
A DataFrame is a two-dimensional data structure where data is arranged in rows
and columns. A DataFrame can be created using dictionaries, lists, a list of
dictionaries, NumPy arrays, etc. In the real world, DataFrames are usually
created from existing storage like CSV files, Excel files, SQL databases, etc.
The DataFrame object supports a number of attributes and methods. If you want
to know more about them, you can check out the official pandas DataFrame
documentation.
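As a quick taste of those attributes and methods, here is a small sketch using a made-up two-row DataFrame. The column names and values are illustrative only.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({"name": ["Alex", "Bob"], "score": [91.5, 78.0]})

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels
print(df.dtypes)         # dtype of each column
print(df.describe())     # summary statistics for the numeric columns
```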
How to Create an Empty DataFrame
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
How to Create a DataFrame Using List
listObj = ["MUO", "technology", "simplified"]
df = pd.DataFrame(listObj)
print(df)
Output:
0
0 MUO
1 technology
2 simplified
How to Create a DataFrame Using Dictionary of ndarray/Lists
batmanData = {'Movie Name' : ['Batman Begins', 'The Dark Knight', 'The Dark Knight Rises'],
'Year of Release' : [2005, 2008, 2012]}
df = pd.DataFrame(batmanData)
print(df)
Output:
Movie Name Year of Release
0 Batman Begins 2005
1 The Dark Knight 2008
2 The Dark Knight Rises 2012
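Each dictionary key becomes a column, and selecting a single column gives you back a Series. The sketch below rebuilds the same data (using the snake_case name batman_data, which is a stylistic choice for this example) and pulls out one column:

```python
import pandas as pd

batman_data = {
    "Movie Name": ["Batman Begins", "The Dark Knight", "The Dark Knight Rises"],
    "Year of Release": [2005, 2008, 2012],
}
df = pd.DataFrame(batman_data)

years = df["Year of Release"]  # a single column is a Series
print(type(years).__name__)
print(years.max())
```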
How to Create a DataFrame Using List of Lists
data = [['Alex', 601], ['Bob', 602], ['Cataline', 603]]
df = pd.DataFrame(data, columns = ['Name', 'Roll No.'])
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
How to Create a DataFrame Using List of Dictionaries
data = [{'Name': 'Alex', 'Roll No.': 601},
{'Name': 'Bob', 'Roll No.': 602},
{'Name': 'Cataline', 'Roll No.': 603}]
df = pd.DataFrame(data)
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
How to Create a DataFrame Using zip() Function
The zip() function pairs up the elements of several lists into tuples. Each
tuple then becomes one row of the DataFrame.
Name = ['Alex', 'Bob', 'Cataline']
RollNo = [601, 602, 603]
listOfTuples = list(zip(Name, RollNo))
df = pd.DataFrame(listOfTuples, columns = ['Name', 'Roll No.'])
print(df)
Output:
Name Roll No.
0 Alex 601
1 Bob 602
2 Cataline 603
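For comparison, the same table can also be built from a dictionary of lists, which skips the zip() step. This sketch, with variable names chosen for illustration, builds the DataFrame both ways and checks whether the two results match:

```python
import pandas as pd

names = ["Alex", "Bob", "Cataline"]
roll_numbers = [601, 602, 603]

# Row-wise: zip() pairs the lists into (name, roll number) tuples.
from_zip = pd.DataFrame(list(zip(names, roll_numbers)), columns=["Name", "Roll No."])

# Column-wise: each list becomes a column directly.
from_dict = pd.DataFrame({"Name": names, "Roll No.": roll_numbers})

print(from_zip.equals(from_dict))
```

The zip() approach is handy when the data already arrives as rows. The dictionary form is usually shorter when you have one list per column.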
4. How to Read CSV Data in Pandas
A "comma-separated values" (CSV) file is a delimited text file that uses a
comma to separate values. You can read a CSV file using the read_csv()
function in pandas. Printing a large DataFrame shows a truncated view by
default, so if you want to print the entire DataFrame, use the to_string()
method.
In this and the next examples, this CSV file will be used to perform the
operations.
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
print(df.to_string())
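If you want to experiment without a network connection, read_csv() also accepts a file-like object. The sketch below uses io.StringIO with a few made-up rows (not the contents of the article's CSV file) and shows the usecols parameter for loading only some columns:

```python
import io
import pandas as pd

# Made-up sample data, standing in for a file on disk or a URL.
csv_text = """name,age,height_cm
Alex,41,188
Bert,42,173
Carl,32,178
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)

# usecols limits which columns are loaded.
ages = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age"])
print(ages.shape)
```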
5. How to Analyze DataFrames Using the head(), tail(), and info() Methods
How to View Data Using the head() Method
The head() method is a quick way to get an overview of a DataFrame. It
returns the column headers and the specified number of rows, starting from
the top.
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
print(df.head(10))
If you don't specify the number of rows, the
first 5 rows will be returned.
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
print(df.head())
How to View Data Using the tail() Method
The tail() method returns the column headers and the specified number of
rows, starting from the bottom.
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
print(df.tail(10))
If you don't specify the number of rows, the
last 5 rows will be returned.
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
print(df.tail())
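To see exactly which rows head() and tail() pick, it can help to try them on a tiny DataFrame whose contents are obvious. This sketch uses a made-up single-column DataFrame with the values 1 to 10:

```python
import pandas as pd

df = pd.DataFrame({"n": range(1, 11)})  # ten rows, values 1 to 10

first_three = df.head(3)
last_two = df.tail(2)
default_head = df.head()  # no argument: five rows

print(first_three["n"].tolist())
print(last_two["n"].tolist())
print(len(default_head))
```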
How to Get Info About the Data
The info() method prints a brief summary of a DataFrame, including the index
dtype and column dtypes, non-null counts, and memory usage. It prints the
summary itself and returns None, so there is no need to wrap it in print().
df = pd.read_csv('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/biostats.csv')
df.info()
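If you would rather keep the summary as text, for logging say, info() accepts a buf argument that redirects its output. A minimal sketch with made-up data:

```python
import io
import pandas as pd

# Made-up sample data with one missing value.
df = pd.DataFrame({"name": ["Alex", "Bob", None], "age": [41, 42, 32]})

result = df.info()   # prints the summary; the return value is None

buffer = io.StringIO()
df.info(buf=buffer)  # write the summary to a text buffer instead
summary = buffer.getvalue()

print(result is None)
print(summary)
```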
6. How to Read JSON Data in Pandas
JSON (JavaScript Object Notation) is a lightweight data-interchange format.
You can read a JSON file using the read_json() function in pandas. If you
want to print the entire DataFrame, use the to_string() method.
In the below example, this JSON file is used to perform the operations.
df = pd.read_json('https://raw.githubusercontent.com/Yuvrajchandra/Basic-Operations-Using-Pandas/main/google_markers.json')
print(df.to_string())
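As with CSV data, you can try read_json() on in-memory text by wrapping it in io.StringIO. The records below are made up for illustration and are not the contents of the article's JSON file:

```python
import io
import pandas as pd

# A JSON array of objects: each object becomes one row.
json_text = '[{"city": "Pune", "lat": 18.52}, {"city": "Delhi", "lat": 28.61}]'

df = pd.read_json(io.StringIO(json_text))
print(df)
```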
Refresh Your Python Knowledge With Built-in Functions and Methods
Functions help shorten your code and improve its efficiency. Functions and
methods like reduce(), split(), enumerate(), eval(), round(), etc. can make
your code more concise and easier to understand. It's always good to know
about built-in functions and methods, as they can simplify your programming
tasks to a great extent.