Python Pandas - DataFrame
Exercise - Python Pandas DataFrame
import pandas as pd
import numpy as np
- Write a Pandas program to create and display a DataFrame from a specified dictionary data.
data1= {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}
df1=pd.DataFrame(data1)
print(df1)
X Y Z
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
- Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
df2=pd.DataFrame(exam_data,index=labels,columns=['attempts','name','qualify','score'])
df2
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to display a summary of the basic information about a specified DataFrame and its data.
df2.info()
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
attempts 10 non-null int64
name 10 non-null object
qualify 10 non-null object
score 8 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
df2.describe()
attempts | score | |
---|---|---|
count | 10.000000 | 8.000000 |
mean | 1.900000 | 13.562500 |
std | 0.875595 | 4.693746 |
min | 1.000000 | 8.000000 |
25% | 1.000000 | 9.000000 |
50% | 2.000000 | 13.500000 |
75% | 2.750000 | 17.125000 |
max | 3.000000 | 20.000000 |
- Write a Pandas program to get the first 3 rows of a given DataFrame.
df2.head(3)
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
-
- Write a Pandas program to select the ‘name’ and ‘score’ columns from the above DataFrame.
df2[['name','score']]
name | score | |
---|---|---|
a | Anastasia | 12.5 |
b | Dima | 9.0 |
c | Katherine | 16.5 |
d | James | NaN |
e | Emily | 9.0 |
f | Michael | 20.0 |
g | Matthew | 14.5 |
h | Laura | NaN |
i | Kevin | 8.0 |
j | Jonas | 19.0 |
- Write a Pandas program to select the specified columns and rows from a given data frame.
df2[['name','score']].iloc[[1,3,5,6],:]
name | score | |
---|---|---|
b | Dima | 9.0 |
d | James | NaN |
f | Michael | 20.0 |
g | Matthew | 14.5 |
- Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.
df2[df2.attempts>2]
attempts | name | qualify | score | |
---|---|---|---|---|
b | 3 | Dima | no | 9.0 |
d | 3 | James | no | NaN |
f | 3 | Michael | yes | 20.0 |
- Write a Pandas program to count the number of rows and columns of a DataFrame.
df2.columns
Index(['attempts', 'name', 'qualify', 'score'], dtype='object')
print("Number of rows : {} \nNumber of columns : {}".format(df2.shape[0],df2.shape[1]))
Number of rows : 10
Number of columns : 4
print("Number of rows : {} \nNumber of columns : {}".format(len(df2),len(df2.columns)))
Number of rows : 10
Number of columns : 4
- Write a Pandas program to select the rows where the score is missing, i.e. is NaN.
df2[df2.score.isna()]
attempts | name | qualify | score | |
---|---|---|---|---|
d | 3 | James | no | NaN |
h | 1 | Laura | no | NaN |
- Write a Pandas program to select the rows the score is between 15 and 20 (inclusive).
df2[(df2.score>=15) & (df2.score<=20)]
attempts | name | qualify | score | |
---|---|---|---|---|
c | 2 | Katherine | yes | 16.5 |
f | 3 | Michael | yes | 20.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to select the rows where number of attempts in the examination is less than 2 and score greater than 15.
df2[(df2.attempts<2) & (df2.score>15)]
attempts | name | qualify | score | |
---|---|---|---|---|
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to change the score in row ‘d’ to 11.5.
df12=df2.copy()
df12.loc['d','score']=11.5
df12.loc['d']
attempts 3
name James
qualify no
score 11.5
Name: d, dtype: object
- Write a Pandas program to calculate the sum of the examination attempts by the students.
df2.attempts.sum()
19
- Write a Pandas program to calculate the mean score for each different student in DataFrame.
df2.score.mean()
13.5625
- Write a Pandas program to append a new row ‘k’ to data frame with given values for each column. Now delete the new row and return the original DataFrame.
df15=df2.copy()
df15.loc['k']=[1,'Suresh','yes',15.5]
df15
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
k | 1 | Suresh | yes | 15.5 |
- Write a Pandas program to sort the DataFrame first by ‘name’ in descending order, then by ‘score’ in ascending order.
df2.sort_values(['name','score'],ascending=[False,True])
attempts | name | qualify | score | |
---|---|---|---|---|
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
c | 2 | Katherine | yes | 16.5 |
j | 1 | Jonas | yes | 19.0 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
b | 3 | Dima | no | 9.0 |
a | 1 | Anastasia | yes | 12.5 |
- Write a Pandas program to replace the ‘qualify’ column contains the values ‘yes’ and ‘no’ with True and False.
df17=df2.copy()
df17.qualify=df17.qualify.apply(lambda x:True if x=='yes' else False)
df17
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | True | 12.5 |
b | 3 | Dima | False | 9.0 |
c | 2 | Katherine | True | 16.5 |
d | 3 | James | False | NaN |
e | 2 | Emily | False | 9.0 |
f | 3 | Michael | True | 20.0 |
g | 1 | Matthew | True | 14.5 |
h | 1 | Laura | False | NaN |
i | 2 | Kevin | False | 8.0 |
j | 1 | Jonas | True | 19.0 |
- Write a Pandas program to change the name ‘James’ to ‘Suresh’ in name column of the DataFrame.
df18=df2.copy()
df18.name=df18.name.replace('James','Suresh')
df18
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | Suresh | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to delete the ‘attempts’ column from the DataFrame.
df19=df2.copy()
df19.drop('attempts',axis=1)
name | qualify | score | |
---|---|---|---|
a | Anastasia | yes | 12.5 |
b | Dima | no | 9.0 |
c | Katherine | yes | 16.5 |
d | James | no | NaN |
e | Emily | no | 9.0 |
f | Michael | yes | 20.0 |
g | Matthew | yes | 14.5 |
h | Laura | no | NaN |
i | Kevin | no | 8.0 |
j | Jonas | yes | 19.0 |
- Write a Pandas program to insert a new column in existing DataFrame.
df20=df2.copy()
color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']
df20['color']=color
df20
attempts | name | qualify | score | color | |
---|---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 | Red |
b | 3 | Dima | no | 9.0 | Blue |
c | 2 | Katherine | yes | 16.5 | Orange |
d | 3 | James | no | NaN | Red |
e | 2 | Emily | no | 9.0 | White |
f | 3 | Michael | yes | 20.0 | White |
g | 1 | Matthew | yes | 14.5 | Blue |
h | 1 | Laura | no | NaN | Green |
i | 2 | Kevin | no | 8.0 | Green |
j | 1 | Jonas | yes | 19.0 | Red |
- Write a Pandas program to iterate over rows in a DataFrame.
exam_data = [{'name':'Anastasia', 'score':12.5}, {'name':'Dima','score':9}, {'name':'Katherine','score':16.5}]
df21=pd.DataFrame(exam_data)
for index,row in df21.iterrows():
print(row['name'],row['score'])
Anastasia 12.5
Dima 9.0
Katherine 16.5
- Write a Pandas program to get list from DataFrame column headers.
df2.columns.tolist()
['attempts', 'name', 'qualify', 'score']
- Write a Pandas program to rename columns of a given DataFrame.
df23= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df23
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 9 |
df23.columns=['Column1','Column2','Column3']
df23
Column1 | Column2 | Column3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 9 |
df23=df23.rename(columns={'Column1':'Columns1'})
df23
Columns1 | Column2 | Column3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 9 |
- Write a Pandas program to select rows from a given DataFrame based on values in some columns.
df24= pd.DataFrame({'col1': [1, 4,3,4,5], 'col2': [4, 5, 6,7,8], 'col3': [7, 8, 9,0,1]})
df24[df24.col1==4]
col1 | col2 | col3 | |
---|---|---|---|
1 | 4 | 5 | 8 |
3 | 4 | 7 | 0 |
- Write a Pandas program to change the order of a DataFrame columns.
df25=df24[['col3','col2','col1']]
df25
col3 | col2 | col1 | |
---|---|---|---|
0 | 7 | 4 | 1 |
1 | 8 | 5 | 4 |
2 | 9 | 6 | 3 |
3 | 0 | 7 | 4 |
4 | 1 | 8 | 5 |
- Write a Pandas program to add one row in an existing DataFrame.
df26=df24.copy()
df26.loc[len(df26)]=[10,11,12]
df26
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 4 | 5 | 8 |
2 | 3 | 6 | 9 |
3 | 4 | 7 | 0 |
4 | 5 | 8 | 1 |
5 | 10 | 11 | 12 |
- Write a Pandas program to write a DataFrame to CSV file using tab separator.
df24.to_csv('27.csv',sep='\t',index=False)
df27=pd.read_csv('27.csv')
df27
col1\tcol2\tcol3 | |
---|---|
0 | 1\t4\t7 |
1 | 4\t5\t8 |
2 | 3\t6\t9 |
3 | 4\t7\t0 |
4 | 5\t8\t1 |
- Write a Pandas program to count city wise number of people from a given of data set (city, name of the person).
df28 = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'city': ['California', 'Los Angeles', 'California', 'California', 'California', 'Los Angeles', 'Los Angeles', 'Georgia', 'Georgia', 'Los Angeles']})
df28
name | city | |
---|---|---|
0 | Anastasia | California |
1 | Dima | Los Angeles |
2 | Katherine | California |
3 | James | California |
4 | Emily | California |
5 | Michael | Los Angeles |
6 | Matthew | Los Angeles |
7 | Laura | Georgia |
8 | Kevin | Georgia |
9 | Jonas | Los Angeles |
df28.groupby('city').size().reset_index(name='Number of people')
city | Number of people | |
---|---|---|
0 | California | 4 |
1 | Georgia | 2 |
2 | Los Angeles | 4 |
- Write a Pandas program to delete DataFrame row(s) based on given column value.
df29=df24.copy()
df29=df29[df29.col2!=5]
df29
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
2 | 3 | 6 | 9 |
3 | 4 | 7 | 0 |
4 | 5 | 8 | 1 |
- Write a Pandas program to widen output display to see more columns.
pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)
print(df24)
col1 col2 col3
0 1 4 7
1 4 5 8
2 3 6 9
3 4 7 0
4 5 8 1
- Write a Pandas program to select a row of series/dataframe by given integer index.
df24.loc[[2]]
col1 | col2 | col3 | |
---|---|---|---|
2 | 3 | 6 | 9 |
- Write a Pandas program to replace all the NaN values with Zero’s in a column of a dataframe.
df32=df2.copy()
df32.fillna(0)
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | 0.0 |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | 0.0 |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to convert index in a column of the given dataframe.
df33=df2.copy()
df33.reset_index()
index | attempts | name | qualify | score | |
---|---|---|---|---|---|
0 | a | 1 | Anastasia | yes | 12.5 |
1 | b | 3 | Dima | no | 9.0 |
2 | c | 2 | Katherine | yes | 16.5 |
3 | d | 3 | James | no | NaN |
4 | e | 2 | Emily | no | 9.0 |
5 | f | 3 | Michael | yes | 20.0 |
6 | g | 1 | Matthew | yes | 14.5 |
7 | h | 1 | Laura | no | NaN |
8 | i | 2 | Kevin | no | 8.0 |
9 | j | 1 | Jonas | yes | 19.0 |
print(df33.reset_index().to_string(index=False))
index attempts name qualify score
a 1 Anastasia yes 12.5
b 3 Dima no 9.0
c 2 Katherine yes 16.5
d 3 James no NaN
e 2 Emily no 9.0
f 3 Michael yes 20.0
g 1 Matthew yes 14.5
h 1 Laura no NaN
i 2 Kevin no 8.0
j 1 Jonas yes 19.0
#Back to original be setting index once again
df33.reset_index().set_index('index')
attempts | name | qualify | score | |
---|---|---|---|---|
index | ||||
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to set a given value for particular cell in DataFrame using index value.
df2
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
df34=df2.copy()
df34.loc['i',:].score=10.2
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py:5208: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[name] = value
df34
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
df34.set_value('i','score',10.2)
C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
"""Entry point for launching an IPython kernel.
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 10.2 |
j | 1 | Jonas | yes | 19.0 |
df34.at['i','score']=10.2
df34
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 10.2 |
j | 1 | Jonas | yes | 19.0 |
df34.iat[8,3]
10.2
- Write a Pandas program to count the NaN values in one or more columns in DataFrame.
df2.isna().sum().sum()
2
- Write a Pandas program to drop a list of rows from a specified DataFrame.
df36=df24.copy()
df36.drop([2,4],axis=0)
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 4 | 5 | 8 |
3 | 4 | 7 | 0 |
- Write a Pandas program to reset index in a given DataFrame.
df37=df2.copy()
df37.drop(['a','b']).reset_index()
index | attempts | name | qualify | score | |
---|---|---|---|---|---|
0 | c | 2 | Katherine | yes | 16.5 |
1 | d | 3 | James | no | NaN |
2 | e | 2 | Emily | no | 9.0 |
3 | f | 3 | Michael | yes | 20.0 |
4 | g | 1 | Matthew | yes | 14.5 |
5 | h | 1 | Laura | no | NaN |
6 | i | 2 | Kevin | no | 8.0 |
7 | j | 1 | Jonas | yes | 19.0 |
df37
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to divide a DataFrame in a given ratio.
df38=pd.DataFrame(np.random.randn(10,2))
df38
0 | 1 | |
---|---|---|
0 | 1.248443 | -1.241437 |
1 | 1.350421 | -1.539320 |
2 | -0.118075 | 1.586380 |
3 | 0.126995 | 1.419173 |
4 | -0.062786 | 0.070672 |
5 | -0.077202 | 0.442252 |
6 | -0.547482 | 0.472672 |
7 | -0.154437 | -0.138206 |
8 | 0.634787 | 1.480123 |
9 | -0.029771 | 0.834099 |
df38_70per=df38.sample(frac=.7,random_state=0)
df38_30per=df38.drop(df38_70per.index)
df38_70per
0 | 1 | |
---|---|---|
2 | -0.118075 | 1.586380 |
8 | 0.634787 | 1.480123 |
4 | -0.062786 | 0.070672 |
9 | -0.029771 | 0.834099 |
1 | 1.350421 | -1.539320 |
6 | -0.547482 | 0.472672 |
7 | -0.154437 | -0.138206 |
df38_30per
0 | 1 | |
---|---|---|
0 | 1.248443 | -1.241437 |
3 | 0.126995 | 1.419173 |
5 | -0.077202 | 0.442252 |
- Write a Pandas program to combining two series into a DataFrame.
s39_1 = pd.Series(['100', '200', 'python', '300.12', '400'])
s39_2 = pd.Series(['10', '20', 'php', '30.12', '40'])
pd.concat([s39_1,s39_2],axis=1)
0 | 1 | |
---|---|---|
0 | 100 | 10 |
1 | 200 | 20 |
2 | python | php |
3 | 300.12 | 30.12 |
4 | 400 | 40 |
- Write a Pandas program to shuffle a given DataFrame rows.
df40=df2.copy()
df40.sample(frac=1)
attempts | name | qualify | score | |
---|---|---|---|---|
g | 1 | Matthew | yes | 14.5 |
f | 3 | Michael | yes | 20.0 |
d | 3 | James | no | NaN |
a | 1 | Anastasia | yes | 12.5 |
e | 2 | Emily | no | 9.0 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
c | 2 | Katherine | yes | 16.5 |
j | 1 | Jonas | yes | 19.0 |
b | 3 | Dima | no | 9.0 |
df40
attempts | name | qualify | score | |
---|---|---|---|---|
a | 1 | Anastasia | yes | 12.5 |
b | 3 | Dima | no | 9.0 |
c | 2 | Katherine | yes | 16.5 |
d | 3 | James | no | NaN |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
i | 2 | Kevin | no | 8.0 |
j | 1 | Jonas | yes | 19.0 |
- Write a Pandas program to convert DataFrame column type from string to datetime.
s41 = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])
s41
0 3/11/2000
1 3/12/2000
2 3/13/2000
dtype: object
pd.to_datetime(s41)
0 2000-03-11
1 2000-03-12
2 2000-03-13
dtype: datetime64[ns]
- Write a Pandas program to rename a specific column name in a given DataFrame.
df42= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
df42.rename(columns={'col2':'column2'})
col1 | column2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 9 |
- Write a Pandas program to get a list of a specified column of a DataFrame.
df42.col2.tolist()
[4, 5, 6]
- Write a Pandas program to create a DataFrame from a Numpy array and specify the index column and column headers.
pd.DataFrame(np.random.randn(30).reshape(10,3),index=np.linspace(0,9,10,dtype='int'),columns=['column1','columns2','columns3'])
column1 | columns2 | columns3 | |
---|---|---|---|
0 | -0.345403 | 2.167887 | 0.495698 |
1 | 1.146200 | -1.078212 | 0.231268 |
2 | -0.619157 | 1.287958 | 0.216466 |
3 | -0.859700 | -0.735332 | 0.437529 |
4 | 0.395431 | 1.002545 | -0.999019 |
5 | -0.474449 | -0.106919 | -2.089148 |
6 | 0.375528 | 0.173526 | -1.103849 |
7 | -1.468913 | 1.206882 | -0.983365 |
8 | -0.751145 | -0.289252 | -0.346848 |
9 | 0.923659 | 1.038505 | -1.394882 |
- Write a Pandas program to find the row for where the value of a given column is maximum.
df45 = pd.DataFrame({'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})
df45
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 12 |
3 | 4 | 9 | 1 |
4 | 7 | 5 | 11 |
#Solution 1 (not efficient)
df45[df45.col1==df45.col1.max()].index[0]
4
df45['col1'].idxmax()
4
df45['col2'].idxmax()
3
df45['col3'].idxmax()
2
- Write a Pandas program to check whether a given column is present in a DataFrame or not.
col='Col4'
if col in df45.columns.tolist():
print(col,'present in DataFrame')
else:
print(col,'not present in DataFrame')
Col4 not present in DataFrame
- Write a Pandas program to get the specified row value of a given DataFrame.
df45.iloc[0]
col1 1
col2 4
col3 7
Name: 0, dtype: int64
- Write a Pandas program to get the datatypes of columns of a DataFrame.
df2.dtypes
attempts int64
name object
qualify object
score float64
dtype: object
- Write a Pandas program to append data to an empty DataFrame.
df49=pd.DataFrame()
d49=pd.DataFrame({'col1':range(3),'col2':range(3)})
df49.append(d49)
col1 | col2 | |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 2 | 2 |
- Write a Pandas program to sort a given DataFrame by two or more columns.
df50=df2.copy()
df50.sort_values(['attempts','name'],ascending=[True,False])
attempts | name | qualify | score | |
---|---|---|---|---|
g | 1 | Matthew | yes | 14.5 |
h | 1 | Laura | no | NaN |
j | 1 | Jonas | yes | 19.0 |
a | 1 | Anastasia | yes | 12.5 |
i | 2 | Kevin | no | 8.0 |
c | 2 | Katherine | yes | 16.5 |
e | 2 | Emily | no | 9.0 |
f | 3 | Michael | yes | 20.0 |
d | 3 | James | no | NaN |
b | 3 | Dima | no | 9.0 |
- Write a Pandas program to convert the datatype of a given column (floats to ints).
df51=df2.copy()
df51.dtypes
attempts int64
name object
qualify object
score float64
dtype: object
df51.fillna(0).astype({'score':int}).dtypes
attempts int64
name object
qualify object
score int32
dtype: object
- Write a Pandas program to remove infinite values from a given DataFrame.
df52 = pd.DataFrame([1000, 2000, 3000, -4000, np.inf, -np.inf])
df52
0 | |
---|---|
0 | 1000.0 |
1 | 2000.0 |
2 | 3000.0 |
3 | -4000.0 |
4 | inf |
5 | -inf |
df52.replace([np.inf,-np.inf],np.NaN)
0 | |
---|---|
0 | 1000.0 |
1 | 2000.0 |
2 | 3000.0 |
3 | -4000.0 |
4 | NaN |
5 | NaN |
- Write a Pandas program to insert a given column at a specific column index in a DataFrame.
df53 = pd.DataFrame({'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})
df53
col2 | col3 | |
---|---|---|
0 | 4 | 7 |
1 | 5 | 8 |
2 | 6 | 12 |
3 | 9 | 1 |
4 | 5 | 11 |
col1= [1, 2, 3, 4, 7]
idx=0
df53.insert(loc=idx,column='col1',value=col1)
df53
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 12 |
3 | 4 | 9 | 1 |
4 | 7 | 5 | 11 |
- Write a Pandas program to convert a given list of lists into a Dataframe.
d54=[[2, 4], [1, 3]]
pd.DataFrame(d54,columns=['col1','col2'])
col1 | col2 | |
---|---|---|
0 | 2 | 4 |
1 | 1 | 3 |
- Write a Pandas program to group by the first column and get second column as lists in rows.
df55=pd.DataFrame({'col1':['C1','C1','C2','C2','C2','C3','C2'],'col2':[1,2,3,3,4,6,5]})
g55=df55.groupby('col1')['col2']
#Wrong Solution (or not gettring desired result)
for ele in g55:
print(ele[1])
0 1
1 2
Name: col2, dtype: int64
2 3
3 3
4 4
6 5
Name: col2, dtype: int64
5 6
Name: col2, dtype: int64
#Correct Solution
df55.groupby('col1')['col2'].apply(list)
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object
- Write a Pandas program to get column index from column name of a given DataFrame.
df45.columns.tolist().index('col2')
1
df45.columns.get_loc('col2')
1
- Write a Pandas program to count number of columns of a DataFrame.
df45.shape[1]
3
len(df45.columns)
3
- Write a Pandas program to select all columns, except one given column in a DataFrame.
#Solution 1
df45.drop('col3',axis=1)
col1 | col2 | |
---|---|---|
0 | 1 | 4 |
1 | 2 | 5 |
2 | 3 | 6 |
3 | 4 | 9 |
4 | 7 | 5 |
#Solution 2
df45.loc[:,df45.columns!='col3']
col1 | col2 | |
---|---|---|
0 | 1 | 4 |
1 | 2 | 5 |
2 | 3 | 6 |
3 | 4 | 9 |
4 | 7 | 5 |
- Write a Pandas program to get first n records of a DataFrame.
df45.head(3)
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 8 |
2 | 3 | 6 | 12 |
- Write a Pandas program to get last n records of a DataFrame.
df45.iloc[-3:,:]
col1 | col2 | col3 | |
---|---|---|---|
2 | 3 | 6 | 12 |
3 | 4 | 9 | 1 |
4 | 7 | 5 | 11 |
df45.tail(3)
col1 | col2 | col3 | |
---|---|---|---|
2 | 3 | 6 | 12 |
3 | 4 | 9 | 1 |
4 | 7 | 5 | 11 |
- Write a Pandas program to get topmost n records within each group of a DataFrame.
df61 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})
df61
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 5 |
2 | 3 | 6 | 8 |
3 | 4 | 9 | 12 |
4 | 7 | 5 | 1 |
5 | 11 | 0 | 11 |
df61.nlargest(3,'col1')
col1 | col2 | col3 | |
---|---|---|---|
5 | 11 | 0 | 11 |
4 | 7 | 5 | 1 |
3 | 4 | 9 | 12 |
df61.nlargest(3,'col2')
col1 | col2 | col3 | |
---|---|---|---|
3 | 4 | 9 | 12 |
2 | 3 | 6 | 8 |
1 | 2 | 5 | 5 |
df61.nlargest(3,'col3')
col1 | col2 | col3 | |
---|---|---|---|
3 | 4 | 9 | 12 |
5 | 11 | 0 | 11 |
2 | 3 | 6 | 8 |
- Write a Pandas program to remove first n rows of a given DataFrame.
df62=df61.copy()
df62
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 5 |
2 | 3 | 6 | 8 |
3 | 4 | 9 | 12 |
4 | 7 | 5 | 1 |
5 | 11 | 0 | 11 |
df62=df62.tail(-3)
df62
col1 | col2 | col3 | |
---|---|---|---|
3 | 4 | 9 | 12 |
4 | 7 | 5 | 1 |
5 | 11 | 0 | 11 |
- Write a Pandas program to remove last n rows of a given DataFrame.
df63=df61.copy()
df63=df63.iloc[:-3,:]
df63
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 5 |
2 | 3 | 6 | 8 |
- Write a Pandas program to add a prefix or suffix to all columns of a given DataFrame.
df64 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df64_1=df64.copy()
df64_1.columns='A_'+df64.columns
df64_1
A_W | A_X | A_Y | A_Z | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
2 | 86 | 96 | 89 | 96 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
df64_2=df64.copy()
df64_2.columns=df64.columns+'_1'
df64_2
W_1 | X_1 | Y_1 | Z_1 | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
2 | 86 | 96 | 89 | 96 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
#sol2
df64.add_prefix('A_')
A_W | A_X | A_Y | A_Z | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
2 | 86 | 96 | 89 | 96 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
df64.add_suffix('_1')
W_1 | X_1 | Y_1 | Z_1 | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
2 | 86 | 96 | 89 | 96 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
- Write a Pandas program to reverse order (rows, columns) of a given DataFrame.
df64[df64.columns[::-1]]
#df64.iloc[:,::-1] same answer
Z | Y | X | W | |
---|---|---|---|---|
0 | 86 | 84 | 78 | 68 |
1 | 97 | 94 | 85 | 75 |
2 | 96 | 89 | 96 | 86 |
3 | 72 | 83 | 80 | 80 |
4 | 83 | 86 | 86 | 66 |
df64.iloc[::-1]
W | X | Y | Z | |
---|---|---|---|---|
4 | 66 | 86 | 86 | 83 |
3 | 80 | 80 | 83 | 72 |
2 | 86 | 96 | 89 | 96 |
1 | 75 | 85 | 94 | 97 |
0 | 68 | 78 | 84 | 86 |
print("Reverse row order and reset index:")
df64.iloc[::-1].reset_index(drop=True)
Reverse row order and reset index:
W | X | Y | Z | |
---|---|---|---|---|
0 | 66 | 86 | 86 | 83 |
1 | 80 | 80 | 83 | 72 |
2 | 86 | 96 | 89 | 96 |
3 | 75 | 85 | 94 | 97 |
4 | 68 | 78 | 84 | 86 |
- Write a Pandas program to select columns by data type of a given DataFrame.
df66 = pd.DataFrame({
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
'date_of_birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'age': [18.5, 21.2, 22.5, 22, 23]
})
df66
name | date_of_birth | age | |
---|---|---|---|
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
2 | Ryan Parkes | 25/09/1998 | 22.5 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Syed Wharton | 15/09/1997 | 23.0 |
df66.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
name 5 non-null object
date_of_birth 5 non-null object
age 5 non-null float64
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes
df66.select_dtypes(include='number')
age | |
---|---|
0 | 18.5 |
1 | 21.2 |
2 | 22.5 |
3 | 22.0 |
4 | 23.0 |
df66.select_dtypes(include='object')
name | date_of_birth | |
---|---|---|
0 | Alberto Franco | 17/05/2002 |
1 | Gino Mcneill | 16/02/1999 |
2 | Ryan Parkes | 25/09/1998 |
3 | Eesha Hinton | 11/05/2002 |
4 | Syed Wharton | 15/09/1997 |
- Write a Pandas program to split a given DataFrame into two random subsets.
df67=df66.copy()
df67_1=df67.sample(frac=0.7,random_state=0)
df67_2=df67.drop(df67_1.index)
df67_1
name | date_of_birth | age | |
---|---|---|---|
2 | Ryan Parkes | 25/09/1998 | 22.5 |
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
df67_2
name | date_of_birth | age | |
---|---|---|---|
4 | Syed Wharton | 15/09/1997 | 23.0 |
- Write a Pandas program to rename all columns with the same pattern of a given DataFrame.
df68 = pd.DataFrame({
'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
'Date_of_Birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'Age': [18.5, 21.2, 22.5, 22, 23]
})
df68
Name | Date_of_Birth | Age | |
---|---|---|---|
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
2 | Ryan Parkes | 25/09/1998 | 22.5 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Syed Wharton | 15/09/1997 | 23.0 |
df68.columns=df68.columns.str.lower().str.strip()
df68
name | date_of_birth | age | |
---|---|---|---|
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
2 | Ryan Parkes | 25/09/1998 | 22.5 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Syed Wharton | 15/09/1997 | 23.0 |
- Write a Pandas program to merge datasets and check uniqueness.
df69 = pd.DataFrame({
'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'Age': [18.5, 21.2, 22.5, 22, 23]
})
df69_1=df69.copy()
df69_1=df69_1.drop([0,1])
df69_1
Name | Date_Of_Birth | Age | |
---|---|---|---|
2 | Ryan Parkes | 25/09/1998 | 22.5 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Syed Wharton | 15/09/1997 | 23.0 |
df69_2=df69.copy()
df69_2=df69_2.drop([2])
df69_2
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Syed Wharton | 15/09/1997 | 23.0 |
#one_to_one
pd.merge(df69_1,df69_2,validate='one_to_one')
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Eesha Hinton | 11/05/2002 | 22.0 |
1 | Syed Wharton | 15/09/1997 | 23.0 |
#one_to_many
pd.merge(df69_1,df69_2,validate='one_to_many')
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Eesha Hinton | 11/05/2002 | 22.0 |
1 | Syed Wharton | 15/09/1997 | 23.0 |
#many_to_one
pd.merge(df69_1,df69_2,validate='many_to_one')
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Eesha Hinton | 11/05/2002 | 22.0 |
1 | Syed Wharton | 15/09/1997 | 23.0 |
#many_to_many
pd.merge(df69_1,df69_2,validate='many_to_many')
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Eesha Hinton | 11/05/2002 | 22.0 |
1 | Syed Wharton | 15/09/1997 | 23.0 |
- Write a Pandas program to convert continuous values of a column in a given DataFrame to categorical.
df70 = pd.DataFrame({
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton', 'Kierra Gentry'],
'age': [18, 22, 85, 50, 80, 5]
})
df70
name | age | |
---|---|---|
0 | Alberto Franco | 18 |
1 | Gino Mcneill | 22 |
2 | Ryan Parkes | 85 |
3 | Eesha Hinton | 50 |
4 | Syed Wharton | 80 |
5 | Kierra Gentry | 5 |
df70['age_group']=pd.cut(df70.age,bins=[0,18,65,95],labels=['kids','adult','elderly'])
df70
name | age | age_group | |
---|---|---|---|
0 | Alberto Franco | 18 | kids |
1 | Gino Mcneill | 22 | adult |
2 | Ryan Parkes | 85 | elderly |
3 | Eesha Hinton | 50 | adult |
4 | Syed Wharton | 80 | elderly |
5 | Kierra Gentry | 5 | kids |
- Write a Pandas program to display memory usage of a given DataFrame and every column of the DataFrame.
df70.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name 6 non-null object
age 6 non-null int64
age_group 6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 334.0+ bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name 6 non-null object
age 6 non-null int64
age_group 6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name 6 non-null object
age 6 non-null int64
age_group 6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes
df70.memory_usage(deep=True)
Index 128
name 416
age 48
age_group 311
dtype: int64
df70.memory_usage(deep=False)
Index 128
name 48
age 48
age_group 110
dtype: int64
- Write a Pandas program to combine many given series to create a DataFrame.
s72_1=pd.Series('php python jawa c# c++'.split())
s72_2=pd.Series([1,2,3,4,5])
pd.concat([s72_1,s72_2],axis=1)
0 | 1 | |
---|---|---|
0 | php | 1 |
1 | python | 2 |
2 | jawa | 3 |
3 | c# | 4 |
4 | c++ | 5 |
pd.DataFrame({'col1':s72_1,'col2':s72_2})
col1 | col2 | |
---|---|---|
0 | php | 1 |
1 | python | 2 |
2 | jawa | 3 |
3 | c# | 4 |
4 | c++ | 5 |
- Write a Pandas program to create DataFrames that contains random values, contains missing values, contains datetime values and contains mixed values.
pd.DataFrame(np.random.randn(40).reshape(10,4),columns='A B C D'.split())
A | B | C | D | |
---|---|---|---|---|
0 | 0.976226 | -1.231195 | -1.927390 | 1.687527 |
1 | 0.265455 | -0.872881 | -1.234580 | 0.025491 |
2 | 2.309553 | -0.368772 | 1.281644 | 0.903505 |
3 | -0.490682 | -0.934150 | 0.395868 | -0.614944 |
4 | 1.270408 | 0.105810 | -0.934098 | -0.193174 |
5 | -0.412797 | 0.123316 | 0.656035 | -0.801055 |
6 | -0.627806 | -0.182581 | 0.200891 | -0.471121 |
7 | -0.705074 | 0.416434 | 0.424676 | 0.495411 |
8 | -0.157094 | -1.339826 | 1.379757 | 1.843449 |
9 | 1.265529 | -0.211037 | 0.542715 | 1.862077 |
pd.util.testing.makeDataFrame()
A | B | C | D | |
---|---|---|---|---|
mIrUvYIpau | -0.132329 | 0.369235 | -2.518745 | 0.463750 |
IIuIlUG3TZ | 0.224135 | 1.173692 | -1.402806 | -0.634084 |
lhz7LziqT6 | -1.747848 | 1.112314 | -1.173867 | -1.275242 |
bDRGRAVZ5L | -1.125598 | 1.167309 | 0.119561 | -0.315563 |
tltm1jawR2 | -0.170738 | -0.020117 | 0.807885 | -1.399174 |
gyHU2hW6dO | 0.228735 | -0.685193 | -0.241533 | -1.230713 |
GH6l1o4kp8 | -0.683374 | -1.373173 | 1.563031 | -1.344920 |
Ee15AMPrrZ | -0.229806 | -1.190730 | 0.691445 | -0.614836 |
CVtTzdhYOH | -1.806060 | 0.126193 | 1.921333 | 0.655736 |
u7yGpt7ojr | 0.418048 | -0.560950 | -2.247065 | 2.067856 |
JQRGGsJRHG | -1.116554 | 0.614687 | -0.367641 | 0.160849 |
6NTUAVqlrk | 0.622822 | 0.119354 | -1.052081 | -0.347517 |
8XKVbhaoCO | 1.380594 | -0.907565 | -1.298444 | 0.627115 |
HSlwYeeFDf | 0.566009 | -0.185294 | 0.437134 | -1.943550 |
441fk4pWhD | 0.214447 | -1.183068 | 1.761380 | -0.652720 |
VdiWlWttWB | 0.743793 | 1.722755 | 0.757516 | 0.615187 |
ls77KaEvOq | -1.344870 | 0.364236 | 0.093453 | -0.609153 |
5K0bPzfMFi | -1.065251 | 0.841662 | 1.479194 | -0.731203 |
AXsuHNuwHo | -0.827250 | 1.965530 | 0.773863 | 0.665516 |
T0GlToPXZi | -0.320390 | -0.728882 | 2.018289 | -0.578019 |
22vFAApOA5 | -0.494500 | 0.550730 | -1.067188 | 0.044000 |
gPiu7GA3im | -0.521648 | -0.856955 | 0.709654 | -0.932031 |
OZnBwP9gm9 | -0.181220 | -1.985956 | 0.794047 | -0.335244 |
9U23Nv30l1 | 0.126521 | -0.269881 | -0.475025 | 0.873421 |
NLLYWOAAPo | -1.399080 | -0.614755 | -0.431522 | -1.314192 |
0snfJwr8Zi | 1.104287 | -0.747049 | -0.494870 | -0.280476 |
hyIRHRTG6c | 0.447759 | 0.375540 | -1.348494 | -1.576017 |
nO4Ybp7ia5 | -0.665945 | 1.525348 | 0.291121 | 0.763574 |
OU81k9dvGD | 0.897210 | 0.587101 | 1.375282 | -0.842839 |
ktS4vr3Pqy | -0.270603 | -0.081540 | -1.243459 | 0.965349 |
pd.util.testing.makeMissingDataframe()
A | B | C | D | |
---|---|---|---|---|
K1jtgA2gtF | -1.142478 | 1.478004 | 0.777470 | 0.517304 |
Fz0OYcxMQn | -0.096315 | 0.070175 | 0.176539 | -0.511371 |
7gLTJCMX3w | -0.236882 | 0.453078 | -0.588427 | 1.412213 |
Id0qucUCeB | NaN | -1.088763 | -0.637960 | 1.000012 |
MlJfZSTHJj | -0.732373 | 1.112463 | 0.441055 | 0.148622 |
p74iwf5CWk | 0.880165 | NaN | 0.319050 | -0.100719 |
k4wEIqvtKd | -1.076164 | -1.052221 | NaN | 0.817084 |
6gdeN8A4gg | 1.160957 | 0.109959 | 0.587556 | 1.699185 |
c0GHnKP4vD | -1.127555 | 0.933663 | 0.527032 | -0.450704 |
0yltZq25JS | -0.177030 | 0.225647 | 0.413793 | 1.491172 |
BAGqtJef0S | 0.356970 | 0.621645 | -1.182119 | -0.008877 |
iOriKpo3am | 0.734046 | -0.004511 | 0.921544 | 0.719030 |
Ogd3TaGzC0 | -0.076480 | -1.460346 | 0.026264 | -0.775999 |
XZp591LpQ7 | -0.998655 | NaN | 1.518520 | 0.453103 |
nvd2jdiBX5 | 1.592946 | -1.306997 | -0.168547 | NaN |
ICJxEvHfx4 | -0.577977 | 0.387412 | -0.594324 | 0.294138 |
mLyqvY1KVk | 0.901325 | -1.560872 | 0.160145 | -1.170798 |
23TmyM4Vjt | 0.291432 | -0.810652 | -0.184815 | 0.211070 |
J2JBz9nc2N | 1.818592 | NaN | NaN | -0.257770 |
77utwjDcEg | NaN | 0.183574 | 0.111528 | 1.381484 |
yAg15dn3KU | -0.259229 | -0.201913 | -1.912119 | -0.271751 |
yRbk019rAz | -0.913102 | -0.120397 | -0.721780 | 0.725017 |
U5oGxD3WnW | -0.565617 | 0.579980 | NaN | 0.031062 |
sNGtSOERD7 | NaN | 0.074556 | 1.525357 | -1.744255 |
ubpFxokUim | 1.045186 | 1.165908 | -1.138931 | -0.082937 |
b6ZCgA1dZD | 1.620180 | 0.962883 | 0.966021 | 1.709217 |
3lYiadTEpV | -0.147411 | 0.950925 | -0.122691 | -0.973400 |
qvn2AUqCgF | 1.207141 | 0.611017 | -0.733806 | 0.072065 |
dqO35ESFKx | 2.076349 | 0.519654 | NaN | 0.217811 |
am4HPhiP5I | -1.403172 | NaN | 0.797241 | -0.641350 |
pd.util.testing.makeTimeDataFrame()
A | B | C | D | |
---|---|---|---|---|
2000-01-03 | 1.003781 | -0.356442 | -0.690759 | -2.543469 |
2000-01-04 | 0.307821 | -0.425427 | -0.134576 | 0.824121 |
2000-01-05 | -0.976996 | -1.113531 | -0.522290 | -0.567436 |
2000-01-06 | 0.438595 | 0.129363 | -0.433556 | 1.080008 |
2000-01-07 | -0.001516 | -0.291726 | -1.008141 | -0.605705 |
2000-01-10 | 0.015357 | -1.016492 | -0.162590 | -2.282291 |
2000-01-11 | 0.523147 | 0.587595 | 0.451137 | -0.031271 |
2000-01-12 | 0.917431 | 1.109996 | -0.561045 | 0.305243 |
2000-01-13 | 1.150708 | -0.069645 | -0.561491 | -0.165408 |
2000-01-14 | 1.122621 | -1.035067 | 1.201935 | -0.690251 |
2000-01-17 | 0.813662 | 0.924708 | 0.545710 | -0.002078 |
2000-01-18 | 1.379715 | -0.229399 | -0.726494 | 0.493481 |
2000-01-19 | 0.622288 | -0.757303 | 0.689828 | 0.229719 |
2000-01-20 | -0.680374 | 1.011196 | -0.551390 | -1.713440 |
2000-01-21 | -1.524046 | -1.010875 | -0.997332 | 0.666093 |
2000-01-24 | -0.941409 | 1.025098 | -0.905963 | 1.082745 |
2000-01-25 | 0.042881 | 1.382499 | 1.049391 | 1.074299 |
2000-01-26 | 0.967413 | -0.923113 | 2.007157 | 0.628600 |
2000-01-27 | -0.051459 | 1.172377 | 0.464885 | 1.129185 |
2000-01-28 | 0.533879 | -1.320932 | 0.503104 | -0.695140 |
2000-01-31 | -1.133007 | 0.437374 | 0.505606 | -1.571305 |
2000-02-01 | 0.979238 | -0.721377 | 0.234655 | -0.061103 |
2000-02-02 | 0.231642 | -0.627750 | -0.133623 | 0.748030 |
2000-02-03 | -1.431866 | 1.060292 | -0.040169 | 0.229246 |
2000-02-04 | -0.350627 | -1.165422 | 0.436717 | 0.446511 |
2000-02-07 | -1.886733 | 0.926730 | 0.604955 | -1.034671 |
2000-02-08 | 1.205402 | 1.982943 | -0.653149 | -0.675256 |
2000-02-09 | 0.212245 | -0.374800 | -0.472396 | 1.577047 |
2000-02-10 | -0.732806 | -1.332025 | 1.173203 | 0.602202 |
2000-02-11 | -0.681866 | -0.170969 | -1.366112 | 0.779899 |
pd.util.testing.makeMixedDataFrame()
A | B | C | D | |
---|---|---|---|---|
0 | 0.0 | 0.0 | foo1 | 2009-01-01 |
1 | 1.0 | 1.0 | foo2 | 2009-01-02 |
2 | 2.0 | 0.0 | foo3 | 2009-01-05 |
3 | 3.0 | 1.0 | foo4 | 2009-01-06 |
4 | 4.0 | 0.0 | foo5 | 2009-01-07 |
- Write a Pandas program to fill missing values in time series data.
s74 = {"c1":[120, 130 ,140, 150, np.nan, 170], "c2":[7, np.nan, 10, np.nan, 5.5, 16.5]}
pd.util.testing.makeDateIndex()[:6]
DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-10'], dtype='datetime64[ns]', freq='B')
df74=pd.DataFrame(s74,index=pd.util.testing.makeDateIndex()[:6])
df74
c1 | c2 | |
---|---|---|
2000-01-03 | 120.0 | 7.0 |
2000-01-04 | 130.0 | NaN |
2000-01-05 | 140.0 | 10.0 |
2000-01-06 | 150.0 | NaN |
2000-01-07 | NaN | 5.5 |
2000-01-10 | 170.0 | 16.5 |
df74.interpolate()
c1 | c2 | |
---|---|---|
2000-01-03 | 120.0 | 7.00 |
2000-01-04 | 130.0 | 8.50 |
2000-01-05 | 140.0 | 10.00 |
2000-01-06 | 150.0 | 7.75 |
2000-01-07 | 160.0 | 5.50 |
2000-01-10 | 170.0 | 16.50 |
- Write a Pandas program to use a local variable within a query.
df75 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df75
W | X | Y | Z | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
2 | 86 | 96 | 89 | 96 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
maxx = df75["W"].max()
df75.query("W < @maxx")
W | X | Y | Z | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
#Solution 2 (without Query)
df75[df75.W<df75.W.max()]
W | X | Y | Z | |
---|---|---|---|---|
0 | 68 | 78 | 84 | 86 |
1 | 75 | 85 | 94 | 97 |
3 | 80 | 80 | 83 | 72 |
4 | 66 | 86 | 86 | 83 |
- Write a Pandas program to clean object column with mixed data of a given DataFrame using regular expression.
df76 = pd.DataFrame({"agent": ["a001", "a002", "a003", "a003", "a004"], "purchase":[4500.00, 7500.00, "$3000.25", "$1250.35", "9000.00"]})
df76
agent | purchase | |
---|---|---|
0 | a001 | 4500 |
1 | a002 | 7500 |
2 | a003 | $3000.25 |
3 | a003 | $1250.35 |
4 | a004 | 9000.00 |
df76.purchase.apply(type)
0 <class 'float'>
1 <class 'float'>
2 <class 'str'>
3 <class 'str'>
4 <class 'str'>
Name: purchase, dtype: object
df76.purchase.replace("[$]","",regex=True).astype('float').apply(type)
0 <class 'float'>
1 <class 'float'>
2 <class 'float'>
3 <class 'float'>
4 <class 'float'>
Name: purchase, dtype: object
- Write a Pandas program to get the numeric representation of an array by identifying distinct values of a given column of a dataframe.
df77 = pd.DataFrame({
'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'Age': [18.5, 21.2, 22.5, 22, 23]
})
df77
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | Alberto Franco | 17/05/2002 | 18.5 |
1 | Gino Mcneill | 16/02/1999 | 21.2 |
2 | Ryan Parkes | 25/09/1998 | 22.5 |
3 | Eesha Hinton | 11/05/2002 | 22.0 |
4 | Gino Mcneill | 15/09/1997 | 23.0 |
a77_1=df77.Name.unique()
d77={}
for ind,name in df77.iterrows():
print(ind,name[0])
if name[0] not in d77.keys():
d77[name[0]]=ind
0 Alberto Franco
1 Gino Mcneill
2 Ryan Parkes
3 Eesha Hinton
4 Gino Mcneill
d77
{'Alberto Franco': 0, 'Gino Mcneill': 1, 'Ryan Parkes': 2, 'Eesha Hinton': 3}
df77.replace({'Name':d77})
Name | Date_Of_Birth | Age | |
---|---|---|---|
0 | 0 | 17/05/2002 | 18.5 |
1 | 1 | 16/02/1999 | 21.2 |
2 | 2 | 25/09/1998 | 22.5 |
3 | 3 | 11/05/2002 | 22.0 |
4 | 1 | 15/09/1997 | 23.0 |
#solution 1
df77['Name'].map(d77).tolist()
[0, 1, 2, 3, 1]
#Solution 2
df77.replace({'Name':d77}).Name.tolist()
[0, 1, 2, 3, 1]
#one step Solution
pd.factorize(df77.Name)
(array([0, 1, 2, 3, 1], dtype=int64),
Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object'))
- Write a Pandas program to replace the current value in a dataframe column based on last largest value. If the current value is less than last largest value replaces the value with 0.
df78=pd.DataFrame({'rnum':[23, 21, 27, 22, 34, 33, 34, 31, 25, 22, 34, 19, 31, 32, 19]})
df78
rnum | |
---|---|
0 | 23 |
1 | 21 |
2 | 27 |
3 | 22 |
4 | 34 |
5 | 33 |
6 | 34 |
7 | 31 |
8 | 25 |
9 | 22 |
10 | 34 |
11 | 19 |
12 | 31 |
13 | 32 |
14 | 19 |
a78_1=[-np.inf]
for ele in df78.rnum:
if ele<np.amax(a78_1):
a78_1.append(0)
else:
a78_1.append(ele)
#print(ele)
print(a78_1[1:])
[23, 0, 27, 0, 34, 0, 34, 0, 0, 0, 34, 0, 0, 0, 0]
#Solution 2
df78.rnum.where(df78.rnum.eq(df78.rnum.cummax()),0)
0 23
1 0
2 27
3 0
4 34
5 0
6 34
7 0
8 0
9 0
10 34
11 0
12 0
13 0
14 0
Name: rnum, dtype: int64
- Write a Pandas program to create a DataFrame from the clipboard (data from an Excel spreadsheet or a Google Sheet).
df79=pd.read_clipboard()
df79
Python | Pandas | DataFrame |
---|
- Write a Pandas program to check for inequality of two given DataFrames.
df80_1 = pd.DataFrame({'W':[68,75,86,80,None],'X':[78,85,None,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df80_2 = pd.DataFrame({'W':[78,75,86,80,None],'X':[78,85,96,80,76], 'Y':[84,84,89,83,86],'Z':[86,97,96,72,83]});
df80_1!=df80_2
#~(df80_1==df80_2)
#df80_1.ne(df80_2)
W | X | Y | Z | |
---|---|---|---|---|
0 | True | False | False | False |
1 | False | False | True | False |
2 | False | True | False | False |
3 | False | False | False | False |
4 | True | True | False | False |
- Write a Pandas program to get lowest n records within each group of a given DataFrame.
df81 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})
df81.nsmallest(3,'col1')
col1 | col2 | col3 | |
---|---|---|---|
0 | 1 | 4 | 7 |
1 | 2 | 5 | 5 |
2 | 3 | 6 | 8 |
df81.nsmallest(3,'col2')
col1 | col2 | col3 | |
---|---|---|---|
5 | 11 | 0 | 11 |
0 | 1 | 4 | 7 |
1 | 2 | 5 | 5 |
df81.nsmallest(3,'col3')
col1 | col2 | col3 | |
---|---|---|---|
4 | 7 | 5 | 1 |
1 | 2 | 5 | 5 |
0 | 1 | 4 | 7 |
#df81.nsmallest(3,columns=['col2','col1']) # It is taking first passed column into account even if we pass multiple columns