Python Pandas - DataFrame

29 minute read

Exercise - Python Pandas DataFrame

import pandas as pd
import numpy as np

Write a Pandas program to create and display a DataFrame from a specified dictionary data.

data1= {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]}

df1=pd.DataFrame(data1)

print(df1)

Write a Pandas program to create and display a DataFrame from a specified dictionary data which has the index labels.

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df2=pd.DataFrame(exam_data,index=labels,columns=['attempts','name','qualify','score'])

df2

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to display a summary of the basic information about a specified DataFrame and its data.

df2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
attempts    10 non-null int64
name        10 non-null object
qualify     10 non-null object
score       8 non-null float64
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes

df2.describe()

	attempts	score
count	10.000000	8.000000
mean	1.900000	13.562500
std	0.875595	4.693746
min	1.000000	8.000000
25%	1.000000	9.000000
50%	2.000000	13.500000
75%	2.750000	17.125000
max	3.000000	20.000000

Write a Pandas program to get the first 3 rows of a given DataFrame.

df2.head(3)

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5

1. Write a Pandas program to select the ‘name’ and ‘score’ columns from the above DataFrame.

df2[['name','score']]

	name	score
a	Anastasia	12.5
b	Dima	9.0
c	Katherine	16.5
d	James	NaN
e	Emily	9.0
f	Michael	20.0
g	Matthew	14.5
h	Laura	NaN
i	Kevin	8.0
j	Jonas	19.0

Write a Pandas program to select the specified columns and rows from a given data frame.

df2[['name','score']].iloc[[1,3,5,6],:]

	name	score
b	Dima	9.0
d	James	NaN
f	Michael	20.0
g	Matthew	14.5

Write a Pandas program to select the rows where the number of attempts in the examination is greater than 2.

df2[df2.attempts>2]

	attempts	name	qualify	score
b	3	Dima	no	9.0
d	3	James	no	NaN
f	3	Michael	yes	20.0

Write a Pandas program to count the number of rows and columns of a DataFrame.

df2.columns

Index(['attempts', 'name', 'qualify', 'score'], dtype='object')

print("Number of rows : {} \nNumber of columns : {}".format(df2.shape[0],df2.shape[1]))

Number of rows : 10
Number of columns : 4

print("Number of rows : {} \nNumber of columns : {}".format(len(df2),len(df2.columns)))

Number of rows : 10
Number of columns : 4

Write a Pandas program to select the rows where the score is missing, i.e. is NaN.

df2[df2.score.isna()]

	attempts	name	qualify	score
d	3	James	no	NaN
h	1	Laura	no	NaN

Write a Pandas program to select the rows the score is between 15 and 20 (inclusive).

df2[(df2.score>=15) & (df2.score<=20)]

	attempts	name	qualify	score
c	2	Katherine	yes	16.5
f	3	Michael	yes	20.0
j	1	Jonas	yes	19.0

Write a Pandas program to select the rows where number of attempts in the examination is less than 2 and score greater than 15.

df2[(df2.attempts<2) & (df2.score>15)]

	attempts	name	qualify	score
j	1	Jonas	yes	19.0

Write a Pandas program to change the score in row ‘d’ to 11.5.

df12=df2.copy()

df12.loc['d','score']=11.5

df12.loc['d']

attempts        3
name        James
qualify        no
score        11.5
Name: d, dtype: object

Write a Pandas program to calculate the sum of the examination attempts by the students.

df2.attempts.sum()

Write a Pandas program to calculate the mean score for each different student in DataFrame.

df2.score.mean()

13.5625

Write a Pandas program to append a new row ‘k’ to data frame with given values for each column. Now delete the new row and return the original DataFrame.

df15=df2.copy()

df15.loc['k']=[1,'Suresh','yes',15.5]

df15

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0
k	1	Suresh	yes	15.5

Write a Pandas program to sort the DataFrame first by ‘name’ in descending order, then by ‘score’ in ascending order.

df2.sort_values(['name','score'],ascending=[False,True])

	attempts	name	qualify	score
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
c	2	Katherine	yes	16.5
j	1	Jonas	yes	19.0
d	3	James	no	NaN
e	2	Emily	no	9.0
b	3	Dima	no	9.0
a	1	Anastasia	yes	12.5

Write a Pandas program to replace the ‘qualify’ column contains the values ‘yes’ and ‘no’ with True and False.

df17=df2.copy()

df17.qualify=df17.qualify.apply(lambda x:True if x=='yes' else False)

df17

	attempts	name	qualify	score
a	1	Anastasia	True	12.5
b	3	Dima	False	9.0
c	2	Katherine	True	16.5
d	3	James	False	NaN
e	2	Emily	False	9.0
f	3	Michael	True	20.0
g	1	Matthew	True	14.5
h	1	Laura	False	NaN
i	2	Kevin	False	8.0
j	1	Jonas	True	19.0

Write a Pandas program to change the name ‘James’ to ‘Suresh’ in name column of the DataFrame.

df18=df2.copy()
df18.name=df18.name.replace('James','Suresh')

df18

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	Suresh	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to delete the ‘attempts’ column from the DataFrame.

df19=df2.copy()

df19.drop('attempts',axis=1)

	name	qualify	score
a	Anastasia	yes	12.5
b	Dima	no	9.0
c	Katherine	yes	16.5
d	James	no	NaN
e	Emily	no	9.0
f	Michael	yes	20.0
g	Matthew	yes	14.5
h	Laura	no	NaN
i	Kevin	no	8.0
j	Jonas	yes	19.0

Write a Pandas program to insert a new column in existing DataFrame.

df20=df2.copy()

color = ['Red','Blue','Orange','Red','White','White','Blue','Green','Green','Red']

df20['color']=color

df20

	attempts	name	qualify	score	color
a	1	Anastasia	yes	12.5	Red
b	3	Dima	no	9.0	Blue
c	2	Katherine	yes	16.5	Orange
d	3	James	no	NaN	Red
e	2	Emily	no	9.0	White
f	3	Michael	yes	20.0	White
g	1	Matthew	yes	14.5	Blue
h	1	Laura	no	NaN	Green
i	2	Kevin	no	8.0	Green
j	1	Jonas	yes	19.0	Red

Write a Pandas program to iterate over rows in a DataFrame.

exam_data = [{'name':'Anastasia', 'score':12.5}, {'name':'Dima','score':9}, {'name':'Katherine','score':16.5}]

df21=pd.DataFrame(exam_data)

for index,row in df21.iterrows():
    print(row['name'],row['score'])

Anastasia 12.5
Dima 9.0
Katherine 16.5

Write a Pandas program to get list from DataFrame column headers.

df2.columns.tolist()

['attempts', 'name', 'qualify', 'score']

Write a Pandas program to rename columns of a given DataFrame.

df23= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

df23

	col1	col2	col3
0	1	4	7
1	2	5	8
2	3	6	9

df23.columns=['Column1','Column2','Column3']

df23

	Column1	Column2	Column3
0	1	4	7
1	2	5	8
2	3	6	9

df23=df23.rename(columns={'Column1':'Columns1'})

df23

	Columns1	Column2	Column3
0	1	4	7
1	2	5	8
2	3	6	9

Write a Pandas program to select rows from a given DataFrame based on values in some columns.

df24= pd.DataFrame({'col1': [1, 4,3,4,5], 'col2': [4, 5, 6,7,8], 'col3': [7, 8, 9,0,1]})

df24[df24.col1==4]

	col1	col2	col3
1	4	5	8
3	4	7	0

Write a Pandas program to change the order of a DataFrame columns.

df25=df24[['col3','col2','col1']]

df25

	col3	col2	col1
0	7	4	1
1	8	5	4
2	9	6	3
3	0	7	4
4	1	8	5

Write a Pandas program to add one row in an existing DataFrame.

df26=df24.copy()

df26.loc[len(df26)]=[10,11,12]

df26

	col1	col2	col3
0	1	4	7
1	4	5	8
2	3	6	9
3	4	7	0
4	5	8	1
5	10	11	12

Write a Pandas program to write a DataFrame to CSV file using tab separator.

df24.to_csv('27.csv',sep='\t',index=False)

df27=pd.read_csv('27.csv')

df27

	col1\tcol2\tcol3
0	1\t4\t7
1	4\t5\t8
2	3\t6\t9
3	4\t7\t0
4	5\t8\t1

Write a Pandas program to count city wise number of people from a given of data set (city, name of the person).

df28 = pd.DataFrame({'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'city': ['California', 'Los Angeles', 'California', 'California', 'California', 'Los Angeles', 'Los Angeles', 'Georgia', 'Georgia', 'Los Angeles']})

df28

	name	city
0	Anastasia	California
1	Dima	Los Angeles
2	Katherine	California
3	James	California
4	Emily	California
5	Michael	Los Angeles
6	Matthew	Los Angeles
7	Laura	Georgia
8	Kevin	Georgia
9	Jonas	Los Angeles

df28.groupby('city').size().reset_index(name='Number of people')

	city	Number of people
0	California	4
1	Georgia	2
2	Los Angeles	4

Write a Pandas program to delete DataFrame row(s) based on given column value.

df29=df24.copy()

df29=df29[df29.col2!=5]

df29

	col1	col2	col3
0	1	4	7
2	3	6	9
3	4	7	0
4	5	8	1

Write a Pandas program to widen output display to see more columns.

pd.set_option('display.max_rows',500)
pd.set_option('display.max_columns',500)
pd.set_option('display.width',1000)

print(df24)

   col1  col2  col3
   1     4     7
   4     5     8
   3     6     9
   4     7     0
   5     8     1

Write a Pandas program to select a row of series/dataframe by given integer index.

df24.loc[[2]]

	col1	col2	col3
2	3	6	9

Write a Pandas program to replace all the NaN values with Zero’s in a column of a dataframe.

df32=df2.copy()

df32.fillna(0)

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	0.0
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	0.0
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to convert index in a column of the given dataframe.

df33=df2.copy()

df33.reset_index()

	index	attempts	name	qualify	score
0	a	1	Anastasia	yes	12.5
1	b	3	Dima	no	9.0
2	c	2	Katherine	yes	16.5
3	d	3	James	no	NaN
4	e	2	Emily	no	9.0
5	f	3	Michael	yes	20.0
6	g	1	Matthew	yes	14.5
7	h	1	Laura	no	NaN
8	i	2	Kevin	no	8.0
9	j	1	Jonas	yes	19.0

print(df33.reset_index().to_string(index=False))

index  attempts       name qualify  score
    a         1  Anastasia     yes   12.5
    b         3       Dima      no    9.0
    c         2  Katherine     yes   16.5
    d         3      James      no    NaN
    e         2      Emily      no    9.0
    f         3    Michael     yes   20.0
    g         1    Matthew     yes   14.5
    h         1      Laura      no    NaN
    i         2      Kevin      no    8.0
    j         1      Jonas     yes   19.0

#Back to original be setting index once again
df33.reset_index().set_index('index')

	attempts	name	qualify	score
index
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to set a given value for particular cell in DataFrame using index value.

df2

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

df34=df2.copy()

df34.loc['i',:].score=10.2

C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py:5208: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self[name] = value

df34

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

df34.set_value('i','score',10.2)

C:\Users\a112471\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: set_value is deprecated and will be removed in a future release. Please use .at[] or .iat[] accessors instead
  """Entry point for launching an IPython kernel.

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	10.2
j	1	Jonas	yes	19.0

df34.at['i','score']=10.2

df34

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	10.2
j	1	Jonas	yes	19.0

df34.iat[8,3]

10.2

Write a Pandas program to count the NaN values in one or more columns in DataFrame.

df2.isna().sum().sum()

Write a Pandas program to drop a list of rows from a specified DataFrame.

df36=df24.copy()

df36.drop([2,4],axis=0)

	col1	col2	col3
0	1	4	7
1	4	5	8
3	4	7	0

Write a Pandas program to reset index in a given DataFrame.

df37=df2.copy()

df37.drop(['a','b']).reset_index()

	index	attempts	name	qualify	score
0	c	2	Katherine	yes	16.5
1	d	3	James	no	NaN
2	e	2	Emily	no	9.0
3	f	3	Michael	yes	20.0
4	g	1	Matthew	yes	14.5
5	h	1	Laura	no	NaN
6	i	2	Kevin	no	8.0
7	j	1	Jonas	yes	19.0

df37

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to divide a DataFrame in a given ratio.

df38=pd.DataFrame(np.random.randn(10,2))

df38

	0	1
0	1.248443	-1.241437
1	1.350421	-1.539320
2	-0.118075	1.586380
3	0.126995	1.419173
4	-0.062786	0.070672
5	-0.077202	0.442252
6	-0.547482	0.472672
7	-0.154437	-0.138206
8	0.634787	1.480123
9	-0.029771	0.834099

df38_70per=df38.sample(frac=.7,random_state=0)

df38_30per=df38.drop(df38_70per.index)

df38_70per

	0	1
2	-0.118075	1.586380
8	0.634787	1.480123
4	-0.062786	0.070672
9	-0.029771	0.834099
1	1.350421	-1.539320
6	-0.547482	0.472672
7	-0.154437	-0.138206

df38_30per

	0	1
0	1.248443	-1.241437
3	0.126995	1.419173
5	-0.077202	0.442252

Write a Pandas program to combining two series into a DataFrame.

s39_1 = pd.Series(['100', '200', 'python', '300.12', '400'])
s39_2 = pd.Series(['10', '20', 'php', '30.12', '40'])

pd.concat([s39_1,s39_2],axis=1)

	0	1
0	100	10
1	200	20
2	python	php
3	300.12	30.12
4	400	40

Write a Pandas program to shuffle a given DataFrame rows.

df40=df2.copy()

df40.sample(frac=1)

	attempts	name	qualify	score
g	1	Matthew	yes	14.5
f	3	Michael	yes	20.0
d	3	James	no	NaN
a	1	Anastasia	yes	12.5
e	2	Emily	no	9.0
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
c	2	Katherine	yes	16.5
j	1	Jonas	yes	19.0
b	3	Dima	no	9.0

df40

	attempts	name	qualify	score
a	1	Anastasia	yes	12.5
b	3	Dima	no	9.0
c	2	Katherine	yes	16.5
d	3	James	no	NaN
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
i	2	Kevin	no	8.0
j	1	Jonas	yes	19.0

Write a Pandas program to convert DataFrame column type from string to datetime.

s41 = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'])

s41

  3/11/2000
  3/12/2000
  3/13/2000
dtype: object

pd.to_datetime(s41)

 2000-03-11
 2000-03-12
 2000-03-13
dtype: datetime64[ns]

Write a Pandas program to rename a specific column name in a given DataFrame.

df42= pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

df42.rename(columns={'col2':'column2'})

	col1	column2	col3
0	1	4	7
1	2	5	8
2	3	6	9

Write a Pandas program to get a list of a specified column of a DataFrame.

df42.col2.tolist()

[4, 5, 6]

Write a Pandas program to create a DataFrame from a Numpy array and specify the index column and column headers.

pd.DataFrame(np.random.randn(30).reshape(10,3),index=np.linspace(0,9,10,dtype='int'),columns=['column1','columns2','columns3'])

	column1	columns2	columns3
0	-0.345403	2.167887	0.495698
1	1.146200	-1.078212	0.231268
2	-0.619157	1.287958	0.216466
3	-0.859700	-0.735332	0.437529
4	0.395431	1.002545	-0.999019
5	-0.474449	-0.106919	-2.089148
6	0.375528	0.173526	-1.103849
7	-1.468913	1.206882	-0.983365
8	-0.751145	-0.289252	-0.346848
9	0.923659	1.038505	-1.394882

Write a Pandas program to find the row for where the value of a given column is maximum.

df45 = pd.DataFrame({'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})

df45

	col1	col2	col3
0	1	4	7
1	2	5	8
2	3	6	12
3	4	9	1
4	7	5	11

#Solution 1 (not efficient)
df45[df45.col1==df45.col1.max()].index[0]

df45['col1'].idxmax()

df45['col2'].idxmax()

df45['col3'].idxmax()

Write a Pandas program to check whether a given column is present in a DataFrame or not.

col='Col4'
if col in df45.columns.tolist():
    print(col,'present in DataFrame')
else:
    print(col,'not present in DataFrame')

Col4 not present in DataFrame

Write a Pandas program to get the specified row value of a given DataFrame.

df45.iloc[0]

col1    1
col2    4
col3    7
Name: 0, dtype: int64

Write a Pandas program to get the datatypes of columns of a DataFrame.

df2.dtypes

attempts      int64
name         object
qualify      object
score       float64
dtype: object

Write a Pandas program to append data to an empty DataFrame.

df49=pd.DataFrame()

d49=pd.DataFrame({'col1':range(3),'col2':range(3)})

df49.append(d49)

	col1	col2
0	0	0
1	1	1
2	2	2

Write a Pandas program to sort a given DataFrame by two or more columns.

df50=df2.copy()

df50.sort_values(['attempts','name'],ascending=[True,False])

	attempts	name	qualify	score
g	1	Matthew	yes	14.5
h	1	Laura	no	NaN
j	1	Jonas	yes	19.0
a	1	Anastasia	yes	12.5
i	2	Kevin	no	8.0
c	2	Katherine	yes	16.5
e	2	Emily	no	9.0
f	3	Michael	yes	20.0
d	3	James	no	NaN
b	3	Dima	no	9.0

Write a Pandas program to convert the datatype of a given column (floats to ints).

df51=df2.copy()

df51.dtypes

attempts      int64
name         object
qualify      object
score       float64
dtype: object

df51.fillna(0).astype({'score':int}).dtypes

attempts     int64
name        object
qualify     object
score        int32
dtype: object

Write a Pandas program to remove infinite values from a given DataFrame.

df52 = pd.DataFrame([1000, 2000, 3000, -4000, np.inf, -np.inf])

df52

	0
0	1000.0
1	2000.0
2	3000.0
3	-4000.0
4	inf
5	-inf

df52.replace([np.inf,-np.inf],np.NaN)

	0
0	1000.0
1	2000.0
2	3000.0
3	-4000.0
4	NaN
5	NaN

Write a Pandas program to insert a given column at a specific column index in a DataFrame.

df53 = pd.DataFrame({'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]})

df53

	col2	col3
0	4	7
1	5	8
2	6	12
3	9	1
4	5	11

col1= [1, 2, 3, 4, 7]

idx=0

df53.insert(loc=idx,column='col1',value=col1)

df53

	col1	col2	col3
0	1	4	7
1	2	5	8
2	3	6	12
3	4	9	1
4	7	5	11

Write a Pandas program to convert a given list of lists into a Dataframe.

d54=[[2, 4], [1, 3]]

pd.DataFrame(d54,columns=['col1','col2'])

	col1	col2
0	2	4
1	1	3

Write a Pandas program to group by the first column and get second column as lists in rows.

df55=pd.DataFrame({'col1':['C1','C1','C2','C2','C2','C3','C2'],'col2':[1,2,3,3,4,6,5]})

g55=df55.groupby('col1')['col2']

#Wrong Solution (or not gettring desired result)
for ele in g55:
    print(ele[1])

  1
  2
Name: col2, dtype: int64
  3
  3
  4
  5
Name: col2, dtype: int64
  6
Name: col2, dtype: int64

#Correct Solution
df55.groupby('col1')['col2'].apply(list)

col1
C1          [1, 2]
C2    [3, 3, 4, 5]
C3             [6]
Name: col2, dtype: object

Write a Pandas program to get column index from column name of a given DataFrame.

df45.columns.tolist().index('col2')

df45.columns.get_loc('col2')

Write a Pandas program to count number of columns of a DataFrame.

df45.shape[1]

len(df45.columns)

Write a Pandas program to select all columns, except one given column in a DataFrame.

#Solution 1
df45.drop('col3',axis=1)

	col1	col2
0	1	4
1	2	5
2	3	6
3	4	9
4	7	5

#Solution 2
df45.loc[:,df45.columns!='col3']

	col1	col2
0	1	4
1	2	5
2	3	6
3	4	9
4	7	5

Write a Pandas program to get first n records of a DataFrame.

df45.head(3)

	col1	col2	col3
0	1	4	7
1	2	5	8
2	3	6	12

Write a Pandas program to get last n records of a DataFrame.

df45.iloc[-3:,:]

	col1	col2	col3
2	3	6	12
3	4	9	1
4	7	5	11

df45.tail(3)

	col1	col2	col3
2	3	6	12
3	4	9	1
4	7	5	11

Write a Pandas program to get topmost n records within each group of a DataFrame.

df61 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})

df61

	col1	col2	col3
0	1	4	7
1	2	5	5
2	3	6	8
3	4	9	12
4	7	5	1
5	11	0	11

df61.nlargest(3,'col1')

	col1	col2	col3
5	11	0	11
4	7	5	1
3	4	9	12

df61.nlargest(3,'col2')

	col1	col2	col3
3	4	9	12
2	3	6	8
1	2	5	5

df61.nlargest(3,'col3')

	col1	col2	col3
3	4	9	12
5	11	0	11
2	3	6	8

Write a Pandas program to remove first n rows of a given DataFrame.

df62=df61.copy()

df62

	col1	col2	col3
0	1	4	7
1	2	5	5
2	3	6	8
3	4	9	12
4	7	5	1
5	11	0	11

df62=df62.tail(-3)

df62

	col1	col2	col3
3	4	9	12
4	7	5	1
5	11	0	11

Write a Pandas program to remove last n rows of a given DataFrame.

df63=df61.copy()

df63=df63.iloc[:-3,:]

df63

	col1	col2	col3
0	1	4	7
1	2	5	5
2	3	6	8

Write a Pandas program to add a prefix or suffix to all columns of a given DataFrame.

df64 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});

df64_1=df64.copy()

df64_1.columns='A_'+df64.columns

df64_1

	A_W	A_X	A_Y	A_Z
0	68	78	84	86
1	75	85	94	97
2	86	96	89	96
3	80	80	83	72
4	66	86	86	83

df64_2=df64.copy()

df64_2.columns=df64.columns+'_1'

df64_2

	W_1	X_1	Y_1	Z_1
0	68	78	84	86
1	75	85	94	97
2	86	96	89	96
3	80	80	83	72
4	66	86	86	83

#sol2
df64.add_prefix('A_')

	A_W	A_X	A_Y	A_Z
0	68	78	84	86
1	75	85	94	97
2	86	96	89	96
3	80	80	83	72
4	66	86	86	83

df64.add_suffix('_1')

	W_1	X_1	Y_1	Z_1
0	68	78	84	86
1	75	85	94	97
2	86	96	89	96
3	80	80	83	72
4	66	86	86	83

Write a Pandas program to reverse order (rows, columns) of a given DataFrame.

df64[df64.columns[::-1]]
#df64.iloc[:,::-1] same answer

	Z	Y	X	W
0	86	84	78	68
1	97	94	85	75
2	96	89	96	86
3	72	83	80	80
4	83	86	86	66

df64.iloc[::-1]

	W	X	Y	Z
4	66	86	86	83
3	80	80	83	72
2	86	96	89	96
1	75	85	94	97
0	68	78	84	86

print("Reverse row order and reset index:")
df64.iloc[::-1].reset_index(drop=True)

Reverse row order and reset index:

	W	X	Y	Z
0	66	86	86	83
1	80	80	83	72
2	86	96	89	96
3	75	85	94	97
4	68	78	84	86

Write a Pandas program to select columns by data type of a given DataFrame.

df66 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'date_of_birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': [18.5, 21.2, 22.5, 22, 23]
})

df66

	name	date_of_birth	age
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
2	Ryan Parkes	25/09/1998	22.5
3	Eesha Hinton	11/05/2002	22.0
4	Syed Wharton	15/09/1997	23.0

df66.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
name             5 non-null object
date_of_birth    5 non-null object
age              5 non-null float64
dtypes: float64(1), object(2)
memory usage: 248.0+ bytes

df66.select_dtypes(include='number')

	age
0	18.5
1	21.2
2	22.5
3	22.0
4	23.0

df66.select_dtypes(include='object')

	name	date_of_birth
0	Alberto Franco	17/05/2002
1	Gino Mcneill	16/02/1999
2	Ryan Parkes	25/09/1998
3	Eesha Hinton	11/05/2002
4	Syed Wharton	15/09/1997

Write a Pandas program to split a given DataFrame into two random subsets.

df67=df66.copy()

df67_1=df67.sample(frac=0.7,random_state=0)

df67_2=df67.drop(df67_1.index)

df67_1

	name	date_of_birth	age
2	Ryan Parkes	25/09/1998	22.5
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
3	Eesha Hinton	11/05/2002	22.0

df67_2

	name	date_of_birth	age
4	Syed Wharton	15/09/1997	23.0

Write a Pandas program to rename all columns with the same pattern of a given DataFrame.

df68 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_of_Birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})

df68

	Name	Date_of_Birth	Age
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
2	Ryan Parkes	25/09/1998	22.5
3	Eesha Hinton	11/05/2002	22.0
4	Syed Wharton	15/09/1997	23.0

df68.columns=df68.columns.str.lower().str.strip()

df68

	name	date_of_birth	age
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
2	Ryan Parkes	25/09/1998	22.5
3	Eesha Hinton	11/05/2002	22.0
4	Syed Wharton	15/09/1997	23.0

Write a Pandas program to merge datasets and check uniqueness.

df69 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})

df69_1=df69.copy()

df69_1=df69_1.drop([0,1])

df69_1

	Name	Date_Of_Birth	Age
2	Ryan Parkes	25/09/1998	22.5
3	Eesha Hinton	11/05/2002	22.0
4	Syed Wharton	15/09/1997	23.0

df69_2=df69.copy()

df69_2=df69_2.drop([2])

df69_2

	Name	Date_Of_Birth	Age
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
3	Eesha Hinton	11/05/2002	22.0
4	Syed Wharton	15/09/1997	23.0

#one_to_one
pd.merge(df69_1,df69_2,validate='one_to_one')

	Name	Date_Of_Birth	Age
0	Eesha Hinton	11/05/2002	22.0
1	Syed Wharton	15/09/1997	23.0

#one_to_many
pd.merge(df69_1,df69_2,validate='one_to_many')

	Name	Date_Of_Birth	Age
0	Eesha Hinton	11/05/2002	22.0
1	Syed Wharton	15/09/1997	23.0

#many_to_one
pd.merge(df69_1,df69_2,validate='many_to_one')

	Name	Date_Of_Birth	Age
0	Eesha Hinton	11/05/2002	22.0
1	Syed Wharton	15/09/1997	23.0

#many_to_many
pd.merge(df69_1,df69_2,validate='many_to_many')

	Name	Date_Of_Birth	Age
0	Eesha Hinton	11/05/2002	22.0
1	Syed Wharton	15/09/1997	23.0

Write a Pandas program to convert continuous values of a column in a given DataFrame to categorical.

df70 = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton', 'Kierra Gentry'],
      'age': [18, 22, 85, 50, 80, 5]
})

df70

	name	age
0	Alberto Franco	18
1	Gino Mcneill	22
2	Ryan Parkes	85
3	Eesha Hinton	50
4	Syed Wharton	80
5	Kierra Gentry	5

df70['age_group']=pd.cut(df70.age,bins=[0,18,65,95],labels=['kids','adult','elderly'])

df70

	name	age	age_group
0	Alberto Franco	18	kids
1	Gino Mcneill	22	adult
2	Ryan Parkes	85	elderly
3	Eesha Hinton	50	adult
4	Syed Wharton	80	elderly
5	Kierra Gentry	5	kids

Write a Pandas program to display memory usage of a given DataFrame and every column of the DataFrame.

df70.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 334.0+ bytes

df70.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes

df70.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
name         6 non-null object
age          6 non-null int64
age_group    6 non-null category
dtypes: category(1), int64(1), object(1)
memory usage: 903.0 bytes

df70.memory_usage(deep=True)

Index        128
name         416
age           48
age_group    311
dtype: int64

df70.memory_usage(deep=False)

Index        128
name          48
age           48
age_group    110
dtype: int64

Write a Pandas program to combine many given series to create a DataFrame.

s72_1=pd.Series('php python jawa c# c++'.split())

s72_2=pd.Series([1,2,3,4,5])

pd.concat([s72_1,s72_2],axis=1)

	0	1
0	php	1
1	python	2
2	jawa	3
3	c#	4
4	c++	5

pd.DataFrame({'col1':s72_1,'col2':s72_2})

	col1	col2
0	php	1
1	python	2
2	jawa	3
3	c#	4
4	c++	5

Write a Pandas program to create DataFrames that contains random values, contains missing values, contains datetime values and contains mixed values.

pd.DataFrame(np.random.randn(40).reshape(10,4),columns='A B C D'.split())

	A	B	C	D
0	0.976226	-1.231195	-1.927390	1.687527
1	0.265455	-0.872881	-1.234580	0.025491
2	2.309553	-0.368772	1.281644	0.903505
3	-0.490682	-0.934150	0.395868	-0.614944
4	1.270408	0.105810	-0.934098	-0.193174
5	-0.412797	0.123316	0.656035	-0.801055
6	-0.627806	-0.182581	0.200891	-0.471121
7	-0.705074	0.416434	0.424676	0.495411
8	-0.157094	-1.339826	1.379757	1.843449
9	1.265529	-0.211037	0.542715	1.862077

pd.util.testing.makeDataFrame()

	A	B	C	D
mIrUvYIpau	-0.132329	0.369235	-2.518745	0.463750
IIuIlUG3TZ	0.224135	1.173692	-1.402806	-0.634084
lhz7LziqT6	-1.747848	1.112314	-1.173867	-1.275242
bDRGRAVZ5L	-1.125598	1.167309	0.119561	-0.315563
tltm1jawR2	-0.170738	-0.020117	0.807885	-1.399174
gyHU2hW6dO	0.228735	-0.685193	-0.241533	-1.230713
GH6l1o4kp8	-0.683374	-1.373173	1.563031	-1.344920
Ee15AMPrrZ	-0.229806	-1.190730	0.691445	-0.614836
CVtTzdhYOH	-1.806060	0.126193	1.921333	0.655736
u7yGpt7ojr	0.418048	-0.560950	-2.247065	2.067856
JQRGGsJRHG	-1.116554	0.614687	-0.367641	0.160849
6NTUAVqlrk	0.622822	0.119354	-1.052081	-0.347517
8XKVbhaoCO	1.380594	-0.907565	-1.298444	0.627115
HSlwYeeFDf	0.566009	-0.185294	0.437134	-1.943550
441fk4pWhD	0.214447	-1.183068	1.761380	-0.652720
VdiWlWttWB	0.743793	1.722755	0.757516	0.615187
ls77KaEvOq	-1.344870	0.364236	0.093453	-0.609153
5K0bPzfMFi	-1.065251	0.841662	1.479194	-0.731203
AXsuHNuwHo	-0.827250	1.965530	0.773863	0.665516
T0GlToPXZi	-0.320390	-0.728882	2.018289	-0.578019
22vFAApOA5	-0.494500	0.550730	-1.067188	0.044000
gPiu7GA3im	-0.521648	-0.856955	0.709654	-0.932031
OZnBwP9gm9	-0.181220	-1.985956	0.794047	-0.335244
9U23Nv30l1	0.126521	-0.269881	-0.475025	0.873421
NLLYWOAAPo	-1.399080	-0.614755	-0.431522	-1.314192
0snfJwr8Zi	1.104287	-0.747049	-0.494870	-0.280476
hyIRHRTG6c	0.447759	0.375540	-1.348494	-1.576017
nO4Ybp7ia5	-0.665945	1.525348	0.291121	0.763574
OU81k9dvGD	0.897210	0.587101	1.375282	-0.842839
ktS4vr3Pqy	-0.270603	-0.081540	-1.243459	0.965349

pd.util.testing.makeMissingDataframe()

	A	B	C	D
K1jtgA2gtF	-1.142478	1.478004	0.777470	0.517304
Fz0OYcxMQn	-0.096315	0.070175	0.176539	-0.511371
7gLTJCMX3w	-0.236882	0.453078	-0.588427	1.412213
Id0qucUCeB	NaN	-1.088763	-0.637960	1.000012
MlJfZSTHJj	-0.732373	1.112463	0.441055	0.148622
p74iwf5CWk	0.880165	NaN	0.319050	-0.100719
k4wEIqvtKd	-1.076164	-1.052221	NaN	0.817084
6gdeN8A4gg	1.160957	0.109959	0.587556	1.699185
c0GHnKP4vD	-1.127555	0.933663	0.527032	-0.450704
0yltZq25JS	-0.177030	0.225647	0.413793	1.491172
BAGqtJef0S	0.356970	0.621645	-1.182119	-0.008877
iOriKpo3am	0.734046	-0.004511	0.921544	0.719030
Ogd3TaGzC0	-0.076480	-1.460346	0.026264	-0.775999
XZp591LpQ7	-0.998655	NaN	1.518520	0.453103
nvd2jdiBX5	1.592946	-1.306997	-0.168547	NaN
ICJxEvHfx4	-0.577977	0.387412	-0.594324	0.294138
mLyqvY1KVk	0.901325	-1.560872	0.160145	-1.170798
23TmyM4Vjt	0.291432	-0.810652	-0.184815	0.211070
J2JBz9nc2N	1.818592	NaN	NaN	-0.257770
77utwjDcEg	NaN	0.183574	0.111528	1.381484
yAg15dn3KU	-0.259229	-0.201913	-1.912119	-0.271751
yRbk019rAz	-0.913102	-0.120397	-0.721780	0.725017
U5oGxD3WnW	-0.565617	0.579980	NaN	0.031062
sNGtSOERD7	NaN	0.074556	1.525357	-1.744255
ubpFxokUim	1.045186	1.165908	-1.138931	-0.082937
b6ZCgA1dZD	1.620180	0.962883	0.966021	1.709217
3lYiadTEpV	-0.147411	0.950925	-0.122691	-0.973400
qvn2AUqCgF	1.207141	0.611017	-0.733806	0.072065
dqO35ESFKx	2.076349	0.519654	NaN	0.217811
am4HPhiP5I	-1.403172	NaN	0.797241	-0.641350

pd.util.testing.makeTimeDataFrame()

	A	B	C	D
2000-01-03	1.003781	-0.356442	-0.690759	-2.543469
2000-01-04	0.307821	-0.425427	-0.134576	0.824121
2000-01-05	-0.976996	-1.113531	-0.522290	-0.567436
2000-01-06	0.438595	0.129363	-0.433556	1.080008
2000-01-07	-0.001516	-0.291726	-1.008141	-0.605705
2000-01-10	0.015357	-1.016492	-0.162590	-2.282291
2000-01-11	0.523147	0.587595	0.451137	-0.031271
2000-01-12	0.917431	1.109996	-0.561045	0.305243
2000-01-13	1.150708	-0.069645	-0.561491	-0.165408
2000-01-14	1.122621	-1.035067	1.201935	-0.690251
2000-01-17	0.813662	0.924708	0.545710	-0.002078
2000-01-18	1.379715	-0.229399	-0.726494	0.493481
2000-01-19	0.622288	-0.757303	0.689828	0.229719
2000-01-20	-0.680374	1.011196	-0.551390	-1.713440
2000-01-21	-1.524046	-1.010875	-0.997332	0.666093
2000-01-24	-0.941409	1.025098	-0.905963	1.082745
2000-01-25	0.042881	1.382499	1.049391	1.074299
2000-01-26	0.967413	-0.923113	2.007157	0.628600
2000-01-27	-0.051459	1.172377	0.464885	1.129185
2000-01-28	0.533879	-1.320932	0.503104	-0.695140
2000-01-31	-1.133007	0.437374	0.505606	-1.571305
2000-02-01	0.979238	-0.721377	0.234655	-0.061103
2000-02-02	0.231642	-0.627750	-0.133623	0.748030
2000-02-03	-1.431866	1.060292	-0.040169	0.229246
2000-02-04	-0.350627	-1.165422	0.436717	0.446511
2000-02-07	-1.886733	0.926730	0.604955	-1.034671
2000-02-08	1.205402	1.982943	-0.653149	-0.675256
2000-02-09	0.212245	-0.374800	-0.472396	1.577047
2000-02-10	-0.732806	-1.332025	1.173203	0.602202
2000-02-11	-0.681866	-0.170969	-1.366112	0.779899

pd.util.testing.makeMixedDataFrame()

	A	B	C	D
0	0.0	0.0	foo1	2009-01-01
1	1.0	1.0	foo2	2009-01-02
2	2.0	0.0	foo3	2009-01-05
3	3.0	1.0	foo4	2009-01-06
4	4.0	0.0	foo5	2009-01-07

Write a Pandas program to fill missing values in time series data.

s74 = {"c1":[120, 130 ,140, 150, np.nan, 170], "c2":[7, np.nan, 10, np.nan, 5.5, 16.5]}

pd.util.testing.makeDateIndex()[:6]

DatetimeIndex(['2000-01-03', '2000-01-04', '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-10'], dtype='datetime64[ns]', freq='B')

df74=pd.DataFrame(s74,index=pd.util.testing.makeDateIndex()[:6])

df74

	c1	c2
2000-01-03	120.0	7.0
2000-01-04	130.0	NaN
2000-01-05	140.0	10.0
2000-01-06	150.0	NaN
2000-01-07	NaN	5.5
2000-01-10	170.0	16.5

df74.interpolate()

	c1	c2
2000-01-03	120.0	7.00
2000-01-04	130.0	8.50
2000-01-05	140.0	10.00
2000-01-06	150.0	7.75
2000-01-07	160.0	5.50
2000-01-10	170.0	16.50

Write a Pandas program to use a local variable within a query.

df75 = pd.DataFrame({'W':[68,75,86,80,66],'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});

df75

	W	X	Y	Z
0	68	78	84	86
1	75	85	94	97
2	86	96	89	96
3	80	80	83	72
4	66	86	86	83

maxx = df75["W"].max()

df75.query("W < @maxx")

	W	X	Y	Z
0	68	78	84	86
1	75	85	94	97
3	80	80	83	72
4	66	86	86	83

#Solution 2 (without Query)
df75[df75.W<df75.W.max()]

	W	X	Y	Z
0	68	78	84	86
1	75	85	94	97
3	80	80	83	72
4	66	86	86	83

Write a Pandas program to clean object column with mixed data of a given DataFrame using regular expression.

df76 = pd.DataFrame({"agent": ["a001", "a002", "a003", "a003", "a004"], "purchase":[4500.00, 7500.00, "$3000.25", "$1250.35", "9000.00"]})

df76

	agent	purchase
0	a001	4500
1	a002	7500
2	a003	$3000.25
3	a003	$1250.35
4	a004	9000.00

df76.purchase.apply(type)

  <class 'float'>
  <class 'float'>
    <class 'str'>
    <class 'str'>
    <class 'str'>
Name: purchase, dtype: object

df76.purchase.replace("[$]","",regex=True).astype('float').apply(type)

  <class 'float'>
  <class 'float'>
  <class 'float'>
  <class 'float'>
  <class 'float'>
Name: purchase, dtype: object

Write a Pandas program to get the numeric representation of an array by identifying distinct values of a given column of a dataframe.

df77 = pd.DataFrame({
    'Name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill'],
    'Date_Of_Birth ': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'Age': [18.5, 21.2, 22.5, 22, 23]
})

df77

	Name	Date_Of_Birth	Age
0	Alberto Franco	17/05/2002	18.5
1	Gino Mcneill	16/02/1999	21.2
2	Ryan Parkes	25/09/1998	22.5
3	Eesha Hinton	11/05/2002	22.0
4	Gino Mcneill	15/09/1997	23.0

a77_1=df77.Name.unique()

d77={}
for ind,name in df77.iterrows():
    print(ind,name[0])
    if name[0] not in d77.keys():
        d77[name[0]]=ind

Alberto Franco
Gino Mcneill
Ryan Parkes
Eesha Hinton
Gino Mcneill

d77

{'Alberto Franco': 0, 'Gino Mcneill': 1, 'Ryan Parkes': 2, 'Eesha Hinton': 3}

df77.replace({'Name':d77})

	Name	Date_Of_Birth	Age
0	0	17/05/2002	18.5
1	1	16/02/1999	21.2
2	2	25/09/1998	22.5
3	3	11/05/2002	22.0
4	1	15/09/1997	23.0

#solution 1
df77['Name'].map(d77).tolist()

[0, 1, 2, 3, 1]

#Solution 2
df77.replace({'Name':d77}).Name.tolist()

[0, 1, 2, 3, 1]

#one step Solution
pd.factorize(df77.Name)

(array([0, 1, 2, 3, 1], dtype=int64),
 Index(['Alberto Franco', 'Gino Mcneill', 'Ryan Parkes', 'Eesha Hinton'], dtype='object'))

Write a Pandas program to replace the current value in a dataframe column based on last largest value. If the current value is less than last largest value replaces the value with 0.

df78=pd.DataFrame({'rnum':[23, 21, 27, 22, 34, 33, 34, 31, 25, 22, 34, 19, 31, 32, 19]})

df78

	rnum
0	23
1	21
2	27
3	22
4	34
5	33
6	34
7	31
8	25
9	22
10	34
11	19
12	31
13	32
14	19

a78_1=[-np.inf]
for ele in df78.rnum:
    if ele<np.amax(a78_1):
        a78_1.append(0)
    else:
        a78_1.append(ele)
    #print(ele)
print(a78_1[1:])

[23, 0, 27, 0, 34, 0, 34, 0, 0, 0, 34, 0, 0, 0, 0]

#Solution 2
df78.rnum.where(df78.rnum.eq(df78.rnum.cummax()),0)

   23
    0
   27
    0
   34
    0
   34
    0
    0
    0
  34
   0
   0
   0
   0
Name: rnum, dtype: int64

Write a Pandas program to create a DataFrame from the clipboard (data from an Excel spreadsheet or a Google Sheet).

df79=pd.read_clipboard()

df79

	Python	Pandas	DataFrame

Write a Pandas program to check for inequality of two given DataFrames.

df80_1 = pd.DataFrame({'W':[68,75,86,80,None],'X':[78,85,None,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});
df80_2 = pd.DataFrame({'W':[78,75,86,80,None],'X':[78,85,96,80,76], 'Y':[84,84,89,83,86],'Z':[86,97,96,72,83]});

df80_1!=df80_2
#~(df80_1==df80_2)
#df80_1.ne(df80_2)

	W	X	Y	Z
0	True	False	False	False
1	False	False	True	False
2	False	True	False	False
3	False	False	False	False
4	True	True	False	False

Write a Pandas program to get lowest n records within each group of a given DataFrame.

df81 = pd.DataFrame({'col1': [1, 2, 3, 4, 7, 11], 'col2': [4, 5, 6, 9, 5, 0], 'col3': [7, 5, 8, 12, 1,11]})

df81.nsmallest(3,'col1')

	col1	col2	col3
0	1	4	7
1	2	5	5
2	3	6	8

df81.nsmallest(3,'col2')

	col1	col2	col3
5	11	0	11
0	1	4	7
1	2	5	5

df81.nsmallest(3,'col3')

	col1	col2	col3
4	7	5	1
1	2	5	5
0	1	4	7

#df81.nsmallest(3,columns=['col2','col1']) # It is taking first passed column into account even if we pass multiple columns

Share on

Twitter Facebook Google+ LinkedIn

Rohit Kumar

Python Pandas - DataFrame

Exercise - Python Pandas DataFrame

Share on

You May Also Enjoy

Python Pandas - String and Regular Expression

Python Pandas - Joining and merging DataFrame

Feature Engineering 7 - Outliers and Its impact on ML usecases

Feature Engineering 6 - Handling Imbalanced Dataset