There is a potential source of confusion when using loc for a Series or DataFrame with an integer index: it is important to remember that loc always refers to the index labels, whereas iloc takes a (zero-based) integer location index:
In [x]: df = pd.DataFrame(np.arange(12).reshape(4, 3) + 10,
                          index=[1, 2, 3, 4], columns=list('abc'))
In [x]: df
Out[x]:
    a   b   c
1  10  11  12
2  13  14  15
3  16  17  18
4  19  20  21
In [x]: df.loc[1] # the row with index *label* 1 (the first row)
Out[x]:
a    10
b    11
c    12
Name: 1, dtype: int64
In [x]: df.iloc[1] # the row with index *location* 1 (the row labeled 2)
Out[x]:
a    13
b    14
c    15
Name: 2, dtype: int64
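The distinction also shows up at the edges of the index. The following is a minimal sketch, rebuilding the same df as above, of what happens when the label 0 or the location 4 is requested; the variable names (first_by_position, loc_zero_found and so on) are illustrative only and do not come from the text.

```python
import numpy as np
import pandas as pd

# The same DataFrame as in the session above: labels 1..4, locations 0..3.
df = pd.DataFrame(np.arange(12).reshape(4, 3) + 10,
                  index=[1, 2, 3, 4], columns=list('abc'))

# Location 0 and label 1 both select the first row.
first_by_position = df.iloc[0]
first_by_label = df.loc[1]

# There is no row *labeled* 0, so loc raises KeyError.
try:
    df.loc[0]
    loc_zero_found = True
except KeyError:
    loc_zero_found = False

# There is no row at *location* 4 (valid locations are 0..3),
# so iloc raises IndexError.
try:
    df.iloc[4]
    iloc_four_found = True
except IndexError:
    iloc_four_found = False

print(first_by_position.equals(first_by_label), loc_zero_found, iloc_four_found)
```

Here 4 is a valid label but not a valid location, and 0 is a valid location but not a valid label, which is exactly the situation in which mixing up loc and iloc goes unnoticed until an exception (or the wrong row) appears.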
Note also that index labels do not have to be unique; where a label is repeated, loc returns every row that carries it:
In [x]: df.index = [1, 2, 2, 3] # change the index labels
In [x]: df
Out[x]:
    a   b   c
1  10  11  12
2  13  14  15
2  16  17  18
3  19  20  21
In [x]: df.loc[2] # a DataFrame: all rows labeled 2
Out[x]:
    a   b   c
2  13  14  15
2  16  17  18
In [x]: df.iloc[2] # a Series: there is only one row located at index 2
Out[x]:
a    16
b    17
c    18
Name: 2, dtype: int64
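Because the type returned by loc depends on whether the label is repeated, it can be worth checking the index before relying on it. The sketch below is one way to do this, assuming the same df with the repeated label 2; the variable names are illustrative and not part of the original session.

```python
import numpy as np
import pandas as pd

# The DataFrame from above, with the label 2 appearing twice.
df = pd.DataFrame(np.arange(12).reshape(4, 3) + 10,
                  index=[1, 2, 2, 3], columns=list('abc'))

# Index.is_unique reports whether any label is repeated.
has_unique_labels = df.index.is_unique

# A label that occurs once gives a Series; a repeated label gives a DataFrame.
unique_row = df.loc[1]
repeated_rows = df.loc[2]

# Passing a list of labels returns a DataFrame in both cases, which
# avoids having to handle two different return types.
always_frame = df.loc[[1]]

print(has_unique_labels)
print(type(unique_row).__name__, type(repeated_rows).__name__,
      type(always_frame).__name__)
```

Passing a list such as [1] rather than the bare label 1 is a convenient choice when the code that follows expects a DataFrame regardless of how many rows match.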