Let's take a look at a few different ways to filter and select rows in a pandas dataframe based on multiple conditions.
To start we're going to create a simple dataframe in python:
Let's look at six different ways to filter rows in a dataframe based on multiple conditions:
Get all rows having hourly wage greater than or equal to 100 and age < 60 and favorite football team name starts with ‘S’.
The loc function in pandas can be used to access groups of rows or columns by label.
Add each condition you want to be included in the filtered result and concatenate them with the & operator. You'll see our code sample will return a pd.dataframe of our filtered rows. Don't forget to include "import pandas as pd" at the top!
The Numpy python library interacts great with dataframes, especially when dealing with indexing.
Let's use the numpy.where function with a few conditions to produce the same result we wanted above.
Let's take the returned row index list that matches the combined conditions into loc
We can use the pandas dataframe function query() and boolean expressions to get our filtered rows back. The main benefit of the query function is it uses numexpr which improves efficiency, especially in larger dataframes.
This is a good method to go with if you want to remove columns as well, as you can exclude any dataframe columns you don't want in the last statement. Boolean indexing is also very efficient as it does not make a copy of the data.
We can use the pandas dataframe function eval inside a df[] tag to filter on these conditions. Note the .values() at the end.
This numpy array and functools implementation allows you to use disjunction to filter your dataframe. Here's an example:
Note: you still need "import pandas as pd"
Post pandas .22 update, there's multiple functions you can use as well to compare column values to conditions. Make sure your dtype is the same as what you want to compare to.