Many data science and machine learning use cases depend on how often a value occurs, so being able to count the occurrences of one or several specific values is important. We may also want to use those counts in other operations that build on simply counting a value in a column.
Always remember to start with import pandas as pd. We'll begin by defining a simple dataframe with several columns and rows.
Your dataframe should look something like this:
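The original dataframe is not reproduced here, so the following is a minimal stand-in. The column names First, Last and Score and every value in them are assumptions chosen to match the examples discussed in the rest of the article.

```python
import pandas as pd

# Hypothetical sample data: column names and values are assumptions,
# not the article's original dataframe.
df = pd.DataFrame({
    "First": ["john", "mary", "johnny", "alice", "bob"],
    "Last": ["Smith", "Jones", "Smith", "Brown", "Smith"],
    "Score": [77, 87, 77, 92, 65],
})

# Display the dataframe to confirm its columns and rows.
print(df)
```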
Now let's take a look at the different ways to count a specific value in a column.
Using value_counts(), a staple pandas method, we can index the result with the specific value we want instead of returning the counts of all unique values in a column. Remove the [] from the line to return the counts for all values. For example, let's get the count of 77 in the 'Score' column.
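A minimal sketch of that lookup, assuming a small hypothetical Score column. The .get() call with a default of 0 is an addition for values that never occur, since plain [] indexing raises a KeyError in that case.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Score": [77, 87, 77, 92, 65]})

# Counts for every unique value in the column.
all_counts = df["Score"].value_counts()

# Count for one specific value, looked up by label.
count_77 = df["Score"].value_counts()[77]

# Safe lookup: returns 0 instead of raising KeyError when the value is absent.
count_100 = df["Score"].value_counts().get(100, 0)

print(count_77, count_100)
```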
You can use this with a list of values, such as all the unique values in a column, to loop through and return the count for each.
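As a sketch of that loop, assuming the same hypothetical Score column, the counts can be computed once and then looked up for each unique value:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Score": [77, 87, 77, 92, 65]})

# Compute the counts once, then look up each unique value in turn.
value_counts = df["Score"].value_counts()
counts = {}
for value in df["Score"].unique():
    counts[int(value)] = int(value_counts[value])

print(counts)
```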
Here's a way to count the number of times a value occurs in the 'Last' column using .shape. Filter the dataframe with a condition and read the first element of .shape, which is the row count. This is a compact way to return the occurrences. The column can be referenced as an attribute (df.Last) when its name is a valid Python identifier, or with brackets and a string.
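A minimal sketch of the .shape approach, assuming a hypothetical Last column and the value "Smith":

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Last": ["Smith", "Jones", "Smith", "Brown", "Smith"]})

# Filter rows where Last equals "Smith"; shape[0] is the number of rows left.
smith_count = df[df.Last == "Smith"].shape[0]

# Bracket access gives the same result and also works for column names
# that are not valid Python identifiers.
smith_count_brackets = df[df["Last"] == "Smith"].shape[0]

print(smith_count)
```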
Here we count occurrences of a value in the Score column using len() and a mask. A mask is a boolean condition used to filter a dataframe down to the rows that meet it. We put the mask inside the dataframe's brackets to apply it, then use the Python function len() to get the length of the filtered dataframe. We want the rows where Score == 77, and len() gives back the number of those rows.
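The mask-and-len() steps can be sketched as follows, again on a hypothetical Score column:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Score": [77, 87, 77, 92, 65]})

# Boolean mask: True where the score equals 77.
mask = df["Score"] == 77

# Apply the mask, then count the rows that remain.
count_77 = len(df[mask])

print(count_77)
```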
We can use sum() on a comparison against a specified column to count the values that meet a condition, because each True counts as 1. In this case we use == to count only the rows equal to our specific data point.
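A minimal sketch of the sum() approach, assuming a hypothetical Score column. Unlike the len() version, it never builds a filtered copy of the dataframe.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Score": [77, 87, 77, 92, 65]})

# The comparison yields a boolean Series; summing it counts the True values.
count_77 = (df["Score"] == 77).sum()

print(count_77)
```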
We can also count values that satisfy a different comparison. With a numeric column and >, for example, the result is the number of values greater than the comparison value.
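For illustration, here is the same sum() pattern with other comparison operators. The columns and thresholds are assumptions.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Last": ["Smith", "Jones", "Smith", "Brown", "Smith"],
    "Score": [77, 87, 77, 92, 65],
})

# Number of scores strictly greater than 77.
above_77 = (df["Score"] > 77).sum()

# Number of rows whose last name is anything other than "Smith".
not_smith = (df["Last"] != "Smith").sum()

print(above_77, not_smith)
```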
Let's say you have a list of column names you want to search for a certain value, but you don't know which of those columns actually exist in the dataframe. We want to count the occurrences of our value in each column and return a list. Check out this code snippet:
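The original snippet is not reproduced here. What follows is a minimal sketch of one way to do it; the helper name count_in_columns, the Bonus column and the column list are all hypothetical.

```python
import pandas as pd

def count_in_columns(frame, columns, value):
    """Return one count per requested column; absent columns count as 0."""
    counts = []
    for col in columns:
        if col in frame.columns:
            counts.append(int((frame[col] == value).sum()))
        else:
            # Column not present in the dataframe: fall back to 0.
            counts.append(0)
    return counts

# Hypothetical sample data.
df = pd.DataFrame({
    "Score": [77, 87, 77, 92, 65],
    "Bonus": [77, 0, 5, 77, 77],
})

result = count_in_columns(df, ["Score", "Bonus", "Missing"], 77)
print(result)
```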
If a column is not found in the dataframe, its count defaults to 0.
The snippet returns the count of the value for each column in the list. As mentioned above, you can use the same code with other boolean comparisons to change which values are counted in each column.
Let's say we want to count values that contain a specific string or regex pattern in each element of our series. This is useful for finding values that don't exactly match a simple comparison. For instance, if we want any element that contains the name john, we can use the pandas string count function, str.count().
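A minimal sketch of str.count(), assuming a hypothetical Series of names. The pattern is treated as a regular expression and matching is case-sensitive.

```python
import pandas as pd

# Hypothetical sample data.
names = pd.Series(["john", "mary", "johnny", "john-john", "bob"])

# Number of non-overlapping matches of the pattern in each element.
per_element = names.str.count("john")

print(per_element)
```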
As we can see, it returns a Series of dtype int64 whose elements tell us the number of times the string "john" occurred in each element. We can take this result and count the number of values over 0 to get the total number of elements containing the pattern. Null values are not counted.
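To turn the per-element counts into a single total, compare them with 0 and sum the result. This sketch assumes a hypothetical Series that includes one null value, which produces a missing count and so never passes the > 0 test.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
names = pd.Series(["john", "mary", None, "john-john", "bob"])

# Per-element match counts; the null element yields a missing value.
per_element = names.str.count("john")

# Elements with at least one match. The missing value is not counted.
containing = (per_element > 0).sum()

print(containing)
```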
In pandas, you can count occurrences of multiple values in a dataframe column by using map() with a lambda inside to build a mask. Applying the mask removes any rows where the 'Score' column is not equal to 87 or 77, and the number of rows that remain is the combined count.
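A minimal sketch of the map() and lambda approach on a hypothetical Score column. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Score": [77, 87, 77, 92, 65]})

wanted = {87, 77}

# map() applies the lambda to every score and returns a boolean mask.
mask = df["Score"].map(lambda score: score in wanted)

# Keep only the rows whose score is 87 or 77.
filtered = df[mask]

# Combined count, and the count for each wanted value.
total = len(filtered)
per_value = filtered["Score"].value_counts()

print(total)
print(per_value)
```

The built-in Series.isin() method builds the same mask without a lambda, for example df["Score"].isin([87, 77]).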