Efficient way of collecting sum of missing values per row in Pandas
While working on a project for the Udacity Nanodegree program I'm currently attending, I had to count the null values in each row and display the counts in a histogram. However, the Pandas dataset contained 891221 rows, and I had to wait quite a long time for the following code to work through them:

    df.apply(lambda row: sum_of_nulls_in_row(row), axis=1)

Although it was suggested in this post that using apply() is much faster than using iterrows(), it was still too slow to finish the project efficiently. After several searches, I found this discussion. In Icyblade's answer, he mentioned this:

"When using pandas, try to avoid performing operations in a loop, including apply, map, applymap etc. That's slow!"

Icyblade's suggestion was to use the following code:

    df.isnull().sum(axis=1)

I applied it to my code, and boom! It worked like a charm. The long wait was eliminated and the result was there in a blink ....
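To check that the two approaches give the same counts, here is a minimal sketch on a tiny made-up DataFrame. The body of sum_of_nulls_in_row is not shown in this post, so the version below is only an assumed stand-in, and the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up frame standing in for the real 891221-row dataset.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", "w"],
    "c": [np.nan, np.nan, 2.5, 4.0],
})


def sum_of_nulls_in_row(row):
    # Hypothetical stand-in for the helper used in the post (its body is not shown).
    return int(row.isnull().sum())


# Row-by-row version: calls a Python function once per row.
slow_counts = df.apply(lambda row: sum_of_nulls_in_row(row), axis=1)

# Vectorized version: build one boolean mask, then sum across the columns.
fast_counts = df.isnull().sum(axis=1)

# Tally how many rows have 0, 1, 2, ... missing values (the data behind a histogram).
tally = fast_counts.value_counts().sort_index()

print(fast_counts.tolist())
print(slow_counts.equals(fast_counts))
```

Both versions should produce the same Series here; the difference is that the vectorized one does its work on whole columns at once instead of calling Python code for every row, which is where the speed-up on a large dataset comes from. The tally (or fast_counts itself) can then be handed to whatever plotting tool draws the histogram.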