Showing posts with the label pandas

Efficient way of collecting sum of missing values per row in Pandas

While working on a project for the Udacity Nanodegree program I'm currently attending, I had to count the null values in each row and display the counts in a histogram. However, the Pandas dataset contained 891221 rows, and iterating through them with the following code took quite a long time:

    df.apply(lambda row: sum_of_nulls_in_row(row), axis=1)

Although this post suggested that apply() is much faster than iterrows(), it was still too slow to finish the project efficiently. After several searches, I found this discussion. In his answer, Icyblade wrote: "When using pandas, try to avoid performing operations in a loop, including apply, map, applymap etc. That's slow!" His suggestion was to use the following code instead:

    df.isnull().sum(axis=1)

I applied it to my code, and boom! It worked like a charm. The long wait was gone and the result was there in a blink ....
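To make the comparison concrete, here is a minimal sketch of the two approaches on a tiny made-up DataFrame. The post never shows the body of sum_of_nulls_in_row, so the version below is a hypothetical stand-in, and the column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Small made-up dataset; the real one had 891221 rows.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": ["x", None, "z"],
})


def sum_of_nulls_in_row(row):
    # Hypothetical helper: the original post does not show its body.
    return int(row.isnull().sum())


# Row-by-row version: calls a Python function once per row.
slow = df.apply(lambda row: sum_of_nulls_in_row(row), axis=1)

# Vectorized version: builds a boolean mask, then sums across columns.
fast = df.isnull().sum(axis=1)

# Both give the same per-row null counts.
print(fast.tolist())
```

Both versions produce the same per-row counts; the vectorized one avoids calling a Python function for every row, which is likely where the time went on the full dataset.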

Frequently used Pandas commands - Lambda function

After finishing the first assignment of the Udacity Data Analyst Nanodegree, I decided to summarize and record the most commonly used Pandas commands here. The first topic is the lambda function. In Pandas, a lambda function can be used via the apply() method, as below:

    df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
    df.apply(lambda x: x + 1)

The above code adds 1 to every value in the DataFrame named 'df'. The result looks like this:

       A  B
    0  2  4
    1  3  5

A lambda function can be applied to a single column using the code below:

    df['A'].apply(lambda x: x + 1)

But the result is a Series, so it shows only the index and values, without the column name as a header:

    0    2
    1    3

To include the column name, the following code can be used:

    df.apply({'A': lambda x: x + 1})

And the result will look like this:

       A
    0  2
    1  3

This lambda function can be us...
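The calls above can be tried together in one short, self-contained sketch. The variable names are my own, and the last line uses double-bracket column selection, which is an alternative I am assuming here as another way to keep the column name; it is not the dictionary form used in the post.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# apply() on a DataFrame passes each column to the lambda as a Series.
all_plus_one = df.apply(lambda x: x + 1)

# apply() on a single column passes each value; the result is a Series.
col_series = df["A"].apply(lambda x: x + 1)

# Alternative (not from the post): selecting with a list keeps a DataFrame,
# so the column name 'A' stays in the result.
col_frame = df[["A"]].apply(lambda x: x + 1)

print(all_plus_one)
print(col_series)
print(col_frame)
```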