Python Data Science Toolbox (Part 1)
Bringing it all together (1)
Recall the "Bringing it all together" exercise in the previous chapter, where you did a simple Twitter analysis by developing a function that counts how many tweets are in each language. The output of your function was a dictionary with the languages as keys and the number of tweets in each language as values.
In this exercise, you will generalize that Twitter language analysis by giving the function a default argument that takes a column name.
For your convenience, pandas has been imported as pd and the 'tweets.csv' file has been loaded into the DataFrame tweets_df. Parts of the code from your previous work are also provided.
# Define count_entries()
def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: cols_count
    cols_count = {}

    # Extract column from DataFrame: col
    col = df[col_name]

    # Iterate over the column in DataFrame
    for entry in col:

        # If entry is in cols_count, add 1
        if entry in cols_count:
            cols_count[entry] += 1

        # Else add the entry to cols_count, set the value to 1
        else:
            cols_count[entry] = 1

    # Return the cols_count dictionary
    return cols_count

# Call count_entries(): result1
result1 = count_entries(tweets_df, col_name='lang')

# Call count_entries(): result2
result2 = count_entries(tweets_df, col_name='source')

# Print result1 and result2
print(result1)
print(result2)
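To try the default argument without the 'tweets.csv' file, here is a minimal sketch that runs the same counting logic on a small stand-in table. The dictionary of lists named sample and all of its values are made up for illustration; it only mimics the df[col_name] lookup and is not the real tweets_df.

```python
def count_entries(df, col_name='lang'):
    """Return a dictionary mapping each value in df[col_name] to its count."""
    cols_count = {}
    for entry in df[col_name]:
        if entry in cols_count:
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
    return cols_count


# Hypothetical stand-in for tweets_df: a plain dict of lists (made-up values)
sample = {
    'lang': ['en', 'en', 'et', 'und', 'en'],
    'source': ['web', 'iphone', 'web', 'android', 'iphone'],
}

# No col_name given, so the default 'lang' is used
default_counts = count_entries(sample)

# Passing col_name explicitly overrides the default
source_counts = count_entries(sample, col_name='source')

print(default_counts)
print(source_counts)
```

Calling the function with only the table falls back to 'lang', while the keyword argument switches the column without any change to the function body.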
Bringing it all together (2)
You have just generalized the Twitter language analysis from the previous chapter to include a default argument for the column name. You will now generalize the function one step further by letting the user pass a flexible argument, which in this case means as many column names as the user would like.
Once again, for your convenience, pandas has been imported as pd and the 'tweets.csv' file has been loaded into the DataFrame tweets_df. Parts of the code from your previous work are also provided.
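Before reading the full function, it may help to see what a flexible argument looks like inside a function. The sketch below uses a hypothetical helper, show_args, that is not part of the exercise; it simply returns whatever Python collected into args.

```python
def show_args(first, *args):
    """Return the extra positional arguments exactly as Python collects them."""
    # Everything after the first positional argument is packed into a tuple
    return args


no_extra = show_args('tweets_df')
one_extra = show_args('tweets_df', 'lang')
two_extra = show_args('tweets_df', 'lang', 'source')

print(no_extra)
print(one_extra)
print(two_extra)
```

Because args is an ordinary tuple, a for loop over it visits each column name in the order the caller passed them, and a call with no extra arguments just produces an empty tuple.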
# Define count_entries()
def count_entries(df, *args):
    """Return a dictionary with counts of
    occurrences as value for each key."""

    # Initialize an empty dictionary: cols_count
    cols_count = {}

    # Iterate over column names in args
    for col_name in args:

        # Extract column from DataFrame: col
        col = df[col_name]

        # Iterate over the column in DataFrame
        for entry in col:

            # If entry is in cols_count, add 1
            if entry in cols_count:
                cols_count[entry] += 1

            # Else add the entry to cols_count, set the value to 1
            else:
                cols_count[entry] = 1

    # Return the cols_count dictionary
    return cols_count

# Call count_entries(): result1
result1 = count_entries(tweets_df, 'lang')

# Call count_entries(): result2
result2 = count_entries(tweets_df, 'lang', 'source')

# Print result1 and result2
print(result1)
print(result2)
{'en': 97, 'et': 1, 'und': 2}
{'en': 97, 'et': 1, 'und': 2, '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24, '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1, '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26, '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33, '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2, '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2, '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6, '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2, '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2, '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1, '<a href="http://www.myplume.com/" rel="nofollow">Plume\xa0for\xa0Android</a>': 1}
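The second dictionary printed above holds the counts for 'lang' and 'source' together, because every column passed through *args writes into the same cols_count. As an alternative sketch (not the exercise's solution), the same tally can be kept with collections.Counter from the standard library, which removes the if/else branch. The function name count_entries_counter and the sample data are hypothetical and made up for illustration.

```python
from collections import Counter


def count_entries_counter(df, *args):
    """Count values across all columns named in args, in one shared tally."""
    counts = Counter()
    for col_name in args:
        for entry in df[col_name]:
            # A missing key in a Counter starts at 0, so no membership test is needed
            counts[entry] += 1
    return dict(counts)


# Hypothetical stand-in for tweets_df: a plain dict of lists (made-up values)
sample = {
    'lang': ['en', 'en', 'et', 'und', 'en'],
    'source': ['web', 'iphone', 'web', 'android', 'iphone'],
}

combined = count_entries_counter(sample, 'lang', 'source')
print(combined)
```

As with the original function, a value that appears in more than one of the requested columns would have its counts added together under a single key.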