
Notes: Python Data Science Toolbox (Part 1)

程序员文章站 2022-03-10 08:23:00
...

Functions

Define a function, shout(), which simply prints out a string with three exclamation marks, '!!!', appended to the end.

def shout():
    """Print a string with three exclamation marks"""
    shout_word = 'congratulations' + '!!!' # Concatenate the strings: shout_word
    print(shout_word)
    
shout()

<script.py> output:
    congratulations!!!

In the previous exercise, you defined and called the function shout(), which printed out a string concatenated with '!!!'. You will now update shout() by adding a parameter so that it can accept and process any string argument passed to it.

def shout(word):
    """Print a string with three exclamation marks"""
    shout_word = word + '!!!'
    print(shout_word)

shout('congratulations')

<script.py> output:
    congratulations!!!

Try your hand at another modification to the shout() function so that it now returns a single value instead of printing within the function. Recall that the return keyword lets you return values from functions.

In Python, a function without a return statement returns None by default. If a function contains several return statements, it returns as soon as the first one is executed, and no further code in the function runs.

def shout(word):
    """Return a string with three exclamation marks"""
    shout_word = word + '!!!'
    return shout_word

yell = shout('congratulations')
print(yell)

<script.py> output:
    congratulations!!!
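Both claims in the note on return values can be checked directly. The sketch below uses two hypothetical helper functions, `no_return` and `first_return_wins`, made up for illustration.

```python
def no_return(word):
    """Builds a string but never returns it, so the call evaluates to None."""
    shout_word = word + '!!!'

def first_return_wins(word):
    """Only the first return statement that executes has any effect."""
    return word + '!!!'
    return word + '???'  # unreachable: never executed

print(no_return('congratulations'))          # None
print(first_return_wins('congratulations'))  # congratulations!!!
```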

You will now modify shout() to accept two arguments.

def shout(word1, word2):
    """Concatenate strings with three exclamation marks"""
    shout1 = word1 + '!!!'
    shout2 = word2 + '!!!'
    new_shout = shout1 + shout2
    return new_shout

yell = shout('congratulations', 'you')
print(yell)

<script.py> output:
    congratulations!!!you!!!

Here you will return multiple values from a function using a tuple. Instead of returning just one string, the new function, shout_all(), returns two strings, each with '!!!' concatenated to it.

def shout_all(word1, word2):
    shout1 = word1 + '!!!'
    shout2 = word2 + '!!!'
    shout_words = (shout1, shout2)
    return shout_words

yell1, yell2 = shout_all('congratulations', 'you')
print(yell1)
print(yell2)

<script.py> output:
    congratulations!!!
    you!!!

Bringing it all together (case study 1)

For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data, and you will iterate over the entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in each language. The file tweets.csv is available in your current directory.

import pandas as pd
df = pd.read_csv('tweets.csv') # Import the Twitter data as a DataFrame
langs_count = {} # Initialize an empty dictionary
col = df['lang'] # Extract the 'lang' column from the DataFrame
for entry in col: # Iterate over the 'lang' column
    if entry in langs_count.keys(): # If the language is already a key, add 1 to its count
        langs_count[entry] = langs_count[entry] + 1
    else: # Otherwise, add the language to langs_count with a count of 1
        langs_count[entry] = 1

print(langs_count) # Print the dictionary

<script.py> output:
    {'en': 97, 'et': 1, 'und': 2}
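The if/else counting loop is a very common pattern. The standard library's collections.Counter performs the same tally. The sketch below runs on a small hypothetical list of language codes that stands in for the 'lang' column. Counter accepts any iterable, so passing the pandas column directly should also work.

```python
from collections import Counter

# Hypothetical sample standing in for df['lang']
langs = ['en', 'en', 'und', 'et', 'en', 'und']

# Manual version, as in the exercise (dict.get supplies 0 for unseen keys)
langs_count = {}
for entry in langs:
    langs_count[entry] = langs_count.get(entry, 0) + 1

# Equivalent tally with Counter, which is a dict subclass
counter = Counter(langs)

print(langs_count)                   # {'en': 3, 'und': 2, 'et': 1}
print(dict(counter) == langs_count)  # True
```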

In this exercise, you will wrap the functionality you developed in the previous exercise in a function, return the resulting dictionary from within the function, and call the function with the appropriate arguments.

import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

def count_entries(df, col_name):
    """Return a dictionary with counts of 
    occurrences as value for each key."""
    langs_count = {}
    col = df[col_name] # Extract the column from the DataFrame passed in as df
    for entry in col:
        if entry in langs_count.keys():
            langs_count[entry] = langs_count[entry] + 1
        else:
            langs_count[entry] = 1
    return langs_count # Return the langs_count dictionary

result = count_entries(tweets_df, 'lang')
print(result)

<script.py> output:
    {'en': 97, 'et': 1, 'und': 2}

Global variables (global)

Let's work further on your mastery of scope. In this exercise, you will use the keyword global within a function to alter the value of a variable defined in the global scope.

team = "teen titans"
def change_team():
    """Change the value of the global variable team."""
    global team # Declare team as global so the assignment below rebinds it
    team = 'justice league'

print(team)
change_team() # Call change_team()
print(team)

<script.py> output:
    teen titans
    justice league
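The global keyword matters because of what happens without it. In that case, an assignment inside a function creates a new local name and leaves the global one untouched. The sketch below contrasts the two. `change_team_locally` is a hypothetical function added for comparison.

```python
team = "teen titans"

def change_team_locally():
    """Without `global`, assignment creates a new local variable."""
    team = 'justice league'  # local name; shadows the global inside this function
    return team

def change_team():
    """With `global`, assignment rebinds the module-level name."""
    global team
    team = 'justice league'

print(change_team_locally(), '|', team)  # justice league | teen titans
change_team()
print(team)                              # justice league
```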

Nested functions

You learned in the last video about nesting functions within functions. One reason to do this is to avoid writing out the same computation repeatedly within a function.

def three_shouts(word1, word2, word3):
    """Returns a tuple of strings
    concatenated with '!!!'."""
    def inner(word): # Define the inner function
        """Returns a string concatenated with '!!!'."""
        return word + '!!!'
    return (inner(word1), inner(word2), inner(word3))

print(three_shouts('a', 'b', 'c'))

<script.py> output:
    ('a!!!', 'b!!!', 'c!!!')

Great job, you've just nested a function within another function. Another useful reason for nesting functions is the idea of a closure: the nested (inner) function remembers the state of its enclosing scope when called. Thus, anything defined locally in the enclosing scope remains available to the inner function even after the outer function has finished executing.

Let's move forward! In this exercise, you will complete the definition of the inner function inner_echo() and then call echo() a couple of times, each with a different argument. Complete the exercise and see what the output will be!

def echo(n):
    """Return the inner_echo function."""
    def inner_echo(word1):
        """Concatenate n copies of word1."""
        echo_word = word1 * n
        return echo_word
    return inner_echo

twice = echo(2) # A function that repeats its argument 2 times
thrice = echo(3) # A function that repeats its argument 3 times
print(twice('hello'), thrice('hello'))

<script.py> output:
    hellohello hellohellohello
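The functions returned by echo() carry their own value of n with them. Python exposes this through each function's `__closure__` attribute, which is a tuple of cell objects. The sketch below inspects it as a debugging aid; ordinary code does not need to do this.

```python
def echo(n):
    """Return the inner_echo function."""
    def inner_echo(word1):
        """Concatenate n copies of word1."""
        return word1 * n
    return inner_echo

twice = echo(2)
thrice = echo(3)

# Each returned function keeps its own binding for the free variable n
print(twice.__code__.co_freevars)           # ('n',)
print(twice.__closure__[0].cell_contents)   # 2
print(thrice.__closure__[0].cell_contents)  # 3
```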

nonlocal

Let's once again work further on your mastery of scope! In this exercise, you will use the keyword nonlocal within a nested function to alter the value of a variable defined in the enclosing scope.

def echo_shout(word):
    """Change the value of a nonlocal variable"""
    echo_word = word * 2 # Concatenate word with itself
    print(echo_word)

    def shout(): # Define the inner function shout()
        """Alter a variable in the enclosing scope"""
        nonlocal echo_word # Refer to echo_word in the enclosing function (compare with global)
        echo_word = echo_word + '!!!' # Change echo_word to echo_word + '!!!'
    shout() # Call shout()
    print(echo_word)
    
echo_shout('hello')

<script.py> output:
    hellohello
    hellohello!!!
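A common use of nonlocal is a closure that keeps private state between calls. The sketch below is a hypothetical `make_counter` example, not part of the course code. Without the nonlocal line, `count += 1` would make count local to increment() and raise UnboundLocalError.

```python
def make_counter():
    """Return a function that increments and returns a private count."""
    count = 0
    def increment():
        nonlocal count  # rebind count in make_counter's scope, not a new local
        count += 1
        return count
    return increment

counter_a = make_counter()
counter_b = make_counter()
print(counter_a(), counter_a(), counter_a())  # 1 2 3
print(counter_b())                            # 1 (independent state)
```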

Multiple parameters and default arguments

In the previous chapter, you learned to define functions with more than one parameter and then to call those functions by passing the required number of arguments. In the last video, Hugo built on this idea by showing you how to define functions with default arguments. You will practice that skill in this exercise by writing a function that uses a default argument and then calling the function a couple of times.

def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    echo_word = word1 * echo
    shout_word = echo_word + '!!!'
    return shout_word
    
no_echo = shout_echo('Hey') # Call shout_echo() with the default echo=1
with_echo = shout_echo('Hey', echo=5) # Call shout_echo() with echo=5
print(no_echo)
print(with_echo)

<script.py> output:
    Hey!!!
    HeyHeyHeyHeyHey!!!
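One detail about default arguments is worth knowing: a default value is evaluated once, when the def statement runs, not on every call. With an immutable default like `echo=1` this is harmless. A mutable default such as a list, however, is shared across calls. The sketch uses two hypothetical functions to show the usual None-sentinel fix.

```python
def append_bad(item, items=[]):
    """The default list is created once and shared across calls."""
    items.append(item)
    return items

def append_good(item, items=None):
    """Use None as a sentinel and build a fresh list on each call."""
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad('a'), append_bad('b'))    # ['a', 'b'] ['a', 'b']
print(append_good('a'), append_good('b'))  # ['a'] ['b']
```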

You've now defined a function that uses a default argument, but don't stop there just yet! You will now try your hand at defining a function with more than one default argument and then calling this function in various ways.

def shout_echo(word1, echo=1, intense=False):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    echo_word = word1 * echo
    if intense: # If intense is True, uppercase echo_word
        echo_word_new = echo_word.upper() + '!!!'
    else:
        echo_word_new = echo_word + '!!!'
    return echo_word_new

with_big_echo = shout_echo('Hey', echo=5, intense=True)
big_no_echo = shout_echo('Hey', intense=True)
print(with_big_echo)
print(big_no_echo)

<script.py> output:
    HEYHEYHEYHEYHEY!!!
    HEY!!!

*args

Flexible arguments enable you to pass a variable number of arguments to a function. In this exercise, you will practice defining a function that accepts a variable number of string arguments. Within the function definition, args is a tuple.

def gibberish(*args):
    """Concatenate strings in *args together."""
    hodgepodge = '' # Initialize an empty string
    for word in args: # Concatenate each string in args
        hodgepodge += word
    return hodgepodge

one_word = gibberish('luke') # Call gibberish() with one string
many_words = gibberish("luke", "leia", "han", "obi", "darth") # Call gibberish() with five strings
print(one_word)
print(many_words)

<script.py> output:
    luke
    lukeleiahanobidarth
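Because args is a tuple of strings, `''.join(args)` gives the same result as the `+=` loop. The `*` operator also works in the other direction: at the call site, it unpacks an existing list into separate arguments. The sketch below shows both. `args_type` is a hypothetical helper that just reports the type of args.

```python
def gibberish(*args):
    """Concatenate strings in *args together."""
    # args is a tuple holding every positional argument
    return ''.join(args)

def args_type(*args):
    """Return the type of the collected positional arguments."""
    return type(args)

names = ["luke", "leia", "han"]
print(gibberish(*names))        # lukeleiahan (* unpacks the list into arguments)
print(repr(gibberish()))        # '' because args is the empty tuple ()
print(args_type('luke'))        # <class 'tuple'>
```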

**kwargs

Let's push further on what you've learned about flexible arguments: you've used *args, and you're now going to use **kwargs! What makes **kwargs different is that it allows you to pass a variable number of keyword arguments to a function. Recall from the previous video that, within the function definition, kwargs is a dictionary.

def report_status(**kwargs):
    """Print out the status of a movie character."""
    print("\nBEGIN: REPORT\n")
    for key, value in kwargs.items(): # Iterate over the key-value pairs of kwargs
        print(key + ": " + value)
    print("\nEND REPORT")

report_status(name='luke', affiliation='jedi', status='missing')
report_status(name='anakin', affiliation='sith lord', status='deceased')

<script.py> output:
    
    BEGIN: REPORT
    
    name: luke
    affiliation: jedi
    status: missing
    
    END REPORT
    
    BEGIN: REPORT
    
    name: anakin
    affiliation: sith lord
    status: deceased
    
    END REPORT
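The mirror image of `**kwargs` in a definition is `**` at the call site, which unpacks a dictionary into keyword arguments. The sketch below is a hypothetical variant of report_status(). It returns the report lines instead of printing them, and it uses f-strings, which, unlike `key + ": " + value`, also work for non-string values. The character data is made up.

```python
def report_status(**kwargs):
    """Return the status report as a list of lines."""
    # kwargs is a dict mapping keyword names to values, in call order
    return [f"{key}: {value}" for key, value in kwargs.items()]

character = {'name': 'leia', 'affiliation': 'rebel alliance', 'status': 'active'}
for line in report_status(**character):  # ** unpacks the dict into keyword arguments
    print(line)
print(report_status(name='han', age=32))  # ['name: han', 'age: 32']
```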

Bringing it all together (case study 2)

In this exercise, you will generalize the Twitter language analysis that you did in the previous chapter by adding a default argument that takes a column name.

import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    cols_count = {} # Initialize an empty dictionary
    col = df[col_name]
    for entry in col: # Iterate over the column in the DataFrame
        if entry in cols_count.keys():
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
    return cols_count # Return the cols_count dictionary

result1 = count_entries(tweets_df, col_name='lang')
result2 = count_entries(tweets_df, col_name='source')
print(result1)
print(result2)

<script.py> output:
    {'en': 97, 'et': 1, 'und': 2}
    {'<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24, '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1, '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26, '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33, '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2, '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2, '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6, '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2, '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2, '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1, '<a href="http://www.myplume.com/" rel="nofollow">Plume\xa0for\xa0Android</a>': 1}

Wow, you've just generalized the Twitter language analysis that you did in the previous chapter to include a default argument for the column name. You're now going to generalize this function one step further by allowing the user to pass it a flexible argument, that is, in this case, as many column names as the user would like!

import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

def count_entries(df, *args):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    cols_count = {}
    for col_name in args:
        col = df[col_name]
        for entry in col:
            if entry in cols_count.keys():
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1
    return cols_count

result1 = count_entries(tweets_df, 'lang')
result2 = count_entries(tweets_df, 'lang', 'source')
print(result1)
print(result2)

<script.py> output:
    {'en': 97, 'et': 1, 'und': 2}
    {'en': 97, 'et': 1, 'und': 2, '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24, '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1, '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26, '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33, '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2, '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2, '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6, '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2, '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2, '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1, '<a href="http://www.myplume.com/" rel="nofollow">Plume\xa0for\xa0Android</a>': 1}

Lambda functions

Some function definitions are simple enough that they can be converted to a lambda function. By doing this, you write fewer lines of code, which can come in handy. In this exercise, you will use what you know about lambda functions to convert a function that does a simple task into a lambda function. Rewrite the following function:

def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1."""
    echo_word = word1 * echo
    return echo_word

with_echo = shout_echo('Hey', echo=5) # Call shout_echo('Hey', echo=5)
print(with_echo)

# The same computation written as a lambda function
echo_word = (lambda word1, echo: word1 * echo)
result = echo_word('hey', 5)
print(result)

<script.py> output:
    HeyHeyHeyHeyHey
    heyheyheyheyhey

So far, you've used lambda functions to write short, simple functions as well as to redefine functions with simple functionality. The best use case for lambda functions, however, is when you want these simple functionalities to be anonymously embedded within larger expressions. That means the function is never bound to a name in the environment, unlike a function defined with def. To understand this idea better, you will use a lambda function in the context of the map() function.

spells = ["protego", "accio", "expecto patronum", "legilimens"]
shout_spells = map(lambda item: item + '!!!', spells) # Use map() to apply the lambda function to each spell
shout_spells_list = list(shout_spells) # Convert shout_spells into a list
print(shout_spells_list)

<script.py> output:
    ['protego!!!', 'accio!!!', 'expecto patronum!!!', 'legilimens!!!']
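In Python 3, map() returns a lazy iterator rather than a list. That is why the exercise wraps it in list(). It also means the iterator can be consumed only once. The sketch below shows this, along with the equivalent list comprehension, which many Python programmers find more readable.

```python
spells = ["protego", "accio", "expecto patronum", "legilimens"]

shout_map = map(lambda item: item + '!!!', spells)  # lazy iterator
print(list(shout_map))  # consumes the iterator
print(list(shout_map))  # [] because the iterator is already exhausted

# An equivalent list comprehension
shout_comp = [item + '!!!' for item in spells]
print(shout_comp)
```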

In the previous exercise, you used lambda functions to anonymously embed an operation within map(). You will practice this again in this exercise by using a lambda function with filter(), which may be new to you! The function filter() offers a way to filter out elements from a list that don't satisfy certain criteria.

fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn', 'boromir', 'legolas', 'gimli', 'gandalf']
result = filter(lambda member: len(member) > 6, fellowship) # Keep members whose names are longer than 6 characters
result_list = list(result)
print(result_list)

<script.py> output:
    ['samwise', 'aragorn', 'boromir', 'legolas', 'gandalf']
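The same selection can be written as a list comprehension with an if clause. The sketch below shows that, plus one more filter() feature: passing None instead of a function keeps only the truthy elements. The second input list is a made-up example.

```python
fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn',
              'boromir', 'legolas', 'gimli', 'gandalf']

# Comprehension equivalent of filter(lambda member: len(member) > 6, fellowship)
long_names = [member for member in fellowship if len(member) > 6]
print(long_names)  # ['samwise', 'aragorn', 'boromir', 'legolas', 'gandalf']

# filter(None, iterable) drops falsy values such as '', None and 0
print(list(filter(None, ['frodo', '', 'sam', None, 0])))  # ['frodo', 'sam']
```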

You're getting very good at using lambda functions! Here's one more function to add to your repertoire of skills. The reduce() function is useful for performing some computation on a list and, unlike map() and filter(), returns a single value as a result. To use reduce(), you must import it from the functools module.

from functools import reduce
stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon'] # Create a list of strings
result = reduce(lambda item1, item2: item1 + item2, stark) # Use reduce() to concatenate the strings
print(result)

<script.py> output:
    robbsansaaryabrandonrickon
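reduce() folds the list from left to right, feeding each intermediate result back into the function. It also accepts an optional third argument, an initializer used as the starting value. An initializer is needed when the result type differs from the element type, as in the length sum below. It also makes an empty list safe: without one, reduce() raises TypeError on an empty iterable.

```python
from functools import reduce

stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon']

# Evaluates as ((((('robb' + 'sansa') + 'arya') + 'brandon') + 'rickon')
joined = reduce(lambda acc, item: acc + item, stark)

# Start from 0 so acc is an int and each item contributes its length
total_len = reduce(lambda acc, item: acc + len(item), stark, 0)

# With an initializer, an empty list simply returns the initializer
empty = reduce(lambda acc, item: acc + item, [], '')

print(joined)       # robbsansaaryabrandonrickon
print(total_len)    # 26
print(repr(empty))  # ''
```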

Handling errors

In this exercise, you will define a function and use a try-except block to handle cases where incorrect input arguments are passed to the function.

def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    echo_word = ''
    shout_words = ''
    try: # Add a try-except block for error handling
        echo_word = echo * word1
        shout_words = echo_word + '!!!'
    except TypeError: # Print an error message when the arguments have the wrong types
        print("word1 must be a string and echo must be an integer.")
    return shout_words

shout_echo("particle", echo="accelerator")

<script.py> output:
    word1 must be a string and echo must be an integer.
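A bare except catches every exception, including unrelated bugs such as a misspelled variable name. Catching only the error you expect is usually safer. Multiplying a string by a string raises TypeError, so the sketch below catches just that and also prints the exception's own message. This variant of shout_echo() is illustrative, not the course solution.

```python
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    try:
        return word1 * echo + '!!!'
    except TypeError as err:  # only the error we expect; other bugs still surface
        print("word1 must be a string and echo must be an integer.")
        print("Details:", err)
        return ''

print(shout_echo("particle", echo=3))      # particleparticleparticle!!!
shout_echo("particle", echo="accelerator")  # prints the error messages
```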

Another way to signal an error is by using raise. In this exercise, you will add a raise statement to the shout_echo() function you defined before, so that it raises an error with a message when the value supplied by the user to the echo argument is less than 0.

def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    if echo < 0: # Use raise to signal an invalid echo value
        raise ValueError('echo must be greater than or equal to 0')
    echo_word = word1 * echo
    shout_word = echo_word + '!!!'
    return shout_word

shout_echo("particle", echo=5)
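The exercise only calls shout_echo() with a valid value. The sketch below also shows the caller's side of a raise. It raises ValueError, the built-in exception for an argument of the right type but an inappropriate value, and catches it with a matching except clause.

```python
def shout_echo(word1, echo=1):
    """Concatenate echo copies of word1 and three
    exclamation marks at the end of the string."""
    if echo < 0:
        raise ValueError('echo must be greater than or equal to 0')
    return word1 * echo + '!!!'

print(shout_echo("particle", echo=5))  # particleparticleparticleparticleparticle!!!

try:
    shout_echo("particle", echo=-1)
except ValueError as err:
    print("Caught:", err)  # Caught: echo must be greater than or equal to 0
```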

Bringing it all together (case study 3)

This is awesome! You have now learned how to write anonymous functions using lambda, how to pass lambda functions as arguments to other functions such as map(), filter(), and reduce(), and how to raise errors and output custom error messages within your functions. You will now put these lessons to good use by working with a Twitter dataset. Before practicing your new error-handling skills, in this exercise you will write a lambda function and use filter() to select retweets, that is, tweets that begin with the string 'RT'.

import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

result = filter(lambda x: x[0:2] == 'RT', tweets_df['text']) # Select retweets from the 'text' column
res_list = list(result)
for tweet in res_list:
    print(tweet)

<script.py> output:
    RT @bpolitics: .@krollbondrating's Christopher Whalen says Clinton is the weakest Dem candidate in 50 years https://t.co/pLk7rvoRSn https:/...
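The slice test `x[0:2] == 'RT'` can also be written with str.startswith, which states the intent directly. Both versions treat an empty string as a non-match. The sketch uses a small hypothetical list standing in for tweets_df['text']. If the real column contained missing values (NaN, a float), both lambdas would fail. In that case pandas' Series.str.startswith, which has an `na` parameter, may be worth checking.

```python
# Hypothetical sample standing in for tweets_df['text']
texts = ['RT @someone: hello', 'just a tweet', '', 'RT @other: hi', 'rt lowercase']

retweets_slice = list(filter(lambda x: x[0:2] == 'RT', texts))
retweets_starts = list(filter(lambda x: x.startswith('RT'), texts))

print(retweets_starts)                    # ['RT @someone: hello', 'RT @other: hi']
print(retweets_slice == retweets_starts)  # True
```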

Sometimes we make mistakes when calling functions, even ones we wrote ourselves. But don't fret! In this exercise, you will improve on your earlier count_entries() function from the last chapter by adding a try-except block to it. This will allow your function to provide a helpful message when the user calls count_entries() with a column name that isn't in the DataFrame.

import pandas as pd
tweets_df = pd.read_csv('tweets.csv')

def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    cols_count = {} # Initialize an empty dictionary
    try: # Add a try block
        col = df[col_name]
        for entry in col:
            if entry in cols_count.keys():
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1
        return cols_count
    except KeyError: # Raised by df[col_name] when the column is missing
        print('The DataFrame does not have a ' + col_name + ' column.')

result1 = count_entries(tweets_df, 'lang')
print(result1)

Now improve count_entries() by adding a raise statement instead, so that a missing column raises a ValueError.

def count_entries(df, col_name='lang'):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    if col_name not in df.columns: # Raise an error if the column is missing
        raise ValueError('The DataFrame does not have a ' + col_name + ' column.')
    cols_count = {}
    col = df[col_name]
    for entry in col:
        if entry in cols_count.keys():
            cols_count[entry] += 1
        else:
            cols_count[entry] = 1
    return cols_count

result1 = count_entries(tweets_df, 'lang')
print(result1)