Hunting Performance in Python Code: CPU Profiling (Python Scripts)

程序员文章站 (Programmer Article Station), 2024-03-22 09:15:28


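In this post I will go over some tools that help us with another painful problem in Python: profiling CPU usage.

CPU profiling means measuring the performance of our code by analyzing how the CPU executes it. In practice this means finding the hot spots in our code and seeing what we can do about them.

Next we will see how to trace the CPU usage of Python scripts. We will focus on the following profilers: cProfile, line_profiler, pprofile, and vprof.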


Measuring CPU usage

For this post I will mostly use the same script as in the memory profiling article. It is reproduced below.

import time


def primes(n):
    if n == 2:
        return [2]
    elif n < 2:
        return []
    s = []
    for i in range(3, n+1):
        if i % 2 != 0:
            s.append(i)
    mroot = n ** 0.5
    half = (n + 1) / 2 - 1
    i = 0
    m = 3
    while m <= mroot:
        if s[i]:
            j = (m * m - 3) / 2
            s[j] = 0
            while j < half:
                s[j] = 0
                j += m
        i = i + 1
        m = 2 * i + 3
    l = [2]
    for x in s:
        if x:
            l.append(x)
    return l

	
def benchmark():
    start = time.time()
    for _ in xrange(40):
        count = len(primes(1000000))
    end = time.time()
    print "Benchmark duration: %r seconds" % (end - start)

	
benchmark()

Also, remember that PyPy2 needs its own copy of pip, which you can bootstrap with:

pypy -m ensurepip

Anything else is then installed with:

pypy -m pip install

cProfile

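One of the most commonly used tools for CPU profiling is cProfile, mainly because it ships with both CPython2 and PyPy2. It is a deterministic profiler, meaning that it collects a set of statistics while our workload runs, such as the execution count or execution time of the various parts of our code. In addition, cProfile has a lower overhead on the system than the other built-in profilers (profile or hotshot).

With CPython2, using it is simple:

python -m cProfile 03.primes-v1.py

If you are using PyPy2:

pypy -m cProfile 03.primes-v1.py

The output looks like this: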

Benchmark duration: 30.11158514022827 seconds
         23139965 function calls in 30.112 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   30.112   30.112 03.primes.py:1(<module>)
       40   19.760    0.494   29.896    0.747 03.primes.py:3(primes)
        1    0.216    0.216   30.112   30.112 03.primes.py:31(benchmark)
       40    0.000    0.000    0.000    0.000 {len}
 23139840    6.683    0.000    6.683    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
       40    3.453    0.086    3.453    0.086 {range}
        2    0.000    0.000    0.000    0.000 {time.time}

Even from this text output, it is easy to see that our script spends much of its time calling the list.append method.
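With larger scripts the hot spot is harder to spot in the default listing, which is ordered by name. The sketch below shows one way to sort the same statistics by own time using the standard pstats module. It is written in Python 3 syntax, which is an assumption (the article targets Python 2), and collect_odds is a hypothetical stand-in for the first loop of primes.

```python
import cProfile
import io
import pstats


def collect_odds(n):
    """Stand-in for the first loop of primes(): one list.append per odd number."""
    s = []
    for i in range(3, n + 1):
        if i % 2 != 0:
            s.append(i)
    return s


profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    collect_odds(20000)
profiler.disable()

# Sort by the time spent inside each function itself and keep the top 3 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(3)
report = buffer.getvalue()
print(report)
```

Sorting by "tottime" puts the functions that burn the most CPU in their own body at the top, while "cumulative" would rank them by time including callees.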

With gprof2dot we can turn cProfile's output into a picture. To use it, we must first install graphviz as a requirement, and then the package itself. On Ubuntu, run:

apt-get install graphviz
pip install gprof2dot

We run our script again:

python -m cProfile -o output.pstats 03.primes-v1.py
gprof2dot -f pstats output.pstats | dot -Tpng -o output.png

and we get the following output.png file:

[Figure: output.png, the cProfile call graph for 03.primes-v1.py rendered by gprof2dot]

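Everything is much easier to see this way. Let's take a closer look at what it shows. You see the call graph of the functions in the script. In each box you can read, line by line:

First line: the Python file name, the line number, and the method name
Second line: the percentage of the total running time spent in this box
Third line: in parentheses, the percentage of the total running time spent in the method itself
Fourth line: the number of calls made

For example, in the third red box from the top, the primes method takes 98.28% of the time, with 65.44% spent in its own code, and it is called 40 times. The rest of the time is spent in Python's list.append (22.33%) and range (11.51%) methods.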
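The two percentages in each box correspond to cumulative time and own time divided by the total profiled time. As a rough sketch of that arithmetic, the snippet below derives them from a cProfile run with pstats. It assumes Python 3.9 or newer (for Stats.get_stats_profile), and fill and workload are hypothetical stand-ins for the article's script.

```python
import cProfile
import pstats


def fill(n):
    s = []
    for i in range(n):
        s.append(i)
    return s


def workload():
    for _ in range(10):
        fill(50000)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# get_stats_profile() is available from Python 3.9 onwards.
stats_profile = pstats.Stats(profiler).get_stats_profile()
total = stats_profile.total_tt
fill_stats = stats_profile.func_profiles["fill"]

total_pct = 100.0 * fill_stats.cumtime / total  # fill() plus everything it calls
self_pct = 100.0 * fill_stats.tottime / total   # fill()'s own body only
print("fill: %.2f%% total, %.2f%% self, called %s times"
      % (total_pct, self_pct, fill_stats.ncalls))
```

The gap between the two numbers is the time spent in callees, which here is almost entirely list.append.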

Since this is a simple script, all we need to do is rewrite it so that it does not use so many appends, like this:

import time


def primes(n):
    if n == 2:
        return [2]
    elif n < 2:
        return []
    s = range(3, n + 1, 2)
    mroot = n ** 0.5
    half = (n + 1) / 2 - 1
    i = 0
    m = 3
    while m <= mroot:
        if s[i]:
            j = (m * m - 3) / 2
            s[j] = 0
            while j < half:
                s[j] = 0
                j += m
        i = i + 1
        m = 2 * i + 3
    return [2] + [x for x in s if x]


def benchmark():
    start = time.time()
    for _ in xrange(40):
        count = len(primes(1000000))
    end = time.time()
    print "Benchmark duration: %r seconds" % (end - start)

	
benchmark()
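Before trusting a faster version, it is worth checking that it still returns the same primes. The sketch below ports both versions to Python 3 (an assumption, since the listings above are Python 2) and compares them. The names primes_v1 and primes_v2 are introduced here for illustration only.

```python
import timeit


def primes_v1(n):
    """First version: builds both lists with repeated list.append calls."""
    if n == 2:
        return [2]
    elif n < 2:
        return []
    s = []
    for i in range(3, n + 1):
        if i % 2 != 0:
            s.append(i)
    mroot = n ** 0.5
    half = (n + 1) // 2 - 1  # // keeps the Python 2 integer division
    i = 0
    m = 3
    while m <= mroot:
        if s[i]:
            j = (m * m - 3) // 2
            s[j] = 0
            while j < half:
                s[j] = 0
                j += m
        i = i + 1
        m = 2 * i + 3
    result = [2]
    for x in s:
        if x:
            result.append(x)
    return result


def primes_v2(n):
    """Second version: a stepped range and a list comprehension, no appends."""
    if n == 2:
        return [2]
    elif n < 2:
        return []
    s = list(range(3, n + 1, 2))  # Python 3 ranges are lazy, so build a list
    mroot = n ** 0.5
    half = (n + 1) // 2 - 1
    i = 0
    m = 3
    while m <= mroot:
        if s[i]:
            j = (m * m - 3) // 2
            s[j] = 0
            while j < half:
                s[j] = 0
                j += m
        i = i + 1
        m = 2 * i + 3
    return [2] + [x for x in s if x]


# Both versions must agree before their timings are worth comparing.
assert primes_v1(100000) == primes_v2(100000)
print(primes_v2(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

for func in (primes_v1, primes_v2):
    seconds = timeit.timeit(lambda: func(200000), number=3)
    print("%s: %.3f seconds" % (func.__name__, seconds))
```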

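If we time our script before and after the change with CPython2:

python 03.primes-v1.py
Benchmark duration: 15.768115043640137 seconds

python 03.primes-v2.py
Benchmark duration: 6.56312108039856 seconds

And with PyPy2:

pypy 03.primes-v1.py
Benchmark duration: 1.4009230136871338 seconds

pypy 03.primes-v2.py
Benchmark duration: 0.4542720317840576 seconds

We get a good 2.4X improvement with CPython2 and 3.1X with PyPy2. Not bad. And here is the cProfile call graph:

[Figure: cProfile call graph for 03.primes-v2.py rendered by gprof2dot]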
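The improvement factors quoted above are simply the ratio of the old duration to the new one. A small sketch of that arithmetic, using the four durations reported by the benchmark runs:

```python
# Durations in seconds, copied from the benchmark runs above: (v1, v2).
durations = {
    "CPython2": (15.768115043640137, 6.56312108039856),
    "PyPy2": (1.4009230136871338, 0.4542720317840576),
}

speedups = {}
for interpreter, (v1_seconds, v2_seconds) in durations.items():
    speedups[interpreter] = v1_seconds / v2_seconds
    print("%s: %.1fX faster" % (interpreter, speedups[interpreter]))
```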

You can also use cProfile programmatically, for example:

import cProfile

pr = cProfile.Profile()
pr.enable()

function_to_measure()

pr.disable()
pr.print_stats(sort='time')

This is useful in some scenarios, such as measuring the performance of multi-process programs.
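For the multi-process case, one possible approach is to wrap the measured function so that every process writes its own stats file. The sketch below is only an illustration in Python 3 syntax: the profiled decorator and the file naming scheme are hypothetical, not part of cProfile.

```python
import cProfile
import functools
import os
import pstats
import tempfile


def profiled(output_dir):
    """Hypothetical helper: profile each call and dump the stats to a file
    named after the function and the id of the process that ran it."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            profiler.enable()
            try:
                return func(*args, **kwargs)
            finally:
                profiler.disable()
                path = os.path.join(
                    output_dir, "%s.%d.pstats" % (func.__name__, os.getpid()))
                profiler.dump_stats(path)
        return wrapper
    return decorator


output_dir = tempfile.mkdtemp()


@profiled(output_dir)
def function_to_measure():
    return sum(i * i for i in range(10000))


result = function_to_measure()

# Each process leaves its own file, which can be loaded and sorted later.
stats_path = os.path.join(
    output_dir, "function_to_measure.%d.pstats" % os.getpid())
pstats.Stats(stats_path).sort_stats("time").print_stats(5)
```

Files written this way use the same format as python -m cProfile -o, so they can also be fed to gprof2dot.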

line_profiler

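This profiler provides information at the line level of a workload. It is implemented in C using Cython.

Its source code repository and its PyPI page are available online. Compared with cProfile it has a considerable overhead, taking 12 times longer to produce a profile.

To use it, first install it via pip: pip install Cython ipython==5.4.1 line_profiler (CPython2). A major drawback of this profiler is that it does not support PyPy.

Just as with memory_profiler, you need to add a decorator to the function you want to profile. In our example, you need to add @profile right before the definition of our primes function in 03.primes-v1.py.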
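One practical consequence of the decorator is that the script raises a NameError when run without kernprof, because kernprof is what injects profile into the builtins. A common workaround, sketched here in Python 3 syntax (in Python 2 the module is named __builtin__), is a no-op fallback. The count_odds function is a hypothetical example.

```python
import builtins

# kernprof -l injects `profile` into builtins; otherwise use a no-op decorator.
try:
    profile = builtins.profile
except AttributeError:
    def profile(func):
        return func


@profile
def count_odds(n):
    return len([i for i in range(3, n + 1) if i % 2 != 0])


print(count_odds(11))
```

With this guard the same file runs unchanged both under kernprof and under the plain interpreter.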

Then invoke it like this:

kernprof -l 03.primes-v1.py
python -m line_profiler 03.primes-v1.py.lprof

You will get output like this:

Timer unit: 1e-06 s

Total time: 181.595 s
File: 03.primes-v1.py
Function: primes at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def primes(n):
     5        40          107      2.7      0.0      if n == 2:
     6                                                   return [2]
     7        40           49      1.2      0.0      elif n < 2:
     8                                                   return []
     9        40           44      1.1      0.0      s = []
    10  39999960     34410114      0.9     18.9      for i in range(3, n+1):
    11  39999920     29570173      0.7     16.3          if i % 2 != 0:
    12  19999960     14976433      0.7      8.2              s.append(i)
    13        40          329      8.2      0.0      mroot = n ** 0.5
    14        40           82      2.0      0.0      half = (n + 1) / 2 - 1
    15        40           46      1.1      0.0      i = 0
    16        40           30      0.8      0.0      m = 3
    17     20000        17305      0.9      0.0      while m <= mroot:
    18     19960        16418      0.8      0.0          if s[i]:
    19      6680         6798      1.0      0.0              j = (m * m - 3) / 2
    20      6680         6646      1.0      0.0              s[j] = 0
    21  32449400     22509523      0.7     12.4              while j < half:
    22  32442720     26671867      0.8     14.7                  s[j] = 0
    23  32442720     22913591      0.7     12.6                  j += m
    24     19960        15078      0.8      0.0          i = i + 1
    25     19960        16170      0.8      0.0          m = 2 * i + 3
    26        40           87      2.2      0.0      l = [2]
    27  20000000     14292643      0.7      7.9      for x in s:
    28  19999960     13753547      0.7      7.6          if x:
    29   3139880      2417421      0.8      1.3              l.append(x)
    30        40           33      0.8      0.0      return l

We see that the two loops that repeatedly call list.append take up most of our script's time.
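To put a number on that, the % Time column can be summed per loop. The snippet below copies the percentages from the output above and groups them; the loop labels are added here for readability.

```python
# % Time values copied from the line_profiler output above, keyed by line number.
percent_by_line = {
    10: 18.9, 11: 16.3, 12: 8.2,   # loop that collects the odd numbers
    21: 12.4, 22: 14.7, 23: 12.6,  # inner sieve loop
    27: 7.9, 28: 7.6, 29: 1.3,     # loop that builds the result list
}

loops = {
    "collect odds (lines 10-12)": (10, 11, 12),
    "sieve (lines 21-23)": (21, 22, 23),
    "build result (lines 27-29)": (27, 28, 29),
}

totals = {name: round(sum(percent_by_line[n] for n in lines), 1)
          for name, lines in loops.items()}
for name, total in totals.items():
    print("%-28s %5.1f%%" % (name, total))
```

The two append loops add up to roughly 60% of the run, while the sieve loop itself accounts for just under 40%.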

pprofile

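According to its author, pprofile is a "line-granularity, thread-aware deterministic and statistic pure-python profiler".

It is inspired by line_profiler and fixes many of its shortcomings, and because it is written entirely in Python, it can also be used successfully with PyPy. Compared with cProfile, profiling takes 28 times longer with CPython and 10 times longer with PyPy, but the level of detail is also much finer.

So we have PyPy support! On top of that, it supports profiling threads, which can come in handy in various situations.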
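To give a feel for how a pure-Python, line-granularity profiler can work at all, here is a minimal sketch built on the standard sys.settrace hook. It only counts line hits for a single function and is not how pprofile is actually implemented. The names count_line_hits and odd_numbers are hypothetical, and the code uses Python 3 syntax.

```python
import sys
from collections import Counter


def count_line_hits(func, *args):
    """Run func(*args) and count how often each of its lines executes.

    Keys are line offsets relative to the `def` line of func."""
    hits = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore every frame that does not belong to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            hits[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, hits


def odd_numbers(n):
    s = []
    for i in range(3, n + 1):
        if i % 2 != 0:
            s.append(i)
    return s


result, hits = count_line_hits(odd_numbers, 11)
for offset in sorted(hits):
    print("line +%d: %d hits" % (offset, hits[offset]))
```

Calling a Python function for every executed line is also what makes this style of profiler slow, which is consistent with the overhead figures quoted above.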

To use it, first install it via pip: pip install pprofile (CPython2) / pypy -m pip install pprofile (PyPy), and then invoke it like this:

pprofile 03.primes-v1.py

The output is different from what we have seen before. We get something like this:

Benchmark duration: 886.8774709701538 seconds
Command line: ['03.primes-v1.py']
Total duration: 886.878s
File: 03.primes-v1.py
File duration: 886.878s (100.00%)
Line #|      Hits|         Time| Time per hit|      %|Source code
------+----------+-------------+-------------+-------+-----------
     1|         2|  7.10487e-05|  3.55244e-05|  0.00%|import time
     2|         0|            0|            0|  0.00%|
     3|         0|            0|            0|  0.00%|
     4|        41|   0.00029397|     7.17e-06|  0.00%|def primes(n):
     5|        40|  0.000231266|  5.78165e-06|  0.00%|    if n == 2:
     6|         0|            0|            0|  0.00%|        return [2]
     7|        40|  0.000178337|  4.45843e-06|  0.00%|    elif n < 2:
     8|         0|            0|            0|  0.00%|        return []
     9|        40|  0.000188112|  4.70281e-06|  0.00%|    s = []
    10|  39999960|      159.268|  3.98171e-06| 17.96%|    for i in range(3, n+1):
    11|  39999920|      152.924|  3.82312e-06| 17.24%|        if i % 2 != 0:
    12|  19999960|      76.2135|  3.81068e-06|  8.59%|            s.append(i)
    13|        40|   0.00147367|  3.68416e-05|  0.00%|    mroot = n ** 0.5
    14|        40|  0.000319004|   7.9751e-06|  0.00%|    half = (n + 1) / 2 - 1
    15|        40|  0.000220776|  5.51939e-06|  0.00%|    i = 0
    16|        40|  0.000243902|  6.09756e-06|  0.00%|    m = 3
    17|     20000|    0.0777466|  3.88733e-06|  0.01%|    while m <= mroot:
    18|     19960|    0.0774016|  3.87784e-06|  0.01%|        if s[i]:
    19|      6680|    0.0278566|  4.17015e-06|  0.00%|            j = (m * m - 3) / 2
    20|      6680|    0.0275929|  4.13067e-06|  0.00%|            s[j] = 0
    21|  32449400|      114.858|   3.5396e-06| 12.95%|            while j < half:
    22|  32442720|      120.841|  3.72475e-06| 13.63%|                s[j] = 0
    23|  32442720|      114.432|   3.5272e-06| 12.90%|                j += m
    24|     19960|    0.0749919|  3.75711e-06|  0.01%|        i = i + 1
    25|     19960|    0.0765574|  3.83554e-06|  0.01%|        m = 2 * i + 3
    26|        40|  0.000222206|  5.55515e-06|  0.00%|    l = [2]
    27|  20000000|      68.8031|  3.44016e-06|  7.76%|    for x in s:
    28|  19999960|      67.9391|  3.39696e-06|  7.66%|        if x:
    29|   3139880|      10.9989|  3.50295e-06|  1.24%|            l.append(x)
    30|        40|  0.000155687|  3.89218e-06|  0.00%|    return l
    31|         0|            0|            0|  0.00%|
    32|         0|            0|            0|  0.00%|
    33|         2|  8.10623e-06|  4.05312e-06|  0.00%|def benchmark():
    34|         1|  5.00679e-06|  5.00679e-06|  0.00%|  start = time.time()
    35|        41|   0.00101089|   2.4656e-05|  0.00%|  for _ in xrange(40):
    36|        40|     0.232263|   0.00580657|  0.03%|          count = len(primes(1000000))
(call)|        40|      886.644|      22.1661| 99.97%|# 03.primes-v1.py:4 primes
    37|         1|  5.96046e-06|  5.96046e-06|  0.00%|  end = time.time()
    38|         1|  0.000678062|  0.000678062|  0.00%|  print "Benchmark duration: %r seconds" % (end-start)
    39|         0|            0|            0|  0.00%|
    40|         0|            0|            0|  0.00%|
    41|         1|  5.79357e-05|  5.79357e-05|  0.00%|benchmark()
(call)|         1|      886.878|      886.878|100.00%|# 03.primes-v1.py:33 benchmark

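We can now see everything in more detail. Let's walk through the output. You get the entire source of the script, and next to each line you can see the number of times it was hit, the time spent running it (in seconds), the time per hit, and the percentage of the total time spent running it. In addition, pprofile adds extra lines to the output (the 44th and 50th lines of the listing above, which start with (call)) that carry cumulative metrics.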
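The last two numeric columns follow directly from the first two. As a quick check, the snippet below recomputes them for line 10 of the script from the values printed in the listing:

```python
# Values copied from the pprofile output above for line 10 of the script.
hits = 39999960
line_seconds = 159.268
total_seconds = 886.878

time_per_hit = line_seconds / hits
percent = 100.0 * line_seconds / total_seconds
print("time per hit: %.5e s" % time_per_hit)
print("share of total: %.2f%%" % percent)
```

Small differences in the last digit are expected, because the listing itself shows rounded values.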

Again we see that the two loops that repeatedly call list.append take a lot of time.

vprof

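vprof is a Python profiler that provides rich, interactive visualizations of various Python program characteristics, such as running time and memory usage. It is a graphical tool based on Node.JS that displays its results in a web page.

With it, you can see one or all of the following for a Python script:

CPU flame graph
code profiling
memory graph
code heatmap

To use it, first install it via pip: pip install vprof (CPython2) / pypy -m pip install vprof (PyPy), and then invoke it as shown below.

On CPython2, to display the code heatmap (first call below) and the code profiling (second call below):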

vprof -ch 03.primes-v1.py
vprof -cp 03.primes-v1.py

On PyPy, to display the code heatmap (first call below) and the code profiling (second call below):

pypy -m vprof -ch 03.primes-v1.py
pypy -m vprof -cp 03.primes-v1.py

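In each case, you will see the following code heatmap:

[Figure: vprof code heatmap for 03.primes-v1.py]

and the following code profiling view:

[Figure: vprof code profiling view for 03.primes-v1.py]

The results are displayed graphically, and we can hover over or click each line to get more information.

Again we see that the two loops that repeatedly call list.append take a lot of time.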

Author: Alecsandru Patrascu, alecsandru.patrascu [at] rinftech [dot] com

Translated from: https://pythonfiles.wordpress.com/2017/06/01/hunting-performance-in-python-code-part-3/
