Case study: scanning target Python code for unsafe functions with bandit

程序员文章站 2022-03-19 14:53:44


Technical background

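When security-scanning open-source Python libraries, we may need to analyze whether the functions a library uses can have unintended effects on the environment the code runs in. A typical example is Python sandbox escape: certain Python libraries can execute system shell commands, which falls outside what a Python sandbox protects against. Sandbox escape is not discussed further here. It has troubled the industry for years, and even the Python developers have said that there is no perfect way to sandbox Python. It serves here only as a background example: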

# subprocess_popen.py

import subprocess
import uuid

subprocess.Popen('touch ' + str(uuid.uuid1()) +'.txt', shell = True)

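This example uses the subprocess library to start a system shell and run a touch command, which creates a file with the given name, much as mkdir creates a directory. After the script runs, a txt file with a uuid-based name appears in the current directory: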

[dechin@dechin-manjaro bandit_test]$ python3 subprocess_popen.py 
[dechin@dechin-manjaro bandit_test]$ ll
总用量 4
-rw-r--r-- 1 dechin dechin 0 1月 26 23:03 b7aa0fc8-5fe7-11eb-b5d3-058313e110e4.txt
-rw-r--r-- 1 dechin dechin 123 1月 26 23:03 subprocess_popen.py
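As an aside, the same file could be created without starting a shell at all. The following is a minimal sketch, not part of the original script; the helper name touch_random_file is hypothetical, and a temporary directory is used so the example cleans up after itself.

```python
import tempfile
import uuid
from pathlib import Path


def touch_random_file(directory: str) -> Path:
    """Create an empty, uuid-named .txt file without spawning a shell."""
    target = Path(directory) / f"{uuid.uuid1()}.txt"
    target.touch()
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = touch_random_file(tmp)
        print(created.name, created.exists())
```

Because no command string is handed to a shell, there is nothing for a scanner to flag as a shell=True call.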

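However, the point here is not what this script does, but that it imports the subprocess library. Because of how Python works, if any module on your system imports subprocess, then any code that can import that module can also reach subprocess through it. Below is a malicious user's Python code: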

# bad.py

from subprocess_popen import subprocess as subprocess

subprocess.Popen('touch bad.txt', shell = True)

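The goal of this code is to call subprocess functions without a direct import of subprocess, by bridging through the subprocess_popen.py file created earlier. Running the script gives the following result: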

[dechin@dechin-manjaro bandit_test]$ python3 bad.py 
[dechin@dechin-manjaro bandit_test]$ ll
总用量 12
-rw-r--r-- 1 dechin dechin 0 1月 26 23:13 0fda7ede-5fe9-11eb-80a8-ad279ab4e0a6.txt
-rw-r--r-- 1 dechin dechin 0 1月 26 23:03 b7aa0fc8-5fe7-11eb-b5d3-058313e110e4.txt
-rw-r--r-- 1 dechin dechin 113 1月 26 23:13 bad.py
-rw-r--r-- 1 dechin dechin 0 1月 26 23:13 bad.txt
drwxr-xr-x 2 dechin dechin 4096 1月 26 23:13 __pycache__
-rw-r--r-- 1 dechin dechin 123 1月 26 23:03 subprocess_popen.py

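This result means that bad.py successfully used the subprocess module imported by subprocess_popen.py to touch a file named bad.txt. (The second uuid-named txt file appears because importing subprocess_popen also runs that module's top-level code.)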
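The bridging effect is not specific to the two demo files: any module object that another module imports at top level becomes an attribute of that module. The sketch below lists such attributes for the standard library's shutil, which imports os internally. The helper name reexported_modules is hypothetical and only meant to illustrate the idea.

```python
import inspect
import os
import shutil


def reexported_modules(module):
    """Map attribute name -> module name for every module object that
    `module` exposes as an attribute (i.e. what it imported at top level)."""
    return {
        name: value.__name__
        for name, value in vars(module).items()
        if inspect.ismodule(value)
    }


bridges = reexported_modules(shutil)
# shutil does `import os` internally, so `shutil.os` is the very same object.
print("os" in bridges, shutil.os is os)
```

Running the same helper over your own package gives a quick list of modules that callers could reach through it.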

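That concludes the background demonstration, but the logic behind it deserves a recap. We want to avoid introducing Python sandbox-escape problems into our own system, so we scan code that other people hand us, for example with the bandit tool introduced below, to block "dangerous functions" such as subprocess. But if a Python library we wrote ourselves, or a third-party library we depend on, contains an import like the one of subprocess above, the block is defeated: users can bridge through that import and call subprocess functions freely. Under such special requirements we therefore need to scan our own code for unsafe functions as well, so that it does not bring unforeseen security risks to other people's systems. bandit is just one tool for this kind of scan; its basic installation and usage follow.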

Installing bandit with pip

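Here we install bandit directly with pip; it can also be installed from source if needed. For configuring a mirror in China for pip, see the post on installing third-party Python libraries.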

[dechin@dechin-manjaro bandit_test]$ python3 -m pip install bandit
collecting bandit
 downloading bandit-1.7.0-py3-none-any.whl (115 kb)
 |████████████████████████████████| 115 kb 101 kb/s 
requirement already satisfied: pyyaml>=5.3.1 in /home/dechin/anaconda3/lib/python3.8/site-packages (from bandit) (5.3.1)
collecting gitpython>=1.0.1
 downloading gitpython-3.1.12-py3-none-any.whl (159 kb)
 |████████████████████████████████| 159 kb 28 kb/s 
requirement already satisfied: six>=1.10.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from bandit) (1.15.0)
collecting stevedore>=1.20.0
 downloading stevedore-3.3.0-py3-none-any.whl (49 kb)
 |████████████████████████████████| 49 kb 25 kb/s 
collecting gitdb<5,>=4.0.1
 downloading gitdb-4.0.5-py3-none-any.whl (63 kb)
 |████████████████████████████████| 63 kb 28 kb/s 
collecting pbr!=2.1.0,>=2.0.0
 downloading pbr-5.5.1-py2.py3-none-any.whl (106 kb)
 |████████████████████████████████| 106 kb 26 kb/s 
collecting smmap<4,>=3.0.1
 downloading smmap-3.0.5-py2.py3-none-any.whl (25 kb)
installing collected packages: smmap, gitdb, gitpython, pbr, stevedore, bandit
successfully installed gitpython-3.1.12 bandit-1.7.0 gitdb-4.0.5 pbr-5.5.1 smmap-3.0.5 stevedore-3.3.0

After installation, the following command verifies that it succeeded:

[dechin@dechin-manjaro bandit_test]$ bandit -h
usage: bandit [-h] [-r] [-a {file,vuln}] [-n context_lines] [-c config_file] [-p profile] [-t tests] [-s skips] [-l] [-i] [-f {csv,custom,html,json,screen,txt,xml,yaml}] [--msg-template msg_template] [-o [output_file]] [-v] [-d] [-q]
  [--ignore-nosec] [-x excluded_paths] [-b baseline] [--ini ini_path] [--exit-zero] [--version]
  [targets [targets ...]]

bandit - a python source code security analyzer

positional arguments:
 targets  source file(s) or directory(s) to be tested

optional arguments:
 -h, --help  show this help message and exit
 -r, --recursive find and process files in subdirectories
 -a {file,vuln}, --aggregate {file,vuln}
   aggregate output by vulnerability (default) or by filename
 -n context_lines, --number context_lines
   maximum number of code lines to output for each issue
 -c config_file, --configfile config_file
   optional config file to use for selecting plugins and overriding defaults
 -p profile, --profile profile
   profile to use (defaults to executing all tests)
 -t tests, --tests tests
   comma-separated list of test ids to run
 -s skips, --skip skips
   comma-separated list of test ids to skip
 -l, --level  report only issues of a given severity level or higher (-l for low, -ll for medium, -lll for high)
 -i, --confidence report only issues of a given confidence level or higher (-i for low, -ii for medium, -iii for high)
 -f {csv,custom,html,json,screen,txt,xml,yaml}, --format {csv,custom,html,json,screen,txt,xml,yaml}
   specify output format
 --msg-template msg_template
   specify output message template (only usable with --format custom), see custom format section for list of available values
 -o [output_file], --output [output_file]
   write report to filename
 -v, --verbose  output extra information like excluded and included files
 -d, --debug  turn on debug mode
 -q, --quiet, --silent
   only show output in the case of an error
 --ignore-nosec do not skip lines with # nosec comments
 -x excluded_paths, --exclude excluded_paths
   comma-separated list of paths (glob patterns supported) to exclude from scan (note that these are in addition to the excluded paths provided in the config file) (default:
   .svn,cvs,.bzr,.hg,.git,__pycache__,.tox,.eggs,*.egg)
 -b baseline, --baseline baseline
   path of a baseline report to compare against (only json-formatted files are accepted)
 --ini ini_path path to a .bandit file that supplies command line arguments
 --exit-zero  exit with 0, even with results found
 --version  show program's version number and exit

custom formatting
-----------------

available tags:

 {abspath}, {relpath}, {line}, {test_id},
 {severity}, {msg}, {confidence}, {range}

example usage:

 default template:
 bandit -r examples/ --format custom --msg-template \
 "{abspath}:{line}: {test_id}[bandit]: {severity}: {msg}"

 provides same output as:
 bandit -r examples/ --format custom

 tags can also be formatted in python string.format() style:
 bandit -r examples/ --format custom --msg-template \
 "{relpath:20.20s}: {line:03}: {test_id:^8}: defect: {msg:>20}"

 see python documentation for more information about formatting style:
 https://docs.python.org/3/library/string.html

the following tests were discovered and loaded:
-----------------------------------------------
 b101 assert_used
 b102 exec_used
 b103 set_bad_file_permissions
 b104 hardcoded_bind_all_interfaces
 b105 hardcoded_password_string
 b106 hardcoded_password_funcarg
 b107 hardcoded_password_default
 b108 hardcoded_tmp_directory
 b110 try_except_pass
 b112 try_except_continue
 b201 flask_debug_true
 b301 pickle
 b302 marshal
 b303 md5
 b304 ciphers
 b305 cipher_modes
 b306 mktemp_q
 b307 eval
 b308 mark_safe
 b309 httpsconnection
 b310 urllib_urlopen
 b311 random
 b312 telnetlib
 b313 xml_bad_celementtree
 b314 xml_bad_elementtree
 b315 xml_bad_expatreader
 b316 xml_bad_expatbuilder
 b317 xml_bad_sax
 b318 xml_bad_minidom
 b319 xml_bad_pulldom
 b320 xml_bad_etree
 b321 ftplib
 b323 unverified_context
 b324 hashlib_new_insecure_functions
 b325 tempnam
 b401 import_telnetlib
 b402 import_ftplib
 b403 import_pickle
 b404 import_subprocess
 b405 import_xml_etree
 b406 import_xml_sax
 b407 import_xml_expat
 b408 import_xml_minidom
 b409 import_xml_pulldom
 b410 import_lxml
 b411 import_xmlrpclib
 b412 import_httpoxy
 b413 import_pycrypto
 b501 request_with_no_cert_validation
 b502 ssl_with_bad_version
 b503 ssl_with_bad_defaults
 b504 ssl_with_no_version
 b505 weak_cryptographic_key
 b506 yaml_load
 b507 ssh_no_host_key_verification
 b601 paramiko_calls
 b602 subprocess_popen_with_shell_equals_true
 b603 subprocess_without_shell_equals_true
 b604 any_other_function_with_shell_equals_true
 b605 start_process_with_a_shell
 b606 start_process_with_no_shell
 b607 start_process_with_partial_path
 b608 hardcoded_sql_expressions
 b609 linux_commands_wildcard_injection
 b610 django_extra_used
 b611 django_rawsql_used
 b701 jinja2_autoescape_false
 b702 use_of_mako_templates
 b703 django_mark_safe

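The tests in this list show what the so-called "dangerous functions" are; the commonly used subprocess and random are both included. subprocess is listed because it can invoke a shell, and random because it is a pseudo-random generator (as a brief aside: the secrets module is now generally recommended for security-sensitive random values, although arguably only the measurement of a quantum superposition can produce truly random numbers).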
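The difference between the two standard-library modules can be seen in a few lines. This is a small illustrative sketch: the seed value 42 and the token length are arbitrary choices, not something taken from bandit.

```python
import random
import secrets

# Seeding makes `random` fully reproducible: fine for simulations,
# unsuitable for tokens or passwords.
random.seed(42)
first = [random.randint(0, 9) for _ in range(5)]
random.seed(42)
second = [random.randint(0, 9) for _ in range(5)]
print(first == second)

# `secrets` draws from the operating system's randomness source and
# offers no way to seed it.
token = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
print(len(token))
```

The two seeded sequences are identical, which is exactly the property that makes random inappropriate for security-sensitive values.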

Common bandit usage

1. Scan a py file directly

[dechin@dechin-manjaro bandit_test]$ bandit subprocess_popen.py 
[main] info profile include tests: none
[main] info profile exclude tests: none
[main] info cli include tests: none
[main] info cli exclude tests: none
[main] info running on python 3.8.5
[node_visitor] info unable to find qualified name for module: subprocess_popen.py
run started:2021-01-26 15:31:00.425603

test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
 Severity: Low Confidence: High
 Location: subprocess_popen.py:3
 More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess
2
3 import subprocess
4 import uuid

--------------------------------------------------
>> Issue: [B602:subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
 Severity: High Confidence: High
 Location: subprocess_popen.py:6
 More Info: https://bandit.readthedocs.io/en/latest/plugins/b602_subprocess_popen_with_shell_equals_true.html
5
6 subprocess.Popen('touch ' + str(uuid.uuid1()) +'.txt', shell = True)

--------------------------------------------------

code scanned:
 total lines of code: 3
 total lines skipped (#nosec): 0

run metrics:
 total issues (by severity):
  undefined: 0.0
  low: 1.0
  medium: 0.0
  high: 1.0
 total issues (by confidence):
  undefined: 0.0
  low: 0.0
  medium: 0.0
  high: 2.0
files skipped (0):

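Scanning subprocess_popen.py, the file created earlier that calls the dangerous subprocess library, identifies the "dangerous functions" in it. Note that the call itself is reported as issue B602, rated Severity: High, Confidence: High (the import is reported separately as B404 with Severity: Low, Confidence: High). If we instead use bandit to scan bad.py, which reaches the dangerous function indirectly by bridging through another module's import, the result is different: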

[dechin@dechin-manjaro bandit_test]$ bandit bad.py 
[main] info profile include tests: none
[main] info profile exclude tests: none
[main] info cli include tests: none
[main] info cli exclude tests: none
[main] info running on python 3.8.5
[node_visitor] info unable to find qualified name for module: bad.py
run started:2021-01-26 15:30:47.370468

test results:                            
>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
 Severity: Low Confidence: High
 Location: bad.py:3
 More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess
2
3 from subprocess_popen import subprocess as subprocess
4
5 subprocess.Popen('touch bad.txt', shell = True)

--------------------------------------------------
>> Issue: [B604:any_other_function_with_shell_equals_true] Function call with shell=True parameter identified, possible security issue.
 Severity: Medium Confidence: Low
 Location: bad.py:5
 More Info: https://bandit.readthedocs.io/en/latest/plugins/b604_any_other_function_with_shell_equals_true.html
4
5 subprocess.Popen('touch bad.txt', shell = True)

--------------------------------------------------

code scanned:                            
 total lines of code: 2
 total lines skipped (#nosec): 0

run metrics:                         
 total issues (by severity):
  undefined: 0.0
  low: 1.0
  medium: 1.0
  high: 0.0
 total issues (by confidence):
  undefined: 0.0
  low: 1.0
  medium: 0.0
  high: 1.0
files skipped (0):

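Although this script does the same thing as the previous example, the call is now reported as issue B604, and the rating has dropped to Severity: Medium, Confidence: Low. What matters is not the specific rating but the fact that it changed. bandit identifies dangerous functions by statically matching the names that appear in the source, rather than by executing the code or following imports into other modules. In this kind of indirect-call scenario it therefore cannot always recognize a call to a dangerous function accurately, and after an indirect import it may even fail to flag the risky call at all.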
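One plausible way to see why the rating changes is to resolve each call to a dotted name through the file's own import statements, as a static scanner might. The sketch below is a simplified approximation written for illustration, not bandit's actual implementation; the helper name qualified_calls is hypothetical.

```python
import ast


def qualified_calls(source: str):
    """Return (line number, dotted name) for each call, expanding import aliases.

    The code is only parsed, never imported or executed, so the result
    reflects the names as written, not the objects they refer to at runtime.
    """
    tree = ast.parse(source)
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for item in node.names:
                if item.asname:
                    aliases[item.asname] = item.name
                else:
                    # `import a.b` binds only the top-level name `a`.
                    root = item.name.split(".")[0]
                    aliases[root] = root
        elif isinstance(node, ast.ImportFrom) and node.module:
            for item in node.names:
                aliases[item.asname or item.name] = f"{node.module}.{item.name}"
    calls = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            parts = []
            func = node.func
            while isinstance(func, ast.Attribute):
                parts.append(func.attr)
                func = func.value
            if isinstance(func, ast.Name):
                parts.append(aliases.get(func.id, func.id))
                calls.append((node.lineno, ".".join(reversed(parts))))
    return calls


direct = "import subprocess\nsubprocess.Popen('whoami', shell=True)\n"
bridged = (
    "from subprocess_popen import subprocess as subprocess\n"
    "subprocess.Popen('touch bad.txt', shell=True)\n"
)
print(qualified_calls(direct))
print(qualified_calls(bridged))
```

In the direct case the call resolves to subprocess.Popen, while in the bridged case it resolves to subprocess_popen.subprocess.Popen. A rule keyed on the exact name subprocess.Popen would match only the first, leaving the second to a more generic "any call with shell=True" rule.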

2. Scan all py files in a directory and write the result to a txt file

[dechin@dechin-manjaro bandit_test]$ bandit *.py -o test_bandit.txt -f txt
[main] info profile include tests: none
[main] info profile exclude tests: none
[main] info cli include tests: none
[main] info cli exclude tests: none
[main] info running on python 3.8.5
[node_visitor] info unable to find qualified name for module: bad.py
[node_visitor] info unable to find qualified name for module: subprocess_popen.py
[text] info text output written to file: test_bandit.txt

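This example scans all py files in the current directory, which are just bad.py and subprocess_popen.py, and saves the final result to test_bandit.txt. The contents of the txt file are not shown here; it is essentially the two results from the previous section combined.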
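The help output above also lists json among the output formats, which is convenient when the report is to be post-processed. The sketch below counts findings per severity. The sample report is hand-written for illustration, and the field names (results, test_id, issue_severity, issue_confidence) are my understanding of the JSON layout, so treat them as assumptions to check against a real report.

```python
import json
from collections import Counter

# Hand-written sample shaped like a JSON report; the values echo the two
# findings for subprocess_popen.py shown earlier (field names are assumed).
SAMPLE_REPORT = """
{"results": [
  {"filename": "subprocess_popen.py", "line_number": 3, "test_id": "B404",
   "issue_severity": "LOW", "issue_confidence": "HIGH"},
  {"filename": "subprocess_popen.py", "line_number": 6, "test_id": "B602",
   "issue_severity": "HIGH", "issue_confidence": "HIGH"}
]}
"""


def count_by_severity(report_text: str) -> Counter:
    """Count findings per severity level in a JSON-formatted report."""
    report = json.loads(report_text)
    return Counter(item["issue_severity"] for item in report.get("results", []))


print(count_by_severity(SAMPLE_REPORT))
```

A summary like this could, for example, fail a build only when a high-severity finding is present.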

3. Scan py files in nested subdirectories and write the result to an html file

Suppose we have the following directory structure to scan:

[dechin@dechin-manjaro bandit_test]$ tree
.
├── bad.py
├── bad.txt
├── level2
│ └── test_random.py
├── subprocess_popen.py
├── test_bandit.html
└── test_bandit.txt

1 directory, 6 files
[dechin@dechin-manjaro bandit_test]$ cat level2/test_random.py 
# test_bandit.py

import random

a = random.random()

We can run the following command in the current directory:

[dechin@dechin-manjaro bandit_test]$ bandit -r . -f html -o test_bandit.html
[main] info profile include tests: none
[main] info profile exclude tests: none
[main] info cli include tests: none
[main] info cli exclude tests: none
[main] info running on python 3.8.5
[html] info html output written to file: test_bandit.html

The result is a test_bandit.html file, whose contents are shown in the figure below:

[Figure: screenshot of the test_bandit.html report]

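4. Disable selected issues with a configuration file
Create a .bandit file in the directory where bandit is run, with the following configuration, to skip the B404 check (the --ini option listed in the help output can also point bandit at such a file explicitly):

[bandit]
skips: B404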

The scan result is shown in the figure below; the B404 issues no longer appear in the list:

[Figure: scan result with the B404 issues skipped]
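The .bandit file is INI-style, so its contents can be inspected with the standard library's configparser. The sketch below only illustrates the file format and is not a description of how bandit itself reads the file; the second test ID in the sample and the helper name read_skips are added for illustration.

```python
import configparser

# Contents of a hypothetical .bandit file; B604 is added only to show
# the comma-separated form.
BANDIT_INI = """
[bandit]
skips: B404,B604
"""


def read_skips(ini_text: str) -> list:
    """Return the test IDs listed under `skips` in the [bandit] section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("bandit", "skips", fallback="")
    return [test_id.strip() for test_id in raw.split(",") if test_id.strip()]


print(read_skips(BANDIT_INI))
```

Checking the parsed list in a review step makes it harder for a broad skip to slip into the configuration unnoticed.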

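5. Suppress bandit findings directly in a py file
Adding the following comment after the risky call in the py file being scanned makes bandit ignore that line during the audit: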

# bad.py

from subprocess_popen import subprocess as sb

sb.Popen('touch bad.txt', shell = 1) # nosec

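In the final audit result, B604 has disappeared as well, as the figure below shows. This example also makes clear that bandit is not a security protection tool. It only performs a fairly preliminary review of whether Python code follows safe-function usage conventions, and whether the reported issues get fixed is ultimately up to the developers.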

[Figure: scan result for bad.py with the B604 issue suppressed]
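Since a nosec comment silently removes a finding, it can be useful to list every such comment for manual review. The following is a minimal sketch using the standard library's tokenize module; the helper name find_nosec_lines is hypothetical.

```python
import io
import tokenize


def find_nosec_lines(source: str) -> list:
    """Return the line numbers whose comments contain a `nosec` marker."""
    hits = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for token in tokens:
        if token.type == tokenize.COMMENT and "nosec" in token.string.lower():
            hits.append(token.start[0])
    return hits


SAMPLE = (
    "from subprocess_popen import subprocess as sb\n"
    "\n"
    "sb.Popen('touch bad.txt', shell=1)  # nosec\n"
)
print(find_nosec_lines(SAMPLE))
```

Tokenizing rather than searching raw text means a string literal that merely contains the word nosec is not reported.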

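A simple bandit performance test

Python is not known for raw speed, so bandit, being written in Python, may also be slow. Here we measure its performance quantitatively. First we create a py file with 10000 lines, all of them calls to a dangerous function: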

# gen.py

with open('test_bandit_power.py', 'w') as py_file:
    # One import line followed by 10000 identical risky calls.
    py_file.write('import subprocess as sb\n')
    for i in range(10000):
        py_file.write('sb.Popen(\'whoami\', shell = 1)\n')

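Running python3 gen.py generates test_bandit_power.py, a file of 10000 dangerous calls (plus one import line) about 300 KB in size. Scanning this single file with bandit takes noticeably long and produces a large volume of error logs: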

[dechin@dechin-manjaro bandit_test]$ time bandit test_bandit_power.py -f html -o test_power.html
[main] info profile include tests: none
[main] info profile exclude tests: none
[main] info cli include tests: none
[main] info cli exclude tests: none
[main] info running on python 3.8.5
[node_visitor] info unable to find qualified name for module: test_bandit_power.py
[html] info html output written to file: test_power.html

real 0m6.239s
user 0m6.082s
sys 0m0.150s

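As a rough estimate, if 10000 lines take about 6 s to scan, then, assuming the time grows linearly with the number of lines, a larger project with 1,000,000+ lines could take 10 minutes or more. That is not especially long, but for large projects it is certainly not a very efficient choice.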
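The estimate is a plain linear extrapolation from the single timed run. The sketch below spells out the arithmetic; linear scaling is an assumption, not something the measurement establishes, and the function name is hypothetical.

```python
def estimate_scan_seconds(total_lines, sample_lines=10_000, sample_seconds=6.239):
    """Linearly extrapolate scan time from one timed run.

    The defaults are the figures measured above (10000 lines, 6.239 s real time).
    """
    return total_lines * sample_seconds / sample_lines


seconds = estimate_scan_seconds(1_000_000)
print(f"{seconds:.0f} s = {seconds / 60:.1f} min")
```

For one million lines this gives roughly 624 s, or about 10.4 minutes, consistent with the figure quoted above.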

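Summary

In development projects with high security requirements, dangerous functions such as those in subprocess may be prohibited. bandit aims to scan code and automatically report findings about dangerous functions; whether to act on them depends on the needs of each project's maintainers. Our test also shows that bandit's performance in practical use is not entirely satisfactory (about 6 s for a 10000-line file), so we do not recommend it for large projects. If it must be used, consider a targeted configuration.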

Author ID: dechinphy