TensorFlow 2.0 Series (4): Eager Execution and AutoGraph

程序员文章站 2022-06-12 19:38:26

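Drawbacks of static graphs

The earliest versions of TensorFlow ran on static graphs. In that mode the computation graph separates the definition of a computation from its execution, which makes it a declarative programming model.

Static-graph execution has many advantages, but it is inconvenient to debug: it resembles calling into a compiled C program whose internals cannot be stepped through. Eager Execution was introduced to address this. It first appeared in TensorFlow v1.5 and became a core API in version 2.0.

With Eager Execution, TensorFlow gained dynamic-graph capabilities similar to PyTorch. We no longer have to wait for sess.run(...) to see a result, and we can debug in an IDE at any time and inspect the output of individual ops. Dynamic graphs also bring some new behaviors to watch for when writing TensorFlow code.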

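Eager mode

Eager mode resembles ordinary imperative Python programming: code runs directly, with no separate graph-building step, which makes it very intuitive.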
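As a minimal sketch of this immediacy (the tensor values are illustrative, and TensorFlow 2.x is assumed so that eager execution is on by default), an op returns a concrete value as soon as it is called:

```python
import tensorflow as tf

# Under TensorFlow 2.x eager execution is the default (assumed here).
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# The matmul runs immediately; no session or graph-building step is needed.
y = tf.matmul(x, x)

# The result is a concrete value that can be inspected right away.
result = y.numpy()
print(result)
```

Because `y` already holds a value, it can be printed or examined in a debugger like any other Python object.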

Eager execution basics

NumPy support

Eager mode works well with NumPy. Specifically:

NumPy operations accept Tensors as arguments;
TensorFlow math operations convert Python objects and NumPy arrays to Tensors;
the tf.Tensor.numpy method returns a NumPy ndarray.

For example:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Thu Jan  9 10:21:24 2020

@author: [email protected]
"""

import numpy as np
import tensorflow as tf
tf.compat.v1.enable_eager_execution()

def example_of_tf_and_np():

    a = tf.constant([[1,2],[3,4]])
    b = tf.add(a,1)
    
    print(a)
    print(b)
    
    print('tf\'s multiply: ')
    print(a*b)
    
    c = np.multiply(a,b)
    print('numpy\'s multiply:')
    print(c)
    
    print('transfer tensor a to numpy ndarray from: ')
    print(a.numpy())
    
if __name__ == '__main__':
    example_of_tf_and_np()

Output:

tf's multiply: 
tf.Tensor(
[[ 2  6]
 [12 20]], shape=(2, 2), dtype=int32)
numpy's multiply:
[[ 2  6]
 [12 20]]
transfer tensor a to numpy ndarray from: 
[[1 2]
 [3 4]]

Although eager mode makes Tensors and NumPy arrays highly interoperable, a tf.Tensor is not equivalent to an ordinary Python variable. In practice, take care not to confuse Python variables with TensorFlow Tensor objects.
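One concrete difference, sketched below with illustrative values: a Tensor is immutable, so in-place item assignment that works on a Python list or a NumPy array fails, and mutable state has to live in a tf.Variable instead.

```python
import tensorflow as tf

t = tf.constant([1, 2, 3])

# A Tensor is immutable: item assignment raises TypeError.
try:
    t[0] = 5
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Mutable state lives in tf.Variable and is changed through assign().
v = tf.Variable([1, 2, 3])
v.assign([5, 2, 3])

print(assignment_failed, v.numpy())
```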

AutoGraph - dynamic graphs

Eager mode supports Python control flow as well as TensorFlow control flow. Consider a while loop (or a similar construct written with for or if) such as:

while x>0:
    x = x-1

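In TensorFlow control flow this can be written as tf.while_loop(…, loop_vars=(x,)). However, tf.while_loop cannot handle an unbounded number of loop variables, and the efficiency of a TensorFlow graph is affected by the number of while loops it contains, so while loops should not be used indiscriminately.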
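A minimal sketch of the countdown loop written with tf.while_loop follows. The function name `countdown` and the starting value are assumptions made for illustration.

```python
import tensorflow as tf

def countdown(start):
    """Decrement a tensor until it reaches zero using tf.while_loop."""
    x = tf.constant(start)
    # loop_vars is a 1-tuple, so body must also return a 1-tuple.
    result = tf.while_loop(
        cond=lambda x: x > 0,
        body=lambda x: (x - 1,),
        loop_vars=(x,))
    return result[0]

final = countdown(5)
print(final)
```

Every value that changes between iterations has to be listed in loop_vars and returned from body in the same structure.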

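AutoGraph uses static analysis to determine which symbols the code modifies so that it can turn them into control-flow variables. Static analysis is generally performed on one function at a time, because Python's dynamic nature limits how effective it is across functions.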
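The sketch below shows the same countdown written as an ordinary Python while loop inside a tf.function. It assumes the code is run from a source file, since AutoGraph needs to read the function's source; the name `countdown` is illustrative.

```python
import tensorflow as tf

@tf.function
def countdown(x):
    # x is a Tensor, so AutoGraph rewrites this loop into graph control flow
    # and treats x as a loop variable because the body reassigns it.
    while x > 0:
        x = x - 1
    return x

final = countdown(tf.constant(5))

# tf.autograph.to_code returns the generated source as a string.
generated = tf.autograph.to_code(countdown.python_function)
print(final)
```

Printing `generated` shows the converted code, which is a convenient way to see which symbols AutoGraph decided to carry through the loop.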

Static analysis vs. dynamic flow

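Visibility of local variables

When a local variable changes inside a function, the change is not visible to the calling program outside the function. Likewise, changes to local variables inside a method of a class are not visible to the caller unless those variables are explicitly returned as outputs. The same holds for the parameters of a method: they are not visible outside it.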
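A small sketch of this rule, using a hypothetical function `add_one`: rebinding a local name inside a tf.function leaves the caller's tensor untouched, and the new value only becomes visible through the return value.

```python
import tensorflow as tf

@tf.function
def add_one(x):
    # Rebinding the local name x does not affect the caller's tensor.
    x = x + 1
    return x  # the new value leaves the function only through the return

a = tf.constant(1)
b = add_one(a)
print(int(a), int(b))
```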

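Using Python collections in TensorFlow control flow

TensorFlow control flow supports most Python data structures, such as lists, dicts and tuples, including namedtuple objects from the collections module. Inside TensorFlow control flow, however, these values must have a fixed structure: within a loop a list cannot change length and a dict cannot gain or lose keys. For background on namedtuple, see https://docs.python.org/3/library/collections.html#collections.namedtuple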
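As a sketch of a fixed-structure collection used as a loop variable, the snippet below carries a namedtuple through tf.while_loop. The type `Acc` and the function `sum_first` are hypothetical names introduced only for this illustration.

```python
import collections
import tensorflow as tf

# Hypothetical accumulator type, introduced only for this sketch.
Acc = collections.namedtuple("Acc", ["total", "count"])

def sum_first(n):
    """Sum 1..n while carrying a namedtuple through tf.while_loop."""
    i0 = tf.constant(0)
    acc0 = Acc(total=tf.constant(0), count=tf.constant(0))
    # body returns the same structure it received: (scalar, Acc(scalar, scalar))
    _, acc = tf.while_loop(
        cond=lambda i, acc: i < n,
        body=lambda i, acc: (i + 1, Acc(acc.total + i + 1, acc.count + 1)),
        loop_vars=(i0, acc0))
    return acc

acc = sum_first(4)
print(int(acc.total), int(acc.count))
```

The values inside `Acc` change on every iteration, but its fields stay the same, which is what the loop requires.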


def fn():
  l = []

  def loop_cond(i):
    return i < 10

  def loop_body(i):
    i = i + 1
    l.append(i)
    return i,

  tf.while_loop(
      cond=loop_cond,
      body=loop_body,
      loop_vars=(0,))

  return l

print(fn()) # Output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

tf.function(fn)() # ERROR

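Code that runs under eager execution fails under tf.function(fn)(). This is because tf.function() switches to graph execution, and TensorFlow uses special mechanisms in graph execution to guarantee that operations run in the correct order.

Here is another example: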

def fnn():
    l = []
    for i in tf.range(10):
      l.append(i)  # Error -- illegal tensor capture!
    return l

Calling ll = fnn() directly under eager execution yields a Python list of eager tensors. Running the same function as tf.function(fnn)() raises the following error:

InaccessibleTensorError: The tensor 'Tensor("placeholder:0", shape=(), dtype=int32)' cannot be accessed here: it is defined in another function or code block. Use return values, explicit Python locals or TensorFlow collections to access it. Defined in: FuncGraph(name=while_body_1396, id=5377892048); accessed from: FuncGraph(name=fnn, id=5374487632).

The correct approach is to make l a tf.TensorArray and call its write() method inside the loop to add elements one at a time. The local variable l is created with length 0 and dtype int32, and is marked as growable (dynamic_size=True).

def fnn():
    l=tf.TensorArray(tf.int32,size=0,dynamic_size=True)
    for i in tf.range(10):
        # write() returns the updated TensorArray, so assign the result back to l
        l = l.write(l.size(), i)
    return l
tf.function(fnn)()

The fnn() function above can of course also be run directly under eager execution (ll = fnn()), in which case ll is:

ll
Out[188]: <tensorflow.python.ops.tensor_array_ops.TensorArray at 0x140957d90>
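A TensorArray object is rarely the final result one wants. The sketch below, using a hypothetical function `collect` and assuming the code is run from a source file so that AutoGraph can convert the loop, keeps the object returned by write() and calls stack() to obtain an ordinary Tensor.

```python
import tensorflow as tf

@tf.function
def collect(n):
    ta = tf.TensorArray(tf.int32, size=0, dynamic_size=True)
    for i in tf.range(n):
        # write() returns the updated TensorArray; keep the returned object.
        ta = ta.write(i, i)
    # stack() turns the TensorArray into a single Tensor.
    return ta.stack()

values = collect(tf.constant(5))
print(values)
```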

When TensorFlow control flow involves Python collections, the values stored at each index or key may change, but the structure should stay fixed.
For example:


#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 18 22:27:11 2020

@author: [email protected]
"""

import tensorflow as tf


if tf.executing_eagerly():
    tf.compat.v1.disable_eager_execution()
    
@tf.function
def dict_loop():
    d = {'a': tf.constant(3)}
    for i in tf.range(10):
      d = {key: value + i for key, value in d.items()}
    return d

@tf.function
def dict_loop2():
    d = {'a': tf.constant(3)}
    for i in tf.range(10):
      for key in d:
        d[key] += i  # Problem -- accessing `dict` using non-constant key
    return d
d = dict_loop() # d={'a': <tf.Tensor 'StatefulPartitionedCall_3:0' shape=() dtype=int32>}
# whereas
d2 = dict_loop2() # ERROR

In this example dict_loop2() fails while dict_loop() works. The official explanation is that a functional style should be used here; keep this subtle difference in mind when writing code.

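Tensor shape and dtype in TensorFlow control flow

In TensorFlow graph control flow, the shape and dtype of a tensor must stay the same. This restriction does not apply under eager execution, which uses Python control flow. Watch for this whenever code is moved from eager mode to graph mode.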

Dynamic computation and static shapes

The shape and rank of a tensor are obtained as follows:

Use the .shape attribute for the static shape and .shape.rank for the static rank. When the tensor is dynamic, use tf.shape() and tf.rank() instead.
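The difference can be seen by tracing a function whose first dimension is left unknown. In this sketch the list `traced_static` and the function `describe` are hypothetical helpers; the list records what the static shape looks like at trace time, while tf.shape() and tf.rank() are evaluated at run time.

```python
import tensorflow as tf

traced_static = []  # filled in once, while the function is being traced

@tf.function(input_signature=(tf.TensorSpec(shape=(None, 3), dtype=tf.float32),))
def describe(x):
    # Static information: known at trace time, may contain None.
    traced_static.append((x.shape.as_list(), x.shape.rank))
    # Dynamic information: Tensors computed when the graph runs.
    return tf.shape(x), tf.rank(x)

dyn_shape, dyn_rank = describe(tf.zeros([5, 3]))
print(traced_static[0], dyn_shape.numpy(), int(dyn_rank))
```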

If the code needs dynamic dimensions, there are two options:
1) Use the @tf.function decorator with an input_signature, for example

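@tf.function(input_signature=(tf.TensorSpec(shape=(None,)),))  # trailing comma: input_signature must be a sequence
def f(x):  # x now has dynamic shape
  if tf.shape(x)[0] >= 3:  # Builds a tf.cond
    val = x[4]  # Okay, bounds checks are skipped when the shape is dynamic
  else:
    val = some_default_value  # placeholder: supply a default with the same dtype and shape as x[4]
  return val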

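Once input_signature is given, TensorFlow skips the shape-related checks when it runs the function.
2) Use Python control flow and check whether the dimension is static or dynamic, for example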

if x.shape[0] is None:  # Python bool, does not use tf.cond
  # ... static size unknown: use tf.shape(x) here ...
else:
  # ... static size known: use x.shape here ...
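A runnable sketch of this second approach is shown below. The helper `first_dim` and the wrapper `length` are hypothetical names; the helper returns the static size when it is known and falls back to tf.shape() otherwise.

```python
import tensorflow as tf

def first_dim(x):
    """Return the size of axis 0: a Python int if static, else a Tensor."""
    static = x.shape[0]
    if static is not None:  # plain Python check, resolved at trace time
        return static
    return tf.shape(x)[0]

@tf.function(input_signature=(tf.TensorSpec(shape=(None,), dtype=tf.float32),))
def length(x):
    return first_dim(x)

static_len = first_dim(tf.zeros([4, 2]))  # shape fully known
dynamic_len = length(tf.zeros([7]))       # first dimension is None while tracing
print(static_len, int(dynamic_len))
```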

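Consistency of dtype and shape

In TensorFlow control flow, dtype and shape must remain consistent throughout. The following snippets are examples of incorrect code: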

x = tf.cond(
    tf.random.uniform(()) > 0.5,
    lambda: tf.constant(1, dtype=tf.int32),
    lambda: tf.constant(1, dtype=tf.float32))  # Error -- inconsistent dtypes: int32, float32

# This won't work - "x" changes dtype inside the loop.
x = tf.while_loop(
    lambda _: tf.random.uniform(()) > 0.5,
    lambda x: tf.constant(1, dtype=tf.float32),
    loop_vars=(tf.constant(1, dtype=tf.int32),))  # Error -- inconsistent dtypes: int32, float32
# Example of illegal shape change in a loop:
x = tf.constant(1,)
while tf.random.uniform(()) > 0.5:
  x = tf.constant((1, 2, 3))  # Error -- inconsistent shapes: (), (3,)

If a value in the control flow is None or undefined, an error is raised as well.
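For contrast, here is a sketch of a consistent version of the tf.cond case: both branches are made to return float32, one of them through an explicit tf.cast. The function name `pick` and the constants are illustrative.

```python
import tensorflow as tf

@tf.function
def pick(pred):
    # Both branches return float32, so tf.cond can merge their outputs.
    return tf.cond(
        pred,
        lambda: tf.constant(1.0, dtype=tf.float32),
        lambda: tf.cast(tf.constant(2, dtype=tf.int32), tf.float32))

a = pick(tf.constant(True))
b = pick(tf.constant(False))
print(a, b)
```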

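Source code availability

AutoGraph can convert any function whose source code is available at runtime, with a few exceptions:
1) functions created in an interactive Python shell
2) functions with native bindings, for example code written in another language
3) dynamic code executed with exec or eval

inspect.getsource(object) can be used to check whether the source code is available. https://docs.python.org/3/library/inspect.html#inspect.getsource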
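The check can be wrapped in a small helper. In this sketch `has_source` is a hypothetical name, and the two probes correspond to the native-binding and exec cases from the list above.

```python
import inspect

def has_source(fn):
    """Return True if inspect can retrieve the source code of fn."""
    try:
        inspect.getsource(fn)
        return True
    except (OSError, TypeError):
        return False

# A function built with exec() from a string has no source file behind it.
namespace = {}
exec(compile("def dynamic_fn():\n    return 1\n", "<dynamic>", "exec"), namespace)

builtin_ok = has_source(len)                      # native binding
dynamic_ok = has_source(namespace["dynamic_fn"])  # created via exec
print(builtin_ok, dynamic_ok)
```

Both probes report that no source is available, which is the situation in which AutoGraph cannot convert a function.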

For lambda functions, for example:

foo = (
 'bar',
 lambda: x)

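This case is simple: the function's definition is the lambda expression itself, so there is no problem. With nested lambdas, declare the inner function before it is used, for example: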

my_lambda = lambda: x
foo = ('bar', my_lambda)

Training in eager mode

Start with this example:

w = tf.Variable([[1.0]])
# Forward pass: compute the loss
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
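The tape records operations on tf.Variable objects automatically. For a plain Tensor, the tape has to be told to track it with watch(), as in this sketch (the function x^3 and the value 2.0 are chosen only for illustration):

```python
import tensorflow as tf

x = tf.constant(2.0)
with tf.GradientTape() as tape:
    # Constants are not tracked automatically, unlike tf.Variable.
    tape.watch(x)
    y = x * x * x

dy_dx = tape.gradient(y, x)  # d(x^3)/dx = 3 * x^2
print(dy_dx)
```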

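This is how training works under eager execution. In eager mode, tf.GradientTape tracks and records operations. The tape can be pictured as a cassette tape: running the backward pass amounts to rewinding it. Take multivariate linear regression as an example: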

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 18 01:36:15 2020

@author: [email protected]
"""
import tensorflow as tf


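# A toy dataset of points around a linear function of four features
# Add some noise to generate the training data
NUM_EXAMPLES = 1000
training_inputs = tf.random.normal([NUM_EXAMPLES,4])
# Shape [NUM_EXAMPLES, 1] so the noise lines up with the matmul output
# instead of broadcasting to [NUM_EXAMPLES, NUM_EXAMPLES]
noise = tf.random.normal([NUM_EXAMPLES, 1])

training_outputs = tf.matmul(training_inputs,[[2.7],[3.1],[5.4],[8.9]])+6.5+noise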

def prediction(indata, weight, bias):
  return tf.matmul(indata, weight) + bias

# Use mean squared error as the loss
def loss(weights, biases):
  error = prediction(training_inputs, weights, biases) - training_outputs
  return tf.reduce_mean(tf.square(error))

# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
  # Forward pass: compute the loss while recording operations on the tape for the gradient computation
  with tf.GradientTape() as tape:
    loss_value = loss(weights, biases)
  # Play the tape backwards to get the gradients
  return tape.gradient(loss_value, [weights, biases])

train_steps = 300
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tf.Variable([[0.],[0.],[0.],[0.]])
B = tf.Variable(0.)

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
  dW, dB = grad(W, B)
  W.assign_sub(dW * learning_rate) # W = W - dW * learning_rate 
  B.assign_sub(dB * learning_rate) # B = B - dB * learning_rate
  if i % 50 == 0:
    print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

Output:

Initial loss: 161.488
Loss at step 000: 155.372
Loss at step 050: 23.209
Loss at step 100: 4.175
Loss at step 150: 1.404
Loss at step 200: 0.996
Loss at step 250: 0.936
Final loss: 0.927
W = [[2.6918666]
[3.0815856]
[5.377633 ]
[8.876133 ]], B = 6.478857517242432
