sqlalchemy在遍历大量结果ORM对象时占用内存问题解决

程序员文章站 2022-04-29 20:21:23

...

python在内存管理上有一个特点，分配出去的内存，及时对象的引用计数为0，也不会立即释放内存，而是作为内存缓存，等待下次分配，到某个时机才会回收内存，因此在使用sqlalchemy的时候，如果查询结果包含大量结果（大于1000个），需要遍历每个ORM的时候，调用query().all()方法会导致内存激增（sqlalchemy会把所有对象放在内存中），下面是遍历290000+条记录的内存激增：

[I 160802 18:17:05 xxxx:134] c6833 Memory:   3.7%   662M/7870M
[I 160802 18:18:53 xxxx:140] c6833 after xxxx
[I 160802 18:18:53 xxxx:141] c6833 Memory:  29.7%  2716M/7870M

改用query().yield_per(1000)之后，内存分配就不会那么多了：

[I 160802 18:12:15 xxxx:134] b1213 Memory:   2.9%   600M/7870M
[I 160802 18:13:39 xxxx:140] b1213 after xxxx
[I 160802 18:13:39 xxxx:141] b1213 Memory:   9.4%  1112M/7870M

官方手册对于yield_per的描述如下：

sqlalchemy官方手册写道

The purpose of this method is when fetching very large result sets (> 10K rows), to batch results in sub-collections and yield them out partially, so that the Python interpreter doesn’t need to declare very large areas of memory which is both time consuming and leads to excessive memory use. The performance from fetching hundreds of thousands of rows can often double when a suitable yield-per setting (e.g. approximately 1000) is used, even with DBAPIs that buffer rows (which are most).

sqlalchemy官方手册：http://docs.sqlalchemy.org/en/latest/orm/query.html