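I'm new to AI, and since TensorFlow is the most popular framework, I jumped into TF as well. Recently I've been trying to build an image-comparison model, which runs into the problem of input image size. A Baidu search suggested that spatial pyramid pooling (SPP) handles varying image sizes well; see the paper Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. I read the paper, but it doesn't come with a code implementation. I planned to look on GitHub or elsewhere for an existing implementation, but to my surprise I couldn't find a TensorFlow implementation of spatial pyramid pooling. Of course, I may simply have missed it. I did find a Caffe-based one, but I'm not familiar with Caffe, so I had no choice but to write my own in TensorFlow using it as a reference. Enough preamble: once you've read the paper the principle of SPP should be clear, so I'll go straight to the TF code.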
import numpy as np
import tensorflow as tf

def spp_layer(self, input_, levels=(2, 1), name='SPP_layer'):
    '''Multiple Level SPP layer.

    Works for levels=[1, 2, 3, 6].
    '''
    # Static shape [batch, height, width, channels]; height and width must be known.
    shape = input_.get_shape().as_list()
    channels = shape[3]  # hard-coded as 64 in the original model
    with tf.variable_scope(name):
        pool_outputs = []
        for l in levels:
            # Window and stride for this level, cast to plain ints for tf.nn.max_pool.
            ksize = [1, int(np.ceil(shape[1] * 1. / l)), int(np.ceil(shape[2] * 1. / l)), 1]
            strides = [1, int(np.floor(shape[1] * 1. / l + 1)), int(np.floor(shape[2] * 1. / l + 1)), 1]
            pool = tf.nn.max_pool(input_, ksize=ksize, strides=strides, padding='SAME')
            print("Pool Level {:}: shape {:}".format(l, pool.get_shape().as_list()))
            h = int(pool.get_shape()[1])
            w = int(pool.get_shape()[2])
            # Flatten the h x w bins of this level into a single axis.
            pool = tf.reshape(pool, [-1, h * w, channels, 1])
            # pool_outputs.append(tf.reshape(pool, [tf.shape(pool)[1], -1]))
            pool_outputs.append(pool)
        # Concatenate the bins of all levels along axis 1.
        spp_pool = tf.concat(pool_outputs, 1)
    return spp_pool
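To sanity-check the window and stride arithmetic without building a graph, here is a small pure-Python sketch. The helpers `spp_grid_size` and `spp_total_bins` are hypothetical names, not part of the original code. They assume that `'SAME'` padding produces an output length of ceil(size / stride), and they reuse the same ceil/floor formulas as the layer above.

```python
import math


def spp_grid_size(size, level):
    """Return (ksize, stride, out) along one spatial axis for one pyramid level.

    Hypothetical helper, not part of the original post. Assumes 'SAME'
    padding, where the pooled length is ceil(size / stride).
    """
    ksize = math.ceil(size / level)        # window size, as in the layer
    stride = math.floor(size / level + 1)  # stride, as in the layer
    out = math.ceil(size / stride)         # 'SAME' output length
    return ksize, stride, out


def spp_total_bins(height, width, levels):
    """Total number of bins along axis 1 of the concatenated output."""
    total = 0
    for level in levels:
        out_h = spp_grid_size(height, level)[2]
        out_w = spp_grid_size(width, level)[2]
        total += out_h * out_w
    return total


if __name__ == "__main__":
    for level in (1, 2, 3, 6):
        print(level, spp_grid_size(13, level))
    print(spp_total_bins(13, 13, (2, 1)))
```

For a 13x13 feature map, this gives 1, 2 and 3 bins per side at levels 1, 2 and 3, so the default levels yield 4 + 1 = 5 bins in total. Under the same assumptions, level 6 gives 5 bins per side rather than 6, so it is worth checking the shapes printed by the layer for your own input size.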