
I’m relatively new to Python and even more so to TensorFlow, so I’ve been working through some tutorials such as this tutorial. A challenge given was to make an image greyscale. One approach taken here is to just take one colour channel value and duplicate it across all channels. Another is to take an average, which can be achieved using tf.reduce_mean as done here. However, there are many ways to make an image monochromatic, as anyone who has played with GIMP or Photoshop will know. One standard method adjusts for the way humans perceive colour and requires that the three colour channels be individually weighted like this:

Grey = (Red * 0.2126 + Green * 0.7152 + Blue * 0.0722)

Anyway I’ve achieved it by doing this:

import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

filename = "MarshOrchid.jpg"
raw_image_data = mpimg.imread(filename)

image = tf.placeholder("float", [None, None, 3])

# Slice out each colour channel: begin at [0, 0, channel], size [-1, -1, 1] keeps the full height and width
r = tf.slice(image, [0, 0, 0], [-1, -1, 1])
g = tf.slice(image, [0, 0, 1], [-1, -1, 1])
b = tf.slice(image, [0, 0, 2], [-1, -1, 1])

# Weight each channel for perceived luminance
r = tf.scalar_mul(0.2126, r)
g = tf.scalar_mul(0.7152, g)
b = tf.scalar_mul(0.0722, b)

grey = tf.add(r, tf.add(g, b))

# Duplicate the grey channel three times along axis 2 (tf.concat takes concat_dim first in r0.12)
out = tf.concat(2, [grey, grey, grey])
out = tf.cast(out, tf.uint8)


with tf.Session() as session:

    result = session.run(out, feed_dict={image: raw_image_data})

    plt.imshow(result)
    plt.show()

This seems hugely inelegant to me: having to cut up the data, apply the calculations, and then recombine the pieces. A matrix multiplication on the individual RGB tuples would be efficient or, barring that, a function that takes an individual RGB tuple and returns a greyscaled tuple. I’ve looked at tf.map_fn but can’t seem to make it work for this.

Any suggestions or improvements?

2 Answers


  1. Chosen as BEST ANSWER

    So, having really looked into this topic: in the current release of TensorFlow (r0.12) there doesn't appear to be a simple way to apply a custom function to tuples of values, especially if the result doesn't amount to a reduction. As with my initial effort and the answer from @xxi, you pretty much have to disaggregate the tuples before applying a function to them collectively.

    I figured out another way to get the result I wanted without slicing or unstacking, by instead using reshaping and matrix multiplication:

    import tensorflow as tf
    import numpy as np
    import matplotlib.image as mpimg
    import matplotlib.pyplot as plt
    
    filename = "MarshOrchid.jpg"
    raw_image_data = mpimg.imread(filename)
    
    image = tf.placeholder("float", [None, None, 3])
    
    # Flatten the image into one RGB tuple per row
    out = tf.reshape(image, [-1, 3])
    # Scale each channel by its luminance weight, then sum across the channels
    out = tf.matmul(out, [[0.2126, 0, 0], [0, 0.7152, 0], [0, 0, 0.0722]])
    out = tf.reduce_sum(out, 1, keep_dims=True)

    # Duplicate the grey value into three channels and restore the original shape
    out = tf.concat(1, [out, out, out])
    out = tf.reshape(out, tf.shape(image))
    out = tf.cast(out, tf.uint8)
    
    
    with tf.Session() as session:
    
        result = session.run(out, feed_dict={image: raw_image_data})
    
        plt.imshow(result)
        plt.show()
    

    This worked for the narrow purpose of greyscaling an image, but it doesn't really give a design pattern that applies to more general calculations.
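
    As a footnote to that: the diagonal matmul followed by reduce_sum is just a weighted sum of the three channels, so the same result should be achievable with a single matmul against a 3x1 weight column. Roughly this sketch (the same ops as above, just collapsed; I haven't benchmarked it):

    weights = tf.constant([[0.2126], [0.7152], [0.0722]])      # 3x1 luminance weights
    grey = tf.matmul(tf.reshape(image, [-1, 3]), weights)      # one weighted sum per pixel
    out = tf.concat(1, [grey, grey, grey])                      # three identical channels again
    out = tf.cast(tf.reshape(out, tf.shape(image)), tf.uint8)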

    Out of curiosity I profiled these three methods in terms of execution time and memory usage. So which was better?

    • Method 1 - Slicing: 1.6 seconds & 1.0 GiB memory usage
    • Method 2 - Unstacking: 1.6 seconds & 1.1 GiB memory usage
    • Method 3 - Reshape: 1.4 seconds & 1.2 GiB memory usage

    So no major differences in performance but interesting nonetheless.

    In case you were wondering why the process is so slow: the image used is 5528 x 3685 pixels. But yeah, still pretty slow compared to GIMP and the others.


  2. How about this?

    img = tf.ones([100, 100, 3])
    # split the image into its three channel planes along the channel axis
    r, g, b = tf.unstack(img, axis=2)
    grey = r * 0.2126 + g * 0.7152 + b * 0.0722
    # stack the grey plane back into three identical channels
    out = tf.stack([grey, grey, grey], axis=2)
    out = tf.cast(out, tf.uint8)
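
    To run that on an actual image rather than tf.ones, I think you can just swap img for the placeholder from the question and feed the image data in as before; something like:

    image = tf.placeholder("float", [None, None, 3])
    r, g, b = tf.unstack(image, axis=2)
    grey = r * 0.2126 + g * 0.7152 + b * 0.0722
    out = tf.cast(tf.stack([grey, grey, grey], axis=2), tf.uint8)

    with tf.Session() as session:
        # raw_image_data as loaded with mpimg.imread in the question
        result = session.run(out, feed_dict={image: raw_image_data})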
    

    Here is a sample of map_fn. The shape of x is (2, 4), so the shape of elms_fn is (4,) because map_fn maps over the first axis; if the shape of x were (100, 100, 3), the shape of elms_fn would be (100, 3).

    x = tf.constant([[1, 2, 3, 4],
                     [5, 6, 7, 8]], dtype=tf.float32)
    
    def avg_fc(elms_fn):
        # shape of elms_fn is (4,)
        # compute average for each row and return it
        avg = tf.reduce_mean(elms_fn)
        return avg
    
    # map_fn will stack avg at axis 0
    res = tf.map_fn(avg_fc, x)
    
    with tf.Session() as sess:
        a = sess.run(res) #[2.5, 6.5]
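
    And for the greyscale problem itself, one rough sketch following this pattern would be to map a per-row weighted sum over the image's first axis (untested, and probably no tidier than the other approaches):

    image = tf.placeholder("float", [None, None, 3])

    def grey_row(row):
        # row has shape (width, 3); weighted sum over the channel axis
        g = tf.reduce_sum(row * tf.constant([0.2126, 0.7152, 0.0722]), 1, keep_dims=True)
        return tf.concat(1, [g, g, g])

    out = tf.cast(tf.map_fn(grey_row, image), tf.uint8)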
    