Problems with using a rough greyscale algorithm?
So I'm designing a few programs for editing photos in python using PIL, and one of them was converting an image to greyscale (I'm avoiding the use of any functions from PIL).
The algorithm I've employed is simple: for each pixel (colour-depth is 24), I've calculated the average of the R, G and B values and set the RGB values to this average.
My program was producing greyscale images which seemed accurate, but I was wondering if I'd employed the correct algorithm, and I came across this answer to a question, where it seems that the 'correct' algorithm is to calculate 0.299 R + 0.587 G + 0.114 B.
I decided to compare my program to this algorithm. I generated a greyscale image using my program and another one (using the same input) from a website online (the top Google result for 'image to grayscale').
To my naked eye, it seemed that they were exactly the same, and if there was any variation, I couldn't see it. However, I decided to use this website (top Google result for 'compare two images online') to compare my greyscale images. It turned out that deep in the pixels, they had slight variations, but none which were perceivable to the human eye at a first glance (differences can be spotted, but usually only when the images are laid upon each other or switched between within milliseconds).
My Questions (the first is the main question):
My key piece of code (if needed):
def greyScale(pixelTuple):
    # Set R, G and B to the rounded mean of the three channels
    return tuple([round(sum(pixelTuple) / 3)] * 3)
The 'correct' algorithm (which seems to heavily weight green):
def greyScale(pixelTuple):
    # Weighted sum of R, G and B, copied into all three channels
    return tuple([round(0.299 * pixelTuple[0] + 0.587 * pixelTuple[1] + 0.114 * pixelTuple[2])] * 3)
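To see how far the two formulas can diverge on a single pixel, here is a small sketch that runs both on a few saturated colours. The names grey_average and grey_weighted are made up for this illustration; the weights are the ones quoted above.

```python
def grey_average(pixel):
    """Grey level as the plain mean of R, G and B."""
    return round(sum(pixel) / 3)


def grey_weighted(pixel):
    """Grey level using the 0.299 / 0.587 / 0.114 weights quoted above."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Print both grey levels for a few test colours
for name, pixel in [("red", (255, 0, 0)), ("green", (0, 255, 0)),
                    ("blue", (0, 0, 255)), ("mid grey", (128, 128, 128))]:
    print(name, grey_average(pixel), grey_weighted(pixel))
```

Pure primaries are where the two disagree most: the average maps all three to the same grey, while the weighted version ranks green brightest and blue darkest. Neutral greys come out the same either way, which may be why a mostly unsaturated photo looks nearly identical under both.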
My input image:
The greyscale image my algorithm produces:
The greyscale image which is 'correct':
When the greyscale images are compared online (highlighted red are the differences, using a fuzz of 10%):
Despite the variations in pixels highlighted above, the greyscale images above appear as nearly the exact same (at least, to me).
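A comparison of this kind could be sketched roughly as follows. How the comparison site defines its "fuzz" is not stated, so this assumes the simplest reading: two pixels count as different when their grey levels differ by more than the given fraction of the full 0..255 range. The function name count_differences and the sample lists are hypothetical.

```python
def count_differences(grey_a, grey_b, fuzz=0.10, max_value=255):
    """Count positions where two grey levels differ by more than fuzz * max_value."""
    if len(grey_a) != len(grey_b):
        raise ValueError("images must have the same number of pixels")
    threshold = fuzz * max_value
    return sum(1 for a, b in zip(grey_a, grey_b) if abs(a - b) > threshold)


# Grey levels of four example pixels under two different conversions
averaged = [85, 85, 85, 128]
weighted = [76, 150, 29, 128]
print(count_differences(averaged, weighted))
```

With a tolerance like this, small per-pixel differences are ignored and only strongly coloured regions get flagged.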
Also, regarding my first question, if anyone's interested, this site has done some analysis on different algorithms for conversions to greyscale and also has some custom algorithms.
EDIT:
In response to @Szulat's answer, my algorithm actually produces this image instead (ignore the bad cropping, the original image had three circles but I only needed the first one):
In case people are wondering what the reason for converting to greyscale is (as it seems that the algorithm depends on the purpose), I'm just making some simple photo editing tools in python so that I can have a mini-Photoshop and don't need to rely on the Internet to apply filters and effects.
Reason for Bounty: Different answers here are covering different things, which are all relevant and helpful. This makes it quite difficult to choose which answer to accept. I've started a bounty because I like a few answers listed here, but also because it'd be nice to have a single answer which covers everything I need for this question.
Also check out that the way to grayscale has heavy influence on the aesthetic: photo.stackexchange.com/questions/86599/…
– Framester
Aug 13 at 9:00
To get the correct formula also check out: stackoverflow.com/questions/596216/…
– Framester
Aug 13 at 9:01
More detail is preserved in the bird's feathers in, as you say, the 'correct' algorithm
– Kaspars
Aug 13 at 11:35
"but none which were perceivable to the human eye": I put the two images in two Firefox tabs and used (Shift+)Ctrl+Tab to switch between them. To my eyes, the difference is very large; in fact, it is impossible not to see it. But I do agree that none of the options is 'obviously' better than the other, and -- of course -- the adjective 'better' is highly subjective and/or dependent on your particular application.
– Andreas Rejbrand
Aug 13 at 11:44
7 Answers
The images look pretty similar, but your eye can tell the difference, especially if you put one in place of the other:
For example, you can note that the flowers in the background look brighter in the averaging conversion.
It is not that there is anything intrinsically "bad" about averaging the three channels. The reason for that formula is that we do not perceive red, green and blue equally, so their contributions to the intensities in a grayscale image shouldn't be the same; since we perceive green more intensely, green pixels should look brighter in grayscale. However, as commented by Mark, there is no unique perfect conversion to grayscale: we see in color, and everyone's vision is slightly different, so any formula is only an approximation that makes pixel intensities feel "right" for most people.
I haven't awarded the bounty to you as I feel that the number of upvotes on your answer is enough :) I've accepted however, as I truly feel that your answer's quite nice, and the .gif only has a positive effect :) Thanks!
– Adi219
Aug 22 at 12:26
@Adi219 No problem, the number of upvotes is indeed way more than I was expecting. Never underestimate bird gifs I guess. Thank you for accepting the answer.
– jdehesa
Aug 22 at 12:30
Indeed, I quite liked the bird .gif (and I feel loads of others did too :) ). I was also quite surprised by the amount of upvotes my question received as well as the number of views :) No problem! I also respect this answer as you've referred to @MarkSetchell's answer which I feel is quite nice, especially since his answer was the first and I was initially going to accept his answer back on the day on which I posted this question. I appreciate this! :)
– Adi219
Aug 22 at 12:36
The most obvious example:
Original
Desaturated in Gimp (Lightness mode - this is what your algorithm does)
Desaturated in Gimp (Luminosity mode - this is what our eyes do)
So, don't average RGB. Averaging RGB is simply wrong!
(Okay, you're right, averaging might be valid in some obscure applications, even though it has no physical or physiological meaning when RGB values are treated as color. By the way, the "regular" way of doing weighted averaging is also incorrect in a more subtle way because of gamma. sRGB should first be linearized, and the final result then converted back to sRGB (which would be roughly equivalent to retrieving the L component in the Lab color space).)
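The linearize, weight, re-encode sequence mentioned in the parenthesis could look roughly like this. It is only a sketch under stated assumptions: the input is taken to be 8-bit sRGB, the transfer function is the standard piecewise sRGB curve, and the weights are the Rec.709/sRGB luminance coefficients rather than the 0.299/0.587/0.114 set from the question. All function names are invented for the example.

```python
def srgb_to_linear(c8):
    """Decode one 8-bit sRGB channel value to linear light in [0, 1]."""
    c = c8 / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Encode linear light in [0, 1] back to an 8-bit sRGB value."""
    c = 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055
    return round(255 * c)


def grey_linear(pixel):
    """Weighted grey computed in linear light, then re-encoded as sRGB."""
    r, g, b = (srgb_to_linear(c) for c in pixel)
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec.709 / sRGB weights (assumed)
    return linear_to_srgb(y)


def grey_gamma(pixel):
    """The same weights applied directly to the gamma-encoded values."""
    r, g, b = pixel
    return round(0.2126 * r + 0.7152 * g + 0.0722 * b)


# Compare the two approaches on saturated red and blue
for pixel in [(255, 0, 0), (0, 0, 255)]:
    print(pixel, grey_gamma(pixel), grey_linear(pixel))
```

For saturated red and blue the linear-light version comes out noticeably brighter than the gamma-domain one, while neutral greys are unchanged by either.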
+1 for the very illustrative image. -1 for "averaging RGB is simply wrong", since it entirely depends on the current application.
– Andreas Rejbrand
Aug 13 at 16:42
en.wikipedia.org/wiki/File:7bit-each.svg another good demonstration image of eye sensitivity to rgb
– qwr
Aug 14 at 4:36
@AndreasRejbrand Despite the number of upvotes this answer has, no upvotes from me, as the illustrative image isn't what my algorithm produces (see my edit to my question) , despite this answer's claims that my algorithm has the same effect as Desaturated in Gimp in Lightness mode.
– Adi219
Aug 14 at 7:54
Despite the number of upvotes this answer has received, I'm not awarding this answer anything due to the technical inaccuracies present, even after they were pointed out days before.
– Adi219
Aug 22 at 12:27
yeah, the "crowd wisdom" is sometimes disappointing... of course the picture is incorrect and i still don't have time to update it :-( (although i believe it does not change the conclusion)
– szulat
Aug 22 at 13:13
You can use any conversion equation, scale, linearity. The one you found:
I = 0.299 R + 0.587 G + 0.114 B
is based on the "average" human eye's sensitivity to the primary colors (R, G, B), at least for the time period and population/HW it was created on; bear in mind those standards were created before LED, TFT, etc. screens.
There are several problems you are fighting against:
our eyes are not the same
Not all humans perceive color the same way. There are major discrepancies between genders and smaller ones between regions; even generation and age play a role. So even an average should be handled as "average".
We have different sensitivity to the intensity of light across the visible spectrum. The most sensitive color is green (hence the highest weight on it). But the XYZ curve peaks can be at different wavelengths for different people (mine are shifted a bit, which changes how I recognize certain wavelengths, like some shades of aqua: some people see them as green and some as blue, even if none of them has any color blindness).
monitors do not use the same wavelengths nor spectral dispersion
So if you take 2 different monitors, they might use slightly different wavelengths for R, G, B or even different widths of the spectral filter (just use a spectroscope and see). Yes they should be "normalized" by the HW but that is not the same as using normalized wavelengths. It is similar to problems using RGB vs. White Noise spectrum light sources.
monitor linearity
Humans do not see on a linear scale: our response is usually logarithmic/exponential (depending on how you look at it). We can normalize that with HW (or even SW), but the problem is that if we linearize for one human, we damage it for another.
If you take all this together you can either use averages ... or special (and expensive) equipment to measure/normalize against some standard or against a calibrated person (depends on the industry).
But that is too much to handle in home conditions so leave all that for industry and use the weights for "average" like most of the world... Luckily our brain can handle it as you cannot see the difference unless you start comparing both images side by side or in an animation :). So I (would) do:
I = 0.299 R + 0.587 G + 0.114 B
R = I
G = I
B = I
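Applied to a whole image, the four assignments above amount to a loop over pixels. The sketch below works on a plain list of (R, G, B) tuples so that it does not depend on any imaging library; the function name to_grey_pixels and the sample data are made up for the illustration.

```python
def to_grey_pixels(pixels):
    """Map each (R, G, B) tuple to (I, I, I) with I = 0.299 R + 0.587 G + 0.114 B."""
    result = []
    for r, g, b in pixels:
        i = round(0.299 * r + 0.587 * g + 0.114 * b)
        i = max(0, min(255, i))  # guard against rounding outside 0..255
        result.append((i, i, i))
    return result


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]
print(to_grey_pixels(pixels))
```

Because the three weights sum to 1, white stays white and black stays black; only the relative brightness of coloured pixels changes.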
I've awarded this answer the bounty as I feel that it deserves more upvotes and the answer itself is actually quite detailed, so it's quite nice :)
– Adi219
Aug 22 at 12:28
@Adi219 thx ... the lack of votes is most likely due to the absence of images, as I did not want to add redundant images other answers already got nor copy the images from linked QAs ... PS there are also applications where I = R+G+B is needed, like this convert RGB pixel to wavelength, but those are usually only for special reasons/tasks/HW ...
– Spektre
Aug 22 at 15:51
I thought it was just due to the day you answered, but maybe you're right. But yeah, I understand your decision. Thanks!
– Adi219
Aug 23 at 8:53
There are many different methods for converting to greyscale, and they do give different results though the differences might be easier to see with different input colour images.
As we don't really see in greyscale, the "best" method is somewhat dependent on the application and somewhat in the eye of the beholder.
The alternative formula you refer to is based on the human eye being more sensitive to variations in green tones and therefore giving them a bigger weighting - similarly to a Bayer array in a camera where there are 2 green pixels for each red and blue one. Wiki - Bayer array
I know you were the first to answer, and I quite liked your answer, but simply put, the other answers just have more detail. Sorry :(
– Adi219
Aug 22 at 12:24
That's cool - you are at liberty to choose whichever answer you prefer - no complaints from me! Like other responders, I was just trying to help. Good luck with your project!
– Mark Setchell
Aug 22 at 12:27
Thanks for understanding :) Also, thanks for the answer! Nearly all the answers here have helped me tremendously; if all goes well, I'll put all of my photo-editing programs on Github for others to use. Thanks! :)
– Adi219
Aug 22 at 12:33
There are many formulas for the Luminance, depending on the R,G,B color primaries:
Rec.601/NTSC: Y = 0.299*R + 0.587*G + 0.114*B ,
Rec.709/EBU: Y = 0.213*R + 0.715*G + 0.072*B ,
Rec.2020/UHD: Y = 0.263*R + 0.678*G + 0.059*B .
This is all because our eyes are less sensitive to blue than to red than to green.
That being said, you are probably calculating Luma, not Luminance, so the formulas are all wrong anyway. For Constant-Luminance you must convert to linear-light
R = R' ^ 2.4 , G = G' ^ 2.4 , B = B' ^ 2.4 ,
apply the Luminance formula, and convert back to the gamma domain
Y' = Y ^ (1/2.4) .
Also, consider that converting a 3D color space to a 1D quantity loses 2/3 of the information, which can bite you in the next processing steps. Depending on the problem, sometimes a different formula is better, like V = MAX(R,G,B) (from HSV color space).
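As a rough illustration of the steps in this answer, the sketch below keeps the three weight sets in a table, linearizes with the pure 2.4 power law given above, and re-applies the gamma afterwards. The names LUMA_WEIGHTS, grey_constant_luminance and grey_value are invented here, and 8-bit input is assumed.

```python
# Weight sets as listed in the answer, keyed by hypothetical short names
LUMA_WEIGHTS = {
    "rec601": (0.299, 0.587, 0.114),
    "rec709": (0.213, 0.715, 0.072),
    "rec2020": (0.263, 0.678, 0.059),
}


def grey_constant_luminance(pixel, standard="rec709", gamma=2.4):
    """Linearize with a pure power law, weight, then re-apply the gamma."""
    wr, wg, wb = LUMA_WEIGHTS[standard]
    r, g, b = ((c / 255) ** gamma for c in pixel)
    y = wr * r + wg * g + wb * b
    return round(255 * y ** (1 / gamma))


def grey_value(pixel):
    """V = MAX(R, G, B), as in the HSV colour space."""
    return max(pixel)


# Saturated blue under each standard, plus the HSV value for comparison
for standard in LUMA_WEIGHTS:
    print(standard, grey_constant_luminance((0, 0, 255), standard))
print("max", grey_value((0, 0, 255)))
```

The choice of standard matters most for saturated colours: blue, for instance, gets a smaller weight under Rec.2020 than under Rec.601 and so ends up darker.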
How do I know? I'm a follower and friend of Dr. Poynton.
I'm sorry, but who is Dr. Poynton???
– Adi219
Aug 22 at 6:53
Sorry, but this answer makes little sense to me as a non-expert in colour theory.
– Adi219
Aug 22 at 12:31
Dr. Poynton is Charles Poynton, recently promoted to PhD. He has written several authoritative books in the field of video processing. Recommended reading.
– StessenJ
yesterday
One of the things learned in video processing is that color space conversion must be done on linear-light signals. The compression done by the OETF, e.g. the gamma function, must first be undone, as in my example. If not, then the greyscale values for red and blue will be too low, too dark. This shows the "Constant Luminance Error" of (Y',Cb,Cr) signals, i.e. Cb,Cr carry some of the Luminance too, for red and blue.
– StessenJ
yesterday
The answers provided are enough, but I want to discuss a bit more on this topic in a different manner.
Since I learnt digital painting out of interest, I use HSV more often.
Using HSV gives much more control during painting but, to keep it short, the main point is the S: saturation separates the concept of color from the light. Turning S to 0 already gives the 'computer' grey scale of the image.
from PIL import Image
import colorsys

def togrey(img):
    # Return a greyscale copy of an RGB image by setting HSV saturation to 0
    if isinstance(img, Image.Image):
        r, g, b = img.split()
        R = []
        G = []
        B = []
        for rd, gn, bl in zip(r.getdata(), g.getdata(), b.getdata()):
            h, s, v = colorsys.rgb_to_hsv(rd/255., gn/255., bl/255.)
            s = 0
            _r, _g, _b = colorsys.hsv_to_rgb(h, s, v)
            R.append(int(_r*255.))
            G.append(int(_g*255.))
            B.append(int(_b*255.))
        r.putdata(R)
        g.putdata(G)
        b.putdata(B)
        return Image.merge('RGB', (r, g, b))
    else:
        return None

a = Image.open('../a.jpg')
b = togrey(a)
b.save('../b.jpg')
This method truly preserves the brightness of the original color. However, it does not consider how the human eye processes the data.
I think HSV has nothing to do with considering the human eye. If you look at the colorsys conversions you can see that rgb->hsv sets v to max(r, g, b), and converting hsv->rgb returns (v, v, v) when s == 0. So, there's no magic -- just a different solution. Here's the image grey = max(r, g, b) produces.
– Alistair Carscadden
Aug 16 at 9:59
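The claim in this comment is easy to check with the standard-library colorsys module. The sketch below zeroes the saturation in the same way as the answer's togrey(), but on single pixels and with round() instead of int() to avoid truncation; grey_via_hsv is a name made up for the check.

```python
import colorsys


def grey_via_hsv(pixel):
    """Zero the HSV saturation and convert back to an 8-bit RGB tuple."""
    r, g, b = (c / 255 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    r2, g2, b2 = colorsys.hsv_to_rgb(h, 0, v)
    return (round(r2 * 255), round(g2 * 255), round(b2 * 255))


# Every result should equal max(r, g, b) repeated three times
for pixel in [(200, 50, 120), (10, 240, 30), (0, 0, 255), (77, 77, 77)]:
    assert grey_via_hsv(pixel) == (max(pixel),) * 3
```

So for this purpose the HSV round trip can be replaced by a direct max(r, g, b) per pixel.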
More off topic, I really like the grey = max(r, g, b) image. Great contrast, bright bird.
– Alistair Carscadden
Aug 16 at 10:00
@AlistairCarscadden, yes, as I stated at the end, this method does not consider how the human eye processes the data
– MatrixTai
Aug 16 at 10:03
And I commented because I don't agree, I think the method does consider how we see light. Something bright blue is bright, something bright green is bright, and something bright red is bright. So, max(r, g, b) considers that entirely.
– Alistair Carscadden
Aug 16 at 10:08
@AlistairCarscadden, not really. Consider a photo taken with a long exposure: you can recognize the color but lose most of the detail. In this case, the picture made with the HSV method will definitely be worse. So, in fact, that's actually a contrast map.
– MatrixTai
Aug 16 at 12:25
In answer to your main question, there are disadvantages in using any single measure of grey. It depends on what you want from your image. For example, if you have colored text on white background, if you want to make the text stand out you can use the minimum of the r, g, b values as your measure. But if you have black text on a colored background, you can use the maximum of the values for the same result. In my software I offer the option of max, min or median value for the user to choose. The results on continuous tone images are also illuminating.
In response to comments asking for more details, the code for a pixel is below (without any defensive measures).
int Ind0[3] = {0, 1, 2}; //all equal
int Ind1[3] = {2, 1, 0}; // top, mid ,bot from mask...
int Ind2[3] = {1, 0, 2};
int Ind3[3] = {1, 2, 0};
int Ind4[3] = {0, 2, 1};
int Ind5[3] = {2, 0, 1};
int Ind6[3] = {0, 1, 2};
int Ind7[3] = {-1, -1, -1}; // not possible
int *Inds[8] = {Ind0, Ind1, Ind2, Ind3, Ind4, Ind5, Ind6, Ind7};
void grecolor(unsigned char *rgb, int bri, unsigned char *grey)
//pick out bot, mid or top according to bri flag
int r = rgb[0];
int g = rgb[1];
int b = rgb[2];
int mask = 0;
mask
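The listing above is cut off, so the following is not a reconstruction of it. It is a separate minimal Python sketch of the idea described in this answer, letting the user pick the minimum, middle or maximum channel value as the grey level. The function name grey_by_rank and the mode strings are hypothetical.

```python
def grey_by_rank(pixel, mode="mid"):
    """Pick the lowest, middle or highest of the three channel values."""
    ordered = sorted(pixel)
    index = {"min": 0, "mid": 1, "max": 2}[mode]
    level = ordered[index]
    return (level, level, level)


# Coloured text on a white background: "min" darkens the text the most
print(grey_by_rank((255, 40, 40), "min"), grey_by_rank((255, 255, 255), "min"))
```

Sorting three values is slower than a mask-and-lookup approach in C would be, but it keeps the intent obvious.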
This is just an attempt at indirectly self-promoting your own software.
– Adi219
Aug 22 at 12:29
How can that be, if I have not identified the software? Actually I am pointing out that the discussion so far has focussed on correctness, but often it is user choice that is important.
– Steve J
Aug 23 at 13:08
You literally mention two use cases which use different methods, then state that your software allows you to choose which method you want. That's quite clearly indirect self-promotion, as anybody who wants to find out more is essentially going to be asking for your software, as there are no other significant points in your answer.
– Adi219
Aug 23 at 13:16
Sorry, I thought I had given enough information for people to implement the procedure. I have edited my post to include the code.
– Steve J
2 days ago
You should notice that the differences occur where the input image is very green, because the "correct" formula is weighted towards green.
– Mark Setchell
Aug 13 at 8:57