The problem of choosing colors for data visualization is expressed by this quote from information visualization guru Edward Tufte:
"Avoiding catastrophe becomes the first principle in bringing color to information: Above all, do no harm." — Envisioning Information, Edward Tufte, Graphics Press, 1990.
Tufte calls the most important use of color in information presentation "labeling". By this he means the function of distinguishing one element from another.
According to the Dataspora Blog, color is one of the most abused and neglected tools in the field of data visualization. It is abused when we make poor color choices; it is neglected when we rely on poor built-in color defaults. Yet despite its traditionally poor treatment by engineers and end users alike, color used well can enhance and clarify a presentation, while color used poorly is likely to obscure information rather than highlight it. Although there is a strong aesthetic component to color, using it well in information display is essentially about function: what information you are trying to convey, and how or whether color can enhance it.
Take a look at the example below, which uses Doctor Who villains data. At the top of the dashboard, the information (villain and total number of episodes) is displayed in a black-and-white bar chart. This chart shows the two variables clearly, and adding color would only confuse the reader: consciously or not, when people see visual differences in a data display, they try to work out what those differences mean, and here color would add no meaning or value. But what if we wanted to compare this information against another measure? This is where color comes in handy.
Why use color in data graphics?
If the data is simple, a single color is sufficient, even preferable, as you see from the first chart. However, if we want to layer another dimension of data (the total number of stories) into our chart, we can do this with color. The chart at the bottom of the dashboard shows the same data with this added dimension. With a splash of color, it is much easier to see not only the number of episodes each villain appeared in, but also which villains appeared in the most and the fewest storylines.
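As a rough illustration of layering a second measure onto a bar chart through color, the sketch below maps each bar's story count onto a light-to-dark ramp by linear interpolation. The villain names, the numbers, the two endpoint shades, and the helper names (`lerp_color`, `to_hex`, `sequential_colors`) are all assumptions made up for this example; they are not taken from the dashboard described here.

```python
def lerp_color(start, end, t):
    """Linearly interpolate between two RGB tuples; t is in [0, 1]."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start, end))


def to_hex(rgb):
    """Format an (r, g, b) tuple of 0-255 integers as a hex color string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def sequential_colors(values, light=(222, 235, 247), dark=(8, 48, 107)):
    """Map each value to a shade between `light` (lowest) and `dark` (highest)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    colors = []
    for v in values:
        # If every value is equal there is nothing to distinguish,
        # so fall back to the lightest shade.
        t = (v - lo) / span if span else 0.0
        colors.append(to_hex(lerp_color(light, dark, t)))
    return colors


# Hypothetical numbers, for illustration only (not real episode data).
villains = ["Villain A", "Villain B", "Villain C"]
episodes = [30, 12, 5]  # first measure: encoded as bar length
stories = [10, 4, 1]    # second measure: encoded as bar color

bar_colors = sequential_colors(stories)
for name, n_episodes, color in zip(villains, episodes, bar_colors):
    print(f"{name}: {n_episodes} episodes, fill {color}")
```

The resulting list of hex strings could then be handed to whatever charting tool draws the bars. A single-hue ramp is used here because the second measure is an ordered quantity, so darker can simply read as "more".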
So why bother with color?
First, as compared to most print media, computer displays have fewer units of space, but a broader color gamut. So color is a compensatory strength.
For multi-dimensional data, color can illustrate additional dimensions inside a unit of space, and can do so immediately. Color differences can be detected within 200 ms, before you are even conscious of paying attention. But the most important reason to use color in multivariate graphics is that color is, in itself, multidimensional: it can vary in hue, saturation, and lightness.
We could have used methods other than color, such as plotting symbols or small multiples, but to avoid changing the chart type we found color to be the most suitable. This shows that color can be used in powerful ways to enhance the meaning and clarity of data displays, but only when we understand how it works and what it does well. Our advice: whenever you’re tempted to add color to a data display, ask yourself these questions: “Will this color serve a purpose?” and “Will it serve this purpose effectively?” If the answer to both is yes, by all means go ahead and use it. As Tufte says, "If the information is worth displaying, it’s worth displaying well."