Purpose

Exploration

Testing hypotheses is a critical part of the scientific process.

But…you need to understand your data first.

  • What relationships exist, how are the variables distributed, and so on?

Visualizing your data goes a long way toward helping you explore it.

Preattentive Processing

We pick up on visual cues (i.e., what catches our eye) much more quickly than we read.

  • That is why entire disciplines are devoted to visual processing.

Some things hit us pretty quickly:

  • color - hue & intensity
  • shape
  • space/density
  • line width & orientation
  • size

Preattentive Processing In Action

Correlation Matrix

        mpg    cyl   disp     hp   drat     wt   qsec     vs
mpg    1.00  -0.85  -0.85  -0.78   0.68  -0.87   0.42   0.66
cyl   -0.85   1.00   0.90   0.83  -0.70   0.78  -0.59  -0.81
disp  -0.85   0.90   1.00   0.79  -0.71   0.89  -0.43  -0.71
hp    -0.78   0.83   0.79   1.00  -0.45   0.66  -0.71  -0.72
drat   0.68  -0.70  -0.71  -0.45   1.00  -0.71   0.09   0.44
wt    -0.87   0.78   0.89   0.66  -0.71   1.00  -0.17  -0.55
qsec   0.42  -0.59  -0.43  -0.71   0.09  -0.17   1.00   0.74
vs     0.66  -0.81  -0.71  -0.72   0.44  -0.55   0.74   1.00

Corrplot

Heat Map
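
As a rough illustration of why a heat map reads faster than a table of numbers, the Python sketch below turns a four-variable subset of the correlation matrix into shaded text cells. The `shade` and `text_heatmap` helpers, the shading characters, and the use of `+`/`-` for sign are assumptions made for this sketch; they are not how the original figure was produced.

```python
# Subset of the correlation matrix shown above (values copied from the table).
LABELS = ["mpg", "cyl", "disp", "wt"]
CORR = [
    [1.00, -0.85, -0.85, -0.87],
    [-0.85, 1.00, 0.90, 0.78],
    [-0.85, 0.90, 1.00, 0.89],
    [-0.87, 0.78, 0.89, 1.00],
]

# Light-to-dark characters: darker means a stronger relationship (assumed scale).
SHADES = " .:*#"


def shade(r):
    """Return a two-character cell: the sign of r plus a shade for |r|."""
    level = min(int(abs(r) * len(SHADES)), len(SHADES) - 1)
    sign = "+" if r >= 0 else "-"
    return sign + SHADES[level]


def text_heatmap(labels, matrix):
    """Build a plain-text heat map, one string per row (header first)."""
    header = " " * 5 + " ".join(f"{name:>4}" for name in labels)
    rows = [header]
    for name, values in zip(labels, matrix):
        cells = " ".join(f"{shade(v):>4}" for v in values)
        rows.append(f"{name:<5}" + cells)
    return rows


if __name__ == "__main__":
    print("\n".join(text_heatmap(LABELS, CORR)))
```

Even in this crude form, the strong cells stand out before any number is read, which is the preattentive effect at work.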

Inference

Currently, most journals "need" to see test statistics to know if there is a relationship.

  • The golden p-value is the driver of science!

Not everything needs a test statistic to tell you that a real relationship exists (or not).

Interocular Traumatic Impact

Maybe A Better Fit

Principles And Properties

What Was And What Should Never Be

Color

Color is generally used for the following:

  • Groups

  • Intensity

Only use color when it is actually doing something to aid interpretation.
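
The two uses of color can be sketched in a few lines of Python: one distinct color per group, and one continuous ramp for intensity. The palette values, the white-to-dark-blue ramp, and the function names `group_colors` and `intensity_color` are illustrative assumptions, not taken from any particular plotting library.

```python
# Illustrative qualitative palette for groups (hex values chosen for this sketch).
GROUP_PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]


def group_colors(groups):
    """Assign one distinct color per group, in order of first appearance."""
    mapping = {}
    for g in groups:
        if g not in mapping:
            if len(mapping) >= len(GROUP_PALETTE):
                raise ValueError("more groups than colors in the palette")
            mapping[g] = GROUP_PALETTE[len(mapping)]
    return mapping


def intensity_color(value, low, high):
    """Interpolate from white (low) to dark blue (high) for a numeric value."""
    if high == low:
        raise ValueError("low and high must differ")
    t = min(max((value - low) / (high - low), 0.0), 1.0)  # clamp to [0, 1]
    start, end = (255, 255, 255), (8, 48, 107)
    rgb = [round(s + t * (e - s)) for s, e in zip(start, end)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    print(group_colors([4, 6, 8, 4]))      # groups: one hue each
    print(intensity_color(5.0, 0.0, 10.0))  # intensity: one hue, varying depth
```

Groups get unrelated hues because they have no order; intensity stays within one hue because only the magnitude changes.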

Color - Groups

Color - Intensity

Color - Brewer Palettes

Color - Tol Palettes

Color - Few Palettes

Size

Much like color, size can convey very distinct ideas:

  • Value size

  • Frequency

You need to be very explicit about which one you mean!
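
To make the distinction concrete, here is a small Python sketch of both encodings. Scaling the marker area (rather than the radius) with the value is a common convention and an assumption here, not something the slides specify; `radius_for_value` and `frequency_sizes` are hypothetical helper names.

```python
import math
from collections import Counter


def radius_for_value(value, max_value, max_radius=10.0):
    """Radius such that marker AREA is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    return max_radius * math.sqrt(value / max_value)


def frequency_sizes(points):
    """Count how often each (x, y) pair occurs, for sizing by frequency."""
    return Counter(points)


if __name__ == "__main__":
    # Size as value: a quarter of the maximum value gets half the radius.
    print(radius_for_value(25, 100))
    # Size as frequency: repeated points get a larger count.
    print(frequency_sizes([(1, 2), (1, 2), (3, 4)]))
```

Whichever encoding is used, the legend should say whether a bigger marker means a bigger value or more observations.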

Size

Shape

Shape is a bit more limited than color and size.

  • It really only helps to denote group membership.

  • And it only works well when the plot is sparse.

Shape - Linetypes

Where We Have Trouble…

Shape - Points

Alpha

  • No, we are not talking about Cronbach.
  • We probably all know about the RGB color model.
  • But…did you know that it can have a fourth channel?
  • Alpha!
  • Alpha is essentially transparency.
    • 0 is completely transparent; 1 is completely opaque
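
A minimal Python sketch of what the alpha channel does when a color is drawn over a background, assuming the usual 0-to-1 alpha scale and simple "over" compositing onto an opaque background. The `blend` function is a hypothetical helper for illustration.

```python
def blend(foreground, background, alpha):
    """Composite an RGB foreground over an opaque RGB background.

    alpha = 0 leaves the background untouched (fully transparent foreground);
    alpha = 1 replaces it entirely (fully opaque foreground).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(
        round(alpha * f + (1.0 - alpha) * b)
        for f, b in zip(foreground, background)
    )


if __name__ == "__main__":
    red, white = (255, 0, 0), (255, 255, 255)
    for a in (0.0, 0.2, 1.0):
        print(a, blend(red, white, a))
```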

How Does Alpha Help

We are frequently fortunate enough to have a considerable amount of data.

Depending on what type of plotting we want to do, we could run into problems very quickly.
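
One way to see why alpha helps with overplotting: when several semi-transparent points land on the same spot, their opacity accumulates, so dense regions look darker than sparse ones. The sketch below assumes every point has the same alpha and is composited with the standard "over" rule; `stacked_opacity` is a hypothetical name.

```python
def stacked_opacity(alpha, n_points):
    """Opacity seen where n_points markers with the same alpha overlap."""
    if not 0.0 <= alpha <= 1.0 or n_points < 0:
        raise ValueError("alpha must be in [0, 1] and n_points non-negative")
    return 1.0 - (1.0 - alpha) ** n_points


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, round(stacked_opacity(0.1, n), 3))
```

With fully opaque points, one observation and one hundred observations look identical; with a low alpha, the difference becomes visible as density.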

Alpha

Jitter

If you have ever held a jumping bean or drunk too much coffee, you might already know about jitter.

Jitter is useful if we have some minor overplotting issues.

  • It would not be a good fix for what we just saw.
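
A minimal Python sketch of jitter, assuming small uniform noise is added to each value so that identical points no longer sit exactly on top of one another. The `jitter` function, its `width` default, and the sample values are illustrative assumptions.

```python
import random


def jitter(values, width=0.2, seed=None):
    """Add uniform noise in [-width, +width] to each value."""
    rng = random.Random(seed)  # a seed makes the jitter reproducible
    return [v + rng.uniform(-width, width) for v in values]


if __name__ == "__main__":
    cylinders = [4, 4, 4, 6, 6, 8]  # made-up, heavily tied values
    print(jitter(cylinders, width=0.2, seed=1))
```

The width should stay small relative to the spacing between real values; otherwise the noise starts to misrepresent the data.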

Looks Pretty Clear…

After Jittering

An Alternative

Other Ideas

Small Multiples/Faceting

We often want to compare a plot over several different levels.

We have already seen ways to use color and other items to make groups distinct.

If you want to be daring, though, there are other methods.
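
The idea behind small multiples can be sketched without any plotting library: split the data by the levels of one variable, then lay the resulting panels out on a grid so each can be drawn with the same axes. The `facet` and `grid_shape` helpers and the sample rows are made up for illustration.

```python
import math
from collections import defaultdict


def facet(records, key):
    """Split records into one panel per level of `key`, preserving order."""
    panels = defaultdict(list)
    for record in records:
        panels[record[key]].append(record)
    return dict(panels)


def grid_shape(n_panels, n_cols=3):
    """Rows and columns needed to lay out n_panels in a grid."""
    n_cols = max(1, min(n_cols, n_panels))
    return math.ceil(n_panels / n_cols), n_cols


if __name__ == "__main__":
    # Made-up rows; only the structure matters here.
    cars = [
        {"cyl": 4, "mpg": 30.0},
        {"cyl": 6, "mpg": 20.0},
        {"cyl": 4, "mpg": 28.0},
        {"cyl": 8, "mpg": 15.0},
    ]
    panels = facet(cars, "cyl")
    print(sorted(panels), grid_shape(len(panels)))
```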

Small Multiples/Faceting

Multiple Relationships – Very Powerful

Combining Features

We just saw a glimpse of how we can use multiple principles to convey information.

  • We had a two-variable facet, size, and density estimates.

Combining Features

Other Combinations

Tips (And Maybe Tricks)

Grid Lines

The default is always some version of "No".

When To Use Grid Lines

Use for:

  • Large plots
  • Making differences clearer
  • Comparing values on categorical scales
  • Narrowing focus

When you do use them, they should be very faint.
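
As a sketch of what "very faint" can mean in practice, the Python function below emits SVG `<line>` elements for horizontal grid lines with a low stroke opacity. The function name, the 0.15 default opacity, and the even spacing are assumptions for illustration; any plotting tool with a grid-line transparency or color setting can achieve the same effect.

```python
def svg_grid_lines(width, height, n_lines, opacity=0.15):
    """Return SVG <line> elements for faint, evenly spaced horizontal grid lines."""
    if n_lines < 1:
        return []
    step = height / (n_lines + 1)
    lines = []
    for i in range(1, n_lines + 1):
        y = round(i * step, 2)
        lines.append(
            f'<line x1="0" y1="{y}" x2="{width}" y2="{y}" '
            f'stroke="#000000" stroke-opacity="{opacity}" stroke-width="1"/>'
        )
    return lines


if __name__ == "__main__":
    print("\n".join(svg_grid_lines(400, 300, n_lines=2)))
```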

Large Plots

Make Differences Clearer

Comparing Categorical Scales

Narrowing Focus

Interactivity

Given the confines of publishing, we cannot always default to interactive visualization.

  • There are journals and outlets making progress on this front!

  • Journal of Computational and Graphical Statistics, anyone?

However, it is a great place to start when exploring data.

  • It is also great for disseminating information on the web.

Making Old Trouble Disappear

Grid lines…who needs them?

We All Remember This Monster

The Clutter Is Strong With This One

Still A Better Version

Important People

Mike Bostock – D3

Cynthia Brewer – Color Brewer

William Cleveland – Pioneer (Citation Champ)

Stephen Few – All Things Viz

Michael Friendly – Psychologist and Viz Expert

Paul Tol – Physicist and Viz Expert

Edward Tufte – Information Display

Hadley Wickham – Reigning Data Science Heavyweight Champion

Leland Wilkinson – The Grammar of Graphics

Shiny Demo

Thanks!