Testing hypotheses is a critical part of the scientific process.
But…you need to understand your data first.
Visualizing your data goes a long way toward helping you explore it.
We pick up on visual cues (i.e., whatever catches our eye) much more quickly than we read.
Some things hit us pretty quickly:
| | mpg | cyl | disp | hp | drat | wt | qsec | vs |
|---|---|---|---|---|---|---|---|---|
| mpg | 1.00 | -0.85 | -0.85 | -0.78 | 0.68 | -0.87 | 0.42 | 0.66 |
| cyl | -0.85 | 1.00 | 0.90 | 0.83 | -0.70 | 0.78 | -0.59 | -0.81 |
| disp | -0.85 | 0.90 | 1.00 | 0.79 | -0.71 | 0.89 | -0.43 | -0.71 |
| hp | -0.78 | 0.83 | 0.79 | 1.00 | -0.45 | 0.66 | -0.71 | -0.72 |
| drat | 0.68 | -0.70 | -0.71 | -0.45 | 1.00 | -0.71 | 0.09 | 0.44 |
| wt | -0.87 | 0.78 | 0.89 | 0.66 | -0.71 | 1.00 | -0.17 | -0.55 |
| qsec | 0.42 | -0.59 | -0.43 | -0.71 | 0.09 | -0.17 | 1.00 | 0.74 |
| vs | 0.66 | -0.81 | -0.71 | -0.72 | 0.44 | -0.55 | 0.74 | 1.00 |
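The table above reads like a matrix of pairwise Pearson correlations: symmetric, with 1.00 on the diagonal. As a minimal sketch of how a single cell of such a matrix could be computed, here is the coefficient written out by hand. The function name `pearson_r` and the toy values are assumptions made up for illustration; they are not rows from the data behind the table.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    # Sum of cross-products and the two sums of squares.
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cross / math.sqrt(ss_x * ss_y)


# Toy values invented for illustration only (a perfectly linear, falling trend).
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
economy = [10.0, 8.0, 6.0, 4.0, 2.0]
r = pearson_r(weight, economy)
```

A single number like this summarizes only the linear part of a relationship, which is one more reason to plot the two variables before trusting it.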
Currently, most journals "need" to see test statistics to know if there is a relationship.
Not everything needs a test statistic to tell you that a real relationship exists (or not).
Color is generally used for the following:
Groups
Intensity
Only use color when it is actually doing something to aid interpretation.
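One way to keep the two uses of color apart is to give unordered groups a few clearly distinct hues and to give an ordered quantity a single light-to-dark ramp. The sketch below assumes that approach. The names `GROUP_COLORS`, `group_color`, and `intensity_color`, the group labels, and the specific colors are all hypothetical choices for illustration.

```python
# Distinct hues for unordered groups (illustrative; group names are hypothetical).
GROUP_COLORS = {"control": "#1b9e77", "treatment": "#d95f02", "placebo": "#7570b3"}

# End points of a light-to-dark ramp for an ordered quantity (illustrative).
RAMP_LOW = (255, 255, 255)   # white
RAMP_HIGH = (8, 48, 107)     # dark blue


def group_color(group):
    """Return the hue assigned to a group; unknown groups get a neutral gray."""
    return GROUP_COLORS.get(group, "#999999")


def intensity_color(value, vmin, vmax):
    """Map a value in [vmin, vmax] onto the ramp and return a hex color."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values that fall outside the range
    rgb = tuple(round(lo + (hi - lo) * t) for lo, hi in zip(RAMP_LOW, RAMP_HIGH))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


lightest = intensity_color(0, 0, 10)
darkest = intensity_color(10, 0, 10)
```

The returned hex strings can be handed to whatever plotting tool is in use. Keeping the two mappings in separate functions makes it harder to encode a group with a ramp by accident, which would imply an ordering that is not there.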
Much like color, size can convey very distinct ideas:
Value size
Frequency
You need to be very explicit about which one you mean!
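As a minimal sketch of the frequency case, the function below collapses repeated points and sizes each marker by how often its location occurs. The name `size_by_frequency`, the `base_area` parameter, and the sample points are assumptions for illustration only.

```python
from collections import Counter


def size_by_frequency(points, base_area=20.0):
    """Collapse duplicate (x, y) points into (x, y, count, area) tuples.

    Marker area grows in proportion to the count, so a point seen three
    times gets three times the area of a point seen once.
    """
    counts = Counter(points)
    return [(x, y, n, base_area * n) for (x, y), n in sorted(counts.items())]


# Sample points invented for illustration only.
points = [(1, 1), (1, 1), (2, 3), (1, 1), (2, 3), (4, 2)]
sized = size_by_frequency(points)
```

The count travels alongside the area so that a legend can state outright that size means frequency. Scaling area rather than radius is a deliberate choice here: if a plotting tool expects a radius, take the square root of the area first, otherwise large counts look far bigger than they are.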
Shape is a bit more limited than color and size.
It really only helps to denote group membership, and it only works well when the points are sparse.
We are frequently fortunate enough to have a considerable amount of data.
Depending on what type of plotting we want to do, we could run into problems very quickly.
If you have ever held a jumping bean or drunk too much coffee, you might already know about jitter.
Jitter is useful if we have some minor overplotting issues.
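As a rough sketch of the idea, jitter adds a small random offset to each value so that points sharing the same coordinate no longer sit exactly on top of one another. The function name `jitter`, its `amount` and `seed` parameters, and the sample values are assumptions for illustration; plotting libraries typically offer their own version.

```python
import random


def jitter(values, amount=0.2, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added.

    Passing a seed makes the offsets reproducible from run to run.
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]


# Sample values invented for illustration: a variable with only three levels,
# so many points would be drawn at the same position without jitter.
levels = [4, 4, 4, 6, 6, 8, 8, 8, 8]
jittered = jitter(levels, amount=0.2, seed=1)
```

The offset should stay small relative to the gap between neighboring values (here 0.2 against a gap of 2), so the groups remain clearly separated. Jitter changes where points are drawn, not the data, so it belongs in the plotting step only.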
We often want to compare a plot over several different levels.
We have already seen ways to use color and other items to make groups distinct.
If you want to be daring, though, there are other methods.
We just saw a glimpse of how we can use multiple principles to convey information.
The default is always some version of "No".
Use for:
When you do use them, they should be very faint.
Given the confines of publishing, we cannot always default to interactive visualization.
There are journals and outlets making progress on this front!
Journal of Computational and Graphical Statistics, anyone?
However, interactive visualization is a great place to start when exploring data.
Grid lines…who needs them?
Mike Bostock – D3
Cynthia Brewer – Color Brewer
William Cleveland – Pioneer (Citation Champ)
Stephen Few – All Things Viz
Michael Friendly – Psychologist and Viz Expert
Paul Tol – Physicist and Viz Expert
Edward Tufte – Information Display
Hadley Wickham – Reigning Data Science Heavyweight Champion
Leland Wilkinson – The Grammar of Graphics