KNOWLEDGEBASE - ARTICLE #2183

Violin Plots and Logarithmic Axes

Violins are the result of a calculation based on the original data

This page does not go deeply into the mathematics behind how violin plots are created, but the most important thing to remember is that a violin shows an estimated density distribution calculated from the original, entered data. Because of this, violins shown on an axis that is not linear (i.e. logarithmic axes or probability axes) will likely be confusing and potentially misleading to many who view the graph. The rest of this page discusses specific details of plotting violins on logarithmic axes.

Two main problems with Violin Plots and Logarithmic Axes

When considering a violin plot that has been graphed on a logarithmic Y axis, there are two important issues to consider. Each of these issues results in its own visual properties of the violin plot on a logarithmic axis, and each can lead to serious confusion if not handled properly. A brief summary of these two issues is as follows:

Issue 1

Even though the data used to generate a violin plot contains only positive numbers, the violin itself may extend past zero into negative values. This is problematic because a logarithmic axis cannot display negative values (or zero): the logarithm of such a value is undefined.

[Figure panels: Linear Y axis | Logarithmic Y axis]

Issue 2

The width of violin plots is determined by examining the distance between values in a linear fashion. This is problematic because the distance between values on a logarithmic axis is not uniform.

[Figure panels: Linear Y axis | Logarithmic Y axis]

Main Takeaway

Using a violin plot on a logarithmic axis is more complicated than it may seem at first, and the results can be misleading. As a result, it is strongly recommended that you avoid this combination of settings unless you understand what the results are showing you. The rest of this page provides a thorough explanation of both of the issues listed above, using visual examples of how these issues may present themselves when looking at violin plots on a logarithmic axis.

I don't have time to read this whole page... what should I do?

The most important thing to remember is that a violin plot is created from the original, entered data. In general, the width of the violin is directly related to the estimated density of the data at a given Y value. Changing the Y axis from linear to logarithmic doesn't transform the data; it only stretches or squishes where the Y values are displayed. It can be argued that the way Prism displays violin plots (beginning in 8.4.3) is the "most correct" way to depict this visualization of your original data. In fact, making that argument is what the rest of this page attempts to do! However, if you've created a violin plot of your data, chosen a logarithmic scale for the Y axis, and the violin doesn't appear to "follow the data" as you expected, try the following:

  • Transform the original data using Y = log(Y)

  • Create a violin plot of the transformed data

  • In the Format Axes dialog, leave the Scale of the Y axis as Linear

  • In the same dialog, in the "Regularly spaced ticks" section, choose the option "Antilog" in the Format dropdown

The resulting graph will be a violin plot of data that was log transformed, but plotted on a linear axis.
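Outside of Prism, the same idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic behind the steps above, not Prism's implementation: the data values and the helper names `log_transform` and `antilog_tick_labels` are made up for this example. The data are log-transformed first, the axis stays linear, and only the tick labels are converted back to the original units.

```python
import math

def log_transform(values):
    """Return log10 of each value; every value must be strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 is undefined for zero or negative values")
    return [math.log10(v) for v in values]

def antilog_tick_labels(ticks):
    """Label regularly spaced ticks on the (linear) transformed axis as 10**tick."""
    return [f"{10 ** t:g}" for t in ticks]

# Made-up example data, all positive.
raw = [2, 5, 10, 40, 100, 800, 1000]

# Step 1: transform the data. The violin would be built from these values.
transformed = log_transform(raw)

# Steps 3-4: the axis is linear in log units; only the labels are antilogged.
ticks = [0, 1, 2, 3]
labels = antilog_tick_labels(ticks)
print(labels)
```

The point of the design is that the density estimate is computed in log units, so the violin and the data points are stretched in exactly the same way.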

[Figure panels: Linear Y axis (original data) | Linear Y axis (transformed data, Antilog ticks)]

Issue 1: A logarithmic axis can't show negative values, but my violin plot has them

Violin plots come in two main varieties: "truncated" or "extended". With a "truncated" violin plot, the curve of the violin extends only to the minimum and maximum values in the data set. At those values, the curve is trimmed, forming a horizontal line connecting both sides of the violin. With an "extended" violin plot, the curve of the violin extends beyond the minimum and maximum values as a result of the algorithm used to create the violin itself.* Depending on who you talk to, a "normal" violin plot could mean either one of these, and Prism provides the ability to choose which of these two approaches you'd like to use.

As you can see from this image, the truncated violin ends at the minimum value in the data. More importantly, this minimum data value is greater than zero. In comparison, the extended violin goes beyond the minimum and maximum values of the data, and in this case, the bottom of the violin actually extends into negative values. If we change the scale of the Y axis to a logarithmic scale, we get the following graph appearance (in this case, log10 is used, but all logarithmic scales will look similar, since the logarithm of zero or a negative number is undefined).

Note what happened to each version of the violin plot. For the truncated violin plot, the minimum can be observed as it is greater than 0 (the minimum in the data set used to create these violins was 2). However, the extended violin appears to travel beyond the X axis (in the image above, the X axis intersects the Y axis at Y=1). This cannot be overcome by setting the X and Y axis intersection to a smaller Y value. The violin plot will always extend below the X axis, since the X axis must intersect the Y axis at a positive Y value (once again, a logarithmic axis cannot show zero or negative values). Here's the same data with a logarithmic Y axis that extends from 100 down to 0.001:

What's important to remember about this issue?

First, you should remember that violins are created from the original, entered data. Even though the data are being displayed on a logarithmic axis, they have not been transformed in any way. As a result, the violin being displayed is simply being stretched/squished accordingly. When a violin extends into negative values and is plotted on a logarithmic axis, it is - in essence - being stretched infinitely far (and you'll never be able to see the point where the two sides come back together). So instead, the violin simply extends to the X axis, regardless of what you set for the range of the Y axis.
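To see why an extended violin can dip below zero even when every data value is positive, here is a minimal sketch of a fixed-bandwidth Gaussian kernel density estimate. It is an assumption-laden illustration: the data, the hand-picked bandwidth, and the helper names `gaussian_kde` and `log_axis_position` are invented for this example, and this is not necessarily the exact algorithm Prism uses.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates density at y using Gaussian kernels
    of one fixed bandwidth centered on each data value."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in data)

    return density

def log_axis_position(y):
    """Position of y on a log10 axis, or None where no position exists."""
    return math.log10(y) if y > 0 else None

# Made-up data: all positive, minimum is 2.
data = [2, 3, 3, 4, 5, 7, 9, 12, 20, 35]
density = gaussian_kde(data, bandwidth=3.0)  # bandwidth picked by hand

# Each kernel has tails on both sides, so the estimate is nonzero below zero.
below_zero = density(-1.0)
print(f"estimated density at Y = -1: {below_zero:.4f}")

# That part of the violin has nowhere to go on a logarithmic axis.
print(log_axis_position(-1.0))
```

The estimated density at Y = -1 is small but not zero, which is the part of the extended violin that a logarithmic axis cannot place.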

Issue 2: The shape of my violin and the shape of the scatter plot (data points) aren't the same

Take a look at the violin plots on the graph below. Once again, the graph shows both a truncated and an extended violin plot. Additionally, this time each value is shown as an individual data point. The first thing to note is that these violins have been plotted on a linear axis. As in the previous example, none of these values is actually negative (the minimum of this dataset is 1). On this scale, it's easy to see that there are a LOT of data points near the lower end of the range (values near zero). As such, the widest point of the violin occurs in this same general range. It may be slightly more difficult to see that the maximum width of this violin occurs at around a Y value of 800.

As in the previous section, the extended violin goes well into the negative values, so we expect that with a logarithmic Y axis, this violin will simply extend all the way to the X axis, while the truncated violin simply gets trimmed at the dataset minimum (again, at Y=1).

However, what MIGHT be surprising or perplexing is that the shape of the violin and the shape of the scatter plot no longer seem to match up. What happened here?

The explanation comes in two parts. The first part of the explanation is that the violin plot is created from the original, entered data. Changing the Y axis to a logarithmic scale doesn't change the original data, and thus shouldn't change the width of the generated violin. Remember earlier it seemed that the maximum width of the violin on the linear axis was at about 800. On the logarithmic axis, you can see that this maximum width is still at a Y value of just about 800. That's good! That means our violin is still showing the same information.

"Ok, but why does the scatter plot look different from the violin plot?" This is probably what you're asking yourself. The answer is that the data points - whether on an axis with a linear scale or a logarithmic scale - must still be placed at their given Y value. On a logarithmic scale, larger value ranges get "squished" compared to the same ranges on a linear scale. That means that for the values at the high end of this distribution, there's going to be less vertical space on a logarithmic scale for them to be plotted. As a result (and in order to show as many data points as possible without overlap), these points get shifted to the left and the right. The net result is that the violin is still showing the estimated distribution of the original, entered data for any given Y value, but the data points themselves have taken on the appearance of a log-transformation of the data.

Confusing, I know. But what's important to remember is that changing the scale of an axis does not change or transform the actual data! This problem frequently comes up when dealing with dose-response curves and X values that are either entered as raw concentration values or as log-transformed concentration values. Changing the scale of the axis doesn't actually transform these values, and so care must be used when selecting the appropriate model for curve-fitting.
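The difference between "a violin of the original data shown on a log axis" and "a violin of log-transformed data" can be checked numerically. The sketch below is a rough illustration under stated assumptions: the data set is hypothetical (evenly spaced quantiles of a lognormal distribution with median 2000), the bandwidths are picked by hand, and `kde_peak` is a made-up helper that finds where a fixed-bandwidth Gaussian density estimate is largest, i.e. where the violin would be widest.

```python
import math
from statistics import NormalDist

def kde_peak(data, bandwidth, grid):
    """Return the grid value where a fixed-bandwidth Gaussian KDE is largest.
    The density is left unnormalized, which does not change where the peak is."""
    def density(y):
        return sum(math.exp(-0.5 * ((y - x) / bandwidth) ** 2) for x in data)
    return max(grid, key=density)

# Hypothetical right-skewed, all-positive data (lognormal quantiles, median 2000).
n = 200
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
raw = [2000.0 * math.exp(zi) for zi in z]

# Violin of the original data: density estimated in linear units.
raw_grid = [10.0 * k for k in range(1, 501)]          # 10 .. 5000
peak_raw = kde_peak(raw, bandwidth=300.0, grid=raw_grid)

# Violin of log-transformed data: density estimated in log10 units.
logged = [math.log10(v) for v in raw]
log_grid = [2.0 + 0.01 * k for k in range(251)]       # 2.00 .. 4.50
peak_log = 10 ** kde_peak(logged, bandwidth=0.15, grid=log_grid)

print(f"widest point, violin of original data: Y ~ {peak_raw:.0f}")
print(f"widest point, violin of log data:      Y ~ {peak_log:.0f}")
```

The two violins are widest at different Y values because they are estimating the density of two different sets of numbers. Changing only the axis scale never moves the first peak; transforming the data before building the violin does.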

Ultimately, Prism's defaults seem to be the "most correct" approach when generating violin plots on a linear or logarithmic scale. As demonstrated, when a violin is plotted on a logarithmic scale, it may not "match up" with the scatter of the data points. If you're still uncertain about the entire "violin plot on a logarithmic axis" issue, try selecting a different graph style (try just showing all of the data points!). However, it's very possible that you might want a violin plot that estimates this log-transformed distribution instead of the original, entered data. In an earlier section of this page, steps were provided on how to do just that. Simply log-transform the data before plotting it, and then create the violin plot from these transformed data.

 

*Violin plots are generated using a concept known as kernel density estimation (KDE). This FAQ will not go into the specific details of this technique, but if you'd like to know more, Wikipedia has a somewhat "math-heavy" page explaining it. One important point to note about KDE is that the "bandwidth" is strongly related to how smooth or jagged the resulting violin appears. Wider bandwidths tend to create smoother violins, while narrower bandwidths create more variation in the edge of the violin. However, perhaps more importantly, when creating violin plots, the bandwidth is generally kept constant for all points making up the violin. This contributes to the second issue on this page, since values that are numerically evenly spaced are not spatially evenly spaced on logarithmic axes. In other words, the same bandwidth occupies more vertical space at the lower end of a logarithmic scale and less vertical space at the higher end.
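As a small worked example of that last point, the sketch below measures how much of a log10 axis a constant bandwidth covers at different Y values. The bandwidth of 50 and the helper name `span_on_log_axis` are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def span_on_log_axis(y, bandwidth):
    """Vertical extent, in decades, of the interval [y, y + bandwidth]
    when drawn on a log10 axis."""
    return math.log10(y + bandwidth) - math.log10(y)

bandwidth = 50.0  # hypothetical constant bandwidth, in original data units

for y in (10.0, 100.0, 1000.0):
    print(f"Y = {y:>6g}: {span_on_log_axis(y, bandwidth):.3f} decades")
```

The same 50-unit interval covers most of a decade starting at Y = 10 but only a small fraction of a decade starting at Y = 1000, which is why a violin built with one bandwidth looks stretched at the bottom of a logarithmic axis and compressed at the top.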



Keywords: violin plot logarithm logarithmic axis
