Time Normalization
Dylan Hicks emailed me recently about “time normalization” of vertical jump data. If you’re unfamiliar, time normalization (see here for an example) involves re-sampling a data set of known length to a new length, and it’s usually carried out with interpolation.
Like the paper I linked above, a common reason for performing time normalization is to standardize data set lengths before making comparisons. For example, say we want to compare an athlete’s squat jump (SJ) and countermovement jump (CMJ) force-time characteristics at multiple external loads (0 kg, 10 kg, and 20 kg). Before even looking at the data, we know the trial lengths will differ between jump types and that duration will increase with load. In the linked data, the trials range from 280 ms to 955 ms. Apples-to-apples curve comparisons aren’t really possible with the raw data since the curves are different lengths, but we can interpolate new standard-length curves (e.g., 101 data points representing 0% to 100% of the jump) to overcome the length discrepancy.
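To make the idea concrete, here’s a minimal sketch using two made-up curves rather than the linked data: a hypothetical 280-sample trial and a 955-sample trial (1000 Hz assumed), each resampled to 101 points so that point k corresponds to k - 1 percent of the jump. The half-sine shapes and the `normalize_trial()` helper are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: resample one trial to n points with linear interpolation
normalize_trial <- function(force, n = 101) {
  approx(seq_along(force), force, n = n)$y
}

# Two made-up trials of different lengths (one sample per millisecond)
short_trial <- 1500 * sin(pi * seq(0, 1, length.out = 280))
long_trial  <- 1200 * sin(pi * seq(0, 1, length.out = 955))

short_norm <- normalize_trial(short_trial)
long_norm  <- normalize_trial(long_trial)

# Both curves now share a common 0% to 100% axis and can be compared point by point
percent_of_jump <- seq(0, 100, length.out = 101)
comparison <- data.frame(percent = percent_of_jump,
                         short = short_norm,
                         long = long_norm,
                         difference = short_norm - long_norm)
head(comparison)
```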
Interpolation in R
The great thing about R is that there’s a function for everything (or you can write your own, but that’s beside the point). For interpolation, we’re going to rely on the approx() function. approx() performs linear interpolation of the data (i.e., it draws a straight line between each pair of adjacent data points and reads off the value at the new location), although you can use other methods, such as cubic spline or piecewise cubic Hermite (PCHIP) interpolation, by calling spline() or pracma::pchip(), respectively.
I’m going to assume you’re sampling at a high enough frequency that linear vs. polynomial interpolation isn’t a huge factor…and by huge factor, I mean the interpolated force values aren’t statistically or practically different from one another. I’m unaware of any papers that have empirically investigated this (or the sampling frequency at which the two methods start producing different values), but for the data I’ve included here (sampled at 1000 Hz), the values interpolated with approx() and spline() are virtually identical. Maybe the enterprising among you can publish a paper on it and list me in the acknowledgements. :)
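If you want to poke at the linear-vs-spline question for your own sampling rate, here’s a rough sketch using a synthetic half-sine “force” curve (an assumption standing in for real data, with a hypothetical 870 ms duration). It interpolates the same curve with approx() and spline() at 101 points and reports the largest absolute disagreement. Real force-time curves have sharper features than a sine wave, so treat this as an illustration rather than evidence.

```r
# Hypothetical helper: largest |approx - spline| difference for a synthetic
# half-sine curve sampled at fs Hz and interpolated to n points
max_method_difference <- function(fs, duration = 0.87, peak = 1000, n = 101) {
  time  <- seq(0, duration, by = 1 / fs)        # sample times in seconds
  force <- peak * sin(pi * time / duration)     # synthetic "force" curve
  linear <- approx(time, force, n = n)$y        # linear interpolation
  cubic  <- spline(time, force, n = n)$y        # cubic spline ("fmm" method)
  max(abs(linear - cubic))
}

# Compare a 1000 Hz recording with a much coarser 50 Hz one
max_method_difference(fs = 1000)
max_method_difference(fs = 50)
```

Both functions place their 101 output points at the same equally spaced locations between the first and last sample, so the difference reflects the interpolation method alone.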
Anyway, the approx function is pretty straightforward:
args(approx)
## function (x, y = NULL, xout, method = "linear", n = 50, yleft,
## yright, rule = 1, f = 0, ties = mean)
## NULL
We need to provide approx() with values for the arguments x, y, and n. x and y are same-length vectors (e.g., time and force, or index value and force), while n is the number of points we want to interpolate our data to. When xout isn’t supplied, approx() places those n points at equal spacing between the smallest and largest x values. Using the data I linked above, let’s walk through the process. First, we need to import our data.
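Before touching the real data, a tiny made-up example shows what those arguments do. With three points and n = 5, approx() returns five equally spaced positions between the first and last x value, and each y is read off the straight line joining its two neighbours. Supplying xout instead of n asks for specific locations.

```r
# Three made-up points: a straight line from 0 to 20
x <- c(1, 2, 3)
y <- c(0, 10, 20)

# n = 5 equally spaced points between min(x) and max(x)
five_points <- approx(x, y, n = 5)
five_points$x  # 1.0 1.5 2.0 2.5 3.0
five_points$y  # 0 5 10 15 20

# Or ask for specific locations with xout
approx(x, y, xout = 2.5)$y  # 15, i.e. 10 + (20 - 10) * (2.5 - 2) / (3 - 2)
```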
# fread() (and transpose(), used below) come from the data.table package
library(data.table)
# fread() is much faster than read.csv() when reading large amounts of data
jump_data <- fread("jump_trials.csv")
# Data aren't displayed due to the size of the data frame
It’s worth noting the example data are organized in a pretty peculiar manner. I created this data set probably five years ago when I was still an R newbie, so don’t judge me too harshly. Let’s start off by putting things in a saner format.
Edit: It’s worth pointing out this step probably isn’t necessary for your data. The data in this example are in wide format, meaning each row represents a trial. In most software that spits out force-time data, trials will be arranged by columns instead. Sorry for any confusion!
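# Again, transpose() comes from data.table; keep the imported data so each
# option below starts from the same wide layout
raw_data <- jump_data
jump_data <- transpose(raw_data)
# Alternatively, using base R (columns are named X1, X2, ...)
alt_transpose_1 <- data.frame(t(raw_data))
# Or piping; %>% comes from magrittr, which the tidyverse loads for you
library(magrittr)
alt_transpose_2 <- raw_data %>%
  t() %>%
  data.frame()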
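If the reshaping step is hard to picture, here’s a toy version using base R only. The hypothetical wide data frame below has one row per trial and one column per sample; the second trial is shorter, so its last cell is NA (an assumption about how padded trials look, which may not match your export). After t(), each trial is a column.

```r
# Toy wide-format data: rows are trials, columns are samples
wide <- data.frame(sample_1 = c(100, 200),
                   sample_2 = c(110, 210),
                   sample_3 = c(120, NA))  # trial 2 is one sample shorter

# Transpose so each trial becomes a column (data.frame() names them X1, X2)
by_trial <- data.frame(t(wide))
by_trial
```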
With each trial now stored in its own column, let’s interpolate some new values. Remember, we need x (the locations of y, e.g., time or index location), y (the data), and n (the new length). Let’s start off by interpolating trial 1 (V1 or X1, depending on whether you used transpose() or t() above) to a length of 101 points (0% to 100% of the trial).
approx(1:length(jump_data$V1), jump_data$V1, n = 101)
## $x
## [1] 1.00 9.69 18.38 27.07 35.76 44.45 53.14 61.83 70.52 79.21
## [11] 87.90 96.59 105.28 113.97 122.66 131.35 140.04 148.73 157.42 166.11
## [21] 174.80 183.49 192.18 200.87 209.56 218.25 226.94 235.63 244.32 253.01
## [31] 261.70 270.39 279.08 287.77 296.46 305.15 313.84 322.53 331.22 339.91
## [41] 348.60 357.29 365.98 374.67 383.36 392.05 400.74 409.43 418.12 426.81
## [51] 435.50 444.19 452.88 461.57 470.26 478.95 487.64 496.33 505.02 513.71
## [61] 522.40 531.09 539.78 548.47 557.16 565.85 574.54 583.23 591.92 600.61
## [71] 609.30 617.99 626.68 635.37 644.06 652.75 661.44 670.13 678.82 687.51
## [81] 696.20 704.89 713.58 722.27 730.96 739.65 748.34 757.03 765.72 774.41
## [91] 783.10 791.79 800.48 809.17 817.86 826.55 835.24 843.93 852.62 861.31
## [101] 870.00
##
## $y
## [1] 658.515994 649.379220 640.331911 631.879330 623.584359 614.324304
## [7] 603.097639 589.529908 573.922460 557.029491 539.891934 523.690095
## [13] 509.438574 496.952157 484.949169 472.286676 458.389636 443.856387
## [19] 430.012987 418.109340 408.728879 402.061446 399.570826 404.211114
## [25] 416.259345 432.474051 448.140441 460.439238 470.200577 480.204110
## [31] 492.336535 506.817073 522.627300 538.451562 553.828468 568.714075
## [37] 583.211994 597.729222 613.742874 633.031957 656.844017 685.375517
## [43] 717.312283 750.107935 781.341543 809.573299 834.244476 855.778391
## [49] 875.476081 895.252876 916.483554 940.830454 969.165875 1000.883505
## [55] 1034.586934 1068.650115 1101.899997 1133.434751 1162.049659 1187.555766
## [61] 1210.867641 1232.865991 1253.954299 1274.421249 1293.447920 1309.479565
## [67] 1321.744179 1330.096045 1334.880012 1336.699071 1336.489713 1335.620329
## [73] 1335.907529 1338.357643 1342.899723 1349.396180 1357.966187 1369.044265
## [79] 1382.943453 1399.448286 1417.106000 1435.371568 1453.604835 1470.312076
## [85] 1484.890367 1496.711817 1504.299495 1505.483694 1496.409366 1473.150133
## [91] 1431.732874 1366.362132 1269.803685 1136.728087 966.787323 767.363009
## [97] 553.793176 346.721613 182.028978 67.890906 2.957444
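Those $x values aren’t magic: with n = 101 and no xout, approx() spaces 101 positions evenly from the smallest x value it used (1) to the largest (870 for this trial), so consecutive positions sit (870 - 1) / 100 = 8.69 samples apart. A quick check, which also maps each position to a percentage of the trial:

```r
# Recreate the positions approx() used for trial 1 (1 to 870, 101 points)
positions <- seq(1, 870, length.out = 101)
head(positions, 3)   # 1.00  9.69 18.38
diff(positions)[1]   # 8.69, i.e. (870 - 1) / 100

# Express each position as a percentage of the trial: 0, 1, 2, ..., 100
percent <- (positions - 1) / (870 - 1) * 100
```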
You’ll notice I didn’t add a time column to the data before using approx(). Instead, I used the index locations of the points via 1:length(jump_data$V1). Using either index locations or a user-defined time column is perfectly fine and won’t affect the interpolated values, because at a constant sampling rate time is just a rescaled index. You’ll also notice approx() returns interpolated values for both x and y. We’re only concerned with y, however, so you should adjust the above function slightly:
approx(1:length(jump_data$V1), jump_data$V1, n = 101)$y
## [1] 658.515994 649.379220 640.331911 631.879330 623.584359 614.324304
## [7] 603.097639 589.529908 573.922460 557.029491 539.891934 523.690095
## [13] 509.438574 496.952157 484.949169 472.286676 458.389636 443.856387
## [19] 430.012987 418.109340 408.728879 402.061446 399.570826 404.211114
## [25] 416.259345 432.474051 448.140441 460.439238 470.200577 480.204110
## [31] 492.336535 506.817073 522.627300 538.451562 553.828468 568.714075
## [37] 583.211994 597.729222 613.742874 633.031957 656.844017 685.375517
## [43] 717.312283 750.107935 781.341543 809.573299 834.244476 855.778391
## [49] 875.476081 895.252876 916.483554 940.830454 969.165875 1000.883505
## [55] 1034.586934 1068.650115 1101.899997 1133.434751 1162.049659 1187.555766
## [61] 1210.867641 1232.865991 1253.954299 1274.421249 1293.447920 1309.479565
## [67] 1321.744179 1330.096045 1334.880012 1336.699071 1336.489713 1335.620329
## [73] 1335.907529 1338.357643 1342.899723 1349.396180 1357.966187 1369.044265
## [79] 1382.943453 1399.448286 1417.106000 1435.371568 1453.604835 1470.312076
## [85] 1484.890367 1496.711817 1504.299495 1505.483694 1496.409366 1473.150133
## [91] 1431.732874 1366.362132 1269.803685 1136.728087 966.787323 767.363009
## [97] 553.793176 346.721613 182.028978 67.890906 2.957444
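To convince yourself that index locations and a time column give the same answer, here’s a small sketch on a made-up 870-sample curve, assuming a constant 1000 Hz sampling rate (time = (index - 1) / 1000). Because time is then a linear rescaling of the index, approx() evaluates the curve at the same relative positions and the interpolated y values match; only $x changes units.

```r
# Made-up 870-sample curve standing in for a force trace (1000 Hz assumed)
force <- 1000 * sin(pi * seq(0, 1, length.out = 870))
index <- seq_along(force)
time_s <- (index - 1) / 1000   # seconds since the first sample

by_index <- approx(index, force, n = 101)$y
by_time  <- approx(time_s, force, n = 101)$y

all.equal(by_index, by_time)   # TRUE (within floating-point tolerance)
```

If your time stamps are irregular (e.g., dropped samples), the two approaches would no longer agree, and the time column better reflects when each sample was actually taken.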
Let’s plot the interpolated data against the raw data.
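# plot_ly() and add_lines() come from plotly, which also provides %>%
library(plotly)
# Keep both x and y so the interpolated points sit at their original index positions
interpolated_data <- approx(1:length(jump_data$V1),
                            jump_data$V1,
                            n = 101)
plot_ly() %>%
  add_lines(data = jump_data,
            x = ~1:length(V1),
            y = ~V1,
            name = "Raw") %>%
  add_lines(x = interpolated_data$x,
            y = interpolated_data$y,
            name = "Interpolated")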
Typically, we want to time normalize multiple trials. Thankfully, R makes this a cakewalk with lapply().
lapply_interpolation <- data.frame(lapply(jump_data,
                                          function(x) approx(1:length(x),
                                                             x,
                                                             n = 101)$y))
Or if you’re a data.table user…
data_table_interpolate <- jump_data[, lapply(.SD,
                                             function(x) approx(1:length(x),
                                                                x,
                                                                n = 101)$y)]
In either case, enjoy your shiny new time normalized data!
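One thing worth checking when trials of different lengths share a table: the shorter columns usually end in NA padding. By default, approx() drops (x, y) pairs with a missing y, so each trial is normalized over its own recorded samples rather than the full column length. The toy sketch below uses made-up trials to show this, then uses the common 101-point length to compute a point-by-point mean curve, one way to put the normalized data to work.

```r
# Made-up trials of different lengths, padded with NA to share one data frame
trial_a <- 1400 * sin(pi * seq(0, 1, length.out = 300))
trial_b <- 1200 * sin(pi * seq(0, 1, length.out = 450))
trials <- data.frame(a = c(trial_a, rep(NA, 150)),
                     b = trial_b)

# Same lapply() approach as above; NA rows are dropped inside approx()
normalized <- data.frame(lapply(trials,
                                function(x) approx(1:length(x), x, n = 101)$y))

# Every trial now has 101 points and no missing values
dim(normalized)     # 101 2
anyNA(normalized)   # FALSE

# A common length makes point-by-point summaries straightforward
mean_curve <- rowMeans(normalized)
percent_of_jump <- seq(0, 100, length.out = 101)
```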