About DF search
The single parameter to be set by the user is the number of degrees of freedom (DF) to employ. The DF parameter controls how closely the curve (spline) fits the input data. It is necessary to ensure that the curve is not over-fitting or under-fitting the data.
This parameter is dependent on the study design (number of time-points, sampling rate, time-scale of the function of time under study) and therefore only needs to be selected once per dataset.
Some indications based on simulated data and diverse datasets can guide the selection of DF:
- DF controls the "complexity" of the model employed. A substantial difference can be found when going from 2 to 10, but very little change will take place when going from 10 to 50 (the model only gets more complex, but the general shape won't change).
- More time points do not automatically require a higher DF. More inflexions (more complex shape) could require a higher DF if the number of points is sufficient (and the sampling frequency high).
- A lower DF value is often better suited and more generalisable (less over-fitted).
- If the DF is, for example, 10, all individual trajectories with fewer than 10 time-points cannot be fitted and will be rejected.
- On simulated data, the results (p-values) are resilient to most values of DF; however, the plots can look dramatically different.
- Try multiple values of DF on a subset of variables (using the GUI), then select the fit that best approximates the time evolution without over-fitting:
  - DF=5 is a reasonable starting point in most cases (even more so if fewer than 10 time-points are available).
  - If the number of time-points is large and the curves seem very under-fitted, DF can be increased to 6, 7 or more. Values higher than 10 should rarely be required and will provide diminishing returns. DF=number of time-points will result in a curve passing through all points (over-fitted).
  - If the number of time-points is lower or the trajectories seem over-fitted, DF can be decreased to 4 or 3 (3 will be similar to a second-degree polynomial, while 2 will be a linear model).
  - If the plots "look right" and don't seem to "invent" information between measured data-points, the DF is close to optimal.
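The effect of DF described above can be illustrated with a minimal Python sketch. It does not reproduce the tool's spline fitting: as a stand-in, it fits a least-squares polynomial with `df` coefficients (so DF=2 is a straight line and DF=3 a quadratic, matching the analogy above). The function name `fit_with_df` and the toy trajectory are assumptions made for illustration only.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_with_df(t, y, df):
    """Fit a polynomial with `df` coefficients (degree df - 1) by least squares.

    Hypothetical stand-in for a smoothing spline with `df` degrees of freedom.
    """
    if len(t) < df:
        # Mirrors the rejection rule: fewer time-points than DF cannot be fitted.
        raise ValueError("fewer time-points than DF")
    return Polynomial.fit(t, y, deg=df - 1)


# Toy trajectory: 8 time-points, smooth trend plus fixed pseudo-random noise.
rng = np.random.default_rng(0)
t = np.arange(8, dtype=float)
y = np.sin(t / 2.0) + rng.normal(scale=0.1, size=t.size)

# Residual sum of squares (RSS) for each candidate DF.
rss = {}
for df in range(2, len(t) + 1):
    curve = fit_with_df(t, y, df)
    rss[df] = float(np.sum((y - curve(t)) ** 2))

for df, value in rss.items():
    print(f"DF={df}: RSS={value:.4f}")
```

In this sketch the residual error can only shrink as DF grows, and it vanishes when DF equals the number of time-points. This is why a small residual alone does not indicate a good DF.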
While it does not seem possible to fully automate the selection of the number of degrees of freedom, DF Search implements visualisation approaches to assist in the selection of an adequate DF to apply across all variables for a given dataset.
Eigen-trajectories estimation
- A PCA extracts the eigen-trajectories across all variables. The DF that will best fit that subset of eigen-trajectories is expected to be satisfactory for all trajectories in the dataset.
- Set parameters on the left and press the run button to generate the eigen-trajectories.
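The idea of extracting eigen-trajectories with a PCA can be sketched in Python as follows. This is an illustration under assumptions, not the tool's implementation: the input is assumed to be a complete matrix with one row per variable and one column per time-point, the centring choice is an assumption, and `eigen_trajectories` is a hypothetical name.

```python
import numpy as np


def eigen_trajectories(X, n_components=2):
    """Return the leading principal time-profiles of X and their variance share.

    X: array of shape (n_variables, n_timepoints), assumed to have no missing values.
    """
    X = np.asarray(X, dtype=float)
    # Centre each time-point across variables before the decomposition.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are orthonormal profiles over time (the eigen-trajectories).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return Vt[:n_components], explained[:n_components]


# Toy data: 5 variables that are mixtures of two underlying time profiles.
t = np.linspace(0.0, 1.0, 6)
profiles = np.vstack([t, (t - 0.5) ** 2])
weights = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 3.0], [0.5, -2.0]])
X = weights @ profiles

components, explained = eigen_trajectories(X, n_components=2)
print(components.shape, explained)
```

Because the toy variables are built from only two profiles, two eigen-trajectories capture essentially all of the variance. A DF that fits these few curves well is then a candidate for the whole dataset.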
Auto-Fit
- Auto-Fit returns the optimal DF based on different goodness of fit metrics.
Parameter Evolution
- Plot the evolution of different goodness of fit metrics for all possible DF.
- Set parameters and press the update step button to calculate the metrics.
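As a rough illustration of how goodness-of-fit metrics can evolve with DF, the Python sketch below computes the residual sum of squares, AIC and BIC over a range of DF values. The metrics actually used by the tool are not listed here, so AIC and BIC are assumed examples. The fit is a least-squares polynomial with `df` coefficients used as a stand-in for the spline, and `fit_metrics` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_metrics(t, y, df):
    """Return RSS, AIC and BIC for a least-squares fit with `df` coefficients."""
    n = len(t)
    curve = Polynomial.fit(t, y, deg=df - 1)  # stand-in for the spline fit
    rss = float(np.sum((y - curve(t)) ** 2))
    # -2 x maximised Gaussian log-likelihood, up to an additive constant.
    deviance = n * np.log(rss / n)
    return {
        "rss": rss,
        "aic": float(deviance + 2 * df),
        "bic": float(deviance + df * np.log(n)),
    }


# Toy trajectory: a quadratic trend with fixed pseudo-random noise, 20 time-points.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * t - 3.0 * t ** 2 + rng.normal(scale=0.05, size=t.size)

# Evaluate a range of DF values well below the number of time-points.
metrics = {df: fit_metrics(t, y, df) for df in range(2, 11)}
best_bic = min(metrics, key=lambda d: metrics[d]["bic"])

for df, m in metrics.items():
    print(f"DF={df}: RSS={m['rss']:.4f} AIC={m['aic']:.2f} BIC={m['bic']:.2f}")
print("DF with lowest BIC:", best_bic)
```

Penalised criteria such as these trade residual error against model complexity, so their minimum gives one candidate DF. Different metrics can disagree, which is consistent with selecting the final value visually.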
Plot Fit
- Plot eigen-trajectories fitted with a selected DF (left panel).
- Set parameters (automatic fitted spline in red) and press the update button to generate the plot.
Missing Value
- Visualise the number of trajectories that must be rejected because they have too few observations for the selected DF.
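The rejection rule can be sketched in a few lines of Python. It assumes, as stated earlier, that a trajectory is rejected when it has fewer observations than the selected DF. The function name `count_rejected` and the observation counts are made up for illustration.

```python
def count_rejected(n_obs_per_trajectory, df):
    """Count trajectories with fewer observations than `df`."""
    return sum(1 for n in n_obs_per_trajectory if n < df)


# Hypothetical number of observed time-points for 7 individual trajectories.
n_obs = [10, 9, 7, 5, 4, 10, 3]

# Number of rejected trajectories for each candidate DF.
rejected = {df: count_rejected(n_obs, df) for df in range(2, 11)}

for df, count in rejected.items():
    print(f"DF={df}: {count} of {len(n_obs)} trajectories rejected")
```

With incomplete data, raising DF can discard a large share of the trajectories, which is a practical reason to prefer a lower value.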