I want to experiment with different interpolation techniques. Some factors that determine the appropriate technique are:
1. Density of the data (is it dense or sparse?)
2. Dimensionality of the data (high dimensional or low dimensional?)
3. Size of the data (to keep computation time low)
Since:
-> Radial Basis is good for sparse & high-dimensional data
-> Cubic Spline is good for dense data
-> Polynomial Interpolation is better for small, low-dimensional datasets,
as low-degree polynomials (up to degree 3 or so) can fit accurate lines/curves, while higher degrees tend to oscillate (Runge's phenomenon). A sketch of all three techniques is below.
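For concreteness, here is a minimal, self-contained sketch of all three techniques using SciPy & NumPy. The sample data & target values are made up purely for illustration; `RBFInterpolator`, `CubicSpline`, & `Polynomial.fit` are the actual library calls:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator, CubicSpline

rng = np.random.default_rng(0)

# Radial basis: handles scattered, higher-dimensional points.
pts = rng.uniform(0, 1, size=(50, 4))        # 50 samples in 4 dimensions
vals = pts.sum(axis=1)                       # toy target values
rbf = RBFInterpolator(pts, vals)
print(rbf(rng.uniform(0, 1, size=(3, 4))))   # evaluate at 3 new points

# Cubic spline: expects a dense, strictly increasing 1-D grid.
x = np.linspace(0, 10, 200)
cs = CubicSpline(x, np.sin(x))
print(cs(2.5))

# Polynomial interpolation: 4 points determine an exact cubic.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 2.2, 3.9, 9.1])
poly = np.polynomial.Polynomial.fit(xs, ys, deg=3)
print(poly(1.5))
```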
My doubts are:
1) Is there a reliable way of checking the density of the interpolation column's data distribution? Some techniques I have identified are (see the sketch after this list):
1: If the share of missing/empty values is significant (>=50%), the data is sparse.
2: If the range of the majority of the data values is small & the standard deviation is small, the data is considered dense; otherwise, sparse.
3: Visualization using a scatter plot.
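Here is a rough sketch of heuristics 1 & 2 (with a pointer to 3), assuming the column is a pandas Series; the thresholds are illustrative assumptions, not established cutoffs:

```python
import pandas as pd

def looks_sparse(col: pd.Series,
                 missing_thresh: float = 0.5,    # assumed cutoff for heuristic 1
                 rel_std_thresh: float = 1.0     # assumed cutoff for heuristic 2
                 ) -> bool:
    # Heuristic 1: a large share of missing values suggests sparsity.
    if col.isna().mean() >= missing_thresh:
        return True
    vals = col.dropna()
    if vals.empty:
        return True
    # Heuristic 2: small spread relative to the data's scale suggests
    # dense coverage; a large relative spread suggests sparsity.
    rel_std = vals.std() / (abs(vals.mean()) + 1e-12)
    return rel_std > rel_std_thresh

# Heuristic 3 is a visual check, e.g. plt.scatter(range(len(vals)), vals)

col = pd.Series([1.0, 1.1, None, 0.9, 1.05, None, 1.0])
print(looks_sparse(col))   # False: few gaps & a tight spread
```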
2) Is the following understanding of high & low dimensionality correct?
If the number of dimensions/variables (columns) equals or exceeds the number of rows, the data is high dimensional; otherwise, low dimensional. (A sketch of this check follows.)
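As a sketch, that p >= n rule is a one-liner on a pandas DataFrame (the DataFrame here is a toy example):

```python
import pandas as pd

def is_high_dimensional(df: pd.DataFrame) -> bool:
    n_rows, n_cols = df.shape
    return n_cols >= n_rows   # p >= n: variables match or outnumber rows

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
print(is_high_dimensional(df))   # True: 3 columns vs. 2 rows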
3) Can I include a rule in my code that if the data is low dimensional & has fewer than 100,000 rows then it's a small dataset, else large? (A sketch follows.)
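A minimal sketch of that rule, assuming a pandas DataFrame & treating the 100,000-row cutoff as the arbitrary threshold it is:

```python
import pandas as pd

def is_small_dataset(df: pd.DataFrame, row_limit: int = 100_000) -> bool:
    # Small = low dimensional (more rows than columns) & under the row cutoff.
    n_rows, n_cols = df.shape
    return n_cols < n_rows and n_rows < row_limit

df = pd.DataFrame({"x": range(500), "y": range(500)})
print(is_small_dataset(df))   # True: 2 columns, 500 rows
```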
What I have tried:
I gathered the above points after a lot of research. Do let me know whether the above points are right.
Note: I understand that some of the above points, like the dataset-size cutoff & the sparse/dense distinction, are subjective, but I want to know whether they are accurate to a decent extent.