improve data cleaning method
Cleaning datasets based on the Z function is not a good idea. Cleaning with Z = +/- 3 is especially bad, since it assumes a normal distribution characterized by the mean and std dev, which will discard a huge number of points from heavily skewed datasets. We can improve this method in a number of ways. Some ideas to get started:
- at the very least, use a higher default Z score. Robert already made a commit raising the default from 3 to 5
- is there a better alternative to the Z function that can account for skewness? Obviously we can't assume a distribution here; that's the point of this whole package, so it needs to be very generalized. Perhaps we could compute the skewness, or some measure of each tail, and derive a separate Z score for each side?
- first show the user the number of points that would be eliminated at various Z values, then ask whether to update the Z threshold?
- regardless of approach, the number of eliminated points should be stored as an attribute on the class instance after cleaning
- if cleaning is, or can be, performed multiple times, maybe store this in a list recording sequential cleanings? For example, clean with Z = 10 and see that x points are removed, clean again with Z = 8 and see that y points are removed; total points removed = x + y
- the same tracking of eliminated points can be done for NaNs as well
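The per-side Z score idea above could be sketched roughly like this. This is just a sketch, not the package's actual API: `clean_asymmetric` is a hypothetical helper, and measuring spread separately on each side of the median is only one possible way to handle skew.

```python
import numpy as np

def clean_asymmetric(data, z_lower=3.0, z_upper=3.0):
    """Drop outliers using a separate spread estimate for each tail.

    Instead of one std dev around the mean, measure spread on each side
    of the median separately, so a heavy upper tail does not inflate the
    cutoff applied to the lower tail (and vice versa).
    """
    data = np.asarray(data, dtype=float)
    center = np.median(data)
    lower = data[data < center]
    upper = data[data >= center]
    # Per-side spread; fall back to the overall std if a side is empty.
    spread_lo = lower.std() if lower.size else data.std()
    spread_hi = upper.std() if upper.size else data.std()
    keep = (data >= center - z_lower * spread_lo) & \
           (data <= center + z_upper * spread_hi)
    # Return the cleaned data and the number of points removed.
    return data[keep], int((~keep).sum())
```

For a heavily right-skewed sample this keeps the tight lower tail intact while still cutting the long upper tail; a symmetric Z = 3 rule around the mean would instead use one inflated std dev for both sides.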
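The bookkeeping ideas (removed count as an attribute, a list of sequential cleanings, NaN tracking, and a preview of counts at candidate Z values) could fit together like this. All names here (`Cleaner`, `removal_history`, `preview`) are illustrative, not the package's existing interface:

```python
import numpy as np

class Cleaner:
    """Sketch of removal bookkeeping across sequential cleaning passes."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        # One entry per pass: (kind, threshold, n_removed).
        self.removal_history = []

    def drop_nans(self):
        mask = np.isnan(self.data)
        self.removal_history.append(("nan", None, int(mask.sum())))
        self.data = self.data[~mask]
        return self

    def clean(self, z=5.0):
        """Remove points whose |Z score| exceeds z, recording the count."""
        scores = np.abs((self.data - self.data.mean()) / self.data.std())
        mask = scores > z
        self.removal_history.append(("zscore", z, int(mask.sum())))
        self.data = self.data[~mask]
        return self

    def preview(self, z_values=(3, 5, 8, 10)):
        """Report how many points each candidate Z threshold would remove."""
        scores = np.abs((self.data - self.data.mean()) / self.data.std())
        return {zv: int((scores > zv).sum()) for zv in z_values}

    @property
    def n_removed(self):
        # Total removed across all passes, including NaN drops.
        return sum(n for _, _, n in self.removal_history)
```

With this shape, `preview` supports the "show counts first, then ask for a threshold" flow, and `removal_history` records each pass so that totals like x + y fall out of a simple sum.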