#statistics #pitfalls #math #bias #sampling
- You selectively choose (consciously or not) a specific subset of the data that answers your query convincingly and with high confidence, even though such clean results rarely hold in general.
- Data Snooping: Searching relentlessly through the data until something interesting appears.
- Vast Search Effect: Repeatedly trying different models and different questions on a large dataset; sooner or later something interesting will appear by chance alone.
-
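A minimal sketch of the vast search effect, assuming purely synthetic noise data (the sizes, the seed, and the helper `abs_correlations` are illustrative choices, not from any particular source). No feature is related to the target, yet the best of many candidates still looks "interesting":

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility
n_rows, n_features = 50, 1000

# Pure noise: by construction no feature has a real relationship with y.
X = rng.normal(size=(n_rows, n_features))
y = rng.normal(size=n_rows)


def abs_correlations(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


corrs = abs_correlations(X, y)
best = int(np.argmax(corrs))

# The winner of the search looks far stronger than a typical feature,
# purely because we searched over many candidates.
print(f"best feature: {best}, |r| = {corrs[best]:.2f}")
print(f"median |r| across all features: {np.median(corrs):.2f}")
```

The gap between the best and the median correlation comes only from the number of candidates searched, which is the essence of data snooping.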
How to avoid the two processes above:
- Holdout Dataset (If possible, more than one)
- Target Shuffling (randomly permute the target variable and repeat the search, to see how often chance alone produces an equally strong result)
- Focusing only on the extreme values is itself a form of selection bias
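The holdout and target shuffling checks can be sketched as follows. This is an illustrative example on synthetic noise, assuming a simple "pick the most correlated feature" search; the names `abs_correlations`, `best_abs_corr`, and the sizes are hypothetical choices made for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility
n_rows, n_features = 200, 500
X = rng.normal(size=(n_rows, n_features))
y = rng.normal(size=n_rows)  # noise target: nothing real to find


def abs_correlations(X, y):
    """Absolute Pearson correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def best_abs_corr(X, y):
    """The whole 'search': return the strongest correlation found."""
    return abs_correlations(X, y).max()


# Holdout: search only on the first half, then re-check the winner
# on data that played no part in the search.
train, hold = slice(0, 100), slice(100, 200)
train_corrs = abs_correlations(X[train], y[train])
winner = int(np.argmax(train_corrs))
observed = train_corrs[winner]
holdout_corr = abs_correlations(X[hold][:, [winner]], y[hold])[0]

# Target shuffling: permute the target and rerun the *entire* search,
# building the distribution of "best results" that chance alone produces.
n_perm = 200
null = np.array(
    [best_abs_corr(X[train], rng.permutation(y[train])) for _ in range(n_perm)]
)
p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)

print(f"search |r| = {observed:.2f}, holdout |r| = {holdout_corr:.2f}")
print(f"target-shuffling p-value = {p_value:.3f}")
```

The key design choice is that each shuffle repeats the full search rather than re-testing only the winning feature; otherwise the comparison would ignore how many candidates were tried.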