Selection Bias

#statistics #pitfalls #math #bias #sampling

  • Selection bias means choosing, consciously or not, a specific subset of the data that answers your query favorably and with seemingly high confidence, even though results that clean rarely hold for the data as a whole.
  • Data Snooping: Searching relentlessly through the data until something interesting appears.
  • Vast Search Effect: Trying many different models and questions on a large dataset; sooner or later something interesting will appear by chance alone.
  • How to avoid the two processes above:
    • Holdout Dataset (more than one, if possible)
    • Target Shuffling (randomly permuting the target being predicted, to check whether a result could arise by chance)
  • Focusing only on extreme values is itself a form of selection bias.
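
The vast search effect and the two safeguards above can be illustrated with a minimal sketch. It assumes pure-noise data generated with NumPy, and the helper `best_abs_corr`, the array sizes, and the number of permutations are all hypothetical choices made for illustration. Searching 500 unrelated features for the one most correlated with the target will turn up something that looks interesting; a holdout set and target shuffling then give two ways to check whether it is real.

```python
import numpy as np

def best_abs_corr(X, y):
    """Return (index, |correlation|) of the column of X most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    idx = int(np.argmax(np.abs(corr)))
    return idx, float(abs(corr[idx]))

rng = np.random.default_rng(0)
n_rows, n_features = 200, 500              # illustrative sizes (assumption)
X = rng.normal(size=(n_rows, n_features))  # features are pure noise
y = rng.normal(size=n_rows)                # target is unrelated to every feature

train, hold = slice(0, 100), slice(100, 200)

# Vast search: pick the "best" of 500 features on the training rows.
idx, train_corr = best_abs_corr(X[train], y[train])

# Holdout: re-check that same feature on rows the search never saw.
hold_corr = float(abs(np.corrcoef(X[hold, idx], y[hold])[0, 1]))

# Target shuffling: repeat the WHOLE search on permuted targets to see
# how strong the best correlation gets when no relationship can exist.
n_perm = 200
null_best = np.array([
    best_abs_corr(X[train], rng.permutation(y[train]))[1]
    for _ in range(n_perm)
])
p_value = (1 + np.sum(null_best >= train_corr)) / (1 + n_perm)

print(f"best feature on training rows: |r| = {train_corr:.3f}")
print(f"same feature on holdout rows:  |r| = {hold_corr:.3f}")
print(f"target-shuffling p-value:      {p_value:.3f}")
```

The important design choice is that each shuffle reruns the entire search rather than re-testing only the winning feature, so the null distribution reflects how many things were tried.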