Structured Data

#statistics #math

  • There are two types of data: numerical and categorical
  • Numerical data is divided into continuous and discrete
  • Categorical data takes values from a fixed set and can be ordinal (the values have an order) or binary (yes/no, true/false, 0/1)
    • It differs from a plain text type because it preserves order in plots, helps with storage and indexing, and enforces the set of possible values
    • This enforcement can cause surprising behavior when an unknown value appears. R converts such a value to NA, but Python/pandas does not have a single standard behavior.
  • In data science, each row is called a sample, but it can also be called a record or an observation
  • For a statistician, a sample can be a collection of rows
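The type distinctions above can be sketched with a small pandas DataFrame. The column names and values are hypothetical, chosen only so that each column matches one type from the list: a continuous number, a discrete count, an ordinal category, and a binary flag. Declaring the ordinal column as an ordered `pd.Categorical` is what lets it sort by its declared order rather than alphabetically.

```python
import pandas as pd

# Hypothetical toy dataset: one column per data type from the note
df = pd.DataFrame({
    "height_m": [1.72, 1.65, 1.80],        # numerical, continuous
    "num_children": [0, 2, 1],             # numerical, discrete
    "size": ["small", "large", "medium"],  # categorical, ordinal
    "smoker": [True, False, False],        # categorical, binary
})

# Ordinal: declare the allowed values and their order explicitly
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

print(df.dtypes)

# Ordered categories sort by category order (small < medium < large), not alphabetically
print(df.sort_values("size")["size"].tolist())

# Ordered categories also support comparisons against a known category
print((df["size"] > "small").tolist())
```

Without `ordered=True` (or with a plain string column), sorting would fall back to alphabetical order, which would put "large" before "medium" and "small".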
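One way to see the inconsistency in pandas mentioned above: the same unknown value is treated differently depending on the operation. The sketch below, with hypothetical color values, shows that building a `pd.Categorical` silently turns an unknown value into NaN (similar to R's NA), while assigning an unknown value into an existing categorical raises an error. The exact exception type has changed between pandas versions, so both `TypeError` and `ValueError` are caught here.

```python
import pandas as pd

allowed = ["red", "green", "blue"]  # hypothetical set of allowed categories

# Construction: values outside the declared categories are silently replaced by NaN
colors = pd.Categorical(["red", "purple", "blue"], categories=allowed)
print(colors.isna().tolist())

# Assignment: writing an unknown value raises instead of becoming NaN
rejected = False
try:
    colors[0] = "purple"
except (TypeError, ValueError) as err:  # exception type has varied across pandas versions
    rejected = True
    print("rejected:", err)

# The failed assignment leaves the original value in place
print(colors[0])
```

A common way to accept a new value is to add it to the categories first (for example with `Categorical.add_categories`) before assigning it.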