October 21, 2013

From counts to models

Almost all data science starts with counts[1]. How many people are clicking which box? How much time are people spending on a particular page?

Only after that stage does data science get complex (and more interesting).

There is a similar development from descriptive to inferential statistics in other sciences. Measures of central tendency (i.e., mean, median, and mode) and variance are first calculated, then models and model comparisons (e.g., regression and ANOVA) are applied.
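As a rough illustration of that progression, here is a minimal Python sketch that moves from counts, to descriptive statistics, to a simple fitted model. The click log, the `events` and `summary` names, and the `fit_line` helper are all invented for this example; nothing here comes from a real dataset, and the least-squares fit stands in for the broader family of models (regression, ANOVA) mentioned above.

```python
from collections import Counter
from statistics import mean, median, mode, variance

# Hypothetical click log: (box clicked, seconds on page). Invented data.
events = [
    ("signup", 30), ("signup", 45), ("buy", 60),
    ("signup", 30), ("buy", 90), ("help", 15),
]

# Stage 1: counts. How many people are clicking which box?
clicks = Counter(box for box, _ in events)

# Stage 2: descriptive statistics on time spent.
seconds = [s for _, s in events]
summary = {
    "mean": mean(seconds),
    "median": median(seconds),
    "mode": mode(seconds),
    "variance": variance(seconds),  # sample variance (n - 1 denominator)
}


# Stage 3: a model. Ordinary least squares for y = intercept + slope * x.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    print(clicks.most_common())
    print(summary)
    # Hypothetical weekly data: posts published vs. signups received.
    print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

The first two stages only describe what happened. The third is where a choice can be made from the data, for example by comparing the fitted slope across campaigns.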

Social media is just now entering the count stage. Most businesses have a social media presence (a binary: yes or no). They have reached a critical mass of data and are starting to organize it, mostly as counts and sums. Very few organizations are thinking about moving to the social media strategy stage (model comparisons: making choices based on data).

Many problems have followed the same pattern. Today's data is larger in scale, but the analysis goes through the same stages, and the same class of solutions can be applied at each stage.


  1. A data scientist or statistician is usually brought in after data collection is well under way and there is a realization of its potential value. Collecting the best data in the right format probably did not happen. It is the classic, “I should have been at a much earlier meeting.”
