You’ll constantly add more data to track

Tracking new data often involves adding more columns. It’s not that this is inherently bad, but once the number of columns reaches a certain point, it becomes inconvenient and more difficult to keep track of all of them.

It also uses more memory when loading.

After I noticed the fifth cached_stat field in the product model, I started moving them to a separate state model that offered a better architecture.

So, you can make a separate model for the stats

I made a model that had the following structure:

  • parent:string: timestamp or date, for instance ‘2020-01-01’
  • parsed_parent:datetime: the same thing but in the datetime format that helps query ranges
  • resolution:string: the range of the parent, hour, day, week, or month. It indicates for what period it applies. Helpful for combining multiple entries into cumulative ones.
  • name:string: name of the state
  • value:decimal(15, 4): the value
  • statable_id:integer: the ID of the entry that owns the stat, as part of a Rails polymorphic relationship
  • statable_type:string: the class name of the entry that owns the stat

Then I made scopes to make querying stats more human friendly. In the end could do something like this:

product.stats.called('sales').per('day').between(Time.now.at_beginning_of_month, Time.now.at_end_of_month)

It’s extensible, doesn’t pollute the main model, serves as a cache

It is much easier to track new stats this way.

The main models are also cleaner since the stat columns are no longer being added.

In case you stop tracking a stat, you also won’t have any abandoned columns.

It also functions as a model for caching intermediate steps. At the end of the month, all the stats with day resolution are combined into one with month resolution.

It also gives a good base for exporting because queries can run on this model only.

I also duplicated some operational stats so that queries could run on this model rather than the operational one. For instance, I copied the quantities ordered from order items to the stats.

But look out for performance degradation

The one thing to look out for is the performance. Eventually, this table can get very large the queries can start getting slow.

Two patterns can mitigate this. First, separate the stats into different tables. For instance, I moved all stats called view into a separate table. Secondly, you can also horizontally shard it. It’s also prudent to run it on a separate database.

Logging, of course, needs to be done in the background and not on the main thread.