You’ll constantly add more data to track
Tracking new data often involves adding more columns. It’s not that this is inherently bad, but once the number of columns reaches a certain point, it becomes inconvenient and more difficult to keep track of all of them.
It also uses more memory when loading.
After I noticed the fifth cached_stat
field in the product model, I started moving them to a separate state model that offered a better architecture.
So, you can make a separate model for the stats
I made a model that had the following structure:
parent:string
: timestamp or date, for instance ‘2020-01-01’parsed_parent:datetime
: the same thing but in the datetime format that helps query rangesresolution:string
: the range of the parent,hour
,day
,week
, ormonth
. It indicates for what period it applies. Helpful for combining multiple entries into cumulative ones.name:string
: name of the statevalue:decimal(15, 4)
: the valuestatable_id:integer
: the ID of the entry that owns the stat, as part of a Rails polymorphic relationshipstatable_type:string
: the class name of the entry that owns the stat
Then I made scopes to make querying stats more human friendly. In the end could do something like this:
product.stats.called('sales').per('day').between(Time.now.at_beginning_of_month, Time.now.at_end_of_month)
It’s extensible, doesn’t pollute the main model, serves as a cache
It is much easier to track new stats this way.
The main models are also cleaner since the stat columns are no longer being added.
In case you stop tracking a stat, you also won’t have any abandoned columns.
It also functions as a model for caching intermediate steps. At the end of the month, all the stats with day
resolution are combined into one with month
resolution.
It also gives a good base for exporting because queries can run on this model only.
I also duplicated some operational stats so that queries could run on this model rather than the operational one. For instance, I copied the quantities ordered from order items to the stats.
But look out for performance degradation
The one thing to look out for is the performance. Eventually, this table can get very large the queries can start getting slow.
Two patterns can mitigate this. First, separate the stats into different tables. For instance, I moved all stats called view
into a separate table. Secondly, you can also horizontally shard it. It’s also prudent to run it on a separate database.
Logging, of course, needs to be done in the background and not on the main thread.