caching - How do I ensure consistency of aggregates with high availability?
My team needs to find a solution to the following problem:
Our application allows users to view total sales for the enterprise, totals by product, totals by region, totals by region x product, totals by region x division, etc. You get the idea. There are many values that need to be aggregated, and many of the totals cannot be computed on the fly: we have to pre-aggregate them to provide decent response times, a process that takes 5 minutes.
The problem, which we thought was a common one but can find no references to, is how to allow updates to the various sales without shutting off the users. Also, users cannot accept eventual consistency: if they drill down on a total of 12, they had better see numbers that add up to 12. So we need consistency + availability.
The best solution we've come up with so far is to direct all queries to a redundant database, "b" (optimized for queries), while updates are directed to the primary database, "a". When we decide to spend the 5 minutes to update the aggregates, we update database "c", yet another redundant database like "b". Then, new user sessions are directed to "c", while existing user sessions continue to use "b". Eventually, after a warning to anyone left using "b", we kill the sessions on "b" and re-aggregate there, swapping the roles of "b" and "c". A typical drain-stop scenario.
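The drain-stop swap described above can be sketched as a small session router. This is only an illustration under stated assumptions: `Replica`, `DrainStopRouter`, and the integer `aggregates_version` are hypothetical names, and the real re-aggregation and session handling would live in the database and application layers rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    aggregates_version: int = 0               # which aggregation run this copy reflects
    sessions: set = field(default_factory=set)


class DrainStopRouter:
    """Pins each session to one replica so the totals it sees stay consistent."""

    def __init__(self, active: Replica, standby: Replica):
        self.active = active      # receives new sessions
        self.standby = standby    # draining, or idle and ready to be re-aggregated

    def open_session(self, session_id: str) -> Replica:
        self.active.sessions.add(session_id)
        return self.active

    def replica_for(self, session_id: str) -> Replica:
        # Existing sessions keep using the replica they started on.
        for replica in (self.active, self.standby):
            if session_id in replica.sessions:
                return replica
        raise KeyError(session_id)

    def kill_stragglers(self) -> list:
        # After the warning period, end whatever sessions remain on the standby.
        killed = sorted(self.standby.sessions)
        self.standby.sessions.clear()
        return killed

    def rebuild_standby(self, new_version: int) -> None:
        # Stand-in for the 5-minute re-aggregation; only safe on a drained replica.
        if self.standby.sessions:
            raise RuntimeError("standby still has sessions; drain it first")
        self.standby.aggregates_version = new_version

    def swap(self) -> None:
        # New sessions now land on the freshly aggregated replica.
        self.active, self.standby = self.standby, self.active
```

A session opened before `swap()` keeps reading the older aggregates until it ends or is killed, which is what keeps a drill-down consistent with the total the user started from.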
We are surprised that we cannot find any discussion of this, and we are concerned that we are over-engineering the problem, or maybe it's not the problem we think it is. Any advice is greatly appreciated.
This is an interesting problem that I thought about on the train, and I came up with the idea of storing a timestamp with each row in the database that you aggregate over. (I think this technique has a name, but it escapes me, and googling isn't finding it...)
The timestamp would indicate when the row was inserted. In addition:
-If rows can be updated, you would have 2 'versions' of a row at once, 1 more recent than the other.
-If rows can be deleted, there would need to be a 'deleted version' of the row that specifies when it was deleted.
Now you can do things such as:
1) Update the aggregates as of Jan 1 2000 midnight. You can then have views of the table that return the table's data as though it were Jan 1 2000 midnight, ignoring any inserts/updates/deletes more recent than that. The aggregates are up to date with the data in the view, and you can keep adding data to the underlying table.
2) I don't know how feasible this would be, or how easy it would be to guarantee it's reliable, but you could have 'differentially computed aggregates': on Jan 2 2000 midnight, take the aggregates of Jan 1 2000 midnight and update them with the data that has changed since that time, saving you from recomputing the historical data. (Of course, this gets hairier once you consider rows being updated or deleted that are older than 24 hours.)
3) Whenever you bring the aggregates up to date, you can merge the updated and deleted rows with their older versions and get rid of the older versions, so you only have to keep duplicates of rows around when you need them to separate rows that have been aggregated from rows that haven't. (This means that, for instance, if the aggregates run only once in a while and you update a row 3 times in quick succession, you only need to keep the most recent update-indicating row.)
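The three ideas above can be sketched in memory as follows. Everything here is an assumption made for illustration: `SaleVersion`, the integer `recorded_at` timestamps, and the function names are hypothetical, and a real system would express the snapshot as a database view and look up prior versions through an index rather than by scanning.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SaleVersion:
    sale_id: int
    region: str
    amount: Optional[int]   # None marks a 'deleted version' of the row
    recorded_at: int        # when this version was written


def snapshot(versions, as_of):
    """Latest version of each sale written at or before as_of, minus deletions."""
    latest = {}
    for v in sorted(versions, key=lambda v: v.recorded_at):
        if v.recorded_at <= as_of:
            latest[v.sale_id] = v
    return {sid: v for sid, v in latest.items() if v.amount is not None}


def totals_by_region(versions, as_of):
    """Idea 1: aggregate over the table as it looked at the cutoff."""
    totals = defaultdict(int)
    for v in snapshot(versions, as_of).values():
        totals[v.region] += v.amount
    return dict(totals)


def advance_totals(totals, versions, old_cutoff, new_cutoff):
    """Idea 2: adjust existing totals using only sales changed in (old, new]."""
    before = snapshot(versions, old_cutoff)
    after = snapshot(versions, new_cutoff)
    changed = {v.sale_id for v in versions
               if old_cutoff < v.recorded_at <= new_cutoff}
    updated = defaultdict(int, totals)
    for sid in changed:
        if sid in before:   # back out the value that was aggregated earlier
            updated[before[sid].region] -= before[sid].amount
        if sid in after:    # add the current value, unless the sale was deleted
            updated[after[sid].region] += after[sid].amount
    return dict(updated)


def compact(versions, cutoff):
    """Idea 3: keep one surviving version per sale up to cutoff, plus newer ones."""
    kept = list(snapshot(versions, cutoff).values())
    kept += [v for v in versions if v.recorded_at > cutoff]
    return kept
```

Readers query `totals_by_region` at the last published cutoff while writers keep appending versions. Note that `advance_totals` leaves a region at 0 rather than removing it when all of its sales disappear, and that compacting at a cutoff means snapshots earlier than that cutoff can no longer be reconstructed.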