Note: I asked a similar question previously, but I was not clear enough about what I was looking for, and I marked an answer too aggressively. I am looking for a confirmed yes/no on a specific point.
I want to build an automated job that performs offline processing on DocumentDB documents by querying DocumentDB on a schedule, looking for documents that have changed since the last time the check was performed.
Given the metadata available in DocumentDB, it looks like the way to do this is the following:
- The first time the process runs, retrieve all documents.
- Store the largest _ts value in the result set as the high-water mark, along with the ids and etags of the documents that have that particular _ts value.
- For each subsequent query, include a "WHERE _ts >= highwatermark" clause. Filter out any previously recorded documents whose etags have not changed. The result is the set of changes since the last time the query ran.
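The steps above can be sketched as follows. This is a minimal illustration only: it runs against an in-memory list of dictionaries rather than the DocumentDB SDK, and the names `ChangePoller`, `poll`, `high_water_mark` and `seen_at_mark` are made up for the example. It assumes each document carries `id`, `_ts` and `_etag`, and that the etag changes on every update.

```python
class ChangePoller:
    """Tracks a _ts high-water mark plus the etags recorded at that mark."""

    def __init__(self):
        self.high_water_mark = None  # largest _ts seen so far
        self.seen_at_mark = {}       # id -> etag for docs whose _ts equals the mark

    def poll(self, collection):
        # First run: take everything. Later runs: the equivalent of
        # "WHERE _ts >= highwatermark" (simulated here with a list filter).
        if self.high_water_mark is None:
            candidates = list(collection)
        else:
            candidates = [d for d in collection
                          if d["_ts"] >= self.high_water_mark]

        # Drop documents already recorded at the mark whose etag is unchanged.
        changes = [d for d in candidates
                   if self.seen_at_mark.get(d["id"]) != d["_etag"]]

        # Advance the mark and remember every document sitting exactly on it.
        if candidates:
            self.high_water_mark = max(d["_ts"] for d in candidates)
            self.seen_at_mark = {d["id"]: d["_etag"] for d in candidates
                                 if d["_ts"] == self.high_water_mark}
        return changes


# Hypothetical data, for illustration only.
docs = [
    {"id": "a", "_ts": 1, "_etag": "e1"},
    {"id": "b", "_ts": 2, "_etag": "e2"},
]
poller = ChangePoller()
first_run = poller.poll(docs)    # both documents
second_run = poller.poll(docs)   # nothing has changed
```

Using `>=` together with the etag check is what lets the sketch pick up a second update that lands on the same _ts value as the current mark. It says nothing about whether the service can later expose a document with a _ts below the mark, which is the actual question.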
My question is: is this guaranteed to work? Is it guaranteed not to miss documents? As far as I can tell, it comes down to the transactional semantics around _ts within DocumentDB's implementation, which are not documented at this level of detail. I want to know whether it is guaranteed that no document can be updated with a _ts value lower than the largest _ts value returned by a query that returns the most recently changed document in the collection.
Edit, prompted by David's comment:
To be a little more precise, here are a couple of specific scenarios:
- If updates to two documents, d0 and d1, are applied to the database at t0 and t1 (where t1 > t0, such that an arbitrary query may return d0 but not d1), is it possible that d0._ts > d1._ts? The use of strictly-greater-than is intentional, since my proposed implementation deals with multiple updates receiving the same _ts and only some of them being retrieved by a query.
- Assume I execute my implementation's query at time t0, and the query takes a long time to run, and/or requires a couple of ExecuteNextAsync() calls to pull multiple batches from the server. During that period, two different documents (d1 and d2) are updated, getting _ts values of t1 and t2 (where t1 < t2). Is it possible for d2 to appear in the result set? More importantly, if it does, is d1 guaranteed to be included?
With the default consistency level this is not guaranteed to work, because a document with a lower _ts can show up later. However, if you can guarantee that update requests are far enough apart (say 60 seconds), the risk is low.
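To make the failure mode concrete, here is a small simulation of a document with a lower _ts becoming visible after the high-water mark has already moved past it. It is a sketch over in-memory data with hypothetical ids and timestamps, not a reproduction of actual DocumentDB behaviour; whether the service can really expose documents in this order depends on the consistency level in use.

```python
def changed_since(visible_docs, high_water_mark):
    """Return the documents whose _ts is at or above the high-water mark."""
    return [d for d in visible_docs if d["_ts"] >= high_water_mark]


# Poll 1: the reader sees d0 and d2. d1 was written with _ts=101 but is
# assumed (hypothetically) not to be visible to this reader yet.
visible = [{"id": "d0", "_ts": 100}, {"id": "d2", "_ts": 102}]
processed = {d["id"] for d in visible}
high_water_mark = max(d["_ts"] for d in visible)

# Poll 2: d1 has become visible, but its _ts is below the mark,
# so the ">= highwatermark" filter never returns it.
visible.append({"id": "d1", "_ts": 101})
processed |= {d["id"] for d in changed_since(visible, high_water_mark)}

missed = sorted(d["id"] for d in visible if d["id"] not in processed)
print(missed)
```

In this simulation d1 is never processed, which is the gap the answer describes. Spacing updates far apart only makes that ordering less likely.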
I don't think David's edge case is anything to worry about, as long as you treat every document with a higher _ts as new.
You might want to consider an append-only approach using Richard Snodgrass' temporal model. It makes idempotency semantics easier.