- 05 Jun, 2019 3 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
This is because of history/diffmode.
-
Aurélien Campéas authored
-
- 20 May, 2019 3 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
It will help to seamlessly assemble queries out of:
* plain string
* query parameters
* pre-built `sqlp` string + parameters elements
-
Aurélien Campéas authored
`sqlp` carries an sql string fragment plus its needed parameters
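
As an illustration of the two messages above, here is a minimal sketch of how an SQL fragment carrying its own parameters could be combined with plain strings and loose parameters. The `Fragment` class stands in for `sqlp`, and the `Query` class, its `add` and `render` methods, and the table and column names are all hypothetical; the actual implementation may look quite different.

```python
class Fragment:
    """SQL text plus the named parameters it refers to (stand-in for `sqlp`)."""

    def __init__(self, sql, **params):
        self.sql = sql
        self.params = params


class Query:
    """Hypothetical assembler: accumulates SQL pieces and their parameters."""

    def __init__(self):
        self.pieces = []
        self.params = {}

    def add(self, *items, **params):
        for item in items:
            if isinstance(item, Fragment):
                # a pre-built fragment brings its own parameters along
                self.pieces.append(item.sql)
                self._merge(item.params)
            else:
                # a plain string is taken as-is
                self.pieces.append(item)
        self._merge(params)
        return self

    def _merge(self, params):
        for name, value in params.items():
            # refuse to silently overwrite a parameter with another value
            if name in self.params and self.params[name] != value:
                raise ValueError(f'conflicting values for parameter {name!r}')
            self.params[name] = value

    def render(self):
        # the pair can be handed to a DB-API cursor.execute(sql, params)
        return ' '.join(self.pieces), dict(self.params)


recent = Fragment('and id > %(minid)s', minid=10)
query = (
    Query()
    .add('select id from registry')
    .add('where name = %(name)s', name='temperature')
    .add(recent)
)
sql, params = query.render()
```

Keeping each fragment next to its parameters means a filter can be built in one place and reused in several queries without the caller having to remember which names it binds.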
-
- 10 May, 2019 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
Since supervision uses one, we'd better do this right. It was lost in the transition from sqlalchemy to plain sql.
-
- 09 May, 2019 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
These orphans are created when strip is invoked. Next, strip will use it.
-
- 06 May, 2019 1 commit
-
-
Aurélien Campéas authored
While doing so we also get rid of:
* old migrations
* the rename utility
-
- 05 Dec, 2018 1 commit
-
-
Arnaud Campeas authored
-
- 14 Sep, 2018 1 commit
-
-
Aurélien Campéas authored
We must make sure we serialize a `contiguous` C array. In the case of a sorted numpy array, the underlying C array must be rebuilt.
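
A minimal sketch of the idea, assuming numpy is available: a reordered array can be a strided view over the original buffer, so it is compacted before its bytes are written out. The `serialize` and `deserialize` helpers are hypothetical names, and a reversed view stands in here for the sorted array mentioned above.

```python
import numpy as np


def serialize(values):
    """Return (dtype string, raw bytes) for a 1-D numeric array."""
    # returns the input unchanged if already contiguous, a compact copy otherwise
    arr = np.ascontiguousarray(values)
    return arr.dtype.str, arr.tobytes()


def deserialize(dtype, raw):
    """Rebuild a read-only array over the raw bytes, without parsing."""
    return np.frombuffer(raw, dtype=dtype)


values = np.array([3.0, 1.0, 2.0])
reordered = values[::-1]          # a view with a negative stride
was_contiguous = bool(reordered.flags['C_CONTIGUOUS'])

dtype, raw = serialize(reordered)
restored = deserialize(dtype, raw)
```

Recording the dtype string alongside the bytes is what lets the reader rebuild the array without guessing its element size or byte order.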
-
- 22 Aug, 2018 1 commit
-
-
Aurélien Campéas authored
-
- 03 Sep, 2018 1 commit
-
-
Aurélien Campéas authored
Json serialization is replaced with a more low-level scheme, affecting both string and numeric series.

Purpose is to drop the cost of de-serialization, which is currently quite high.

For numerical values, we serialize the underlying C array (while recording the in-memory layout/dtype).

Perf improvement on the reading phase is quite worthwhile:

Before:

    TSH GET                0.005136966705322266
    TSH HIST               0.5647647380828857
    DELTA all value dates  2.0582079887390137
    DELTA 1 day            0.20743083953857422

       class      test                      time
    0  TimeSerie  bigdata_insert            1.332391
    1  TimeSerie  bigdata_history_all       1.718589
    2  TimeSerie  bigdata_history_chunks    1.613754
    3  TimeSerie  manydiffs_insert          0.940170
    4  TimeSerie  manydiffs_history_all     0.996268
    5  TimeSerie  manydiffs_history_chunks  2.115351

After:

    TSH GET                0.004252910614013672
    TSH HIST               0.11956286430358887
    DELTA all value dates  1.7346818447113037
    DELTA 1 day            0.16817998886108398

       class      test                      time
    0  TimeSerie  bigdata_insert            1.297348
    1  TimeSerie  bigdata_history_all       0.173700
    2  TimeSerie  bigdata_history_chunks    0.181005
    3  TimeSerie  manydiffs_insert          0.846298
    4  TimeSerie  manydiffs_history_all     0.084483
    5  TimeSerie  manydiffs_history_chunks  0.216825

A few notes:

* serialization of strings is a bit tricky since we need to encode None/nans in its serialization and have a separator for their concatenation (we forbid ascii control characters 0 and 3 to be ever used)
* we have to wrap the `index` low-level bytes string into a python array to work around an obscure pandas bug in index.isin computation (isin is attempting a mutation !)

Thanks to Alain Leufroy for the proposal !

Resolves #49.
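
The note on string serialization can be sketched as follows. This is only an illustration under assumptions: the message says control characters 0 and 3 are reserved but not which plays which role, so using `\x00` as the separator and `\x03` as the None marker is a guess, the function names are hypothetical, and NaN handling is left out.

```python
SEPARATOR = '\x00'    # assumed role: joins consecutive values
NONE_MARKER = '\x03'  # assumed role: stands for a missing value


def serialize_strings(values):
    """Concatenate strings (or None) into a single utf-8 bytes string."""
    for value in values:
        if value is not None and (SEPARATOR in value or NONE_MARKER in value):
            raise ValueError('reserved control character in value')
    return SEPARATOR.join(
        NONE_MARKER if value is None else value
        for value in values
    ).encode('utf-8')


def deserialize_strings(raw):
    """Inverse of serialize_strings for non-empty inputs."""
    # caveat of this sketch: [] and [''] both serialize to b'',
    # so the element count would have to be stored elsewhere
    return [
        None if item == NONE_MARKER else item
        for item in raw.decode('utf-8').split(SEPARATOR)
    ]


raw = serialize_strings(['a', None, '', 'b'])
restored = deserialize_strings(raw)
```

Forbidding the two reserved characters in values is what makes a plain `split` sufficient on the reading side, with no escaping to undo.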
-
- 19 Jul, 2018 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
While the read_json ensures a good final order, inspection showed it was in reverse of the wanted order.
-
- 06 Jul, 2018 2 commits
-
-
Aurélien Campéas authored
It will be possible to create a nameless postgresql index (postgres will handle the naming by itself). This is what we now do for series and snapshot tables.
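
For context, PostgreSQL accepts `CREATE INDEX` without an index name and then picks one itself. The helper below is a hypothetical sketch that only builds such a statement as text; the function name, schema, table and column are made up for the example, and the project presumably goes through its own schema layer instead.

```python
def create_index_sql(table, columns, schema=None):
    """Build a CREATE INDEX statement that leaves the naming to postgres."""

    def quote(identifier):
        # standard SQL identifier quoting: double any embedded double quote
        return '"' + identifier.replace('"', '""') + '"'

    target = quote(table)
    if schema is not None:
        target = f'{quote(schema)}.{target}'
    column_list = ', '.join(quote(column) for column in columns)
    # no name between INDEX and ON: the server generates one
    return f'CREATE INDEX ON {target} ({column_list})'


statement = create_index_sql('snapshot', ['cstart'], schema='tsh')
```

Letting the server name the index avoids having to derive a unique, length-limited name from the table name on the client side.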
-
Aurélien Campéas authored
We were using `extend_existing` instead of `keep_existing`.
-
- 04 Jul, 2018 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
Let's look at the test_perf outputs (relevant ones):

BEFORE:

    AVG (3) TSH HIST 2.49

       class      test                                time
    4  TimeSerie  manydiffs_history_all               3.43
    5  TimeSerie  manydiffs_history_chunks            7.52
    6  TimeSerie  manydiffs_history_chunks_valuedate  0.99

AFTER:

    AVG (3) TSH HIST 1.41

       class      test                                time
    4  TimeSerie  manydiffs_history_all               3.04
    5  TimeSerie  manydiffs_history_chunks            6.40
    6  TimeSerie  manydiffs_history_chunks_valuedate  1.32

We get better numbers for the common bulky operations. The last item would be now better served with a staircase query (if it has any meaning at all).

Closes #44.
-
- 27 Jun, 2018 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
-
- 26 Jun, 2018 1 commit
- 25 Jun, 2018 1 commit
-
-
Aurélien Campéas authored
It is a kind of premature optimization and should not be baked into the store as is. We can keep open the possibility to coalesce many small chunks together, but this should come as an explicit extra optimisation that could be dropped without loss.
-
- 13 Jun, 2018 1 commit
-
-
Aurélien Campéas authored
This is needed by the tshistory_editor package.
-
- 08 Jun, 2018 1 commit
-
-
Aurélien Campéas authored
This is necessary for proper series renaming. Some latent bugs wrt name handling were fixed.

Noteworthy:
* the registry tablename stores the unqualified name (no namespace)
* we have more local caches to mitigate the cost of the many small queries
-
- 06 Jun, 2018 1 commit
-
-
Aurélien Campéas authored
For usage clarity.
-
- 04 Jun, 2018 1 commit
-
-
Aurélien Campéas authored
The postgres table name limit should not creep up.
-
- 18 May, 2018 1 commit
-
-
Aurélien Campéas authored
-
- 05 May, 2018 2 commits
-
-
Aurélien Campéas authored
Idea is to move de-serialisation costs one level and work a bit with the low-level json byte strings before the expensive calls to to_json.

Next step would be considering the opportunity to entirely *drop* the diff representation.

Related to #32.

Perf notes:
* we're still inserting at twice the rate of pandas series .to_sql
* reading 10 times faster than the plain sql representation

and all this with small max bucket size (25), with a lot of effective data sharing (iow: non-redundancy) as shown by the (small) `noparent` metric. However the impact on bigdata_insert would be bad (up to 10 seconds, compared to the 1.5 secs at 250).

    max_bucket_size, snap_size, noparent, time : 25 297422 20 9.39448356628418
    sql insert 20.521775722503662
    SQL GET    0.011615991592407227
    TSH GET    0.0010657310485839844

       class      diffsize  test                                time
    0  TimeSerie  930659.0  bigdata_insert                      1.495404
    1  TimeSerie  NaN       bigdata_history_all                 4.703245
    2  TimeSerie  NaN       bigdata_history_chunks              4.151413
    3  TimeSerie  58710.0   manydiffs_insert                    0.985946
    4  TimeSerie  NaN       manydiffs_history_all               3.153212
    5  TimeSerie  NaN       manydiffs_history_chunks            6.816579
    6  TimeSerie  NaN       manydiffs_history_chunks_valuedate  0.663319
-
Aurélien Campéas authored
We now maintain a robust snapshot at each changeset, so this is unneeded.
-
- 04 May, 2018 1 commit
-
-
Aurélien Campéas authored
The forecast insertion scenario looks good performance-wise; the read performance in particular is quite nice.

    max_bucket_size, snap_size, noparent, time : 250 501867 199 9.453744649887085
    sql insert 20.50874090194702
    SQL GET    0.012586355209350586
    TSH GET    0.0010492801666259766
-
- 18 May, 2018 1 commit
-
-
Aurélien Campéas authored
We now detect the append situation and don't rewrite the head chunk, thus maintaining good data sharing. We introduce a `_min_bucket_size` to avoid construction of too small chunks (at the cost of some storage redundancy). Completes #32.
-
- 03 May, 2018 1 commit
-
-
Aurélien Campéas authored
-
- 12 Apr, 2018 1 commit
-
-
Aurélien Campéas authored
This should provide a significant speed bonus for many common operations.

Notes below about this commit contents:

* conftest: more robust cleanup at test startup time
  When debugging, we might have got phantom inserts of a previous session because of an unclean exit.
* tsio: remove customization entry point
  It was not a good idea.
* tsio, snapshot: cache the sqlalchemy Table objects
  It turns out these are very expensive to instantiate, and we do that a lot.
* tests/perf: benchmark a forecast-like insertion
* tsio: slight optimisation in _create

Resolves #32.
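
The caching of expensive table objects mentioned above can be sketched in a generic way. This is an assumption-laden illustration, not the actual tsio or snapshot code: `TableCache`, `build_table` and the `('tsh', 'snapshot')` key are hypothetical, and a plain dict stands in for the sqlalchemy Table object.

```python
class TableCache:
    """Build each table object once per (namespace, name) and reuse it."""

    def __init__(self, build):
        self._build = build    # the expensive constructor
        self._tables = {}      # (namespace, name) -> table object

    def get(self, namespace, name):
        key = (namespace, name)
        if key not in self._tables:
            self._tables[key] = self._build(namespace, name)
        return self._tables[key]


calls = []


def build_table(namespace, name):
    # stand-in for the costly instantiation; records each real build
    calls.append((namespace, name))
    return {'schema': namespace, 'name': name}


cache = TableCache(build_table)
first = cache.get('tsh', 'snapshot')
second = cache.get('tsh', 'snapshot')   # served from the cache
```

Keying on the namespace as well as the name keeps tables with the same name in different schemas apart.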
-
- 15 Mar, 2018 2 commits
-
-
Aurélien Campéas authored
-
Aurélien Campéas authored
-
- 14 Feb, 2018 1 commit
-
-
Aurélien Campéas authored
We extract the perf-related tests to their own module and temporarily disable them. We'll come back there when the internal structure is stabilized. Related to #32.
-