  1. 05 Jun, 2019 3 commits
  2. 20 May, 2019 3 commits
  3. 10 May, 2019 2 commits
  4. 09 May, 2019 2 commits
  5. 06 May, 2019 1 commit
  6. 05 Dec, 2018 1 commit
  7. 14 Sep, 2018 1 commit
  8. 22 Aug, 2018 1 commit
  9. 03 Sep, 2018 1 commit
    • snapshot/storage: low-level optimisation · dbce79810ccf
      Aurélien Campéas authored
      JSON serialization is replaced with a lower-level scheme,
      affecting both string and numeric series.
      
      The purpose is to cut the cost of deserialization, which is
      currently quite high.
      
      For numerical values, we serialize the underlying C array
      (while recording the in-memory layout/dtype).
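
      A minimal sketch of the idea (hypothetical helper names, not the
      actual tshistory code):

          import numpy as np

          def serialize_numeric(arr):
              # ship the raw C buffer plus the dtype needed to rebuild it;
              # no per-value encoding or parsing involved
              return arr.dtype.str, arr.tobytes()

          def deserialize_numeric(dtype_str, payload):
              # bytearray keeps the buffer writable: np.frombuffer over
              # immutable bytes yields a read-only array, which trips up
              # some pandas operations (see the notes below)
              return np.frombuffer(bytearray(payload), dtype=np.dtype(dtype_str))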
      
      The performance improvement on the read phase is quite worthwhile:
      
      Before:
      
      TSH GET 0.005136966705322266
      TSH HIST 0.5647647380828857
      DELTA all value dates 2.0582079887390137
      DELTA 1 day  0.20743083953857422
      
             class                      test      time
      0  TimeSerie            bigdata_insert  1.332391
      1  TimeSerie       bigdata_history_all  1.718589
      2  TimeSerie    bigdata_history_chunks  1.613754
      3  TimeSerie          manydiffs_insert  0.940170
      4  TimeSerie     manydiffs_history_all  0.996268
      5  TimeSerie  manydiffs_history_chunks  2.115351
      
      After:
      
      TSH GET 0.004252910614013672
      TSH HIST 0.11956286430358887
      DELTA all value dates 1.7346818447113037
      DELTA 1 day  0.16817998886108398
      
             class                      test      time
      0  TimeSerie            bigdata_insert  1.297348
      1  TimeSerie       bigdata_history_all  0.173700
      2  TimeSerie    bigdata_history_chunks  0.181005
      3  TimeSerie          manydiffs_insert  0.846298
      4  TimeSerie     manydiffs_history_all  0.084483
      5  TimeSerie  manydiffs_history_chunks  0.216825
      
      
      A few notes:
      
      * serialization of strings is a bit tricky, since we need to
        encode None/NaN values in the serialized form and need a
        separator for the concatenation (ASCII control characters
        0 and 3 are forbidden from ever appearing in the data); see
        the sketch after these notes
      
      * we have to wrap the `index` low-level byte string in a Python
        array to work around an obscure pandas bug in the index.isin
        computation (isin attempts a mutation!)
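
      A minimal sketch of the string scheme (hypothetical helper names,
      with chr(0) as the separator and chr(3) as the None/NaN marker,
      i.e. the two reserved control characters):

          SEP = '\x00'      # reserved: separates concatenated values
          MISSING = '\x03'  # reserved: stands for None/NaN

          def serialize_strings(values):
              return SEP.join(
                  MISSING if v is None or v != v else v  # v != v catches NaN
                  for v in values
              ).encode('utf-8')

          def deserialize_strings(payload):
              return [
                  None if chunk == MISSING else chunk
                  for chunk in payload.decode('utf-8').split(SEP)
              ]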
      
      
      Thanks to Alain Leufroy for the proposal!
      
      Resolves #49.
  10. 19 Jul, 2018 2 commits
  11. 06 Jul, 2018 2 commits
  12. 04 Jul, 2018 2 commits
  13. 27 Jun, 2018 2 commits
  14. 26 Jun, 2018 1 commit
  15. 25 Jun, 2018 1 commit
    • snapshot: drop the idea of a min bucket size · 33a47a857023
      Aurélien Campéas authored
      It is a kind of premature optimisation and should not be baked into
      the store as is. We can keep open the possibility of coalescing
      many small chunks together, but that should come as an explicit
      extra optimisation that could be dropped without loss.
  16. 13 Jun, 2018 1 commit
  17. 08 Jun, 2018 1 commit
  18. 06 Jun, 2018 1 commit
  19. 04 Jun, 2018 1 commit
  20. 18 May, 2018 1 commit
  21. 05 May, 2018 2 commits
    • snapshot: last mile optimisation, and we're getting good · d9d9c56a3646
      Aurélien Campéas authored
      The idea is to push deserialization costs down a level and to work
      a bit with the low-level JSON byte strings before the expensive
      calls to to_json.
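
      A purely illustrative sketch of that kind of byte-level work (not
      the actual tshistory code): splice serialized chunks together so
      that the expensive parse happens only once.

          import json

          def merge_json_objects(chunks):
              # each chunk is a serialized JSON object such as
              # b'{"2018-05-05T00:00:00": 1.0}'; strip the outer braces
              # and join the bodies, so json.loads runs a single time
              # (on duplicate keys, the later chunk wins)
              bodies = [c.strip()[1:-1] for c in chunks]
              return json.loads(b'{' + b','.join(b for b in bodies if b) + b'}')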
      
      The next step would be to consider entirely *dropping* the
      diff representation.
      
      Related to #32.
      
      Perf notes:
      
      * we still insert at twice the rate of pandas series .to_sql
      * we read 10 times faster than the plain SQL representation
      
      and all this with a small max bucket size (25) and a lot of effective
      data sharing (in other words: non-redundancy), as shown by the (small)
      `noparent` metric.
      
      However, the impact on bigdata_insert would be bad (up to 10 seconds,
      compared to 1.5 seconds at a bucket size of 250).
      
      
      max_bucket_size, snap_size, noparent, time :  25 297422 20 9.39448356628418
      sql insert 20.521775722503662
      SQL GET 0.011615991592407227
      TSH GET 0.0010657310485839844
      
             class  diffsize                                test      time
      0  TimeSerie  930659.0                      bigdata_insert  1.495404
      1  TimeSerie       NaN                 bigdata_history_all  4.703245
      2  TimeSerie       NaN              bigdata_history_chunks  4.151413
      3  TimeSerie   58710.0                    manydiffs_insert  0.985946
      4  TimeSerie       NaN               manydiffs_history_all  3.153212
      5  TimeSerie       NaN            manydiffs_history_chunks  6.816579
      6  TimeSerie       NaN  manydiffs_history_chunks_valuedate  0.663319
    • snapshot: definitely retire build_up_to · 16776ee24de9
      Aurélien Campéas authored
      We now maintain a robust snapshot at each changeset,
      so this is no longer needed.
  22. 04 May, 2018 1 commit
    • snapshot: bring down _max_bucket_size to 250 · 2082c0791eac
      Aurélien Campéas authored
      The forecast insertion scenario looks good performance-wise;
      the read performance in particular is quite nice.
      
      max_bucket_size, snap_size, noparent, time :  250 501867 199 9.453744649887085
      sql insert 20.50874090194702
      SQL GET 0.012586355209350586
      TSH GET 0.0010492801666259766
  23. 18 May, 2018 1 commit
  24. 03 May, 2018 1 commit
  25. 12 Apr, 2018 1 commit
    • tsio/snapshots: we now have chunked snapshots · ba11d01bcfd1
      Aurélien Campéas authored
      This should provide a significant speed bonus for many common
      operations; a conceptual sketch follows.
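
      A conceptual sketch only (hypothetical names, with an in-memory
      dict standing in for the store): a snapshot is kept as a chain of
      buckets, each holding at most a bucket-size worth of points and
      pointing at its parent, so successive versions can share
      unchanged buckets.

          BUCKETS = {}  # bucket id -> (parent id, payload)

          def store_snapshot(points, parent_id=None, max_bucket_size=250):
              # cut the new points into buckets chained onto the parent
              for i in range(0, len(points), max_bucket_size):
                  bucket_id = len(BUCKETS) + 1
                  BUCKETS[bucket_id] = (parent_id, points[i:i + max_bucket_size])
                  parent_id = bucket_id
              return parent_id  # head of the chain

          def read_snapshot(head_id):
              # walk the parent chain, then restore original order
              chunks = []
              while head_id is not None:
                  head_id, payload = BUCKETS[head_id]
                  chunks.append(payload)
              return [p for chunk in reversed(chunks) for p in chunk]
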
      Notes on the contents of this commit:
      
      * conftest: more robust cleanup at test startup time
        When debugging, we might see phantom inserts left over from a
        previous session after an unclean exit.
      
      * tsio: remove customization entry point
        It was not a good idea.
      
      * tsio, snapshot: cache the sqlalchemy Table objects
        It turns out these are very expensive to instantiate,
        and we do that a lot (a sketch follows these notes).
      
      * tests/perf: benchmark a forecast-like insertion
      
      * tsio: slight optimisation in _create
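
      A minimal sketch of such a cache (hypothetical names, using the
      modern SQLAlchemy reflection API):

          from sqlalchemy import MetaData, Table

          _table_cache = {}

          def get_table(engine, name):
              # reflecting a Table is expensive; build each one once
              # and reuse it for every subsequent operation
              if name not in _table_cache:
                  _table_cache[name] = Table(name, MetaData(),
                                             autoload_with=engine)
              return _table_cache[name]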
      
      
      Resolves #32.
  26. 15 Mar, 2018 2 commits
  27. 14 Feb, 2018 1 commit