README.md 3.4 KB
Newer Older
1
2
TSHISTORY
===========
Aurélien Campéas's avatar
Aurélien Campéas committed
3
4
5
6

This is a library to store/retrieve pandas timeseries to/from a
postgres database, tracking their successive versions.

7
8
[TOC]

Aurélien Campéas's avatar
Aurélien Campéas committed
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78

# Principles

There are many ways to represent timeseries in a relational database,
and `tshistory` provides two things:

* a base python API which abstracts away the underlying storage

* a postgres model, which uses JSONB fields to store the bulk of the
  series data.

The core idea of tshistory is to handle successive versions of
timeseries as they grow in time, allowing to get older states of any
series.

Series state can be indexed by either a timestamp (which typically
matches the moment a new insertion took place) or a `changeset id`
which denotes the exact change leading to a given version.


# Basic usage


Complete usage is shown in the test suite. However here's a simple
example:

```python
 from datetime import datetime
 from sqlalchemy import create_engine
 import pandas as pd
 from tshistory.tsio import TimeSerie

 engine = create_engine('postgres://me:password@localhost/mydb')
 tsh = TimeSerie()

 serie = pd.Series([1, 2, 3],
                  pd.date_range(start=datetime(2017, 1, 1),
                                freq='D', periods=3))
 # db insertion
 tsh.insert(engine, serie, 'my_serie', 'aurelien@campeas.fr')

 # db retrieval
 assert tsh.get(engine, 'my_serie') == serie
```

This is good. Now, let's add more:

```python
 serie = pd.Series([7, 8, 9],
                  pd.date_range(start=datetime(2017, 1, 3),
                                freq='D', periods=3))
 # db insertion
 tsh.insert(engine, serie, 'my_serie', 'aurelien@campeas.fr')

 # db retrieval
 stored = tsh.get(engine, 'my_serie')

 assert """
2017-01-01    1
2017-01-02    2
2017-01-03    7
2017-01-04    8
2017-01-04    9
Freq: D
""".strip() == stored.to_string().strip()
```

It is important to note that the third value was replaced, and the two
last values were just appended.

79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145

# Command line operations


A command line tool is provided, called `tsh`. It provides its usage
guidelines:

```shell
 $ tsh --help
 Usage: tsh [OPTIONS] COMMAND [ARGS]...

 Options:
   --help  Show this message and exit.

 Commands:
   info
   log
```

It currently provides two basic operations `info` and `log`.

`Info` provides an overview of the time series repository (number of
committed changes, number and series and their names).

```shell
 $ tsh info postgres://babar:babarpassword@dataserver:5432/banana_studies
 changeset count: 209
 series count:    144
 series names:    banana_spot_price, banana_trades, banana_turnover
```

`Log` provides the full history of editions to time series in the
repository.

```shell
 $ tsh log postgres://babar:babar@dataserver:5432/banana_studies --limit 3
 revision: 206
 author:   BABAR
 date:     2017-06-06 15:32:51.502507
 series:   banana_spot_price

 revision: 207
 author:   BABAR
 date:     2017-06-06 15:32:51.676507
 series:   banana_trades

 revision: 209
 author:   CELESTE
 date:     2017-06-06 15:32:51.977507
 series:   banana_turnover
```

All options of all commands can be obtained by using the `--help`
switch:

```shell
 $ tsh log --help
 Usage: tsh log [OPTIONS] DB_URI

 Options:
   -l, --limit TEXT
   --show-diff
   -s, --serie TEXT
   --from-rev TEXT
   --to-rev TEXT
   --help            Show this message and exit.
```