@@ -37,6 +37,14 @@ The supervision and failover relies on the contributions of
4. *ca-proxies* do not store any state in memory, so they can be restarted at any time without losing information, and failures have limited consequences, ideally affecting only the operation in progress when the failure occurs.
## Database
Each database table is discussed in the documentation of the project that (mainly) accesses it in *write* mode.
For example, the *register* table is principally managed by the [caserver-proxy](https://gitlab.elettra.eu/puma/server/caserver-proxy)
and is described in its *README.md* project file [here](https://gitlab.elettra.eu/puma/server/caserver-proxy#database),
while the *activity* table is illustrated in this document [here](#activity-table).
## ca-proxy tasks
The [ca-proxy](https://gitlab.elettra.eu/puma/server/caserver-proxy) receives requests from clients.
...
...
@@ -154,7 +162,114 @@ more required.
## <a name="casup-db-tables"></a>Database tables
### <a name="activity-table"></a>Running activities and recovery-related tables
#### 1. The activity table
The *activity* table is written by the ca-supervisor.
Every *subscribe* and *unsubscribe* message (methods *"s"*, *"S"* and *"u"*) received from the clients by the
[caserver-proxy](https://gitlab.elettra.eu/puma/server/caserver-proxy) is forwarded to the *ca-supervisor*,
which respectively adds or removes the corresponding row in the *activity* table.
The *activity* table is a *snapshot* of the *sources* being monitored at a given time, showing
which [caserver-proxy](https://gitlab.elettra.eu/puma/server/caserver-proxy) has registered them,
which client is monitoring them and on what *[nchan] channel*.
```
CREATE TABLE activity (
srv_id integer NOT NULL,
cli_id integer NOT NULL,
"source" varchar(512) NOT NULL,
chan varchar (1024) NOT NULL
);
```
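The subscribe/unsubscribe flow described above maps onto simple row operations. A hypothetical sketch follows; the *srv_id*, *cli_id*, source and channel values are made up for illustration and do not come from the project:

```
-- subscribe ("s"/"S"): client 7, served by caserver-proxy 3,
-- starts monitoring a source on an nchan channel
INSERT INTO activity (srv_id, cli_id, "source", chan)
VALUES (3, 7, 'test/device/1/double_scalar', '/sub/chan1');

-- unsubscribe ("u"): the matching row is removed
DELETE FROM activity
WHERE cli_id = 7
  AND "source" = 'test/device/1/double_scalar'
  AND chan = '/sub/chan1';
```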
Operations on the *activity* table shall allow multiple concurrently running instances of the *ca-supervisor*,
as discussed in the ensuing *note*.
##### <a name="activity-table-note"></a>Note
**Scenario**: three services, A, B and C; one *or more* supervisors.
If service A is monitoring *src1* on *ch1* and service B is assigned *src1* on *ch1* as well,
then the *ca-supervisor* shall *update the activity table* rather than insert a new row, setting service B
as the *srv_id* of the (*src1*, *ch1*) row.
As a result,
- if service A goes down, there is no need to take care of (*src1, ch1*) in the failover operation
- if service B goes down, the failover operation will reassign (*src1*, *ch1*) to another available service (say, A or C),
and the *ca-supervisor* will again update the *srv_id* where *source* and *chan* are (*src1*, *ch1*).
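The update-rather-than-insert behaviour in the note can be sketched as a PostgreSQL upsert. This is an assumption about how it might be implemented, not the project's actual code: `ON CONFLICT` requires a unique constraint on (*source*, *chan*), which is not part of the DDL shown above, and the *srv_id*/*cli_id* values are illustrative:

```
-- Hypothetical upsert: service B (srv_id = 2) is assigned (src1, ch1),
-- which service A already registered; the existing row is updated
-- instead of inserting a duplicate.
-- Assumes: ALTER TABLE activity ADD CONSTRAINT activity_src_chan
--          UNIQUE ("source", chan);
INSERT INTO activity (srv_id, cli_id, "source", chan)
VALUES (2, 7, 'src1', 'ch1')
ON CONFLICT ("source", chan)
DO UPDATE SET srv_id = EXCLUDED.srv_id,
              cli_id = EXCLUDED.cli_id;
```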
When a *recovery operation* is undertaken, the activities associated with the failing service shall be *moved* from the
*activity* table to the *recover_activities* table. This prevents another *supervisor* from initiating the same
operation at the same time. In *recover_activities*, the operation data is saved alongside the accessory
information stored within *recover_operations*.
In case a recovery operation *fails* (e.g. no other services are available to take over stray sources), the involved
set of sources shall be moved back into the *activity* table, so that the operation can be retried later.
The important point here is that, in order to avoid concurrent operations by multiple *ca-supervisor* services,
the activities subject to recovery must be removed from the *activity* table beforehand.
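One way to make this hand-off safe against concurrent supervisors is to delete and re-insert in a single atomic statement, using a PostgreSQL data-modifying CTE. This is a sketch under assumptions: the *recover_activities* column layout is taken to mirror *activity*, and `srv_id = 5` stands in for the failing service:

```
-- Atomically move the failing service's activities out of "activity",
-- so no other supervisor can start the same recovery.
WITH moved AS (
    DELETE FROM activity
    WHERE srv_id = 5              -- the failing service
    RETURNING srv_id, cli_id, "source", chan
)
INSERT INTO recover_activities (srv_id, cli_id, "source", chan)
SELECT srv_id, cli_id, "source", chan FROM moved;

-- If the recovery fails (e.g. no service can take over), move the rows
-- back so the operation can be retried later.
WITH restored AS (
    DELETE FROM recover_activities
    WHERE srv_id = 5
    RETURNING srv_id, cli_id, "source", chan
)
INSERT INTO activity (srv_id, cli_id, "source", chan)
SELECT srv_id, cli_id, "source", chan FROM restored;
```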
A final *recovery* table is defined as follows:
#### 2. The register, service, clients and srvconf* tables
The *register, service, clients and srvconf* tables are discussed in the