# Infrastructure

## Architecture Overview

### Data
The data are under the `LVRA_DATA_ROOT` directory. On the Oxford Lasair prod servers
that is `/home/lasair/data/lvra` (it is already defined in the `.bashrc`).
The directory structure logic is as follows: `TYPE > YEAR > DATE`.
Here the type refers to the file types:

- `JSON`: contains the raw JSON alert data from Lasair LSST. These are created by `kafka_consumer.py`, which ingests a broad Lasair filter called `lvra_fodder`.
- `csv`: contains the feature CSV files created from the JSON alert data by each feature-making pipeline. For example, for the `r0b` VRA there is an `r0b_feature_maker.py` script.
- `logs`: contains log text files.
- `db`: contains the SQLite database files. NOTE: this is a flat directory, no timestamped sub-directories here.
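The `TYPE > YEAR > DATE` layout can be sketched with a small path builder. This is an illustrative snippet only, assuming the `YYYYMMDD_HHMMSS` stem format described under Useful Definitions; `data_path` is a hypothetical helper, not part of the `lvra` package:

```python
import os
from pathlib import Path


def data_path(file_type: str, stem: str, extension: str) -> Path:
    """Build TYPE/YEAR/DATE/stem.extension under LVRA_DATA_ROOT.

    YEAR and DATE are derived from the stem itself (YYYYMMDD_HHMMSS).
    Note: the db directory is flat and is not covered by this helper.
    """
    root = Path(os.environ["LVRA_DATA_ROOT"])
    date = stem.split("_")[0]  # e.g. "20260127"
    year = date[:4]            # e.g. "2026"
    return root / file_type / year / date / f"{stem}.{extension}"
```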
### Code
The code is under `LVRA_CODE_ROOT`; it is the `lvra` (Lasair Virtual Research Assistants) python package.
There are two kinds of scripts used in production: the python pipelines and their bash wrappers.
The bash wrappers are there to set the environment so that when the code is run from cron everything
works as expected. They also redirect stderr to stdout and write it to an error log file so that
cron jobs do not fail silently and we can track what is going on.
The bash scripts are under `LVRA_CODE_ROOT/bash` and the python scripts are under
`LVRA_CODE_ROOT/lvra/pypeline` (not a typo, a pun between python and pipeline. ha. ha.).
A lot of the code needs config files that are stored under `LVRA_CODE_ROOT/data`.
Any secret information such as tokens is stored directly in environment variables on the server,
nothing in the files.
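Because secrets only live in environment variables, a pipeline should fail loudly when one is missing rather than crash somewhere deep in the code. A minimal sketch of that pattern (`get_secret` is a hypothetical helper, not an existing `lvra` function):

```python
import os


def get_secret(name: str) -> str:
    """Read a secret (e.g. LASAIR_LSST_TOKEN) from the environment.

    Raising immediately puts a clear message in the cron error log
    instead of a confusing failure later in the pipeline.
    """
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```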
### Useful Definitions
**Status Codes**: These are integers used in the database tables.

| Status | Description |
|---|---|
| 0 | Initialised |
| 1 | Successfully Processed |
| 21 | File Not Found (INPUT) |
| 22 | File Not Found (OUTPUT) |
| 23 | Not Expected Input Data Type |
| 30 | Key Error (missing in data) |
| 31 | Missing Columns (INPUT) |
| 40 | Lasair Annotation Issue |
| 41 | Failure to create Lasair client |
| 99 | Generic Error |

The 2X errors refer to a problem with the input or output files such that they can't be loaded.
The 3X errors correspond to issues with the data structure or content: the file _was_ loaded, but its contents cause problems.
Code 30 likely means the files you are trying to use don't have the structure you expect. This is most likely due to a change in the alert or clean data format. Causes may vary: changes in LSST data, changes in Lasair, changes in your code.
The 4X errors are specific to Lasair.
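In code, the table above maps naturally onto an `IntEnum` so that pipelines write named constants rather than bare integers. A sketch (the enum member names are illustrative, only the integer values come from the table):

```python
from enum import IntEnum


class Status(IntEnum):
    """Status codes written to the SQLite status tables.

    Values mirror the status-code table in the docs; names are illustrative.
    """
    INITIALISED = 0
    PROCESSED = 1
    # 2X: input/output files could not be loaded
    FILE_NOT_FOUND_INPUT = 21
    FILE_NOT_FOUND_OUTPUT = 22
    UNEXPECTED_INPUT_TYPE = 23
    # 3X: file loaded but the structure/content caused problems
    KEY_ERROR = 30
    MISSING_COLUMNS_INPUT = 31
    # 4X: Lasair-specific errors
    LASAIR_ANNOTATION_ISSUE = 40
    LASAIR_CLIENT_FAILURE = 41
    GENERIC_ERROR = 99
```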
**Stems**: These are the core names of our files and take the format
`YYYYMMDD_HHMMSS`. Each path name is constructed with the format `TYPE/YEAR/DATE/stem.extension`. The stem is also used as the primary key for the status tables (see below).
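A stem is just a timestamp rendered with `strftime`/`strptime`, so it can be generated and parsed round-trip. A minimal sketch (`make_stem`/`parse_stem` are hypothetical names, not existing `lvra` functions):

```python
from datetime import datetime

# strftime/strptime pattern matching the YYYYMMDD_HHMMSS stem format
STEM_FORMAT = "%Y%m%d_%H%M%S"


def make_stem(when: datetime) -> str:
    """Format a timestamp as a stem, e.g. 20260127_105636."""
    return when.strftime(STEM_FORMAT)


def parse_stem(stem: str) -> datetime:
    """Recover the timestamp encoded in a stem."""
    return datetime.strptime(stem, STEM_FORMAT)
```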
### Log files and SQLite database
Log files are written to `LVRA_DATA_ROOT/logs/YEAR/DATE/[logname].log`.
There is also a SQLite database to keep track of the status of various processes and the
history of the predictions of various VRAs. It is located under `LVRA_DATA_ROOT/db/log.db`.
There are three kinds of tables:

- **Status tables**: the primary key is the stem and the columns are named after VRAs. Each cell contains a status code (see table above).
- **Mapping table**: the primary key is the LSST `diaObjectId` and it contains the mapping between the `diaObjectId` and the stem name. For now there is only one mapping table but I may need mappings between other ids in the future... Note that here the stem is not technically a foreign key because I have not enforced that the stem exists in the status tables.
- **Provenance table**: to keep track of the history of our model inferences. [NOT IMPLEMENTED YET]
#### Table List
`feature_making` [Status table]: Records which alert files have been successfully processed. New columns can be added for each new VRA.

| stem (str) | r0b (int) |
|---|---|
| 20260127_105636 | 1 |
| 20260127_111728 | 0 |
`annotating` [Status table]: Records which alert files have been successfully annotated. New columns can be added for each new VRA.

| stem (str) | r0b (int) |
|---|---|
| 20260127_105636 | 0 |
| 20260127_111728 | 0 |
`diaobjid_stems` [Mapping table]: Records the mapping between the LSST `diaObjectId` and the alert stem name.

| diaObjectId (int) | stem (str) |
|---|---|
| 169755827469549632 | 20260128_154837 |
| 169843765851193449 | 20260128_154837 |
| 169843765880029260 | 20260128_154837 |
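Looking up the stem for a given `diaObjectId` is a one-query job with the stdlib `sqlite3` module. A sketch, assuming the `diaobjid_stems` schema below (`stem_for_object` is a hypothetical helper, not part of the `lvra` package):

```python
import sqlite3


def stem_for_object(db_path: str, dia_object_id: int):
    """Return the stem recorded for a diaObjectId, or None if unknown.

    diaObjectId is the primary key of diaobjid_stems, so there is at
    most one row per object in the current schema.
    """
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT stem FROM diaobjid_stems WHERE diaObjectId = ?",
            (dia_object_id,),
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```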
#### Log files

[list log files and explain]
## Infra Set-up Instructions

### Directories
Here is a bash script that can be run in the `LVRA_DATA_ROOT` of choice to create the full directory
sub-structure.

**STEP 1**: Save this as `make_dirs.sh` under the `LVRA_DATA_ROOT` directory.
```bash
#!/usr/bin/env bash

# Top-level TYPE directories
mkdir -p JSON
mkdir -p csv
mkdir -p logs
mkdir -p db

# Years to pre-create
years_arr=(2026 2027 2028 2029 2030 2031 2032 2033 2034 2035)

work_dir=$(pwd)
today=$(date +"%Y%m%d")
today_year=$(date +"%Y")

# db stays flat, so only JSON, csv and logs get YEAR/DATE sub-dirs
for dir in JSON csv logs; do
    for year in "${years_arr[@]}"; do
        mkdir -p "$work_dir/$dir/$year"
    done
    mkdir -p "$work_dir/$dir/$today_year/$today"
done
```
**STEP 2**: Run the following commands

```bash
chmod u+x make_dirs.sh
./make_dirs.sh
```
### SQLite database

Now go to the database subdirectory. From `LVRA_DATA_ROOT`:

```bash
cd db
```

Then copy-paste this code into a new file called `log_schema.sql`.
```sql
CREATE TABLE IF NOT EXISTS feature_making (
    stem TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL DEFAULT current_timestamp,
    r0b INTEGER
);

CREATE TABLE IF NOT EXISTS annotating (
    stem TEXT PRIMARY KEY,
    timestamp TEXT NOT NULL DEFAULT current_timestamp,
    r0b INTEGER
);

CREATE TABLE IF NOT EXISTS diaobjid_stems (
    diaObjectId INTEGER PRIMARY KEY,
    stem TEXT NOT NULL,
    timestamp TEXT NOT NULL DEFAULT current_timestamp
);

CREATE TABLE IF NOT EXISTS provenance (
    ID INTEGER PRIMARY KEY,
    diaObjectId INTEGER,
    diaSourceId INTEGER,
    stem TEXT,
    score REAL,
    model_name TEXT,
    model_version TEXT,
    timestamp TEXT NOT NULL DEFAULT current_timestamp
);

CREATE TABLE IF NOT EXISTS threshold_flags_provenance (
    ID INTEGER PRIMARY KEY,
    diaObjectId INTEGER,
    diaSourceId INTEGER,
    stem TEXT,
    n_gt22 INTEGER,
    n_gt21 INTEGER,
    n_gt20 INTEGER,
    n_gt19 INTEGER,
    n_gt18 INTEGER,
    brighter22 INTEGER,
    brighter21 INTEGER,
    brighter20 INTEGER,
    brighter19 INTEGER,
    brighter18 INTEGER,
    first22 INTEGER,
    first21 INTEGER,
    first20 INTEGER,
    first19 INTEGER,
    first18 INTEGER,
    timestamp TEXT NOT NULL DEFAULT current_timestamp
);
```
Then run:

```bash
sqlite3 log.db < log_schema.sql
```
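Once the schema exists, a pipeline can record its outcome with an upsert on the stem. A sketch of how a status table could be updated from python (`set_status` is a hypothetical helper; the real pipelines may do this differently):

```python
import sqlite3


def set_status(db_path: str, table: str, stem: str, vra: str, code: int) -> None:
    """Insert or update one status code in a status table.

    `table` and `vra` are interpolated into the SQL, so they must be
    trusted hard-coded values (e.g. "feature_making", "r0b"), never
    user input.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            f"INSERT INTO {table} (stem, {vra}) VALUES (?, ?) "
            f"ON CONFLICT(stem) DO UPDATE SET {vra} = excluded.{vra}",
            (stem, code),
        )
        conn.commit()
    finally:
        conn.close()
```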
> **Tip**: To ensure tables are nicely formatted when you run `sqlite3` from the command line, you need to
> add a `.sqliterc` file in your home directory with the following content: `.headers on` and
> `.mode column` (on two separate lines).
### Local dev env

I have not set up a docker image (and will not for now) but I have local directories that mimic what is on remote.
Most importantly, the local environments need to match the remote ones.
#### conda

For the python environment I exported a yaml file of the remote conda environment:

```bash
conda env export --no-builds > lvra_env.yml
```

Then I copied it into my local package:

```bash
scp lasair@oxdb1:code/lvra_env.yml ./software/lvra
```

Then I created the environment with:

```bash
conda env create -f software/lvra/lvra_env.yml -n lvra
```
#### env variables

```bash
export LVRA_SETTINGS='/home/stevance/software/lvra/data/public_settings_local.yaml'
export LVRA_TRAINING_ROOTDIR='/home/stevance/Science/lvra-training/'
export LASAIR_LSST_TOKEN=[see my .bashrc]
export LVRA_TNS_API_KEY=[see my .bashrc]
```
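A quick sanity check that the local environment defines everything the code needs can save a debugging round-trip. A sketch (`check_env` is a hypothetical helper; the variable list matches the exports above):

```python
import os

# Variables the lvra code expects, per the exports above
REQUIRED_VARS = [
    "LVRA_SETTINGS",
    "LVRA_TRAINING_ROOTDIR",
    "LASAIR_LSST_TOKEN",
    "LVRA_TNS_API_KEY",
]


def check_env(required=REQUIRED_VARS) -> list:
    """Return the names of any required variables missing from the environment."""
    return [name for name in required if name not in os.environ]
```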