The Swiss Army Knife (SAK) project has the objective to calibrate specific line ratios as effective and efficient density probes, to offset the asymmetry in ease of use between temperature and density tracers in molecular gas. By providing convenient tracers of number density, we hope to reduce the friction of their use, so that this parameter is estimated reliably and with care.
Important resources are listed in the [documentation.md file](documentation/documentation.md).
The simplest way to obtain and use SAK is to pull the Docker image from this repository:
```bash
docker pull git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
```
or, if you prefer an Apptainer (Singularity) image:
```bash
singularity pull --disable-cache docker://git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
```
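Once pulled, the container can be started directly. A minimal sketch (assuming the image defines a default entrypoint, and using Singularity's default output file name, which may differ on your system):
```bash
# Start the Docker image (assumes a default entrypoint is defined in the image)
docker run --rm -it git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
# Or the Apptainer/Singularity image (default pull file name shown; adjust if needed)
singularity run swiss_army_knife_stable_latest.sif
```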
A docker-compose script is included in the root folder of this repository ([docker-compose.yaml](docker-compose.yaml)), so that the entire program can be executed via:
```bash
docker compose up --build
```
The following parameters can be set by adding a `command` override in the `etl` section of the docker-compose file:
* `--run_id`: the run id of the grid to process; if not provided, a new run_id is generated;
* `--cleanup_scratches`: whether to empty the `mdl/scratches` directory;
* `--distributed`: whether the grid is processed in a distributed environment, in which case a queue from the database is used to process all models in the grid; otherwise, execution uses multiprocessing.
For example:
```yaml
etl:
  [...]
  command: 'python main.py --distributed False --run_id \"7dd5b365-875e-4857-ae11-2707820a33c1\"'
  [...]
```
If you prefer to have the source code, you can get it through:
```bash
git clone https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife_stable.git
```
and use it to rebuild the images or run it directly.
Inputs and outputs are described in detail in the [documentation.md file](documentation/documentation.md).
## Quickstart
By running the [main.py](main.py) file (executed by the pipeline, which can be run as described above), SAK creates a grid of massive clump models similar to the one used in the reference paper. The grid is sparser, for faster computation.
The global configuration file contains the grid spacing (`dust_temperature_step`, `gas_density_step`), and the lines and ratios to process (`lines_to_process`).
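As an illustration, the relevant entries might look like the following. This is a hypothetical excerpt: the nesting and values are guesses, and only `run_type`, the `computation` category, `dust_temperature_step`, `gas_density_step`, and `lines_to_process` come from the actual file:
```yaml
# Hypothetical excerpt of etl/config/config.yml; nesting and values are illustrative.
run_type: example_run
computation:
  dust_temperature_step: 5      # grid spacing in dust temperature
  gas_density_step: 0.5         # grid spacing in gas density
  lines_to_process:             # lines and ratios to process (placeholder names)
    - line_a
    - line_b
```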
After the pipeline has finished, it is possible to train the ML models (or use the ones provided in the package registry), if desired, by running:
```bash
python prs_ml_training.py
python prs_expand_modelling_with_ml.py
```
or
```bash
python prs_ml_training_ratio.py
python prs_expand_ratio_modelling_with_ml.py
```
The second option is preferred, as it emulates the ratio directly.
Finally, to perform inference, run the following commands:
```bash
python prepare_inference_input.py
python prs_density_inference.py
```
The project remote repository for the code is:
https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife_stable/-/tree/main
The first paper is here:
https://git.overleaf.com/6373bb408e4040043398e495
## Pipeline
The SAK non-LTE toy-model pipeline uses three main layers:
1. **Staging layer:** In the staging layer (`etl/stg`), the [stg_radmc_input_generator.py](../etl/stg/stg_radmc_input_generator.py) file takes care of preparing
the input files for RADMC, and saves them in the `etl/mdl/radmc_files` folder.
The [etl/stg/config/config.yml](../etl/stg/config/config.yml) file contains the default values of the parameters used to prepare the RADMC files
for the postprocessing. All of them are described in detail in the following.
2. **Model layer:** The model layer (`etl/mdl`) takes care of preparing and executing the RADMC command according to the
configuration in the [etl/mdl/config/config.yml](../etl/mdl/config/config.yml) file. This is done by the [mdl_execute_radmc_command.py](../etl/mdl/mdl_execute_radmc_command.py) script,
which also creates `radmc3d_postprocessing.sh`.
The results are then converted to FITS cubes, which are saved by default into `prs/fits/cubes` for later
processing.
3. **Presentation layer:** In the presentation layer (`etl/prs`), moment-0 maps and line-ratio maps are computed
by executing the [prs_compute_integrated_fluxes_and_ratios.py](../etl/prs/prs_compute_integrated_fluxes_and_ratios.py) script. At the moment, the integration limits cannot be
specified, and the entire cube is collapsed. *WARNING:* Pay attention to the presence of secondary modeled lines in
the simulated spectra.
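For reference, collapsing the full cube into a moment-0 map amounts to the following minimal sketch (not the project's script; it assumes astropy is installed, a standard spectral header, and a placeholder cube name):
```python
# Minimal moment-0 sketch: integrate a cube over its full spectral axis.
# The cube name is a placeholder; SAK stores cubes in prs/fits/cubes by default.
import numpy as np
from astropy.io import fits

with fits.open("prs/fits/cubes/example_cube.fits") as hdul:
    cube = hdul[0].data                     # assumed axis order: (velocity, y, x)
    dv = abs(hdul[0].header["CDELT3"])      # channel width, assuming a standard header
    moment0 = np.nansum(cube, axis=0) * dv  # collapse the entire cube, as SAK does

fits.writeto("example_moment0.fits", moment0, overwrite=True)
```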
The script [prs_inspect_results.py](../etl/prs/prs_inspect_results.py) reduces the ratio maps to single points and produces an image of the ratio values
as a function of gas number density and temperature. Be aware that at the moment the query is hardcoded and works for
SAK-generated models only.
The script `prs_prepare_backup.py` compresses and copies the output of a model run for sharing.
There is a final script, [prs_density_inference.py](../etl/prs/prs_density_inference.py), used to prepare the KDE model and to perform density
inference, given the measured ratios. It uses the YAML file [etl/config/density_inference_input.yml](../etl/config/density_inference_input.yml) to provide the
needed input for the script. It produces the `output/run_type/{provided_run_type}/density_pdf*.png` output file,
where `provided_run_type` is defined in the global configuration file and the wildcard represents the
source name.
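In essence, the inference conditions a KDE built over the model grid on the measured ratio. A minimal sketch of the idea, with toy stand-in data rather than the project's actual code:
```python
# Toy sketch of KDE-based density inference: build a joint KDE over
# (log density, ratio) from model grid points, then slice it at the
# observed ratio to obtain a posterior over density.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
log_n = rng.uniform(4.0, 7.0, 5000)                              # stand-in model densities
ratio = 0.5 * (log_n - 4.0) + rng.normal(0.0, 0.05, log_n.size)  # toy ratio response

kde = gaussian_kde(np.vstack([log_n, ratio]))  # joint KDE p(log n, ratio)

obs_ratio = 1.2                                # hypothetical measurement
grid = np.linspace(4.0, 7.0, 200)
posterior = kde(np.vstack([grid, np.full_like(grid, obs_ratio)]))
posterior /= np.trapz(posterior, grid)         # normalize the PDF
```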
The scripts [prs_ml_training.py](../etl/prs/prs_ml_training.py), [prs_ml_training_ratio.py](../etl/prs/prs_ml_training_ratio.py), [prs_expand_modelling_with_ml.py](../etl/prs/prs_expand_modelling_with_ml.py), and [prs_expand_ratio_modelling_with_ml.py](../etl/prs/prs_expand_ratio_modelling_with_ml.py) can be run before the inference to perform ML-assisted emulation of
the modelled data, in order to expand the grid of models computed with actual RT. These scripts rely on
the [etl/config/ml_modelling.yml](../etl/config/ml_modelling.yml) file to perform training and evaluation of the emulation model, and to produce
emulated data, which are saved to `etl/inferred_data.csv`. This file is used by [prs_density_inference.py](../etl/prs/prs_density_inference.py) to
concatenate these data with those from the formal computation to perform the density inference. In our
case, XGBoost worked best, and we used this model to perform emulation.
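The emulation step itself boils down to fitting a regressor on the RT grid and predicting on a denser parameter grid. A hedged sketch with toy data (the parameter ranges and response function are invented for illustration, not taken from the project):
```python
# Toy sketch of grid emulation with XGBoost: learn (T_dust, log n) -> ratio
# on a coarse RT grid, then predict the ratio on a much finer grid.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X_coarse = rng.uniform([10.0, 4.0], [40.0, 7.0], size=(500, 2))  # (T [K], log n)
y_ratio = 0.4 * (X_coarse[:, 1] - 4.0) + 0.01 * X_coarse[:, 0]   # toy response

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_coarse, y_ratio)

# Emulate the ratio on a finer (temperature, density) grid.
tt, nn = np.meshgrid(np.linspace(10, 40, 100), np.linspace(4, 7, 100))
emulated = model.predict(np.column_stack([tt.ravel(), nn.ravel()]))
```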
The entire ETL pipeline is executed by the [main.py](../etl/main.py) script, where it is possible to define overrides for the default
values in the stage-specific configuration files (so that an entire grid of models can be specified). These
overrides are included in the [etl/config/config.yml](../etl/config/config.yml) configuration file.
4. **Additional files**:
The script [prs_analytical_representations.py](../etl/prs/prs_analytical_representations.py) provides a convenient way of checking the analytical representations of the ratio vs. density curves.
The file [prs_check_biases_poc_sample.py](../etl/prs/prs_check_biases_poc_sample.py) checks for biases in the massive clump sample used in the proof-of-concept.
The scripts [prs_poc_figures.py](../etl/prs/prs_poc_figures.py) and [prs_poc_latex_table.py](../etl/prs/prs_poc_latex_table.py) can be used to reproduce the content of the paper regarding the POC.
### Running the pipeline
The pipeline is now dockerized. To run it, clone the repository and, in bash, run:
```bash
docker compose up --build
```
from the root project directory. Docker compose will bring up a local database for your runs, with persistent storage,
so that all the results can be found and inspected. Similarly, a local volume is mounted for intermediate files.
In this section we describe in more detail the parameters that can be set in the different configuration files, and
their meaning.
#### The staging configuration file ([etl/stg/config/config.yml](../etl/stg/config/config.yml))
The staging config file has three main categories:
* collision_partners: the list of collision partners to be used; the partners must appear in the same order as in the
molecule_{molname}.inp file of the molecule to be simulated, e.g. ['p-h2']
#### The model configuration file ([etl/mdl/config/config.yml](../etl/mdl/config/config.yml))
The model configuration file has two categories:
* threads: the number of threads to be used by RADMC
* image_size_pc: the size of the image to produce; it is useful to obtain a good alignment
#### The global configuration file ([etl/config/config.yml](../etl/config/config.yml))
The global configuration file, in addition to the run_type name, has two categories, "computation" and "overrides":
ratios to compute.
#### The density inference input file ([etl/config/density_inference_input.yml](../etl/config/density_inference_input.yml))
This file contains the measured ratios, their uncertainties, and a few other parameters to perform the inference.
distributions.
* nthreads [optional]: the number of threads to be used for computation.
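A hypothetical sketch of such a file (the key names and nesting are illustrative guesses, not the actual schema; only the notions of measured ratios, their uncertainties, and the optional `nthreads` come from this documentation):
```yaml
# Hypothetical sketch of density_inference_input.yml; keys are illustrative.
sources:
  example_source:
    measured_ratios:
      ratio_1: 3.2          # measured line ratio
    uncertainties:
      ratio_1: 0.4          # uncertainty on the ratio
nthreads: 4                 # optional: threads used for computation
```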
#### The ML-emulation input file ([etl/config/ml_modelling.yml](../etl/config/ml_modelling.yml))
This file determines which tasks are performed as part of the `prs_ml_training.py` and `prs_expand_modelling_with_ml.py`
scripts.
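As an illustration only (the flag names below are hypothetical, not the file's actual schema), the task switches could look like:
```yaml
# Hypothetical sketch of ml_modelling.yml; flag names are invented.
train_model: true             # fit the emulation model on the RT grid
evaluate_model: true          # assess the emulation quality
produce_emulated_data: true   # write emulated data to etl/inferred_data.csv
```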