The Swiss Army Knife (SAK) project has the objective to calibrate specific line ratios as effective and efficient density probes, to offset the asymmetry in ease of use between temperature and density tracers in molecular gas. By providing convenient tracers of number density, we hope to reduce the friction of their use, so that this parameter is estimated reliably and with care.
Important resources are listed in the [documentation.md file](documentation/documentation.md).
The simplest way to obtain and use SAK is to pull the Docker image from this repository:
```bash
docker pull git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
```
or, if you prefer an Apptainer (Singularity) image:
```bash
singularity pull --disable-cache docker://git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
```
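Once pulled, the container can be started directly. A minimal sketch (assuming the image defines a default entrypoint, and using Singularity's default output file name, which may differ on your system):
```bash
# Start the Docker image (assumes a default entrypoint is defined in the image)
docker run --rm -it git.ia2.inaf.it:5050/andrea.giannetti/swiss_army_knife_stable
# Or the Apptainer/Singularity image (default pull file name shown; adjust if needed)
singularity run swiss_army_knife_stable_latest.sif
```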
A docker-compose script is included in the root folder of this repository ([docker-compose.yaml](docker-compose.yaml)), so that the entire program can be executed via:
```bash
docker compose up --build
```
The following parameters can be set by adding a `command` override in the `etl` section of the docker-compose file:
* `--run_id`: the run id of the grid to process; if not provided, a new run_id is generated;
* `--cleanup_scratches`: whether to empty the `mdl/scratches` directory;
* `--distributed`: whether the grid is processed in a distributed environment, in which case a queue from the database is used to process all models in the grid; otherwise, execution uses multiprocessing.
For example:
```yaml
etl:
  [...]
  command: 'python main.py --distributed False --run_id \"7dd5b365-875e-4857-ae11-2707820a33c1\"'
  [...]
```
If you prefer to have the source code, you can get it through:
```bash
git clone https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife_stable.git
```
and use it to rebuild the images or run it directly.
Inputs and outputs are described in detail in the [documentation.md file](documentation/documentation.md).
## Quickstart
By running the [main.py](main.py) file (executed by the pipeline, which can be run as described above), SAK creates a grid of massive clump models similar to the one used in the reference paper. The grid is sparser, for faster computation.
The global configuration file contains the grid spacing (`dust_temperature_step`, `gas_density_step`), and the lines and ratios to process (`lines_to_process`).
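As an illustration, the relevant entries might look like the following. This is a hypothetical excerpt: the nesting and values are guesses, and only `run_type`, the `computation` category, `dust_temperature_step`, `gas_density_step`, and `lines_to_process` come from the actual file:
```yaml
# Hypothetical excerpt of etl/config/config.yml; nesting and values are illustrative.
run_type: example_run
computation:
  dust_temperature_step: 5      # grid spacing in dust temperature
  gas_density_step: 0.5         # grid spacing in gas density
  lines_to_process:             # lines and ratios to process (placeholder names)
    - line_a
    - line_b
```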
After the pipeline has finished, it is possible to train the ML models (or use the ones provided in the package registry), if desired, by running:
```bash
python prs_ml_training.py
python prs_expand_modelling_with_ml.py
```
or
```bash
python prs_ml_training_ratio.py
python prs_expand_ratio_modelling_with_ml.py
```
The second option is preferred, as it emulates the ratio directly.
Finally, to perform inference, run the following commands:
```bash
python prepare_inference_input.py
python prs_density_inference.py
```
The project remote repository for the code is:
https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife_stable/-/tree/main
The first paper is here:
https://git.overleaf.com/6373bb408e4040043398e495
## Pipeline
The SAK non-LTE toy-model pipeline uses three main layers:
1. **Staging layer:** In the staging layer (`etl/stg`), the [stg_radmc_input_generator.py](../etl/stg/stg_radmc_input_generator.py) file takes care of preparing
the input files for RADMC, and saves them in the `etl/mdl/radmc_files` folder.
The [etl/stg/config/config.yml](../etl/stg/config/config.yml) file contains the default values of the parameters used to prepare the RADMC files
for the postprocessing. All of them are described in detail in the following.
2. **Model layer:** The model layer (`etl/mdl`) takes care of preparing and executing the RADMC command according to the
configuration in the [etl/mdl/config/config.yml](../etl/mdl/config/config.yml) file. This is done by the [mdl_execute_radmc_command.py](../etl/mdl/mdl_execute_radmc_command.py) script,
which also creates `radmc3d_postprocessing.sh`.
The results are then converted to FITS cubes, which are saved by default into `prs/fits/cubes` for later
processing.
3. **Presentation layer:** In the presentation layer (`etl/prs`), moment-0 maps and line-ratio maps are computed
by executing the [prs_compute_integrated_fluxes_and_ratios.py](../etl/prs/prs_compute_integrated_fluxes_and_ratios.py) script. At the moment, the integration limits cannot be
specified, and the entire cube is collapsed. *WARNING:* Pay attention to the presence of secondary modeled lines in
the simulated spectra.
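For reference, collapsing the full cube into a moment-0 map amounts to the following minimal sketch (not the project's script; it assumes astropy is installed, a standard spectral header, and a placeholder cube name):
```python
# Minimal moment-0 sketch: integrate a cube over its full spectral axis.
# The cube name is a placeholder; SAK stores cubes in prs/fits/cubes by default.
import numpy as np
from astropy.io import fits

with fits.open("prs/fits/cubes/example_cube.fits") as hdul:
    cube = hdul[0].data                     # assumed axis order: (velocity, y, x)
    dv = abs(hdul[0].header["CDELT3"])      # channel width, assuming a standard header
    moment0 = np.nansum(cube, axis=0) * dv  # collapse the entire cube, as SAK does

fits.writeto("example_moment0.fits", moment0, overwrite=True)
```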
The script [prs_inspect_results.py](../etl/prs/prs_inspect_results.py) reduces the ratio maps to single points and produces an image of the ratio values
as a function of gas number density and temperature. Be aware that at the moment the query is hardcoded and works for
SAK-generated models only.
The script `prs_prepare_backup.py` compresses and copies the output of a model run for sharing.
There is a final script, [prs_density_inference.py](../etl/prs/prs_density_inference.py), used to prepare the KDE model and to perform density
inference, given the measured ratios. It uses the YAML file [etl/config/density_inference_input.yml](../etl/config/density_inference_input.yml) to provide the
needed input for the script. It produces the `output/run_type/{provided_run_type}/density_pdf*.png` output file,
where `provided_run_type` is defined in the global configuration file and the wildcard represents the
source name.
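In essence, the inference conditions a KDE built over the model grid on the measured ratio. A minimal sketch of the idea, with toy stand-in data rather than the project's actual code:
```python
# Toy sketch of KDE-based density inference: build a joint KDE over
# (log density, ratio) from model grid points, then slice it at the
# observed ratio to obtain a posterior over density.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
log_n = rng.uniform(4.0, 7.0, 5000)                              # stand-in model densities
ratio = 0.5 * (log_n - 4.0) + rng.normal(0.0, 0.05, log_n.size)  # toy ratio response

kde = gaussian_kde(np.vstack([log_n, ratio]))  # joint KDE p(log n, ratio)

obs_ratio = 1.2                                # hypothetical measurement
grid = np.linspace(4.0, 7.0, 200)
posterior = kde(np.vstack([grid, np.full_like(grid, obs_ratio)]))
posterior /= np.trapz(posterior, grid)         # normalize the PDF
```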
The scripts [prs_ml_training.py](../etl/prs/prs_ml_training.py), [prs_ml_training_ratio.py](../etl/prs/prs_ml_training_ratio.py), [prs_expand_modelling_with_ml.py](../etl/prs/prs_expand_modelling_with_ml.py), and [prs_expand_ratio_modelling_with_ml.py](../etl/prs/prs_expand_ratio_modelling_with_ml.py) can be run before the inference to perform ML-assisted emulation of
the modelled data, in order to expand the grid of models computed with actual RT. These scripts rely on
the [etl/config/ml_modelling.yml](../etl/config/ml_modelling.yml) file to perform training and evaluation of the emulation model, and to produce
emulated data, which are saved to `etl/inferred_data.csv`. This file is used by [prs_density_inference.py](../etl/prs/prs_density_inference.py) to
concatenate these data with those from the formal computation to perform the density inference. In our
case, XGBoost worked best, and we used this model to perform emulation.
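The emulation step itself boils down to fitting a regressor on the RT grid and predicting on a denser parameter grid. A hedged sketch with toy data (the parameter ranges and response function are invented for illustration, not taken from the project):
```python
# Toy sketch of grid emulation with XGBoost: learn (T_dust, log n) -> ratio
# on a coarse RT grid, then predict the ratio on a much finer grid.
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(42)
X_coarse = rng.uniform([10.0, 4.0], [40.0, 7.0], size=(500, 2))  # (T [K], log n)
y_ratio = 0.4 * (X_coarse[:, 1] - 4.0) + 0.01 * X_coarse[:, 0]   # toy response

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_coarse, y_ratio)

# Emulate the ratio on a finer (temperature, density) grid.
tt, nn = np.meshgrid(np.linspace(10, 40, 100), np.linspace(4, 7, 100))
emulated = model.predict(np.column_stack([tt.ravel(), nn.ravel()]))
```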
The entire ETL pipeline is executed by the [main.py](../etl/main.py) script, where it is possible to define overrides for the default
values in the stage-specific configuration files (so that an entire grid of models can be specified). These
overrides are included in the [etl/config/config.yml](../etl/config/config.yml) configuration file.
4. **Additional files**:
The script [prs_analytical_representations.py](../etl/prs/prs_analytical_representations.py) provides a convenient way of checking the analytical representations of the ratio vs. density curves.
The file [prs_check_biases_poc_sample.py](../etl/prs/prs_check_biases_poc_sample.py) checks for biases in the massive clump sample used in the proof-of-concept.
The scripts [prs_poc_figures.py](../etl/prs/prs_poc_figures.py) and [prs_poc_latex_table.py](../etl/prs/prs_poc_latex_table.py) can be used to reproduce the content of the paper regarding the POC.
### Running the pipeline
The pipeline is now dockerized. To run it, clone the repository and, in bash, run:
```bash
docker compose up --build
```
from the root project directory. Docker compose will bring up a local database for your runs, with persistent storage,
so that all the results can be found and inspected. Similarly, a local volume is mounted for intermediate files.
In this section we describe in more detail the parameters that can be set in the different configuration files, and
their meaning.
#### The staging configuration file ([etl/stg/config/config.yml](../etl/stg/config/config.yml))
The staging config file has three main categories:
* collision_partners: the list of collision partners to be used; the partners must appear in the same order as in the
molecule_{molname}.inp file of the molecule to be simulated, e.g. ['p-h2']
#### The model configuration file ([etl/mdl/config/config.yml](../etl/mdl/config/config.yml))
The model configuration file has two categories:
* threads: the number of threads to be used by RADMC
* image_size_pc: the size of the image to produce; it is useful to obtain a good alignment
#### The global configuration file ([etl/config/config.yml](../etl/config/config.yml))
The global configuration file, in addition to the run_type name, has two categories, "computation" and "overrides":
ratios to compute.
#### The density inference input file ([etl/config/density_inference_input.yml](../etl/config/density_inference_input.yml))
This file contains the measured ratios, their uncertainties, and a few other parameters to perform the inference.
distributions.
* nthreads [optional]: the number of threads to be used for computation.
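A hypothetical sketch of such a file (the key names and nesting are illustrative guesses, not the actual schema; only the notions of measured ratios, their uncertainties, and the optional `nthreads` come from this documentation):
```yaml
# Hypothetical sketch of density_inference_input.yml; keys are illustrative.
sources:
  example_source:
    measured_ratios:
      ratio_1: 3.2          # measured line ratio
    uncertainties:
      ratio_1: 0.4          # uncertainty on the ratio
nthreads: 4                 # optional: threads used for computation
```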
#### The ML-emulation input file ([etl/config/ml_modelling.yml](../etl/config/ml_modelling.yml))
This file determines which tasks are performed as part of the `prs_ml_training.py` and `prs_expand_modelling_with_ml.py`
scripts.
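As an illustration only (the flag names below are hypothetical, not the file's actual schema), the task switches could look like:
```yaml
# Hypothetical sketch of ml_modelling.yml; flag names are invented.
train_model: true             # fit the emulation model on the RT grid
evaluate_model: true          # assess the emulation quality
produce_emulated_data: true   # write emulated data to etl/inferred_data.csv
```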