diff --git a/README.md b/README.md
index 5d953917fcb86b8cd9369ca1dabffdd63418b51b..92638fd799fb3ffa77ca7dc9d34e08d81182f99e 100644
--- a/README.md
+++ b/README.md
@@ -1,92 +1,3 @@
-# swiss_army_knife
+The Swiss Army Knife (SAK) project aims to assess whether CH3OH can be used as an effective volume density probe, and in which regime.
 
-
-
-## Getting started
-
-To make it easy for you to get started with GitLab, here's a list of recommended next steps.
-
-Already a pro? Just edit this README.md and make it your own. Want to make it easy? [Use the template at the bottom](#editing-this-readme)!
-
-## Add your files
-
-- [ ] [Create](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-file) or [upload](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#upload-a-file) files
-- [ ] [Add files using the command line](https://docs.gitlab.com/ee/gitlab-basics/add-file.html#add-a-file-using-the-command-line) or push an existing Git repository with the following command:
-
-```
-cd existing_repo
-git remote add origin https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife.git
-git branch -M main
-git push -uf origin main
-```
-
-## Integrate with your tools
-
-- [ ] [Set up project integrations](https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife/-/settings/integrations)
-
-## Collaborate with your team
-
-- [ ] [Invite team members and collaborators](https://docs.gitlab.com/ee/user/project/members/)
-- [ ] [Create a new merge request](https://docs.gitlab.com/ee/user/project/merge_requests/creating_merge_requests.html)
-- [ ] [Automatically close issues from merge requests](https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#closing-issues-automatically)
-- [ ] [Enable merge request approvals](https://docs.gitlab.com/ee/user/project/merge_requests/approvals/)
-- [ ] [Automatically merge when pipeline succeeds](https://docs.gitlab.com/ee/user/project/merge_requests/merge_when_pipeline_succeeds.html)
-
-## Test and Deploy
-
-Use the built-in continuous integration in GitLab.
-
-- [ ] [Get started with GitLab CI/CD](https://docs.gitlab.com/ee/ci/quick_start/index.html)
-- [ ] [Analyze your code for known vulnerabilities with Static Application Security Testing(SAST)](https://docs.gitlab.com/ee/user/application_security/sast/)
-- [ ] [Deploy to Kubernetes, Amazon EC2, or Amazon ECS using Auto Deploy](https://docs.gitlab.com/ee/topics/autodevops/requirements.html)
-- [ ] [Use pull-based deployments for improved Kubernetes management](https://docs.gitlab.com/ee/user/clusters/agent/)
-- [ ] [Set up protected environments](https://docs.gitlab.com/ee/ci/environments/protected_environments.html)
-
-***
-
-# Editing this README
-
-When you're ready to make this README your own, just edit this file and use the handy template below (or feel free to structure it however you want - this is just a starting point!). Thank you to [makeareadme.com](https://www.makeareadme.com/) for this template.
-
-## Suggestions for a good README
-Every project is different, so consider which of these sections apply to yours. The sections used in the template are suggestions for most open source projects. Also keep in mind that while a README can be too long and detailed, too long is better than too short. If you think your README is too long, consider utilizing another form of documentation rather than cutting out information.
-
-## Name
-Choose a self-explaining name for your project.
-
-## Description
-Let people know what your project can do specifically. Provide context and add a link to any reference visitors might be unfamiliar with. A list of Features or a Background subsection can also be added here. If there are alternatives to your project, this is a good place to list differentiating factors.
-
-## Badges
-On some READMEs, you may see small images that convey metadata, such as whether or not all the tests are passing for the project. You can use Shields to add some to your README. Many services also have instructions for adding a badge.
-
-## Visuals
-Depending on what you are making, it can be a good idea to include screenshots or even a video (you'll frequently see GIFs rather than actual videos). Tools like ttygif can help, but check out Asciinema for a more sophisticated method.
-
-## Installation
-Within a particular ecosystem, there may be a common way of installing things, such as using Yarn, NuGet, or Homebrew. However, consider the possibility that whoever is reading your README is a novice and would like more guidance. Listing specific steps helps remove ambiguity and gets people to using your project as quickly as possible. If it only runs in a specific context like a particular programming language version or operating system or has dependencies that have to be installed manually, also add a Requirements subsection.
-
-## Usage
-Use examples liberally, and show the expected output if you can. It's helpful to have inline the smallest example of usage that you can demonstrate, while providing links to more sophisticated examples if they are too long to reasonably include in the README.
-
-## Support
-Tell people where they can go to for help. It can be any combination of an issue tracker, a chat room, an email address, etc.
-
-## Roadmap
-If you have ideas for releases in the future, it is a good idea to list them in the README.
-
-## Contributing
-State if you are open to contributions and what your requirements are for accepting them.
-
-For people who want to make changes to your project, it's helpful to have some documentation on how to get started. Perhaps there is a script that they should run or some environment variables that they need to set. Make these steps explicit. These instructions could also be useful to your future self.
-
-You can also document commands to lint the code or run tests. These steps help to ensure high code quality and reduce the likelihood that the changes inadvertently break something. Having instructions for running tests is especially helpful if it requires external setup, such as starting a Selenium server for testing in a browser.
-
-## Authors and acknowledgment
-Show your appreciation to those who have contributed to the project.
-
-## License
-For open source projects, say how it is licensed.
-
-## Project status
-If you have run out of energy or time for your project, put a note at the top of the README saying that development has slowed down or stopped completely. Someone may choose to fork your project or volunteer to step in as a maintainer or owner, allowing your project to keep going. You can also make an explicit request for maintainers.
+Important resources are listed in the [documentation.md file](documentation/documentation.md).
\ No newline at end of file
diff --git a/documentation/documentation.md b/documentation/documentation.md
new file mode 100644
index 0000000000000000000000000000000000000000..b6546aabaf6f8c92e34821d9cd9236e5fe455098
--- /dev/null
+++ b/documentation/documentation.md
@@ -0,0 +1,439 @@
+# SAK documentation
+
+This file contains the overview of the SAK tool for inferring volume densities, described in the
+paper https://git.overleaf.com/6373bb408e4040043398e495.
+
+## Repositories
+
+The project remote repository for the code is:
+https://www.ict.inaf.it/gitlab/andrea.giannetti/swiss_army_knife_stable/-/tree/main
+
+The first paper is here:
+https://git.overleaf.com/6373bb408e4040043398e495
+
+## Pipeline
+
+The SAK non-LTE toy-model pipeline is organized into three main layers:
+
+1. **Staging layer:** In the staging layer (`etl/stg`), the `stg_radmc_input_generator.py` file takes care of preparing
+   the input files for RADMC, and saves them in the `etl/mdl/radmc_files` folder.
+
+   The `etl/stg/config/config.yml` file contains the default values of the parameters used to prepare the RADMC files
+   for the postprocessing. All of them are described in detail in the following.
+
+2. **Model layer:** The model layer (`etl/mdl`) prepares and executes the RADMC command according to the
+   configuration in the `etl/mdl/config/config.yml` file. This is done by the `mdl_execute_radmc_command.py` script,
+   which also creates `radmc3d_postprocessing.sh`.
+
+   The results are then converted to fits cubes, which are saved by default into `prs/fits/cubes`, for later
+   processing.
+
+3. **Presentation layer:** In the presentation layer (`etl/prs`) moment-0 maps and line-ratio maps are computed
+   executing the `prs_compute_integrated_fluxes_and_ratios.py` script. At the moment, the integration limits cannot be
+   specified, and the entire cube is collapsed. *WARNING:* Pay attention to the presence of secondary modeled lines in
+   the simulated spectra.
+
+   The script `prs_inspect_results.py` reduces the ratio maps to single points and produces an image of the ratio values
+   as a function of gas number density and temperature. Be aware that at the moment the query is hardcoded and works for
+   SAK-generated models only.
+
+   The script `prs_prepare_backup.py` compresses and copies the output of a model run for sharing.
+
+   The scripts `prs_ml_training.py`, `prs_ml_training_ratio.py`, `prs_expand_modelling_with_ml.py`, and
+   `prs_expand_ratio_modelling_with_ml.py` can be run to perform ML-assisted emulation of the modelled data, in order
+   to expand the grid beyond the models computed with actual RT. These scripts rely on
+   the `etl/config/ml_modelling.yml` file to train and evaluate the emulation model, and to actually produce
+   emulated data, which are saved to `etl/inferred_data.csv`. This file is used by `prs_density_inference.py` to
+   concatenate these data to those from the formal computation when performing the density inference (see below). In
+   our case, XGBoost worked best, and we used this model to perform emulation.
+
+   The final script, `prs_density_inference.py`, prepares the KDE model and performs the density inference, given the
+   measured ratios. It uses the YAML file `etl/config/density_inference_input.yml` to provide the input needed by the
+   script. It produces the `output/run_type/{provided_run_type}/density_pdf*.png` output files,
+   where `provided_run_type` is defined in the global configuration file and the wildcard represents the
+   source name.
+
+The entire ETL pipeline is executed by the `main.py` script, where it is possible to define overrides for the default
+values in the stage-specific configuration files (so that an entire grid of models can be specified). These
+overrides are defined in the `etl/config/config.yml` configuration file.
+
+4. **Additional files:**
+   * The script `prs_analytical_representations.py` provides a convenient way of checking the analytical representations of the ratio vs. density curves.
+   * The file `prs_check_biases_poc_sample.py` checks for biases in the massive clump sample used in the proof-of-concept.
+   * The scripts `prs_poc_figures.py` and `prs_poc_latex_table.py` can be used to reproduce the content of the paper regarding the POC.
+
+### Running the pipeline
+
+The pipeline is now dockerized. To run it, clone the repository and run:
+
+`docker compose up --build`
+
+from the root project directory. Docker compose will bring up a local database for your runs, with persistent storage,
+so that all the results can be found and inspected. Similarly, a local volume is mounted, so that intermediate files
+(RADMC files, cubes, moment-zero and ratio images) can be found in the project directory structure after a run.
+Remember to set the following environment variables, so that the database is correctly initialized:
+
+* DB_USER;
+* DB_PASS;
+* DB_HOST;
+* DB_NAME.
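For example, these could be exported in your shell before bringing the stack up (the values below are purely illustrative placeholders, not project defaults):

```shell
# Hypothetical values -- adapt to your own deployment
export DB_USER=sak
export DB_PASS=changeme
export DB_HOST=localhost
export DB_NAME=sak_models
```

Alternatively, the same variables can be placed in a `.env` file next to `docker-compose.yml`, which docker compose reads automatically.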
+
+The main routine accepts the following parameters as input:
+
+* --run_id: the run id of the grid to process; if not provided, a new run_id is generated;
+* --cleanup_scratches: whether to empty the `mdl/scratches` directory;
+* --distributed: whether the grid is processed in a distributed environment, in which case a queue from the
+  database is used to process all models in the grid; otherwise, it executes with multiprocessing.
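A possible invocation, assuming the entry point is called directly rather than through docker compose (the run id below is purely illustrative):

```
python main.py --run_id example_grid_001 --cleanup_scratches
```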
+
+To reset the database, use the command:
+
+`docker volume rm swiss_army_knife_db`.
+
+Be aware that at the moment the first run (also after a DB reset) fails due to an issue in the DB initialization. Run
+the docker compose command once more to start the pipeline.
+
+The pipeline produces a series of files that are used to produce the final output.
+
+* It compresses and archives in `stg/archive` all the input files for RADMC in a zip file, whose name is also used as
+  part of the primary key in the database. This file can be used to inspect run results and/or to repeat a run
+  manually if needed.
+* It saves the H2 volume density and dust temperature grids, and the volume density of the species included in the
+  model, as fits files in `prs/fits/grids`.
+* It converts the `image.out` file to a fits cube in `prs/fits/cubes`.
+* It computes the moment-zero maps by integrating the full cube, persisting them in `prs/fits/moments`.
+* It determines the line ratios pixel-by-pixel and saves the maps in `prs/fits/ratios`.
+
+After the last model is completed, the results are summarized in a series of figures, which are saved in the
+`prs/output/run_type/{provided_run_type}/figures` folder.
+
+The files named `ratio_grid_lines_*.png` show the average ratio across the entire clump, as a function of the
+characteristic temperature and volume density.
+
+The pixel-by-pixel ratios as a function of the average line-of-sight H2 volume density are shown
+in `ratio_vs_avg_density_los_*.png`. The KDE-smoothed data are computed and saved
+in `ratio_vs_avg_density_los_kde_*.png`.
+
+To check the opacity of the lines, we include a plot showing the integrated emission pixel-by-pixel as a function of
+the molecular column density in that pixel (`coldens_moments_lines_*.png`).
+
+If density inference is performed, the posterior PDF is saved as `density_inference.png`.
+
+### Configuration files parameters
+
+In this section we describe in more detail the parameters that can be set in the different configuration files, and
+their meaning.
+
+#### The staging configuration file (`etl/stg/config/config.yml`)
+
+The staging config file has three main categories:
+
+* **grid:**
+    * grid_type: regular or spherical
+    * coordinate_system: cartesian or polar
+    * central_density: the central gas number density of the toy model. If you are using a power-law distribution, it
+      corresponds also to the maximum possible density in the grid.
+    * density_unit: the units in which the number density is expressed, e.g. 'cm^-3'
+    * density_powerlaw_idx: the power-law index of the density distribution
+    * density_at_reference: the gas number density at a reference value, e.g. at 1 pc
+    * distance_reference: the reference radius for scaling the power-law
+    * distance_reference_unit: the units of the reference radius, e.g. 'pc', 'AU', 'cm'
+    * dust_temperature: the dust temperature value
+    * dust_temperature_unit: the dust temperature unit
+    * dust_temperature_powerlaw_idx: the dust temperature power-law index
+    * microturbulence: the unresolved turbulent velocity of the gas
+    * microturbulence_unit: the units of the microturbulence value, e.g. 'km/s'
+    * dimN: a dictionary-like parameter; it should include the 'size', 'size_unit', 'shape', and 'refpix' keys,
+      e.g. {"size": 1, "size_unit": "pc", "shape": 3, "refpix": 1}. If only dim1 is provided, the grid is cubic.
+        * size: the grid size in physical units
+        * size_unit: the units in which the grid size is expressed, e.g. 'pc'
+        * shape: the number of cells in the grid for this axis
+        * refpix: the reference pixel that corresponds to the grid "centre", from which the distances are computed
+          for power-law models, for instance
+    * velocity_field: the velocity field to apply to the gas; it can only be 'solid' at the moment (the gas is
+      assumed to rotate around the y axis). Power-law is in principle also supported, but the wiring is still to be
+      implemented.
+    * velocity_gradient: the value of the velocity gradient for solid-body rotation.
+    * velocity_gradient_unit: the unit of the velocity gradient, e.g. "km/s/pc"
+
+* **stars:**
+  Adding the star configuration section makes the program compute the dust temperature distribution given the
+  properties of the stars included. _Caveat:_ be careful, test runs show unexpectedly low dust temperatures using a
+  blackbody.
+    * nstars: the number of stars to include
+    * rstars: the radii of the stars in cm; normally ignored, unless the parameter `istar_sphere` is set to 1
+    * mstars: the masses in grams; not used in the current version of RADMC
+    * star_positions: x, y, z coordinates of the star in cm, expressed as a list of lists
+    * star_fluxes: stellar fluxes at each of the computed lambdas in erg cm^-2 s^-1 Hz^-1; if negative, interpreted
+      as the peak temperature of a blackbody
+    * nlambdas: number of wavelengths to compute
+    * spacing: log or linear
+    * lambdas_micron_limits: the limits in wavelength to consider in the run, expressed in micron
+
+* **lines:**
+    * species_to_include: the list of molecular species to include in the RADMC postprocessing, e.g. ['e-ch3oh']
+    * molecular_abundances: a dict-like parameter, containing the species name and the corresponding fractional
+      abundance, e.g. {"e-ch3oh": 1e-8, "p-h2": 1}
+    * lines_mode: the line transfer mode. It can be 'lte', 'lvg', 'optically_thin_non_lte', or
+      'user_defined_populations' (see the RADMC documentation if in doubt)
+    * collision_partners: the list of collision partners to be used; it must appear in the correct order as in the
+      molecule_{molname}.inp file of the molecule to be simulated, e.g. ['p-h2']
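As an illustrative sketch only (the values below are examples drawn from this description, not the shipped defaults), a minimal staging configuration could look like:

```yaml
grid:
  grid_type: 'spherical'
  coordinate_system: 'cartesian'
  central_density: 1e7
  density_unit: 'cm^-3'
  density_powerlaw_idx: -1.5
  density_at_reference: 1e5
  distance_reference: 1
  distance_reference_unit: 'pc'
  dust_temperature: 20
  dust_temperature_powerlaw_idx: 0
  microturbulence: 1
  microturbulence_unit: 'km/s'
  dim1: {"size": 1, "size_unit": "pc", "shape": 3, "refpix": 1}
lines:
  species_to_include: ['e-ch3oh']
  molecular_abundances: {"e-ch3oh": 1e-8, "p-h2": 1}
  lines_mode: 'lvg'
  collision_partners: ['p-h2']
```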
+
+#### The model configuration file (`etl/mdl/config/config.yml`)
+
+The model configuration file has two categories:
+
+* **radmc_postprocessing:**
+    * nphotons: the number of photons to use for the postprocessing
+    * scattering_mode_max: override the scattering settings in the dust opacity files; 0 excludes scattering, 1 treats
+      it in an isotropic way (if defined), 2 includes anisotropic scattering (if defined)
+    * iranfreqmode: 1
+    * tgas_eq_tdust: whether the gas temperature is equal to the dust temperature (if not, it must be specified or
+      computed separately!)
+
+* **radmc_observation:**
+    * inclination: the inclination w.r.t. the line-of-sight in degrees
+    * position_angle: the position angle w.r.t. the observer
+    * imolspec: the index of the species to be modeled; defaults to the first
+    * iline: the line identifier for the line to be modeled, according to the molecule_{molname}.inp
+    * width_kms: the range in km/s to be modeled around the line
+    * nchannels: the number of channels to be considered
+    * npix: the number of pixels in the final image; WARNING: it must be a multiple of the grid shape
+    * threads: number of threads to be used by radmc
+    * image_size_pc: the size of the image to produce; it is useful to get a good alignment
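A hedged sketch of the two categories (all values are illustrative, not the shipped defaults; the line id 87 is taken from the examples in this document):

```yaml
radmc_postprocessing:
  nphotons: 1000000
  scattering_mode_max: 0
  iranfreqmode: 1
  tgas_eq_tdust: true
radmc_observation:
  inclination: 0
  position_angle: 0
  imolspec: 1
  iline: 87
  width_kms: 10
  nchannels: 100
  npix: 60
  threads: 4
  image_size_pc: 1
```

Note the `npix` warning above: with a grid shape of 3, for instance, npix must be a multiple of 3.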
+
+#### The global configuration file (`etl/config/config.yml`)
+
+The global configuration file, in addition to the run_type name, has two categories, "computation" and "overrides":
+
+* run_type: name of the folder where the results are grouped and saved in the prs step
+* **computation:**
+    * threads: number of threads to include in the multiprocessing pool
+
+* **overrides:**
+    * dust_temperature_grid_type: the spacing in the dust temperature grid, can be: 'linear', 'log'
+    * dust_temperature_limits: the limits in the dust temperature grid, e.g. [10, 30]; the last point is excluded
+    * dust_temperature_step: the step size in the grid; if the spacing is logarithmic, it represents the increase
+      factor, e.g. 10 means steps of one order of magnitude
+    * gas_density_grid_type: the spacing in the gas number density grid, can be: 'linear', 'log'
+    * gas_density_limits: the limits in the gas number density grid, e.g. [1e4, 1e8]; the last point is excluded
+    * gas_density_step: the step size in the grid; if the spacing is logarithmic, it represents the increase factor,
+      e.g. 10 means steps of one order of magnitude
+    * gas_density_unit: the units in which the gas number density is expressed, e.g. 'cm^-3'
+    * lines_to_process: the list of line identifiers to process, according to the molecule_{molname}.inp file,
+      e.g. [['87', '86'], ['88', '87']]. The parameter is given as a list of lists, so that the program knows which
+      ratios to compute.
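Putting the pieces together, a global configuration defining a small grid could look like this (a sketch with illustrative values, reusing the example limits and line pairs above):

```yaml
run_type: 'example_run'
computation:
  threads: 4
overrides:
  dust_temperature_grid_type: 'linear'
  dust_temperature_limits: [10, 30]
  dust_temperature_step: 5
  gas_density_grid_type: 'log'
  gas_density_limits: [1e4, 1e8]
  gas_density_step: 10
  gas_density_unit: 'cm^-3'
  lines_to_process: [['87', '86'], ['88', '87']]
```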
+
+#### The density inference input file (`etl/config/density_inference_input.yml`)
+
+This file contains the measured ratios, their uncertainties, and a few other parameters to perform the inference.
+
+* use_model_for_inference: the name of the folder where the model results to be used are stored.
+* measured_integrated_intensities: a (potentially nested) dictionary of measured integrated intensities. The keys
+  are the line id and the source name, which will be concatenated to the output plot names. A subset can be given,
+  depending on what is available from observations. Each key can contain either a float representing the integrated
+  intensity or a dictionary of sources, each with their own data.
+* integrated_intensities_uncertainties: the rms uncertainty associated to the integrated intensity measurement. It
+  has the same structure as the integrated intensities dictionary.
+* ratios_to_include: a list of strings representing the ratios to consider in the density determination, in the form
+  of 'transition_id_1-transition_id_2'.
+* simulated_ratio_realizations: the number of simulated ratios to generate
+* recompute_kde: if True, retrain the KDE model; if False, unpickle the existing one
+* probability_threshold: the probability threshold to be used to compute the highest probability density interval (this
+  is the probability in the wings!).
+* limit_rows [optional]: limits the number of rows to extract from the data to reduce computational time for tests.
+* points_per_axis [optional]: the number of points on the density and ratio axes, to compute the probability
+  distributions.
+* nthreads [optional]: the number of threads to be used for computation.
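An illustrative sketch of this file (the folder name, source name, line ids, and all numbers are made up for the example):

```yaml
use_model_for_inference: 'example_run'
measured_integrated_intensities:
  '87':
    'source_1': 2.5
  '86':
    'source_1': 1.8
integrated_intensities_uncertainties:
  '87':
    'source_1': 0.3
  '86':
    'source_1': 0.25
ratios_to_include: ['87-86']
simulated_ratio_realizations: 1000
recompute_kde: True
probability_threshold: 0.05
```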
+
+#### The ML-emulation input file (`etl/config/ml_modelling.yml`)
+
+This file determines what tasks are performed as part of the `prs_ml_training.py` and
+`prs_expand_modelling_with_ml.py` scripts.
+
+* retrain: boolean to specify whether the model should be retrained from scratch (`prs_ml_training.py`) or reloaded to
+  perform inference (`prs_expand_modelling_with_ml.py`).
+* overrides: same as in the global configuration file.
+* model_type: the kind of model to use to perform emulation (relevant only if retrain is true); admitted values are
+  'XGBoost', 'XGBoost_gridsearch' (XGBoost with grid search on the hyperparameters and cross-validation),
+  'RandomForest', 'auto_skl' (AutoSklearnRegressor).
+* model_parameters: depending on model_type, a dictionary of the parameters to pass to the model. For example,
+  XGBoost supports `n_estimators`, `max_depth`, `learning_rate`, etc.; see the documentation of the xgboost, sklearn,
+  and auto-sklearn packages for a comprehensive list. For XGBoost with grid search, it is a nested dictionary with
+  two keys:
+    * param_grid: a dictionary of the hyperparameters to explore, each with a corresponding list of [start, stop, step]
+      values to pass to numpy.arange.
+    * param_gridsearch: a dictionary of parameters for the GridSearchCV object; verbosity is set to 3 by default.
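For a plain-XGBoost retraining run, the file could look like this sketch (parameter values are illustrative, not recommendations):

```yaml
retrain: true
model_type: 'XGBoost'
model_parameters:
  n_estimators: 300
  max_depth: 6
  learning_rate: 0.1
overrides:
  gas_density_grid_type: 'log'
  gas_density_limits: [1e4, 1e8]
  gas_density_step: 2
```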
+
+## Database
+
+The database is an instance of PostgreSQL, created via docker compose or with the `apptainer.def` singularity definition
+file.
+
+The database ER diagram is shown in `publications/6373bb408e4040043398e495/figures/sak_database.png`. In the following
+we document the tables and their relationships.
+
+### Description of the tables
+
+The tables and their columns are described in detail in this section.
+
+#### grid_parameters
+
+The `grid_parameters` table contains the metadata of the model grids computed in the SAK run. The primary key is
+composed of the `run_id` and `zipped_grid_name` columns.
+
+| Column name                     | Type         | PK    | FK    | FK Table | Description                                                                                                                                                               |
+|---------------------------------|--------------|-------|-------|----------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| zipped_grid_name                | varchar(150) | True  |       |          | The name of the compressed grid file containing all the input used to generate the model                                                                                  |
+| grid_type                       | text         | False |       |          | The type of the grid to be constructed (to date regular or spherical)                                                                                                     |
+| coordinate_system               | text         | False |       |          | The coordinate system to use in the grid (to date cartesian or polar)                                                                                                     |
+| central_density                 | Float        | False |       |          | The density cap of the grid; e.g. with a cap of 1e8 cm^-3, every point above this threshold is set to 1e8 cm^-3; expressed in cm^-3                                       |
+| density_powerlaw_index          | Float        | False |       |          | The power-law index used to construct the volume density grid                                                                                                             |
+| density_at_reference            | Float        | False |       |          | The value of the volume density at the reference radius; expressed in cm^-3                                                                                               |
+| dust_temperature                | Float        | False |       |          | The dust temperature cap of the grid; e.g. with a cap of 2000 K, every point above this threshold is set to 2000 K; expressed in K                                        |
+| dust_temperature_powerlaw_index | Float        | False |       |          | The power-law index used to construct the dust temperature grid                                                                                                           |
+| dust_temperature_at_reference   | Float        | False |       |          | The dust temperature value at the reference radius; expressed in K                                                                                                        |
+| microturbulence                 | Float        | False |       |          | The microturbulence value (standard deviation) to add to the thermal broadening of the lines; expressed in km/s                                                           |
+| velocity_field                  | text         | False |       |          | The type of velocity field to use to construct the velocity grid (to date 'solid' for solid-body rotation)                                                                |
+| velocity_gradient               | text         | False |       |          | The value of the velocity gradient to use for the velocity field construction; expressed in km/s/pc for solid-body rotation                                               |
+| velocity_powerlaw_index         | Float        | False |       |          | The index of the power-law used to construct the velocity grid                                                                                                            |
+| velocity_at_reference           | Float        | False |       |          | The value of the velocity at the reference radius; expressed in km/s                                                                                                      |
+| distance_reference              | Float        | False |       |          | The value of the reference radius; expressed in pc                                                                                                                        |
+| maximum_radius                  | Float        | False |       |          | Grid cutoff radius, the external radius of the fragment                                                                                                                   |
+| grid_size_1                     | Float        | False |       |          | The dimension of the grid along the x axis; expressed in pc                                                                                                               |
+| grid_shape_1                    | Float        | False |       |          | The number of pixels along the x axis                                                                                                                                     |
+| grid_refpix_1                   | Float        | False |       |          | The reference pixel for the x axis                                                                                                                                        |
+| grid_size_2                     | Float        | False |       |          | The dimension of the grid along the y axis; expressed in pc                                                                                                               |
+| grid_shape_2                    | Float        | False |       |          | The number of pixels along the y axis                                                                                                                                     |
+| grid_refpix_2                   | Float        | False |       |          | The reference pixel for the y axis                                                                                                                                        |
+| grid_size_3                     | Float        | False |       |          | The dimension of the grid along the z axis; expressed in pc                                                                                                               |
+| grid_shape_3                    | Float        | False |       |          | The number of pixels along the z axis                                                                                                                                     |
+| grid_refpix_3                   | Float        | False |       |          | The reference pixel for the z axis                                                                                                                                        |
+| created_on                      | DateTime     | False | False |          | The timestamp of the record creation                                                                                                                                      |
+| run_id                          | text         | True  |       |          | The id of the run                                                                                                                                                         |
+
+#### grid_files
+
+This table contains the name of the fits files where the physical grids have been saved.
+
+| Column name      | Type         | PK    | FK    | FK Table        | Description                                                                              |
+|------------------|--------------|-------|-------|-----------------|------------------------------------------------------------------------------------------|
+| zipped_grid_name | varchar(150) | True  | True  | grid_parameters | The name of the compressed grid file containing all the input used to generate the model |
+| quantity         | varchar(30)  | True  | False |                 | The quantity saved in the fits file e.g. temperature, volume density, etc.               |
+| fits_grid_name   | text         | False | False |                 | The name of the fits file with the physical grid                                         |
+| created_on       | DateTime     | False | False |                 | The timestamp of the record creation                                                     |
+| run_id           | text         | True  | True  | grid_parameters | The id of the run                                                                        |
+
+#### stars_parameters
+
+This table contains the parameters used to describe the (potential) distribution of stars in the grid.
+
+| Column name           | Type         | PK    | FK    | FK Table        | Description                                                                              |
+|-----------------------|--------------|-------|-------|-----------------|------------------------------------------------------------------------------------------|
+| zipped_grid_name      | varchar(150) | True  | True  | grid_parameters | The name of the compressed grid file containing all the input used to generate the model |
+| nstars                | integer      | False | False |                 | The number of stars to include in the grid                                               |
+| rstars                | Array(float) | False | False |                 | The radii of the stars in the grid (as described in the RADMC-3D docs)                   |
+| mstars                | Array(float) | False | False |                 | The masses of the stars in the grid (as described in the RADMC-3D docs)                  |
+| star_positions        | Array(float) | False | False |                 | The array of stellar positions                                                           |
+| star_fluxes           | Array(float) | False | False |                 | The arrays of the stellar fluxes, for each star                                          |
+| nlamdbas              | integer      | False | False |                 | The number of wavelengths to consider for the stellar spectrum                           |
+| spacing               | text         | False | False |                 | The spacing of the wavelength grid for the stellar spectrum (e.g. linear or logarithmic) |
+| lambdas_micron_limits | Array(float) | False | False |                 | The spectrum limits array, expressed in microns                                          |
+| created_on            | DateTime     | False | False |                 | The timestamp of the record creation                                                     |
+| run_id                | text         | True  | True  | grid_parameters | The id of the run                                                                        |
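
As an illustration of how `nlamdbas`, `spacing`, and `lambdas_micron_limits` fit together, here is a minimal sketch of building a stellar-spectrum wavelength grid. The function name and the linear/log interpretation of `spacing` are assumptions for illustration, not the project's code:

```python
import numpy as np

def wavelength_grid(nlambdas: int, limits_micron: tuple, spacing: str = "log") -> np.ndarray:
    """Build a wavelength grid in microns between the given limits.

    Assumes `spacing` selects logarithmic vs linear sampling; the actual
    convention used by the pipeline may differ.
    """
    lo, hi = limits_micron
    if spacing == "log":
        return np.logspace(np.log10(lo), np.log10(hi), nlambdas)
    return np.linspace(lo, hi, nlambdas)

# 100 wavelengths from 0.1 to 1000 microns, log-spaced
lam = wavelength_grid(100, (0.1, 1000.0))
```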
+
+#### lines_parameters
+
+The `lines_parameters` table contains the mode used by RADMC to perform the radiative transfer.
+
+| Column name      | Type         | PK    | FK    | FK Table        | Description                                                                              |
+|------------------|--------------|-------|-------|-----------------|------------------------------------------------------------------------------------------|
+| zipped_grid_name | varchar(150) | True  | True  | grid_parameters | The name of the compressed grid file containing all the input used to generate the model |
+| lines_mode       | varchar(20)  | False | False |                 | The mode used by RADMC to perform the radiative transfer                                 |
+| created_on       | DateTime     | False | False |                 | The timestamp of the record creation                                                     |
+| run_id           | text         | True  | True  | grid_parameters | The id of the run                                                                        |
+
+#### species_and_partners
+
+This table lists the molecular species, and the associated collision partners, used in the radiative transfer computation.
+
+| Column name                           | Type         | PK    | FK    | FK Table        | Description                                                                                    |
+|---------------------------------------|--------------|-------|-------|-----------------|------------------------------------------------------------------------------------------------|
+| zipped_grid_name                      | varchar(150) | True  | True  | grid_parameters | The name of the compressed grid file containing all the input used to generate the model       |
+| species_to_include                    | varchar(100) | True  | False |                 | A molecular species for which RT was performed                                                 |
+| molecular_abundance                   | float        | False | False |                 | The base abundance of the molecule                                                             |
+| threshold                             | float        | False | False |                 | The temperature threshold above which the molecule is assumed to evaporate from dust grains    |
+| abundance_jump                        | float        | False | False |                 | The factor by which the base abundance is scaled after evaporation                             |
+| collision_partner                     | varchar(100) | True  | False |                 | A collision partner used for excitation                                                        |
+| molecular_abundance_collision_partner | float        | False | False |                 | The molecular abundance of the collision partner                                               |
+| created_on                            | DateTime     | False | False |                 | The timestamp of the record creation                                                           |
+| run_id                                | text         | True  | True  | grid_parameters | The id of the run                                                                              |
+
+#### model_parameters
+
+This table includes the configuration parameters used for RADMC.
+
+| Column name         | Type         | PK    | FK    | FK Table        | Description                                                                              |
+|---------------------|--------------|-------|-------|-----------------|------------------------------------------------------------------------------------------|
+| zipped_grid_name    | varchar(150) | False | True  | grid_parameters | The name of the compressed grid file containing all the input used to generate the model |
+| fits_cube_name      | varchar(100) | True  | False |                 | The name of the fits cube produced by the RT computation                                 |
+| nphotons            | float        | False | False |                 | The number of photons used in the RT computation                                         |
+| scattering_mode_max | integer      | False | False |                 | The type of scattering considered in RADMC                                               |
+| iranfreqmode        | integer      | False | False |                 | The RADMC-3D random frequency sampling mode (see the RADMC-3D docs)                      |
+| tgas_eq_tdust       | integer      | False | False |                 | Whether or not the gas and dust temperatures are assumed equal                           |
+| inclination         | float        | False | False |                 | The model inclination                                                                    |
+| position_angle      | float        | False | False |                 | The model position angle                                                                 |
+| imolspec            | float        | False | False |                 | The species identifier                                                                   |
+| iline               | float        | False | False |                 | The line identifier, according to the molecular transitions descriptor file              |
+| width_kms           | float        | False | False |                 | The width of the spectrum to be computed, in km/s                                        |
+| nchannels           | float        | False | False |                 | The number of channels into which the spectrum will be divided                           |
+| npix                | float        | False | False |                 | The number of pixels used for the output image                                           |
+| created_on          | DateTime     | False | False |                 | The timestamp of the record creation                                                     |
+| run_id              | text         | True  | True  | grid_parameters | The id of the run                                                                        |
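
To see how `width_kms` and `nchannels` define the spectral axis of the synthetic cube, here is a sketch under the assumption of a velocity grid symmetric about the line centre (RADMC-3D's exact convention may differ):

```python
import numpy as np

def channel_velocities(width_kms: float, nchannels: int) -> np.ndarray:
    """Centre velocities of the spectral channels, symmetric about 0 km/s.

    Assumes the cube spans [-width_kms / 2, +width_kms / 2].
    """
    edges = np.linspace(-width_kms / 2, width_kms / 2, nchannels + 1)
    return 0.5 * (edges[:-1] + edges[1:])  # midpoints of the channel edges

v = channel_velocities(10.0, 5)  # array([-4., -2., 0., 2., 4.])
```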
+
+#### moment_zero_maps
+
+This table contains the information on the moment-zero images and their computation.
+
+| Column name            | Type         | PK    | FK    | FK Table         | Description                                                                                       |
+|------------------------|--------------|-------|-------|------------------|---------------------------------------------------------------------------------------------------|
+| mom_zero_name          | varchar(150) | True  | False |                  | The name of the moment-zero fits file                                                             | 
+| fits_cube_name         | varchar(150) | False | True  | model_parameters | The name of the fits cube from which the moment zero was computed                                 |                               
+| integration_limit_low  | float        | False | False |                  | The lower integration limit for the computation of the moment zero; if null includes all channels |
+| integration_limit_high | float        | False | False |                  | The upper integration limit for the computation of the moment zero; if null includes all channels |
+| aggregated_moment_zero | float        | False | False |                  | The moment zero computed for the entire cube, according to aggregation_function                   |
+| aggregation_function   | varchar(20)  | False | False |                  | The function used to aggregate the pixel-values of the moment-zero maps                           |
+| created_on             | DateTime     | False | False |                  | The timestamp of the record creation                                                              |
+| run_id                 | text         | True  | True  | model_parameters | The id of the run                                                                                 |
+
+#### ratio_maps
+
+This table includes the information about the ratio computation.
+
+| Column name          | Type         | PK    | FK    | FK Table         | Description                                                               |
+|----------------------|--------------|-------|-------|------------------|---------------------------------------------------------------------------|
+| ratio_map_name       | varchar(150) | True  | False |                  | The name of the fits file that contains the ratio maps                    |
+| mom_zero_name_1      | varchar(150) | False | True  | moment_zero_maps | The name of the moment-zero map constituting the numerator of the ratio   |
+| mom_zero_name_2      | varchar(150) | False | True  | moment_zero_maps | The name of the moment-zero map constituting the denominator of the ratio |
+| aggregated_ratio     | float        | False | False |                  | The value of the ratio computed over the full simulated cubes             |
+| aggregation_function | varchar(20)  | False | False |                  | The function used to compute the aggregated ratio                         |
+| created_on           | DateTime     | False | False |                  | The timestamp of the record creation                                      |
+| run_id               | text         | True  | True  | moment_zero_maps | The id of the run                                                         |
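
The per-pixel division behind `ratio_map_name` can be sketched as follows; the NaN masking of near-zero denominators is an assumption for illustration, not necessarily the project's choice:

```python
import numpy as np

def ratio_map(mom_zero_1, mom_zero_2, floor=1e-10):
    """Pixel-wise ratio of two moment-zero maps (numerator / denominator).

    Pixels where the denominator is below `floor` are set to NaN to avoid
    spurious values; the threshold is illustrative.
    """
    out = np.full_like(mom_zero_1, np.nan, dtype=float)
    valid = np.abs(mom_zero_2) > floor
    out[valid] = mom_zero_1[valid] / mom_zero_2[valid]
    return out

num = np.array([[2.0, 4.0], [0.5, 1.0]])
den = np.array([[1.0, 2.0], [0.0, 4.0]])
r = ratio_map(num, den)  # [[2.0, 2.0], [nan, 0.25]]
```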
+
+#### tmp_execution_queue
+
+An auxiliary table used to efficiently parallelize computation in a distributed environment.
+
+| Column name              | Type    | PK    | FK    | FK Table | Description                                                                                          |
+|--------------------------|---------|-------|-------|----------|------------------------------------------------------------------------------------------------------|
+| row_id                   | integer | False | False |          | A row counter                                                                                        |
+| run_id                   | text    | True  | False |          | The id of the run                                                                                    |
+| dust_temperature         | float   | True  | False |          | The characteristic dust temperature of the model                                                     |
+| density                  | float   | True  | False |          | The characteristic number density of the model                                                       |
+| line                     | integer | True  | False |          | The line ID                                                                                          |
+| density_keyword          | text    | True  | False |          | The column used in the query to select the input (central_density or density_at_reference)           |
+| dust_temperature_keyword | text    | True  | False |          | The column used in the query to select the input (dust_temperature or dust_temperature_at_reference) |
+| fits_cube_name           | text    | False | False |          | The name of the associated fits cube                                                                 |
+| done                     | bool    | False | False |          | Whether the image was processed already                                                              |
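
The claim-and-mark cycle this table supports (see `get_run_pars` further down in this diff) can be sketched with an in-memory SQLite database. The production code runs against PostgreSQL through SQLAlchemy, and a truly distributed setup needs an atomic `UPDATE ... RETURNING` rather than the two-step version shown here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tmp_execution_queue (
    row_id INTEGER PRIMARY KEY,
    run_id TEXT,
    done INTEGER DEFAULT 0)""")
conn.executemany("INSERT INTO tmp_execution_queue (run_id) VALUES (?)",
                 [("run-1",)] * 3)

def claim_next(conn, run_id):
    """Fetch the next pending row and mark it done (single-process sketch)."""
    row = conn.execute(
        "SELECT row_id FROM tmp_execution_queue "
        "WHERE run_id = ? AND done = 0 ORDER BY row_id LIMIT 1",
        (run_id,)).fetchone()
    if row is None:
        return None  # nothing left to process for this run
    conn.execute("UPDATE tmp_execution_queue SET done = 1 WHERE row_id = ?",
                 (row[0],))
    return row[0]

first = claim_next(conn, "run-1")   # 1
second = claim_next(conn, "run-1")  # 2
```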
\ No newline at end of file
diff --git a/etl/assets/commons/__init__.py b/etl/assets/commons/__init__.py
index fe82e70163f4ce868e1e78d74e4d47e3a11bba52..3e5f5a68e379237f4c10b337b37ff372a0da55c3 100644
--- a/etl/assets/commons/__init__.py
+++ b/etl/assets/commons/__init__.py
@@ -201,6 +201,16 @@ def get_postprocessed_data(limit_rows: Union[None, int] = None,
 def prepare_matrix(filename: str,
                    columns: list,
                    use_model_for_inference: Union[None, str] = None) -> pd.DataFrame:
+    """
+    Retrieve and prepare the data matrix from a specified file and columns.
+
+    :param filename: The name of the file to read the data from.
+    :param columns: The list of columns to extract from the dataframe.
+    :param use_model_for_inference: The folder within prs/output/run_type to get the data for inference;
+        defaults to the fiducial model ('constant_abundance_p15_q05') if None is provided.
+    :return: A pandas DataFrame containing the specified columns from the file with 'nh2' and 'tdust' columns
+        rounded to one decimal place and converted to string type.
+    """
     _use_model_for_inference = validate_parameter(
         use_model_for_inference,
         default='constant_abundance_p15_q05'
@@ -214,6 +224,18 @@ def prepare_matrix(filename: str,
 def get_data(limit_rows: Union[int, None] = None,
              use_model_for_inference: Union[None, str] = None,
              log_columns: Union[None, List] = None):
+    """
+    Retrieve and preprocess dataset.
+
+    :param limit_rows: The number of rows to use from the original dataset; useful for running tests and limiting
+        computation time. Defaults to None, which uses all rows.
+    :param use_model_for_inference: The folder within prs/output/run_type to get the data for inference;
+        defaults to the fiducial model ('constant_abundance_p15_q05') if None is provided.
+    :param log_columns: The list of columns to apply a logarithmic transformation to. Defaults to
+        ['log_nh2', 'log_tdust', 'avg_nh2', 'avg_tdust', 'molecule_column_density', 'std_nh2'] if None is provided.
+    :return: A pandas DataFrame containing the merged and processed data from multiple sources, with specified
+        columns logarithmically transformed.
+    """
     _use_model_for_inference = validate_parameter(
         use_model_for_inference,
         default='constant_abundance_p15_q05'
diff --git a/etl/assets/commons/training_utils.py b/etl/assets/commons/training_utils.py
index e8c30b8eb08e1540f3fb906d001b1bd32cda40b3..c65832dd4dc9b7d0dc8b10f178ea89e7cd6cb1a2 100644
--- a/etl/assets/commons/training_utils.py
+++ b/etl/assets/commons/training_utils.py
@@ -48,6 +48,17 @@ def compute_and_add_similarity_cols(average_features_per_target_bin: pd.DataFram
 def plot_results(inferred_data: pd.DataFrame,
                  use_model_for_inference: str = None,
                  ratios_to_process: Union[List[List[str]], None] = None):
+    """
+        Plot the results of inferred data against postprocessed data for specified line ratios, in order to quickly
+        check the results.
+
+        :param inferred_data: The DataFrame containing the inferred data to plot.
+        :param use_model_for_inference: The folder within prs/output/run_type to get the data for inference;
+            defaults to the fiducial model ('constant_abundance_p15_q05') if None is provided.
+        :param ratios_to_process: The list of line ratios to plot, each specified as a list of two strings.
+            Defaults to [['87', '86'], ['88', '87'], ['88', '86'], ['257', '256'], ['381', '380']] if None is provided.
+        :return: None. Saves the plot as a PNG file named according to the model used for inference.
+        """
     _use_model_for_inference = validate_parameter(
         use_model_for_inference,
         default='constant_abundance_p15_q05'
@@ -234,9 +245,11 @@ def split_data(merged: pd.DataFrame,
         6.561e+06
     ]
     subsample = merged[
-        merged['nh2'].isin(nh2_list) & ~merged['nh2'].isin(_test_models) & ~merged['nh2'].isin(_validation_models)]
-    assert _test_models not in subsample.nh2.unique()
-    assert _validation_models not in subsample.nh2.unique()
+        merged['nh2'].isin(nh2_list) & (~merged['nh2'].isin(_test_models)) & (~merged['nh2'].isin(_validation_models))]
+    for nh2_test in _test_models:
+        assert nh2_test not in list(subsample['nh2'].round(1).unique())
+    for nh2_validation in _validation_models:
+        assert nh2_validation not in list(subsample['nh2'].round(1).unique())
     y_sub = np.log10(subsample[target_column].copy())
     x_sub = subsample[_predictor_columns].copy()
     x_train = x_sub[(~condition_test) & (~condition_validation)].reset_index(drop=True)
diff --git a/etl/main.py b/etl/main.py
index fd74caa900b5d065d77c48d03e118bb25f04debe..0379995231f9626b2632038f30abd2c8bb67afa0 100644
--- a/etl/main.py
+++ b/etl/main.py
@@ -19,7 +19,22 @@ from prs.prs_compute_integrated_fluxes_and_ratios import main as prs_main
 from prs.prs_inspect_results import main as prs_inspection_main
 
 
-def compute_full_grid(tdust, nh2, line, density_keyword, dust_temperature_keyword) -> Tuple[float, float, int, str]:
+def compute_full_grid(tdust: float,
+                      nh2: float,
+                      line: int,
+                      density_keyword: str,
+                      dust_temperature_keyword: str) -> Tuple[float, float, int, str]:
+    """
+        Compute the full grid for a given dust temperature, hydrogen density, and line identifier, and return the results.
+
+        :param tdust: The dust temperature.
+        :param nh2: The H2 number density.
+        :param line: The line identifier for the RADMC-3D observation.
+        :param density_keyword: The keyword for the density in the grid configuration.
+        :param dust_temperature_keyword: The keyword for the dust temperature in the grid configuration.
+        :return: A tuple containing the dust temperature, H2 number density, line identifier, and the name of the
+            resulting FITS file.
+    """
     scratch_dir = os.path.join('mdl', 'scratches', str(uuid.uuid4()))
     stg_overrides = {
         'grid': {
@@ -48,6 +63,15 @@ def compute_full_grid(tdust, nh2, line, density_keyword, dust_temperature_keywor
 def initialize_queue(engine: sqlalchemy.engine,
                      run_id: str,
                      run_arguments: Iterator):
+    """
+        Initialize the execution queue for a specific run ID with given run arguments if not already initialized.
+
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :param run_id: The unique identifier for the run.
+        :param run_arguments: An iterator of run arguments, each containing dust temperature, density, line,
+            density keyword, and dust temperature keyword.
+        :return: None. Inserts entries into the execution queue if the queue is not already initialized.
+    """
     is_initialized = engine.execute(f"select count(*) from tmp_execution_queue where run_id='{run_id}'").first()[0] != 0
     if is_initialized is False:
         for arguments in run_arguments:
@@ -75,6 +99,14 @@ def initialize_queue(engine: sqlalchemy.engine,
 
 def get_run_pars(engine: sqlalchemy.engine,
                  run_id: str):
+    """
+        Get and mark the next pending row from the execution queue associated with the given run ID as done.
+
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :param run_id: The unique identifier for the run.
+        :return: The row corresponding to the next pending task in the execution queue not marked as done, or None if there
+            are no more pending tasks.
+    """
     sql_query = sqlalchemy.text(f"""UPDATE tmp_execution_queue 
                 SET done = true 
                 WHERE row_id = (SELECT row_id 
@@ -89,6 +121,14 @@ def get_run_pars(engine: sqlalchemy.engine,
 
 def verify_run(engine: sqlalchemy.engine,
                run_id: str):
+    """
+        Verify the completion status of a run by resetting completed but unfinished tasks to pending.
+
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :param run_id: The unique identifier for the run.
+        :return: True if all completed tasks for the run have associated FITS cube names, indicating completion,
+            otherwise False.
+    """
     sql_query = sqlalchemy.text(f"""UPDATE tmp_execution_queue 
                 SET done = false
                 WHERE row_id in (SELECT row_id 
@@ -104,6 +144,14 @@ def verify_run(engine: sqlalchemy.engine,
 def insert_fits_name(engine: sqlalchemy.engine,
                      row_id: int,
                      fits_cube_name: str):
+    """
+        Insert the FITS cube name into the row of the execution queue with the specified row ID.
+
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :param row_id: The unique identifier for the row in the execution queue.
+        :param fits_cube_name: The name of the FITS cube associated with the row.
+        :return: None. Updates the row in the execution queue with the FITS cube name.
+    """
     sql_query = sqlalchemy.text(f"""UPDATE tmp_execution_queue 
                 SET fits_cube_name = '{fits_cube_name}'
                 WHERE row_id = {row_id}""")
@@ -111,6 +159,12 @@ def insert_fits_name(engine: sqlalchemy.engine,
 
 
 def compute_grid_elements(run_id: str):
+    """
+        Compute grid elements for a given run ID by initializing the execution queue with parallel arguments.
+
+        :param run_id: The unique identifier for the run.
+        :return: None. Initializes the execution queue for the specified run ID.
+    """
     init_db()
     parallel_args, _ = get_parallel_args_and_nprocesses()
     engine = get_pg_engine(logger=logger)
@@ -121,6 +175,11 @@ def compute_grid_elements(run_id: str):
 
 
 def get_parallel_args_and_nprocesses() -> Tuple[Iterator, int]:
+    """
+        Get parallel computation arguments and the number of processes for computation.
+
+        :return: A tuple containing an iterator of parallel arguments and the number of processes to use.
+    """
     _tdust_model_type, _model_type, dust_temperatures, densities, line_pairs, n_processes, _ = parse_input_main()
     line_set = set(chain.from_iterable(line_pairs))
     density_keyword = 'central_density' if _model_type == 'homogeneous' else 'density_at_reference'
@@ -130,6 +189,13 @@ def get_parallel_args_and_nprocesses() -> Tuple[Iterator, int]:
 
 
 def compute_model(run_id: str):
+    """
+        Compute a model associated with a given run ID from parameters retrieved from the execution queue.
+
+        :param run_id: The unique identifier for the run.
+        :return: None. Computes a model and updates the database with the associated FITS cube name if parameters are
+            available in the execution queue.
+    """
     engine = get_pg_engine(logger=logger)
     parameters_set = get_run_pars(engine=engine,
                                   run_id=run_id)
@@ -150,6 +216,12 @@ def compute_model(run_id: str):
 
 
 def initialize_run():
+    """
+        Initialize a new run by generating a run ID if not provided, computing grid elements for the run,
+        and saving the run ID to a file for future reference.
+
+        :return: The generated or provided run ID.
+    """
     if args.run_id is not None:
         run_id = args.run_id
     else:
@@ -163,6 +235,13 @@ def initialize_run():
 
 
 def compute_remaining_models(run_id: Union[None, str] = None) -> int:
+    """
+        Compute the number of pending models for a given run ID.
+
+        :param run_id: Optional. The unique identifier for the run. If None, it defaults to the value of the
+            'run_id' environment variable.
+        :return: The number of remaining models.
+    """
     _run_id = validate_parameter(run_id, default=os.getenv('run_id'))
     logger.info(_run_id)
     sql_query = sqlalchemy.text(f"""SELECT count(*)
@@ -179,6 +258,14 @@ def compute_remaining_models(run_id: Union[None, str] = None) -> int:
 
 def get_results(engine: sqlalchemy.engine,
                 run_id: str):
+    """
+        Retrieve the results of a given run ID from the execution queue.
+
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :param run_id: The unique identifier for the run.
+        :return: A list of tuples containing the dust temperature, density, line, and FITS cube name for each model
+            associated with the run ID.
+    """
     sql_query = sqlalchemy.text(f"""SELECT dust_temperature
                                            , density
                                            , line
@@ -190,6 +277,13 @@ def get_results(engine: sqlalchemy.engine,
 
 def cleanup_tmp_table(run_id: str,
                       engine: sqlalchemy.engine):
+    """
+        Cleanup the temporary execution queue for a given run ID.
+
+        :param run_id: The unique identifier for the run.
+        :param engine: The SQLAlchemy engine used to interact with the database.
+        :return: None. Deletes all rows from the execution queue associated with the specified run ID.
+    """
     sql_query = sqlalchemy.text(f"""DELETE
                                     FROM tmp_execution_queue
                                     WHERE run_id = '{run_id}'""")
@@ -246,6 +340,12 @@ def main_presentation_step(run_id: str,
 
 
 def process_models(distributed: bool = False) -> Tuple[Union[None, dict], int]:
+    """
+        Process models either in a distributed environment or on a single machine (with multiprocessing).
+
+        :param distributed: A boolean flag indicating whether to process models in parallel. Defaults to False.
+        :return: A tuple containing results (if processed in parallel) and the number of remaining models.
+    """
     if distributed is True:
         compute_model(run_id=run_id)
         results = None
diff --git a/etl/prs/prs_density_inference.py b/etl/prs/prs_density_inference.py
index c52911f414684fc18f5abd4f7e63b0c8f5d5da0a..de16b713fc6a8600f716387e946dd93b04b6ffa9 100644
--- a/etl/prs/prs_density_inference.py
+++ b/etl/prs/prs_density_inference.py
@@ -91,13 +91,27 @@ def train_kde_model(ratio: List[str],
     return ratio_string, model
 
 
-def get_kde(points_per_axis,
-            ratio_string,
-            training_data,
-            x=None,
-            y=None,
+def get_kde(points_per_axis: int,
+            ratio_string: str,
+            training_data: pd.DataFrame,
+            x: Union[None, np.ndarray] = None,
+            y: Union[None, np.ndarray] = None,
             best_bandwidth: float = None,
-            bw_adjustment_factor: Union[float, int] = 1):
+            bw_adjustment_factor: Union[float, int] = 1) -> tuple:
+    """
+        Compute the Kernel Density Estimate (KDE) for a given ratio and training data.
+
+        :param points_per_axis: Number of points to use along each axis for the KDE grid.
+        :param ratio_string: The ratio string indicating which ratio of the training data to use.
+        :param training_data: The DataFrame containing the training data.
+        :param x: Optional. The x-axis values for the KDE grid. Defaults to a computed range if None.
+        :param y: Optional. The y-axis values for the KDE grid. Defaults to a computed range if None.
+        :param best_bandwidth: The best bandwidth to use for KDE. Defaults to 0.2 if None.
+        :param bw_adjustment_factor: The adjustment factor to apply to the bandwidth. Defaults to 1.
+        :return: A tuple containing the grid, the KDE model, the positions, x-axis values, y-axis values, and the
+            computed KDE values.
+    """
+
     _best_bandwidth = bw_adjustment_factor * validate_parameter(best_bandwidth, 0.2)
     _x_bandwidth = bw_adjustment_factor * 0.2
     log_nh2 = np.log10(training_data['avg_nh2'])
@@ -128,6 +142,19 @@ def plot_kde_ratio_nh2(grid: np.array,
                        training_data: pd.DataFrame,
                        suffix_outfile: str = None,
                        ratio_limits: Union[None, list] = None):
+    """
+        Plot the Kernel Density Estimate (KDE) of a ratio against the average H2 density along the line of sight,
+        and save the plot as a PNG file.
+
+        :param grid: The grid of x and y values used for the KDE.
+        :param values_on_grid: The computed KDE values on the grid.
+        :param ratio_string: The ratio string indicating which ratio of the training data to plot.
+        :param model_root_folder: The root folder where the model and figures are stored.
+        :param training_data: The DataFrame containing the training data.
+        :param suffix_outfile: Optional. The suffix to append to the output file name. Defaults to an empty string if None.
+        :param ratio_limits: Optional. The limits for the ratio axis. Defaults to None, which auto-scales the axis.
+        :return: None. Saves the plot as a PNG file in the specified folder.
+    """
     plt.rcParams.update({'font.size': 20})
     _suffix_outfile = validate_parameter(suffix_outfile, default='')
     plt.clf()
diff --git a/etl/stg/stg_radmc_input_generator.py b/etl/stg/stg_radmc_input_generator.py
index e73878057d1a4482bafef27f86e7cd79caf243ff..e425a92a3d4877e308b95cc91616cc05eab40e47 100644
--- a/etl/stg/stg_radmc_input_generator.py
+++ b/etl/stg/stg_radmc_input_generator.py
@@ -37,6 +37,17 @@ def write_radmc_input(filename: str,
                       path: Union[None, str] = None,
                       override_defaults: Union[None, dict] = None,
                       flatten_style: Union[None, str] = None):
+    """
+        Write RADMC-3D input files with the specified grid metadata and quantities.
+
+        :param filename: The name of the file to write.
+        :param quantity: The array of quantities to be written to the file.
+        :param grid_metadata: A dictionary containing metadata about the grid.
+        :param path: Optional. The directory path where the file will be saved. Defaults to the current directory if None.
+        :param override_defaults: Optional. A dictionary to override default header values. Defaults to None.
+        :param flatten_style: Optional. The style to flatten the array (Fortran 'F' or C 'C' order). Defaults to 'F'.
+        :return: None. Writes the formatted data to the specified file.
+    """
     rt_metadata_default = {
         'iformat': 1,
         'grid_type': grid_metadata['grid_type'],
@@ -65,6 +76,14 @@ def write_radmc_input(filename: str,
 def write_radmc_lines_input(line_config: dict,
                             logger: logging.Logger,
                             path: Union[None, str] = None):
+    """
+        Write the 'lines.inp' input file for RADMC-3D based on the provided line configuration.
+
+        :param line_config: A dictionary containing the line configuration, including mode, species, and collision partners.
+        :param logger: A logging.Logger instance for logging warnings and information.
+        :param path: Optional. The directory path where the file will be saved. Defaults to the current directory if None.
+        :return: None. Writes the 'lines.inp' file based on the configuration.
+    """
     _path = validate_parameter(path, default='.')
 
     if line_config['lines_mode'] != 'lte':
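The `flatten_style` parameter documented above distinguishes Fortran ('F') from C ('C') memory order when serializing a grid quantity one value per line. A small self-contained illustration of the difference (the array and header values here are invented; only the NumPy `order` semantics are taken as given):

```python
import numpy as np

# A tiny 2x3 stand-in for a gridded quantity (e.g. a density cube slice).
quantity = np.array([[1, 2, 3],
                     [4, 5, 6]])

# Fortran order ('F'): the first index varies fastest (column-major).
flat_f = quantity.flatten(order='F')  # [1, 4, 2, 5, 3, 6]
# C order ('C'): the last index varies fastest (row-major).
flat_c = quantity.flatten(order='C')  # [1, 2, 3, 4, 5, 6]

# Sketch of a header-plus-values text layout, one value per line,
# loosely in the spirit of RADMC-3D input files (format details invented).
lines = [str(1), str(quantity.size)] + [f'{v:.6e}' for v in flat_f]
content = '\n'.join(lines)
```

Choosing the wrong order silently scrambles the grid, so exposing it as an explicit parameter with a documented default is the safer design.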
diff --git a/etl/tests/test_commons.py b/etl/tests/test_commons.py
index 7901f98727e606d66190c209396a564b1c904ae0..e2baf927ea4da1c6304ea4d8e30d294ad43404e0 100644
--- a/etl/tests/test_commons.py
+++ b/etl/tests/test_commons.py
@@ -18,7 +18,8 @@ from assets.commons.grid_utils import (get_grid_edges,
                                        compute_los_average_weighted_profile)
 from assets.commons.parsing import (get_grid_properties,
                                     parse_grid_overrides)
-from assets.commons.training_utils import compute_and_add_similarity_cols
+from assets.commons.training_utils import (compute_and_add_similarity_cols,
+                                           split_data)
 
 
 def create_test_config(config_dict: dict,
@@ -429,3 +430,23 @@ class TestTraining(TestCase):
                 np.nan_to_num(expected_result.round(5), nan=0)
             )
         )
+
+    def test_split_data(self):
+        merged = pd.DataFrame(
+            data=[
+                [2.7e4, 1, 1, 2],
+                [2.187e6, 2, 2, 4],
+                [1e3, 3, 3, 6],
+                [1e4, 4, 4, 8],
+                [1e5, 5, 5, 10],
+                [1e6, 6, 6, 12],
+                [1e7, 7, 7, 14]
+            ],
+            columns=['nh2', 'tdust', 'predictor', 'target'])
+        x_test, x_train, x_validation, y_test, y_train, y_validation = split_data(
+            merged=merged,
+            target_column='target',
+            predictor_columns=['nh2', 'tdust', 'predictor']
+        )
+        self.assertListEqual(list((10**x_test['nh2'].unique()).round(1)), [2.7e4])
+        self.assertListEqual(list((10**x_validation['nh2'].unique()).round(1)), [2.187e6])
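The assertions above recover linear densities via `10**x_test['nh2']`, implying that `split_data` log-transforms the `nh2` column before splitting. A hypothetical re-implementation of that behavior (the function body, split fractions, and `random_state` are assumptions for illustration, not the repository's actual code):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data_sketch(merged: pd.DataFrame,
                      target_column: str,
                      predictor_columns: list,
                      random_state: int = 42):
    # Log-transform the density column, as suggested by the test's 10**x usage.
    features = merged[predictor_columns].copy()
    features['nh2'] = np.log10(features['nh2'])
    target = merged[target_column]
    # First split off a training set, then halve the remainder into
    # validation and test sets (fractions are invented).
    x_train, x_rest, y_train, y_rest = train_test_split(
        features, target, test_size=0.3, random_state=random_state)
    x_validation, x_test, y_validation, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=random_state)
    return x_test, x_train, x_validation, y_test, y_train, y_validation
```

The test then only needs fixed seeds (or a deterministic split) so that specific rows, such as the `2.7e4` density, land predictably in the test partition.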