Commit 3d06210e authored by Marco Molinaro

initial commit

Dockerfile
FROM registry.fedoraproject.org/fedora:latest
RUN dnf -y update && \
    dnf -y install httpd python3-mod_wsgi python3-pip && \
    dnf clean all && \
    pip install pandas tables Pyro4
COPY wsgi.conf /etc/httpd/conf.d/
EXPOSE 80
# Exec form: httpd runs directly instead of under /bin/sh -c
ENTRYPOINT ["/usr/sbin/httpd", "-DFOREGROUND"]
README.md 0 → 100644
# SEDModS -- VLKB SED Models web service
This service provides filtering on top of the SED Models catalogue table
(big: 30M rows). The table was originally served from an RDBMS and
queried directly through SQL; this work moves it to a Python + HDF5
solution.

The service is deployed as a Python WSGI application on httpd, inside a
podman container to ease maintenance.
## Deploy SEDModS
### Pull repo
When pulling this repository you will end up with:

    |-- Dockerfile          (container description)
    |-- README.md           (this text)
    |-- sed-data
    |   `-- link.to.hdf5    (dummy data file, see later)
    |-- wsgi.conf           (WSGI configuration for the HTTP server)
    `-- wsgi-scripts        (actual service business logic)
        |-- hdf_query.py
        |-- query_server_d.py
        |-- wsgid.py
        `-- wsgi.py
### Project content description
The `Dockerfile` contains the instructions to build a container that
includes:

- an httpd server (listening on the container's port 80)
- the Python and WSGI packages needed to run the service

It needs a properly configured `wsgi.conf`, which is copied into
`/etc/httpd/conf.d/` inside the container.
The base image is a _fedora:latest_ environment.
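The Python scripts that httpd runs follow the WSGI convention: mod_wsgi calls a module-level `application(environ, start_response)` callable for each request. The following is a minimal sketch, standard library only, of how such a callable can be exercised in-process without httpd or podman. `hello_app` and `call_wsgi` are hypothetical names used only for illustration; `hello_app` mimics the basic Hello World app referenced by `wsgi.conf`.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Minimal app in the style of the Hello World test app in wsgi.conf."""
    output = b"Hello World!" + environ["QUERY_STRING"].encode("utf-8")
    start_response("200 OK", [("Content-type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]


def call_wsgi(app, query_string=""):
    """Invoke a WSGI app without a server; return (status, headers, body)."""
    environ = {"QUERY_STRING": query_string}
    setup_testing_defaults(environ)  # adds the other keys WSGI requires
    captured = {}
    written = []  # data passed to the legacy write() callable, if any

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)
        return written.append

    chunks = list(app(environ, start_response))
    return captured["status"], captured["headers"], b"".join(written + chunks)


status, headers, body = call_wsgi(hello_app, "clump_mass<10.005")
print(status, headers["Content-Length"], body)
```

Inside the running container, where `/sed-data/vlkb_1.h5` is available, the same helper should be able to wrap the real `application` from `wsgi.py`.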
Besides the build instructions and the WSGI configuration, the container
needs two more pieces to run properly, both of them folders:

- `sed-data`, which contains the SED Models HDF5 file
- `wsgi-scripts`, which contains the business logic of the service itself

(The actual HDF5 file is not kept in this repository, because it is too
large and because storing binary blobs in a version control system is not
a wise choice.)
The `wsgi-scripts` folder is provided and maintained in this repository,
while `sed-data` can live anywhere reachable from the host system.
If the location of either folder changes, it is enough to update the
volume (_-v_) arguments of the container run command accordingly (see
below).
### Build the container image
To build the container image, run from the repository root (the build
copies `wsgi.conf` into the image):

    podman build --tag vlkb-sedmods -f ./Dockerfile

The tag is up to you. Running this as a dedicated user is suggested in
production.
### Run the container
Once the podman image is ready and the two folders are in place, a
command like the following is enough to run the container:

    podman run -dt \
        --name sedmod-test \
        --rm \
        -v $PWD/sed-data:/sed-data \
        -v $PWD/wsgi-scripts:/var/www/wsgi-scripts \
        -p 8080:80 vlkb-sedmods

Here the name is optional and up to you, while the left-hand side of the
_:_ in each _-v_ argument must point to the actual location of the
corresponding folder to be mounted in the container.
The _-v_ arguments in the example assume that the command is run from the
local working copy and that the HDF5 SED Models file is actually inside
the `sed-data` folder.
Also, _-p_ maps the container's port 80 onto port 8080 of the host; change
it if port 8080 is already in use on the host.
### Service(s) in the container
The repository content supports two flavours of the _SEDModS_ service:

- one that reads the HDF5 file at each query
- one that reads the HDF5 file once and runs as a _system daemon_
#### Single query service mode
This mode is available as soon as the container is running. It uses the
following code files:

- `wsgi.py`
- `hdf_query.py`
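`hdf_query.py` does not hand the raw query string to pandas as-is. The sketch below replicates its clean-up steps, standard library only (`normalise_query` is a hypothetical name), to show what actually reaches `DataFrame.query`. Note the order: literal spaces are removed before decoding, so a compound expression keeps its spaces only if the client sends them percent-encoded as `%20`.

```python
from urllib.parse import unquote


def normalise_query(parameters):
    """Same steps as query_out in hdf_query.py, minus the HDF5 read."""
    no_spaces = parameters.replace(" ", "")   # drop literal spaces
    no_quotes = no_spaces.replace("%27", "")  # drop encoded single quotes
    return unquote(no_quotes)                 # decode %3C -> <, %20 -> space


print(normalise_query("%27clump_mass%3C10.005%27"))
# → clump_mass<10.005
print(normalise_query("clump_mass%3C10.005%20and%20clump_mass%3E1"))
# → clump_mass<10.005 and clump_mass>1
```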
Running, on the host server,

    curl localhost:8080/sedsearch/?'clump_mass<10.005' > output.dat

should store the response in the `output.dat` file; alternatively, point
a browser to

    http://host.server:8080/sedsearch/?'clump_mass<10.005'

to see the response directly. The response is the filtered table
serialised as JSON (pandas `to_json(orient='split')`), sent as
`text/plain`.
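A minimal Python client sketch, standard library only: `build_query_url`, `split_json_to_records` and `fetch` are hypothetical helper names, and `http://host.server:8080` is the same placeholder host used above. It percent-encodes the filter expression (so no shell quoting is needed, and `hdf_query.py` decodes it again) and turns the `split`-oriented JSON into a list of row dictionaries.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(base, expression):
    """Percent-encode a pandas filter expression and append it to base."""
    return base + "?" + quote(expression, safe="")


def split_json_to_records(text):
    """Convert DataFrame.to_json(orient='split') text into a list of dicts."""
    payload = json.loads(text)
    return [dict(zip(payload["columns"], row)) for row in payload["data"]]


def fetch(base, expression):
    # Network call against a running container (host name is a placeholder).
    with urlopen(build_query_url(base, expression)) as response:
        return split_json_to_records(response.read().decode("utf-8"))


print(build_query_url("http://host.server:8080/sedsearch/", "clump_mass<10.005"))
# → http://host.server:8080/sedsearch/?clump_mass%3C10.005
```

Converting to plain dictionaries keeps the client free of a pandas dependency; a client that has pandas could instead parse the same text with `pandas.read_json(..., orient='split')`.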
#### Daemon service mode
This mode uses the code in:

- `wsgid.py`
- `query_server_d.py`

It requires two processes, the Pyro4 name server and the query server, to
be running before the daemonised service can work. These processes run
inside the container, so, once the container is up, attach to it with

    podman exec -it sedmod-test /bin/bash

and from within it run

    python3 -m Pyro4.naming &
    cd /sed-data
    python3 /var/www/wsgi-scripts/query_server_d.py &

After that, one can exit the shell, and the daemon-based service should be
reachable at

    http://host.server:8080/seddaemon

(provided the `/seddaemon` `WSGIScriptAlias` is enabled in `wsgi.conf`),
with the same usage as the single query one.
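In both modes an invalid filter expression ends up as an exception inside `application` (in daemon mode Pyro4 re-raises the remote exception in the caller), which mod_wsgi answers with a generic 500 error. A possible refinement, sketched below, maps such errors to `400 Bad Request`. `make_app` and `run_query` are hypothetical names: `run_query` stands for whatever produces the JSON text, e.g. `query_out(...).to_json(orient='split')`, and the set of exception classes treated as "bad query" is an assumption.

```python
# Assumed "bad query" errors: pandas raises SyntaxError for malformed input
# and an UndefinedVariableError (derived from NameError in recent pandas)
# for unknown columns; KeyError/ValueError/TypeError are included as a guess.
BAD_QUERY_ERRORS = (SyntaxError, NameError, KeyError, ValueError, TypeError)


def make_app(run_query):
    """Wrap run_query(query_string) -> str into a WSGI application."""
    def application(environ, start_response):
        try:
            text = run_query(environ.get("QUERY_STRING", ""))
            status = "200 OK"
        except BAD_QUERY_ERRORS as exc:
            text = "Bad query: {}".format(exc)
            status = "400 Bad Request"
        output = text.encode("utf-8")
        start_response(status, [("Content-type", "text/plain"),
                                ("Content-Length", str(len(output)))])
        return [output]
    return application


# Hypothetical stand-in for the real query: only accepts integers.
app = make_app(lambda q: str(int(q)))
```

Other exceptions (for example a missing HDF5 file) still propagate and produce a 500, which keeps server-side faults distinguishable from client mistakes.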
### Network Proxy
Since explicitly using port 8080 on the host can be inconvenient, the
service can be exposed on a specific context path of the host server's
httpd using the _ProxyPass_ directive (this needs `mod_proxy` and
`mod_proxy_http` enabled on the host), like

    <Location "/sedmods">
        ProxyPass "http://localhost:8080"
    </Location>

where _/sedmods_ is just an example and the _8080_ port depends on the
parameters passed to the podman run command (see above).
## SED Models HDF5 file
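It is currently kept on the INAF ICT ownCloud instance.
The single query code (`hdf_query.py`) reads it from `/sed-data/vlkb_1.h5`
inside the container, so the file in the mounted `sed-data` folder is
expected to be named `vlkb_1.h5`.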
## Dependencies
On the host:
- podman
- httpd
Within the container (i.e. provided in the build):
- httpd with python3-mod\_wsgi
- python 3.x
- pandas
- Pyro4 (daemon mode)
- (py)tables
sed-data/link.to.hdf5
This is a placeholder.
Put the actual HDF5 file here, or remember to run the container with
the appropriate path to the file.
wsgi-scripts/hdf_query.py
#!/usr/bin/env python3
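"""Single query mode: read the SED Models HDF5 table at every query."""
from urllib.parse import unquote

import pandas as pd

HDF5_PATH = '/sed-data/vlkb_1.h5'  # location of the HDF5 file in the container


def query_out(parameters):
    """Filter the table with the pandas expression in the raw QUERY_STRING."""
    # Drop literal spaces and percent-encoded single quotes (%27), then decode.
    parsequery = parameters.replace(' ', '')
    query1 = parsequery.replace('%27', '')
    query_final = unquote(query1)
    table = pd.read_hdf(HDF5_PATH)
    return table.query(query_final)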
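wsgi-scripts/myapp.wsgi
# test mod_wsgi app: replies "Ciao Mondo!" (Italian for "Hello World!")
# followed by the query string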
def application(environ, start_response):
    status = '200 OK'
    output = b'Ciao Mondo!'
    getstring = environ['QUERY_STRING']
    output += getstring.encode('utf-8')
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
wsgi-scripts/query_server_d.py
#!/usr/bin/env python3
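# -*- coding: utf-8 -*-
"""
Created on Fri Mar 4 15:06:40 2022
@author: smordini

Daemon mode: load the SED Models HDF5 table once and serve queries
through Pyro4, registered on the name server as "test.query".
"""
from urllib.parse import unquote

import pandas as pd
import Pyro4


@Pyro4.expose
class QueryMaker(object):
    # Read once, when the module is loaded, and shared by all queries.
    dataset = pd.read_hdf('/sed-data/vlkb_1.h5')

    def query_out(self, parameters):
        # Same clean-up as hdf_query.query_out: drop spaces and %27, decode.
        parsequery = parameters.replace(' ', '')
        query1 = parsequery.replace('%27', '')
        query_final = unquote(query1)
        myquery = QueryMaker.dataset.query(query_final)
        # Return str rather than bytes: Pyro4's default serpent serializer
        # would hand bytes to the client as a base64-encoded dict.
        return myquery.to_json(orient='split')


if __name__ == '__main__':
    daemon = Pyro4.Daemon()
    ns = Pyro4.locateNS()
    uri = daemon.register(QueryMaker)
    ns.register("test.query", uri)
    print("Ready. Object uri=", uri)
    daemon.requestLoop()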
wsgi-scripts/wsgi.py
#!/usr/bin/env python3
"""Single query mode: WSGI entry point, mapped to /sedsearch in wsgi.conf."""
import sys

# Make hdf_query importable from the mounted wsgi-scripts folder.
sys.path.insert(0, "/var/www/wsgi-scripts/")
from hdf_query import query_out  # noqa: E402


def application(environ, start_response):
    status = '200 OK'
    query_in = str(environ['QUERY_STRING'])
    result = query_out(query_in)
    # Serialise the filtered DataFrame as JSON ("split" orientation).
    output = result.to_json(orient='split').encode('utf-8')
    # A CSV download would instead use result.to_csv(), a 'text/csv'
    # Content-type and ('Content-Disposition', 'attachment; filename=export.csv').
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
wsgi-scripts/wsgid.py
#!/usr/bin/env python3
"""Daemon mode: WSGI entry point forwarding queries to the Pyro4 QueryMaker."""
import Pyro4
import serpent  # installed as a Pyro4 dependency


def application(environ, start_response):
    status = '200 OK'
    query_in = str(environ['QUERY_STRING'])
    # Resolve the object registered by query_server_d.py on the name server.
    with Pyro4.Proxy("PYRONAME:test.query") as query_maker:
        result = query_maker.query_out(query_in)
    if isinstance(result, str):
        output = result.encode('utf-8')
    else:
        # With Pyro4's default serpent serializer, bytes arrive as a dict;
        # serpent.tobytes() converts them back.
        output = serpent.tobytes(result)
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
wsgi.conf
<VirtualHost *:80>
    WSGIApplicationGroup %{GLOBAL}

    # Configuration for various python/WSGI tests
    # Basic Hello World app alias - at least this one must work
    WSGIScriptAlias /myapp /var/www/wsgi-scripts/myapp.wsgi

    # Plain SED models HDF5 query (single query mode)
    WSGIScriptAlias /sedsearch /var/www/wsgi-scripts/wsgi.py

    # Daemon based SED models HDF5 query: needs the Pyro4 name server and
    # query_server_d.py running in the container (see README.md)
    WSGIScriptAlias /seddaemon /var/www/wsgi-scripts/wsgid.py

    # Directory to deploy the python part into
    <Directory /var/www/wsgi-scripts>
        <IfVersion < 2.4>
            Order allow,deny
            Allow from all
        </IfVersion>
        <IfVersion >= 2.4>
            Require all granted
        </IfVersion>
    </Directory>
</VirtualHost>