Sundew pull migration to sarracenia (PXATX)
- Manual section:
1
- Date:
@Date@
- Version:
@Version@
- Manual group:
MetPx Sarracenia Suite
DESCRIPTION
This document suppose that the reader is familiar with the concepts and usage of sundew and sarracenia.
sundew receiver supports a pull mechanism that allow querying a remote server for new products to be ingested in the sundew product flow. There are two types namely pull-file and pull-bulletin. The difference between them is the way in which the product is processed/routed once downloaded.
sarracenia supports the same mechanism. sr-poll announces the new products from the remote site and sr-sarra makes the products available, downloading and annoncing the product locally.
It is fairly straight forward to convert a pull reveiver configuration file to both an sr-poll and sr-sarra that play the same role. There is a new concept in sarracenia where the source of the product needs to be specified in the path of the file tree.
Fortunately, in the context of this PXATX conversion, in our sundew system, the products are placed properly in a sarracenia tree and announced with amqp under a defined source directory. We will use these sources informations.
This document is solely based on the PXATX experience and so one should take the ideas and apply them to his/her context.
METHODOLOGY
I did this document using a very simple sundew pull receiver to make sure to put just the right amount of details.
First set up a conversion environment. Where sarracenia is downloaded, make sure you can use the bash script pull_2_pollsarra.sh by updating your PATH environment variable. Use git clone to get an updated version of sarracenia ( See Dev ). Make sarracenia tools available for direct shell commands:
export PATH=...wherever.../sarracenia/tools:$PATH
Define a place where you want to convert pxatx pull:
mkdir -p convert/pxatx
cd convert/pxatx
Get all the sundew configurations of sundew pxatx:
scp -r px@pxatx1-ops:/apps/px/etc .
Here we pick the configuration pull-BC-ENV_AQ-WAMR.conf, and proceed to its converstion:
cd etc/rx
pull_2_pollsarra.sh pull-BC-ENV_AQ-WAMR.conf
The original file looks like this
#
# STATUS: Operational
#
# DESCRIPTION: pull Wamr data from British Columbia Ministry of Environment (BC MoE)
#
# CONTACT: BC MoE contact: AQHIDSI@Victoria1.gov.bc.ca
#
#
type pull-file
routemask true
routing_version 1
routingTable /apps/px/etc/pdsRouting.conf
protocol ftp
host fake.host.gc.ca
user fakeuser
password fakepass
delete False
timeout_get 90
pull_sleep 180
extension pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII
directory /pub/outgoing/WAMR/EarthNetworks2/
get .*.lsi
# generate key with accept
accept .*(lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII).*
The script created these files:
ls credentials *BC-ENV_AQ-WAMR.conf
credentials
pull-BC-ENV_AQ-WAMR.conf
poll_pull-BC-ENV_AQ-WAMR.conf
sarra_get_pull-BC-ENV_AQ-WAMR.conf
My personal renaming convention is to rename the files
mv poll_pull-BC-ENV_AQ-WAMR.conf BC_ENV_AQ_WAMR.conf
mv sarra_get_pull-BC-ENV_AQ-WAMR.conf get_BC_ENV_AQ_WAMR.conf
So now we have the sr_poll BC_ENV_AQ_WAMR.conf and the sr_sarra get_BC_ENV_AQ_WAMR.conf.
SR_POLL CONFIG
The generated sr_poll config looks like this: cat BC_ENV_AQ_WAMR.conf:
#
# STATUS: Operational
#
# DESCRIPTION: pull Wamr data from British Columbia Ministry of Environment (BC MoE)
#
# CONTACT: BC MoE contact: AQHIDSI@Victoria1.gov.bc.ca
#
#
# on doit avoir le vip de ddsr.cmc.ec.gc.ca
vip 142.135.12.146
# post_broker is DDSR spread the poll messages
post_broker amqp://SOURCE@ddsr.cmc.ec.gc.ca/
post_exchange xs_SOURCE
# options
sleep 180
timeout 90
# to useless... left for backward compat
to DDSR.CMC,DDI.CMC,CMC,SCIENCE,EDM
# where to get the products
pollUrl ftp://fakeuser:fakepass@fakehost.gc.ca
#where/how to get the products
path /pub/outgoing/WAMR/EarthNetworks2/
# generate key with accept
accept .*(lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII).*
# usually no accept... in sr_poll
reject .*
The follows all the original option of the sundew pull as a reference. To continue we need to know what product is ingested by that pull:
ssh px@pxatx1-ops grep Ingested /apps/px/log/rx_pull-BC-ENV_AQ-WAMR.log
We find that one of the product “today” is 29_05_2019_04_25.lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII Lets try to find it on pxatx sarracenia side how it is announced:
ssh sarra@data-lb-ops1 'cd master/pxatx; srl grep 29_05_2019_04_25.lsi \*.log'
Just picking one of the notice leads us to this place
20190529/PROVINCIAL/BC-ENV_AQ-WAMR/12/29_05_2019_04_25.lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII
By convention the directory after the date is the name of the SOURCE for these products. So here PROVINCIAL is used as an amqp source user for announcement and as one of the top directory leaf for its products With theses informations we can finalized the sr_poll config
vi BC_ENV_AQ_WAMR.conf
change
post_broker amqp://SOURCE@ddsr.cmc.ec.gc.ca
post_exchange xs_SOURCE**
for
post_broker amqp://PROVINCIAL@ddsr.cmc.ec.gc.ca
post_exchange xs_PROVINCIAL
The destination put by the script always contain all the credentials. So we just edit to keep protocol://user#host:
change
pollUrl ftp://fakeuser:fakepass@fake.host.gc.ca
for
pollUrl ftp://fakeuser@fake.host.gc.ca
Starting at comment # where to get the products down to the end of the file, the script attempted to reproduce the directory, get and accept/reject options as in the original. And finally it placed all the options of the original file as reference. Make sure the sr_poll config is reflecting the original sundew one Get rid of duplicated options, scrutening the rest of the file. It is not our case here but if there are reject options in this config keep them. For accept option, you dont really need them since option get plays the same role:
remove
accept .*(lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:3:ASCII).*
After, change the get for accept. So a cleaned version of the last lines of the sr_poll config would be:
# where to get the products
pollUrl ftp://fakeuser@fake.host.gc.ca
# product source directories
path /pub/outgoing/WAMR/EarthNetworks2/
accept .*\.lsi
SR_SARRA CONFIG
The generated sr_sarra config looks like this: cat get_BC_ENV_AQ_WAMR.conf:
#
# STATUS: Operational
#
# DESCRIPTION: pull Wamr data from British Columbia Ministry of Environment (BC MoE)
#
# CONTACT: BC MoE contact: AQHIDSI@Victoria1.gov.bc.ca
#
#
# source
instances 1
# receives messages from same DDSR queue spreads the messages
broker amqp://feeder@ddsr.cmc.ec.gc.ca/
exchange xs_SOURCE
# listen to spread the poll messages
prefetch 10
queue_name q_feeder.${PROGRAM}.${CONFIG}.SHARED
sourceFromExchange True
# what to do with product
mirror False
preserve_time False
# MG CHECK DELETE
#delete False
delete False
# directories
directory ${PBD}/${YYYYMMDD}/${SOURCE}/--${0}-- to be determined ----
accept .*(something).*
# destination
post_broker amqp://feeder@localhost/
post_exchange xpublic
post_baseUrl http://${HOSTNAME}
post_baseDir /apps/sarra/public_data
Again we need to adjust to the SOURCE value which is PROVINCIAL:
vi get_BC_ENV_AQ_WAMR.conf
change
exchange xs_SOURCE
for
exchange xs_PROVINCIAL
A special attention must be given to the delete option. If the sundew pull configuration is deleting the products once downloaded, to test our sr_sarra process we must not delete products. By default, the script writes
# MG CHECK DELETE
#delete value
delete False
Where value is the setting of the delete option in the sundew pull. The sr_sarra configuration, when ready, can be tested without deletion. When placed in operation, and the sundew pull withdrawn, if the delete option should be true just delete the ‘delete False’ and uncomment the ‘delete True’.
To have the proper directory, accept settings (there might be more than one), we want to search how the products are disposed on the sarracenia side. Because it is sundew processes that mimic sarracenia we find theses informatios in the sundew senders:
% grep PROVINCIAL/BC-ENV_AQ-WAMR ../tx/*
% tx/ddsr-PROVINCIAL.inc:directory //apps/sarra/public_data/${RYYYY}${RMM}${RDD}/PROVINCIAL/BC-ENV_AQ-WAMR/${RHH}
And looking for the conplete configuration setting for these products in this include file we get:
directory //apps/sarra/public_data/${RYYYY}${RMM}${RDD}/PROVINCIAL/BC-ENV_AQ-WAMR/${RHH}
accept .*.lsi:pull-BC-ENV_AQ-WAMR:GOV-BC:WAMR:.*
The final changes in our sr_poll config is to reflect that finding:
change**
directory \${PBD}/\${YYYYMMDD}/\${SOURCE}/--\${0}-- to be determined ----
accept .*(something).*
for
directory ${PBD}/${YYYYMMDD}/${SOURCE}/BC-ENV_AQ-WAMR/${HH}
accept .*\.lsi.*
And we are all set for testing.
TESTING
We install sr_poll BC_ENV_AQ_WAM.conf and sr_sarra get_BC_ENV_AQ_WAM.conf on DDSR_DEV. (on ddsr_dev, there are various things to modify. Setting xattr_disable true, changing ddsr.cmc for ddsr_dev.cmc, in broker… document_root option in senders and perhaps more)
Leave the processes running and check the right disposal/announcement of the products.
MIGRATING FILTERS
Will do another paper for sundew filters that become sr_sarra.
MIGRATING SENDER
Will do another paper on how to migrate senders.
SEE ALSO
sr_poll(1) - post announcemensts of specific files.
sr_sarra(8) - Subscribe, Acquire, and ReAdvertise tool.
https://github.com/MetPX/ - sr_subscribe is a component of MetPX-Sarracenia, the AMQP based data pump.