Command Line Guide

SR3 - Everything

sr3 is a command line tool to manage Sarracenia configurations, individually or in groups. For the current user, it reads on all of the configuration files, state files, and consults the process table to determine the state of all components. It then makes the change requested.

sr3 options action [ component/configuration … ]

sr3 components are used to publish to and download files from websites or file servers that provide sr_post(7) protocol notifications. Such sites publish notification messages for each file as soon as it is available. Clients connect to a broker (often the same as the server itself) and subscribe to the notifications. The sr_post notifications provide true push notices for web-accessible folders (WAF), and are far more efficient than either periodic polling of directories, or ATOM/RSS style notifications. Sr_subscribe can be configured to post notification messages after they are downloaded, to make them available to consumers for further processing or transfers.

sr3 can also be used for purposes other than downloading, (such as for supplying to an external program) specifying the -n (equal to: download off) will suppress the download behaviour and only post the URL on standard output. The standard output can be piped to other processes in classic UNIX text filter style.

The components of sarracenia are groups of defaults on the main algorithm, to reduce the size of individual components. The components are:

  • cpump - copy notification messages from one pump another second one (a C implementation of shovel.)

  • flow - generic flow with no default behaviours. Good basis for building user defined components.

  • poll - poll a non-sarracenia web or file server to create notification messages for processing.

  • post & watch - create notification messages for files for processing.

  • sarra _ - download file from a remote server to the local one, and re-post them for others.

  • sender - send files from a local server to a remote one.

  • shovel - copy notification messages, only, not files.

  • watch - create notification messages for each new file that arrives in a directory, or at a set path.

  • winnow - copy notification messages, suppressing duplicates.

All of these components accept the same options, with the same effects. There is also sr3_cpump(1) which is a C version that implements a subset of the options here, but where they are implemented, they have the same effect.

The sr3 command usually takes two arguments: an action followed by a list of configuration files. When any component is invoked, an operation and a configuration file are specified. If the configuration is omitted, it means to apply the action to all configurations. The action is one of:

  • foreground: run a single instance in the foreground logging to stderr

  • restart: stop and then start the configuration.

  • sanity: looks for instances which have crashed or gotten stuck and restarts them.

  • start: start the configuration running

  • status: check if the configuration is running.

  • stop: stop the configuration from running

The remaining actions manage the resources (exchanges, queues) used by the component on the broker, or manage the configurations.

  • cleanup: deletes the component’s resources on the server.

  • declare: creates the component’s resources on the server.

  • add: copy to the list of available configurations.

  • list: list all the configurations available.

  • list plugins: list all the plugins available.

  • list examples: list all the examples available.

  • show view an interpreted version of a configuration file.

  • edit: modify an existing configuration.

  • remove: remove a configuration.

  • disable: mark a configuration as ineligible to run.

  • enable: mark a configuration as eligible to run.

  • convert: converts v2 configs to a v3 config.

For example: sr3 foreground subscribe/dd runs the subscribe component with the dd configuration as a single foreground instance.

The foreground action is used when building a configuration or for debugging. The foreground instance will run regardless of other instances which are currently running. Should instances be running, it shares the same notification message queue with them. A user stop the foreground instance by simply using <ctrl-c> on linux or use other means to kill the process.

After a configuration has been refined, start to launch the component as a background service (daemon or fleet of daemons whose number is controlled by the instances option.) If multiple configurations and components need to be run together, the entire fleet can be similarly controlled using the sr3(1) command.

To have components run all the time, on Linux one can use systemd integration, as described in the Admin Guide On Windows, one can configure a service, as described in the Windows user manual

The actions cleanup, declare, can be used to manage resources on the rabbitmq server. The resources are either queues or exchanges. declare creates the resources.

The add, remove, list, edit, enable & disable actions are used to manage the list of configurations. One can see all of the configurations available using the list action. to view available plugins use list plugins. Using the edit option, one can work on a particular configuration. A disabled configuration will not be started or restarted by the start, foreground, or restart actions. It can be used to set aside a configuration temporarily.

The convert action is used to translate configuration files written with Sarracenia version 2 options into Sarracenia version 3 options. The v2 configuration file must be placed in the ~/.config/sarra/component/v2_config.conf directory and the translated version will be placed in the ~/.config/sr3/component/v3_config.conf directory. For example, one would invoke this action with sr3 convert component/config.

ACTIONS

declare

Call the corresponding function for each configuration:

$ sr3 declare
  declare: 2020-09-06 23:22:18,043 [INFO] root declare looking at cpost/pelle_dd1_f04
  2020-09-06 23:22:18,048 [INFO] sarra.moth.amqp __putSetup exchange declared: xcvan00 (as: amqp://tfeed@localhost/)
  2020-09-06 23:22:18,049 [INFO] sarra.moth.amqp __putSetup exchange declared: xcvan01 (as: amqp://tfeed@localhost/)
  2020-09-06 23:22:18,049 [INFO] root declare looking at cpost/veille_f34
  2020-09-06 23:22:18,053 [INFO] sarra.moth.amqp __putSetup exchange declared: xcpublic (as: amqp://tfeed@localhost/)
  2020-09-06 23:22:18,053 [INFO] root declare looking at cpost/pelle_dd2_f05
  ...
  2020-09-06 23:22:18,106 [INFO] root declare looking at cpost/pelle_dd2_f05
  2020-09-06 23:22:18,106 [INFO] root declare looking at cpump/xvan_f14
  2020-09-06 23:22:18,110 [INFO] sarra.moth.amqp __getSetup queue declared q_tfeed.sr_cpump.xvan_f14.23011811.49631644 (as: amqp://tfeed@localhost/)
  2020-09-06 23:22:18,110 [INFO] sarra.moth.amqp __getSetup um..: pfx: v03, exchange: xcvan00, values: #
  2020-09-06 23:22:18,110 [INFO] sarra.moth.amqp __getSetup binding q_tfeed.sr_cpump.xvan_f14.23011811.49631644 with v03.# to xcvan00 (as: amqp://tfeed@localhost/)
  2020-09-06 23:22:18,111 [INFO] root declare looking at cpump/xvan_f15
  2020-09-06 23:22:18,115 [INFO] sarra.moth.amqp __getSetup queue declared q_tfeed.sr_cpump.xvan_f15.50074940.98161482 (as: amqp://tfeed@localhost/)

Declares the queues and exchanges related to each configuration. One can also invoke it with --users, so that it will declare users as well as exchanges and queues:

$ sr3 --users declare
  ...
  2020-09-06 23:28:56,211 [INFO] sarra.rabbitmq_admin add_user permission user 'ender' role source  configure='^q_ender.*|^xs_ender.*' write='^q_ender.*|^xs_ender.*' read='^q_ender.*|^x[lrs]_ender.*|^x.*public$'
  ...

Providing a flow/flows will declare only the users that are specified in the flow(s):

$ sr3 --users declare subscribe/dd_amis
  ...
  declare: 2024-05-17 20:02:18,548 434920 [INFO] sarracenia.rabbitmq_admin add_user permission user 'tfeed@localhost' role feeder  configure=.* write=.* read=.*
  ...

dump

print the three data structure used by sr. There are three lists:

  • processes thought to be related to sr.

  • configurations present

  • contents of the state files.

dump is used for debugging or to get more detail than provided by status:

Running Processes
     4238: name:sr_poll.py cmdline:['/usr/bin/python3', '/home/peter/src/sarracenia/sarra/sr_poll.py', '--no', '1', 'start', 'pulse']
     .
     .
     .
Configs
   cpost
       veille_f34 : {'status': 'running', 'instances': 1}

States
   cpost
       veille_f34 : {'instance_pids': {1: 4251}, 'queue_name': None, 'instances_expected': 0, 'has_state': False, 'missing_instances': []}

Missing

It is quite long, and so a bit too much information to look at in a raw state. Usually used in conjunction with linux filters, such as grep. for example:

$ sr3 dump  | grep stopped
  WMO_mesh_post : {'status': 'stopped', 'instances': 0}
  shim_f63 : {'status': 'stopped', 'instances': 0}
  test2_f61 : {'status': 'stopped', 'instances': 0}

$ sr3 dump  | grep disabled
  amqp_f30.conf : {'status': 'disabled', 'instances': 5}

provides easy method to determine which configurations are in a particular state. Another example, if sr3 status reports sender/tsource2send_f50 as being partial, then one can use dump to get more detail:

$ sr3 dump | grep sender/tsource2send_f50
  49308: name:sr3_sender.py cmdline:['/usr/bin/python3', '/usr/lib/python3/dist-packages/sarracenia/instance.py', '--no', '1', 'start', 'sender/tsource2send_f50']
  q_tsource.sr_sender.tsource2send_f50.58710892.12372870: ['sender/tsource2send_f50']

foreground

run a single instance of a single configuration as an interactive process logging to the current stderr/terminal output. for debugging.

list

shows the user the configuration files present:

$ sr3 list
  User Configurations: (from: /home/peter/.config/sarra )
  cpost/pelle_dd1_f04.conf         cpost/pelle_dd2_f05.conf         cpost/veille_f34.conf
  cpump/xvan_f14.conf              cpump/xvan_f15.conf              poll/f62.conf
  post/shim_f63.conf               post/t_dd1_f00.conf              post/t_dd2_f00.conf
  post/test2_f61.conf              sarra/download_f20.conf          sender/tsource2send_f50.conf
  shovel/rabbitmqtt_f22.conf       subscribe/amqp_f30.conf          subscribe/cclean_f91.conf
  subscribe/cdnld_f21.conf         subscribe/cfile_f44.conf         subscribe/cp_f61.conf
  subscribe/ftp_f70.conf           subscribe/q_f71.conf             subscribe/rabbitmqtt_f31.conf
  subscribe/u_sftp_f60.conf        watch/f40.conf                   admin.conf
  credentials.conf                 default.conf
  logs are in: /home/peter/.cache/sarra/log

The last line says which directory the log files are in.

Also list examples shows included configuration templates available as starting points with the add action:

$ sr3 list examples
  Sample Configurations: (from: /home/peter/Sarracenia/development/sarra/examples )
  cpump/cno_trouble_f00.inc        poll/aws-nexrad.conf             poll/pollingest.conf
  poll/pollnoaa.conf               poll/pollsoapshc.conf            poll/pollusgs.conf
  poll/pulse.conf                  post/WMO_mesh_post.conf          sarra/wmo_mesh.conf
  sender/ec2collab.conf            sender/pitcher_push.conf         shovel/no_trouble_f00.inc
  subscribe/WMO_Sketch_2mqtt.conf  subscribe/WMO_Sketch_2v3.conf    subscribe/WMO_mesh_CMC.conf
  subscribe/WMO_mesh_Peer.conf     subscribe/aws-nexrad.conf        subscribe/dd_2mqtt.conf
  subscribe/dd_all.conf            subscribe/dd_amis.conf           subscribe/dd_aqhi.conf
  subscribe/dd_cacn_bulletins.conf subscribe/dd_citypage.conf       subscribe/dd_cmml.conf
  subscribe/dd_gdps.conf           subscribe/dd_ping.conf           subscribe/dd_radar.conf
  subscribe/dd_rdps.conf           subscribe/dd_swob.conf           subscribe/ddc_cap-xml.conf
  subscribe/ddc_normal.conf        subscribe/downloademail.conf     subscribe/ec_ninjo-a.conf
  subscribe/hpfx_amis.conf         subscribe/local_sub.conf         subscribe/pitcher_pull.conf
  subscribe/sci2ec.conf            subscribe/subnoaa.conf           subscribe/subsoapshc.conf
  subscribe/subusgs.conf           watch/master.conf                watch/pitcher_client.conf
  watch/pitcher_server.conf        watch/sci2ec.conf


$ sr3 add dd_all.conf
  add: 2021-01-24 18:04:57,018 [INFO] sarracenia.sr add copying: /usr/lib/python3/dist-packages/sarracenia/examples/subscribe/dd_all.conf to /home/peter/.config/sr3/subscribe/dd_all.conf
$ sr3 edit dd_all.conf

The add, remove, list, edit, enable & disable actions are used to manage the list of configurations. One can see all of the configurations available using the list action. to view available plugins use list plugins. Using the edit option, one can work on a particular configuration. A disabled sets a configuration aside (by adding .off to the name) so that it will not be started or restarted by the start, foreground, or restart actions.

show

View all configuration settings (the result of all parsing… what the flow components actually see):

% sr3 show subscribe/q_f71
2022-03-20 15:30:32,507 1084652 [INFO] sarracenia.config parse_file download_f20.conf:35 obsolete v2:"on_message msg_log" converted to sr3:"logEvents after_accept"
2022-03-20 15:30:32,508 1084652 [INFO] sarracenia.config parse_file tsource2send_f50.conf:26 obsolete v2:"on_message msg_rawlog" converted to sr3:"logEvents after_accept"
2022-03-20 15:30:32,508 1084652 [INFO] sarracenia.config parse_file rabbitmqtt_f22.conf:6 obsolete v2:"on_message msg_log" converted to sr3:"logEvents after_accept"

Config of subscribe/q_f71:
{'_Config__admin': 'amqp://bunnymaster@localhost/ None True True False False None None',
 '_Config__broker': 'amqp://tsource@localhost/ None True True False False None None',
 '_Config__post_broker': None,
 'accelThreshold': 0,
 'acceptSizeWrong': False,
 'acceptUnmatched': False,
 'admin': 'amqp://bunnymaster@localhost/ None True True False False None None',
 'attempts': 3,
 'auto_delete': False,
 'baseDir': None,
 'baseUrl_relPath': False,
 'batch': 1,
 'bindings': [('xs_tsource_poll', ['v03', 'post'], ['#'])],
 'broker': 'amqp://tsource@localhost/ None True True False False None None',
 'bufsize': 1048576,
 'byteRateMax': None,
 'cfg_run_dir': '/home/peter/.cache/sr3/subscribe/q_f71',
 'component': 'subscribe',
 'config': 'q_f71',
 'currentDir': None,
 'debug': False,
 'declared_exchanges': [],
 'declared_users': {'anonymous': 'subscriber', 'eggmeister': 'subscriber', 'ender': 'source', 'tfeed': 'feeder', 'tsource': 'source', 'tsub': 'subscriber'},
 'delete': False,
 'destfn_script': None,
 'directory': '//home/peter/sarra_devdocroot/recd_by_srpoll_test1',
 'discard': False,
 'documentRoot': None,
 'download': True,
 'durable': True,
 'env_declared': ['FLOWBROKER', 'MQP', 'SFTPUSER', 'TESTDOCROOT'],
 'exchange': 'xs_tsource_poll',
 'exchangeDeclare': True,
 'exchangeSuffix': 'poll',
 'expire': 1800.0,
 'feeder': ParseResult(scheme='amqp', netloc='tfeed@localhost', path='/', params='', query='', fragment=''),
 'fileEvents': {'create', 'link', 'modify', 'delete', 'mkdir', 'rmdir' },
 'file_total_interval': '0',
 'filename': None,
 'fixed_headers': {},
 'flatten': '/',
 'hostdir': 'fractal',
 'hostname': 'fractal',
 'housekeeping': 300,
 'imports': [],
 'inflight': None,
 'inline': False,
 'inlineByteMax': 4096,
 'inlineEncoding': 'guess',
 'inlineOnly': False,
 'instances': 1,
 'identity_arbitrary_value': None,
 'identity_method': 'sha512',
 'logEvents': {'after_work', 'after_accept', 'on_housekeeping'},
 'logFormat': '%(asctime)s [%(levelname)s] %(name)s %(funcName)s %(message)s',
 'logLevel': 'info',
 'logReject': False,
 'logRotateCount': 5,
 'logRotateInterval': 86400.0,
 'logStdout': True,
 'log_flowcb_needed': False,
 'masks': ['accept .* into //home/peter/sarra_devdocroot/recd_by_srpoll_test1 with mirror:True strip:.*sent_by_tsource2send/'],
 'messageAgeMax': 0,
 'messageCountMax': 0,
 'messageDebugDump': False,
 'messageRateMax': 0,
 'messageRateMin': 0,
 'message_strategy': {'failure_duration': '5m', 'reset': True, 'stubborn': True},
 'message_ttl': 0,
 'mirror': True,
 'msg_total_interval': '0',
 'fileAgeMax': 0,
 'nodupe_ttl': 0,
 'overwrite': True,
 'permCopy': True,
 'permDefault': 0,
 'permDirDefault': 509,
 'permLog': 384,
 'plugins_early': [],
 'plugins_late': ['sarracenia.flowcb.log.Log'],
 'post_baseDir': None,
 'post_baseUrl': None,
 'post_broker': None,
 'post_documentRoot': None,
 'post_exchanges': [],
 'post_topicPrefix': ['v03', 'post'],
 'prefetch': 25,
 'pstrip': '.*sent_by_tsource2send/',
 'queueBind': True,
 'queueDeclare': True,
 'queueName': 'q_tsource_subscribe.q_f71.76359618.62916076',
 'queue_filename': '/home/peter/.cache/sr3/subscribe/q_f71/subscribe.q_f71.tsource.qname',
 'randid': 'cedf',
 'randomize': False,
 'realpathPost': False,
 'rename': None,
 'report': False,
 'reset': False,
 'resolved_qname': 'q_tsource_subscribe.q_f71.76359618.62916076',
 'retry_ttl': 1800.0,
 'settings': {},
 'sleep': 0.1,
 'statehost': False,
 'strip': 0,
 'subtopic': [],
 'timeCopy': True,
 'timeout': 300,
 'timezone': 'UTC',
 'tls_rigour': 'normal',
 'topicPrefix': ['v03', 'post'],
 'undeclared': ['msg_total_interval', 'file_total_interval'],
 'users': False,
 'v2plugin_options': [],
 'v2plugins': {'plugin': ['msg_total_save', 'file_total_save']},
 'vhost': '/',
 'vip': []}

%

convert

Converting a config: both formats are accepted, as well as include files:

$ sr3 convert poll/sftp_f62
  2022-06-14 15:00:00,762 1093345 [INFO] root convert converting poll/sftp_f62 from v2 to v3

$ sr3 convert poll/sftp_f62.conf
  2022-06-14 15:01:11,766 1093467 [INFO] root convert converting poll/sftp_f62.conf from v2 to v3

$ sr3 convert shovel/no_trouble_f00.inc
  2022-06-14 15:03:29,918 1093655 [INFO] root convert converting shovel/no_trouble_f00.inc from v2 to v3

To overwrite an existing sr3 configuration use the –wololo option. When overwriting multiple sr3 configurations, one must also use –dangerWillRobinson=n in the normal way… where n is the number of configurations to convert.

start

launch all configured components:

$ sr3 start
  gathering global state: procs, configs, state files, logs, analysis - Done.
  starting...Done

stop

stop all processes:

$ sr3 stop
  gathering global state: procs, configs, state files, logs, analysis - Done.
  stopping........Done
  Waiting 1 sec. to check if 93 processes stopped (try: 0)
  All stopped after try 0

status

Sample OK status (sr is running):

SSC-5CD2310S60% sr3 status

status:
Component/Config                         Processes   Connection        Lag                              Rates
                                         State   Run Retry  msg data   Queued  LagMax LagAvg  Last  %rej     pubsub messages   RxData     TxData
                                         -----   --- -----  --- ----   ------  ------ ------  ----  ----     ------ --------   ------     ------
cpost/veille_f34                         idle    1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
cpump/pelle_dd1_f04                      stop    0/0     0   0%   0%      0    0.00s    0.00s 0      0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
cpump/pelle_dd2_f05                      stop    0/0     0   0%   0%      0    0.00s    0.00s 0      0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
cpump/xvan_f14                           hung    1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
cpump/xvan_f15                           hung    1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
poll/f62                                 idle    1/1     0 100%   0%      0    0.00s    0.00s 221.3s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
post/shim_f63                            stop    0/0     0   0%   0%      0    0.00s    0.00s 0      0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
post/test2_f61                           stop    0/0     0 100%   0%      0    0.00s    0.00s 224.6s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
sarra/download_f20                       idle    3/3     0 100%   0%      0    0.00s    0.00s 324.0s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
sender/tsource2send_f50                  idle  10/10     0 100%   0%      0    0.00s    0.00s 140.9s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
shovel/pclean_f90                        idle    3/3   231 100%   0%      0    0.00s    0.00s 140.8s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
shovel/pclean_f92                        idle    3/3     0 100%   0%      0    0.00s    0.00s 142.1s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
shovel/rabbitmqtt_f22                    hung    3/3     0 100%   0%      0    0.00s    0.00s 139.9s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
shovel/t_dd1_f00                         stop    0/0     0   0%   0%      0    0.00s    0.00s 0      0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
shovel/t_dd2_f00                         stop    0/0     0   0%   0%      0    0.00s    0.00s 0      0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/amqp_f30                       idle    3/3     0 100%   0%      0    0.00s    0.00s 323.9s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/cclean_f91                     idle    3/3   247 100%   0%      1    0.00s    0.00s 217.1s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/cdnld_f21                      idle    3/3     0 100%   0%     17    0.00s    0.00s 313.2s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/cfile_f44                      idle    3/3     0 100%   0%      1    0.00s    0.00s 217.1s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/cp_f61                         idle    3/3     0 100%   0%      0    0.00s    0.00s 139.0s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/ftp_f70                        part    1/3     8 100%   3%      0  586.95s  438.50s 224.4s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/q_f71                          lag     3/3    15 100%  39%      0  573.83s  457.23s 221.0s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/rabbitmqtt_f31                 idle    3/3     0 100%   0%      0    0.00s    0.00s 138.1s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
subscribe/u_sftp_f60                     idle    3/3     0 100%   0%      0    0.00s    0.00s 140.8s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
watch/f40                                idle    1/1     0 100%   0%      0    0.00s    0.00s 141.3s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
winnow/t00_f10                           idle    1/1     0 100%   0%     42    0.00s    0.00s 324.3s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
winnow/t01_f10                           idle    1/1     0 100%   0%      0    0.00s    0.00s 325.4s  0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s
      Total Running Configs:  21 ( Processes: 54 missing: 2 stray: 0 )
                     Memory: uss:2.2 GiB rss:3.0 GiB vms:6.6 GiB
                   CPU Time: User:562.61s System:47.39s
           Pub/Sub Received: 0 msgs/s (0 Bytes/s), Sent:  0 msgs/s (0 Bytes/s) Queued: 97 Retry: 1019, Mean lag: 445.07s
              Data Received: 0 Files/s (0 Bytes/s), Sent: 0 Files/s (0 Bytes/s)

Full sample:

fractal% sr3 --full status
status:
Component/Config                         Processes   Connection        Lag                              Rates                                        Counters (per housekeeping)                                                    Data Counters                                           Memory                             CPU Time
                                         State   Run Retry  msg data   Queued  LagMax LagAvg  Last  %rej     pubsub messages   RxData     TxData       subBytes   Accepted   Rejected  Malformed   pubBytes    pubMsgs     pubMal     rxData    rxFiles     txData    txFiles    Since       uss        rss        vms       user     system
                                         -----   --- -----  --- ----   ------  ------ ------  ----  ----     ------ --------   ------     ------        -------   --------   --------  ---------    -------     ------      -----      -----    -------     ------  -------      -----        ---        ---        ---       ----     ------
cpost/veille_f34                         run     1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes     0 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  121.40s    2.4 MiB    5.9 MiB   15.2 MiB        0.03       0.08
cpump/pelle_dd1_f04                      run     1/1     0 100%   0%      0    0.00s    0.00s n/a   83.5%  0 Bytes/s  11 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes    1.4 Kim    1.1 Kim     0 msgs    0 Bytes   230 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  121.40s    3.9 MiB    6.8 MiB   17.2 MiB        0.12       0.11
cpump/pelle_dd2_f05                      run     1/1     0 100%   0%      0    0.00s    0.00s n/a   83.5%  0 Bytes/s  11 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes    1.4 Kim    1.1 Kim     0 msgs    0 Bytes   230 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  121.40s    3.9 MiB    7.0 MiB   17.2 MiB        0.17       0.06
cpump/xvan_f14                           run     1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes     0 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   12.90s    3.0 MiB    4.7 MiB   16.1 MiB        0.01       0.01
cpump/xvan_f15                           run     1/1     0 100%   0%      0    0.00s    0.00s n/a    0.0%  0 Bytes/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes      1 msg     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files    8.63s    3.0 MiB    4.6 MiB   16.1 MiB        0.01       0.02
poll/f62                                 run     1/1     0 100%   0%      0    0.16s    0.05s  0.8s  0.0%  1.3 KiB/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes     0 msgs    4.4 Kim     0 msgs  151.1 KiB   420 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  118.56s   45.1 MiB   59.0 MiB  229.3 MiB        2.62       0.32
post/shim_f63                            stop    0/0          -          -         -     -     -          -        -        -       -          -          -          -          -          -          -          -          -          -          -    0 Bytes    0 Bytes    0 Bytes        0.00       0.00
post/test2_f61                           stop    0/0     0 100%   0%      0    0.03s    0.02s  0.5s  0.0% 11.5 KiB/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes     0 msgs     0 msgs     0 msgs    6.7 KiB    14 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files    0.58s    0 Bytes    0 Bytes    0 Bytes        0.00       0.00
sarra/download_f20                       run     3/3     0 100%   0%      0   10.17s    2.87s  1.1s  0.0%  1.1 KiB/s   0 msgs/s  1.1 KiB/s  0 Bytes/s   27.1 KiB    47 msgs     0 msgs     0 msgs   36.0 KiB    47 msgs     0 msgs   65.6 KiB   47 Files    0 Bytes    0 Files   57.57s  132.5 MiB  192.4 MiB  280.1 MiB        2.72       0.40
sender/tsource2send_f50                  run   10/10     0 100%   9%      0    1.37s    0.52s  1.3s  0.0%  5.5 KiB/s   3 msgs/s  0 Bytes/s  1.3 KiB/s  326.0 KiB   421 msgs     0 msgs     0 msgs  326.6 KiB   421 msgs     0 msgs    0 Bytes    0 Files  152.7 KiB  421 Files  118.97s  425.6 MiB  562.8 MiB    2.2 GiB        7.80       0.93
shovel/pclean_f90                        run     3/3   310 100%   0%      0   82.03s   75.72s  0.7s  0.0%  5.6 KiB/s   3 msgs/s  0 Bytes/s  0 Bytes/s  111.4 KiB   120 msgs     0 msgs     0 msgs   99.9 KiB   111 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   38.02s  127.4 MiB  169.0 MiB  249.6 MiB        2.37       0.27
shovel/pclean_f92                        run     3/3     0 100%   0%      0   82.49s   76.06s 19.1s  0.0%  1.7 KiB/s   0 msgs/s  0 Bytes/s  0 Bytes/s   99.9 KiB   111 msgs     0 msgs     0 msgs  103.0 KiB   111 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  118.28s  126.4 MiB  168.5 MiB  249.2 MiB        2.04       0.21
shovel/rabbitmqtt_f22                    run     3/3     0 100%   0%      0    1.25s    0.54s  1.3s  0.0%  5.5 KiB/s   3 msgs/s  0 Bytes/s  0 Bytes/s  326.0 KiB   421 msgs     0 msgs     0 msgs  326.0 KiB   421 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  118.77s  126.0 MiB  167.6 MiB  248.7 MiB        2.12       0.20
shovel/t_dd1_f00                         run     3/3     0 100%   0%      3   23.15s    3.06s  0.1s 82.3%  3.4 KiB/s  14 msgs/s  0 Bytes/s  0 Bytes/s  231.6 KiB    1.6 Kim    1.3 Kim     0 msgs  168.4 KiB   293 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  118.19s  127.3 MiB  171.0 MiB  250.1 MiB        2.77       0.32
shovel/t_dd2_f00                         run     3/3     0 100%   0%      0   11.82s    2.61s  0.2s 82.2%  3.3 KiB/s  13 msgs/s  0 Bytes/s  0 Bytes/s  225.7 KiB    1.6 Kim    1.3 Kim     0 msgs  166.6 KiB   290 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files  118.33s  127.3 MiB  170.6 MiB  250.1 MiB        2.74       0.39
subscribe/amqp_f30                       run     3/3     0 100%  39%      0   18.79s    6.44s  0.7s  0.0%  1.5 KiB/s   2 msgs/s  1.2 KiB/s  0 Bytes/s  181.0 KiB   237 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs  146.8 KiB  237 Files    0 Bytes    0 Files  117.95s  126.2 MiB  168.0 MiB  248.9 MiB        2.20       0.27
subscribe/cclean_f91                     run     3/3   213 100%   0%      0    0.00s    0.00s  0.5s 17.4%  2.7 KiB/s   6 msgs/s  0 Bytes/s  0 Bytes/s   35.5 KiB    92 msgs    16 msgs     0 msgs    0 Bytes     0 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   13.37s  126.1 MiB  167.9 MiB  248.8 MiB        2.53       0.36
subscribe/cdnld_f21                      run     3/3     0 100%  41%      0   10.43s    3.13s  0.1s  0.0%  1.4 KiB/s   2 msgs/s  1.4 KiB/s  0 Bytes/s  167.1 KiB   262 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs  162.7 KiB  262 Files    0 Bytes    0 Files  117.90s  131.6 MiB  190.7 MiB  277.6 MiB        3.39       0.42
subscribe/cfile_f44                      run     3/3     0 100%  40%      0    3.32s    0.30s  0.0s  0.0%  1.5 KiB/s   4 msgs/s  1.4 KiB/s  0 Bytes/s  178.3 KiB   509 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs  161.8 KiB  509 Files    0 Bytes    0 Files  118.09s  130.9 MiB  188.5 MiB  278.0 MiB        2.62       0.30
subscribe/cp_f61                         run     3/3     0 100%  12%      0    6.42s    2.09s  0.1s  0.0%  2.8 KiB/s   3 msgs/s 597 Bytes/s  0 Bytes/s  326.6 KiB   421 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs   69.0 KiB  123 Files    0 Bytes    0 Files  118.23s  125.9 MiB  166.8 MiB  248.9 MiB        2.48       0.31
subscribe/ftp_f70                        run     3/3     0 100%  39%      0    1.75s    0.77s  0.1s  0.0%  1.3 KiB/s   2 msgs/s  1.4 KiB/s  0 Bytes/s  158.4 KiB   340 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs  159.8 KiB  340 Files    0 Bytes    0 Files  118.04s  126.1 MiB  167.4 MiB  248.8 MiB        2.22       0.36
subscribe/q_f71                          run     3/3     0 100%  31%      0    3.12s    1.18s  5.7s  0.0%  1.2 KiB/s   3 msgs/s  1.1 KiB/s  0 Bytes/s  142.7 KiB   396 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs  124.6 KiB  396 Files    0 Bytes    0 Files  118.49s  135.0 MiB  191.9 MiB  928.8 MiB        4.30       0.68
subscribe/rabbitmqtt_f31                 run     3/3     0 100%   8%      0    4.27s    1.10s  1.1s  0.0%  2.8 KiB/s   3 msgs/s 598 Bytes/s  0 Bytes/s  326.0 KiB   421 msgs     0 msgs     0 msgs    0 Bytes     0 msgs     0 msgs   69.0 KiB  123 Files    0 Bytes    0 Files  118.15s  126.2 MiB  167.9 MiB  248.9 MiB        2.22       0.27
subscribe/u_sftp_f60                     run     3/3     0 100%   9%      0    2.69s    1.71s  1.2s  0.2%  2.8 KiB/s   3 msgs/s 599 Bytes/s  0 Bytes/s  326.6 KiB   421 msgs      1 msg     0 msgs    0 Bytes     0 msgs     0 msgs   69.0 KiB  122 Files    0 Bytes    0 Files  117.93s  126.4 MiB  167.8 MiB  249.2 MiB        2.05       0.33
watch/f40                                run     1/1     0 100%   0%      0    0.06s    0.02s  1.6s  0.0%  3.0 KiB/s   0 msgs/s  0 Bytes/s  0 Bytes/s    0 Bytes     0 msgs    74 msgs     0 msgs  169.5 KiB   203 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   57.21s   46.3 MiB   65.3 MiB  311.8 MiB        1.95       0.14
winnow/t00_f10                           run     1/1     0 100%   0%      0    6.38s    3.47s  0.2s 51.7%  1.7 KiB/s   2 msgs/s  0 Bytes/s  0 Bytes/s   66.8 KiB   116 msgs    60 msgs     0 msgs   32.3 KiB    56 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   57.79s   42.5 MiB   56.2 MiB   83.3 MiB        0.77       0.12
winnow/t01_f10                           run     1/1     0 100%   0%      0    9.75s    2.53s  0.1s 51.3%  1.8 KiB/s   2 msgs/s  0 Bytes/s  0 Bytes/s   67.2 KiB   117 msgs    60 msgs     0 msgs   32.8 KiB    57 msgs     0 msgs    0 Bytes    0 Files    0 Bytes    0 Files   56.37s   42.3 MiB   55.7 MiB   83.2 MiB        0.81       0.12
      Total Running Configs:  25 ( Processes: 64 missing: 0 stray: 0 )
                     Memory: uss:2.5 GiB rss:3.4 GiB vms:7.4 GiB
                   CPU Time: User:53.06s System:7.00s
       Pub/Sub Received: 99 msgs/s (63.2 KiB/s), Sent:  53 msgs/s (29.3 KiB/s) Queued: 3 Retry: 523, Mean lag: 4.54s
          Data Received: 18 Files/s (9.3 KiB/s), Sent: 3 Files/s (1.3 KiB/s)

The first row of output gives categories for the following line:

  • Processes: report on the number of instances and, and whether any are missing.

  • Connection: indicators of health of connectivity with remote.

  • Lag: the degree of delay being experienced, or how old data is by the time downloads are complete.

  • Last: indicates last successful transfer (for components that move files) or last message received/posted.

  • Rates: the speed of transfers, various measures.

  • Counters: the basis for rate calculations, reset each housekeeping interval.

  • Memory: the memory usage of the processes in the configuration.

  • Cpu Time: the cpu time usage of the processes in the configuration.

The last three categories will only be listed when the –full option is provided.

The second row of output gives detailed headings within each category:

The configurations are listed on the left. For each configuraion, the state will be:

  • cpuS: process is expensive in CPU usage (runStateThreshold_cpuSlow)

  • disa: disabled, configured not to run.

  • disc: disconnected, cannot reach the message broker.

  • down: cannot connect or exchange data with remote data source or sink.

  • hung: processes appear hung, not writing anything to logs.

  • idle: all processes running, but no data or message transfers for too long (runStateThreshold_idle)

  • lag: all processes running, but messages being processed are too old ( runStateThreshold_lag )

  • part: some processes are running, others are missing.

  • reje: all processes running, but too high percent of messages being rejected (runStateThreshold_reject )

  • rtry: all processes running, but too large number of transfers failed and retrying (runStateThreshold_retry )

  • run: all processes are running (and transferring, and not behind, and not slow… normal state.)

  • slow: transfering less than minimum bytes/second ( runStateThreshold_slow )

  • stop: no processes are running.

  • stby: Standby mode: all processes running, but messages are being stored in the local download_retry queue.

  • wVip: process doesn’t have the vip (only applies when the vip option is specified in the config)

The next columns to the right give more information, detailing how many processes are Running, out of the number expected. For example, 3/3 means 3 processes or instances found of the 3 expected to be found. The Expected entry lists how many processes should be running based on the configuration, and whether it is stopped or not.

The Retry column is the number of notification messages stored in the local retry queue, indicating what channels are having processing difficulties.

Heading

Purpose

State

General health/status of a given configuration: stop|run|disa|part|wVip|fore

Run

Number of processes or instances running compared to expected. 3/10 3 processes running of 10 expected.

Retry

The number of messages in retry queues, indicating problems with transfers.

msg

The percentage of time connected to a message broker, to subscribe to or publish messages.

data

The percentrage of time connected to data sources or sinks to transfer data.

LagQueued

The number of messages queued on the (remote) broker, not yet picked up.

LagMax

The maximum age of messages on receipt (prior to download), so delay upstream.

LagAvg

The average age of messages on receipt (prior to download), so delay upstream.

Last

Elapsed time since the last successful file (or message) transfer.

pubsub

The rate of the pub/sub messages being downloaded in bytes/second

messages

The rate of the pub/sub messages being downloaded in messages/second

%rej

The percent of downloaded messages rejected (indicator of health of subscription tuning.)

RxData

The amount of data being downloaded (not messages, but file data.)

TxData

The amount of data being sent (not messages, but file data.)

subBytes

Counter of the bytes of pub/sub messages being received.

Accepted

Counter of number of messages accepted

Rejected

Counter of number of messages rejected

Malformed

Counter of messages which were discarded because they could not be understood.

pubBytes

Counter of bytes of messages being published

pubMsgs

Counter of messages being published

pubMal

Counter of message which failed to publish.

rxData

Counter of bytes of files being downloaded.

rxFiles

Counter of files being downloaded.

txData

Counter of bytes of files being sent.

txFiles

Counter of files being sent.

Since

How many seconds since the last counter reset (basis for rate calculations.)

uss

unique set size (memory usage of instances.) Actual physical unique memory used by processes.

rss

resident set size (memory usage of instances) Actual physical memory used, including shared.

vms

virtual memory size of all shared and physical and swap allocated together.

user

user cpu time used

system

system cpu time used.

At the bottom of the listing is a cumulation of these values.

Message Gathering

Most Metpx Sarracenia components loop on gathering and/or reception of sarracenia notification messages. Usually, the notification messages are sr_post(7) notification messages, announcing the availability of a file by publishing its URL, but there are also report messages which can be processed using the same tools. AMQP messages are published to an exchange on a broker (AMQP server). The exchange delivers notification messages to queues. To receive notification messages, one must provide the credentials to connect to the broker. Once connected, a consumer needs to create a queue to hold pending notification messages. The consumer must then bind the queue to one or more exchanges so that they put notification messages in its queue.

Once the bindings are set, the program can receive notification messages. When a notification message is received, further filtering is possible using regular expressions onto the AMQP messages. After a notification message passes this selection process, and other internal validation, the component can run an after_accept plugin script to perform additional notification message processing. If this plugin returns False, the notification message is discarded. If True, processing continues.

The following sections explains all the options to set this “consuming” part of sarracenia programs.

Setting the Broker

broker [amqp|mqtt]{s}://<user>:<password>@<brokerhost>[:port]/<vhost>

A URI is used to configure a connection to a notification message pump, either an MQTT or an AMQP broker. Some Sarracenia components set a reasonable default for that option. provide the normal user,host,port of connections. In most configuration files, the password is missing. The password is normally only included in the credentials.conf file.

Sarracenia work has not used vhosts, so vhost should almost always be /.

for more info on the AMQP URI format: ( https://www.rabbitmq.com/uri-spec.html )

either in the default.conf or each specific configuration file. The broker option tell each component which broker to contact.

broker [amqp|mqtt]{s}://<user>:<pw>@<brokerhost>[:port]/<vhost>

::

(default: None and it is mandatory to set it )

Once connected to an AMQP broker, the user needs to bind a queue to exchanges and topics to determine the notification messages of interest.

Creating the Queue

Once connected to an AMQP broker, the user needs to create a queue. Common settings for the queue on broker :

  • queueShare <string> (default: ${USER}_${HOSTNAME}_${RAND8})

  • expire <duration> (default: 5m == five minutes. RECOMMEND OVERRIDING)

  • message_ttl <duration> (default: None)

  • prefetch <N> (default: 1)

Usually components guess reasonable defaults for all these values and users do not need to set them. For less usual cases, the user may need to override the defaults. The queue is where the notifications are held on the server for each subscriber.

queueShare <str> (default: ${USER}_${HOSTNAME}_${RAND8} )

A suffix included to queue names to allow defining the sharing scope of a queue. When multiple hosts are participating in the same queue, use this setting to have instances pick the same queue:

queueShare my_share_group

to get a private queue, for example, one could specify:

queueShare ${RAND8}

will result in a random 8 digit number being appended to the queue name. All the instances within the configuration with access to the same state directory will use the queue name thus defined.

AMQP QUEUE BINDINGS

Once one has a queue, it must be bound to an exchange. Users almost always need to set these options. Once a queue exists on the broker, it must be bound to an exchange. Bindings define which notification messages (URL notifications) the program receives. The root of the topic tree is fixed to indicate the protocol version and type of the notification message (but developers can override it with the topicPrefix option.)

These options define which notification messages (URL notifications) the program receives:

  • exchange <name> (default: xpublic)

  • exchangeSuffix <name> (default: None)

  • topicPrefix <amqp pattern> (default: v03 – developer option)

  • subtopic <amqp pattern> (no default, must appear after exchange)

subtopic <amqp pattern> (default: #)

Within an exchange’s postings, the subtopic setting narrows the product selection. To give a correct value to the subtopic, one has the choice of filtering using subtopic with only AMQP’s limited wildcarding and length limited to 255 encoded bytes, or the more powerful regular expression based accept/reject mechanisms described below. The difference being that the AMQP filtering is applied by the broker itself, saving the notices from being delivered to the client at all. The accept/reject patterns apply to notification messages sent by the broker to the subscriber. In other words, accept/reject are client side filters, whereas subtopic is server side filtering.

It is best practice to use server side filtering to reduce the number of notification messages sent to the client to a small superset of what is relevant, and perform only a fine-tuning with the client side mechanisms, saving bandwidth and processing for all.

topicPrefix is primarily of interest during protocol version transitions, where one wishes to specify a non-default protocol version of notification messages to subscribe to.

Usually, the user specifies one exchange, and several subtopic options. Subtopic is what is normally used to indicate notification messages of interest. To use the subtopic to filter the products, match the subtopic string with the relative path of the product.

For example, consuming from DD, to give a correct value to subtopic, one can browse the our website http://dd.weather.gc.ca and write down all directories of interest. For each directory tree of interest, write a subtopic option as follow:

subtopic directory1.*.subdirectory3.*.subdirectory5.#

where:
      *                matches a single directory name
      #                matches any remaining tree of directories.
note:

When directories have these wild-cards, or spaces in their names, they will be URL-encoded ( ‘#’ becomes %23 ) When directories have periods in their name, this will change the topic hierarchy.

FIXME:

hash marks are URL substituted, but did not see code for other values. Review whether asterisks in directory names in topics should be URL-encoded. Review whether periods in directory names in topics should be URL-encoded.

One can use multiple bindings to multiple exchanges as follows:

exchange A
subtopic directory1.*.directory2.#

exchange B
subtopic *.directory4.#

Will declare two separate bindings to two different exchanges, and two different file trees. While default binding is to bind to everything, some brokers might not permit clients to set bindings, or one might want to use existing bindings. One can turn off queue binding as follows:

subtopic None

(False, or off will also work.)

Client-side Filtering

We have selected our notification messages through exchange, subtopic and perhaps patterned subtopic with AMQP’s limited wildcarding which is all done by the broker (server-side). The broker puts the corresponding notification messages in our queue. The subscribed component downloads these notification messages. Once the notification message is downloaded, Sarracenia clients apply more flexible client side filtering using regular expressions.

Brief Introduction to Regular Expressions

Regular expressions are a very powerful way of expressing pattern matches. They provide extreme flexibility, but in these examples we will only use a very trivial subset: The . is a wildcard matching any single character. If it is followed by an occurrence count, it indicates how many letters will match the pattern. The * (asterisk) character, means any number of occurrences. So:

  • .* means any sequence of characters of any length. In other words, match anything.

  • cap.* means any sequence of characters that starts with cap.

  • .*CAP.* means any sequence of characters with CAP somewhere in it.

  • .*cap means any sequence of characters that ends with CAP. In the case where multiple portions of the string could match, the longest one is selected.

  • .*?cap same as above, but non-greedy, meaning the shortest match is chosen.

Please consult various internet resources for more information on the full variety of matching possible with regular expressions:

accept, reject and accept_unmatch

  • accept <regexp pattern> (optional)

  • reject <regexp pattern> (optional)

  • acceptUnmatched <boolean> (default: True)

  • baseUrl_relPath <boolean> (default: False)

The accept and reject options process regular expressions (regexp). The regexp is applied to the the notification message’s URL for a match.

If the notification message’s URL of a file matches a reject pattern, the notification message is acknowledged as consumed to the broker and skipped.

One that matches an accept pattern is processed by the component.

In many configurations, accept and reject options are mixed with the directory option. They then relate accepted notification messages to the directory value they are specified under.

After all accept / reject options are processed, normally the notification message is acknowledged as consumed and skipped. To override that default, set accept_unmatch to True. The accept/reject settings are interpreted in order. Each option is processed orderly from top to bottom. For example:

sequence #1:

reject .*\.gif
accept .*

sequence #2:

accept .*
reject .*\.gif

In sequence #1, all files ending in ‘gif’ are rejected. In sequence #2, the accept .* (which accepts everything) is encountered before the reject statement, so the reject has no effect.

It is best practice to use server side filtering to reduce the number of notification messages sent to the component to a small superset of what is relevant, and perform only a fine-tuning with the client side mechanisms, saving bandwidth and processing for all. More details on how to apply the directives follow:

Normally the relative path (appended to the base directory) for files which are downloaded will be set according to the relPath header included in the notification message. if baseUrl_relPath is set, however, the notification message’s relPath will be prepended with the sub-directories from the notification message’s baseUrl field.

NAMING QUEUES

While in most common cases, a good value is generated by the application, in some cases there may be a need to override those choices with an explicit user specification. To do that, one needs to be aware of the rules for naming queues:

  1. queue names start with q_

  2. this is followed by <amqpUserName> (the owner/user of the queue’s broker username)

  3. followed by a second underscore ( _ )

  4. followed by a string of the user’s choice.

The total length of the queue name is limited to 255 bytes of UTF-8 characters.

POSTING

Just as many components consumer a stream of notification messages, many components (often the same ones) also product an output stream of notification messages. To make files available to subscribers, a poster publishes the notification messages to an AMQP or MQTT server, also called a broker. The post_broker option sets all the credential information to connect to the output AMQP broker.

post_broker [amqp|mqtt]{s}://<user>:<pw>@<brokerhost>[:port]/<vhost>

Once connected to the source AMQP broker, the program builds notifications after the download of a file has occurred. To build the notification and send it to the next hop broker, the user sets these options :

  • post_baseDir <path> (optional)

  • post_topicPrefix <pfx> (default: ‘v03’)

  • post_exchange <name> (default: xpublic)

  • post_baseUrl <url> (MANDATORY)

FIXME: Examples of what these are for, what they do…

NAMING EXCHANGES

  1. Exchange names start with x

  2. Exchanges that end in public are accessible (for reading) by any authenticated user.

  3. Users are permitted to create exchanges with the pattern: xs_<amqpUserName>_<whatever> such exchanges can be written to only by that user.

  4. The system (sr_audit or administrators) create the xr_<amqpUserName> exchange as a place to send reports for a given user. It is only readable by that user.

  5. Administrative users (admin or feeder roles) can post or subscribe anywhere.

For example, xpublic does not have xs_ and a username pattern, so it can only be posted to by admin or feeder users. Since it ends in public, any user can bind to it to subscribe to notification messages posted. Users can create exchanges such as xs_<amqpUserName>_public which can be written to by that user (by rule 3), and read by others (by rule 2.) A description of the conventional flow of notification messages through exchanges on a pump. Subscribers usually bind to the xpublic exchange to get the main data feed. This is the default in sr_subscribe.

Another example, a user named Alice will have at least two exchanges:

  • xs_Alice the exhange where Alice posts her file notifications and report messages (via many tools).

  • xr_Alice the exchange where Alice reads her report messages from (via sr_shovel).

  • Alice can create a new exchange by just posting to it (with sr3_post or sr_cpost) if it meets the naming rules.

Usually an sr_sarra run by a pump administrator will read from an exchange such as xs_Alice_mydata, retrieve the data corresponding to Alice´s post notification message, and make it available on the pump, by re-announcing it on the xpublic exchange.

POLLING

Polling is doing the same job as a post, except for files on a remote server. In the case of a poll, the post will have its url built from the pollUrl option, with the product’s path (path/”matched file”). There is one post per file. The file’s size is taken from the directory “ls”… but its checksum cannot be determined, so the default identity method is “cod”, asking clients to calculate the identity Checksum On Download.

To set when to poll, use the scheduled_interval or scheduled_hour and scheduled_minute settings. for example:

scheduled_interval 30m

to poll the remote resources every thirty minutes. Alternatively:

scheduled_hour 1,13,19
scheduled_minute 27

specifies that poll be run at 1:27, 13:27, and 19:27 each day. Alternatively:

scheduled_time 15:30,20:30,23:15

will poll the remote resources at exactly 15:30, 16:30 and 23:15.

By default, sr_poll sends its post notification message to the broker with default exchange (the prefix xs_ followed by the broker username). The post_broker is mandatory. It can be given incomplete if it is well defined in the credentials.conf file.

Refer to sr3_post(1) - to understand the complete notification process. Refer to sr_post(7) - to understand the complete notification format.

These options set what files the user wants to be notified for and where

it will be placed, and under which name.

  • path <path> (default: .)

  • accept <regexp pattern> [rename=] (must be set)

  • reject <regexp pattern> (optional)

  • permDefault <integer> (default: 0o400)

  • fileAgeMax <duration> (default 30d)

nodupe_fileAgeMax should be less than nodupe_ttl when using duplicate suppression, to avoid re-ingesting of files that have aged out of the nodupe cache.

The option path defines where to get the files on the server. Combined with accept / reject options, the user can select the files of interest and their directories of residence.

The accept and reject options use regular expressions (regexp) to match URL. These options are processed sequentially. The URL of a file that matches a reject pattern is not published. Files matching an accept pattern are published.

The path can have some patterns. These supported patterns concern date/time . They are fixed…

${YYYY} current year ${MM} current month ${JJJ} current julian ${YYYYMMDD} current date

${YYYY-1D} current year - 1 day ${MM-1D} current month - 1 day ${JJJ-1D} current julian - 1 day ${YYYYMMDD-1D} current date - 1 day

ex.   path /mylocaldirectory/myradars
      path /mylocaldirectory/mygribs
      path /mylocaldirectory/${YYYYMMDD}/mydailies

      accept    .*RADAR.*
      reject    .*Reg.*
      accept    .*GRIB.*
      accept    .*observations.*

The permDefault option allows users to specify a linux-style numeric octal permission mask:

permDefault 040

means that a file will not be posted unless the group has read permission (on an ls output that looks like: —r—–, like a chmod 040 <file> command). The permDefault options specifies a mask, that is the permissions must be at least what is specified.

As with all other components, the vip option can be used to indicate that a poll should be active on only a single node in a cluster. Note that other nodes participating in the poll, when they don’t have the vip, will subscribe to the output of the poll to keep their duplicate suppression caches current.

files that are more than fileAgeMax are ignored. However, this can be modified to any specified time limit in the configurations by using the option fileAgeMax <duration>. By default in components other than poll, it is disabled by being set to zero (0). As it is a duration option, units are in seconds by default, but minutes, hours, days, and weeks, are available. In the poll component, fileAgeMax defaults to 30 days.

Advanced Polling

The built-in Poll lists remote directories and parses the lines returned building paramiko.SFTPAttributes structures (similar to os.stat) for each file listed. There is a wide variety of customization available because resources to poll are so disparate:

  • one can implement a sarracenia.flowcb callback with a poll routine to support such services, replacing the default poller.

  • Some servers have non-standard results when listing files, so one can subclass a sarracenia.flowcb.poll callback with the on_line entry point to normalize their responses and still be able to use the builtin polling flow.

  • There are many http servers that provide disparately formatted listings of files, so that sometimes rather than reformatting individual lines, a means of overriding the parsing of an entire page is needed. The on_html_page entry point in sarracenia.flowcb.poll can be modified by subclassing as well.

  • There are other servers that provide different services, not covered by the default poll. One can implement additional sarracenia.transfer classes to add understanding of them to poll.

The output of a poll is a list of notification messages built from the file names and SFTPAttributes records, which can then be filtered by elements after gather in the algorithm.

COMPONENTS

All the components do some combination of polling, consuming, and posting. with variations that accomplish either forwarding of notification messages or data transfers. The components all apply the same single algorithm, just starting from different default settings to match common use cases.

CPUMP

cpump* is an implementation of the shovel component in C. On an individual basis, it should be faster than a single python downloader, with some limitations.

  • doesn’t download data, only circulates posts. (shovel, not subscribe)

  • runs as only a single instance (no multiple instances).

  • does not support any plugins.

  • does not support vip for high availability.

  • different regular expression library: POSIX vs. python.

  • does not support regex for the strip command (no non-greedy regex).

It can therefore usually, but not always, act as a drop-in replacement for shovel and winnow

The C implementation may be easier to make available in specialized environments, such as High Performance Computing, as it has far fewer dependencies than the python version. It also uses far less memory for a given role. Normally the python version is recommended, but there are some cases where use of the C implementation is sensible.

sr_cpump connects to a broker (often the same as the posting broker) and subscribes to the notifications of interest. If _suppress_duplicates_ is active, on reception of a post, it looks up the notification message’s integity field in its cache. If it is found, the file has already come through, so the notification is ignored. If not, then the file is new, and the sum is added to the cache and the notification is posted.

FLOW

Flow is the parent class from which all of the other components except cpost and cpump are built. Flow has no built-in behaviour. Settings can make it act like any other python component, or it can be used to build user defined components. Typically used with the flowMain option to run a user defined flow subclass.

POLL

poll is a component that connects to a remote server to check in various directories for some files. When a file is present, modified or created in the remote directory, the program will notify about the new product.

The notification protocol is defined here sr_post(7)

poll connects to a broker. Every scheduled_interval seconds (or can used combination of scheduled_hour and scheduled_minute) , it connects to a pollUrl (sftp, ftp, ftps). For each of the directory defined, it lists the contents. Polling is only intended to be used for recently modified files. The fileAgeMax option eliminates files that are too old from consideration. When a file is found that matches a pattern given by accept, poll builds a notification message for that product.

The notification message is then checked in the duplicate cache (time limited by nodupe_ttl option.) to prevent posting of files which have already been seen.

poll can be used to acquire remote files in conjunction with an sarra subscribed to the posted notifications, to download and repost them from a data pump.

The pollUrl option specify what is needed to connect to the remote server

pollUrl protocol://<user>@<server>[:port]

::

(default: None and it is mandatory to set it )

The pollUrl should be set with the minimum required information… sr_poll uses pollUrl setting not only when polling, but also in the sr3_post notification messages produced.

For example, the user can set :

pollUrl ftp://myself@myserver

And complete the needed information in the credentials file with the line :

ftp://myself:mypassword@myserver:2121 passive,binary

Poll gathers information about remote files, to build notification messages about them. The gather method that is built-in uses sarracenia.transfer protocols, currently implemented are sftp, ftp, and http.

Repeated Scans and VIP

When multiple servers are being co-operating to poll a remote server, the vip setting is used to decide which server will actually poll. All servers participating subscribe to where poll is posting, and use the results to fill in the duplicate suppression cache, so that if the VIP moves, the alternate servers have current indications of what was posted.

POST or WATCH

sr3_post posts the availability of a file by creating an notification message. In contrast to most other sarracenia components that act as daemons, sr3_post is a one shot invocation which posts and exits. To make files available to subscribers, sr3_post sends the notification messages to an AMQP or MQTT server, also called a broker.

There are many options for detection changes in directories, for a detailed discussion of options in Sarracenia, see DetectFileReady.rst

This manual page is primarily concerned with the python implementation, but there is also an implementation in C, which works nearly identically. Differences:

  • plugins are not supported in the C implementation.

  • C implementation uses POSIX regular expressions, python3 grammar is slightly different.

  • when the sleep option ( used only in the C implementation) is set to > 0, it transforms sr_cpost into a daemon that works like watch.

The watch component is used to monitor directories for new files. It is equivalent to post (or cpost) with the sleep option set to >0.

The [-pbu|–post_baseUrl url,url,…] option specifies the location subscribers will download the file from. There is usually one post per file. Format of argument to the post_baseUrl option:

[ftp|http|sftp]://[user[:password]@]host[:port]/
or
file:

When several urls are given as a comma separated list to post_baseUrl, the url´s provided are used round-robin style, to provide a coarse form of load balancing.

The [-p|–path path1 path2 .. pathN] option specifies the path of the files to be announced. There is usually one post per file. Format of argument to the path option:

/absolute_path_to_the/filename
or
relative_path_to_the/filename

The -pipe option can be specified to have sr3_post read path names from standard input as well.

An example invocation of sr3_post:

sr3_post -pb amqp://broker.com -pbu sftp://stanley@mysftpserver.com/ -p /data/shared/products/foo

By default, sr3_post reads the file /data/shared/products/foo and calculates its checksum. It then builds a notification message, logs into broker.com as user ‘guest’ (default credentials) and sends the post to defaults vhost ‘/’ and default exchange. The default exchange is the prefix xs_ followed by the broker username, hence defaulting to ‘xs_guest’. A subscriber can download the file /data/shared/products/foo by authenticating as user stanley on mysftpserver.com using the sftp protocol to broker.com assuming he has proper credentials. The output of the command is as follows

[INFO] Published xs_guest v03.data.shared.products.foo '20150813161959.854 sftp://stanley@mysftpserver.com/ /data/shared/products/foo' sum=d,82edc8eb735fd99598a1fe04541f558d parts=1,4574,1,0,0

In MetPX-Sarracenia, each post is published under a certain topic. The log line starts with ‘[INFO]’, followed by the topic of the post. Topics in AMQP are fields separated by dot. Note that MQTT topics use a slash (/) as the topic separator. The complete topic starts with a topicPrefix (see option), version v03, followed by a subtopic (see option) here the default, the file path separated with dots data.shared.products.foo.

The second field in the log line is the notification message notice. It consists of a time stamp 20150813161959.854, and the source URL of the file in the last 2 fields.

The rest of the information is stored in AMQP message headers, consisting of key=value pairs. The sum=d,82edc8eb735fd99598a1fe04541f558d header gives file fingerprint (or checksum ) information. Here, d means md5 checksum performed on the data, and 82edc8eb735fd99598a1fe04541f558d is the checksum value. The parts=1,4574,1,0,0 state that the file is available in 1 part of 4574 bytes (the filesize.) The remaining 1,0,0 is not used for transfers of files with only one part.

Another example:

sr3_post -pb amqp://broker.com -pbd /data/web/public_data -pbu http://dd.weather.gc.ca/ -p bulletins/alphanumeric/SACN32_CWAO_123456

By default, sr3_post reads the file /data/web/public_data/bulletins/alphanumeric/SACN32_CWAO_123456 (concatenating the post_baseDir and relative path of the source url to obtain the local file path) and calculates its checksum. It then builds a notification message, logs into broker.com as user ‘guest’ (default credentials) and sends the post to defaults vhost ‘/’ and exchange ‘xs_guest’.

A subscriber can download the file http://dd.weather.gc.ca/bulletins/alphanumeric/SACN32_CWAO_123456 using http without authentication on dd.weather.gc.ca.

File Partitioning

use of the blocksize option has no effect in sr3. It is used to do file partitioning, and it will become effective again in the future, with the same semantics.

SARRA

sarra is a program that Subscribes to file notifications, Acquires the files and ReAnnounces them at their new locations. The notification protocol is defined here sr_post(7)

sarra connects to a broker (often the same as the remote file server itself) and subscribes to the notifications of interest. It uses the notification information to download the file on the local server it’s running on. It then posts a notification for the downloaded files on a broker (usually on the local server).

sarra can be used to acquire files from sr3_post(1) or watch or to reproduce a web-accessible folders (WAF), that announce its products.

The sr_sarra is an sr_subscribe(1) with the following presets:

mirror True

Specific Consuming Requirements

If the notification messages are posted directly from a source, the exchange used is ‘xs_<brokerSourceUsername>’. To protect against malicious users, administrators should set sourceFromExchange to True. Such notification messages may not contain a source nor an origin cluster fields or a malicious user may set the values incorrectly.

  • sourceFromExchange <boolean> (default: False)

Upon reception, the program will set these values in the parent class (here cluster is the value of option cluster taken from default.conf):

msg[‘source’] = <brokerUser> msg[‘from_cluster’] = cluster

overriding any values present in the notification message. This setting should always be used when ingesting data from a user exchange.

SENDER

sender is a component derived from subscribe used to send local files to a remote server using a file transfer protocol, primarily SFTP. sender is a standard consumer, using all the normal AMQP settings for brokers, exchanges, queues, and all the standard client side filtering with accept, reject, and after_accept.

Often, a broker will announce files using a remote protocol such as HTTP, but for the sender it is actually a local file. In such cases, one will see a notification message: ERROR: The file to send is not local. An after_accept plugin will convert the web url into a local file one:

baseDir /var/httpd/www
flowcb sarracenia.flowcb.tolocalfile.ToLocalFile

This after_accept plugin is part of the default settings for senders, but one still needs to specify baseDir for it to function.

If a post_broker is set, sender checks if the clustername given by the to option if found in one of the notification message’s destination clusters. If not, the notification message is skipped.

SETUP 1 : PUMP TO PUMP REPLICATION

For pump replication, mirror is set to True (default).

baseDir supplies the directory path that, when combined with the relative one in the selected notification gives the absolute path of the file to be sent. The default is None which means that the path in the notification is the absolute one.

In a subscriber, the baseDir represents the prefix to the relative path on the upstream server, and is used as a pattern to be replaced in the currently selected base directory (from a baseDir or directory option) in the notification message fields: ‘fileOp’, which are used when mirroring symbolic links, or files that are renamed.

The sendTo defines the protocol and server to be used to deliver the products. Its form is a partial url, for example: ftp://myuser@myhost The program uses the file ~/.conf/sarra/credentials.conf to get the remaining details (password and connection options). Supported protocol are ftp, ftps and sftp. Should the user need to implement another sending mechanism, he would provide the plugin script through option do_send.

On the remote site, the post_baseDir serves the same purpose as the baseDir on this server. The default is None which means that the delivered path is the absolute one.

Now we are ready to send the product… for example, if the selected notification looks like this :

20150813161959.854 http://this.pump.com/ relative/path/to/IMPORTANT_product

sr_sender performs the following pseudo-delivery:

Sends local file [baseDir]/relative/path/to/IMPORTANT_product to sendTo/[post_baseDir]/relative/path/to/IMPORTANT_product (kbytes_ps is greater than 0, the process attempts to respect this delivery speed… ftp,ftps,or sftp)

At this point, a pump-to-pump setup needs to send the remote notification… (If the post_broker is not set, there will be no posting… just products replication)

The selected notification contains all the right information (topic and header attributes) except for url field in the notice… in our example : http://this.pump.com/

By default, sr_sender puts the sendTo in that field. The user can overwrite this by specifying the option post_baseUrl. For example:

post_baseUrl http://remote.apache.com

The user can provide an on_post script. Just before the notification message is published on the post_broker and post_exchange, the on_post script is called… with the sr_sender class instance as an argument. The script can perform whatever you want… if it returns False, the notification message will not be published. If True, the program will continue processing from there.

FIXME: Missing example configuration.

DESTINATION SETUP 2 : METPX-SUNDEW LIKE DISSEMINATION

In this type of usage, we would not usually repost… but if the post_broker and post_exchange (url,**on_post**) are set, the product will be announced (with its possibly new location and new name). Let’s reintroduce the options in a different order with some new ones to ease explanation.

There are 2 differences with the previous case : the directory, and the filename options.

The baseDir is the same, and so are the sendTo and the post_baseDir options.

The directory option defines another “relative path” for the product at its sendTo. It is tagged to the accept options defined after it. If another sequence of directory/accept follows in the configuration file, the second directory is tagged to the following accepts and so on.

The accept/reject patterns apply to notification message notice url as above. Here is an example, here some ordered configuration options :

directory /my/new/important_location

accept .*IMPORTANT.*

directory /my/new/location/for_others

accept .*

If the notification selected is, as above, this :

20150813161959.854 http://this.pump.com/ relative/path/to/IMPORTANT_product

It was selected by the first accept option. The remote relative path becomes /my/new/important_location … and sr_sender performs the following pseudo-delivery:

sends local file [baseDir]/relative/path/to/IMPORTANT_product to sendTo/[post_baseDir]/my/new/important_location/IMPORTANT_product

Usually this way of using sr_sender would not require posting of the product. But if post_broker and post_exchange are provided, and url , as above, is set to http://remote.apache.com, then sr_sender would reconstruct :

Topic: v03.my.new.important_location.IMPORTANT_product

Notice: 20150813161959.854 http://remote.apache.com/ my/new/important_location/IMPORTANT_product

SHOVEL

shovel copies notification messages on one broker (given by the broker option) to another (given by the post_broker option.) subject to filtering by (exchange, subtopic, and optionally, accept/reject.)

The topicPrefix option must to be set to:

shovel is a flow with the following presets:

no-download True
suppress_duplicates off

SUBSCRIBE

Subscribe is the normal downloading flow component, that will connect to a broker, download the configured files, and then forward the notification messages with an altered baseUrl.

WATCH

Watches a directory and publishes posts when files in the directory change ( added, modified, or deleted). Its arguments are very similar to sr3_post. In the MetPX-Sarracenia suite, the main goal is to post the availability and readiness of one’s files. Subscribers use sr_subscribe to consume the post and download the files.

Posts are sent to an AMQP server, also called a broker, specified with the option [ -pb|–post_broker broker_url ].

The [-post_baseUrl|–pbu|–url url] option specifies the protocol, credentials, host and port to which subscribers will connect to get the file.

Format of argument to the url option:

[ftp|http|sftp]://[user[:password]@]host[:port]/
or
[ftp|http|sftp]://[user[:password]@]host[:port]/
or
file:

The [-p|–path path] option tells watch what to look for. If the path specifies a directory, watches creates a post for any time a file in that directory is created, modified or deleted. If the path specifies a file, watch watches only that file. In the notification message, it is specified with the path of the product. There is usually one post per file.

FIXME: in version 3 does it work at all without a configuration? perhaps we should just create the file if it isn’t there?

An example of an execution of watch checking a file:

sr3 --post_baseUrl sftp://stanley@mysftpserver.com/ --path /data/shared/products/foo --post_broker amqp://broker.com start watch/myflow

Here, watch checks events on the file /data/shared/products/foo. Default events settings reports if the file is modified or deleted. When the file gets modified, watch reads the file /data/shared/products/foo and calculates its checksum. It then builds a notification message, logs into broker.com as user ‘guest’ (default credentials) and sends the post to defaults vhost ‘/’ and post_exchange ‘xs_stanley’ (default exchange)

A subscriber can download the file /data/shared/products/foo by logging as user stanley on mysftpserver.com using the sftp protocol to broker.com assuming he has proper credentials.

The output of the command is as follows

[INFO] v03.data.shared.products.foo '20150813161959.854 sftp://stanley@mysftpserver.com/ /data/shared/products/foo'
      source=guest parts=1,256,1,0,0 sum=d,fc473c7a2801babbd3818260f50859de

In MetPX-Sarracenia, each post is published under a certain topic. After the ‘[INFO]’ the next information gives the fBtopic* of the post. Topics in AMQP are fields separated by dot. In MetPX-Sarracenia it is made of a topicPrefix by default : version v03 , followed by the subtopic by default : the file path separated with dots, here, data.shared.products.foo

After the topic hierarchy comes the body of the notification. It consists of a time 20150813161959.854 , and the source url of the file in the last 2 fields.

The remaining line gives informations that are placed in the amqp message header. Here it consists of source=guest , which is the amqp user, parts=1,256,0,0,1 , which suggests to download the file in 1 part of 256 bytes (the actual filesize), trailing 1,0,0 gives the number of block, the remaining in bytes and the current block. sum=d,fc473c7a2801babbd3818260f50859de mentions checksum information,

here, d means md5 checksum performed on the data, and fc473c7a2801babbd3818260f50859de is the checksum value. When the event on a file is a deletion, sum=R,0 R stands for remove.

Another example watching a file:

sr3 --post_baseDir /data/web/public_data --post_baseUrl http://dd.weather.gc.ca/ --path bulletins/alphanumeric/SACN32_CWAO_123456 -post_broker amqp://broker.com start watch/myflow

By default, watch checks the file /data/web/public_data/bulletins/alphanumeric/SACN32_CWAO_123456 (concatenating the baseDir and relative path of the source url to obtain the local file path). If the file changes, it calculates its checksum. It then builds a notification message, logs into broker.com as user ‘guest’ (default credentials) and sends the post to defaults vhost ‘/’ and post_exchange ‘sx_guest’ (default post_exchange)

A subscriber can download the file http://dd.weather.gc.ca/bulletins/alphanumeric/SACN32_CWAO_123456 using http without authentication on dd.weather.gc.ca.

An example checking a directory:

sr3 -post_baseDir /data/web/public_data -post_baseUrl http://dd.weather.gc.ca/ --path bulletins/alphanumeric --post_broker amqp://broker.com start watch/myflow

Here, watch checks for file creation(modification) in /data/web/public_data/bulletins/alphanumeric (concatenating the baseDir and relative path of the source url to obtain the directory path). If the file SACN32_CWAO_123456 is being created in that directory, watch calculates its checksum. It then builds a notification message, logs into broker.com as user ‘guest’

A subscriber can download the created/modified file http://dd.weather.gc.ca/bulletins/alphanumeric/SACN32_CWAO_123456 using http without authentication on dd.weather.gc.ca.

WINNOW

the winnow component subscribes to file notification messages and reposts them, suppressing redundant ones. How to decide which ones are redundant varies by use case. In the most straight-forward case, the messages have Identity header stores a file’s fingerprint as described in the sr_post(7) man page, and header is used exclusively. There are many other use cases, though. discussed in the following section on Duplicate Suppression

winnow has the following options forced:

no-download True
suppress_duplicates on
accept_unmatch True

The suppress_duplicates lifetime can be adjusted, but it is always on. Other components can use the same duplicate suppression logic. The use of winnow components is purely to make it easier for analysts to understand a configuration at a glance.

winnow connects to a broker (often the same as the posting broker) and subscribes to the notifications of interest. On reception of a notification, it looks up its sum in its cache. If it is found, the file has already come through, so the notification is ignored. If not, then the file is new, and the sum is added to the cache and the notification is posted.

winnow can be used to trim notification messages produced by post, sr3_post, sr3_cpost, poll or watch etc… It is used when there are multiple sources of the same data, so that clients only download the source data once, from the first source that posted it.

Configurations

If one has a ready made configuration called q_f71.conf, it can be added to the list of known ones with:

subscribe add q_f71.conf

In this case, xvan_f14 is included with examples provided, so add finds it in the examples directory and copies into the active configuration one. Each configuration file manages the consumers for a single queue on the broker. To view the available configurations, use:

$ subscribe list

  configuration examples: ( /usr/lib/python3/dist-packages/sarra/examples/subscribe )
            all.conf     all_but_cap.conf            amis.conf            aqhi.conf             cap.conf      cclean_f91.conf
      cdnld_f21.conf       cfile_f44.conf        citypage.conf       clean_f90.conf            cmml.conf cscn22_bulletins.conf
        ftp_f70.conf            gdps.conf         ninjo-a.conf           q_f71.conf           radar.conf            rdps.conf
           swob.conf           t_f30.conf      u_sftp_f60.conf

  user plugins: ( /home/peter/.config/sarra/plugins )
        destfn_am.py         destfn_nz.py       msg_tarpush.py

  general: ( /home/peter/.config/sarra )
          admin.conf     credentials.conf         default.conf

  user configurations: ( /home/peter/.config/sarra/subscribe )
     cclean_f91.conf       cdnld_f21.conf       cfile_f44.conf       clean_f90.conf         ftp_f70.conf           q_f71.conf
          t_f30.conf      u_sftp_f60.conf

one can then modify it using:

$ subscribe edit q_f71.conf

(The edit action uses the EDITOR environment variable, if present.) Once satisfied, one can start the the configuration running:

$ subscibe foreground q_f71.conf

What goes into the files? See next section:

  • The forward slash (/) as the path separator in Sarracenia configuration files on all operating systems. Use of the backslash character as a path separator (as used in the cmd shell on Windows) may not work properly. When files are read on Windows, the path separator is immediately converted to the forward slash, so all pattern matching, in accept, reject, strip etc… directives should use forward slashes when a path separator is needed.

Example:

directory A
accept X

Places files matching X in directory A.

vs::

accept X directory A

Places files matching X in the current working directory, and the directory A setting does nothing in relation to X.

To provide non-functional descriptions of configurations, or comments, use lines that begin with a #.

All options are case sensitive. Debug is not the same as debug nor DEBUG. Those are three different options (two of which do not exist and will have no effect, but should generate an ´unknown option warning´).

Options and command line arguments are equivalent. Every command line argument has a corresponding long version starting with ‘–’. For example -u has the long form –url. One can also specify this option in a configuration file. To do so, use the long form without the ‘–’, and put its value separated by a space. The following are all equivalent:

  • url <url>

  • -u <url>

  • –url <url>

Settings are interpreted in order. Each file is read from top to bottom. For example:

sequence #1:

reject .*\.gif
accept .*

sequence #2:

accept .*
reject .*\.gif

Note

FIXME: does this match only files ending in ‘gif’ or should we add a $ to it? will it match something like .gif2 ? is there an assumed .* at the end?

In sequence #1, all files ending in ‘gif’ are rejected. In sequence #2, the accept .* (which accepts everything) is encountered before the reject statement, so the reject has no effect. Some options have global scope, rather than being interpreted in order. for thoses cases, a second declaration overrides the first.

Options to be reused in different config files can be grouped in an include file:

  • –include <includeConfigPath>

The includeConfigPath would normally reside under the same config dir of its master configs. If a URL is supplied as an includeConfigPATH, then a remote configuraiton will be downloaded and cached (used until an update on the server is detected.) See Remote Configurations for details.

Environment variables, and some built-in variables can also be put on the right hand side to be evaluated, surrounded by ${..} The built-in variables are:

  • ${BROKER_USER} - the user name for authenticating to the broker (e.g. anonymous)

  • ${PROGRAM} - the name of the component (subscribe, shovel, etc…)

  • ${CONFIG} - the name of the configuration file being run.

  • ${HOSTNAME} - the hostname running the client.

  • ${RANDID} - a random id that will be consistent within a single invocation.

  • ${INSTANCE} - The instance id of the running flow.

flowCallbacks

Sarracenia makes extensive use of small python code snippets that customize processing called flowCallback Flow_callbacks define and use additional settings:

flowCallback sarracenia.flowcb.log.Log

There is also a shorter form to express the same thing:

callback log

Either way, the module refers to the sarracenia/flowcb/msg/log.py file in the installed package. In that file, the Log class is the one searched for entry points. The log.py file included in the package is like this:

from sarracenia.flowcb import FlowCB
import logging

logger = logging.getLogger( __name__ )

class Log(Plugin):

  def after_accept(self,worklist):
      for msg in worklist.incoming:
          logger.info( "msg/log received: %s " % msg )
      return worklist

It’s a normal python class, declared as a child of the sarracenia.flowcb.FlowCB class. The methods (function names) in the plugin describe when those routines will be called. For more details consult the Programmer’s Guide

To add special processing of notification messages, create a module in the python path, and have it include entry points.

There is also flowCallbackPrepend which adds a flowCallback class to the front of the list (which determines relative execution order among flowCallback classes.)

callback options

callbacks that are delivered with metpx-sr3 follow the following convention:

  • they are placed in the sarracenia/flowcb directory tree.

  • the name of the primary class is the same as the name of file containing it.

so we provide the following syntactic sugar:

callback log    (is equivalent to *flowCallback sarracenia.flowcb.log.Log* )

There is similarly a callback_prepend to fill in.

Importing Extensions

The import option works in a way familiar to Python developers, Making them available for use by the Sarracenia core, or flowCallback. Developers can add additional protocols for notification messages or file transfer. For example:

import torr

would be a reasonable name for a Transfer protocol to retrieve resources with bittorrent protocol. A skeleton of such a thing would look like this:

import logging

logger = logging.getLogger(__name__)

import sarracenia.transfer

class torr(sarracenia.transfer.Transfer):
    pass

logger.warning("loading")

For more details on implementing extensions, consult the Programmer’s Guide

Deprecated v2 plugins

There is and older (v2) style of plugins as well. That are usually prefixed with the name of the plugin:

msg_to_clusters DDI
msg_to_clusters DD

on_message msg_to_clusters

A setting ‘msg_to_clusters’ is needed by the msg_to_clusters plugin referenced in the on_message the v2 plugins are a little more cumbersome to write. They are included here for completeness, but people should use version 3 (either flowCallback, or extensions discussed next) when possible.

Reasons to use newer style plugins:

  • Support for running v2 plugins is accomplished using a flowcb called v2wrapper. It performs a lot of processing to wrap up the v3 data structures to look like v2 ones, and then has to propagate the changes back. It’s a bit expensive.

  • newer style extensions are ordinary python modules, unlike v2 ones which require minor magic incantations.

  • when a v3 (flowCallback or imported) module has a syntax error, all the tools of the python interpreter apply, providing a lot more feedback is given to the coder. with v2, it just says there is something wrong, much more difficult to debug.

  • v3 api is strictly more powerful than v2, as it works on groups of notification messages, rather than individual ones.

Environment Variables

On can also reference environment variables in configuration files, using the ${ENV} syntax. If Sarracenia routines needs to make use of an environment variable, then they can be set in configuration files:

declare env HTTP_PROXY=localhost

LOGS and MONITORING

  • debug

    Setting option debug is identical to use logLevel debug

  • logMessageDump (default: off) boolean flag if set, all fields of a notification message are printed, rather than just a url/path reference.

  • logEvents ( default after_accept,after_work,on_housekeeping )

    emit standard log messages at the given points in notification message processing. other values: on_start, on_stop, post, gather, … etc…

  • logLevel ( default: info )

    The level of logging as expressed by python’s logging. Possible values are : critical, error, info, warning, debug.

  • –logStdout ( default: False ) EXPERIMENTAL FOR DOCKER use case

    The logStdout disables log management. Best used on the command line, as there is some risk of creating stub files before the configurations are completely parsed:

    sr3 --logStdout start
    

    All launched processes inherit their file descriptors from the parent. so all output is like an interactive session.

    This is in contrast to the normal case, where each instance takes care of its logs, rotating and purging periodically. In some cases, one wants to have other software take care of logs, such as in docker, where it is preferable for all logging to be to standard output.

    It has not been measured, but there is a reasonable likelihood that use of logStdout with large configurations (dozens of configured instances/processes) will cause either corruption of logs, or limit the speed of execution of all processes writing to stdout.

  • log_reject <True|False> ( default: False )

    print a log message when rejecting notification messages (choosing not to download the corresponding files)

  • log <dir> ( default: ~/.cache/sarra/log ) (on Linux)

    The directory to store log files in.

  • statehost <False|True> ( default: False )

    In large data centres, the home directory can be shared among thousands of nodes. Statehost adds the node name after the cache directory to make it unique to each node. So each node has it’s own statefiles and logs. example, on a node named goofy, ~/.cache/sarra/log/ becomes ~/.cache/sarra/goofy/log.

  • logRotateCount <max_logs> ( default: 5 )

    Maximum number of logs archived.

  • logRotateInterval <duration>[<time_unit>] ( default: 1d = 86,400s )

    The duration of the interval with an optional time unit (ie 5m, 2h, 3d)

  • permLog ( default: 0600 )

    The permission bits to set on log files.

See the Subscriber Guide <../How2Guides/subscriber.rst> for a more detailed discussion of logging options and techniques.

CREDENTIALS

One normally does not specify passwords in configuration files. Rather they are placed in the credentials file:

edit ~/.config/sr3/credentials.conf

For every url specified that requires a password, one places a matching entry in credentials.conf. The broker option sets all the credential information to connect to the RabbitMQ server

  • broker amqp{s}://<user>:<pw>@<brokerhost>[:port]/<vhost>

(default: amqps://anonymous:anonymous@dd.weather.gc.ca/ )

For all sarracenia programs, the confidential parts of credentials are stored only in ~/.config/sarra/credentials.conf. This includes the url and broker passwords and settings needed by components. The format is one entry per line. Examples:

  • amqp://user1:password1@host/

  • amqps://user2:password2@host:5671/dev

  • amqps://usern:passwd@host/ login_method=PLAIN

  • sftp://user5:password5@host

  • sftp://user6:password6@host:22 ssh_keyfile=/users/local/.ssh/id_dsa

  • ftp://user7:password7@host passive,binary

  • ftp://user8:password8@host:2121 active,ascii

  • ftps://user7:De%3Aize@host passive,binary,tls

  • ftps://user8:%2fdot8@host:2121 active,ascii,tls,prot_p

  • https://ladsweb.modaps.eosdis.nasa.gov/ bearer_token=89APCBF0-FEBE-11EA-A705-B0QR41911BF4

In other configuration files or on the command line, the url simply lacks the password or key specification. The url given in the other files is looked up in credentials.conf.

Credential Details

You may need to specify additional options for specific credential entries. These details can be added after the end of the URL, with multiple details separated by commas (see examples above).

Supported details:

  • ssh_keyfile=<path> - (SFTP) Path to SSH keyfile

  • passive - (FTP) Use passive mode

  • active - (FTP) Use active mode

  • binary - (FTP) Use binary mode

  • ascii - (FTP) Use ASCII mode

  • ssl - (FTP) Use SSL/standard FTP

  • tls - (FTP) Use FTPS with TLS

  • prot_p - (FTPS) Use a secure data connection for TLS connections (otherwise, clear text is used)

  • bearer_token=<token> (or bt=<token>) - (HTTP) Bearer token for authentication

  • login_method=<PLAIN|AMQPLAIN|EXTERNAL|GSSAPI> - (AMQP) By default, the login method will be automatically determined. This can be overriden by explicity specifying a login method, which may be required if a broker supports multiple methods and an incorrect one is automatically selected.

Note::

SFTP credentials are optional, in that sarracenia will look in the .ssh directory and use the normal SSH credentials found there.

These strings are URL encoded, so if an account has a password with a special character, its URL encoded equivalent can be supplied. In the last example above, %2f means that the actual password isi: /dot8 The next to last password is: De:olonize. ( %3a being the url encoded value for a colon character. )

PERIODIC PROCESSING

Most processing occurs on receipt of a notification message, but there is some periodic maintenance work that happens every housekeeping interval (default is 5 minutes.) Evey housekeeping, all of the configured on_housekeeping plugins are run. By default there are three present:

  • log - prints “housekeeping” in the log.

  • nodupe - ages out old entries in the reception cache, to minimize its size.

  • memory - checks the process memory usage, and restart if too big.

The log will contain notification messages from all three plugins every housekeeping interval, and if additional periodic processing is needed, the user can configure addition plugins to run with the on_housekeeping option.

sanity_log_dead <interval> (default: 1.5*housekeeping)

The sanity_log_dead option sets how long to consider too long before restarting a component.

nodup_ttl <off|on|999> (default: off)

The cleanup of expired elements in the duplicate suppression store happens at each housekeeping.

ERROR RECOVERY

The tools are meant to work well unattended, and so when transient errors occur, they do their best to recover elegantly. There are timeouts on all operations, and when a failure is detected, the problem is noted for retry. Errors can happen at many times:

  • Establishing a connection to the broker.

  • losing a connection to the broker

  • establishing a connection to the file server for a file (for download or upload.)

  • losing a connection to the server.

  • during data transfer.

Initially, the programs try to download (or send) a file a fixed number (attempts, default: 3) times. If all three attempts to process the file are unsuccessful, then the file is placed in an instance’s retry file. The program then continues processing of new items. When there are no new items to process, the program looks for a file to process in the retry queue. It then checks if the file is so old that it is beyond the retry_expire (default: 2 days). If the file is not expired, then it triggers a new round of attempts at processing the file. If the attempts fail, it goes back on the retry queue.

This algorithm ensures that programs do not get stuck on a single bad product that prevents the rest of the queue from being processed, and allows for reasonable, gradual recovery of service, allowing fresh data to flow preferentially, and sending old data opportunistically when there are gaps.

While fast processing of good data is very desirable, it is important to slow down when errors start occurring. Often errors are load related, and retrying quickly will just make it worse. Sarracenia uses exponential back-off in many points to avoid overloading a server when there are errors. The back-off can accumulate to the point where retries could be separated by a minute or two. Once the server begins responding normally again, the programs will return to normal processing speed.

If a failure will last for a while, one can stop the flow, configure attempts 0 to fill the retry queue without making vain attempts to download or send. At the end of the outage, return attempts to normal, and the retry queue will gradually be drained when there is room in the current data flow.

EXAMPLES

Here is a short complete example configuration file:

broker amqps://dd.weather.gc.ca/

subtopic model_gem_global.25km.grib2.#
accept .*

This above file will connect to the dd.weather.gc.ca broker, connecting as anonymous with password anonymous (defaults) to obtain notification messages about files in the http://dd.weather.gc.ca/model_gem_global/25km/grib2 directory. All files which arrive in that directory or below it will be downloaded into the current directory (or just printed to standard output if -n option was specified.)

A variety of example configuration files are available here:

QUEUES and MULTIPLE STREAMS

When executed, subscribe chooses a queue name, which it writes to a file named after the configuration file given as an argument to subscribe with a .queue suffix ( .”configfile”.queue). If subscribe is stopped, the posted notification messages continue to accumulate on the broker in the queue. When the program is restarted, it uses the queuename stored in that file to connect to the same queue, and not lose any notification messages.

File downloads can be parallelized by running multiple subscribes using the same queue. The processes will share the queue and each download part of what has been selected. Simply launch multiple instances of subscribe in the same user/directory using the same configuration file.

You can also run several subscribe with different configuration files to have multiple download streams delivering into the the same directory, and that download stream can be multi-streamed as well.

Note

While the brokers keep the queues available for some time, Queues take resources on brokers, and are cleaned up from time to time. A queue which is not accessed for a long (implementation dependent) period will be destroyed. A queue which is not accessed and has too many (implementation defined) files queued will be destroyed. Processes which die should be restarted within a reasonable period of time to avoid loss of notifications.

report and report_exchange

For each download, by default, an amqp report message is sent back to the broker. This is done with option :

  • report <boolean> (default: True)

  • report_exchange <report_exchangename> (default: xreport|xs_*username* )

When a report is generated, it is sent to the configured report_exchange. Administrative components post directly to xreport, whereas user components post to their own exchanges (xs_*username*). The report daemons then copy the notification messages to xreport after validation.

These reports are used for delivery tuning and for data sources to generate statistical information. Set this option to False, to prevent generation of reports.

INSTANCES

Sometimes one instance of a component and configuration is not enough to process & send all available notifications.

instances <integer> (default:1)

The instance option allows launching several instances of a component and configuration. When running sender for example, a number of runtime files are created. In the ~/.cache/sarra/sender/configName directory:

A .sender_configname.state         is created, containing the number instances.
A .sender_configname_$instance.pid is created, containing the PID  of $instance process.

In directory ~/.cache/sarra/log:

A .sender_configname_$instance.log  is created as a log of $instance process.

Note

known bug in the management interface sr <sr.8.rst>_ means that instance should always be in the .conf file (not a .inc) and should always be an number (not a substituted variable or other more complex value.

Note

FIXME: indicate Windows location also… dot files on Windows?

Note

While the brokers keep the queues available for some time, Queues take resources on brokers, and are cleaned up from time to time. A queue which is not accessed and has too many (implementation defined) files queued will be destroyed. Processes which die should be restarted within a reasonable period of time to avoid loss of notifications. A queue which is not accessed for a long (implementation dependent) period will be destroyed.

vip - ACTIVE/PASSIVE OPTIONS

sr3 can be used on a single server node, or multiple nodes could share responsibility. Some other, separately configured, high availability software presents a vip (virtual ip) on the active server. Should the server go down, the vip is moved on another server. Both servers would run sr3. It is for that reason that the following options were implemented:

  • vip <string> (None)

When you run only one sr3 on one server, these options are not set, and sr3 will run in ‘standalone mode’.

In the case of clustered brokers, you would set the options for the moving vip.

vip 153.14.126.3

When sr3 does not find the vip, it sleeps for 5 seconds and retries. If it does find the vip, it consumes and processes a notification message and than rechecks for the vip. if multiple vips are specified, then posession of any of the vips is sufficient to be declared active.

[–blocksize <value>] (default: 0 (auto))

This blocksize option controls the partitioning strategy used to post files. The value should be one of:

0 - autocompute an appropriate partitioning strategy (default)
1 - always send entire files in a single part.
<blocksize> - used a fixed partition size (example size: 1M )

Files can be announced as multiple parts. Each part has a separate checksum. The parts and their checksums are stored in the cache. Partitions can traverse the network separately, and in parallel. When files change, transfers are optimized by only sending parts which have changed.

The outlet option allows the final output to be other than a post. See sr3_cpump(1) for details.

[-pbd|–post_baseDir <path>] (optional)

The post_baseDir option supplies the directory path that, when combined (or found) in the given path, gives the local absolute path to the data file to be posted. The post_baseDir part of the path will be removed from the posted notification message. For sftp urls it can be appropriate to specify a path relative to a user account. Example of that usage would be: -pbd ~user -url sftp:user@host For file: url’s, baseDir is usually not appropriate. To post an absolute path, omit the -pbd setting, and just specify the complete path as an argument.

post_baseUrl <url> (MANDATORY)

The post_baseUrl option sets how to get the file… it defines the protocol, host, port, and optionally, the user. It is best practice to not include passwords in urls.

post_exchange <name> (default: xpublic)

The post_exchange option set under which exchange the new notification will be posted. In most cases it is ‘xpublic’.

Whenever a publish happens for a product, a user can set to trigger a script. The option on_post would be used to do such a setup.

post_exchangeSplit <number> (default: 0)

The post_exchangeSplit option appends a two digit suffix resulting from hashing the path to the post_exchange name, in order to divide the output amongst a number of exchanges. This is currently used in high traffic pumps to allow multiple instances of winnow, which cannot be instanced in the normal way. Example:

post_exchangeSplit 5
post_exchange xwinnow

will result in posting notification messages to five exchanges named: xwinnow00, xwinnow01, xwinnow02, xwinnow03 and xwinnow04, where each exchange will receive only one fifth of the total flow.

Remote Configurations

One can specify URI’s as configuration files, rather than local files. Example:

  • –config http://dd.weather.gc.ca/alerts/doc/cap.conf

On startup, subscribe checks if the local file cap.conf exists in the local configuration directory. If it does, then the file will be read to find a line like so:

  • –remote_config_url http://dd.weather.gc.ca/alerts/doc/cap.conf

In which case, it will check the remote URL and compare the modification time of the remote file against the local one. The remote file is not newer, or cannot be reached, then the component will continue with the local file.

If either the remote file is newer, or there is no local file, it will be downloaded, and the remote_config_url line will be prepended to it, so that it will continue to self-update in future.

Extensions

One can override or add functionality with python scripting.

Sarracenia comes with a variety of example plugins, and uses some to implement base functionality, such as logging (implemented by default use of msg_log, file_log, post_log plugins):

$ sr3 list fcb
Provided callback classes: ( /home/peter/Sarracenia/sr3/sarracenia )
flowcb/filter/deleteflowfiles.py flowcb/filter/fdelay.py          flowcb/filter/pclean_f90.py      flowcb/filter/pclean_f92.py
flowcb/gather/file.py            flowcb/gather/message.py         flowcb/line_log.py               flowcb/line_mode.py
flowcb/log.py                    flowcb/nodupe.py                 flowcb/pclean.py                 flowcb/post/message.py
flowcb/retry.py                  flowcb/sample.py                 flowcb/script.py                 flowcb/v2wrapper.py
flowcb/work/rxpipe.py
$

Users can place their own scripts in the script sub-directory of their config directory tree ( on Linux, the ~/.config/sarra/plugins).

flowCallback and flowCallbackPrepend <class>

The flowCallback directive takes a class to load can scan for entry points as an argument:

flowCallback sarracenia.flowcb.log.Log

With this directive in a configuration file, the Log class found in installed package will be used. That module logs messages after_accept (after notification messages have passed through the accept/reject masks.) and the after_work (after the file has been downloaded or sent). Here is the source code for that callback class:

import json
import logging
from sarracenia.flowcb import FlowCB

logger = logging.getLogger(__name__)


class Log(FlowCB):
  def __init__(self, options):

      # FIXME: should a logging module have a logLevel setting?
      #        just put in a cookie cutter for now...
      if hasattr(options, 'logLevel'):
          logger.setLevel(getattr(logging, options.logLevel.upper()))
      else:
          logger.setLevel(logging.INFO)

  def after_accept(self, worklist):
      for msg in worklist.incoming:
          logger.info("accepted: %s " % msg_dumps(msg) )

  def after_work(self, worklist):
      for msg in worklist.ok:
          logger.info("worked successfully: %s " % msg.dumps() )

If you have multiple callbacks configured, they will be called in the same order they are configuration file. components in sr3 are often differentiated by the callbacks configured. For example, a watch is a flow with sarracenia.flowcb.gather.file.File class that is used to scan directories. A Common need when a data source is not easily accessed with python scripts is to use the script callback:

flowCallbackPrepend sarracenia.flowcb.script.Script

script_gather get_weird_data.sh

Using the _prepend variant of the flowCallback option, will have the Script callback class’s entry point, before the File callback… So A script will be executed, and then the directory will be checked for new files. Here is part of the Script callback class:

import logging
from sarracenia.flowcb import FlowCB
import subprocess

logger = logging.getLogger(__name__)


class Script(FlowCB):

   .
   .
   .

    def run_script( self, script ):
        try:
            subprocess.run( self.o.script_gather, check=True )
        except Exception as err:
            logging.error("subprocess.run failed err={}".format(err))
            logging.debug("Exception details:", exc_info=True)


    def gather(self ):
        if hasattr( self.o, 'script_gather') and self.o.script_gather is not None :
            self.run_script( self.o.script_gather )
        return []

Identity

One can use the import directive to add new checksum algorithms by sub-classing sarracenia.identity.Identity.

Transfer

One can add support for additional methods of downloading data by sub-classing sarracenia.transfer.Transfer.

Transfer protocol scripts should be declared using the import option. Aside the targetted built-in function(s), a module registered_as that defines a list of protocols that these functions provide. Example :

def registered_as(self) :

return [‘ftp’,’ftps’]

See the Programming Guide for more information on Extension development.

ROLES - feeder/admin/declare

of interest only to administrators

Administrative options are set using:

edit ~/.config/sr3/admin.conf

The feeder option specifies the account used by default system transfers for components such as shovel, sarra and sender (when posting).

  • feeder amqp{s}://<user>:<pw>@<post_brokerhost>[:port]/<vhost>

  • admin <name> (default: None)

The admin user is used to do maintenance operations on the pump such as defining the other users. Most users are defined using the declare option. The feeder can also be declared in that way.

  • declare <role> <name> (no defaults)

subscriber

A subscriber is user that can only subscribe to data and return report messages. Subscribers are not permitted to inject data. Each subscriber has an xs_<user> named exchange on the pump, where if a user is named Acme, the corresponding exchange will be xs_Acme. This exchange is where an subscribe process will send its report messages.

By convention/default, the anonymous user is created on all pumps to permit subscription without a specific account.

source

A user permitted to subscribe or originate data. A source does not necessarily represent one person or type of data, but rather an organization responsible for the data produced. So if an organization gathers and makes available ten kinds of data with a single contact email or phone number for questions about the data and its availability, then all of those collection activities might use a single ‘source’ account.

Each source gets a xs_<user> exchange for injection of data posts, and, similar to a subscriber to send report messages about processing and receipt of data. Source may also have an xl_<user> exchange where, as per report routing configurations, report messages of consumers will be sent.

feeder

A user permitted to write to any exchange. Sort of an administrative flow user, meant to pump notification messages when no ordinary source or subscriber is appropriate to do so. Is to be used in preference to administrator accounts to run flows.

User credentials are placed in the credentials files, and audit will update the broker to accept what is specified in that file, as long as the admin password is already correct.

CONFIGURATION FILES

While one can manage configuration files using the add, remove, list, edit, disable, and enable actions, one can also do all of the same activities manually by manipulating files in the settings directory. The configuration files for an sr_subscribe configuration called myflow would be here:

  • linux: ~/.config/sarra/subscribe/myflow.conf (as per: XDG Open Directory Specication )

  • Windows: %AppDir%/science.gc.ca/sarra/myflow.conf , this might be: C:UserspeterAppDataLocalscience.gc.casarramyflow.conf

  • MAC: FIXME.

The top of the tree has ~/.config/sarra/default.conf which contains settings that are read as defaults for any component on start up. In the same directory, ~/.config/sarra/credentials.conf contains credentials (passwords) to be used by sarracenia ( CREDENTIALS for details. )

One can also set the XDG_CONFIG_HOME environment variable to override default placement, or individual configuration files can be placed in any directory and invoked with the complete path. When components are invoked, the provided file is interpreted as a file path (with a .conf suffix assumed.) If it is not found as a file path, then the component will look in the component’s config directory ( config_dir / component ) for a matching .conf file.

If it is still not found, it will look for it in the site config dir (linux: /usr/share/default/sarra/component).

Finally, if the user has set option remote_config to True and if he has configured web sites where configurations can be found (option remote_config_url), The program will try to download the named file from each site until it finds one. If successful, the file is downloaded to config_dir/Downloads and interpreted by the program from there. There is a similar process for all plugins that can be interpreted and executed within sarracenia components. Components will first look in the plugins directory in the users config tree, then in the site directory, then in the sarracenia package itself, and finally it will look remotely.

SUNDEW COMPATIBILITY OPTIONS

For compatibility with Sundew, there are some additional delivery options which can be specified.

destfn_script <script> (default:None)

This option defines a script to be run when everything is ready for the delivery of the product. The script receives the sr_sender class instance. The script takes the parent as an argument, and for example, any modification to parent.msg.new_file will change the name of the file written locally.

filename <keyword> (default:WHATFN)

From metpx-sundew the support of this option give all sorts of possibilities for setting the remote filename. Some keywords are based on the fact that metpx-sundew filenames are five (to six) fields strings separated by for colons.

The default value on Sundew is NONESENDER, but in the interest of discouraging use of colon separation in files, the default in Sarracenia is WHATFN

The possible keywords are :

None
  • no Sundew-style filename processing. Leave the filename alone. (Not the same as NONE, described below.)

WHATFN
  • the first part of the Sundew filename (string before first :)

HEADFN
  • HEADER part of the sundew filename

SENDER
  • the Sundew filename may end with a string SENDER=<string> in this case the <string> will be the remote filename

NONE
  • deliver with the complete Sundew filename (without :SENDER=…)

NONESENDER
  • deliver with the complete Sundew filename (with :SENDER=…)

TIME
  • time stamp appended to filename. Example of use: WHATFN:TIME

DESTFN=str
  • direct filename declaration str

SATNET=1,2,3,A
  • cmc internal satnet application parameters

DESTFNSCRIPT=script.py
  • invoke a script (same as destfn_script) to generate the name of the file to write

accept <regexp pattern> [<keyword>]

keyword can be added to the accept option. The keyword is any one of the filename options. A notification message that matched against the accept regexp pattern, will have its remote_file plied this keyword option. This keyword has priority over the preceeding filename one.

The regexp pattern can be use to set directory parts if part of the notification message is put to parenthesis. sr_sender can use these parts to build the directory name. The rst enclosed parenthesis strings will replace keyword ${0} in the directory name… the second ${1} etc.

Example of use:

filename NONE

directory /this/first/target/directory

accept .*file.*type1.*

directory /this/target/directory

accept .*file.*type2.*

accept .*file.*type3.*  DESTFN=file_of_type3

directory /this/${0}/pattern/${1}/directory

accept .*(2016....).*(RAW.*GRIB).*

A selected notification message by the first accept would be delivered unchanged to the first directory.

A selected notification message by the second accept would be delivered unchanged to the second directory.

A selected notification message by the third accept would be renamed “file_of_type3” in the second directory.

A selected notification message by the forth accept would be delivered unchanged to a directory.

It’s named /this/20160123/pattern/RAW_MERGER_GRIB/directory if the notification message would have a notice like:

20150813161959.854 http://this.pump.com/ relative/path/to/20160123_product_RAW_MERGER_GRIB_from_CMC

Field Replacements

In MetPX Sundew, there is a much more strict file naming standard, specialised for use with World Meteorological Organization (WMO) data. Note that the file naming convention predates, and bears no relation to the WMO file naming convention currently approved, but is strictly an internal format. The files are separated into six fields by colon characters. The first field, DESTFN, gives the WMO (386 style) Abbreviated Header Line (AHL) with underscores replacing blanks:

TTAAii CCCC YYGGGg BBB ...

(see WMO manuals for details) followed by numbers to render the product unique (as in practice, though not in theory, there are a large number of products which have the same identifiers). The meanings of the fifth field is a priority, and the last field is a date/time stamp. The other fields vary in meaning depending on context. A sample file name:

SACN43_CWAO_012000_AAA_41613:ncp1:CWAO:SA:3.A.I.E:3:20050201200339

If a file is sent to sarracenia and it is named according to the Sundew conventions, then the following substitution fields are available:

${T1}    replace by bulletin's T1
${T2}    replace by bulletin's T2
${A1}    replace by bulletin's A1
${A2}    replace by bulletin's A2
${ii}    replace by bulletin's ii
${CCCC}  replace by bulletin's CCCC
${YY}    replace by bulletin's YY   (obs. day)
${GG}    replace by bulletin's GG   (obs. hour)
${Gg}    replace by bulletin's Gg   (obs. minute)
${BBB}   replace by bulletin's bbb
${RYYYY} replace by reception year
${RMM}   replace by reception month
${RDD}   replace by reception day
${RHH}   replace by reception hour
${RMN}   replace by reception minutes
${RSS}   replace by reception second

The ‘R’ fields come from the sixth field, and the others come from the first one. When data is injected into sarracenia from Sundew, the sundew_extension notification message header will provide the source for these substitions even if the fields have been removed from the delivered file names.