Downloading Using the Command Line

This Jupyter notebook introduces Sarracenia version 3 usage from the command line. The examples target Linux; Windows and Mac should be similar, the main difference being the conventions for where preferences and logs are stored. This is probably the easiest way to work with Sarracenia: you configure a flow to download files into a directory, then read that directory to process the files.

[1]:
import sarracenia
!mkdir -p ~/.config/sr3/subscribe
!mkdir -p ~/.cache/sr3/log

Prerequisites

The cell above is just a way to get Jupyter notebooks working with metpx-sr3 on a server: it imports sarracenia and creates the configuration and log directories, in case people use the API without first running anything through the command line. The basic prerequisite is to have metpx-sr3 installed, either as a .deb package or with pip (or pip3), and available to the environment used by Jupyter.

The rest of this notebook assumes metpx-sr3 is installed.
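As a quick sanity check of that assumption, the following sketch looks for the sarracenia package and the sr3 command without importing or running anything. The helper name sr3_available is made up for this illustration, and the pip hint assumes a pip-based install rather than the .deb package.

```python
import importlib.util
import shutil


def sr3_available():
    """Report whether the sarracenia package and the sr3 command can be found."""
    return {
        # find_spec returns None when the package is not importable
        "package": importlib.util.find_spec("sarracenia") is not None,
        # which returns None when no sr3 executable is on PATH
        "command": shutil.which("sr3") is not None,
    }


status = sr3_available()
print(status)
if not all(status.values()):
    # Assumption: a pip-based install; a .deb package works as well.
    print("metpx-sr3 looks incomplete; try: pip install metpx-sr3")
```

If both values are True, the remaining cells should be able to find sr3.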

SR3

The command line interface is called sr3 (short for Sarracenia version 3). One defines flows to run using configuration files in a simple keyword-value format, one setting per line. There are example configurations to get you started:

[2]:
!sr3 list examples
Sample Configurations: (from: /net/local/home/shakerm/sr3/sarracenia/examples )
cpump/cno_trouble_f00.inc        flow/amserver.conf
flow/opg.conf                    flow/poll.inc
flow/post.inc                    flow/report.inc
flow/sarra.inc                   flow/sender.inc
flow/shovel.inc                  flow/subscribe.inc
flow/watch.inc                   flow/winnow.inc
poll/airnow.conf                 poll/aws-nexrad.conf
poll/copernicus_odata.conf       poll/mail.conf
poll/nasa-mls-nrt.conf           poll/nasa_cmr_opendap.conf
poll/nasa_cmr_other.conf         poll/nasa_cmr_podaac.conf
poll/noaa.conf                   poll/soapshc.conf
poll/usgs.conf                   post/WMO_mesh_post.conf
sarra/wmo_mesh.conf              sender/am_send.conf
sender/ec2collab.conf            sender/pitcher_push.conf
shovel/no_trouble_f00.inc        subscribe/aws-nexrad.conf
subscribe/dd_2mqtt.conf          subscribe/dd_all.conf
subscribe/dd_amis.conf           subscribe/dd_aqhi.conf
subscribe/dd_cacn_bulletins.conf subscribe/dd_citypage.conf
subscribe/dd_cmml.conf           subscribe/dd_gdps.conf
subscribe/dd_radar.conf          subscribe/dd_rdps.conf
subscribe/dd_swob.conf           subscribe/ddc_cap-xml.conf
subscribe/ddc_normal.conf        subscribe/download_all_nasa_earthdata.conf
subscribe/downloademail.conf     subscribe/ec_ninjo-a.conf
subscribe/get_copernicus.conf    subscribe/hpfxWIS2DownloadAll.conf
subscribe/hpfx_amis.conf         subscribe/hpfx_citypage.conf
subscribe/local_sub.conf         subscribe/ping.conf
subscribe/pitcher_pull.conf      subscribe/sci2ec.conf
subscribe/subnoaa.conf           subscribe/subsoapshc.conf
subscribe/subusgs.conf           watch/master.conf
watch/pitcher_client.conf        watch/pitcher_server.conf
watch/sci2ec.conf

There are different kinds of flows: the examples are classified by flow type (poll, post, sarra, sender, shovel, etc.). A subscribe flow is used by clients to download from a data pump. Let’s pick one of those.

[3]:
!sr3 add subscribe/hpfx_amis.conf
add: 2024-03-06 23:48:56,706 2118966 [INFO] sarracenia.sr add copying: /net/local/home/shakerm/sr3/sarracenia/examples/subscribe/hpfx_amis.conf to /net/local/home/shakerm/.config/sr3/subscribe/hpfx_amis.conf

The configurations that are active for you are placed in ~/.config/sr3/<flow_type>/<config_name>. You can browse there and modify them with an editor if you like, or use sr3 edit subscribe/hpfx_amis.conf. The configuration just added looks like this:

# this is a feed of wmo bulletin (a set called AMIS in the old times)

broker amqps://hpfx.collab.science.gc.ca/
exchange xpublic

# instances: number of downloading processes to run at once.  Defaults to 1. Not enough for this case
instances 5

# expire, in operational use, should be longer than longest expected interruption
expire 10m

topicPrefix v02.post
subtopic *.WXO-DD.bulletins.alphanumeric.#
mirror false
directory /tmp/hpfx_amis/
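To make the keyword-value layout concrete, here is a minimal sketch that reads such a file into a dictionary. It is not Sarracenia's own parser (which handles far more, such as includes and variable substitution); parse_config is a hypothetical helper that only illustrates the line format shown above.

```python
def parse_config(text):
    """Parse 'keyword value' lines into a dict; repeated keywords keep every value."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split(None, 1)  # keyword, then the rest of the line
        keyword = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(keyword, []).append(value)
    return settings


example = """\
broker amqps://hpfx.collab.science.gc.ca/
exchange xpublic

# instances: number of downloading processes to run at once.
instances 5
expire 10m

topicPrefix v02.post
subtopic *.WXO-DD.bulletins.alphanumeric.#
mirror false
directory /tmp/hpfx_amis/
"""

settings = parse_config(example)
print(settings["instances"], settings["directory"])
```

Values are kept as lists because some keywords, such as subtopic, may appear more than once in a configuration.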

Add a messageCountMax setting so the flow stops after ten messages instead of running forever:

[4]:
!mkdir /tmp/hpfx_amis
!echo messageCountMax 10 >>~/.config/sr3/subscribe/hpfx_amis.conf

The root directory where files are to be placed needs to exist before you start. The commands above are for a Linux machine; you might need something else on Mac or Windows.
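For platforms without mkdir and shell redirection, the same two steps can be sketched in Python. The function prepare_flow is a hypothetical helper, and the demonstration runs against a temporary directory so that no real configuration is touched.

```python
import tempfile
from pathlib import Path


def prepare_flow(config_path, download_dir, max_messages=10):
    """Create download_dir and append a messageCountMax line to config_path."""
    Path(download_dir).mkdir(parents=True, exist_ok=True)
    config_path = Path(config_path)
    existing = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with config_path.open("a", encoding="utf-8") as f:
        f.write(f"{prefix}messageCountMax {max_messages}\n")


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as scratch:
    demo_config = Path(scratch) / "hpfx_amis.conf"
    demo_config.write_text("directory /tmp/hpfx_amis/", encoding="utf-8")  # no trailing newline
    prepare_flow(demo_config, Path(scratch) / "hpfx_amis")
    demo_result = demo_config.read_text(encoding="utf-8")
print(demo_result)
```

On Linux the equivalent of the cell above would be prepare_flow(Path.home() / ".config/sr3/subscribe/hpfx_amis.conf", "/tmp/hpfx_amis"); on other platforms the configuration directory is stored elsewhere, so adjust the first argument.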

You can then run a flow interactively with the foreground action, and it will end quickly, like so:

[5]:
!sr3 foreground subscribe/hpfx_amis.conf
2024-03-06 23:49:06,570 2118978 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
.2024-03-06 23:49:06,841 [INFO] 2118981 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-03-06 23:49:06,846 [INFO] 2118981 sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
2024-03-06 23:49:06,846 [INFO] 2118981 sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'log']
2024-03-06 23:49:06,855 [INFO] 2118981 sarracenia.flowcb.log __init__ subscribe initialized with: logEvents: {'after_post', 'on_housekeeping', 'after_work', 'after_accept'},  logMessageDump: False
2024-03-06 23:49:06,855 [INFO] 2118981 sarracenia.flow run callbacks loaded: ['sarracenia.flowcb.gather.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'log']
2024-03-06 23:49:06,855 [INFO] 2118981 sarracenia.flow run pid: 2118981 subscribe/hpfx_amis instance: 0
2024-03-06 23:49:06,906 [INFO] 2118981 sarracenia.moth.amqp _queueDeclare queue declared q_anonymous_subscribe.hpfx_amis.16789186.78043112 (as: amqps://anonymous@hpfx.collab.science.gc.ca/), (messages waiting: 0)
2024-03-06 23:49:06,906 [INFO] 2118981 sarracenia.moth.amqp getSetup binding q_anonymous_subscribe.hpfx_amis.16789186.78043112 with v02.post.*.WXO-DD.bulletins.alphanumeric.# to xpublic (as: amqps://anonymous@hpfx.collab.science.gc.ca/)
2024-03-06 23:49:06,918 [INFO] 2118981 sarracenia.flow run now active on vip ['AnyAddressIsFine']
2024-03-06 23:49:15,332 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 4.84 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/WV/MMMX/23/WVMX31_MMMX_062348___03321
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 7.61 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRCN40_KWAL_062348___32108
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SX/KWAL/23/SXCN40_KWAL_062348___23784
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRCN40_KWAL_062348___45099
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRCN40_KWAL_062348___35424
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRCN40_KWAL_062348___47804
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRME20_KWAL_062348___48533
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 2.19 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRME20_KWAL_062348___768
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 1.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SR/KWAL/23/SRCN40_KWAL_062348___19124
2024-03-06 23:49:15,333 [INFO] 2118981 sarracenia.flowcb.log after_accept accepted: (lag: 1.43 ) https://hpfx.collab.science.gc.ca /20240306/WXO-DD/bulletins/alphanumeric/20240306/SO/KWAL/23/SOLC10_KWAL_062348___60057
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/WVMX31_MMMX_062348___03321
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRCN40_KWAL_062348___32108
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SXCN40_KWAL_062348___23784
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRCN40_KWAL_062348___45099
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRCN40_KWAL_062348___35424
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRCN40_KWAL_062348___47804
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRME20_KWAL_062348___48533
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRME20_KWAL_062348___768
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SRCN40_KWAL_062348___19124
2024-03-06 23:49:15,455 [INFO] 2118981 sarracenia.flowcb.log after_work downloaded ok: /tmp/hpfx_amis/SOLC10_KWAL_062348___60057
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flow please_stop ok, telling 4 callbacks about it.
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flow run starting last pass (without gather) through loop for cleanup.
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flow please_stop ok, telling 4 callbacks about it.
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flow run on_housekeeping pid: 2118981 subscribe/hpfx_amis instance: 0
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flowcb.gather.message on_housekeeping messages: good: 10 bad: 0 bytes: 1.4 KiB average: 138 Bytes
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.flowcb.retry on_housekeeping on_housekeeping
2024-03-06 23:49:15,456 [INFO] 2118981 sarracenia.diskqueue on_housekeeping work_retry_00 on_housekeeping
2024-03-06 23:49:15,458 [INFO] 2118981 sarracenia.diskqueue on_housekeeping No retry in list
2024-03-06 23:49:15,458 [INFO] 2118981 sarracenia.diskqueue on_housekeeping on_housekeeping elapse 0.002104
2024-03-06 23:49:15,458 [INFO] 2118981 sarracenia.diskqueue on_housekeeping post_retry_000 on_housekeeping
2024-03-06 23:49:15,460 [INFO] 2118981 sarracenia.diskqueue on_housekeeping No retry in list
2024-03-06 23:49:15,460 [INFO] 2118981 sarracenia.diskqueue on_housekeeping on_housekeeping elapse 0.001996
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.housekeeping.resources on_housekeeping Current Memory cpu_times: user=0.24 system=0.03
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.housekeeping.resources on_housekeeping Current mem usage: 79.3 MiB, accumulating count (10 or 10/100 so far) before self-setting threshold
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.log stats version: 3.00.52rc2, started: 8 seconds ago, last_housekeeping:  8.6 seconds ago
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.log stats messages received: 10, accepted: 10, rejected: 0   rate accepted: 100.0% or 1.2 m/s
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.log stats files transferred: 10 bytes: 2.0 KiB rate: 243 Bytes/sec
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.log stats lag: average: 2.97, maximum: 7.61
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flowcb.log on_housekeeping housekeeping
2024-03-06 23:49:15,461 [INFO] 2118981 sarracenia.flow run clean stop from run loop
2024-03-06 23:49:15,462 [INFO] 2118981 sarracenia.flowcb.gather.message on_stop closing
2024-03-06 23:49:15,462 [INFO] 2118981 sarracenia.flow close flow/close completed cleanly pid: 2118981 subscribe/hpfx_amis instance: 0

As you can see, it downloaded ten files to /tmp/hpfx_amis. The foreground action is intended to help with debugging, rather than real operations.
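Once files have arrived, processing them is a matter of reading the directory named by the directory setting. The sketch below only lists names and sizes; list_downloads is a hypothetical helper, and a real application would replace the print with its own processing.

```python
from pathlib import Path


def list_downloads(directory):
    """Return (name, size_in_bytes) for each regular file in directory, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing downloaded yet, or the directory was never created
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


for name, size in list_downloads("/tmp/hpfx_amis"):
    print(f"{name}: {size} bytes")
```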

[6]:
!sr3 status
2024-03-06 23:49:30,243 2118998 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
status:
Component/Config                         Processes   Connection        Lag                              Rates
                                         State   Run Retry  msg data   Queued  LagMax LagAvg  Last  %rej     pubsub messages   RxData     TxData
                                         -----   --- -----  --- ----   ------  ------ ------  ----  ----     ------ --------   ------     ------
subscribe/hpfx_amis                      stop    0/0          -          -         -     -     -          -        -
      Total Running Configs:   0 ( Processes: 0 missing: 0 stray: 0 )
                     Memory: uss:0 Bytes rss:0 Bytes vms:0 Bytes
                   CPU Time: User:0.00s System:0.00s
           Pub/Sub Received: 0 msgs/s (0 Bytes/s), Sent:  0 msgs/s (0 Bytes/s) Queued: 0 Retry: 0, Mean lag: 0.00s
              Data Received: 0 Files/s (0 Bytes/s), Sent: 0 Files/s (0 Bytes/s)

Above, you can see there is one configuration in your list. You can have hundreds. The Processes columns show the state of each configuration and how many instances it is running. In the example above, instances is set to 5, so one would expect to see 5 running instances once the configuration is started. You can start specific configurations (in this case a subscribe config) with sr3 start subscribe/<config>, or start all active configs from all components (sarra, subscribe, watch, winnow, etc.) with sr3 start.
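If you would rather drive these actions from Python than from shell escapes, a thin wrapper around the command line is enough. This is a sketch, not part of Sarracenia: sr3_command and run_sr3 are hypothetical helpers, and the set of actions is limited to the ones that appear in this notebook.

```python
import shutil
import subprocess

# Actions that appear in this notebook; sr3 has others that are not listed here.
KNOWN_ACTIONS = {"add", "edit", "foreground", "list", "log",
                 "sanity", "start", "status", "stop"}


def sr3_command(action, config=None):
    """Build the argument list for an sr3 call, e.g. ['sr3', 'start', 'subscribe/hpfx_amis']."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"action not used in this notebook: {action}")
    return ["sr3", action] + ([config] if config else [])


def run_sr3(action, config=None):
    """Run sr3 if it is on PATH and return its exit code, or None when sr3 is missing."""
    if shutil.which("sr3") is None:
        return None
    return subprocess.run(sr3_command(action, config)).returncode


# Printing only: nothing is started or stopped by this demonstration.
print(" ".join(sr3_command("start", "subscribe/hpfx_amis")))
print(" ".join(sr3_command("start")))
```

Calling run_sr3("start", "subscribe/hpfx_amis") would then launch the configured instances in the background, and run_sr3("status") would print the table shown above.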

[7]:
!sr3 log subscribe/hpfx_amis.conf
2024-03-06 23:45:56,401 2118802 [INFO] sarracenia.config finalize overriding batch for consistency with messageCountMax: {self.batch}
tail: cannot open '/net/local/home/shakerm/.cache/sr3/log/subscribe_hpfx_amis_01.log' for reading: No such file or directory
tail: no files remaining
2024-03-06 23:45:56,406 2118802 [CRITICAL] root run_command subprocess.run failed err=Command '['tail', '-f', '/net/local/home/shakerm/.cache/sr3/log/subscribe_hpfx_amis_01.log']' returned non-zero exit status 1.

When running in the background, output needs to go to a log file. Since we have only run this configuration in the foreground, asking to see the log prints an error about the log being missing. The error also tells you that the logs are in the ~/.cache/sr3/log directory. Logs can be monitored in real time with traditional tools such as tail -f or grep.
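Where tail and grep are not available, the same search can be sketched in Python. The file name pattern subscribe_<config>_<instance>.log is inferred from the error message above, find_in_logs is a hypothetical helper, and the log directory is the Linux location.

```python
from pathlib import Path


def find_in_logs(log_dir, config_name, needle):
    """Return lines containing needle from every instance log of a subscribe config."""
    matches = []
    # One log per instance, e.g. subscribe_hpfx_amis_01.log (pattern inferred from the error above).
    for log_file in sorted(Path(log_dir).glob(f"subscribe_{config_name}_*.log")):
        with log_file.open(encoding="utf-8", errors="replace") as f:
            matches.extend(line.rstrip("\n") for line in f if needle in line)
    return matches


log_dir = Path.home() / ".cache" / "sr3" / "log"
for line in find_in_logs(log_dir, "hpfx_amis", "downloaded ok"):
    print(line)
```

Until the configuration has run in the background at least once, there are no log files and the loop prints nothing.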

sr3 stop does what you expect.

Processes can crash. In the sr3 status output above, the Run column shows running and expected processes as a pair (0/0 here, because the configuration is stopped). If the number running is less than the number expected, some instances have crashed. You can repair that (just start the missing instances) with:

sr3 sanity – start missing instances, also kill strays if any found.
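The check that precedes a repair can be sketched as a small parser for one configuration line of the status table. It assumes the layout shown in this notebook, where the third field is a running/expected pair such as 0/0; other versions of sr3 may lay out the table differently, and missing_instances is a hypothetical helper.

```python
def missing_instances(status_line):
    """Return expected minus running for one 'sr3 status' config line, or None if unparsable."""
    fields = status_line.split()
    # Assumed layout: <component/config> <state> <running>/<expected> ...
    if len(fields) < 3 or "/" not in fields[2]:
        return None
    running, expected = fields[2].split("/", 1)
    if not (running.isdigit() and expected.isdigit()):
        return None
    return int(expected) - int(running)


line = "subscribe/hpfx_amis                      stop    0/0          -          -"
print(missing_instances(line))
```

A result above zero would be the cue to run sr3 sanity.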

So that’s it, an introduction to running configurations in Sarracenia from the command line.

Conclusion

If all you want to do is obtain data from a data pump in real time, the easiest way is to use the command line interface to control processes that run all the time and drop files into a directory.

It isn’t very efficient though. When dealing with a large number of files and aiming for high-speed processing, it’s more efficient to have your own application receive notifications about file arrivals rather than scanning a directory. This approach reduces CPU and I/O overhead while improving processing speed.

The easiest way to do that is to add some callbacks to your flows. We’ll cover that next.