Message v01 Format

Status: Approved-Draft1-20150805

Description of the message protocol / format.

This file documents final conclusions/proposals; reasoning/debates go elsewhere.

Posted messages include a ´topic´ and a ´body´.

The message topic breaks down as follows:

<version>.<type>.[varies by version].<dir>.<dir>.<dir>...

<version>:
        exp -- initial version, deprecated (not covered in this document)
        v00 -- used for NURP & PAN-AM in 2013-2014. (not covered in this document)
        v01 -- 2015 version.

<type>:
        adm  - change settings
                ´admin´, ´config´, etc...

        log  - report status of operations.

        notify - ´post´ but in exp and v00 versions. (not covered here.)

        post - announce or notify that a new product block is available.
                possible strings: post,ann(ounce), not(ify)
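
As an illustration of the breakdown above, here is a minimal sketch that splits
a topic into its version, type, and the version-dependent remainder.  The
function name split_topic and the KNOWN_VERSIONS set are assumptions made for
this example, not part of the protocol.  Only the first two dots are split on,
because file names at the end of a topic can themselves contain dots.

```python
# Versions listed in this document; the set name is hypothetical.
KNOWN_VERSIONS = {"exp", "v00", "v01"}


def split_topic(topic):
    """Split a topic into (version, type, rest).

    'rest' is everything after the type, left unparsed because its
    layout varies by version.
    """
    parts = topic.split(".", 2)
    if len(parts) < 2:
        raise ValueError("topic needs at least <version>.<type>: %r" % topic)
    version, msg_type = parts[0], parts[1]
    if version not in KNOWN_VERSIONS:
        raise ValueError("unknown version: %r" % version)
    rest = parts[2] if len(parts) > 2 else ""
    return version, msg_type, rest


print(split_topic("v01.post.ec_cmc.NRDPS.GIF.NRDPS_HiRes_000.gif"))
```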

<source>:

Rest of this document assumes version 1 (v01 topic):

topic: <version>.<type>.<src>(.<dir>.)*.<filename>
content:
  1st line: <date stamp> <blocksize in bytes> <filesize in blocks> <block#> <remainder> <flags> <md5sum> <flowid> <srcpath> <relpath>

breaks down to:

<date stamp>: date
      YYYYMMDDHHMMSS.<decimal>

<blocksize in bytes>: bsz
      the number of bytes in a block.
      checksums are calculated per block, so one post covers one block.

<filesize in blocks>: fzb
      the integer total number of blocks in the file
      FIXME: (including the last block or not?)
      if set to 1, the file is sent as a single block (see <remainder>).

<block#>: bno
      0 origin, the block number covered by this posting.

<remainder>: brem
      normally 0; on the last block, the number of bytes remaining in the
      file to transfer.

      -- if (fzb=1 and brem=0)
             then bsz=fsz (the file size in bytes).
             -- the entire file is replaced.
             -- this is the same as rsync's --whole-file mode.
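
To make the block fields concrete, the following sketch computes the byte
range covered by one posting.  It rests on assumptions not settled in this
document: that brem is a byte count, that it applies only to the last block
(bno == fzb-1), and that brem=0 on the last block means the block is full.
The function name block_range is hypothetical.

```python
def block_range(bsz, fzb, bno, brem):
    """Return (offset, length) in bytes for block number bno.

    Assumes brem is the byte length of a short last block, and that
    brem == 0 means the last block is a full bsz bytes.
    """
    if not 0 <= bno < fzb:
        raise ValueError("block number %d outside 0..%d" % (bno, fzb - 1))
    offset = bno * bsz
    is_last = bno == fzb - 1
    length = brem if (is_last and brem != 0) else bsz
    return offset, length


# whole-file case: fzb=1 and brem=0, so the single block is the file.
print(block_range(457, 1, 0, 0))
```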

<flags>: a comma-separated list of option letters, some with arguments after ´=´.

      The checksum setting is contained in the ´flags´ field, but it is not
      the only thing there.  Other letters/digits may designate other things.
      ´=´ separates a flag from its argument.

      Checksum settings, as they appear in the ´flags´ entry:

      0 - no checksums (unconditional copy.)
      d - checksum the entire data
      n - checksum the file name
      c=<script> - checksum with a script, named <script>

              <script> should be ´registered´ in the switch network.
                      registered means that all downstream subscribers
                      can obtain the script to validate the checksum.
                      there needs to be a retrieval mechanism.

      other possible flag values:

              u - unlinked... for files that have been removed? 'r'?

      File Segment strategy:
              i - inplace (do not create temporary files, just lseek
                      within file.)
                  may result in .ddsig file being created?
              p - part files.  use .part files,  suffix fixed.
                  do not know which will be default.
         - the file segment strategy is just a suggestion; it can be overridden by the client.
         - analogous to rsync options: --inplace, --partial
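
A minimal sketch of reading the flags field, following the rules above
(comma-separated entries, ´=´ between a flag and its argument).  The function
name parse_flags and the script name mysum.py are made up for this example.

```python
def parse_flags(field):
    """Parse a flags field into a dict of flag -> argument (or None)."""
    flags = {}
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        name, sep, arg = item.partition("=")
        # flags without '=' carry no argument
        flags[name] = arg if sep else None
    return flags


print(parse_flags("d"))
print(parse_flags("c=mysum.py,p"))
```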

<flowid>
      an arbitrary tag used for tracking of data through the network.

The two paths (<srcpath> and <relpath>) are subtly inter-related.  Neither can
be interpreted on its own; one must consider both path components.
------

      what if there are spaces in the file name?
      it is url-encoded, so a space should turn into: %20

------
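
The url-encoding of file names can be sketched with the Python standard
library.  The helper names encode_path and decode_path are hypothetical; the
path separator is kept unencoded so that directory structure survives.

```python
from urllib.parse import quote, unquote


def encode_path(path):
    """Percent-encode a path, leaving '/' separators untouched."""
    return quote(path, safe="/")


def decode_path(path):
    """Reverse encode_path."""
    return unquote(path)


print(encode_path("NRDPS/GIF/my file.gif"))
```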


<srcpath> -- the base URL used to retrieve the data.

      options:

      Complete URL:

      sftp://afsiext@cmcdataserver/data/NRPDS/outputs/NRPDS_HiRes_000.gif

      in the case where the URL does not end with a path separator ('/'),
      the src path is taken to be the complete source of the file to retrieve.


      Static URL:

      sftp://afsiext@cmcdataserver/

      If the URL ends with a path separator ('/'), then the src URL is
      considered a prefix for the variable part of the retrieval URL.


<relpath> -- The relative path from the current directory in which to
      place the file.

      Two cases based on the end being a path separator or not.

      case 1: NURP/GIF/

      based on the current working directory of the downloading client,
      create a subdirectory called NURP, and within that, a subdirectory
      called GIF.  The file name will be taken from the srcpath.

      if the srcpath ends in pathsep, then the relpath here will be
      concatenated to the srcpath, forming the complete retrieval URL.

      case 2: NRP/GIF/mine.gif

      if the srcpath ends in pathsep, then the relpath will be concatenated
      to the srcpath to form the complete retrieval URL.

      if the src path does not end in pathsep, then the src URL is taken
      as complete, and the file is renamed on download according to the
      specification (in this case, mine.gif)
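
The srcpath/relpath rules above can be sketched as one function that returns
the retrieval URL and the local path relative to the client's current
directory.  The name resolve_paths is hypothetical.  The case where both paths
end in '/' is not covered by this document; rejecting it here is an assumption
of the sketch.

```python
def resolve_paths(srcpath, relpath):
    """Return (retrieval_url, local_relative_path)."""
    if srcpath.endswith("/"):
        # static URL: relpath supplies the variable part of the URL.
        if relpath.endswith("/"):
            # assumption: no file name available anywhere, so refuse.
            raise ValueError("neither srcpath nor relpath names a file")
        return srcpath + relpath, relpath
    # complete URL: srcpath alone identifies the file to retrieve.
    if relpath.endswith("/"):
        # case 1: directory only, file name comes from the srcpath.
        filename = srcpath.rsplit("/", 1)[-1]
        return srcpath, relpath + filename
    # case 2: file is renamed on download according to relpath.
    return srcpath, relpath


print(resolve_paths("sftp://afsiext@cmcdataserver/", "NURP/GIF/mine.gif"))
```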
FIXME: verify the following:

fsz = Size of a file in bytes = ( bsz * (fzb-1) ) + brem ?
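
One possible reading of the formula, as a sketch only.  Taken literally, the
formula gives 0 for fzb=1 and brem=0, which conflicts with the whole-file case
(bsz=fsz).  The sketch therefore assumes that brem=0 means the last block is a
full block; this is an assumption, and the FIXME above still stands.  The
function name file_size is hypothetical.

```python
def file_size(bsz, fzb, brem):
    """Return the file size in bytes from the block fields.

    Assumes brem == 0 means the last block is full (bsz bytes),
    otherwise the last block holds brem bytes.
    """
    if brem == 0:
        return bsz * fzb
    return bsz * (fzb - 1) + brem


print(file_size(457, 1, 0))
```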

example 1:

v01.post.ec_cmc.NRDPS.GIF.NRDPS_HiRes_000.gif
201506011357.345 457 1 0 0 d <md5sum> exp13 sftp://afsiext@cmcdataserver/data/NRPDS/outputs/NRDPS_HiRes_000.gif NRDPS/GIF/

      v01 - version of protocol
      post - indicates the type of message

      version and type together determine format of following topics and the message body.

      ec_cmc - the account used to issue the post (unique in a network).

        -- blocksize is 457  (== file size)
        -- block count is 1
        -- block number is 0.
        -- remainder is 0.
        -- d - checksum was calculated on the data.
        -- flow id is exp13 (the field before the source URL).
        -- complete source URL specified (does not end in '/')
        -- relative path specifies only a directory (ends in '/')

      pull from:
              sftp://afsiext@cmcdataserver/data/NRPDS/outputs/NRDPS_HiRes_000.gif

      complete relative download path:
              NRDPS/GIF/NRDPS_HiRes_000.gif

              -- takes file name from srcpath.
              -- may be modified by validation process.
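
A minimal sketch of splitting the first line of a v01 post body into its ten
fields, using the short names from this document (bsz, fzb, bno, brem).  The
Post class and parse_post_body are hypothetical names.  The date stamp is kept
as a string, and <md5sum> is the literal placeholder from the example above,
not a real checksum.

```python
from dataclasses import dataclass


@dataclass
class Post:
    date: str
    bsz: int      # blocksize in bytes
    fzb: int      # filesize in blocks
    bno: int      # block number, 0 origin
    brem: int     # remainder
    flags: str
    md5sum: str
    flowid: str
    srcpath: str
    relpath: str


def parse_post_body(line):
    """Parse the first body line of a v01 post message."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 fields, got %d" % len(fields))
    date, bsz, fzb, bno, brem, flags, md5sum, flowid, srcpath, relpath = fields
    return Post(date, int(bsz), int(fzb), int(bno), int(brem),
                flags, md5sum, flowid, srcpath, relpath)


post = parse_post_body(
    "201506011357.345 457 1 0 0 d <md5sum> exp13 "
    "sftp://afsiext@cmcdataserver/data/NRPDS/outputs/NRDPS_HiRes_000.gif "
    "NRDPS/GIF/"
)
print(post.bsz, post.flowid, post.relpath)
```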

example 2:

v01.post.ec_cmc.NRDPS.GIF.NRDPS_HiRes_000.gif
201506011357.345 457 1 0 0 d <md5sum> exp13 http://afsiext@cmcdataserver/data/ NRDPS/GIF/NRDPS_HiRes_000.gif

in this case:
      pull from:
              http://afsiext@cmcdataserver/data/NRDPS/GIF/NRDPS_HiRes_000.gif

              -- srcpath ends in '/', so concatenated, takes file from relative URL.
              -- true 'mirror'


      complete relative download path:
              NRDPS/GIF/NRDPS_HiRes_000.gif

              -- may be modified by validation process.

Log messages

A log message is only emitted after processing is completed, to indicate a final status.

The topic matches that of the notification message, except…

v01.log.<source>.<consumer>……

version is the protocol version; it should increment in sync with notify.

The body starts as per post… with fields added after:

<date> blksz blckcnt remainder blocknum flags <flow> baseurl relativeurl <status> <host> <client> <duration>

CFG messages

just a place holder.

really not baked yet. thinking is in configuration.txt

v01.cfg