          INTERNET DRAFT to be        NEWS                      sec. -





                      News Article Format and Transmission

                                 Henry Spencer



          Status of this Memo

          This  document  is  intended  to  become  an Internet Draft.
          Internet Drafts are working documents of the Internet  Engi-
          neering  Task  Force  (IETF),  its  Areas,  and  its Working
          Groups.  Note that other groups may also distribute  working
          documents as Internet Drafts.

          Internet  Drafts  are draft documents valid for a maximum of
          six months.  Internet Drafts may be  updated,  replaced,  or
          obsoleted  by other documents at any time.  It is not appro-
          priate to use Internet Drafts as reference  material  or  to
          cite  them  other  than  as  a  "working  draft" or "work in
          progress".

          Please check the I-D  abstract  listing  contained  in  each
          Internet Draft directory to learn the current status of this
          or any other Internet Draft.  (Actually, this  draft  is  at
          too early a stage to even be listed there yet.)

          It is hoped that a later version of this Draft will obsolete
          RFC 1036 and will become an Internet standard.

          References to the "successor to this  Draft"  refer  not  to
          later  versions  of this draft, but to a hypothetical future
          rewrite of this Draft (in the same way that this Draft is  a
          rewrite of RFC 1036).

          Distribution of this memo is unlimited.


          Abstract

          This Draft defines the format and procedures for interchange
          of network news articles.  It is hoped that a later  version
          of this Draft will obsolete RFC 1036, reflecting more recent
          experience and accommodating future directions.

          Network news articles resemble mail messages but are  broad-
          cast  to potentially-large audiences, using a flooding algo-
          rithm that propagates one copy to each interested  host  (or
          group thereof), typically stores only one copy per host, and
          does not require any central  administration  or  systematic
          registration  of  interested users.  Network news originated
          as the medium  of  communication  for  Usenet,  circa  1980.



          2 June 1994                 - 1 -       expires 15 July 1994





          Since  then  Usenet has grown explosively, and many Internet
          sites participate in it.  In addition, the  news  technology
          is now in widespread use for other purposes, on the Internet
          and elsewhere.

          This Draft primarily codifies and organizes  existing  prac-
          tice.   A few small extensions have been added in an attempt
          to solve problems that are considered serious.  Major exten-
          sions (e.g. cryptographic authentication) that need signifi-
          cant development effort are left to be undertaken  as  inde-
          pendent efforts.


          Table of Contents

          TBW


          1. Introduction

          Network  news articles resemble mail messages but are broad-
          cast to potentially-large audiences, using a flooding  algo-
          rithm  that  propagates one copy to each interested host (or
          group thereof), typically stores only one  copy  per  host,
          and  does  not require any central administration or system-
          atic registration of interested users.  Network news  origi-
          nated as the medium of communication for Usenet, circa 1980.
          Since then Usenet has grown explosively, and  many  Internet
          sites  participate  in it.  In addition, the news technology
          is now in widespread use for other purposes, on the Internet
          and elsewhere.

          The  earliest  news  interchange used the so-called "A News"
          article  format.   Shortly  thereafter,  an  article  format
          vaguely  resembling  Internet  mail  was  devised  and  used
          briefly.  Both of those  formats  are  completely  obsolete;
          they  are  documented  in  appendix A for historical reasons
          only.  With publication of RFC 850 [rrr] in 1983, news arti-
          cles  came  to closely resemble Internet mail messages, with
          some restrictions and some  additional  headers.   RFC  1036
          [rrr]  in 1987 updated RFC 850 without making major changes.

          In the intervening years, the RFC  1036  article  format
          has  proven  quite  satisfactory,  although minor extensions
          appear desirable to match recent developments in areas  such
          as  multi-media  mail.  RFC 1036 itself has not proven quite
          so satisfactory.  It is often  rather  vague  and  does  not
          address  some  issues  at  all;  this has caused significant
          interoperability problems at times, and implementations have
          diverged  somewhat.  Worse, although it was intended primar-
          ily to document existing  practice,  it  did  not  precisely
          match  existing  practice even at the time it was published,
          and the deviations have grown since.




          This Draft attempts to specify the format of  articles,  and
          the  procedures  used  to exchange them and process them, in
          sufficient detail to allow full interoperability.  In  addi-
          tion,  some  tentative suggestions are made about directions
          for future development, in an attempt to  avert  unnecessary
          divergence  and  consequent loss of interoperability.  Major
          extensions (e.g.  cryptographic  authentication)  that  need
          significant  development effort are left to be undertaken as
          independent efforts.

               NOTE: One question this all may raise is:  why  is
               there  no  News-Version header, analogous to MIME-
               Version, specifying a version number corresponding
               to  this specification?  The answer is: it doesn't
               appear  to  be  useful,  given  news's   backward-
               compatibility  constraints.   The  major  use of a
               version number  is  indicating  which  of  several
               INCOMPATIBLE  interpretations  is  relevant.   The
               impossibility of orchestrating any sort of  simul-
               taneous change over news's installed base makes it
               necessary to avoid such incompatible  changes  (as
               opposed  to extensions) entirely.  MIME has a ver-
               sion number mostly because it introduced incompat-
               ible  changes  to  the  interpretation  of several
               "Content-"  headers.   This  Draft   attempts   no
               changes  in interpretation and it appears doubtful
               that future Drafts will find it feasible to intro-
               duce any.

               UNRESOLVED  ISSUE:  Should  this  be reconsidered?
               Only if the header has SPECIFIC IDENTIFIABLE  uses
               today.  Otherwise it's just useless added bulk.

          As  in  this  Draft's  predecessors, the exact means used to
          transmit articles from one host to another is not specified.
          NNTP  [rrr]  is probably the most common transmission method
          on the Internet, but a number of others are known to  be  in
          use,  including  the UUCP protocol [rrr] extensively used in
          the early days of Usenet and still much used on its  fringes
          today.

          Several  of  the mechanisms described in this Draft may seem
          somewhat strange or even bizarre at first reading.  As  with
          Internet  mail, there is no reasonable possibility of updat-
          ing the entire installed base of news software promptly,  so
          interoperability  with  old  software  is  crucial  and will
          remain so.  Compatibility with existing practice and robustness
          in an imperfect world necessarily take priority over elegance.


          2. Definitions, Notations, and Conventions


          2.1. Textual Notations

          Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
          amended  by  RFC  1123  [rrr]".   (RFC 1123's amendments are
          mostly relatively small, but they  are  not  insignificant.)
          See  also  the  discussion  in  section 3 about this Draft's
          relationship to MAIL.  "MIME" is short for  "RFCs  1341  and
          1342" (or their updated replacements).

               UNRESOLVED ISSUE: Update these numbers.

          "ASCII"  is  short  for "the ANSI X3.4 character set" [rrr].
          While "ASCII" is often misused to refer to various character
          sets  somewhat similar to X3.4, in this Draft, "ASCII" means
          X3.4 and only X3.4.

               NOTE: The name is traditional (to the point  where
               the  ANSI standard sanctions it) even though it is
               no longer an acronym for the name of the standard.

               NOTE:  ASCII,  X3.4,  contains 128 characters, not
               all of them printable.  Character sets  with  more
               characters   are  not  ASCII,  although  they  may
               include it as a subset.

          Certain words used to define the significance of  individual
          requirements are capitalized.  "MUST" means that the item is
          an absolute  requirement  of  the  specification.   "SHOULD"
          means that the item is a strong recommendation: there may be
          valid reasons to ignore it  in  unusual  circumstances,  but
          this  should  be  done  only after careful study of the full
          implications and a firm conclusion  that  it  is  necessary,
          because  there are serious disadvantages to doing so.  "MAY"
          means that the item is truly optional, and implementors  and
          users  are warned that conformance is possible but not to be
          relied on.

          The term "compliant", applied to implementations etc., indi-
          cates  satisfaction  of  all  relevant  "MUST"  and "SHOULD"
          requirements.  The term "conditionally compliant"  indicates
          satisfaction  of all relevant "MUST" requirements but viola-
          tion of at least one relevant "SHOULD" requirement.

          This Draft contains explanatory notes  using  the  following
          format.   These  may be skipped by persons interested solely
          in the content of the specification.   The  purpose  of  the
          notes  is to explain why choices were made, to place them in
          context, or to suggest possible implementation techniques.



               NOTE: While such explanatory notes may seem super-
               fluous  in  principle,  they  often help the less-
               than-omniscient reader grasp the  purpose  of  the
               specification and the constraints involved.  Given
               the limitations of natural language  for  descrip-
               tive  purposes, this improves the probability that
               implementors and users will  understand  the  true
               intent  of  the  specification  in cases where the
               wording is not entirely clear.

          All numeric values are given  in  decimal  unless  otherwise
          indicated.   Octets  are  assumed  to be unsigned values for
          this purpose.  Large numbers are  written  using  the  North
          American  convention, in which "," separates groups of three
          digits but otherwise has no significance.


          2.2. Syntax Notation

          Although the mechanisms specified  in  this  Draft  are  all
          described  in prose, most are also described formally in the
          modified BNF notation of RFC 822.  Implementors will need to
          be  familiar  with  this  notation  to fully understand this
          specification, and are referred to RFC 822  for  a  complete
          explanation  of  the modified BNF notation.  Here is a brief
          illustrative example:

               sentence  = clause *( punct clause ) "."
               punct     = ":" / ";"
               clause    = 1*word [ "(" clause ")" / "," 1*word ]
               word      = <any English word>

          This defines a sentence as some clauses separated by  puncts
          and  ended  by  a period, a punct as a colon or semicolon, a
          clause as at least one <word> optionally followed by  either
          a  parenthesized  clause  or  a  comma and at least one more
          <word>, and a <word> as (informally) any English word.   <>
          are used to enclose names when (and only when) distinguishing
          them from surrounding text is useful.  The full form of
          the repetition notation is "<m>*<n><thing>",  denoting  <m>
          through <n> repetitions of <thing>; <m> defaults  to  zero,
          <n> to infinity, and the "*" and <n> can be omitted if  <m>
          and <n> are equal, so 1*word is one or more words,  1*5word
          is one through five words, and 2word is exactly two words.
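
          As an informal illustration of the repetition notation  just
          described,  the  following Python sketch splits a repetition
          item into its minimum count, maximum count, and  the  thing
          repeated.   The function name parse_repetition and the use of
          None to stand for "infinity" are assumptions  made  for  this
          example only; they are not part of RFC 822 or of this Draft.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for one repetition item: optional minimum,
# optional "*", optional maximum, then a rule name.
_REPEAT = re.compile(r"^(\d*)(\*?)(\d*)([A-Za-z][A-Za-z0-9-]*)$")


def parse_repetition(item: str) -> Tuple[int, Optional[int], str]:
    """Return (minimum, maximum, thing); maximum None means no upper bound."""
    match = _REPEAT.match(item)
    if match is None:
        raise ValueError(f"not a repetition item: {item!r}")
    low, star, high, thing = match.groups()
    if not star:
        # No "*": a bare count means exactly that many; no count means one.
        count = int(low) if low else 1
        return count, count, thing
    minimum = int(low) if low else 0        # minimum defaults to zero
    maximum = int(high) if high else None   # maximum defaults to infinity
    return minimum, maximum, thing
```

          Under these assumptions, "1*5word" yields a minimum of 1 and
          a maximum of 5, while "2word" yields 2 for both.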

          The  character  "\"  is not special in any way in this nota-
          tion.

          This Draft is intended  to  be  self-contained;  all  syntax
          rules  used in it are defined within it, and a rule with the
          same name as one found in MAIL does not necessarily have the
          same  definition.   The lexical layer of MAIL is NOT, repeat
          NOT, used in this  Draft,  and  its  presence  must  not  be
          assumed; notably, this Draft spells out all places where white
          space is permitted/required and all places  where  constructs
          resembling MAIL comments can occur.

               NOTE:  News  parsers  historically  have been much
               less permissive than MAIL parsers.


          2.3. Definitions

          The term "character set", wherever it is used in this Draft,
          refers to a coded character set, in the sense of ISO charac-
          ter set standardization work, and must not be misinterpreted
          as meaning merely "a set of characters".

          In this Draft, ASCII character 32 is referred to as "blank";
          the word "space" has a more generic meaning.

          An "article" is the unit of news, analogous to a MAIL  "mes-
          sage".

          A "poster" is a human being (or software equivalent) submit-
          ting a  possibly-compliant  article  to  be  "posted":  made
          available  for  reading  on  all relevant hosts.  A "posting
          agent" is software that assists posters to prepare articles,
          including  determining  whether the final article is compli-
          ant, passing it on to a  relayer  for  posting  if  so,  and
          returning  it  to  the poster with an explanation if not.  A
          "relayer" is  software  which  receives  allegedly-compliant
          articles  from  posting  agents and/or other relayers, files
          copies in a "news database", and possibly passes  copies  on
          to other relayers.

               NOTE:  While  the  same software may well function
               both as a relayer and as part of a posting  agent,
               the  two  functions are distinct and should not be
               confused.  The  posting  agent's  purpose  is  (in
               part) to validate an article, supply header infor-
               mation that can or should  be  supplied  automati-
               cally, and generally take reasonable actions in an
               attempt to transform the poster's submission  into
               a  compliant article.  The relayer's purpose is to
               move already-compliant articles around efficiently
               without damaging them.

          A "reader" is a human being reading news articles.  A "read-
          ing agent" is software which presents articles to a  reader.

               NOTE:  Informal usage often uses "reader" for both
               these meanings, but this  introduces  considerable
               potential  for  confusion and misunderstanding, so
               this Draft takes care to make the distinction.

          A "newsgroup" is a single news  forum,  a  logical  bulletin
          board, having a name and nominally intended for articles on a
          specific topic.  An article is "posted to" a single newsgroup
          or several newsgroups.  When  an  article  is  posted  to
          more than one newsgroup, it is said  to  be  "cross-posted";
          note that this differs from posting the same text as part of
          each of several articles, one per newsgroup.  A  "hierarchy"
          is  the set of all newsgroups whose names share a first com-
          ponent (see the name syntax in section 5.5).
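
          The notion of a hierarchy can be sketched in a few lines  of
          Python.   This  is  an illustration only: it assumes that the
          components of a newsgroup name are separated by "." (the name
          syntax  itself is in section 5.5), and the function names are
          hypothetical.

```python
from typing import Dict, Iterable, List


def hierarchy_of(newsgroup: str) -> str:
    """Return the first component of a newsgroup name (assumed '.'-separated)."""
    return newsgroup.split(".", 1)[0]


def group_by_hierarchy(newsgroups: Iterable[str]) -> Dict[str, List[str]]:
    """Collect newsgroup names under the hierarchy each belongs to."""
    result: Dict[str, List[str]] = {}
    for name in newsgroups:
        # Comparison is case-sensitive: "Comp" and "comp" would differ.
        result.setdefault(hierarchy_of(name), []).append(name)
    return result
```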

          A newsgroup may be "moderated", in  which  case  submissions
          are  not  posted  directly,  but mailed to a "moderator" for
          consideration and possible posting.   Moderators  are  typi-
          cally  human but may be implemented partially or entirely in
          software.

          A "followup" is an article containing a response to the con-
          tents of an earlier article (the followup's "precursor").  A
          "followup agent" is a combination of reading agent and post-
          ing agent that aids in the preparation and posting of a fol-
          lowup.

          Text  comparisons  are  "case-sensitive"  if  they  consider
          uppercase  letters  (e.g. "A") different from lowercase let-
          ters (e.g. "a"), and "case-insensitive" if letters differing
          only  in  case  (e.g. "A" and "a") are considered identical.
          Categories of text are said to be case-(in)sensitive if com-
          parisons of such texts to others are case-(in)sensitive.

          A  "cooperating  subnet"  is  a set of news-exchanging hosts
          which is sufficiently well-coordinated (typically via a cen-
          tral  administration of some sort) that stronger assumptions
          can be made about hosts in the set than about news hosts  in
          general.  This is typically used to relax restrictions which
          are otherwise required for worst-case interoperability; mem-
          bers  of  a cooperating subnet MAY interchange articles that
          do not conform to this Draft's specifications, provided  all
          members  have  agreed  to this and provided the articles are
          not permitted to leak out of the subnet.  The word  "subnet"
          is  used to emphasize that a cooperating subnet is typically
          not an isolated universe; care must be  taken  that  traffic
          leaving  the  subnet  complies  with the restrictions of the
          larger net, not just those of the cooperating subnet.

          A "message ID" is a unique identifier for an  article,  usu-
          ally supplied by the posting agent which posted it.  It dis-
          tinguishes the article from every other article ever  posted
          anywhere (in theory).  Articles with the same message ID are
          treated as identical copies of the same article even if they
          are not in fact identical.
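
          The rule that articles with the same message ID are  treated
          as  copies  of  one  article  can  be pictured as a duplicate
          filter.  The sketch below is a minimal illustration, not  the
          design  of  any  particular relayer; the class name History,
          the method name offer, and the in-memory  set  are  all
          assumptions made for the example.

```python
class History:
    """Remember message IDs already seen (hypothetical in-memory store)."""

    def __init__(self) -> None:
        self._seen = set()

    def offer(self, message_id: str) -> bool:
        """Return True if the article is new, False if it is a duplicate.

        Only the message ID is examined; the article text is never
        compared.  IDs are compared exactly, assuming the default
        case-sensitivity rule applies.
        """
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

          A second article offered with an  already-seen  message  ID
          is  rejected  as  a duplicate even if its contents differ.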

          A  "gateway"  is  software  which receives news articles and
          converts them to messages of some other kind (e.g. mail to a
          mailing list), or vice-versa; in essence it is a translating
          relayer that straddles boundaries between different  methods
          of message exchange.  The most common type of gateway connects
          newsgroup(s) to mailing list(s),  either  unidirectionally  or
          bidirectionally,  but  there   are   also   gateways
          between news networks using this  Draft's  news  format  and
          those using other formats.

          A  "control  message"  is an article which is marked as con-
          taining control information; a  relayer  receiving  such  an
          article  will  (subject  to  permissions  etc.) take actions
          beyond just filing and passing on the article.

               NOTE: "Control article" would be  more  consistent
               terminology, but "control message" is already well
               established.

          An article's "reply address" is the address to which  mailed
          replies  should  be  sent.  This is the address specified in
          the article's From header (see section 5.2), unless it  also
          has a Reply-To header (see section 6.3).
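
          The reply-address rule amounts to a simple  preference.   The
          sketch  below assumes, purely for illustration, that an arti-
          cle's headers are already available as a mapping from  exact
          header  name to header content; the function name is hypo-
          thetical and no parsing of the address itself is attempted.

```python
from typing import Mapping


def reply_address(headers: Mapping[str, str]) -> str:
    """Return the content of Reply-To if present, otherwise of From."""
    if "Reply-To" in headers:
        return headers["Reply-To"]
    # From is expected to be present; a KeyError here signals a bad article.
    return headers["From"]
```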

          The  notation  (e.g.)  "(ASCII  17)"  following a name means
          "this name refers to the ASCII character having  value  17".
          An  "ASCII printable character" is an ASCII character in the
          range 33-126.  An "ASCII  control  character"  is  an  ASCII
          character  in  the  range  0-31, or the character DEL (ASCII
          127).  A "non-ASCII character" is a character having a value
          exceeding 127.

               NOTE: Blank is neither an "ASCII printable charac-
               ter" nor an "ASCII control character".
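
          The character categories above can be restated as  a  small
          Python  function.   This  is a sketch for illustration; the
          function name and the returned labels are assumptions of the
          example, while the numeric ranges are those given above.

```python
def classify(code: int) -> str:
    """Classify a character value using the ranges defined in this section."""
    if code < 0:
        raise ValueError("character values are unsigned")
    if code > 127:
        return "non-ASCII"
    if code == 32:
        return "blank"        # neither printable nor control
    if 33 <= code <= 126:
        return "printable"
    return "control"          # 0-31, or DEL (127)
```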


          2.4. End Of Line

          How the end of a text line is  represented  depends  on  the
          context  and  the implementation.  For Internet transmission
          via protocols such as SMTP [rrr], an  end-of-line  is  a  CR
          (ASCII  13)  followed  by an LF (ASCII 10).  ISO C [rrr] and
          many modern operating systems indicate  end-of-line  with  a
          single  character,  typically  ASCII LF (aka "newline"), and
          this is the normal convention when news is  transmitted  via
          UUCP.  A variety of other methods are in use, including out-
          of-band methods in which there is no specific character that
          means end-of-line.

          This Draft does not constrain how end-of-line is represented
          in news, except that characters other than CR and LF MUST NOT
          be  usurped  for  use  in  end-of-line   representations.
          Also, obviously, all software dealing with a particular copy
          of  an  article  must  agree  on  the convention to be used.
          "EOL" is used to mean "whatever  end-of-line  representation
          is  appropriate";  it  is  not  necessarily  a  character or
          sequence of characters.





               NOTE: If faced with picking an EOL  representation
               in the absence of other constraints, use of a sin-
               gle character simplifies processing, and the ASCII
               standard  [rrr] specifies that if one character is
               to be used for  this  purpose,  it  should  be  LF
               (ASCII 10).

               NOTE:  Inside  MIME encodings, use of the Internet
               canonical EOL representation (CR followed  by  LF)
               is mandatory.  See [rrr].
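
          One way to keep EOL handling  independent  of  representation
          is  to  hold an article as a list of lines with no EOL charac-
          ters, and to apply a  representation  only  when  reading  or
          writing.   The  Python sketch below illustrates this under the
          assumption that the representation is either CR LF or a single
          LF; the function names are hypothetical.

```python
from typing import List

CRLF = b"\r\n"   # Internet transmission convention
LF = b"\n"       # single-character convention


def split_lines(data: bytes, eol: bytes) -> List[bytes]:
    """Split data into lines, dropping the EOL representation."""
    if not data:
        return []
    parts = data.split(eol)
    if parts[-1] == b"":
        parts.pop()          # data ended with EOL; no extra empty line
    return parts


def join_lines(lines: List[bytes], eol: bytes) -> bytes:
    """Write each line followed by the chosen EOL representation."""
    return b"".join(line + eol for line in lines)
```

          Converting between conventions is then a split with one  EOL
          followed by a join with the other.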


          2.5. Case-Sensitivity

          Text  in  newsgroup  names, header parameters, etc. is case-
          sensitive unless stated otherwise.

               NOTE: This is at  variance  with  MAIL,  which  is
               case-insensitive  unless  stated otherwise, but is
               consistent  with  news  historical  practice   and
               existing news software.  See the comments on back-
               ward compatibility in section 1.


          2.6. Language

          Various constant strings in this Draft, such as header names
          and  month  names,  are derived from English words.  Despite
          their derivation, these words do NOT change when the  poster
          or  reader employing them is interacting in a language other
          than English.  Posting and reading agents  SHOULD  translate
          as  appropriate  in  their  interaction  with  the poster or
          reader, but the forms that actually appear in  articles  are
          always the English-derived ones defined in this Draft.


          3. Relation To MAIL (RFC 822 etc.)

          The  primary  intent of this Draft is to completely describe
          the news article format as a subset of MAIL's message format
          augmented by some new headers.  Unless explicitly noted oth-
          erwise, the intent throughout is that an article  MUST  also
          be a valid MAIL message.

               NOTE:  Despite  obvious  similarities between news
               and mail, opinions vary on whether it is  possible
               or  desirable to unify them into a single service.
               However, it is unquestionably  both  possible  and
               useful to employ some of the same tools for manip-
               ulating both mail messages and news  articles,  so
               there  is specific advantage to be had in defining
               them compatibly.  Furthermore, there is no  appar-
               ent need to re-invent the wheel when slight exten-
               sions to an existing definition will suffice.



          Given that this Draft  attempts  to  be  self-contained,  it
          inevitably  contains  considerable repetition of information
          found in MAIL.  This raises the possibility of unintentional
          conflicts.  Unless specifically noted otherwise, any wording
          in this Draft which  permits  behavior  that  is  not  MAIL-
          compliant  is  erroneous  and should be followed only to the
          extent that the result remains compliant with MAIL.

               NOTE: RFC 1036 said "where this standard conflicts
               with  [RFC 822], RFC-822 should be considered cor-
               rect and this standard in  error".   Taken  liter-
               ally, this was obviously incorrect, since RFC 1036
               imposed a number of restrictions not found in  RFC
               822.   The  intent,  however,  was  reasonable: to
               indicate  that  UNINTENTIONAL   differences   were
               errors in RFC 1036.

          Implementors and users should note that MAIL is deliberately
          an extensible standard, and most extensions devised for mail
          are  also relevant to (and compatible with) news.  Note par-
          ticularly MIME [rrr],  summarized  briefly  in  appendix  B,
          which extends MAIL in a number of useful ways that are defi-
          nitely relevant to news.   Also  of  note  is  the  work  in
          progress  on  reconciling  PEM (Privacy Enhanced Mail, which
          defines extensions for  authentication  and  security)  with
          MIME, after which this may also be relevant to news.

               UNRESOLVED ISSUE: Update the MIME/PEM information.

          Similarly, descriptions here of MIME  facilities  should  be
          considered  correct  only  to  the  extent  that they do not
          require or legitimize practices  that  would  violate  those
          RFCs.   (Note that this Draft does extend the application of
          some MIME facilities, but this is an extension  rather  than
          an alteration.)


          4. Basic Format


          4.1. Overall Syntax

          The overall syntax of a news article is:

               article         = 1*header separator body
               header          = start-line *continuation
               start-line      = header-name ":" space [ nonblank-text ] eol
               continuation    = space nonblank-text eol
               header-name     = 1*name-character *( "-" 1*name-character )
               name-character  = letter / digit
               letter          = <ASCII letter A-Z or a-z>
               digit           = <ASCII digit 0-9>
               separator       = eol
               body            = *( [ nonblank-text / space ] eol )
               eol             = <EOL>
               nonblank-text   = [ space ] text-character *( space-or-text )
               text-character  = <any ASCII character except NUL (ASCII 0),
                                   HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
                                   or blank (ASCII 32)>
               space           = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )
               space-or-text   = space / text-character

          An  article consists of some headers followed by a body.  An
          empty line separates the two.  The  headers  contain  struc-
          tured information about the article and its transmission.  A
          header begins with a header name identifying it, and can  be
          continued  onto  subsequent lines by beginning the continua-
          tion line(s) with white space.   (Note  that  section  4.2.3
          adds some restrictions to the header syntax indicated here.)
          The body is largely-unstructured text  significant  only  to
          the poster and the readers.

               NOTE:  Terminology here follows the current custom
               in the news community, rather than the  MAIL  con-
               vention  of  (sometimes) referring to what is here
               called a "header" as a "header field" or  "field".

          Note that the separator line must be truly empty, not just a
          line containing white space.  Further empty lines  following
          it  are  part  of the body, as are empty lines at the end of
          the article.

               NOTE: Some systems  make  no  distinction  between
               empty lines and lines consisting entirely of white
               space;  indeed,  some  systems  cannot   represent
               entirely  empty  lines.  The grammar's requirement
               that header continuation lines contain some print-
               able  text is meant to ensure that the empty/space
               distinction cannot confuse identification  of  the
               separator line.

               NOTE:  It  is tempting to authorize posting agents
               to strip empty lines at the beginning and  end  of
               the  body,  but such empty lines could possibly be
               part of a preformatted document.
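
          The division of an article into headers and body can be  il-
          lustrated  with a short Python sketch.  It assumes the article
          is already available as a list of lines with EOL removed, and
          it  checks only the points discussed above: the separator must
          be a truly empty line, and a continuation line  must  contain
          some  non-white-space  text.   The  function name is hypothet-
          ical, and start-lines are not otherwise validated.

```python
from typing import List, Tuple


def split_article(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (headers, body lines); continuations stay attached to headers."""
    headers: List[str] = []
    for index, line in enumerate(lines):
        if line == "":
            # A truly empty line is the separator; everything after is body,
            # including any further empty lines.
            if not headers:
                raise ValueError("article has no headers")
            return headers, lines[index + 1:]
        if line[0] in " \t":
            if not headers:
                raise ValueError("continuation line before any header")
            if line.strip(" \t") == "":
                raise ValueError("white-space-only line among the headers")
            headers[-1] += "\n" + line
        else:
            headers.append(line)
    raise ValueError("no separator line found")
```

          Note that leading and trailing empty lines of  the  body  are
          returned untouched, in keeping with the preceding NOTE.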

          Implementors are warned that trailing white  space,  whether
          alone on the line or not, MAY be significant  in  the  body,
          notably in early versions of  the  "uuencode"  encoding  for
          binary  data.  Trailing white space MUST be preserved unless
          the article is known to have originated within a cooperating
          subnet  that  avoids using significant trailing white space,
          and SHOULD be preserved regardless.   Posters  SHOULD  avoid
          using  conventions  or  encodings  which make trailing white
          space significant;  for  encoding  of  binary  data,  MIME's
          "base64"  encoding  is recommended.  Implementors are warned
          that ISO C implementations  are  not  required  to  preserve
          trailing  white space, and special precautions may be neces-
          sary in implementations which do not.

               NOTE: Unfortunately, the signature-delimiter  con-
               vention (described in section 4.3.2) does use sig-
               nificant trailing white space.  It's too  late  to
               fix  this;  there  is work underway on defining an
               organized signature convention as  part  of  MIME,
               which is a preferable solution in the long run.

          Posters  are warned that some very old relayer software mis-
          behaves when the first non-empty line  of  an  article  body
          begins with white space.


          4.2. Headers


          4.2.1. Names and Contents

          Despite  the  restrictions  on header-name syntax imposed by
          the grammar, relayers and  reading  agents  SHOULD  tolerate
          header  names containing any ASCII printable character other
          than colon (":", ASCII 58).

               NOTE: MAIL header  names  can  contain  any  ASCII
               printable  character (other than colon) in theory,
               but in practice, arbitrary header names are  known
               to  cause trouble for some news software.  Section
               4.1's restriction to alphanumeric sequences  sepa-
               rated by hyphens is believed to permit all widely-
               used header names without causing problems for any
               widely-used  software.   Software  is nevertheless
               encouraged to cope correctly with the  full  range
               of  possibilities,  since aberrations are known to
               occur.
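
               NOTE: A minimal sketch of the two levels of
               header-name checking, assuming that "printable"
               means ASCII codes 33 through 126 (blank excluded).
               The names STRICT_NAME, TOLERANT_NAME,
               is_strict_name, and is_tolerable_name are
               hypothetical, and the strict pattern is only one
               reading of "alphanumeric sequences separated by
               hyphens".

```python
import re

# Strict form: alphanumeric sequences separated by hyphens.
STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")

# Tolerant form: any ASCII printable character (codes 33 to 126,
# an assumption that excludes blank) other than colon (58).
TOLERANT_NAME = re.compile(r"[!-9;-~]+")


def is_strict_name(name):
    """True if name fits the restricted header-name syntax."""
    return STRICT_NAME.fullmatch(name) is not None


def is_tolerable_name(name):
    """True if a relayer or reading agent could still accept name."""
    return TOLERANT_NAME.fullmatch(name) is not None
```

               A posting agent would apply the strict test, while
               a relayer or reading agent would fall back to the
               tolerant one.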

          Relayers MUST disregard headers not described in this  Draft
          (that  is,  with  header names not mentioned in this Draft),
          and pass them on unaltered.

          Posters wishing to convey non-standard information in  head-
          ers  SHOULD  use header names beginning with "X-".  No stan-
          dard header name will ever be of this form.  Reading  agents
          SHOULD  ignore  "X-"  headers,  or  at least treat them with
          great care.

          The order of headers in an article is not significant.  How-
          ever, posting agents are encouraged to put mandatory headers
          (see section 5) first, followed  by  optional  headers  (see
          section 6), followed by headers not defined in this Draft.

               NOTE:  While  relayers  and reading agents must be
               prepared to handle any order, having the  signifi-
               cant  headers (the precise definition of "signifi-
               cant" depends on  context)  first  can  noticeably
               improve  efficiency,  especially in memory-limited
               environments where it is difficult to buffer up an
               arbitrary  quantity of headers while searching for
               the few that matter.

          Header names are case-insensitive.   There  is  a  preferred
          case  convention,  which  posters  and posting agents SHOULD
          use: each hyphen-separated "word" has its initial letter (if
          any)  in  uppercase  and  the rest in lowercase, except that
          some abbreviations have all letters uppercase (e.g.
          "Message-ID" and "MIME-Version").  The forms used in this
          Draft are the preferred forms for the headers described
          herein.  Relayers and reading agents are warned that
          articles might not obey this convention.

               NOTE: Although software must be prepared  for  the
               possibility  of random use of case in header names
               (and other case-independent text), establishing  a
               preferred  convention reduces pointless diversity,
               and may permit optimized software that  looks  for
               the  preferred  forms  before  resorting  to less-
               efficient case-insensitive searches.
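
               NOTE: The case convention, and the "preferred form
               first" lookup mentioned above, might be sketched
               as follows.  The names UPPERCASE_WORDS,
               preferred_case, and find_header are hypothetical,
               and the set of all-uppercase abbreviations is an
               assumption drawn only from the two examples given.

```python
# Abbreviations written entirely in uppercase.  Only these two are
# taken from the examples in the text; the set is an assumption.
UPPERCASE_WORDS = {"ID", "MIME"}


def preferred_case(name):
    """Return name in the preferred case convention."""
    words = []
    for word in name.split("-"):
        if word.upper() in UPPERCASE_WORDS:
            words.append(word.upper())
        else:
            words.append(word[:1].upper() + word[1:].lower())
    return "-".join(words)


def find_header(headers, wanted):
    """Look up a header in a dict mapping name to content.

    Tries the preferred form first, then falls back to a slower
    case-insensitive scan.
    """
    preferred = preferred_case(wanted)
    if preferred in headers:
        return headers[preferred]
    for name, content in headers.items():
        if name.lower() == wanted.lower():
            return content
    return None
```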

          In general, a header can consist of several lines, with each
          continuation line beginning with white space.  The EOLs pre-
          ceding continuation lines are ignored when processing such a
          header, effectively combining the start-line and the contin-
          uations into a single logical line.  The logical line,  less
          the  header  name,  colon, and any white space following the
          colon, is the "header content".
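
               NOTE: One possible way to form logical lines and
               extract header content is sketched below.  The
               function names unfold_headers and header_content
               are hypothetical; the input is assumed to be a
               list of header lines with EOLs already removed.

```python
def unfold_headers(header_lines):
    """Join continuation lines onto their start-lines."""
    logical = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and logical:
            # The EOL is dropped; the leading white space stays.
            logical[-1] += line
        else:
            logical.append(line)
    return logical


def header_content(logical_line):
    """Return (name, content) for one logical header line."""
    name, _, rest = logical_line.partition(":")
    # Content excludes the name, the colon, and white space after it.
    return name, rest.lstrip(" \t")
```

               Note that the white space that began the
               continuation line survives inside the content.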


          4.2.2. Undesirable Headers

          A header whose content is empty  is  said  to  be  an  empty
          header.   Relayers  and  reading  agents SHOULD not consider
          presence or absence of an empty header to alter  the  seman-
          tics  of  an  article  (although  syntactic  rules,  such as
          requirements that certain header names appear at  most  once
          in  an  article,  MUST  still be satisfied).  Posting agents
          SHOULD delete empty headers  from  articles  before  posting
          them.

          Headers that merely state defaults explicitly (e.g., a
          Followup-To header with the same content as the Newsgroups
          header,   or   a  MIME  Content-Type  header  with  contents
          "text/plain; charset=us-ascii") or  state  information  that
          reading  agents  can  typically  determine easily themselves
          (e.g. the length of the body in octets) are redundant,  con-
          veying no information whatsoever.  Headers that state infor-
          mation which cannot possibly be of use to a significant num-
          ber  of relayers, reading agents, or readers (e.g., the name
          of the software package used as the posting agent) are  use-
          less and pointless.  Posters and posting agents SHOULD avoid
          including redundant or useless headers in articles.

               NOTE: Information that someone,  somewhere,  might
               someday  find useful is best omitted from headers.
               (There's quite enough of it  in  article  bodies.)
               Headers  should contain information of known util-
               ity only.  This is not meant to preclude inclusion
               of  information  primarily meant for news-software
               debugging, but such information should be included
               only  if there is real reason, preferably based on
               experience, to suspect that it  may  be  genuinely
               useful.  Articles passing through gateways are the
               only obvious case  where  inclusion  of  debugging
               information appears clearly legitimate.  (See sec-
               tion 10.1.)

               NOTE: A useful rule of thumb for  software  imple-
               mentors  is:  "if  I had to pay a dollar a day for
               the transmission of this  header,  would  I  still
               think it worthwhile?".


          4.2.3. White Space and Continuations

          The  colon  following the header name on the start-line MUST
          be followed by white space, even if the header is empty.  If
          the  header  is not empty, at least some of the content MUST
          appear on the start-line.  Posting agents MUST enforce these
          restrictions,  but  relayers (etc.) SHOULD accept even arti-
          cles that violate them.

               NOTE: MAIL does not require white space after  the
               colon,  but  it  is  usual.  RFC 1036 required the
               white space,  even  in  empty  headers,  and  some
               existing   software  demands  it.   In  MAIL,  and
               arguably in RFC  1036  (although  the  wording  is
               vague), it is technically legitimate for the white
               space to be part of  a  continuation  line  rather
               than the start-line, but not all existing software
               will accept  this.   Deleting  empty  headers  and
               placing some content on the start-line avoids this
               issue...  which  is  desirable  because   trailing
               blanks,  easily  deleted by accident, are best not
               made significant in headers.

          In general, posters and  posting  agents  SHOULD  use  blank
          (ASCII  32), not tab (ASCII 9), where white space is desired
          in headers.  Existing software does not consistently  accept
          tab  as  synonymous with blank in all contexts.  In particu-
          lar, RFC 1036 appeared to specify that the character immedi-
          ately  following  the colon after a header name was required
          to be a blank, and some news software insists  on  that,  so
          this  character MUST be a blank.  Again, posting agents MUST
          enforce these restrictions but relayers SHOULD be more  tol-
          erant.
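
               NOTE: A posting agent's start-line checks might be
               sketched as follows.  The function name
               start_line_problems and its has_continuation
               parameter are hypothetical; a relayer would
               typically not apply such checks.

```python
def start_line_problems(start_line, has_continuation=False):
    """Return a list of complaints about a header start-line."""
    name, colon, rest = start_line.partition(":")
    if not colon:
        return ["no colon after header name"]
    problems = []
    # The character right after the colon is expected to be a
    # blank (ASCII 32), not a tab, even if the header is empty.
    if not rest.startswith(" "):
        problems.append("colon not followed by a blank")
    # A non-empty header needs some content on the start-line.
    if has_continuation and rest.strip(" \t") == "":
        problems.append("no content on the start-line")
    return problems
```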

          Since  the white space beginning a continuation line remains
          a part of the logical line, headers  can  be  "broken"  into
          multiple  lines  only at white space.  Posting agents SHOULD
          not break headers unnecessarily.  Relayers  SHOULD  preserve
          existing header breaks, and SHOULD not introduce new breaks.
          Breaking headers SHOULD be a last resort; relayers and read-
          ing agents SHOULD handle long header lines gracefully.  (See
          the discussion of size limits in section 4.6.)


          4.3. Body

          Although the article body is unstructured for  most  of  the
          purposes  of  this  Draft, structure MAY be imposed on it by
          other means, notably MIME headers (see appendix B).


          4.3.1. Body Format Issues

          The body of an article MAY be empty, although posting agents
          SHOULD  consider this an error condition (meriting returning
          the article to the poster for revision).   A  posting  agent
          which does not reject such an article SHOULD issue a warning
          message to the poster and supply  a  non-empty  body.   Note
          that  the separator line MUST be present even if the body is
          empty.

               NOTE: An empty body is  probably  a  poster  error
               except, arguably, for some control messages... and
               even they really ought to have a  body  explaining
               the  reason  for  the  control  message.  Some old
               reading agents are known to generate empty  bodies
               for  "cancel"  control messages, so posting agents
               might opt not to reject body-less articles in such
               cases  (although  it  would  be  better to fix the
               reading agents to request a body).  However,  some
               existing  news software is known to react badly to
               body-less articles, hence the request for  posting
               agents to insert a body in such cases.

               NOTE:  A possible posting-agent-supplied body text
               (already used by one widespread posting agent)  is
               "This  article  was  probably generated by a buggy
               news reader.".  (The use of "reader" to  refer  to
               the  reading  agent  is traditional, although this
               Draft uses more precise terminology.)

               NOTE: The requirement for the separator line  even
               in  a bodyless article is inherited from MAIL, and
               also distinguishes legitimately-bodyless  articles
               from articles accidentally truncated in the middle
               of the headers.

          Note that an article body is a sequence of lines  terminated
          by  EOLs,  not  arbitrary  binary data, and in particular it
          MUST end with an EOL.  However, relayers  SHOULD  treat  the
          body  of  an  article as an uninterpreted sequence of octets
          (except as mandated by changes of EOL representation and  by
          control-message  processing)  and SHOULD avoid imposing con-
          straints on it.  See also section 4.6.


          4.3.2. Body Conventions

          Although body lines can in principle be very long (see  sec-
          tion  4.6  for  some  discussion  of length limits), posters
          SHOULD restrict body line lengths to circa 70-75 characters.
          On  systems  where  text  is conventionally stored with EOLs
          only at paragraph breaks and  other  "hard  return"  points,
          with  software  breaking lines as appropriate for display or
          manipulation, posting agents SHOULD insert EOLs as necessary
          so that posted articles comply with this restriction.

               NOTE:  News  originated in environments where line
               breaks in plain text files were  supplied  by  the
               user, not the software.  Be this good or bad, much
               reading-agent and posting-agent  software  assumes
               that  news  articles follow this convention, so it
               is often inconvenient to read or respond to  arti-
               cles  which  violate it.  The "70-75" number comes
               from the widespread use of display  devices  which
               are 80 columns wide, and the desire to leave a bit
               of margin for quoting etc. (see below).
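
               NOTE: A posting agent on a "hard return" system
               might insert EOLs roughly as sketched below, using
               Python's standard textwrap module.  The function
               name wrap_body_lines and the default width of 72
               are assumptions; textwrap also normalizes white
               space within the lines it re-wraps, which is an
               artifact of this sketch.

```python
import textwrap


def wrap_body_lines(body_lines, width=72):
    """Insert EOLs into over-long lines; other lines pass unchanged."""
    out = []
    for line in body_lines:
        if len(line) <= width:
            out.append(line)
        else:
            wrapped = textwrap.wrap(line, width=width,
                                    break_long_words=False,
                                    break_on_hyphens=False)
            # Keep a line in place even if nothing printable remains.
            out.extend(wrapped or [""])
    return out
```

               Lines already within the limit are left alone, so
               preformatted text is disturbed as little as
               possible.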

          Reading agents confronted with body lines much  longer  than
          the  available  output-device  width  SHOULD  break lines as
          appropriate.  Posters are warned that such  breaks  may  not
          occur exactly where the poster intends.

               NOTE:  "As  appropriate"  would  typically include
               breaking lines when supplying the text of an arti-
               cle to be quoted in a reply or followup, something
               that line-breaking reading agents often neglect to
               do now.

          Although  styles  vary widely, for plain text it is usual to
          use no left margin, leave the right edge ragged, use a  sin-
          gle  empty  line  to  separate paragraphs, and employ normal
          natural-language usage on matters such  as  upper/lowercase.
          (In  particular,  articles SHOULD not be written entirely in
          uppercase.  In environments where posters have  access  only
          to  uppercase,  posting agents SHOULD translate it to lower-
          case.)

               NOTE: Most people find substantial bodies of  text
               entirely  in  uppercase  relatively  hard to read,
               while all-lowercase  text  merely  looks  slightly
               odd.   The  common  association  of uppercase with
               strong emphasis adds to this.

          Tone of voice does not carry well in written text, and  mis-
          understandings are common when sarcasm, parody, or exaggera-
          tion for humorous effect is attempted without explicit warn-
          ing.   It has become conventional to use the sequence ":-)",
          which (on most output devices) resembles a  rotated  "smiley
          face"  symbol,  as  a  marker for text not meant to be taken
          literally, especially when humor is intended.  This practice
          aids  communication  and averts unintended ill-will; posters
          are urged to use it.  A variety of analogous  sequences  are
          used with less-standardized meanings [Sanderson].

          The  order  of arrival of news articles at a particular host
          depends somewhat on  transmission  paths,  and  occasionally
          articles are lost for various reasons.  When responding to a
          previous article, posters SHOULD not assume that all readers
          understand the exact context.  It is common to quote some of
          the previous article to establish context.  This  SHOULD  be
          done  by  prefacing  each  quoted line (even if it is empty)
          with the character ">".  This will result in multiple levels
          of ">" when quoted context itself contains quoted context.

               NOTE:  It  may seem superfluous to put a prefix on
               empty lines, but it simplifies  implementation  of
               functions  such as "skip all quoted text" in read-
               ing agents.

          Readability is enhanced if quoted text and new text are sep-
          arated by an empty line.
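
               NOTE: The quoting convention might be sketched as
               follows.  The function names quote_lines,
               build_followup, and skip_quoted are hypothetical.

```python
def quote_lines(lines):
    """Prefix every line, empty ones included, with ">"."""
    return [">" + line for line in lines]


def build_followup(previous_body, new_text):
    """Quoted context, then an empty line, then the new text."""
    return quote_lines(previous_body) + [""] + list(new_text)


def skip_quoted(lines):
    """Drop quoted lines; trivial because empty ones are marked too."""
    return [line for line in lines if not line.startswith(">")]
```

               Quoting context that is itself quoted yields ">>",
               matching the multiple levels described above.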

          Posters  SHOULD  edit  quoted context to trim it down to the
          minimum  necessary.   However,  posting  agents  SHOULD  not
          attempt  to enforce this by imposing overly-simplistic rules
          like "no more than 50% of the lines should be quotes".

               NOTE: While encouraging trimming is desirable, the
               50%  rule  imposed  by  some old posting agents is
               both inadequate and counterproductive.  Posters do
               not  respond  to  it by being more selective about
               quoting; they respond by padding short  responses,
               or  by  using  different  quoting styles to defeat
               automatic analysis.  The former  adds  unnecessary
               noise  and  volume,  while the latter also defeats
               more useful forms of automatic analysis that read-
               ing agents might wish to do.

               NOTE:  At  the  very  least, if a minimum-unquoted
               quota is being set, article  bodies  shorter  than
               (say)  20  lines, or perhaps articles which exceed
               the quota by only a few lines, should  be  exempt.
               This  avoids the ridiculous situation of complain-
               ing about a 5-line response to a 6-line quote.
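
               NOTE: If a quota is imposed despite the cautions
               above, the exemptions might look like the sketch
               below.  The function name violates_quote_quota and
               all parameter names are hypothetical; the slack of
               3 lines is an assumed reading of "a few lines".

```python
def violates_quote_quota(body_lines, max_quoted_fraction=0.5,
                         exempt_below=20, slack=3):
    """True if quoted lines exceed the quota, after exemptions."""
    total = len(body_lines)
    if total < exempt_below:
        return False  # short bodies are exempt
    quoted = sum(1 for line in body_lines if line.startswith(">"))
    allowed = int(total * max_quoted_fraction) + slack
    return quoted > allowed
```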

               NOTE: A more subtle posting-agent rule,  suggested
               for  experimental  use, is to reject articles that
               appear to contain quoted signatures  (see  below).
               A quoted signature is almost certainly the result
               of a careless poster not bothering to trim down
               quoted context.  Also, if a posting agent or
               followup agent presents an article template to the
               poster for editing, it really should take note of
               whether the poster actually made any changes, and
               refrain from posting an unmodified template.

          Some  followup  agents supply "attribution" lines for quoted
          context, indicating where it first appeared and under  whose
          name.   When  multiple  levels  of  quoting  are present and
          quoted context is edited for  brevity,  "inner"  attribution
          lines  are not always retained.  The editing process is also
          somewhat error-prone.   Reading  agents  (and  readers)  are
          warned not to assume that attributions are accurate.

               UNRESOLVED  ISSUE:  Should  a  standard format for
               attribution lines be defined?   There  is  already
               considerable diversity... but automatic news anal-
               ysis would be substantially aided  by  a  standard
               convention.

          Early  difficulties in inferring return addresses from arti-
          cle headers led to "signatures": short closing texts,  auto-
          matically  added  to  the end of articles by posting agents,
          identifying the poster and giving his network addresses etc.
          If  a  poster or posting agent does append a signature to an
          article, the signature SHOULD be preceded with  a  delimiter
          line  containing  (only)  two hyphens (ASCII 45) followed by
          one blank (ASCII  32).   Posting  agents  SHOULD  limit  the
          length  of  signatures,  since  verbose  excess bordering on
          abuse is common if no restraint is imposed;  4  lines  is  a
          common limit.

               NOTE:  While  signatures  are  arguably a blemish,
               they are a well-understood convention, and convey-
               ing  the same information in headers exposes it to
               mangling and makes it rather less conspicuous.   A
               standard  delimiter  line  makes  it  possible for
               reading agents to handle signatures  specially  if
               desired.    (This  is  unfortunately  hampered  by
               extensive misunderstanding of, and misuse of,  the
               delimiter.)

               NOTE: The choice of delimiter is somewhat unfortu-
               nate, since it relies on preservation of  trailing
               white  space,  but  it  is too well-established to
               change.  There is work underway to define  a  more
               sophisticated  signature  scheme  as part of MIME,
               and this will  presumably  supersede  the  current
               convention in due time.

               NOTE:  Four  75-column  lines of signature text is
               300 characters, which is ample to convey name  and
               mail-address  information  in  all  but  the  most
               bizarre situations.
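
               NOTE: Handling of the signature delimiter might be
               sketched as follows.  The names SIG_DELIMITER,
               MAX_SIG_LINES, split_signature, and
               append_signature are hypothetical, and splitting
               at the last delimiter line in the body is an
               assumption not stated in this Draft.

```python
SIG_DELIMITER = "-- "  # two hyphens followed by one blank
MAX_SIG_LINES = 4      # the "common limit" mentioned in the text


def split_signature(body_lines):
    """Return (text, signature), split at the last delimiter line."""
    for i in range(len(body_lines) - 1, -1, -1):
        if body_lines[i] == SIG_DELIMITER:
            return body_lines[:i], body_lines[i + 1:]
    return body_lines, []


def append_signature(body_lines, sig_lines):
    """Append a delimiter line and a signature of limited length."""
    if len(sig_lines) > MAX_SIG_LINES:
        raise ValueError("signature longer than %d lines" % MAX_SIG_LINES)
    return list(body_lines) + [SIG_DELIMITER] + list(sig_lines)
```

               A line containing "--" without the trailing blank
               is not recognized, which shows why loss of
               trailing white space defeats the convention.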


          4.4. Characters And Character Sets

          Header and body lines MAY contain any ASCII characters other
          than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).

               NOTE:  CR  and  LF are excluded because they clash
               with common  EOL  conventions.   NUL  is  excluded
               because  it  clashes with the C end-of-string con-
               vention, which is  significant  to  most  existing
               news   software.    These   three  characters  are
               unlikely to be transmitted successfully.
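
               NOTE: A check for the three excluded characters
               might be sketched as follows.  The names FORBIDDEN
               and forbidden_octets are hypothetical; the line is
               assumed to be a sequence of octets with its EOL
               already removed.

```python
FORBIDDEN = frozenset(b"\r\n\0")  # CR (13), LF (10), NUL (0)


def forbidden_octets(line):
    """Return the sorted forbidden octet values found in line."""
    return sorted(set(line) & FORBIDDEN)
```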

          However, posters SHOULD avoid using ASCII control characters
          except for tab (ASCII 9), formfeed (ASCII 12), and backspace
          (ASCII 8).  Tab signifies sufficient horizontal white  space
          to  reach  the next of a set of fixed positions; posters are
          warned that there is no standard set of positions,  so  tabs
          should be avoided if precise spacing is essential.  Formfeed
          signifies a point at which a reading agent SHOULD pause  and
          await  reader  interaction  before  displaying further text.
          Backspace SHOULD be used only for  underlining,  done  by  a
          sequence of underscores (ASCII 95) followed by an equal num-
          ber of backspaces, signifying that the same number  of  text
          characters  following  are  to  be  underlined.  Posters are
          warned that underlining  is  not  available  on  all  output
          devices  and  is  best  not relied on for essential meaning.
          Reading agents SHOULD recognize underlining and translate it
          to the appropriate commands for devices that support it.

               NOTE: Interpretation of almost all control charac-
               ters  is  device-specific  to  some  degree,   and
               devices  differ.   Tabs  and  underlining are sup-
               ported, to some extent, by most modern devices and
               reading  agents, hence the cautious exemptions for
               them.  The underlining method is specified because
               the  inverse method, text and then underscores, is
               tempting to the naive... but if sent unaltered  to
               a  device  that shows only the most recent of sev-
               eral overstruck characters rather than  a  compos-
               ite, the result can be utterly unreadable.
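
               NOTE: A reading agent might recognize the
               specified underlining method roughly as sketched
               below.  The names UNDERLINE and parse_underlining
               are hypothetical; runs in which the counts of
               underscores and backspaces differ are left
               uninterpreted here, which is an assumption.

```python
import re

# A run of underscores followed by a run of backspaces (ASCII 8).
UNDERLINE = re.compile(r"(_+)(\x08+)")


def parse_underlining(line):
    """Return (plain_text, flags); flags[i] is True if underlined."""
    text, flags = [], []
    i = 0
    while i < len(line):
        m = UNDERLINE.match(line, i)
        if m and len(m.group(1)) == len(m.group(2)):
            n = len(m.group(1))
            # The next n text characters are the underlined ones.
            chunk = line[m.end():m.end() + n]
            text.extend(chunk)
            flags.extend([True] * len(chunk))
            i = m.end() + n
        else:
            text.append(line[i])
            flags.append(False)
            i += 1
    return "".join(text), flags
```

               The flags can then be translated into whatever
               underlining commands the output device supports,
               or ignored where it supports none.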

               NOTE: A common interpretation of tab is that it is
               a request to space forward to  the  next  position
               whose  number  is  one  more than a multiple of 8,
               with positions numbered sequentially  starting  at
               1.  (So tab positions are 9, 17, 25, ...)  Reading
               agents not constrained by existing system  conven-
               tions might wish to use this interpretation.
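
               NOTE: The common interpretation of tab described
               above might be sketched as follows.  The function
               name expand_tabs and its tab_width parameter are
               hypothetical, and other control characters are not
               considered.

```python
def expand_tabs(line, tab_width=8):
    """Replace each tab with blanks up to the next tab position."""
    out = []
    column = 1  # positions are numbered sequentially starting at 1
    for ch in line:
        if ch == "\t":
            # Next position that is one more than a multiple of
            # tab_width: 9, 17, 25, ... for the default width.
            next_stop = ((column - 1) // tab_width + 1) * tab_width + 1
            out.append(" " * (next_stop - column))
            column = next_stop
        else:
            out.append(ch)
            column += 1
    return "".join(out)
```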

               NOTE: It will typically be necessary for a reading
               agent to catch and interpret  formfeed,  not  just
               send  it  to  the output device.  The actions per-
               formed by typical output devices  on  receiving  a
               formfeed  are neither adequate for nor appropriate
               to the pause-for-interaction meaning.

          Cooperating subnets which wish to employ non-ASCII character
          sets  by using escape sequences (employing, e.g., ESC (ASCII
          27), SO (ASCII 14), and SI (ASCII 15)) to alter the  meaning
          of  superficially-ASCII  characters  MAY do so, but MUST use
          MIME headers to alert reading agents to the particular char-
          acter  set(s)  and escape sequences in use.  A reading agent
          SHOULD not pass such an escape sequence through,  unaltered,
          to  the  output  device  unless  the agent confirms that the
          sequence is one used to affect character sets and has reason
          to  believe  that the device is capable of interpreting that
          particular sequence properly.

               NOTE:  Cooperating-subnet  organizers  are  warned
               that  some very old relayers strip certain control
               characters out of articles they pass  along.   ESC
               is known to be among the affected characters.

               NOTE:  There  are  now standard Internet encodings
               for Japanese [rrr] and Vietnamese [rrr] in partic-
               ular.

          Articles  MUST  not  contain  any octet with value exceeding
          127, i.e. any octet that is not an ASCII character.

               NOTE: This rule, like others, may  be  relaxed  by
               unanimous  consent of the members of a cooperating
               subnet, provided suitable precautions are taken to
               ensure  that  rule-violating  articles do not leak
               out of the subnet.  (This has already been done in
               many  areas  where  ASCII  is not adequate for the
               local language(s).)  Beware that articles contain-
               ing non-ASCII octets in headers are a violation of
               the MAIL specifications and  are  not  valid  MAIL
               messages.   MIME  offers a way to encode non-ASCII
               characters in ASCII for use in headers;  see  sec-
               tion 4.5.

               NOTE: While there is great interest in using 8-bit
               character sets, not all software  can  yet  handle
               them  correctly.  Hence the restriction to cooper-
               ating subnets.  MIME  encodings  can  be  used  to
               transmit  such  characters  while remaining within
               the octet restriction.

          In anticipation of the day when it is possible to  use  non-
          ASCII  characters  safely  anywhere,  and to provide for the
          (substantial) cooperating subnets  that  are  already  using
          them, transmission paths SHOULD treat news articles as unin-
          terpreted sequences of octets (except perhaps for  transfor-
          mations  between  EOL  representations)  and relayers SHOULD
          treat non-ASCII characters in articles as  ordinary  charac-
          ters.

               NOTE:  8-bit  enthusiasts  are warned that not all
               software conforms to  these  recommendations  yet.
               In particular, standard NNTP [rrr] is a 7-bit pro-
               tocol, and  there  may  be  implementations  which
               enforce  this rule.  Be warned, also, that it will
               never be safe to send raw binary data in the  body
               of news articles, because changes of EOL represen-
               tation may (will!) corrupt it.

          Except  where  cooperating  subnets   permit   more   direct
          approaches,  MIME [rrr] headers and encodings SHOULD be used
          to transmit non-ASCII content using  ASCII  characters;  see
          section  4.5, appendix B, and the MIME RFCs for details.  If
          article content can be expressed in  ASCII,  it  SHOULD  be.
          Failing  that, the order of preference for character sets is
          that described in MIME [rrr].

               NOTE: Using the MIME facilities, it is possible to
               transmit ANY character set, and ANY form of binary
               data, using only ASCII characters.  Equally impor-
               tant,  such  articles  are self-describing and the
               reading agent can tell which octet-to-symbol  map-
               ping  is  intended!  Designation of some preferred
               character sets is intended to minimize the  number
               of character sets that a reading agent must under-
               stand in order to display most articles  properly.
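
               NOTE: The effect of the two usual MIME encodings
               on 8-bit content can be illustrated with Python's
               standard quopri and base64 modules, as sketched
               below.  The function name is_seven_bit and the
               sample text are assumptions.  A real article would
               also need MIME headers naming the character set
               and the encoding, which this sketch does not
               generate.

```python
import base64
import quopri


def is_seven_bit(data):
    """True if no octet in data exceeds 127."""
    return all(octet <= 127 for octet in data)


# Sample text in ISO 8859-1; the sample itself is an assumption.
raw = "Gr\u00fc\u00dfe aus K\u00f6ln\n".encode("iso-8859-1")

quoted = quopri.encodestring(raw)   # MIME "quoted-printable"
encoded = base64.encodebytes(raw)   # MIME "base64"
```

               Both encoded forms contain only ASCII octets and
               decode back to the original octets.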

          Articles  containing  non-ASCII  characters,  articles using
          ASCII characters (values 0 through 127)  to  refer  to  non-
          ASCII  symbols, and articles using escape sequences to shift
          character sets SHOULD include MIME headers indicating  which
          character set(s) and conventions are being used, and MUST do
          so  unless  such  articles  are  strictly  confined   to   a
          cooperating subnet which has its own pre-agreed conventions.
          MIME encodings are preferred over all these techniques.   If
          it  comes to a relayer's attention that it is being asked to
          pass an article using such techniques outward across what it
          knows  to  be  the boundary of such a cooperating subnet, it
          MUST report this error to its administrator, and MAY  refuse
          to  pass the article beyond the subnet boundary.  If it does
          pass the article, it MUST re-encode it with  MIME  encodings
          to make it conform to this Draft.

               NOTE:  Such re-encoding is a non-trivial task, due
               to MIME rules such as the  prohibition  of  nested
               encodings.   It's not just a matter of pouring the
               body through a simple filter.

          Reading agents SHOULD note MIME headers and attempt to  show
          the   reader  the  closest  possible  approximation  to  the
          intended content.  They SHOULD not just send the  octets  of
          the  article to the output device unaltered, unless there is
          reason to believe that the output device will indeed  inter-
          pret  them  correctly.   Reading  agents MUST not pass ASCII
          control characters or escape sequences, other than  as  dis-
          cussed above, unaltered to the output device; only by chance
          would the result be the desired one, and  there  is  serious
          potential  for  harmful  side  effects, either accidental or
          malicious.

               NOTE: Exactly what to  do  with  unwanted  control
               characters/sequences  depends on the philosophy of
               the reading agent, but passing  them  straight  to
               the  output device is almost always wrong.  If the
               reading agent wants to mark the presence of such a
               character/sequence  in  circumstances  where  only
               ASCII printable characters are  available,  trans-
               lating  it  to "#" might be a suitable method; "#"
               is a conspicuous character seldom used  in  normal
               text.
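
          As an illustration only, the marking approach suggested in
          the preceding NOTE might be sketched as follows in Python.
          The function name, its parameters, and the choice of tab and
          newline as the permitted control characters are assumptions
          of this sketch, not requirements of this Draft.

```python
def sanitize_for_display(text, allowed="\t\n", marker="#"):
    """Replace ASCII control characters that are not explicitly allowed.

    `allowed` is an assumption of this sketch: which control characters
    are safe depends on the output device and on the rest of the Draft.
    """
    cleaned = []
    for ch in text:
        code = ord(ch)
        is_control = code < 32 or code == 127  # C0 controls and DEL
        if is_control and ch not in allowed:
            cleaned.append(marker)
        else:
            cleaned.append(ch)
    return "".join(cleaned)


if __name__ == "__main__":
    # ESC would start a terminal escape sequence; BEL would ring the bell.
    print(sanitize_for_display("safe\ttext \x1b[2J and \x07"))
```

          A reading agent driving a smarter output device would
          presumably pass a larger "allowed" set; the point is that
          the decision is made by the agent, not by the article.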

               NOTE: Reading agents should be aware that many old
               output devices (or the transmission paths to them)
               zero out the top bit of octets sent to them.  This
               can transform non-ASCII characters into ASCII con-
               trol characters.

          Followup  agents MUST be careful to apply appropriate trans-
          formations of representation to  the  outbound  followup  as
          well  as  the  inbound  precursor.  A followup to an article
          containing non-ASCII material is very likely to contain non-
          ASCII material itself.


          4.5. Non-ASCII Characters In Headers

          All  octets found in headers MUST be ASCII characters.  How-
          ever, it is desirable to have a way  of  encoding  non-ASCII
          characters,  especially  in "human-readable" headers such as
          Subject.  MIME [rrr]  provides  a  way  to  do  this.   Full
          details  may be found in the MIME specifications; herewith a
          quick summary to alert software authors to the issues...

               encoded-word  = "=?" charset "?" encoding "?" codes "?="
               charset       = 1*tag-char
               encoding      = 1*tag-char
               tag-char      = <ASCII printable character except
                               !()<>@,;:\"[]/?=>
               codes         = 1*code-char
               code-char     = <ASCII printable character except ?>

          An encoded word is a sequence of ASCII printable  characters
          that  specifies the character set, encoding method, and bits
          of (potentially) non-ASCII characters.   Encoded  words  are
          allowed  only in certain positions in certain headers.  Spe-
          cific headers impose restrictions on the content of  encoded
          words beyond that specified in this section.  Posting agents
          MUST ensure that any material  resembling  an  encoded  word
          (complete  with  all delimiters), in a context where encoded
          words may appear, really is an encoded word.

               NOTE: The  syntax  is  a  bit  ugly,  but  it  was
               designed  to  minimize  chances  of confusion with
               legitimate header contents, and to satisfy  diffi-
               cult constraints on use within existing headers.

          An  encoded word MUST not be more than 75 octets long.  Each
          line of a header containing encoded word(s) MUST be at  most
          76 octets long, not counting the EOL.

               NOTE:  These  limits are meant to bound the looka-
               head needed to determine whether text that  begins
               "=?" is really an encoded word.

          The  details  of  charsets and encodings are defined by MIME
          [rrr]; the sequence of preferred character sets is the  same
          as  MIME's.   Encoded  words  SHOULD not be used for content
          expressible in ASCII.
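
          A minimal Python sketch of a recognizer for encoded words,
          applying the 75-octet bound, is given below for illustration.
          It assumes that a tag-char is any ASCII printable character
          other than space and !()<>@,;:\"[]/?= and that a code-char
          is any ASCII printable character other than space and "?";
          the names used are hypothetical.  It checks syntax only and
          does not decide whether the context permits encoded words.

```python
import re

PRINTABLE = [chr(c) for c in range(33, 127)]  # space excluded (assumption)
TAG_EXCLUDED = set('!()<>@,;:\\"[]/?=')

# Character classes built from the allowed characters themselves.
TAG_CLASS = "[" + "".join(
    re.escape(c) for c in PRINTABLE if c not in TAG_EXCLUDED) + "]"
CODE_CLASS = "[" + "".join(
    re.escape(c) for c in PRINTABLE if c != "?") + "]"

ENCODED_WORD = re.compile(
    rf"=\?(?P<charset>{TAG_CLASS}+)"
    rf"\?(?P<encoding>{TAG_CLASS}+)"
    rf"\?(?P<codes>{CODE_CLASS}+)\?="
)


def parse_encoded_word(token):
    """Return (charset, encoding, codes), or None if not an encoded word."""
    if len(token) > 75:  # an encoded word is at most 75 octets long
        return None
    match = ENCODED_WORD.fullmatch(token)
    if match is None:
        return None
    return match.group("charset"), match.group("encoding"), match.group("codes")


if __name__ == "__main__":
    print(parse_encoded_word("=?ISO-8859-1?Q?Andr=E9?="))
```

          Because of the length bound, a scanner that has seen "=?"
          never needs to look more than 75 octets ahead before giving
          up and treating the text as ordinary header content.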

          When an encoded word is used, other than in a newsgroup name
          (see  section  5.5),  it MUST be separated from any adjacent
          non-space characters  (including  other  encoded  words)  by
          white  space.   Reading  agents  displaying  the contents of
          encoded words (as opposed  to  their  encoded  form)  should
          ignore white space adjacent to encoded words.

               UNRESOLVED  ISSUE:  Should this section be deleted
               entirely, or made much more terse?   The  material
               is relevant, but too complex to discuss fully.

               NOTE: The deletion of intervening white space per-
               mits using multiple encoded words, implicitly con-
               catenated  by  the  deletion,  to encode text that
               will not fit within a single 75-character  encoded
               word.
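
          The implicit concatenation can be sketched in Python as
          follows.  This is an illustration under stated assumptions:
          the "Q" and "B" encodings are taken from the MIME
          specifications rather than from this Draft, only white space
          lying between two encoded words is dropped (the conservative
          reading), malformed input is not handled, and all names are
          hypothetical.

```python
import base64
import re

# Loose pattern: three "?"-separated fields with no white space inside.
WORD = re.compile(r"=\?([^?\s]+)\?([^?\s]+)\?([^?\s]+)\?=")


def decode_q(codes):
    """Decode MIME "Q" text: "_" is a space, "=XX" is a hex-coded octet."""
    out = bytearray()
    i = 0
    while i < len(codes):
        if codes[i] == "_":
            out.append(0x20)
            i += 1
        elif codes[i] == "=":
            out.append(int(codes[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(codes[i]))
            i += 1
    return bytes(out)


def decode_for_display(value):
    """Decode encoded words in a header value for display to the reader."""
    pieces = []
    pos = 0
    previous_was_encoded = False
    for match in WORD.finditer(value):
        gap = value[pos:match.start()]
        # White space separating two encoded words is dropped, which
        # implicitly concatenates them; any other text is kept.
        if not (previous_was_encoded and gap.strip() == ""):
            pieces.append(gap)
        charset, encoding, codes = match.groups()
        pos = match.end()
        if encoding.upper() == "B":
            raw = base64.b64decode(codes)
        elif encoding.upper() == "Q":
            raw = decode_q(codes)
        else:
            pieces.append(match.group(0))  # unknown encoding: show as-is
            previous_was_encoded = False
            continue
        pieces.append(raw.decode(charset))
        previous_was_encoded = True
    pieces.append(value[pos:])
    return "".join(pieces)
```

          A blank that is meant to appear between two encoded words
          therefore has to be carried inside one of them, for example
          as "_" in the "Q" encoding.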

          Reading-agent  implementors  are  warned  that although this
          Draft completely specifies where encoded words may appear in
          the  headers  it  defines, there are other headers (e.g. the
          MIME Content-Description header) that MAY contain them.


          4.6. Size Limits

          Implementations SHOULD avoid fixed constraints on the  sizes
          of  lines  within  an  article and on the size of the entire
          article.

          Relayers SHOULD treat the body of an article as an  uninter-
          preted  sequence of octets (except as mandated by changes of
          EOL representation and processing of control messages),  not
          to be altered or constrained in any way.

          If  it  is  absolutely  necessary  for  an implementation to
          impose a limit on the length of header lines, body lines, or
          header  logical  lines,  that  limit  shall be at least 1000
          octets, including EOL representations.  Relayers and  trans-
          mission  paths  confronted  with lines beyond their internal
          limits (if any)  MUST  not  simply  inject  EOLs  at  random
          places;  they MAY break headers (as described in 4.2.3) as a
          last resort, and otherwise they MUST either  pass  the  long
          lines  through  unaltered,  or refuse to pass the article at
          all (see section 9.1 for further discussion).

               NOTE: The limit here is essentially the same mini-
               mum  as  that  specified  for SMTP mail in RFC 821
               [rrr].  Implementors are  warned  that  Path  (see
               section  5.6)  and  References  (see  section 6.5)
               headers, in particular, often become several  hun-
               dred  characters  long,  so  1000 is not an overly
               generous limit.
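
          For a relayer that does impose a limit, the choice described
          above reduces to "pass unaltered or refuse".  The following
          Python sketch illustrates that choice; it deliberately omits
          the last-resort option of breaking headers, and the function
          name and return values are assumptions of the sketch.

```python
MIN_LINE_LIMIT = 1000  # smallest permissible limit, EOL included


def relay_decision(lines, line_limit=None):
    """Return "pass" or "refuse" for an article given as a list of lines.

    Each element of `lines` is one line as octets, including its EOL.
    `line_limit` is the relayer's internal limit, or None if it has none.
    """
    if line_limit is None:
        return "pass"
    if line_limit < MIN_LINE_LIMIT:
        raise ValueError("a line limit below 1000 octets is not allowed")
    for line in lines:
        if len(line) > line_limit:
            # Never inject EOLs to make the line fit; refuse instead.
            return "refuse"
    return "pass"


if __name__ == "__main__":
    print(relay_decision([b"Subject: short\r\n", b"x" * 2000 + b"\r\n"], 1000))
```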

          All implementations  MUST  be  able  to  handle  an  article
          totalling  at least 65,000 octets, including headers and EOL
          representations, gracefully and efficiently.  All  implemen-
          tations  SHOULD  be  able  to handle an article totalling at
          least 1,000,000 (one million) octets, including headers  and
          EOL  representations,  gracefully  and efficiently.  "Grace-
          fully and efficiently" is  intended  to  preclude  not  only
          failures,  but also major loss of performance, serious prob-
          lems in error recovery, or resource consumption beyond  what
          is reasonably necessary.

               NOTE:  The intent here is to prohibit lowering the
               existing  de-facto  limit   any   further,   while
               strongly  encouraging  movement  towards  a higher
               one.  Actually, although improvements  are  desir-
               able  in some cases, much news software copes rea-
               sonably well with very large articles.   The  same
               cannot  be said of the communications software and
               protocols used to transmit news from one  host  to
               another, especially when slow communications links
               are  involved.   Occasional  huge  articles   that
               appear now (by accident or through ignorance) typ-
               ically leave trails of  failing  software,  system
               problems,  and irate administrators in their wake.

               NOTE: It is intended that the  successor  to  this
               Draft will raise the "MUST" limit to 1,000,000 and
               the "SHOULD" limit still further.

          Posters SHOULD limit  posted  articles  to  at  most  60,000
          octets,  including  headers  and EOL representations, unless
          the articles are being posted only within a cooperating sub-
          net which is known to be capable of handling larger articles
          gracefully.  Posting agents presented with a  large  article
          SHOULD warn the poster and request confirmation.

               NOTE:  The difference between this and the earlier
               "MUST" limit is margin for header growth,  differ-
               ing  EOL  representations,  and transmission over-
               heads.

               NOTE: Disagreeable though these limits are, it  is
               a fact that in current networks, an article larger
               than 64K (after header growth etc.) simply is  not
               transmitted  reliably.   Note  also  the  comments
               above on the trauma caused  by  single  extremely-
               large articles now; the problems are real and cur-
               rent.  These problems arguably  should  be  fixed,
               but this will not happen network-wide in the imme-
               diate future.  Hence  the  restriction  of  larger
               articles to cooperating subnets, for now.

          Posters  using  non-ASCII characters in their text MUST take
          into account the overhead involved in MIME encoding,  unless
          the  article's  propagation  will  be  entirely limited to a
          cooperating subnet which does not  use  MIME  encodings  for
          non-ASCII  characters.   For  example,  MIME base64 encoding
          involves growth by a factor  of  approximately  4/3,  so  an
          article  which would likely have to use this encoding should
          be at most about 45,000 octets before encoding.
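
          The arithmetic can be made concrete with a short Python
          sketch.  It assumes base64 output lines of 76 characters (the
          MIME maximum) and a two-octet EOL; the function names and the
          default header allowance of zero are illustrative only.

```python
def base64_encoded_size(raw_octets, line_length=76, eol_octets=2):
    """Octets occupied by `raw_octets` of data after base64 encoding."""
    chars = 4 * ((raw_octets + 2) // 3)  # every 3 octets become 4 characters
    lines = (chars + line_length - 1) // line_length
    return chars + lines * eol_octets


def fits_poster_limit(raw_octets, header_octets=0, limit=60000):
    """True if the encoded body plus headers stays within `limit` octets."""
    return header_octets + base64_encoded_size(raw_octets) <= limit


if __name__ == "__main__":
    for size in (43000, 45000):
        print(size, base64_encoded_size(size), fits_poster_limit(size))
```

          Under these assumptions the line breaks add a little to the
          4/3 factor, so a 45,000-octet body comes out slightly above
          60,000 octets once encoded; this is why the figure above is
          given only as "about" 45,000, and headers reduce the margin
          further.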

          Posters SHOULD use  MIME  "message/partial"  conventions  to
          facilitate  automatic  reassembly  of a large document split
          into smaller pieces for posting.  It is recommended that the
          content identifier used should be a message ID, generated by
          the same means as article message IDs (see section 5.3), and
          that  all  parts  should have a See-Also header (see section
          6.16) giving the message IDs of at least the previous  parts
          and preferably all the parts.

               NOTE:  See-Also  is  more correct for this purpose
               than References, although References is in  common
               use  today  (with  less-formal reassembly arrange-
               ments).  MIME reassemblers should probably examine
               articles  suggested  by References headers if See-
               Also headers  are  not  present  to  indicate  the
               whereabouts   of   the   other   parts   of  "mes-
               sage/partial" articles.

          To repeat: implementations SHOULD avoid fixed constraints on
          the  sizes of lines within an article and on the size of the
          entire article.


          4.7. Example

          Here is a sample article:

               From: jerry@eagle.ATT.COM (Jerry Schwarz)
               Path: cbosgd!mhuxj!mhuxt!eagle!jerry
               Newsgroups: news.announce
               Subject: Usenet Etiquette -- Please Read
               Message-ID: <642@eagle.ATT.COM>
               Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
               Followup-To: news.misc
               Expires: Wed, 19 Jan 1994 00:00:00 -0500
               Organization: AT&T Bell Laboratories, Murray Hill

               body
               body
               body



          5. Mandatory Headers

          An article MUST have one, and only one, of each of the  fol-
          lowing headers: Date, From, Message-ID, Subject, Newsgroups,
          Path.

               NOTE: MAIL specifies (if read most carefully) that
               there  must be exactly one Date header and exactly
               one From header, but otherwise does  not  restrict
               multiple  appearances  of  headers.   (Notably, it
               permits  multiple   Message-ID   headers!)    This
               appears  singularly  useless,  or even harmful, in
               the context of news, and much current  news  soft-
               ware  will  not  tolerate  multiple appearances of
               mandatory headers.
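
          A check for the "one, and only one" rule might look like the
          following Python sketch.  It assumes that header names are
          compared without regard to case and that continuation lines
          begin with white space; the function name and the wording of
          its messages are hypothetical.

```python
from collections import Counter

MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def mandatory_header_problems(header_lines):
    """Return a list of complaints; an empty list means the rule is met."""
    counts = Counter()
    for line in header_lines:
        if line[:1] in (" ", "\t"):
            continue  # continuation of the previous header
        name, colon, _content = line.partition(":")
        if colon:
            counts[name.strip().lower()] += 1
    problems = []
    for name in MANDATORY:
        seen = counts[name.lower()]
        if seen != 1:
            problems.append("%s appears %d times" % (name, seen))
    return problems


if __name__ == "__main__":
    sample = [
        "From: jerry@eagle.ATT.COM (Jerry Schwarz)",
        "Path: cbosgd!mhuxj!mhuxt!eagle!jerry",
        "Newsgroups: news.announce",
        "Subject: Usenet Etiquette -- Please Read",
        "Message-ID: <642@eagle.ATT.COM>",
    ]
    print(mandatory_header_problems(sample))  # the Date header is missing
```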

          Note also that there are situations, discussed in the  rele-
          vant  parts  of  section  6,  where  References,  Sender, or
          Approved headers are mandatory.

          In the discussions of the individual headers, the content of
          each is specified using the syntax notation.  The convention
          used is that the content of, for example, the Subject header
          is defined as <Subject-content>.


          5.1. Date

          The  Date header contains the date and time when the article
          was submitted for transmission:

               Date-content  = [ weekday "," space ] date space time
               weekday       = "Mon" / "Tue" / "Wed" / "Thu"
                             / "Fri" / "Sat" / "Sun"
               date          = day space month space year
               day           = 1*2digit
               month         = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun"
                             / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
               year          = 4digit / 2digit
               time          = hh ":" mm [ ":" ss ] space timezone
               timezone      = "UT" / "GMT"
                             / ( "+" / "-" ) hh mm [ space "(" zone-name ")" ]
               hh            = 2digit
               mm            = 2digit
               ss            = 2digit
               zone-name     = 1*( <ASCII printable character except ()\> / space )

          This is a restricted subset of the MAIL date format.

          If a weekday is given, it MUST be consistent with the  date.
          The  modern  Gregorian  calendar  is used, and dates MUST be
          consistent with its usual conventions; for example,  if  the
          month  is  May,  the day must be between 1 and 31 inclusive.
          The year SHOULD be given as four digits, and posting  agents
          SHOULD  enforce this; however, relayers MUST accept the two-
          digit form, and MUST interpret it  as  having  the  implicit
          prefix "19".

               NOTE: Two-digit year numbers can, should, and must
               be phased out by 1999.

          The time is given on  the  24-hour  clock,  e.g.  two  hours
          before  midnight  is  "22:00" or "22:00:00".  The hh must be
          between 00 and 23 inclusive, the mm between 0 and 59  inclu-
          sive, and the ss between 0 and 61 inclusive.

               NOTE:  Leap  seconds  very  occasionally result in
               minutes that are 61 or 62 seconds long.

          The date and time SHOULD be  given  in  the  poster's  local
          timezone,  including  a  specification of that timezone as a
          numeric offset (which SHOULD include the timezone name, e.g.
          "EST",  supplied  in  parentheses  like a MAIL comment).  If
          not, they MUST be given in Universal Time (abbreviated "UT";
          "GMT"  is a historical synonym for "UT").  The timezone name
          in parentheses, if present,  is  a  comment;  software  MUST
          ignore  it, except that reading agents might wish to display
          it to the reader.  Timezone names other than "UT" and  "GMT"
          MUST appear only in the comment.

               NOTE: Attempts to deal with a full set of timezone
               names have all foundered on  the  vast  number  of
               such  names in use and the duplications (for exam-
               ple, there are at least FIVE  different  timezones
               called  "EST"  by somebody).  Even the limited set
               of North American zone names authorized by MAIL is
               subject to confusion and misinterpretation.  Hence
               the flat ban on non-UT timezone  names  except  as
               comments.

               NOTE:  RFC 1036 specified that use of GMT (aka UT,
               UTC) was preferred.  However, the local  time  (in
               the  poster's timezone) is arguably information of
               possible interest to the reader, and this requires
               some indication of the poster's timezone.  Numeric
               offsets are an unambiguous way of doing this,  and
               their  use was indeed sanctioned by RFC 1036 (that
               is, this is a change of preference only).

               NOTE:  There  is  frequent  confusion,   including
               errors  in  some news software, regarding the sign
               of numeric timezones.   Zones  west  of  Greenwich
               have  negative offsets.  For example, North Ameri-
               can Eastern Standard Time is zone -0500 and  North
               American Eastern Daylight Time is zone -0400.

               NOTE:  Implementors  are  warned  that the hh in a
               timezone can go up to about 14; it is not  limited
               to  12.   This  is  because the International Date
               Line does  not  run  exactly  along  the  boundary
               between zone -1200 and zone +1200.
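
          The rules of this section can be gathered into a small
          parser, sketched here in Python for illustration.  It
          assumes that "space" means one or more blanks or tabs and
          that a zone name may contain anything except parentheses and
          backslash; leap seconds are clamped because the library type
          used cannot represent them.  All names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
SP = r"[ \t]+"  # assumed meaning of "space" in the grammar

DATE_RE = re.compile(
    r"(?:(?P<weekday>" + "|".join(WEEKDAYS) + r")," + SP + r")?"
    + r"(?P<day>\d{1,2})" + SP
    + r"(?P<month>" + "|".join(MONTHS) + r")" + SP
    + r"(?P<year>\d{4}|\d{2})" + SP
    + r"(?P<hh>\d{2}):(?P<mm>\d{2})(?::(?P<ss>\d{2}))?" + SP
    + r"(?:UT|GMT|(?P<sign>[+-])(?P<zh>\d{2})(?P<zm>\d{2})"
    + r"(?:" + SP + r"\([^()\\]+\))?)"
)


def parse_date_content(text):
    """Parse Date-content into an aware datetime, or raise ValueError."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError("not valid Date-content: %r" % text)
    year = int(m["year"])
    if len(m["year"]) == 2:
        year += 1900  # two-digit years carry the implicit prefix "19"
    seconds = int(m["ss"] or 0)
    if seconds > 61:
        raise ValueError("seconds out of range")
    offset = timedelta(0)  # "UT" and "GMT"
    if m["sign"]:
        offset = timedelta(hours=int(m["zh"]), minutes=int(m["zm"]))
        if m["sign"] == "-":
            offset = -offset  # zones west of Greenwich are negative
    # datetime() rejects impossible days, hours and minutes; leap
    # seconds (60, 61) are clamped to 59 because it cannot hold them.
    when = datetime(year, MONTHS.index(m["month"]) + 1, int(m["day"]),
                    int(m["hh"]), int(m["mm"]), min(seconds, 59),
                    tzinfo=timezone(offset))
    if m["weekday"] and WEEKDAYS[when.weekday()] != m["weekday"]:
        raise ValueError("weekday is inconsistent with the date")
    return when
```

          Note that the zone name in parentheses is matched and then
          discarded, in keeping with its status as a comment, and that
          a bare zone name such as "EST" in place of the numeric
          offset is rejected outright.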

               NOTE: The comments in section 2.6 regarding trans-
               lation to other languages are relevant here.   The
               Date-content format, and the spellings of its com-
               ponents, as  found  in  articles  themselves,  are
               always as defined in this Draft, regardless of the
               language  used  to  interact  with   readers   and
               posters.  Reading and posting agents should trans-
               late  as  appropriate.   Actually,  even  English-
               language  reading and posting agents will probably
               want to do some degree of translation on dates, if
               only   to   abbreviate   the  lengthy  format  and
               (perhaps) translate to and from the reader's time-
               zone.
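
          Generating the header is simpler than parsing it.  The
          Python sketch below spells out the weekday and month names
          itself rather than relying on locale-dependent library
          formatting, so that the result does not vary with the
          language of the user interface.  The function name and its
          optional "zone_name" parameter are assumptions of the
          sketch.

```python
from datetime import datetime, timedelta, timezone

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def format_date_content(when, zone_name=None):
    """Format an aware datetime as Date-content with a numeric offset."""
    offset = when.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    minutes = int(offset.total_seconds() // 60)
    sign = "-" if minutes < 0 else "+"  # west of Greenwich is negative
    zone_hh, zone_mm = divmod(abs(minutes), 60)
    text = "%s, %d %s %04d %02d:%02d:%02d %s%02d%02d" % (
        WEEKDAYS[when.weekday()], when.day, MONTHS[when.month - 1],
        when.year, when.hour, when.minute, when.second,
        sign, zone_hh, zone_mm)
    if zone_name:
        text += " (%s)" % zone_name  # a comment only; software ignores it
    return text


if __name__ == "__main__":
    eastern_standard = timezone(timedelta(hours=-5))
    posted = datetime(1994, 1, 17, 11, 14, 55, tzinfo=eastern_standard)
    print(format_date_content(posted, "EST"))
```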


          5.2. From

          The  From header contains the electronic address, and possi-
          bly the full name, of the article's author:

               From-content  = address [ space "(" paren-phrase ")" ]
                             /  [ plain-phrase space ] "<" address ">"
               paren-phrase  = 1*( paren-char / space / encoded-word )
               paren-char    = <ASCII printable character except ()<>\>
               plain-phrase  = plain-word *( space plain-word )
               plain-word    = unquoted-word / quoted-word / encoded-word
               unquoted-word = 1*unquoted-char
               unquoted-char = <ASCII printable character except
                               !()<>@,;:\".[]>
               quoted-word   = quote 1*( quoted-char / space ) quote
               quote         = <" (ASCII 34)>
               quoted-char   = <ASCII printable character except "()<>\>
               address       = local-part "@" domain
               local-part    = unquoted-word *( "." unquoted-word )
               domain        = unquoted-word *( "." unquoted-word )

          (Encoded words are described in section 4.5.)  The full name
          is  distinguished  from  the  electronic  address  either by
          enclosing the former in parentheses (making  it  resemble  a
          MAIL  comment, after the address) or by enclosing the latter
          in angle brackets.  The second form is  preferred.   In  the
          first  form, encoded words inside the full name MUST be com-
          posed entirely of <paren-char>s.  In  the  second  form,
          encoded  words  inside the full name may not contain charac-
          ters other than letters (of either case),  digits,  and  the
          characters "!", "*", "+", "-", "/", "=", and "_".  The local
          part is case-sensitive (except that all case counterparts of
          "postmaster"  are  deemed  equivalent),  the domain is case-
          insensitive, and all other parts of  the  From  content  are
          comments  which  MUST  be  ignored  by news software (except
          insofar as reading agents may wish to display  them  to  the
          reader).   Posters  and  posting  agents MUST restrict them-
          selves to this subset of the MAIL From syntax; relayers  MAY
          accept  a  broader subset, but see the discussion in section
          9.1.

               NOTE: The syntax here is a  restricted  subset  of
               the  MAIL  From  syntax, with quoting particularly
               restricted, for simple  parsing.   In  particular,
               the  presence of "<" in the From content indicates
               that the second form is being used, otherwise  the
               first  form is being used.  The major restrictions
               here are those already de-facto imposed by  exist-
               ing software.

               NOTE: Overly-lenient posting agents sometimes per-
               mit the second form with a  full  name  containing
               "("  or  ")",  but it is extremely rare for a full
               name to contain "<" or ">" even in mail.   Accord-
               ingly,  reading  agents wishing to robustly deter-
               mine which form is in use in a particular  article
               should  key on the presence or absence of "<", not
               the presence or absence of "(".
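
          The advice in the preceding NOTE amounts to a very small
          amount of code.  The Python sketch below keys on "<" to
          choose between the two forms; it does no validation of the
          address itself, and the function name and return convention
          are hypothetical.

```python
def split_from_content(content):
    """Return (address, full_name_or_None) from a From header's content."""
    content = content.strip()
    if "<" in content:
        # Second form: [ plain-phrase ] "<" address ">"
        name, _, rest = content.partition("<")
        address = rest.partition(">")[0]
    else:
        # First form: address [ "(" paren-phrase ")" ]
        address, _, rest = content.partition("(")
        name = rest.rpartition(")")[0]
    name = name.strip()
    if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
        name = name[1:-1]  # drop the quotes of a single quoted-word
    return address.strip(), (name or None)


if __name__ == "__main__":
    print(split_from_content("jerry@eagle.ATT.COM (Jerry Schwarz)"))
    print(split_from_content("Jerry Schwarz <jerry@eagle.ATT.COM>"))
```

          A full name containing parentheses, as emitted by an overly
          lenient posting agent, is still handled sensibly because the
          parentheses are never consulted once a "<" has been seen.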

          The address SHOULD be a valid and complete  Internet  domain
          address,  capable  of  being  successfully  mailed  to by an
          Internet host (possibly via an MX record and  a  forwarder).
          The  pseudo-domain  ".uucp" MAY be used for hosts registered
          in the UUCP maps (e.g. name "xyz.uucp" for  registered  site
          "xyz"), but such hosts SHOULD discontinue this usage (either
          by arranging a proper Internet address and forwarder, or  by
          using  the "% hack" (see below)), as soon as possible.  Bit-
          net hosts SHOULD use Internet addresses, avoiding the  obso-
          lescent  ".bitnet"  pseudo-domain.   Other  forms of address
          MUST not be used.

               NOTE: "Other forms" specifically include  UK-style
               "backward"  domains  ("uk.oxbridge.cs"  is  in the
               Czech Republic, not the UK), pure-UUCP  addressing
               ("knee!shin!foot"            instead            of
               "foot%shin@knee.uucp"),  and  abbreviated  domains
               ("zebra.zoo"  instead of "zebra.zoo.toronto.edu").

          If it is necessary to use the local part to specify a  rout-
          ing relative to the nearest Internet host, this MUST be done
          using the "% hack", using "%" as a secondary "@".  For exam-
          ple, to specify that mail to the address should go to Inter-
          net host "foo.bar.edu", then  to  non-Internet  host  "ein",
          then  to  non-Internet  host  "deux",  for delivery there to
          mailbox "fred", a suitable address would be:

               fred%deux%ein@foo.bar.edu

          Analogous forms using "!" in the  local  part  MUST  not  be
          used, as they are ambiguous; they should be expressed in the
          "%" form.

               NOTE: "a!b@c" can be interpreted as either "b%c@a"
               or  "b%a@c",  and there is no consistency in which
               choice is made.  Such addresses  consequently  are
               unreliable.   The  "%"  form  does not suffer from
               this problem, and although its use  is  officially
               discouraged,  it  is  a  de-facto standard, to the
               point that MAIL recognizes it.
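
          The construction of such an address, and the way each host
          along the route is conventionally expected to peel off one
          step, can be sketched in Python as follows.  The function
          names are hypothetical, and the forwarding behaviour shown
          (the rightmost "%" becoming the new "@") is the customary
          interpretation rather than something this Draft defines.

```python
def percent_hack(mailbox, relay_hosts, internet_host):
    """Build an address routed via internet_host, then each relay host.

    `relay_hosts` lists the non-Internet hosts in the order the mail
    visits them; the last one holds the mailbox.
    """
    parts = [mailbox] + list(reversed(relay_hosts))
    for part in parts:
        if "!" in part or "%" in part or "@" in part:
            raise ValueError("unexpected routing character in %r" % part)
    return "%".join(parts) + "@" + internet_host


def next_hop(address):
    """Return the address as the named host would forward it, or None."""
    local_part, _, _host = address.rpartition("@")
    if "%" not in local_part:
        return None  # final delivery: nothing left to route
    remainder, _, next_host = local_part.rpartition("%")
    return remainder + "@" + next_host


if __name__ == "__main__":
    address = percent_hack("fred", ["ein", "deux"], "foo.bar.edu")
    while address is not None:
        print(address)
        address = next_hop(address)
```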

          Relayers MUST not, repeat MUST not, repeat MUST not, rewrite
          From  lines,  in any way, however minor or innocent-seeming.
          Trying to "fix" a non-conforming address  has  a  very  high
          probability  of  making  things worse.  Either pass it along
          unchanged, or reject the article.

               NOTE: An additional reason for banning the use  of
               "!" addressing is that it has a much higher proba-
               bility of being rewritten into mangled unrecogniz-
               ability by old relayers.

          Posters  and  posting agents SHOULD avoid use of the charac-
          ters "!" and "@" in full names, as they may trigger unwanted
          header rewriting by old, simple-minded news software.

               NOTE: Also, the characters "." and ",", not infre-
               quently found in names (e.g., "John  W.  Campbell,
               Jr."), are NOT, repeat NOT, allowed in an unquoted
               word.  A From header like the following  MUST  not
               be written without the quotation marks:

                    From: "John W. Campbell, Jr." <editor@analog.com>



          5.3. Message-ID

          The  Message-ID  header contains the article's message ID, a
          unique identifier  distinguishing  the  article  from  every
          other article:

               Message-ID-content  = message-id
               message-id          = "<" local-part "@" domain ">"

          As  with  From addresses, a message ID's local part is case-
          sensitive and its domain is case-insensitive.  The  "<"  and
          ">"  are  parts  of the message ID, not peculiarities of the
          Message-ID header.

               NOTE: News message IDs are a restricted subset  of
               MAIL message IDs.  In particular, no existing news
               software copes properly with MAIL quoting  conven-
               tions  within  the local part, so they are forbid-
               den.  This is unfortunate, particularly for  X.400
               gateways  that  often  wish  to include characters
               which are not legal in unquoted message  IDs,  but
               it  is  impossible to fix net-wide.  See the notes
               on gatewaying in section 10.

          The domain in the message ID SHOULD  be  the  full  Internet
          domain name of the posting agent's host.  Use of the ".uucp"
          pseudo-domain (for hosts registered in the UUCP maps) or the
          ".bitnet"  pseudo-domain  (for Bitnet hosts) is permissible,
          but SHOULD be avoided.

          Posters and posting agents MUST generate the local part of a
          message ID using an algorithm which obeys the specified syn-
          tax (words separated by ".",  with  certain  characters  not
          permitted)  (see  section  5.2  for  details),  and will not
          repeat itself (ever).  The  algorithm  SHOULD  not  generate
          message  IDs which differ only in case of letters.  Note the
          specification in section 6.5 of a recommended convention for
          indicating  subject  changes.  Otherwise the algorithm is up
          to the implementor.

               NOTE: The crucial use of message IDs is to distin-
               guish  circulating  articles  from  each other and
               from articles circulated recently.  They are  also
               potentially  useful  as  permanent  indexing keys,
               hence the requirement for permanent  uniqueness...
               but   indexers  cannot  absolutely  rely  on  this
               because the earlier RFCs  urged  it  but  did  not
               demand  it.  All major implementations have always
               generated  permanently-unique   message   IDs   by
               design,  but  in  some  cases this is sensitive to
               proper administration,  and  duplicates  may  have
               occurred by accident.

               NOTE:  The most popular method of generating local
               parts is to use the date and time, plus  some  way
               of distinguishing between simultaneous postings on
               the same host (e.g. a process number), and  encode
               them  in a suitably-restricted alphabet.  An older
               but now  less-popular  alternative  is  to  use  a
               sequence  number,  incremented  each time the host
               generates a new message ID; this is workable,  but
               requires  careful  design  to  cope  properly with
               simultaneous  posting  attempts,  and  is  not  as
               robust  in  the presence of crashes and other mal-
               functions.

               NOTE: Some buggy news software  considers  message
               IDs  completely case-insensitive, hence the advice
               to  avoid  relying  on  case  distinctions.    The
               restrictions  placed  on  the  "alphabet" of local
               parts and domains in section 5.2 have  the  useful
               side effect of making it unnecessary to parse mes-
               sage IDs in complex ways to break them into  case-
               sensitive and case-insensitive portions.

          The  local  part of a message ID MUST not be "postmaster" or
          any other string that would compare equal to "postmaster" in
          a  case-insensitive  comparison.   Message  IDs  MUST  be no
          longer than 250 octets, including the "<" and ">".

               NOTE: "Postmaster"  is  an  irksome  exception  to
               case-sensitivity  in  local  parts, inherited from
               MAIL, and simply avoiding it is the  best  way  to
               deal  with it (not that it's likely, but the issue
               needs to be dealt  with).   The  length  limit  is
               undesirable,  but is present in widely-used exist-
               ing software.  The limit is actually  255,  but  a
               small safety margin is wise.
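
          As an illustration of the date-and-time method described
          above, a generator might be sketched in Python as follows.
          The per-process sequence counter, the base-36 alphabet, and
          all names are assumptions of this sketch.  Its uniqueness
          also depends on the host's clock never being set backwards,
          which is a matter of administration rather than of code.

```python
import itertools
import os
import time

_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # one case only
_sequence = itertools.count()  # separates postings within one process


def _base36(number):
    text = ""
    while True:
        number, remainder = divmod(number, 36)
        text = _DIGITS[remainder] + text
        if number == 0:
            return text


def check_message_id(message_id):
    """Raise ValueError if the length or "postmaster" rule is violated."""
    if len(message_id) > 250:
        raise ValueError("message ID longer than 250 octets")
    local_part = message_id[1:].partition("@")[0]
    if local_part.lower() == "postmaster":
        raise ValueError('local part must not be "postmaster"')


def make_message_id(domain, now=None, pid=None):
    """Build "<time.pid.sequence@domain>" with each number in base 36."""
    now = int(time.time()) if now is None else now
    pid = os.getpid() if pid is None else pid
    local_part = ".".join(_base36(n) for n in (now, pid, next(_sequence)))
    message_id = "<%s@%s>" % (local_part, domain)
    check_message_id(message_id)
    return message_id


if __name__ == "__main__":
    print(make_message_id("eagle.att.com"))
```

          Using a single-case alphabet means that no two generated
          message IDs can differ only in case of letters in the local
          part, provided the domain is also written consistently.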


          5.4. Subject

          The  Subject header's content (the "subject" of the article)
          is a short phrase describing the topic of the article:

               Subject-content  = [ "Re: " ] nonblank-text

          Encoded words MAY appear in this header.

          If the article is a followup, the subject SHOULD begin  with
          "Re: "  (a  "back reference").  If the article is not a fol-
          lowup, the subject MUST not begin  with  a  back  reference.
          Back references are case-insensitive, although "Re: " is the
          preferred form.  A followup  agent  assisting  a  poster  in
          preparing a followup SHOULD prepend a back reference, UNLESS
          the subject already begins with one.  If the  poster  deter-
          mines  that  the topic of the followup differs significantly
          from what is described in the subject, a new, more  descrip-
          tive,  subject  SHOULD  be  substituted (with no back refer-
          ence).  An article whose subject begins with a  back  refer-
          ence  MUST  have a References header referencing the precur-
          sor.

               NOTE: A back reference  is  FOUR  characters,  the
               fourth being a blank.  RFC 1036 was confused about
               this.  Observe also that only ONE  back  reference
               should be present.
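
          The back-reference rule for followup agents can be
          sketched as follows.  The function names are hypothetical,
          and the sketch covers only the prepending rule, not the
          poster's option of substituting a new subject.

```python
BACK_REFERENCE = "Re: "   # four characters, the fourth being a blank

def has_back_reference(subject: str) -> bool:
    # Back references are case-insensitive: "RE: " and "re: " count too.
    return subject[:4].lower() == BACK_REFERENCE.lower()

def followup_subject(precursor_subject: str) -> str:
    """Subject a followup agent might offer for a followup."""
    if has_back_reference(precursor_subject):
        # Only ONE back reference should be present.
        return precursor_subject
    return BACK_REFERENCE + precursor_subject
```

          Note that "Re:" without the trailing blank is not treated
          as a back reference by this sketch.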

               NOTE:  There  is a semi-standard convention, often
               used, in which a subject change is flagged by mak-
               ing the new Subject-content of the form:

                    new topic (was: old topic)

               possibly  with  "old  topic"  somewhat  truncated.
               Posters wishing to  do  something  like  this  are
               urged  to  use  this exact form, to simplify auto-
               mated analysis.
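
          As an example of the automated analysis this convention
          permits, the following Python sketch splits such a subject
          into its new and old topics.  The function name and the
          choice to split at the first " (was: " are assumptions of
          this sketch, not part of the convention.

```python
import re

# Non-greedy first group: split at the FIRST " (was: " in the subject.
_WAS = re.compile(r"^(.*?) \(was: (.*)\)$")

def split_subject_change(subject_content: str):
    """Return (new_topic, old_topic); old_topic is None if absent."""
    match = _WAS.match(subject_content)
    if match is None:
        return subject_content, None
    return match.group(1), match.group(2)
```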

          For historical reasons, the  subject  MUST  not  begin  with
          "cmsg " (note that this sequence ends with a blank).

               NOTE:  Some  old  news  software  takes  a subject
               beginning with "cmsg " as an indication  that  the
               article is a control message (see sections 6.6 and
               7).  This mechanism is obsolete  and  undesirable,
               but accidental triggering of it is still possible.

          The subject SHOULD be terse.  Posters SHOULD avoid trying to
          cram  their  entire  article into the headers; even the sim-
          plest query usually benefits  from  a  sentence  or  two  of
          elaboration  and  context, and the details of header display
          vary widely among reading agents.

               NOTE: All-in-the-subject  articles  are  sometimes
               the  result of misunderstandings over the interac-
               tion protocol of a posting agent.  Posting  agents
               might wish to give special attention to the possi-
               bility that a poster specifying a very  long  sub-
               ject  might have thought he was typing the body of
               the article.


          5.5. Newsgroups

          The Newsgroups header's content specifies which newsgroup(s)
          the article is posted to:

               Newsgroups-content  = newsgroup-name
                                     *( ng-delim newsgroup-name )
               newsgroup-name      = plain-component *( "." component )
               component           = plain-component / encoded-word
               plain-component     = component-start *13component-rest
               component-start     = lowercase / digit
               lowercase           = <letter a-z>
               component-rest      = component-start / "+" / "-" / "_"
               ng-delim            = ","

          Encoded words used in newsgroup names MUST not contain char-
          acters other than letters, digits, "+", "-", "/", "_",  "=",
          and "?"  (although they may encode them).

          A  newsgroup  name consists of one or more components, which
          may be plain components or (except for  the  first)  encoded
          words.   A plain component MUST contain at least one letter,
          MUST begin with a letter or digit, and MUST  not  be  longer
          than  14  characters.  The first component MUST begin with a
          letter; subsequent components SHOULD begin  with  a  letter.
          Newsgroup  names  MUST not contain uppercase letters, except
          where required by encodings in encoded words.  The sequences
          "all" and "ctl" MUST not be used as components.

               NOTE:  The  alphabet  and  syntax specified encom-
               passes all  existing  names  of  widespread  news-
               groups,  while  avoiding  various  forms  that are
               known to cause problems.  Important existing soft-
               ware  uses  various non-alphanumeric characters as
               punctuation  adjacent  to  newsgroup  names.   (It
               would,  in  fact,  be  preferable  to ban "+" from
               newsgroup  names,  were  it   not   that   several
               widespread  newsgroups related to the C++ program-
               ming language already use it.)
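
          The MUST-level rules for newsgroup names made up of plain
          components can be checked mechanically.  The following
          Python sketch is one way to do so; the function name is
          hypothetical, encoded words are deliberately not handled,
          and the SHOULD-level preference for components beginning
          with a letter is not reported.

```python
import re

# plain-component = component-start *13component-rest
_PLAIN = re.compile(r"[a-z0-9][a-z0-9+_-]{0,13}")
_RESERVED = {"all", "ctl"}

def plain_name_problems(name: str) -> list:
    """Return a list of problems with a newsgroup name (plain only)."""
    problems = []
    components = name.split(".")
    for comp in components:
        if _PLAIN.fullmatch(comp) is None:
            problems.append(f"{comp!r}: not a valid plain component")
            continue
        if not any(c.isalpha() for c in comp):
            problems.append(f"{comp!r}: contains no letter")
        if comp in _RESERVED:
            problems.append(f"{comp!r}: reserved sequence")
    # Only the FIRST component is required to begin with a letter.
    if components[0][:1].isdigit():
        problems.append("first component must begin with a letter")
    return problems
```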

               NOTE: Much existing software  converts  the  news-
               group  name  into  a directory path and stores the
               articles themselves using  numeric  filenames,  so
               all-digit  name components can be troublesome; the
               "Great Renaming" early in the  history  of  Usenet
               included  revisions  of several newsgroup names to
               eliminate such components.

               NOTE: The same storage technique is the reason for
               the  14-character limit.  The limit is now largely
               historical, since most modern  systems  have  much
               larger limits on the length of a directory entry's
               name, but many old systems are still in use.  Sys-
               tems  with  shorter  limits  also  exist, but news
               software on such systems has had to deal with  the
               problem   already,   since   there   are   several
               widespread newsgroups with 14-character components
               in  their  names.  Implementors are warned that it
               is intended that the successor to this Draft  will
               increase  the 14-character limit, and are urged to
               fix their software to handle longer  names  grace-
               fully  (if  such  fixes  are  necessary, given the
               intended domain of application of  the  particular
               software).

               NOTE:  The requirement that the first character of
               a name be a letter accommodates existing  software
               which assumes it can tell the difference between a
               newsgroup name and other possible syntactic  enti-
               ties  by  inspecting the first character.  Similar
               considerations motivate excluding  "+",  "-",  and
               "_"  from  coming  first  in  a component, and the
               preference for components that do not  begin  with
               digits.   The "all" sequence is used as a wildcard
               symbol in much existing software,  and  the  "ctl"
               sequence  was  involved  in an obsolete historical
               mechanism for marking control  messages,  so  they
               are best avoided.

               NOTE:  Possibly  newsgroup  names should have been
               case-insensitive, but all existing software treats
               them  as  case-sensitive.   (RFC  977 [rrr] claims
               that they are case-insensitive in NNTP, but exist-
               ing  implementations are believed to ignore this.)
               The simplest solution is just to ban use of upper-
               case  letters,  since no widespread newsgroup name
               uses them anyway; this avoids any  possibility  of
               confusion.

               NOTE:  The syntax has the disadvantage of contain-
               ing no white space, making it impossible  to  con-
               tinue  a  Newsgroups  header across several lines.
               Implementors of relayers and  reading  agents  are
               warned  that  it is intended that the successor to
               this Draft will change the definition of  ng-delim
               to:

                    ng-delim = "," [ space ]

               and  are  urged  to  fix  their software to handle
               (i.e., ignore) white space following  the  commas.
               Meanwhile, posters must avoid inserting such space
               (despite  the  natural-language  convention  which
               permits  it)  and  posting  agents should strip it
               out.
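
          A relayer or reading agent might tolerate such white
          space, and a posting agent might strip it, roughly as in
          the following Python sketch.  The function names are
          hypothetical, and treating both blanks and tabs as white
          space is an assumption of this sketch.

```python
def split_newsgroups(content: str) -> list:
    """Split a Newsgroups content, ignoring white space after commas."""
    return [name.lstrip(" \t") for name in content.split(",")]

def strip_newsgroups_space(content: str) -> str:
    """Form a posting agent might emit: no white space after commas."""
    return ",".join(split_newsgroups(content))
```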

               NOTE: Encoded words  as  components  are  somewhat
               problematic,  but are clearly desirable for use in
               non-English-speaking nations.  They are  not  sub-
               ject to the 14-character limit, and this (plus the
               possibility of "/" within them) may  require  spe-
               cial handling in news software.

          Encoded words are allowed in newsgroup names ONLY where non-
          ASCII characters are necessary to the name, and must use the
          "b"  encoding  [rrr] and the first suitable character set in
          the MIME order of preferred character sets [rrr].

               NOTE: Since the  newsgroup  name  is  the  encoded
               form,  NOT the underlying non-ASCII form, there is
               room for terrible confusion here if the choice  of
               encoding  for a particular name is not fully stan-
               dardized.

          Posters SHOULD use only the names of existing newsgroups  in
          the  Newsgroups  header,  because newsgroups are NOT created
          simply by being posted to.  However,  it  is  legitimate  to
          cross-post to newsgroup(s) which do not exist on the posting
          agent's host, provided that at least one of  the  newsgroups
          DOES  exist  there,  and  followup  agents  MUST accept this
          (posting agents MAY accept it, but SHOULD at least alert the
          poster to the situation and request confirmation).  Relayers
          MUST not rewrite Newsgroups headers in any way, even if some
          or all of the newsgroups do not exist on the relayer's host.

               NOTE: Early experience  with  news  software  that
               created  newsgroups  when they were mentioned in a
               Newsgroups header was thoroughly negative: posters
               frequently mistype newsgroup names.

               NOTE:  While it is legitimate for some of an arti-
               cle's newsgroups not to exist on the host where it
               is  posted,  this  IS  a  rather unusual situation
               except in followups (which should go to all  news-
               groups  the  precursor  was posted to, even if not
               all of them reach the site where the  followup  is
               being posted).

               NOTE:   Rewriting   Newsgroups  headers  to  strip
               locally-unknown   newsgroups   is    superficially
               attractive.    However,   early   experience  with
               exactly that policy was thoroughly negative:  news
               propagation   is  more  redundant  and  much  less
               orderly than many people imagine, and in  particu-
               lar  it  is  not  unheard-of  for  the (sometimes)
               fastest path between two (say) U of Toronto  sites
               to  pass  outside  U  of  Toronto... in which case
               newsgroup stripping can cause incomplete  propaga-
               tion.   Having  an  article's  set  of  newsgroups
               change as it propagates can also  result  in  fol-
               lowups  not  achieving the same propagation as the
               original.  It's been tried; it's more trouble than
               it's worth; don't do it.

               NOTE:  In particular, newsgroup stripping superfi-
               cially looks like a solution  to  the  problem  of
               duplicate  regional newsgroup names.  For example,
               both University of Toronto and University of Texas
               have  "ut.general" newsgroups, and material cross-
               posted to that name and a global newsgroup appears
               in  both universities' local newsgroups.  However,
               the side effects  of  stripping  are  sufficiently
               unacceptable  to  disqualify  it for this purpose.
               Don't do it.

          Cross-posting an article to several relevant  newsgroups  is
          far  superior  to  posting separate articles with duplicated
          content to each newsgroup, because reading agents can detect
          the  situation  and  show the article to a reader only once.
          Posters SHOULD cross-post rather than duplicate-post.

               NOTE: On the other hand, cross-posting to a  large
               number  of  newsgroups  usually indicates that the
               poster has not thought about his  audience;  arti-
               cles  are rarely pertinent to more than (say) half
               a dozen newsgroups.  Posting agents might wish  to
               request confirmation when the number of newsgroups
               exceeds (say) five in the presence of a  Followup-
               To  header,  or (say) two in the absence of such a
               header.
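
          The confirmation heuristic suggested in the preceding
          note might be sketched as follows.  The function name and
          parameter names are hypothetical, and the thresholds are
          simply the "(say)" values from the note, which a posting
          agent is free to adjust.

```python
def should_confirm_crosspost(newsgroup_count: int,
                             has_followup_to: bool,
                             limit_with_followup_to: int = 5,
                             limit_without: int = 2) -> bool:
    """True if the poster should be asked to confirm the cross-post."""
    if has_followup_to:
        limit = limit_with_followup_to
    else:
        limit = limit_without
    # Confirmation is requested only when the limit is EXCEEDED.
    return newsgroup_count > limit
```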

               NOTE: One problem with cross-postings is  what  to
               do  with an article cross-posted to a set of news-
               groups including both  moderated  and  unmoderated
               ones.   Posters  tend to expect such an article to
               show up immediately in the unmoderated newsgroups,
               especially if they do not realize that one or more
               of the newsgroups is moderated.  However, since it
               is  not  possible for a moderator to retroactively
               add an already-posted article to a moderated news-
               group,  the only correct action is to mail such an
               article to one (and only one)  of  the  moderators
               for  action.   It is probably best for the posting
               agent to detect this situation and ask the  poster
               what  action is preferred.  The acceptable choices
               are to alter the newsgroup list or to  mail  to  a
               moderator  of  the  poster's  choice;  the posting
               agent should NOT  offer  duplicate-posting  as  an
               easy-to-request  option (if only because many mod-
               erators will reject a submission that has  already
               been posted to unmoderated newsgroups).

               NOTE:  An  article cross-posted to multiple moder-
               ated newsgroups really should have  approval  from
               all  the  moderators  involved.   In practice, the
               only straightforward way to do this is to send the
               article  to  one  of them and have him consult the
               others.

          A newsgroup SHOULD not appear more than once  in  the  News-
          groups header.

          Newsgroup  names  having only one component are reserved for
          newsgroups whose propagation is restricted to a single  host
          (or  the  administrative  equivalent).  It is inadvisable to
          name a newsgroup "poster"  because  that  word  has  special
          meaning  in  the  Followup-To header (see section 6.1).  The
          names "control" and "junk" are frequently used  for  pseudo-
          newsgroups  internal  to  relayer implementations, and hence
          are also best avoided.

               NOTE: Beware of the  duplicate-regional-newsgroup-
               names  problem  mentioned  above.   In particular,
               there are many, many hosts with a newsgroup  named
               "general",  and  some surprising things show up in
               such newsgroups when  people  cross-post.   It  is
               probably  better  to  use  multi-component  names,
               which are less likely to  be  duplicated.   Fred's
               Widget  House should use "fwh.general" rather than
               just  "general"  as  its  in-house  general-topics
               newsgroup.

          It is conventional to reserve newsgroup names beginning with
          "to." for test messages sent  on  an  essentially  point-to-
          point basis (see also the ihave/sendme protocol described in
          section 7.2); newsgroup names beginning  with  "to."  SHOULD
          not be used for any other purpose.  The second (and possibly
          later) components of such a name should, together,  comprise
          the  relayer name (see section 5.6) of a relayer.  The news-
          group exists only at the named relayer  and  its  neighbors.
          The  neighbors all pass that newsgroup to the named relayer,
          while the named relayer does not pass it to anyone.

          The order of newsgroup names in the Newsgroups header is not
          significant.


          5.6. Path

          The Path header's content indicates which relayers the arti-
          cle has  already  visited,  so  that  unnecessary  redundant
          transmission can be avoided:

               Path-content    = [ path-list path-delimiter ] local-part
               path-list       = relayer-name *( path-delimiter relayer-name )
               relayer-name    = 1*rn-char
               rn-char         = letter / digit / "." / "-" / "_"
               path-delimiter  = "!"

          The  Path  content  is a list of relayer names, separated by
          path delimiters, followed (after a final delimiter)  by  the
          local  part of a mailing address.  Each relayer MUST prepend
          its name, and a delimiter, to the Path content in all  arti-
          cles  it processes.  A relayer MUST not pass an article to a
          neighboring relayer whose name is already  mentioned  in  an
          article's  path list, unless this is explicitly requested by
          the neighbor  in  some  way.   The  Path  content  is  case-
          sensitive.

               NOTE:  The Path header supplied by a posting agent
               should normally contain only the local part.   The
               relayer  that the posting agent passes the article
               to for posting will prepend its  relayer  name  to
               get the path list started.

               NOTE:  Observe that the trailing local part is NOT
               part of the path list.  This Path header:

                    Path: fee!fie!foe!fum

               contains three relayer names:  "fee",  "fie",  and
               "foe".  A relayer named "fum" is still eligible to
               be sent this article.
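
          The handling of the Path content described above can be
          sketched in Python as follows.  The function names are
          hypothetical, and the sketch assumes the strict syntax
          given in the grammar (no white space after delimiters).

```python
def split_path(path_content: str):
    """Return (path_list, local_part) for a Path content."""
    # The trailing local part is NOT part of the path list.
    *path_list, local_part = path_content.split("!")
    return path_list, local_part

def may_send_to(path_content: str, neighbor_name: str) -> bool:
    """False if the neighbor is already named in the path list."""
    path_list, _local_part = split_path(path_content)
    # The Path content is case-sensitive, so compare exactly.
    return neighbor_name not in path_list

def prepend_relayer(path_content: str, relayer_name: str) -> str:
    """Path content after a relayer has processed the article."""
    return relayer_name + "!" + path_content
```

          With the example header above, split_path returns the
          three relayer names and the local part "fum" separately,
          so a relayer named "fum" is not excluded.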

               NOTE: This syntax has the disadvantage of contain-
               ing  no  white space, making it impossible to con-
               tinue a Path header across several lines.   Imple-
               mentors  of relayers and reading agents are warned
               that it is intended that  the  successor  to  this
               Draft will change the definition of path delimiter
               to:

                    path-delimiter = "!" [ space ]

               and are urged to  fix  their  software  to  handle
               (i.e.,  ignore) white space following the exclama-
               tion points.  They are urged to hurry;  some  ill-
               behaved  systems  reportedly  already feel free to
               add such white space.

               NOTE: RFC 1036 allows considerably more  flexibil-
               ity  in  choice  of delimiter, in theory, but this
               flexibility has never  been  used  and  most  news
               software  does  not  implement  it  properly.  The
               grammar reflects the current  reality.   Note,  in
               particular,  that  RFC 1036 treats "_" as a delim-
               iter, but in fact it is known to appear in relayer
               names occasionally.

          Because  an  article will not propagate to a relayer already
          mentioned in its path list, the path list MUST  not  contain
          any  names  other  than  those  of  relayers the article has
          passed through AS NEWS.  This is trivially obvious for  nor-
          mal  news  articles, but requires attention from the modera-
          tors of moderated newsgroups and the implementors and  main-
          tainers of gateways.

               NOTE:  For  the  same  reason,  a  relayer and its
               neighbors need to agree on the choice  of  relayer
               name,  and  names  should  not  be changed without
               notifying neighbors.

          Relayer names need to be unique  among  all  relayers  which
          will  ever  see  the articles using them.  A relayer name is
          normally either an "official" name for the host the  relayer
          runs  on,  or  some  other "official" name controlled by the
          same organization.  Except in cooperating subnets that agree
          to  some  other  convention, and don't let articles using it
          escape beyond the subnet, a relayer name MUST  be  either  a
          UUCP  name  registered  in the UUCP maps (without any domain
          suffix such as ".UUCP"), or a complete Internet domain name.
          Use  of a (registered) UUCP name is recommended, where prac-
          tical, to keep the length of the path list down.

          The use of Internet domain names in the path  list  presents
          one problem: domain names are case-insensitive, but the path
          list is case-sensitive.   Relayers  using  domain  names  as
          their  relayer names MUST pick a standard form for the name,
          and use that form consistently to the exclusion of all  oth-
          ers.   The  preferred  form for this purpose, which relayers
          SHOULD use, is the all-lowercase form.

               NOTE: It is arguably  unfortunate  that  the  path
               list is case-sensitive, but it is much too late to
               change this.   Most  Internet  sites  do,  in  any
               event,  use  one  standardized  form of their name
               almost everywhere.

          In the ordinary case, where the poster is the author of  the
          article,  the  local  part following the path list SHOULD be
          the local part of the poster's full Internet domain  mailing
          address.

               NOTE:  It  should  be just the local part, not the
               full address.  The character "@" does  not  appear
               in a Path header.

          The  Path content somewhat resembles a mailing address, par-
          ticularly in the UUCP world with its manual routing and  "!"
          address  syntax.   Historically, this resemblance was impor-
          tant, and the  Path  content  was  often  used  as  a  reply
          address.  This practice has always been somewhat unreliable,
          since news paths are not always mail paths and news  relayer
          names  are  not  always recognized by mail handlers, and its
          reliability has generally worsened  in  recent  times.   The
          widespread use and recognition of Internet domain
          addresses, even outside the  actual  Internet,  has  largely
          eliminated  the  problem.   Readers  SHOULD not use the Path
          content as a reply address.   On  the  other  hand,  relayer
          administrators  are  urged  not  to break this usage without
          good reason; where practical, paths followed by news  SHOULD
          be  traversable  by mail, and mail handlers SHOULD recognize
          relayer names as host names.

          It will typically be difficult or impractical  for  gateways
          and  moderators to supply a Path content that is useful as a
          reply address for the author, bearing in mind that the  path
          list they supply will normally be empty.  (To reiterate: the
          path list MUST not contain any names  other  than  those  of
          relayers  the  article  has  passed  through AS NEWS.)  They
          SHOULD supply a local part that will result in replies to  a
          Path-derived  address  being  returned  to the sender with a
          brief explanation.   Software  permitting,  the  local  part
          "not-for-mail" is recommended.

               NOTE:  A  moderator  or  gateway administrator who
               supplies a local part that delivers such  mail  to
               an  administrative  mailbox  will quickly discover
               why it should be  bounced  automatically!   It  is
               best, however, for the returned message to include
               an explanation  of  what  has  probably  happened,
               rather than just a mysterious "undeliverable mail"
               complaint, since the sender may not be aware  that
               his/her  software  is unwisely using the Path con-
               tent as a reply  address.   Reply  software  might
               wish  to  question  attempts  to  reply to a Path-
               derived address ending in "not-for-mail" (which is
               why a specific name is being recommended here).


          6. Optional Headers

          Many  MAIL  headers,  and many of those specified in present
          and future MAIL extensions, are  potentially  applicable  to
          news.   Headers  specific to MAIL's point-to-point transmis-
          sion paradigm, e.g. To and Cc, SHOULD  not  appear  in  news
          articles.   (Gateways  wishing  to preserve such information
          for debugging probably SHOULD hide it under different names;
          prefixing  "X-"  to  the original headers, resulting in e.g.
          "X-To", is suggested.)

          The following optional headers are either specific  to  news
          or  of particular note in news articles; an article MAY con-
          tain some or all of them.  (Note that there are some circum-
          stances  in  which  some  of  them  are mandatory; these are
          explained under the individual headers.)   An  article  MUST
          not contain two or more headers with any one of these header
          names.

               NOTE: The ban on duplicate header names  does  not
               apply  to  headers  not specified in this Draft at
               all, such as "X-" headers.   Software  should  not
               assume  that  all  header names in a given article
               are unique.


          6.1. Followup-To

          The Followup-To header contents specify  which  newsgroup(s)
          followups should be posted to:

               Followup-To-content = Newsgroups-content / "poster"

          The  syntax  is  the same as that of the Newsgroups content,
          with the exception that the magic word "poster"  means  that
          followups  should  be  mailed to the article's reply address
          rather than posted.  In  the  absence  of  Followup-To,  the
          default  newsgroup(s)  for a followup are those in the News-
          groups header.

               NOTE: The way to request that followups be  mailed
               to  a specific address other than that in the From
               line is  to  supply  "Followup-To: poster"  and  a
               Reply-To header.  Putting a mailing address in the
               Followup-To  line  is  incorrect;  posting  agents
               should reject or rewrite such headers.

               NOTE:   There  is  no  syntax  for  "no  followups
               allowed"  because   "Followup-To: poster"   accom-
               plishes this effect without extra machinery.
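
          The way a followup agent might decide where a followup
          goes can be sketched in Python as follows.  The function
          name and the representation of headers as a dictionary are
          hypothetical.  The sketch also assumes that the reply
          address is the Reply-To content if present and the From
          content otherwise, and that "poster" is matched exactly.

```python
def followup_destination(headers: dict):
    """Return ("mail", address) or ("post", [newsgroup, ...])."""
    followup_to = headers.get("Followup-To")
    if followup_to == "poster":
        # Mail the followup to the article's reply address.
        return "mail", headers.get("Reply-To", headers["From"])
    if followup_to is None:
        # Default: the newsgroups the precursor was posted to.
        followup_to = headers["Newsgroups"]
    return "post", followup_to.split(",")
```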

          Although it is generally desirable to limit followups to the
          smallest reasonable set of newsgroups, especially  when  the
          precursor was cross-posted widely, posting agents SHOULD not
          supply a Followup-To header except at the poster's  explicit
          request.

               NOTE: In particular, it is incorrect for the post-
               ing agent to assume that  followups  to  a  cross-
               posted  article  should  be  directed to the first
               newsgroup only.  Trimming the list  of  newsgroups
               should  be  the poster's decision, not the posting
               agent's.  However, when an article is to be cross-
               posted  to  a considerable number of newsgroups, a
               posting agent might wish to SUGGEST to the  poster
               that followups go to a shorter list.


          6.2. Expires

          The  Expires  header  content specifies a date and time when
          the article is deemed to be no longer useful and  should  be
          removed ("expired"):

               Expires-content = Date-content

          The  content syntax is the same as that of the Date content.
          In the absence of Expires, the default  is  decided  by  the
          administrators  of  each  host  the article reaches, who MAY
          also restrict the extent to which the Expires header is hon-
          ored.

          The Expires header has two main applications: removing arti-
          cles whose utility ends on  a  specific  date  (e.g.,  event
          announcements which can be removed once the day of the event
          is past) and preserving articles expected to be of prolonged
          usefulness  (e.g.,  information  aimed  at  new readers of a
          newsgroup).  The latter  application  is  sometimes  abused.
          Since individual hosts have local policies for expiration of
          news (depending on  available  disk  space,  for  instance),
          posters  SHOULD  not  provide  Expires  headers for articles
          unless there is a natural expiration  date  associated  with
          the  topic.   Posting  agents  MUST  not  provide  a default
          Expires header.  Leave it out and allow local policies to be
          used unless there is a good reason not to.  Expiry dates are
          properly the decision  of  individual  host  administrators;
          posters  and  moderators  SHOULD  set only expiry dates that
          most administrators would agree with.

               NOTE: A poster preparing an Expires header for  an
               article  whose  utility  ends  on  a  specific day
               should typically  specify  the  NEXT  day  as  the
               expiry  date.   A  meeting  on July 7th remains of
               interest on the 7th.
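
          The next-day rule in the preceding note amounts to simple
          date arithmetic, sketched here in Python.  The function
          name is hypothetical, and converting the result into the
          Date content syntax is left out of this sketch.

```python
from datetime import date, timedelta

def expiry_date_for(last_useful_day: date) -> date:
    """Expiry date for an article whose utility ends on a given day."""
    # The article remains of interest ON that day, so expire it the
    # day after.
    return last_useful_day + timedelta(days=1)
```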