From SGML Mon Apr 26 09:32:09 MET DST 1993
Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,
	comp.text.sgml,comp.sys.amiga.multimedia
From: Erik Naggum <SGML@ifi.uio.no>
Message-ID: <19930426.003@erik.naggum.no>
Date: 26 Apr 1993 03:14:00 +0200
References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
	<19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
	<a7fd5104@random.se> <19930423.072919.779@almaden.ibm.com>
	<a800bb38@random.se>
Subject: Re: Looking for Electronic Publshing formats... [long]
Lines: 107
Status: R


[Erik Naggum speaking]
======================

Through the barrage of silliness in the referenced article, a few gems
can still be seen.  I think we should focus on them, so I have cut more
than liberally in the article, which IMNSHO the author should have done
before posting it.

[Ian Feldman]
:
|   According to my private theory most online-people, weaned on a decade
|   of mindlessly-applied WYSIWYG, _intuitively_ associate structure with
|   that of typographical richness.
:
|   [As] long as we speak of emulating constructs in the real world, we may
|   perhaps bear in mind that there are no universal models in strictly-
|   electronic/online publishing yet.
:
|   I had never intended setext to be used for other than periodic online
|   publications of time- and subject-topical kind, where the content to
|   graphic-enhancement ratio never were very high anyway.  And if past
|   experiences are anything to go by, I believe that the concept has
|   proven to be successful.
:
|   Setext does not _only_ serve the lowest common denominator media, it
|   conforms to the constraints of them in order to be readable everywhere.
:
|   Setext is, at its simplest, entirely devoid of any visible tags.  It
|   will thus appear as merely some rigidly-formatted piece of plaintext
|   yet at the same time continue to carry the minimal structure (i.e.,
|   subdivisions of the whole into parts above the paragraph level, a basic
|   outline notation).
:
|   [B]ecause setext _is_ limited in scope its overhead can never be
|   much above the verified figure of 9%.

I have a few comments on what I consider to be a cause of the confusion so
far, and some final comments on where setext and SGML fit in.

Coombs, Renear and DeRose : Markup Systems and the Future of Scholarly Text
Processing, in CACM 30/11 (1987) pp 933-947, discuss markup systems in
general, and give some valuable pointers to what we might otherwise take
for granted.  Our writing system introduces us to punctuational and
presentational means of conveying the structure of information, the latter
including horizontal and vertical spacing, numbering of lists, paragraphs,
chapters and the like.  The article discusses four types of markup:
no markup (not even punctuation), presentational, procedural, and
descriptive.  The procedural markup systems associate control words with
procedures with fixed actions.  The descriptive markup systems associate
markup constructs with types of elements of the structure.

Against this background, SGML is a purely descriptive markup system, and as
a language, it provides declarations for the elements (among other things),
such that they are akin to variables in other languages.  As elsewhere, the
meaning of variable names can differ according to context, but an SGML
system provides the application with a structure of named elements that it
can act upon in any number of ways.  In SGML, the markup is rigid and this
is exploited by offering validation services for SGML documents as part of
the processing.  Indeed, the idea that the markup is or is not "valid"
independent of its processing is one of SGML's strongest points.

Compare now setext, which is a sort of cross-breed between punctuational
and presentational markup system with delusions of grandeur if I may put it
in a hostile form.  One of the disadvantages of punctuational markup, as
detailed in the above article, is its complexity and the difficulty of
authors to adhere to conventions.  Precisely this complexity will make it
very hard to validate punctuational markup systems, which is all the more
necessary because of the difficulty in adhering to them.  Presentational
markup systems suffer from a large variety of means to convey the same
structural information, as found in variations in indentation, vertical
spacing, etc.  Since such systems are mainly intended to be "processed" by
the human eye, "validity" will also be largely undefined for documents in
this system.  setext's redeeming quality is that it attaches descriptive
semantics to the punctuational-presentational markup, instead of procedural
semantics, which is definitely the more usual.

This still does not make setext any more a descriptive markup system than a
typewriter is a typesetting system, even if it can use more than one font.

We should diligently strive to call a spade a spade, and not wander off
into unknown territory claiming this and that about breakthroughs in
short-distance mud transportation technologies.  If setext is intended for
the ASCII-based on-line publications where before they had only ASCII, and
most readers will continue to have only ASCII terminals as their primary
means of viewing the publications, then setext may be a good candidate for
that specific field of application.  It is quite clear that setext is an
end-user presentational gimmick with high-feature ambitions that may be
useful to ill-equipped readers if we compare it with SGML.  If, however, we
compare like with like, we find that "final formatted form" is explicitly
excluded from SGML's field of application, and that it is central to
setext's field of application, so never the twain shall meet.  ("Formatted"
was changed to "imaged" in the amendment to the standard, but the original
was better, IMHO.)

Publications that might want to look into several media for presentation and
on-line publication may well find setext to be suitable for their _output_
needs, in addition to postscript and DVI files.  However, neither of these
formats is suitable as an _input_ format.  This is where SGML comes in.

"If you must compare apples and oranges, don't use redness and smoothness
as your criteria ... unless, of course, you're deliberately trying to
mislead." 				-- Michael A. Padlipsky, 1985

Best regards,
</Erik>
--
Erik Naggum                 ISO  8879 SGML                   +47 2295 0313
Oslo, Norway                ISO 10744 HyTime
<erik@naggum.no>            ISO  9899 C                 Memento, terrigena
<SGML@ifi.uio.no>           ISO 10646 UCS             Memento, vita brevis
 $$


From ianf Tue Apr 27 01:52:43 MET DST 1993
From: ianf@random.se (Ian Feldman, Current Setext Oracle)
Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,
	comp.text.sgml,comp.sys.amiga.multimedia
Date: Mon, 26 Apr 93 21:12:58 +0200
Message-ID: <a80200d3@random.se>
X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_03.etx
References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
	<19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
	<a7fd5104@random.se> <19930423.072919.779@almaden.ibm.com>
	<a800bb38@random.se> <19930426.003@erik.naggum.no>
Content-Type: setext/plain; charset=ascii_827
Organization: random design -- "Opinions, cheaply"
Lines: 188
Summary: little silliness is what makes this medium human
Subject: Re: Looking for Electronic Publshing formats... [long]
Status: R


  beating the setext drum, proudly
==================================
  by Ian Feldman_

  Rising up in the dead of the night to the defense_ of the domains
  of SGML, Erik Naggum_ delivers a nicely-put summation of the
  hitherto-gained common knowledge of the strengths and weaknesses
  of the structure-enhanced text markup method that I advocate for
  online publishing use. 

  Given that I readily admit the limits to usefulness of the setext,
  esp.  in heavy-duty, "professional," applications where nothing
  but the SGML WILL DO, perhaps nothing further needed to be said
  about it.  We all realize that should ever a different markup be
  required in similar circumstances, it would have to be as complex
  as is the SGML.  Therefore, why bother?

  Still, no less for the sake of completeness before we wrap this
  debate up, I feel that I ought to mention that several of Erik's
  points have indeed been addressed in setext, to the extent that it
  was deemed possible.  I have yet to read the referenced CACM
  article_, which sounds like an appropriate enough basis for the
  purpose of this discussion, but the accumulated effect of other
  works in the field that I've read must have inspired me enough to
  come up with such an Obviously Grand-Delusional Scheme To Wrap
  Text Up ;-))

  For starters, both the important document-validation and adherence
  (to punctuational markup; difficulties of) aspects have not passed
  unmentioned in the setext.  They are also explicitly documented,
  albeit not in precisely such semantically-correct terms, in the
  two sermons_ that have been issued and are stored at the setext
  archival site_ (which is _not_ the same as the TidBITS listserver,
  see below). 


  Erik Naggum says:
-------------------

> In SGML, the markup is rigid and this is exploited by offering
> validation services for SGML documents as part of the processing. 
> Indeed, the idea that the markup is or is not "valid" independent
> of its processing is one of SGML's strongest points. 

  Agreed.  Setext "employs" a technique of _detection_ whether any
  submitted doc is a setext or not -- I suppose it could be called a
  form of validation prior to decoding.  If it does not contain at
  least one valid subhead or title, the document is to be treated
  (and _paged_ through by mechanical means if displayed) as just
  another plaintext.  All the other typotags are considered optional
  and of lesser importance so, as much for reasons of simplicity, as
  for the need to prevent delays at runtime, they're not subjected
  to checks until decoding.  Admittedly this process is a simple
  one, but it's there nevertheless. 
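
  The detection step described above might be sketched roughly as
  follows in Python.  This is only an illustration and not part of
  any setext tool: the function and pattern names are hypothetical,
  and the length rule applied is the subhead/title definition given
  later in this message, read here as "the underline is as long as
  the heading counted from column 1, trailing spaces ignored".  Tabs
  are assumed not to occur.

```python
import re

# Assumption: a typotag line starts in column 1 and consists only of
# "=" (title) or "-" (subhead), optionally followed by trailing spaces.
UNDERLINE = re.compile(r"^(=+|-+)\s*$")


def looks_like_setext(text: str) -> bool:
    """Return True if the text holds at least one valid title or subhead."""
    lines = text.splitlines()
    # Walk over every pair of adjacent lines: (candidate heading, candidate tag).
    for heading, underline in zip(lines, lines[1:]):
        if not heading.strip():
            continue  # an empty line cannot be a heading
        same_length = len(underline.rstrip()) == len(heading.rstrip())
        if UNDERLINE.match(underline) and same_length:
            return True
    # No heading found: treat the document as ordinary plaintext.
    return False
```

  A front end could call this once per incoming document and fall
  back to plain paging whenever it returns False, leaving the other
  typotags unchecked until decoding.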


> setext, is a sort of cross-breed between punctuational and
> presentational markup system with delusions of grandeur if
> I may put it in a hostile form.  One of the disadvantages of
> punctuational markup [...] is its complexity and the difficulty 
> of authors to adhere to conventions. 

  Precisely in order to address that latter point, effectively
  to _lower_ the threshold of complexity and raise that of input
  tolerance, setext's SOLE elements of importance, the subheads (and
  titles), require but that the writer/ publisher create them using
  a monospaced font (which, surprisingly enough, is not a widely
  understood base prerequisite for all publishing intended for
  online distribution and consumption).  

  Subheads and titles, you may care to recall, are both defined as
  "string of characters, that may be indented by any number of
  spaces, followed by a line ALWAYS beginning in column 1 and made
  up of as many dashes (or equal-signs for the titles) as there is
  the number of _rightmost_visible_ characters in the subhead/ title
  above."

  Imagine a "2-line-box" that is anchored firmly at the left margin,
  where the asterisk is below, and that the last _visible_
  character of the line above it is the plus (+).  Grab hold of
  the latter and move freely left-right, never mind any trailing
  spaces, here expressed with underscore characters "_".  The
  all-dash line's length is supposed to equal the length of the
  line above it, counted in characters from column 1 to the column
  of the +:

         your possibly-indented subhead here+_________
*--------------------------------------------____

  Indeed, both the first, subhead-string, and the subhead-typotag
  lines can carry trailing, to the eye de facto invisible, white
  space and still be considered valid.  Perhaps this may strike
  someone as needlessly complex and confounding but as the method
  stands and falls on this _sole_ instance of non-negotiable
  adherence to punctuational exactness, equal rightmost-visible
  monospaced length of two lines of which one is a tag for the
  other, I decided to make it as tolerant of input errors as
  possible.  After all, either line could end up with trailing
  spaces unbeknownst to the author/ publisher. 
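
  The tolerance for trailing white space can be illustrated with a
  small Python sketch.  It is one possible reading of the rule, not
  a reference implementation; the function name is made up for the
  example, and tabs are assumed not to occur in either line.

```python
from typing import Optional


def heading_kind(text_line: str, tag_line: str) -> Optional[str]:
    """Classify two adjacent lines as "title", "subhead", or None."""
    # Trailing white space is invisible to the eye, so drop it on both lines.
    text = text_line.rstrip()
    tag = tag_line.rstrip()
    if not text.strip() or not tag:
        return None  # nothing visible to tag, or no tag line at all
    # Lengths are counted from column 1, so any indentation of the
    # heading text counts towards the required length of the tag line.
    if len(tag) != len(text):
        return None
    # A tag line that does not begin in column 1 contains a space and
    # therefore fails both of the checks below.
    if set(tag) == {"="}:
        return "title"
    if set(tag) == {"-"}:
        return "subhead"
    return None


if __name__ == "__main__":
    heading = "     a possibly-indented subhead"
    # Trailing spaces on either line do not affect the outcome.
    print(heading_kind(heading + "    ", "-" * len(heading) + "  "))
```

  Stripping only the right-hand side of each line is what makes the
  check forgiving about stray trailing spaces while still holding
  the two visible lengths to exact equality.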

  As for the need for adherence in regard to the other typotags --
  the setext is explicitly targeted towards _publishers_ of online
  periodic matter, rather than towards (mere ;-)) authors.  This is
  an important distinction, since publishers can more readily be
  expected to ensure compliance with a given markup, whether they
  have full validation tools at their disposal or not (or else
  they're "worse publishers" than the other guys).  I had never
  considered it generic enough for use by everybody and his brother
  (in-law['s {cousin's /neighbor/}]).  Whatever it is that makes
  Erik Naggum, Eliot Kimber_, myself and a few other individuals
  (too few) adapt our writing style to the rigours of this narrow-
  bandwidth medium, hand-code our texts in some near-optimal fashion,
  could therefore be expected of other _publishers_ as well, those
  for whom the setext primarily has been defined.  We are authors,
  but we're also publishers.  Those that do not care how (chaotic)
  their thoughts may appear online are "typing" at best, not writing
  and definitely not "publishing." They post, therefore they exist. 


  areas of application
----------------------

> If setext is intended for the ASCII-based on-line publications
> [...] then [it] may be a good candidate for that specific field 
> of application.  It is quite clear that setext is an end-user
> presentational gimmick with high-feature ambitions that may be
> useful to ill-equipped readers if we compare it with SGML. 

  It is intended for precisely that kind of audience and, according
  to my experience, the ill-equipped readers are the norm in
  RealLife[tm], rather than the exception.  It is the Academia and
  the High Commerce that are the freaks here ;-))  So gimmickry,
  smoke and mirrors are what we have to content ourselves with, in
  the absence of more suitable tools. 

> If, however, we compare like with like, we find that "final
> formatted form" is explicitly excluded from SGML's field of
> application, and that it is central to setext's field of
> application, so never the twain shall meet. 

  Sure, how can we compare something to That Which Has No Equals?


  uses of setext for input
--------------------------

> Publications that might want to look into several media for
> presentation and on-line publication may well find setext to be
> suitable for their _output_ needs, in addition to postscript and
> DVI files.  However, neither of these formats is suitable as an
> _input_ format.  This is where SGML comes in. 

  The last sentence is disputable at best...  by virtue of its
  unobtrusive and highly "natural" notation (observe the quotes) the
  setext may be a more appropriate method for hand-encoding of
  _basic_ structure into (limited-size) documents than is the SGML. 
  Once done they may later be enhanced further still in more
  appropriate, validated markup environments.  Else this wish-list
  from a known Internet high-flyer figure_ has no basis in reality:

> Some concrete things to do for setext:

> - do the setext to troff, setext to postscript, setext to rtf
>   formatters so we can put things onto paper
> - do the setext to RFC format converter so we can write RFCs 
>   in setext and then convert them off easily
> - get setext blessed by Postel et al in the IETF so we can
>   do proper MIME stuff
> - nail down the URL business
> - do the setext to HTML bit

> ah, there is plenty of work to be done...

   Delusions of grandeur, eh?


__Ian "The ``I had the bug narrowed down to a subroutine and then
       I lost all interest'' hacker" Feldman <ianf@random.se>

.. _site file://garbo.uwasa.fi/mac/tidbits/setext/
.. _sermons file://garbo.uwasa.fi/mac/tidbits/setext/?sermon_*
.. _figure (delusions of grandeur or NOT, I am no name-dropper; let him come forward and disclose his own views in person)
.. _defense news:19930426.003@erik.naggum.no
.. _article news:19930426.003@erik.naggum.no/"Coombs, Renear and DeRose"
.. _Naggum <erik@naggum.no> (Erik Naggum)
.. _Kimber <drmacro@ralvm13.vnet.ibm.com> (Eliot Kimber)
.. _Feldman <ianf@random.se> (Ian Feldman)

 $$