# From: ianf@random.se (Ian Feldman, Current Setext Oracle)
# Newsgroups: alt.hypertext (complete, original headers at end of file)
# Date: Sun, 25 Apr 93 22:03:47 +0200
# Message-ID: <a800bb38@random.se>
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_02.etx
# Organization: random design -- "any old TTY will do"
# Reply-To: setext-list-request@random.se
# Lines: 536
# Summary: setext is pure potential embossed in plaintext
# Subject: Re: Looking for Electronic Publshing formats... [long]


  setext and SGML
=================
  by Ian Feldman_

  In his last-referenced <news:19930423.072919.779@almaden.ibm.com>
  article <\news:19930423.072919.779@almaden.ibm.com> Eliot Kimber_
  adds a few timbers_ to the fire in what has now become a wholly
  different debate.  My previous post bore the title "SGML vs setext"
  and was meant to give the setext-illiterati a glimpse of the
  structure-enhanced markup method and to compare its pure usability
  aspects (in online publishing) with those of RTF and SGML, both
  often **taken** to provide roughly the same level of services.  I
  know that it isn't so, but that's beside the point right now, as
  far too many people have <it>figured<\it> it out
  that way.  This article is all about "setext _and_ SGML," to
  underline threads common to both, clarify a few more points and,
  perhaps, muddy up some others. 

  Eliot has gone to some lengths to study the concept of the setext
  as described in the basic introductory document available from the
  TidBITS listserver.  It certainly isn't the most descriptive one
  around and could probably have been made both longer and more
  technical in scope.  That, however, was never its intention; it
  was only meant to give the interested Mr & Ms Public a first
  glance at the method.  At the end of that doc I explicitly welcome
  further inquiries from those parties that are fascinated and
  motivated enough to take time off their busy schedules to call.
  This may to a large degree explain Eliot's inability to approach
  and judge the setext as another <Authoritative!Answer> generalized
  markup method <\Authoritative!Answer> and, instead, declare it to
  be just a mere **formatting** language_ and/ or a more or less
  clever data notation_.  Not so.  There's more to setext than meets
  the eye.  It has been designed this way. 


  Silence. Camera? Action!
--------------------------
  Eliot says: 

> Setext is a *formatting* language.  Except for marking hyperlink
> anchors, it does nothing more than indicate text that is to be
> emphasized.  This is not a criticism of setext, just a statement
> of fact. 

  It is not a <true>statement of fact <\true>, it is an assumption. 
  Setext may be limited but is every bit as generalized a language
  as is the SGML.  It does, however, march to the sound of two
  different drums: that of an Instant-Aha! comprehension_ and the
  ease of implementation.  Setext is not primarily a method to
  format text for online distribution; it uses the constraints (or,
  in scientispeak, the semantics) of the lowest common denominator
  medium to capture (or encode) a limited, but hardly "small,"
  amount of logical structure into linearly-arranged matter.  The
  fact that I had consciously chosen to **label** four out of its 11
  _optional_ typotags_ bold-, italic-, underline- and hot-tt has
  nothing to do with their "weight" or meaning.  For starters, they
  have no intrinsic meaning.  They're only labels and are not
  <repeat> not <\repeat> bound to any strictly-typographical
  constructs, although they ~may~ be treated as such if a front-end
  author so desires. 
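  A minimal Python sketch may make the label-versus-meaning point
  more concrete.  It recognises four typotags (bold-, italic-,
  underline- and hot-tt) and hands each match to whatever rendering
  a front-end chooses.  The regular expressions are an assumption,
  inferred from the way the typotags are used in this very post
  rather than taken from any official setext grammar, and the two
  front-end tables (tty and tagged) are hypothetical.

```python
import re

# Assumed patterns, inferred from how the typotags appear in this post;
# not an official setext grammar. Each alternative has one named group,
# and that group's name is the typotag's label.
TYPOTAG = re.compile(
    r"\*\*(?P<bold>[^*]+)\*\*"                                # bold-tt
    r"|~(?P<italic>[^~]+)~"                                    # italic-tt
    r"|(?<!\w)_(?P<underline>\w+?)_(?!\w)"                     # underline-tt
    r"|(?<!\w)(?P<hot>[A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?!\w)"  # hot-tt
)


def typotags(line):
    """Return (label, text) pairs; multi-word underscores become spaces."""
    return [(m.lastgroup, m.group(m.lastgroup).replace("_", " "))
            for m in TYPOTAG.finditer(line)]


def render(line, styles):
    """Replace each typotag with styles[label](text)."""
    def substitute(m):
        label = m.lastgroup
        return styles[label](m.group(label).replace("_", " "))
    return TYPOTAG.sub(substitute, line)


# Two hypothetical front-ends binding the same labels in different ways.
tty = {"bold": str.upper, "italic": str, "underline": str,
       "hot": lambda text: text + " [*]"}
tagged = {label: (lambda text, label=label: f"<{label}>{text}</{label}>")
          for label in ("bold", "italic", "underline", "hot")}

print(typotags("both **taken** to ~may~ be _and_ Kimber_ adds"))
print(render("**taken** and Kimber_", tty))
print(render("it ~may~ be", tagged))
```

  The same labels come out in upper case on one front-end and as
  SGML-ish elements on the other.  Nothing in the labels themselves
  forces either choice.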

  According to my private theory, most online people, weaned on a
  decade of mindlessly-applied WYSIWYG, _intuitively_ associate
  structure with typographical richness.  We can all debate why this
  is so, but this undeniably is closer to a <statement-of> fact
  <\statement-of> than the previous claim.  In view of that I really
  saw no option but to label the typotags in a fashion that could
  easily be associated with tangible (or visible) effects...  and
  should they later happen to be mistaken for strictly-typographic
  markup then I could envision worse disasters.  The alternatives
  were in any case unacceptable: call them Emphasis- Style-1 -2 -3
  --or, I don't know which would be worse still-- General-Tag-This
  -That -And -So -On. 

  That said, the setext does have one typotag that is more
  typographical in flavour_ than all the others: it is the quote-tt,
  a leading right-chevron/ greater-than character followed by a
  space, which indicates that a line

> has been included in a text from some other source.

  Such instances, if displayed, are to be shown in a monospaced
  font, where multifonts are used, so as to underline their foreign,
  included, status.  Still no default binding to any particular size
  or face.  End of typographic markup. 
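  In code, honouring the quote-tt comes down to spotting such lines
  and grouping consecutive ones into blocks that a multifont
  front-end can set in a monospaced face.  The following is a
  minimal Python sketch.  It takes the rule above literally (a
  leading ">" followed by a space) and makes no attempt at nested
  quoting.  The function names are hypothetical.

```python
from itertools import groupby


def is_quote_tt(line):
    """Quote-tt as described above: a leading '>' followed by a space."""
    return line.startswith("> ")


def quote_blocks(lines):
    """Group consecutive lines into ('quote', [...]) and ('text', [...]) runs.

    The quote-tt prefix is stripped from quoted lines; a front-end with
    several fonts could then set each 'quote' run in a monospaced face.
    """
    blocks = []
    for quoted, run in groupby(lines, key=is_quote_tt):
        run = list(run)
        if quoted:
            blocks.append(("quote", [line[2:] for line in run]))
        else:
            blocks.append(("text", run))
    return blocks


sample = [
    "  that indicates that a line",
    "> has been included in a text from some other source.",
    "  Such instances, if displayed, are to be shown",
]
for kind, run in quote_blocks(sample):
    print(kind, run)
```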


  Euphemism Alert
-----------------

> SGML languages, at their purest, are *NOT* formatting languages,
> but are intended to capture the logical structure of the content
> *as it relates to constructs in the real world*, in exactly the
> same way that object-oriented programs and user interfaces are
> intended to model the properties and behaviors of objects in the
> real world. 

  Which the latter are nowhere near good at, when you think of it,
  but that's another matter.  Still, as long as we speak of emulating
  **constructs in the real world** we may bear in mind that, apart
  from print media, there are no universal/ valid/ U-name-it models
  in strictly-electronic/ online publishing yet.  Therefore using the
  SGML to model something that's chaotic at best, intangible at
  worst, is not in itself any better than using the setext.  At
  least the latter enforces a certain degree of universal online
  order where none previously existed.  With the SGML you are
  certainly _creating_ order, but ONLY if the source text is
  submitted to an available, dedicated front-end.


> It is clear from Ian's responses and the comparison chart below
> that he has not fully grasped this distinction and is thus
> comparing apples (setext, RTF, etc.) to oranges (SGML languages
> that do not contain or model formatting semantics).

  So that we all may hereafter sleep soundly at night, I hereby
  declare that I have indeed grasped that distinction fully, or at
  least to the extent that it is graspable without extensive
  reprogramming at some SGML summer camp ("Sun, Fun and SGML, too"
  ;-)).  And whether or not it is wholly inappropriate to compare the
  SGML with the lesser mortals, we could probably agree that (a) the
  three previously listed are all graphic markup languages, (b) RTF
  is often considered a viable alternative to the SGML because of
  its potential market saturation factor and (c) how else but via
  open-minded comparison can the public ever arrive at valid
  conclusions?

  In addition to all that I claim that the setext, though limited in
  scope, is every bit as generalized as is the SGML.


  Advantages weighted and classified
------------------------------------

> SGML as a data notation, divorced from any application semantics,
> has certain advantages and disadvantages.  Its chief advantage is
> that it is a true standard.  Its chief disadvantage is that it is
> a complex notation that does take some programming and processing
> power to fully process. 

  We all agree on its advantages.  The setext has come about to
  address what could be called the effect of the disadvantages of
  it, and of other visually-obtrusive coding methods, when applied
  to simple text that should be easy to read everywhere.


> Setext as a data notation has certain advantages and
> disadvantages.  Its chief advantage is that it is simple (in stark
> contrast to RTF, for example).  Its chief disadvantages are that,
> being simple, it is somewhat limited, and not being a standard,
> cannot be relied on to the same degree that standard notations 
> can be. 

  All true, albeit its limitations are the price that online
  publishers pay for the privilege of having a text that's both
  universally readable AND structure-enhanced at the same time.  I
  never intended the setext to be used for other than periodic
  online publications of the time- and subject-topical kind, where
  the content:graphic-enhancement ratio has never been very high
  anyway.  And if past experience is anything to go by, I believe
  that the concept has proven to be successful.


> The second aspect of the comparison is the richness of semantic
> expression.  SGML has unlimited potential for semantic expression,
> in other words, capturing the details of what a given bit of
> information is *about*, not how it looks or should be processed. 
> This is the distinction between truly generic markup and
> typesetting schemes that have some indirection in them. 

  Nolo contendere.  Italian for "pass the ice-cream bowl, please."
  On the other hand, ASCII publishing has a larger graphic potential
  than is generally realized.  Most people, thinking in terms of
  richtext WYSIWYG, never even try to exercise the options that are
  at their disposal...  creative use of tables, logical text
  constructs, delimiters, pseudo-graphics etc.  Yet if Vladimir
  Mayakovsky could, why can't they?  So in view of that I can't see
  the SGML route as other than pure overkill for strictly small-size
  online-publishing use.


  The parrot is dead, anyway
----------------------------

> Setext does not meet this definition of generic markup, and as
> such, is not appropriate or useful for applications that need true
> generic markup. 

    Yes it does.
      No it doesn't
    Yes it does.
      No it doesn't
    Does.
      Doesn't.
    Does.
      Doesn't.
    Does, does, does, does.
      Doesn't, doesn't, doesn't, doesn't.

    Anyway, the parrot has long been dead, we can all agree on
    that ;-))


> I make the assumption that anyone who goes to the trouble to use
> or to consider using SGML for text and data management needs the
> full power of generic markup, *whether they realize it or not*. 

  It may well be true, but is the opposite equally so?

    "Anyone considering use of generic markup needs the full
     power of the SGML, **whether they realize it or not**"

  This rhetorical question is here pre-answered for you: NOT.
  BTW, it's an Authoritative Answer from the
  Authoritative-Answers Server[tm].


> Since the original question about online formats was asked in
> comp.text.sgml, I made the assumption that an SGML solution was
> desired.  I then attempted to show how setext could be made to be
> an SGML solution. 

  Funny you should mention it: I saw the original as being
  directed _primarily_ to alt.hypertext (first on the Newsgroups:
  line of Greg_ R Block's posting and its follow-ups) and so I replied
  there. I do not partake in discussions in the sgml forums as
  I consider myself an amateur at best in this respect.


  Catering to the lowest common denominator
-------------------------------------------

> Note that setext has, from my point of view, somewhat limited
> utility because what it does *only serves the lowest common
> denominator*.  SGML solutions can serve the whole range of
> applications, from the simplest to the most complex. 

  Alas, this is yet another unfortunate effect of your original
  first-sight dismissal of setext as non-generalized markup, and
  therefore of lesser interest.  Setext does not **only** serve the
  lowest common denominator media; it conforms to their constraints
  in order to be readable everywhere.  There is nothing in it
  preventing the arrival of an advanced implementation, say a
  richtext/ indexing/ you-name-it browser, post-processor and
  incremental-database front-end.  As with the SGML, the beauty of it
  is in the eye of the beholder, though, unlike with the SGML, it is
  never dependent on the very _presence_ of a browser.

  In fact, were I to attempt to categorize it in relation to the
  SGML, I'd definitely underline the fact that both are generalized
  methods, but only the setext can ALSO be "consumed" in the lowest
  common denominator state.  More technically, I believe that
  --setext's limited scope readily admitted-- the main difference
  between the two is the SGML's potential of being decoded according
  to different DTDs in the same front-end.  This will probably never
  happen with the setext, since browsers for it would then have to
  become as complex as those for the SGML.  Rather, it should be
  said that each dedicated setext front-end will be an instance
  implementation of a single, hardcoded DTD.  Er...  I hope it all
  makes logical sense, as it does here ;-))
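  The notion of a front-end as an instance implementation of one
  hardcoded DTD can be sketched in Python.  Everything in it is an
  assumption made for illustration.  The element names (setext,
  title, subhead, p) stand in for a hypothetical fixed DTD.  Titles
  and subheads are taken to be lines underlined with "=" and "-"
  respectively.  Other text is grouped into paragraphs at blank
  lines, and inline typotags are left untouched.

```python
import re
from html import escape

# The single, hardcoded "DTD" of this hypothetical front-end: each
# structural unit maps to exactly one element name.
ELEMENT = {"title": "title", "subhead": "subhead", "para": "p"}
UNDERLINE = {"=": "title", "-": "subhead"}


def parse(text):
    """Split setext-style text into (unit, content) pairs."""
    lines = text.splitlines()
    units, para = [], []

    def flush():
        if para:
            units.append(("para", " ".join(para)))
            para.clear()

    i = 0
    while i < len(lines):
        line = lines[i].strip()
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        under = re.fullmatch(r"([=-])\1+", nxt)
        if line and under:          # a title or subhead plus its underline
            flush()
            units.append((UNDERLINE[under.group(1)], line))
            i += 2
            continue
        if line:
            para.append(line)
        else:                       # a blank line ends a paragraph
            flush()
        i += 1
    flush()
    return units


def to_sgml(text):
    """Emit the units as elements of the one DTD this front-end knows.

    Assumes that DTD declares the amp/lt/gt entities used for escaping.
    """
    body = "".join(
        f"<{ELEMENT[unit]}>{escape(content, quote=False)}</{ELEMENT[unit]}>\n"
        for unit, content in parse(text)
    )
    return f"<setext>\n{body}</setext>\n"


sample = """  setext and SGML
=================
  by Ian Feldman

  Silence. Camera? Action!
--------------------------
  Eliot says: R&D
"""
print(to_sgml(sample))
```

  Supporting a second DTD would mean a second mapping and, sooner or
  later, a DTD parser.  Hardcoding the one mapping is what keeps such
  a front-end small.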


  Points taken & reconsidered
-----------------------------

>>  Eliot Kimber_ of IBM has declared_ it to be "a very
>>  primitive, obviously easy to implement and interchange."

> This is a statement of fact.

  This is not a statement of fact.  It is an informed, if somewhat
  hastily arrived at, and in my view mistaken, opinion.

> Compared to other languages for the structuring of information for
> online retrieval and presentation, setext is primitive.  IBM has
> invented a 300+ element_ language with complex semantics [....]. 
> If setext is not primitive compared to this, I obviously don't
> understand the meaning of the word "primitive". 

  Apparently not.  The word "primitive" is, apart from meaning
  "simple," also a value judgement, and an unflattering one, of which
  you cannot have been unaware.  Now, that is a statement of <fact>
  fact <\fact>.


> Again, this is not to say that setext is not useful, just that in
> terms of the functions it provides and the semantics it captures,
> it cannot compare to other systems of much greater complexity,
> including the OSF, Docbook, IBMIDDoc, and Davenport designs, not to
> mention other HyTime-based work such as the various IETM projects
> and other industry-specific applications like the ATA and CALS
> work. 

  No, indeed not, but can _they_ provide a solution for the net-
  worked have-nots, perpetually squeezed-in between the ever
  NEWER-FASTER-SLEEKER-BETTER hi-tech and the trend$y lo-tech of
  the haves? Setext is an example of what I call ADEQUA-TECH[tm],
  the bicycle-like vehicle among the gas-guzzling monsters on
  that internetwork of ours.  No, I don't fancy cars either. 


>>  contrast, SGML et al judged through the bias of human-readable-
>>  text/ ASCII will appear unduly complex and mostly inaccessible to
>>  anyone having but the lowest common denominator hardware/ software
>>  at their disposal (80% of all users? 90%?)

> I beg to differ.  As the folks at Exoterica will attest, SGML can
> be made no less human readable than setext.

  I am not familiar with what those folks at Exoterica might be
  doing, but if you are thinking of the shortrefs, which I take to
  be a minimal-size embedded notation that is expanded later via
  aliases? in the DTD or equivalent, then, clearly, we're talking
  about two different strategies.  Setext is, at its simplest,
  entirely devoid of any visible tags.  It will thus appear as merely
  some rigidly-formatted piece of plaintext, yet at the same time
  continue to carry the minimal structure (== subdivisions of the
  whole into parts above the paragraph level, a basic outline
  notation).  No amount of SGML-minimizing can approach that.
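  Recovering that basic outline from untagged plaintext takes only a
  few lines.  The following is a minimal Python sketch.  It assumes,
  as in this very post, that a title is a line underlined with "="
  and a subhead a line underlined with "-".  The function name and
  the two-level numbering are illustrative only, and the underline's
  length is not checked.

```python
import re

# Assumed levels: "=" underlines a title, "-" underlines a subhead.
LEVEL = {"=": 1, "-": 2}


def outline(text):
    """Return (level, heading) for every non-blank line whose next line
    consists solely of '=' or '-' characters (at least two of them)."""
    lines = [line.strip() for line in text.splitlines()]
    headings = []
    for line, nxt in zip(lines, lines[1:]):
        underline = re.fullmatch(r"([=-])\1+", nxt)
        if line and underline:
            headings.append((LEVEL[underline.group(1)], line))
    return headings


post = """  setext and SGML
=================
  by Ian Feldman_

  Silence. Camera? Action!
--------------------------
  Eliot says:

  Euphemism Alert
-----------------
"""
for level, heading in outline(post):
    print("  " * (level - 1) + heading)
```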


  Who has missed whose point and vice-versa
-------------------------------------------

>>>   SGML Source --> SGML2SETEXT --> setext --> setext viewer

>>  It strikes me as no little ironic that in order to view enhanced
>>  plaintext (i.e. the setext) in a basic-structured manner, say an
>>  outline of the submitted text, one would have to first encode it
>>  with SGML, then pipe it through a filter with a DTD acronym thrown

> You have missed my point.  The point was not to first encode an
> setext document into SGML to then immediately transform it back
> into setext, but to use an setext viewer to view *any* existing
> SGML document by mapping that document into setext.  Remember that
> setext is a *formatting notation* and that is the way I have used
> it here. 

  He, he....  I have not missed your point, you have missed mine. 
  Setext is an alternative to SGML in certain applications, where
  neither the generalized complexity, nor the cost of the SGML route
  can be justified.  Mapping SGML-notation documents onto setext,
  as you say, is missing the point up to a point, and twice over. 
  For one, such setext viewer(s) would then have to be much more
  complex than otherwise would be the case.  For another, adding
  SGML-markup to source texts automatically lowers the overall
  comprehensibility of these, shortrefs or no refs.  Setext is
  designed _also_ to cater to all those among ourselves that, even
  though they may have ready tools at their disposal, would rather
  "type" or "cat" files as they come in, because of the force of
  habit, procrastination or choice.  After all, if you know that 
  the text in question is a setext, thus readable anyway, then why 
  should you bother to launch, enter, load in, access &c just for 
  the privilege of casting an eye on **some** text?


  Cost of basic structure encoding
----------------------------------

>> I'd have thought that the opposite would be an altogether
>> more-agreeable solution:

>>     plaintext --> setext --> setext2SGML --> SGML viewer

> There's nothing wrong with this path, but it misses the point made
> above. 

  I beg to differ, not least for strictly cost-of-basic-encoding
  reasons.  Not having studied it in depth I should perhaps not
  deliver any official prognoses, but it appears to me that basic
  hand-encoding of setext can be much, much cheaper than
  enSGMLising.


>>  Obviously, Kimber has all the resources at his beck and call and
>>  expects that others will have them too.  We may all yearn to become

> I have no more resources than anyone else with a desktop computer,
> access to the Internet, and a C compiler, at least for the purposes
> of this discussion.

  Well, perhaps your outlook in relation to text encoding standards
  would have changed had you but had an asynchronous, 2400bps uucp
  connection to a mainframe at your disposal and never enough time
  to master a C compiler, much less make it perform ;-))


> I haven't proposed anything that can't be done by anyone who can
> write a little C code to Windows, or XWindows, or Mac, can
> integrate ARCSGML or SGMLS into a program, and has a computer to
> run it on.  This may require cleverness and skill, but not
> resources out of the ordinary. 

  PARSED! -->ANOTHER INSTANCE OF INTERNET-HI-FLIER ATTITUDE!<--


> I didn't suggest that everyone license DynaText or buy Omnimark
> or get a RISC machine.  That's clearly what big enterprises with
> big problems and big budgets need to do.  But someday, probably
> very soon, that degree of power will be available to everyone
> and will be the lowest common denominator.

  We'll cross that bridge when we come to it. In the meantime....


>>  1Mbit/sec-access high-flyers of the Internet, but in the meantime
>>  many of us have to make do with but Have-A-Mac and never enough
>>  funding to equip it with enough RAM to satisfy our needs.

> You've obviously never tried to order RAM within IBM :-).

  No, but I could hardly think it more taxing than having to pay
  for it out of your own pocket.


  Easy-O-Meter[tm]
------------------

>> ______________  ___________ RTF  ___________ SGML  __________ setext
>>    generalized               no               YES                yes
>>        markup?

> I would argue that, from my argument above, setext does not
> qualify as the same sort of generalized markup that SGML enables. 
> Setext is generalized to the degree that there is an indirect
> mapping between the setext codes and their actual presentation
> effect, but it does not capture information semantics the way 
> SGML languages can and do. 

  No, of course not, but then I **did** weight the YES in favour
  of the SGML. Or did you think that that uppercase YES appeared
  there for no reason at all? By mistake?


>> --------------  ---------------  ----------------  -----------------
>> #typographical  a finite set     unlimited set     3 typographical
>> tags employed?                                     1 hypertextual

> In the sort of SGML languages I consider worth talking about,
> there are no "typographical" tags at all, because SGML languages
> capture information semantics, not formatting.  SGML data
> structured with languages of this sort is no more typographical
> than SGML databases. 

  Right, agreed.  An unfortunate choice of words, made in an attempt
  to make it more easily understandable to the world at large.  Try
  to sell SGML to any small-time publisher and the first thing they'll
  inquire about will be its ability to express styles.


>> --------------  ---------------  ----------------  -----------------
>>   tag overhead  +25%?            +30%?             +9% (verified)

> With SGML's tag minimization features, SGML can have no more
> overhead than setext.  I've seen some of the things the Exoterica
> folks have done with shortrefs and datatag, and it's pretty
> amazing (gives me the willies, though). 

  Of course, this is a point that will vary widely with the
  application.  Yet because the setext _is_ limited in scope, its
  overhead can never be much above the verified figure of 9%.
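  One way to measure tag overhead on one's own texts is sketched
  below in Python.  The definition is an assumption, because the post
  does not say how the 9% was arrived at.  Overhead is taken as the
  characters the markup adds, relative to the length of the same
  text with the markup removed.  Only title/subhead underline lines
  and the bold-, italic-, underline- and hot-tt typotags are counted.
  The sketch only measures; it is not meant to reproduce the 9%
  figure.

```python
import re

# What counts as markup here is an assumption: title/subhead underline
# lines plus four inline typotags. Order matters: underline-tt is
# stripped before hot-tt so that "_word_" is not read as a hot-tt.
# Inner underscores of a multi-word underline-tt stand in for spaces,
# so they are kept and not counted.
MARKUP = [
    (re.compile(r"^(?:={2,}|-{2,})[ \t]*$\n?", re.M), ""),  # underline lines
    (re.compile(r"\*\*([^*]+)\*\*"), r"\1"),                 # bold-tt
    (re.compile(r"~([^~]+)~"), r"\1"),                       # italic-tt
    (re.compile(r"(?<!\w)_(\w+?)_(?!\w)"), r"\1"),           # underline-tt
    (re.compile(r"\b([A-Za-z0-9]+(?:_[A-Za-z0-9]+)*)_(?!\w)"), r"\1"),  # hot-tt
]


def strip_markup(text):
    """Return the text with the assumed markup removed."""
    for pattern, replacement in MARKUP:
        text = pattern.sub(replacement, text)
    return text


def overhead(text):
    """Markup characters as a percentage of the stripped text's length."""
    plain = strip_markup(text)
    return 100 * (len(text) - len(plain)) / len(plain)


sample = "  Title\n=======\n  **bold** and Kimber_\n"
print(strip_markup(sample))
print(f"{overhead(sample):.1f}%")
```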


  Is it over yet? I have a date
-------------------------------

> I want to emphasize that my intent is not to denigrate setext,
> because it is, as I've said, an elegant solution to a difficult
> problem.  My intent is only to emphasize the difference in focus
> and approach to the solutions to the sorts of problems setext
> solves and the full range of problems that SGML can solve, and
> that a comparison between notations like setext and fully-generic
> SGML languages is not a valid comparison. 

  Nor have I taken it as such; on the contrary, open debate is
  always preferable to no debate.  At the very least it allowed me
  to cast a light on the setext, in a not-too-inappropriate forum,
  be it just a "notation" or another generalized graphic markup 
  method.  OK, OK, "bask in the enlightened light of the SGML"... 
  have it your own way ;-))


__Ian Feldman <ianf@random.se> "Those that do not understand
      Unix are bound to invent it, poorly" --Henry Spencer

   $$

.. The sharp-eyed among you will note that the here-appended notation
.. of hypertextual anchor expansion differs from that served in the
.. previous post of mine (news:a7fd5104@random.se).  Indeed, this
.. portion of the setext is still largely undefined and so I am
.. experimenting with various methods to arrive at an optimal
.. solution to the problem.  Ye sharp-minded 'uns will also 
.. immediately understand the reason for those changes.
.. _typotags (so named because at worst they will appear as mere typos in text)
.. _timbers (hey! I like the timbre of that sentence ;-))
.. _notation news:19930423.072919.779@almaden.ibm.com/"one of many similar schemes"
.. _language news:19930423.072919.779@almaden.ibm.com/"it does nothing more"
.. _flavour (well, if gluons can come in flavours so why not the typotags?)
.. _element (bully for you)
.. _comprehension ("ability to understand")
.. _Kimber <drmacro@ralvm13.vnet.ibm.com> (Eliot Kimber)
.. _Greg news:1qn588INN27o@uwm.edu
.. _Feldman <ianf@random.se> (Ian Feldman, Current Setext Oracle)
..


# original headers, suppressed on account of appearing AFTER a twodot-tt
# From ianf Sun Apr 25 23:42:30 MET DST 1993
# Path: random.se!ianf
# From: ianf@random.se (Ian Feldman)
# Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,comp.text.sgml,comp.sys.amiga.multimedia
# Date: Sun, 25 Apr 93 22:03:47 +0200
# Message-ID: <a800bb38@random.se>
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_02.etx
# X-URL: file://ftp.ifi.uio.no/pub/SGML/comp.text.sgml/by.msgid/a800bb38@random.se
# X-Aftp: <garbo.uwasa.fi> :/mac/tidbits/setext/setext_concepts_Aug92.etx
# References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
#             <19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
#             <a7fd5104@random.se> <19930423.072919.779@almaden.ibm.com>
# Content-Type: setext/plain; charset=ascii_827
# Organization: random design -- "any old TTY will do"
# Lines: 507
# Summary: setext is pure potential embossed in plaintext
# Subject: Re: Looking for Electronic Publshing formats... [long]