From SGML Mon Apr 26 09:32:09 MET DST 1993
Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,
        comp.text.sgml,comp.sys.amiga.multimedia
From: Erik Naggum
Message-ID: <19930426.003@erik.naggum.no>
Date: 26 Apr 1993 03:14:00 +0200
References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
        <19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
        <19930423.072919.779@almaden.ibm.com>
Subject: Re: Looking for Electronic Publshing formats... [long]
Lines: 107
Status: R

[Erik Naggum speaking]
======================

Through the barrage of silliness in the referenced article, a few gems
can still be seen. I think we should focus on them, so I have cut more
than liberally in the article, which IMNSHO the author should have done
before posting it.

[Ian Feldman]
:
| According to my private theory most online-people, weaned on a decade
| of mindlessly-applied WYSIWYG, _intuitively_ associate structure with
| that of typographical richness.
:
| [As] long as we speak of emulating constructs in the real world, we may
| perhaps bear in mind that there are no universal models in strictly-
| electronic/online publishing yet.
:
| I had never intended setext to be used for other than periodic online
| publications of time- and subject-topical kind, where the content to
| graphic-enhancement ratio never were very high anyway. And if past
| experiences are anything to go by, I believe that the concept has
| proven to be successful.
:
| Setext does not _only_ serve the lowest common denominator media, it
| conforms to the constraints of them in order to be readable everywhere.
:
| Setext is, at its simplest, entirely devoid of any visible tags. It
| will thus appear as merely some rigidly-formatted piece of plaintext
| yet at the same time continue to carry the minimal structure (i.e.,
| subdivisions of the whole into parts above the paragraph level, a basic
| outline notation).
:
| [B]ecause setext _is_ limited in scope its overhead can never be
| much above the verified figure of 9%.

I have a few comments on what I consider to be a cause of the confusion
so far, and some final comments on where setext and SGML fit in.

Coombs, Renear and DeRose: Markup Systems and the Future of Scholarly
Text Processing, in CACM 30/11 (1987) pp 933-947, discuss markup systems
in general, and give some valuable pointers to what we might otherwise
take for granted. Our writing system introduces us to punctuational and
presentational means of conveying the structure of information, the
latter including horizontal and vertical spacing, numbering of lists,
paragraphs, chapters and the like.

The article discusses four types of markup: no markup (not even
punctuation), presentational, procedural, and descriptive. The
procedural markup systems associate control words with procedures with
fixed actions. The descriptive markup systems associate markup
constructs with types of elements of the structure.

Against this background, SGML is a purely descriptive markup system, and
as a language, it provides declarations for the elements (among other
things), such that they are akin to variables in other languages. As
elsewhere, the meaning of variable names can differ according to
context, but an SGML system provides the application with a structure of
named elements that it can act upon in any number of ways.

In SGML, the markup is rigid and this is exploited by offering
validation services for SGML documents as part of the processing.
Indeed, the idea that the markup is or is not "valid" independent
of its processing is one of SGML's strongest points.

Compare now setext, which is a sort of cross-breed between punctuational
and presentational markup system with delusions of grandeur if I may put
it in a hostile form.
One of the disadvantages of punctuational markup, as detailed in
the above article, is its complexity and the difficulty of authors to
adhere to conventions. Precisely this complexity will make it very hard
to validate punctuational markup systems, which is all the more
necessary because of the difficulty in adhering to them.

Presentational markup systems suffer from a large variety of means to
convey the same structural information, as found in variations in
indentation, vertical spacing, etc. Since such systems are mainly
intended to be "processed" by the human eye, "validity" will also be
largely undefined for documents in this system.

setext's redeeming quality is that it attaches descriptive semantics to
the punctuational-presentational markup, instead of procedural
semantics, which is definitely the more usual. This still does not make
setext any more a descriptive markup system than a typewriter is a
typesetting system, even if it can use more than one font.

We should diligently strive to call a spade a spade, and not wander off
into unknown territory claiming this and that about breakthroughs in
short-distance mud transportation technologies.

If setext is intended for the ASCII-based on-line publications where
before they had only ASCII, and most readers will continue to have only
ASCII terminals as their primary means of viewing the publications, then
setext may be a good candidate for that specific field of application.
It is quite clear that setext is an end-user presentational gimmick with
high-feature ambitions that may be useful to ill-equipped readers if we
compare it with SGML.

If, however, we compare like with like, we find that "final formatted
form" is explicitly excluded from SGML's field of application, and that
it is central to setext's field of application, so never the twain shall
meet. ("Formatted" was changed to "imaged" in the amendment to the
standard, but the original was better, IMHO.)
Publications who might want to look into several media for
presentation and on-line publication may well find setext to be
suitable for their _output_ needs, in addition to postscript and
DVI files. However, neither of these formats are suitable as
_input_ formats. This is where SGML comes in.

"If you must compare apples and oranges, don't use redness and
smoothness as your criteria ... unless, of course, you're deliberately
trying to mislead." -- Michael A. Padlipsky, 1985

Best regards,
--
Erik Naggum             ISO  8879 SGML       +47 2295 0313
Oslo, Norway            ISO 10744 HyTime
                        ISO  9899 C          Memento, terrigena
                        ISO 10646 UCS        Memento, vita brevis
$$

From ianf Tue Apr 27 01:52:43 MET DST 1993
From: ianf@random.se (Ian Feldman, Current Setext Oracle)
Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,
        comp.text.sgml,comp.sys.amiga.multimedia
Date: Mon, 26 Apr 93 21:12:58 +0200
Message-ID:
X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_03.etx
References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
        <19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
        <19930423.072919.779@almaden.ibm.com> <19930426.003@erik.naggum.no>
Content-Type: setext/plain; charset=ascii_827
Organization: random design -- "Opinions, cheaply"
Lines: 188
Summary: little silliness is what makes this medium human
Subject: Re: Looking for Electronic Publshing formats... [long]
Status: R

  beating the setext drum, proudly
==================================

  by Ian Feldman_

  Rising up in the dead of the night to the defense_ of domains of
  the SGML, Erik Naggum_ delivers a nicely-put summation of the
  hitherto-gained common knowledge of the strengths and weaknesses
  of the structure-enhanced text markup method for online publishing
  that I advocate. Given that I readily admit the limits to the
  usefulness of the setext, esp. in heavy-duty, "professional,"
  applications where nothing but the SGML WILL DO, perhaps nothing
  further needed to be said about it.
  We all realize that should
  ever a different markup be required in similar circumstances, it
  would have to be as complex as is the SGML. Therefore, why bother?

  Still, no less for the sake of completeness before we wrap this
  debate up, I feel that I ought to mention that several of Erik's
  points have indeed been addressed in setext, to the extent that it
  was deemed possible. I have yet to read the referenced CACM
  article_, which sounds like an appropriate enough basis for the
  purpose of this discussion, but the accumulated effect of other
  works in the field that I've read must have inspired me enough to
  come up with such an Obviously Grand-Delusional Scheme To Wrap
  Text Up ;-))

  For starters, both the important document-validation and adherence
  (to punctuational markup; difficulties of) aspects have not passed
  unmentioned in the setext. They are also explicitly documented,
  albeit not in precisely such semantically-correct terms, in the
  two sermons_ that have been issued and are stored at the setext
  archival site_ (which is _not_ the same as the TidBITS listserver,
  see below).

  Erik Naggum says:
-------------------

> In SGML, the markup is rigid and this is exploited by offering
> validation services for SGML documents as part of the processing.
> Indeed, the idea that the markup is or is not "valid" independent
> of its processing is one of SGML's strongest points.

  Agreed. Setext "employs" a technique of _detection_ of whether any
  submitted doc is a setext or not -- I suppose it could be called a
  form of validation prior to decoding. If it does not contain at
  least one valid subhead or title, the document is to be treated
  (and _paged_ through by mechanical means if displayed) as just
  another plaintext. All the other typotags are considered optional
  and of lesser importance so, as much for reasons of simplicity as
  for the need to prevent delays at runtime, they're not subjected
  to checks until decoding. Admittedly this process is a simple one,
  but it's there nevertheless.

> setext, is a sort of cross-breed between punctuational and
> presentational markup system with delusions of grandeur if
> I may put it in a hostile form. One of the disadvantages of
> punctuational markup [...] is its complexity and the difficulty
> of authors to adhere to conventions.

  Precisely in order to address that latter point, effectively to
  _lower_ the threshold of complexity and raise that of input
  tolerance, setext's SOLE elements of importance, the subheads (and
  titles), require but that the writer/publisher create them using a
  monospaced font (which, surprisingly enough, is not a
  widely-understood base prerequisite for all publishing intended
  for online distribution and -consumption).

  Subheads and titles, you may care to recall, are both defined as
  "string of characters, that may be indented by any number of
  spaces, followed by a line ALWAYS beginning in column 1 and made
  up of as many dashes (or equal-signs for the titles) as there is
  the number of _rightmost_visible_ characters in the subhead/title
  above."

  Imagine a "2-line-box" that is anchored firmly at the left margin,
  where the asterisk is below, and that the last _visible_ character
  of the line above it is the plus (+). Grab hold of the latter and
  move freely left-right, never mind any trailing spaces, here
  expressed with underscore characters "_". The all-dash line's
  length is supposed to equal the length of the line above it,
  counted in characters from column 1 to the column of the +

         your possibly-indented subhead here+_________
*--------------------------------------------____

  Indeed, both the first, subhead-string, and the subhead-typotag
  lines can carry trailing, to the eye de facto invisible, white
  space and still be considered valid.
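
  The subhead/title rule and the detection step described above can
  be sketched in a few lines of Python. This is only an illustration
  under stated assumptions, not code from any setext tool: the names
  visible_length, typotag_kind and looks_like_setext are
  hypothetical, and details the text leaves open (tabs, a minimum
  length) are ignored.

```python
from typing import Optional


def visible_length(line: str) -> int:
    """Column of the rightmost visible character, counted from column 1.

    Trailing white space is ignored; leading indentation is counted.
    """
    return len(line.rstrip())


def typotag_kind(text_line: str, tag_line: str) -> Optional[str]:
    """Classify a pair of adjacent lines as 'title', 'subhead' or None.

    Assumption: the tag line must consist only of '=' (title) or '-'
    (subhead), start in column 1, and be exactly as long as the text
    line measured up to its last visible character.
    """
    tag = tag_line.rstrip()  # trailing spaces on the tag line are tolerated
    if not tag or not text_line.strip():
        return None
    if set(tag) == {"="}:
        kind = "title"
    elif set(tag) == {"-"}:
        kind = "subhead"
    else:
        # Mixed characters, or leading spaces (tag not in column 1).
        return None
    return kind if len(tag) == visible_length(text_line) else None


def looks_like_setext(text: str) -> bool:
    """Detection step: at least one valid title or subhead is required."""
    lines = text.splitlines()
    return any(typotag_kind(a, b) for a, b in zip(lines, lines[1:]))
```

  Only the two-line length comparison is strict here; every other
  typotag is left unchecked, which mirrors the statement that those
  are not examined until decoding.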

  Perhaps this may strike someone as needlessly complex and
  confounding, but as the method stands and falls on this _sole_
  instance of non-negotiable adherence to punctuational exactness,
  equal rightmost-visible monospaced length of two lines of which
  one is a tag for the other, I decided to make it as tolerant of
  input errors as possible. After all, either line could end up with
  trailing spaces unbeknownst to the author/publisher.

  As for the need for adherence in regard to the other typotags --
  the setext is explicitly targeted towards _publishers_ of online
  periodic matter, rather than towards (mere ;-)) authors. This is
  an important distinction, since publishers can more readily be
  expected to ensure compliance with a given markup, whether they
  have full validation tools at their disposal or not (or else
  they're "worse publishers" than the other guys). I had never
  considered it generic enough for use by everybody and his brother
  (in-law['s {cousin's /neighbor/}]).

  Whatever it is that makes Erik Naggum, Eliot Kimber_, myself and a
  few other individuals (too few) adapt our writing style to the
  rigours of this narrow-bandwidth medium, hand-code our texts in
  some near-optimal fashion, could therefore be expected of other
  _publishers_ as well, those for whom the setext primarily has been
  defined. We are authors, but we're also publishers. Those that do
  not care how (chaotic) their thoughts may appear online are
  "typing" at best, not writing and definitely not "publishing."
  They post, therefore they exist.

  areas of application
----------------------

> If setext is intended for the ASCII-based on-line publications
> [...] then [it] may be a good candidate for that specific field
> of application. It is quite clear that setext is an end-user
> presentational gimmick with high-feature ambitions that may be
> useful to ill-equipped readers if we compare it with SGML.

  It is intended for precisely that kind of audience and acc.
  to my experiences the ill-equipped readers are the norm in
  RealLife[tm], rather than the exception. It is the Academia and
  the High Commerce that are the freaks here ;-)) So gimmickry,
  smoke and mirrors is what we have to content ourselves with, in
  the absence of more suitable tools.

> If, however, we compare like with like, we find that "final
> formatted form" is explicitly excluded from SGML's field of
> application, and that it is central to setext's field of
> application, so never the twain shall meet.

  Sure, how can we compare something to That Which Has No Equals?

  uses of setext for input
--------------------------

> Publications who might want to look into several media for
> presentation and on-line publication may well find setext to be
> suitable for their _output_ needs, in addition to postscript and
> DVI files. However, neither of these formats are suitable as
> _input_ formats. This is where SGML comes in.

  The last sentence is disputable at best... by virtue of its
  unobtrusive and highly "natural" notation (observe the quotes) the
  setext may be a more appropriate method for hand-encoding of
  _basic_ structure into (limited-size) documents than is the SGML.
  Once done they may later be enhanced further still in
  more-appropriate, validated markup environments. Else this
  wish-list from a known Internet high-flyer figure_ has no basis in
  reality:

> Some concrete things to do for setext:
> - do the setext to troff, setext to postscript, setext to rtf
>   formatters so we can put things onto paper
> - do the setext to RFC format converter so we can write RFCs
>   in setext and then convert them off easily
> - get setext blessed by Postel et al in the IETF so we can
>   do proper MIME stuff
> - nail down the URL business
> - do the setext to HTML bit
> ah, there is plenty of work to be done...

  Delusions of grandeur, eh?

  __Ian "The ``I had the bug narrowed down to a subroutine
         and then I lost all interest'' hacker" Feldman

.. _site file://garbo.uwasa.fi/mac/tidbits/setext/
.. _sermons file://garbo.uwasa.fi/mac/tidbits/setext/?sermon_*
.. _figure (delusions of grandeur or NOT, I am no name-dropper; let him come forward and disclose his own views in person)
.. _defense news:19930426.003@erik.naggum.no
.. _article news:19930426.003@erik.naggum.no/"Coombs, Renear and DeRose"
.. _Naggum (Erik Naggum)
.. _Kimber (Eliot Kimber)
.. _Feldman (Ian Feldman)
$$