# From: ianf@random.se (Ian Feldman, Current Setext Oracle)
# Newsgroups: alt.hypertext (complete, original headers at end of file)
# Date: Sun, 25 Apr 93 22:03:47 +0200
# Message-ID:
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_02.etx
# Organization: random design -- "any old TTY will do"
# Reply-To: setext-list-request@random.se
# Lines: 536
# Summary: setext is pure potential embossed in plaintext
# Subject: Re: Looking for Electronic Publshing formats... [long]

setext and SGML
=================
by Ian Feldman_

In his last-referenced article
<\news:19930423.072919.779@almaden.ibm.com> Eliot Kimber_ adds a
few timbers_ to the fire in what has now become a wholly different
debate. My previous post bore the title "SGML vs setext" and was
meant to provide a glimpse of the structure-enhanced markup method
for the setext-illiterati and to compare pure usability aspects of
it (in online publishing) to those of RTF and SGML, both often
**taken** to provide roughly the same level of services. I know
that it isn't so, but that's beside the point right now, as far
too many people have figured<\it> it out that way.

This article is all about "setext _and_ SGML," to underline
threads common to both, clarify a few more points and, perhaps,
muddy up some others.

Eliot has gone to some lengths to study the concept of the setext
as that described in the basic introductory document available
from the TidBITS listserver. It certainly isn't the most
descriptive one around and could probably have been made both
longer and more technical in scope. That, however, has never been
its intention, which was only to provide the interested Mr & Ms
Public with a first glance at the method. At the end of that doc I
explicitly welcome further inquiries from those parties that are
fascinated and motivated enough to take time off their busy
schedules to call.
This may to a large degree explain Eliot's inability to approach
and judge the setext as another generalized markup method
<\Authoritative!Answer> and, instead, declare it to be just a mere
**formatting** language_ and/or a more or less clever data
notation_. Not so. There's more to setext than meets the eye. It
has been designed this way.

Silence. Camera? Action!
--------------------------
Eliot says:

> Setext is a *formatting* language. Except for marking hyperlink
> anchors, it does nothing more than indicate text that is to be
> emphasized. This is not a criticism of setext, just a statement
> of fact.

It is not a statement of fact <\true>, it is an assumption. Setext
may be limited but is every bit as generalized a language as is
the SGML. It does, however, march to the sound of two different
drums: that of Instant-Aha! comprehension_ and that of ease of
implementation. Setext is not primarily a method to format text
for online distribution; it uses the constraints (or, in
scientispeak, the semantics) of the lowest common denominator
medium to capture (or encode) a limited, but hardly "small,"
amount of logical structure into linearly-arranged matter.

The fact that I had consciously chosen to **label** four out of
its 11 _optional_ typotags_ bold-, italic-, underline- and hot-tt
has nothing to do with their "weight" or meaning. For starters,
they have no intrinsic meaning. They're only labels and are not
not <\repeat> bound to any strictly-typographical constructs,
although they ~may~ be treated as such if a front-end author so
desires. According to my private theory most online-people, weaned
on a decade of mindlessly-applied WYSIWYG, _intuitively_ associate
structure with typographical richness. We can all debate why this
is so, but this undeniably is closer to a fact <\statement-of>
than the previous claim.
In view of that I really saw no option but to label the typotags
in such a fashion that they could easily be associated with
tangible (or visible) effects... and should they later happen to
be mistaken for strictly-typographic markup then I could envision
worse disasters. The alternatives were in any case unacceptable:
call them Emphasis-Style-1 -2 -3 --or, I don't know which would be
worse still-- General-Tag-This -That -And -So -On.

That said, the setext does have one typotag that is more
typographical in flavour_ than are all the others: it is the
quote-tt, a leading right-chevron/ bigger-than-character, followed
by a space, that indicates that a line
> has been included in a text from some other source.
Such instances, if displayed, are to be shown in a monospaced
font, where multifonts are used, so as to underline their foreign,
included, status. Still no default binding to any particular size
or face. End of typographic markup.

Euphemism Alert
-----------------
> SGML languages, at their purest, are *NOT* formatting languages,
> but are intended to capture the logical structure of the content
> *as it relates to constructs in the real world*, in exactly the
> same way that object-oriented programs and user interfaces are
> intended to model the properties and behaviors of objects in the
> real world.

Which the latter are nowhere near good at, when we think of it,
but that's another matter. Still, as long as we speak of emulating
**constructs in the real world** we may perhaps bear in mind that,
apart from print media, there are no universal/ valid/ U-name-it
models in strictly-electronic/ online publishing yet. Therefore
using the SGML to model something that's chaotic at best,
intangible at worst is not in itself any better than using the
setext. At least the latter enforces a certain degree of universal
online order, where none previously existed.
With the SGML you are certainly _creating_ order, but ONLY if
source text is submitted to an available, dedicated front-end.

> It is clear from Ian's responses and the comparison chart below
> that he has not fully grasped this distinction and is thus
> comparing apples (setext, RTF, etc.) to oranges (SGML languages
> that do not contain or model formatting semantics).

So that we all may hereafter sleep soundly at night I hereby
declare that I have grasped that distinction fully, or at least to
the extent that it is graspable without extensive reprogramming at
some SGML summer camp ("Sun, Fun and SGML, too" ;-)). And whether
it may be wholly inappropriate to compare the SGML with the lesser
mortals or not, we could probably agree that (a) the three
previously listed are all graphic markup languages, (b) RTF is
often considered a viable alternative to the SGML because of its
potential market saturation factor and (c) how else, but via
open-minded comparison can the public ever arrive at valid
conclusions? In addition to all that I claim that the setext,
though limited in scope, is every bit as generalized as is the
SGML.

Advantages weighted and classified
------------------------------------
> SGML as a data notation, divorced from any application semantics,
> has certain advantages and disadvantages. It's chief advantage is
> that it is a true standard. It's chief disadvantage is that it is
> a complex notation that does take some programming and processing
> power to fully process.

We all agree on its advantages. The setext has come about to
address what could be called the effect of the disadvantages of it
and other visually-obtrusive coding methods, applied to simple
text that should be easy to read everywhere.

> Setext as a data notation has certain advantages and
> disadvantages. Its chief advantage is that it is simple (in stark
> contrast to RTF, for example).
> Its chief disadvantages are that,
> being simple, it is somewhat limited, and not being a standard,
> cannot be relied on to the same degree that standard notations
> can be.

All true, albeit its limitations are the price that online
publishers pay for the privilege of having a text that's both
universally readable AND structure-enhanced at the same time. I
had never intended the setext to be used for other than periodic
online publications of time- and subject-topical kind, where the
content:graphic-enhancement ratio has never been very high anyway.
And if past experiences are anything to go by, I believe that the
concept has proven to be successful.

> The second aspect of the comparison is the richness of semantic
> expression. SGML has unlimited potential for semantic expression,
> in other words, capturing the details of what a given bit of
> information is *about*, not how it looks or should be processed.
> This is the distinction between truly generic markup and
> typesetting schemes that have some indirection in them.

Nolo contendere. Italian for "pass the ice-cream bowl, please." On
the other hand the ASCII publishing has a larger graphic potential
than is commonly realized. Most people, in their mind thinking of
the richtext WYSIWYG, never even try to execute the options that
are at their disposal... creative use of tables, logical text
constructs, delimiters, pseudo-graphics etc. Yet if Vladimir
Mayakovsky could why can't they? So in view of that I can't see
the SGML route as other than pure overkill for strictly small-size
online-publishing use.

The parrot is dead, anyway
----------------------------
> Setext does not meet this definition of generic markup, and as
> such, is not appropriate or useful for applications that need true
> generic markup.

Yes it does. No it doesn't. Yes it does. No it doesn't. Does.
Doesn't. Does. Doesn't. Does, does, does, does. Doesn't, doesn't,
doesn't, doesn't.
Anyway, the parrot has long been dead, we can all agree on
that ;-))

> I make the assumption that anyone who goes to the trouble to use
> or to consider using SGML for text and data management needs the
> full power of generic markup, *whether they realize it or not*.

It may well be true, but is the opposite equally so? "Anyone
considering use of generic markup needs the full power of the
SGML, **whether they realize it or not**" This rhetorical question
is here preanswered for you: NOT. BTW, it's an Authoritative
Answer from the Authoritative-Answers Server[tm].

> Since the original question about online formats was asked in
> comp.text.sgml, I made the assumption that an SGML solution was
> desired. I then attempted to show how setext could be made to be
> an SGML solution.

Funny you should mention it, I saw the original as being directed
_primarily_ to alt.hypertext (first on the Newsgroups: line of
Greg_ R Block's posting and following) and so I replied there. I
do not partake in discussions in the sgml forums as I consider
myself an amateur at best in this respect.

Catering to the lowest common denominator
-------------------------------------------
> Note that setext has, from my point of view, somewhat limited
> utility because what it does *only serves the lowest common
> denominator*. SGML solutions can serve the whole range of
> applications, from the simplest to the most complex.

Alas, this is yet another unfortunate effect of your original
first-sight dismissal of setext as non-generalized markup,
therefore of lesser interest. Setext does not **only** serve the
lowest common denominator media, it conforms to the constraints of
them in order to be readable everywhere. There is nothing in it
preventing arrival of an advanced implementation, say a richtext/
indexing/ you-name-ing browser, post-processor and
incremental-database front-end.
As with the SGML the beauty of it is in the eye of the beholder,
though never, as it is with the SGML, dependent on the very
_presence_ of a browser. In fact, were I to attempt to categorize
it in relation to the SGML, I'd definitely underline the fact that
both are generalized methods but only the setext provides an
ability to be "consumed" ALSO in the lowest common denominator
state.

More technically, I believe that --setext's limited scope readily
admitted-- the main difference between the two is the SGML's
potential of being decoded according to different DTDs in the same
front-end. This will probably never happen with the setext, since
the implementations of browsers for it would then have to become
as complex as those for the SGML. Rather, it should be said, that
each dedicated setext front-end will be an instance implementation
of a single, hardcoded DTD. Er... I hope it all makes logical
sense, as it does here ;-))

Points taken & reconsidered
-----------------------------
>> Eliot Kimber_ of IBM has declared_ it to be "a very
>> primitive, obviously easy to implement and interchange."

> This is a statement of fact.

This is not a statement of fact. It is an informed, if somewhat
hastily arrived at, and in my view mistaken, opinion.

> Compared to other languages for the structuring of information for
> online retrieval and presentation, setext is primitive. IBM has
> invented a 300+ element_ language with complex semantics [....].
> If setext is not primitive compared to this, I obviously don't
> understand the meaning of the word "primitive".

Apparently not. The word "primitive" is, apart from its meaning of
"simple", also a low-value judgement of which you cannot have been
unaware. Now, that is a statement of fact <\fact>.
> Again, this is not to say that setext is not useful, just that in
> terms of the functions it provides and the semantics it captures,
> it cannot compare to other systems of much greater complexity,
> including the OSF, Docbook, IBMIDDoc, and Daveport designs, not to
> mention other HyTime-based work such as the various IETM projects
> and other industry-specific applications like the ATA and CALS
> work.

No, indeed not, but can _they_ provide a solution for the
networked have-nots, perpetually squeezed-in between the ever
NEWER-FASTER-SLEEKER-BETTER hi-tech and the trend$y lo-tech of the
haves? Setext is an example of what I call ADEQUA-TECH[tm], the
bicycle-like vehicle among the gas-guzzling monsters on that
internetwork of ours. No, I don't fancy cars either.

>> contrast, SGML et al judged through the bias of human-readable-
>> text/ ASCII will appear unduly complex and mostly inaccessible to
>> anyone having but the lowest common denominator hardware/ software
>> at their disposal (80% of all users? 90%?)

> I beg to differ. As the folks at Exoterica will attest, SGML can
> be made no less human readable than setext.

I am not familiar with what those folks at Exoterica might be
doing, but if you are thinking of the shortrefs, which I take to
be a minimal-size embedded notation that is expanded later via
aliases? in the DTD or equivalent then, clearly, we're talking two
different strategies. Setext is, at its simplest, entirely devoid
of any visible tags. It will thus appear as merely some
rigidly-formatted piece of plaintext yet at the same time continue
to carry the minimal structure (== subdivisions of the whole into
parts above the paragraph level, a basic outline notation). No
amount of SGML-minimizing can approach that.

Who has missed whose point and vice-versa
-------------------------------------------
>>> SGML Source --> SGML2SETEXT --> setext --> setext viewer

>> It strikes me as no little ironic that in order to view enhanced
>> plaintext (i.e.
>> the setext) in a basic-structured manner, say an
>> outline of the submitted text, one would have to first encode it
>> with SGML, then pipe it through a filter with a DTD acronym thrown

> You have missed my point. The point was not to first encode an
> setext document into SGML to then immediately transform it back
> into setext, but to use an setext viewer to view *any* existing
> SGML document by mapping that document into setext. Remember that
> setext is a *formatting notation* and that is the way I have used
> it here.

He, he.... I have not missed your point, you have missed mine.
Setext is an alternative to SGML in certain applications, where
neither the generalized complexity, nor the cost of the SGML route
can be justified. Mapping SGML-notation documents onto setext, as
you say, is missing the point up to a point, and twice over. For
one, such setext viewer(s) would then have to be much more complex
than would otherwise be the case. For another, adding SGML-markup
to source texts automatically lowers the overall comprehensibility
of these, shortrefs or no refs.

Setext is designed _also_ to cater to all those among ourselves
that, even though they may have ready tools at their disposal,
would rather "type" or "cat" files as they come in, because of the
force of habit, procrastination or choice. After all, if you know
that the text in question is a setext, thus readable anyway, then
why should you bother to launch, enter, load in, access &c just
for the privilege of casting an eye on **some** text?

Cost of basic structure encoding
----------------------------------
> I'd have thought
> that the opposite would be an altogether
> more-agreeable solution:

>> plaintext --> setext --> setext2SGML --> SGML viewer

> There's nothing wrong with this path, but it misses the point made
> above.

I beg to differ, not least for strictly cost-of-basic-encoding
reasons.
Not having studied it in depth I should perhaps not deliver any
official prognoses but it appears to me that basic hand-encoding
of setext can be much, much cheaper than that of enSGMLising.

>> Obviously, Kimber has all the resources at his beck and call and
>> expects that others will have them too. We may all yearn to become

> I have no more resources than anyone else with a desktop computer,
> access the Internet, and a C compiler, at least for the purposes
> of this discussion.

Well, perhaps your outlook in relation to text encoding standards
would have changed had you but had an asynchronous, 2400bps uucp
connection to a mainframe at your disposal and never enough time
to master a C compiler, much less make it perform ;-))

> I haven't proposed anything that can't be done by anyone who can
> write a little C code to Windows, or XWindows, or Mac, can
> integrate ARCSGML or SGMLS into a program, and has a computer to
> run it on. This may require cleverness and skill, but not
> resources out of the ordinary.

PARSED! -->ANOTHER INSTANCE OF INTERNET-HI-FLIER ATTITUDE!<--

> I didn't suggest that everyone license DynaText or buy Omnimark
> or get a RISC machine. That's clearly what big enterprises with
> big problems and big budgets need to do. But someday, probably
> very soon, that degree of power will be available to everyone
> and will be the lowest common denominator.

We'll cross that bridge when we come to it. In the meantime....

>> 1Mbit/sec-access high-flyers of the Internet, but in the meantime
>> many of us have to make do with but Have-A-Mac and never enough
>> funding to equip it with enough RAM to satisfy our needs.

> You've obviously never tried to order RAM within IBM :-).

No, but I could hardly think it more taxing than having to pay for
it out of your own pocket.

> Easy-O-Meter[tm]
------------------
>> ______________ ___________ RTF ___________ SGML __________ setext
>> generalized    no              YES              yes
>> markup?
> I would argue that, from my argument above, setext does not
> qualify as the same sort of generalized markup that SGML enables.
> Setext is generalized to the degree that there is an indirect
> mapping between the setext codes and their actual presentation
> effect, but it does not capture information semantics the way
> SGML languages can and do.

No, of course not, but then I **did** weight the YES in favour of
the SGML. Or did you think that that uppercase YES appeared there
for no reason at all? By mistake?

>> -------------- --------------- ---------------- -----------------
>> #typographical a finite set    unlimited set    3 typographical
>> tags employed?                                  1 hypertextual

> In the sort of SGML languages I consider worth talking about,
> there are no "typographical" tags at all, because SGML languages
> capture information semantics, not formatting. SGML data
> structured with languages of this sort is no more typographical
> than SGML databases.

Right, agreed. Unfortunate usage of words in an attempt to make it
more easily understandable to the world at large. Try to sell SGML
to any small-time publisher and the first thing they'll inquire
about will be its ability to express styles.

>> -------------- --------------- ---------------- -----------------
>> tag overhead   +25%?           +30%?            +9% (verified)

> With SGML's tag minimization features, SGML can have no more
> overhead than setext. I've seen some of the things the Exoterica
> folks have done with shortrefs and datatag, and it's pretty
> amazing (gives me the willies, though).

Of course, this is a point that will vary widely with the
application. Yet because the setext _is_ limited in scope its
overhead can never be much above the verified figure of 9%.

Is it over yet? I have a date
-------------------------------
> I want to emphasize that my intent is not to denegrate setext,
> because it is, as I've said, an elegant solution to a difficult
> problem.
> My intent is only to emphasize the difference in focus
> and approach to the solutions to the sorts of problems setext
> solves and the full range of problems that SGML can solve, and
> that a comparison between notations like setext and fully-generic
> SGML languages is not a valid comparison.

Nor have I taken it as such, on the contrary; open debate is
always preferable to no debate. At the very least it allowed me to
cast a light on the setext, in not too-inappropriate a forum, be
it just a "notation" or another generalized graphic markup method.
OK, OK, "bask in the enlightened light of the SGML"... have it
your own way ;-))

__Ian Feldman "Those that do not understand Unix are
              bound to invent it, poorly" --Henry Spencer

$$

.. The sharp-eyed among you will note, that here-appended notation
.. of hypertextual anchor expansion differs from that served in the
.. previous post of mine (news:a7fd5104@random.se). Indeed, this
.. portion of the setext is still largely undefined and so I am
.. experimenting with various methods to arrive at an optimal
.. solution to the problem. Ye sharp-minded 'uns will also
.. immediately understand the reason for those changes.

.. _typotags (so named because at worst they will appear as mere typos in text)
.. _timbers (hey! I like the timbre of that sentence ;-))
.. _notation news:19930423.072919.779@almaden.ibm.com/"one of many similar schemes"
.. _language news:19930423.072919.779@almaden.ibm.com/"it does nothing more"
.. _flavour (well, if gluons can come in flavours so why not the typotags?)
.. _element (bully for you)
.. _comprehension ("ability to understand")
.. _Kimber (Eliot Kimber)
.. _Greg news:1qn588INN27o@uwm.edu
.. _Feldman (Ian Feldman, Current Setext Oracle)
..
# original headers, suppressed on account of appearing AFTER a twodot-tt
# From ianf Sun Apr 25 23:42:30 MET DST 1993
# Path: random.se!ianf
# From: ianf@random.se (Ian Feldman)
# Newsgroups: alt.hypertext,comp.multimedia,alt.news-media,comp.text,comp.text.sgml,comp.sys.amiga.multimedia
# Date: Sun, 25 Apr 93 22:03:47 +0200
# Message-ID:
# X-URL: file://garbo.uwasa.fi/mac/tidbits/setext/setext+sgml_02.etx
# X-URL: file://ftp.ifi.uio.no/pub/SGML/comp.text.sgml/by.msgid/a800bb38@random.se
# X-Aftp: :/mac/tidbits/setext/setext_concepts_Aug92.etx
# References: <1qn588INN27o@uwm.edu> <19930416.063132.922@almaden.ibm.com>
#   <19930420.063124.67@almaden.ibm.com> <4942@ulysse.enst.fr>
#   <19930423.072919.779@almaden.ibm.com>
# Content-Type: setext/plain; charset=ascii_827
# Organization: random design -- "any old TTY will do"
# Lines: 507
# Summary: setext is pure potential embossed in plaintext
# Subject: Re: Looking for Electronic Publshing formats... [long]