NEWS/Markup
This page supersedes NEWS/TextFormatting. That page should now be used for brainstorming and discussing ideas, which are then presented here once accepted as "Correct".
So:
NEWS MARKUP SYNTAX
##################
Basic Text Formatting
=====================
        Syntax                          Basic HTML Translation
        ------                          ----------------------
        *bold text*             =>      <STRONG>bold text</STRONG>
        /italic text/           =>      <EM>italic text</EM>
        _underlined text_       =>      <U>underlined text</U>
        -strikethrough text-    =>      <S>strikethrough text</S>
        ^superscript text^      =>      <SUP>superscript text</SUP>
        ,subscript text,        =>      <SUB>subscript text</SUB>
        =monospaced text=       =>      <TT>monospaced text</TT>
* Underlining is frowned upon in good HTML - and in typographic society.
  Is it really needed?
* This syntax is parsed line by line, as the regexps see it. Lines that end
  with a "\" are joined before regexp parsing, however. (This is primarily a
  concession to non-wordwrap textbox widgets, be it a web browser or a local
  editor.)
* Tags can be nested, e.g.: /*italic and bold text looks like a C comment*/
* The simple syntax shown should not be mistaken to mean that any random
  asterisk (for example) is a start or end marker for bold text. Rather,
  a start-bold marker is more complicated: a newline, whitespace, or
  other markup character, then the markup character in question, followed
  by a regular alphanumeric character or another markup character,
  defines a start marker. An end marker is the same in reverse.
* Yes, this means in-word markup is not possible. eg, "/in/flammable"
  would NOT italicise "in". I believe that this is not a common need, but
  if it is REALLY needed, then it can be accomplished either by rewording
  or using pure html.
* Both the start and end markers must exist - if not, then it degrades
  gracefully - ie, the raw characters are passed through to the output.
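The marker rules above can be sketched as a single regexp. This is only a minimal illustration in Python, not a reference implementation: the names TAGS, INLINE and render_inline are made up here, and it reads the rules literally (so an end marker followed directly by punctuation such as "." is not recognised).

```python
import re

# Marker character -> HTML tag, taken from the table above.
TAGS = {"*": "STRONG", "/": "EM", "_": "U", "-": "S",
        "^": "SUP", ",": "SUB", "=": "TT"}
_M = re.escape("".join(TAGS))  # marker characters, escaped for use in [...]

INLINE = re.compile(
    rf"(?<![^\s{_M}])"        # before: start of line, whitespace or a marker
    rf"([{_M}])"              # the start marker itself
    rf"(?=[A-Za-z0-9{_M}])"   # after: an alphanumeric or another marker
    rf"(.+?)"                 # the marked-up text (shortest possible)
    rf"(?<=[A-Za-z0-9{_M}])"  # end marker follows an alphanumeric or a marker
    rf"\1"                    # the end marker must equal the start marker
    rf"(?![^\s{_M}])"         # after: end of line, whitespace or a marker
)

def render_inline(line):
    """Translate inline markup on one line; unmatched markers pass through."""
    def repl(match):
        tag = TAGS[match.group(1)]
        # Recurse so that nested markup such as /*text*/ is handled too.
        return f"<{tag}>{render_inline(match.group(2))}</{tag}>"
    return INLINE.sub(repl, line)

print(render_inline("/*italic and bold*/"))
print(render_inline("/in/flammable"))
```

Because start and end markers are found by one pattern, a marker without a partner simply never matches and the raw character is passed through.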
Headings/Sections
=================
        Top level heading
        #################
        Subsection heading
        ==================
        sub-subsection heading
        ----------------------
* In HTML we think in headings. In XHTML2 they have sections. NEWS does
  not differentiate - the headings can be translated into either syntax as
  needed. Within NEWS, they merely provide an in-document hierarchy.
* Only three levels of sectioning are given - on the basis that this fulfills
  the 10/90 rule. If you need deeper levels, then maybe consider making
  subpages instead (for which NEWS has no depth limit).
* As sections, it is considered that each section ends when a new section at
  the same level begins.
* Sections can be translated directly into lists, useful for creating
  tables of contents.
* Underlining means multi-line parsing, but is fairly simple to handle
  otherwise. If two lines are of identical length, have identical leading
  whitespace, and the second line is nothing but repetition of a single
  character, then it's an underline, and the previous line becomes a header.
* An unrecognised underlining character should be treated as a subsection.
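The underline test described above is small enough to sketch directly. The function name heading_level and the LEVELS table are hypothetical, and rejecting alphanumeric underline characters is an extra assumption that the rules above do not state.

```python
# Underline character -> heading level, from the examples above.
LEVELS = {"#": 1, "=": 2, "-": 3}

def heading_level(title, underline):
    """Return 1, 2 or 3 if `underline` underlines `title`, else None."""
    if len(title) != len(underline):
        return None                      # lines must be of identical length
    indent = len(title) - len(title.lstrip())
    if title[:indent] != underline[:indent]:
        return None                      # leading whitespace must match
    body = underline[indent:]
    if not body or len(set(body)) != 1:
        return None                      # must repeat one single character
    char = body[0]
    if char.isspace() or char.isalnum():
        return None                      # assumption: letters/digits never underline
    return LEVELS.get(char, 2)           # unrecognised character: subsection

title = "Subsection heading"
print(heading_level(title, "=" * len(title)))
```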
Lists
=====
* Lists markup attempts to be both formal and flexible, to account for the
  many ways in which lists are created.
* List lines begin with leading whitespace (possibly none), a list-marker
  character, and additional whitespace.
* A following line whose leading whitespace brings the start of its text
  to the same column as the text on the previous list line is considered
  to be part of the same list item (this saves a "\" at the end of the line).
  This very list item you are reading is a good example of this.
* Following items which match the character and leading whitespace are
  part of the same list.
* Increase the whitespace to begin a new nested list. Return to the old
  leading whitespace value to return to the parent list.
* Unordered lists use "*" for regular bullet lists, or "+" for Explorer-like
  branched tree lists.
* Ordered lists may use "#" (roman type display), numbers as "1)", "2)",
  etc (numeral type display), or "a)", "b)", etc (alpha type display).
* Definition lists are handled slightly differently: there is no leading
  character to mark the list; rather, it is determined by:
  a) The line ends with a colon ":" (which is not rendered, but any previous
     punctuation IS - a "?" or another ":" for example)
  b) The next immediate line begins with a leading whitespace.
  * Thus a) is the term to be defined, and b) is the definition.
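As a rough illustration of the marker, continuation and nesting rules (definition lists are left out), the sketch below turns list lines into (depth, kind, text) tuples. ITEM, KINDS and parse_list are names invented for this example, and it assumes indentation is done with spaces only.

```python
import re

# Leading spaces, a list marker, more spaces, then the item text.
ITEM = re.compile(r"^( *)(\*|\+|#|\d+\)|[a-z]\))( +)(.*)$")
KINDS = {"*": "bullet", "+": "tree", "#": "roman"}

def parse_list(lines):
    items = []       # [depth, kind, text] per list item
    indents = []     # indent of each currently open list, outermost first
    text_col = None  # column where the text of the latest item starts
    for line in lines:
        m = ITEM.match(line)
        if m:
            indent, marker, gap, text = m.groups()
            col = len(indent)
            while indents and indents[-1] > col:   # back to a parent list
                indents.pop()
            if not indents or indents[-1] < col:   # deeper indent: nested list
                indents.append(col)
            kind = KINDS.get(marker) or (
                "numeral" if marker[0].isdigit() else "alpha")
            items.append([len(indents), kind, text])
            text_col = col + len(marker) + len(gap)
        elif (items and line.strip()
              and len(line) - len(line.lstrip(" ")) == text_col):
            items[-1][2] += " " + line.strip()     # continuation of the item
        # anything else is ignored in this sketch
    return [tuple(item) for item in items]

sample = [
    "* first item",
    "  continues here",
    "  1) nested one",
    "  2) nested two",
    "* second item",
]
for item in parse_list(sample):
    print(item)
```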
Indenting
=========
* Indenting. Simply indent each line in the block with equal amounts of
  leading whitespace. More whitespace means a nested block. Less means the
  end of a block. Just like lists really.
* Note however that since this follows the same rules as lists, don't start
  your indented block of text with a "*", a "#" or a number "1)", etc.
Hardrules
=========
* Just like usemod: "----". There are no other options or variations.
Preformatted blocks
===================
* A preformatted block is nested inside two lines with just "--".
* Conceptually, picture the two "--" lines as the two strokes of one big "=",
  with the text sitting between them. This then matches the syntax for
  =inline preformatted text=.
* Primarily used for code or other text where exact monospace layout is
  important.
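A minimal sketch of pulling preformatted blocks out of a page, assuming the delimiter line may carry surrounding whitespace and that an unpaired "--" is left as ordinary text (neither is stated above). The name split_preformatted is hypothetical.

```python
def split_preformatted(lines):
    """Split lines into ("text", [...]) and ("pre", [...]) chunks."""
    fences = [i for i, line in enumerate(lines) if line.strip() == "--"]
    if len(fences) % 2:
        fences.pop()   # assumption: an unpaired "--" stays ordinary text
    chunks, pos = [], 0
    for start, end in zip(fences[0::2], fences[1::2]):
        if start > pos:
            chunks.append(("text", lines[pos:start]))
        chunks.append(("pre", lines[start + 1:end]))  # keep layout untouched
        pos = end + 1
    if pos < len(lines):
        chunks.append(("text", lines[pos:]))
    return chunks

page = ["intro", "--", "if x:", "    y = 1", "--", "outro"]
for kind, block in split_preformatted(page):
    print(kind, block)
```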
Literal block
=============
* Text enclosed within "{{{" and "}}}" tags. If these are found on a line
  alone, then a matching pair must be found. Thus the same syntax is valid
  for multiline and inline literals.
* This is for signifying text that is not parsed by ANY other NEWS markup
  mechanisms. The better designed the rest of NEWS is, the less this should
  have to be used - thus I don't mind making this seem rather ugly.
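One way to honour "not parsed by ANY other NEWS markup" is to split literals out before anything else runs. The sketch below is an assumption about processing order, not something this page prescribes; LITERAL and split_literals are made-up names.

```python
import re

# Shortest possible span between "{{{" and "}}}", across newlines as well.
LITERAL = re.compile(r"\{\{\{(.*?)\}\}\}", re.DOTALL)

def split_literals(text):
    """Split text into ("markup", s) and ("literal", s) parts, in order."""
    parts, pos = [], 0
    for m in LITERAL.finditer(text):
        if m.start() > pos:
            parts.append(("markup", text[pos:m.start()]))
        parts.append(("literal", m.group(1)))   # never passed to other rules
        pos = m.end()
    if pos < len(text):
        parts.append(("markup", text[pos:]))
    return parts

print(split_literals("a {{{*raw*}}} b"))
```

Only the "markup" parts would then be handed to the rest of the parser. An opening "{{{" without a matching "}}}" is left as markup text in this sketch.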
Tables
======
* Aiming for minimal complexity here. The pipe is the relevant control
  character.
* The character | has three different functions: it begins a new line,
  it separates two cells, and it ends a line. Whitespace and tab characters
  are allowed between the beginning of a new line and the first |.
* Cellspanning is not allowed.
* If the first line in a table has all bold text, then it is treated as a
  table header line. (<TH></TH> tags in html)
* Multiline block markup cannot be placed in a single cell - all inline
  markup should work.
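The table rules can be sketched as follows. The names parse_row and table_to_html are hypothetical, trailing whitespace after the last | is tolerated as an assumption, the bold markers are dropped from header cells as a further assumption, and inline markup inside cells is left untranslated here.

```python
import re

BOLD = re.compile(r"^\*(.+)\*$")   # a cell that is entirely *bold*

def parse_row(line):
    """Return the list of cell texts, or None if the line is not a table row."""
    stripped = line.strip()
    if len(stripped) < 2 or not (stripped.startswith("|")
                                 and stripped.endswith("|")):
        return None
    return [cell.strip() for cell in stripped[1:-1].split("|")]

def table_to_html(lines):
    rows = [row for row in map(parse_row, lines) if row is not None]
    out = ["<TABLE>"]
    for i, row in enumerate(rows):
        # Only the first line can be a header, and only if every cell is bold.
        header = i == 0 and all(BOLD.match(cell) for cell in row)
        tag = "TH" if header else "TD"
        cells = [BOLD.match(c).group(1) if header else c for c in row]
        out.append("<TR>" + "".join(f"<{tag}>{c}</{tag}>" for c in cells)
                   + "</TR>")
    out.append("</TABLE>")
    return "\n".join(out)

print(table_to_html(["  | *Name* | *Role* |", "  | x | y |"]))
```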
Linebreaks
==========
* We assume that each newline in the source translates into a <BR> or <P>
  or some equivalent in the resulting HTML. Of course, in many of the block
  markup controls, this is not needed. UseMod drives me crazy in that you
  cannot have two sequential short lines without them being concatenated.
* If two lines of markup REALLY should be together, then "\" at the end of
  the first line will cancel the <BR> creation.
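A small sketch of the "\" rule, assuming lines are joined verbatim (no space is inserted) and that every remaining newline becomes a <BR>. The names join_continuations and render_lines are invented for this example.

```python
def join_continuations(lines):
    """Join each line ending in a backslash with the line that follows it."""
    joined, buffer = [], ""
    for line in lines:
        if line.endswith("\\"):
            buffer += line[:-1]           # drop the "\" and keep collecting
        else:
            joined.append(buffer + line)
            buffer = ""
    if buffer:                            # a trailing "\" on the last line
        joined.append(buffer)
    return joined

def render_lines(lines):
    # One <BR> per newline that survives the joining step.
    return "<BR>\n".join(join_continuations(lines))

print(render_lines(["one \\", "two", "three"]))
```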
Images and Links
================
        Syntax                          Translation
        ------                          -----------
images:
        image.jpg               =>      image rendered inline
        [image.jpg|comment]     =>      image rendered with text as title
                                        below
links:
        CamelCase               =>      internal link to "CamelCase"
        [Wikilink page]         =>      internal link to "Wikilink_page"
        http://foo.org/         =>      external link to site named
        [http://foo.org/]       =>      external link to site named
        [display->linkname]     =>      link (in- or ex-) to "linkname",
                                        displayed as "display"
* A textlink-to-an-image, imagelink-to-a-page, or imagelink-to-image (thumbnail)
  is accomplished via hidden link syntax. Either or both of "linkname" and
  "display" may be an image URL, and are treated appropriately.
* [image.jpg|comment->linkname] is valid syntax, and works as "image with
  comment" that is a clickable link to "linkname".
* In-page anchors are created automatically at section/heading boundaries,
  and are linked as [linkname#section_name]. The "url" portion may be either
  an in- or ex- link. LinkName/SubPage#section_name would also work (to
  combine subpages and in-page anchors together). The rendered display would
  be "LinkName/SubPage (section name)", however.
* To link to an anchor within the current page, you'd either have to give the
  full CamelCase version, or put the anchor name alone in square brackets.
  eg. [#example]
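The square-bracket forms above can be taken apart with two splits, first on "->" and then on "|". The sketch below is illustrative only: parse_bracket and IMAGE_EXTENSIONS are made-up names, and recognising images by file extension is an assumption, since this page does not say how an image URL is detected.

```python
# Assumption: images are recognised purely by file extension.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

def parse_bracket(markup):
    """Split "[...]" into display text, image, comment and link target."""
    inner = markup[1:-1]
    display, arrow, target = inner.partition("->")
    image, bar, comment = display.partition("|")
    is_image = image.lower().endswith(IMAGE_EXTENSIONS)
    if not arrow:
        # No hidden link: an image links nowhere, text links to itself.
        target = None if is_image else display
    if target is not None and "://" not in target:
        target = target.replace(" ", "_")   # internal names use underscores
    return {
        "text": None if is_image else display,
        "image": image if is_image else None,
        "comment": comment if (is_image and bar) else None,
        "target": target,
    }

print(parse_bracket("[image.jpg|comment->linkname]"))
print(parse_bracket("[Wikilink page]"))
```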
Still to solve: (discussion to /TextFormatting pls)
- Aligning images (and their titles) to left or right.
- Bugs in syntax?

