NEWS/Markup
From ThorxWiki
Revision as of 01:15, 25 October 2007 by Conversion script
This page supersedes NEWS/TextFormatting. That page should now be used for brainstorming and discussion of ideas, which are then presented here once accepted as "Correct".
So:
NEWS MARKUP SYNTAX
##################

Basic Text Formatting
=====================

Syntax                Basic HTML Translation
------                ----------------------
*bold text*           => <STRONG>bold text</STRONG>
/italic text/         => <EM>italic text</EM>
_underlined text_     => <U>underlined text</U>
-strikethrough text-  => <S>strikethrough text</S>
^superscript text^    => <SUP>superscript text</SUP>
,subscript text,      => <SUB>subscript text</SUB>
=monospaced text=     => <TT>monospaced text</TT>

* Underlining is frowned upon in good HTML - and in typographic society. Is
  it really needed?
* This syntax is parsed per line, as the regexps see them. Lines that end
  with a "\" are joined before regexp parsing, however. (This is primarily a
  concession to non-wordwrap textbox widgets, be it a web browser or a local
  editor.)
* Tags can be nested, e.g.:
    /*italic and bold text looks like a C comment*/
* The simple syntax shown should not be mistaken to mean that any random
  asterisk (for example) is a start or end marker for bold text. Rather, a
  start-bold marker is more complicated: a newline, whitespace, or other
  markup character, then the markup character in question, followed by a
  regular alphanumeric character or other markup character, defines a start
  marker. The reverse applies for an end marker.
* Yes, this means in-word markup is not possible. E.g. "/in/flammable" would
  NOT italicise "in". I believe that this is not a common need, but if it is
  REALLY needed, then it can be accomplished either by rewording or by using
  pure HTML.
* Both the start and end markers must exist. If not, the markup degrades
  gracefully, i.e. the raw characters are passed through to the output.

Headings/Sections
=================

    Top level heading
    #################

    Subsection heading
    ==================

    sub-subsection heading
    ----------------------

* In HTML we think in headings. In XHTML2 they have sections. NEWS does not
  differentiate - the headings can be translated into either syntax as needed.
  Within NEWS, they merely provide an in-document hierarchy.
* Only three levels of sectioning are given, on the basis that this fulfills
  the 10/90 rule. If you need deeper levels, then consider making subpages
  instead. (NEWS has no depth limit on those.)
* As sections, each section is considered to end when a new section at the
  same level begins.
* Sections can be translated directly into lists, useful for creating a
  table of contents.
* Underlining means multi-line parsing, but is otherwise fairly simple to
  handle. If two lines are of identical length, have identical leading
  whitespace, and the second line is nothing but repetition of a single
  character, then it's an underline, and the previous line becomes a header.
* An unrecognised underlining character should be treated as a subsection.

LISTS
=====

* List markup attempts to be both formal and flexible, to account for the
  many ways in which lists are created.
* List lines begin with leading whitespace (may be zero), a list marker
  character, and additional whitespace.
* A new line whose leading whitespace brings the start of its text level
  with the text on the previous list line is considered to be part of the
  same list item (this saves a "\" at the end of the line). This very list
  item you are reading is a good example of this.
* Following items which match the marker character and leading whitespace
  are part of the same list.
* Increase the whitespace to begin a new nested list. Return to the old
  leading whitespace value to return to the parent list.
* Unordered lists use "*" for regular bullet lists, or "+" for Explorer-like
  branched tree lists.
* Ordered lists may use "#" (roman type display), numbers as "1)", "2)",
  etc. (numeral type display), or "a)", "b)", etc. (alpha type display).
* Definition lists are handled slightly differently: there is no leading
  character to mark the list; rather, it is determined by:
    a) The line ends with a colon ":" (which is not rendered, but any
       previous punctuation IS - a "?" or another ":" for example)
    b) The next immediate line begins with leading whitespace.
* Thus a) is the term to be defined, and b) is the definition.

Indenting
=========

* Indenting. Simply indent each line in the block with equal amounts of
  leading whitespace. More whitespace means a nested block. Less means the
  end of a block. Just like lists really.
* Note however that since this follows the same rules as lists, don't start
  your indented block of text with a "*", a "#" or a number "1)", etc.

Hardrules
=========

* Just like usemod: "----". There are no other options or variations.

Preformatted blocks
===================

* A preformatted block is nested inside two lines with just "--".
* Conceptually, imagine the text sitting between the two lines of a big "=".
  This then matches the syntax for =inline preformatted text=.
* Primarily used for code or other text where exact monospace layout is
  important.

Literal block
=============

* Text enclosed within "{{{" and "}}}" tags. If these are found on a line
  alone, then a matching pair must be found. Thus the same syntax is valid
  for multiline and inline literals.
* This is for signifying text that is not parsed by ANY other NEWS markup
  mechanism. The better designed the rest of NEWS is, the less this should
  have to be used - thus I don't mind making this seem rather ugly.

Tables
======

* Aiming for minimal complexity here. The pipe is the relevant control
  character.
* The character | has three different functions: it begins a new line, it
  separates two cells, it ends a line.
  Whitespace and tab characters are allowed between the beginning of a new
  line and the first |.
* Cell spanning is not allowed.
* If the first line in a table has all bold text, then it is treated as a
  table header line (<TH></TH> tags in HTML).
* Multiline block markup cannot be placed in a single cell - all inline
  markup should work.

Linebreaks
==========

* We assume that each newline in the source translates into a <BR> or <P> or
  some equivalent in the resulting HTML. Of course, in many of the block
  markup controls, this is not needed. UseMod drives me crazy in that you
  cannot have two sequential short lines without them being concatenated.
* If two lines of markup REALLY should be together, then "\" at the end of
  the first line will cancel the <BR> creation.

Images and Links
================

Syntax                 Translation
------                 -----------
images:
  image.jpg            => image rendered inline
  [image.jpg|comment]  => image rendered with text as title below
links:
  CamelCase            => internal link to "CamelCase"
  [Wikilink page]      => internal link to "Wikilink_page"
  http://foo.org/      => external link to site named
  [http://foo.org/]    => external link to site named
  [display->linkname]  => link (in- or ex-) to "linkname", displayed as "display"

* textlink-to-an-image, imagelink-to-a-page, or imagelink-to-image
  (thumbnail) is accomplished via hidden link syntax. Either or both of
  "linkname" and "display" may be an image URL, and are treated
  appropriately.
* [image.jpg|comment->linkname] is valid syntax, and works as "image with
  comment is a clickable link to 'linkname'".
* In-page anchors are created automatically at section/heading boundaries,
  and are linked as [linkname#section_name]. The "url" portion may be either
  an in- or ex- link. LinkName/SubPage#section_name would also work (to
  combine subpages and in-page anchors together). The rendered display would
  be "LinkName/SubPage (section name)" however.
* To link to an anchor within the current page, you'd either have to give
  the full CamelCase version, or put the anchor name alone in square
  brackets, e.g. [#example]
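
As a rough illustration of two of the rules above (the start/end marker rule for inline formatting, and the underline rule for headings), here is a minimal Python sketch. It is not a reference implementation: the names TAGS, HEADING_LEVELS, inline_to_html and heading_level are made up for this example, the input line is assumed to already have any "\" continuations joined, and block markup (hardrules, lists, literals) is assumed to be handled elsewhere.

```python
# Marker characters and their HTML tags, taken from the table above.
TAGS = {"*": "STRONG", "/": "EM", "_": "U", "-": "S",
        "^": "SUP", ",": "SUB", "=": "TT"}

# Assumption: "#", "=" and "-" map to levels 1, 2 and 3.
HEADING_LEVELS = {"#": 1, "=": 2, "-": 3}


def _is_outside(ch):
    """Line boundary, whitespace, or another markup character."""
    return ch is None or ch.isspace() or ch in TAGS


def _is_inside(ch):
    """Alphanumeric character, or another markup character."""
    return ch is not None and (ch.isalnum() or ch in TAGS)


def inline_to_html(line):
    """Translate inline markup in a single line; unpaired markers stay raw."""
    replacement = {}   # index of a paired marker -> HTML tag text
    open_markers = []  # stack of (marker character, index) awaiting a close
    for i, ch in enumerate(line):
        if ch not in TAGS:
            continue
        before = line[i - 1] if i > 0 else None
        after = line[i + 1] if i + 1 < len(line) else None
        can_open = _is_outside(before) and _is_inside(after)
        can_close = _is_inside(before) and _is_outside(after)
        match = None
        if can_close:
            # Look for the nearest still-open marker of the same kind.
            for depth in range(len(open_markers) - 1, -1, -1):
                if open_markers[depth][0] == ch:
                    match = depth
                    break
        if match is not None:
            start = open_markers[match][1]
            replacement[start] = "<%s>" % TAGS[ch]
            replacement[i] = "</%s>" % TAGS[ch]
            # Starts opened after `start` were never closed: leave them raw.
            del open_markers[match:]
        elif can_open:
            open_markers.append((ch, i))
    return "".join(replacement.get(i, ch) for i, ch in enumerate(line))


def _indent(s):
    """Leading whitespace of s."""
    return s[:len(s) - len(s.lstrip())]


def heading_level(text_line, next_line):
    """Return the heading level if next_line underlines text_line, else None."""
    mark = next_line.lstrip()
    if not text_line.strip() or not mark or len(set(mark)) != 1:
        return None
    if len(text_line) != len(next_line):
        return None
    if _indent(text_line) != _indent(next_line):
        return None
    # An unrecognised underlining character is treated as a subsection.
    return HEADING_LEVELS.get(mark[0], 2)


print(inline_to_html("/*italic and bold*/"))  # <EM><STRONG>italic and bold</STRONG></EM>
print(inline_to_html("/in/flammable"))        # /in/flammable
print(heading_level("LISTS", "====="))        # 2
```

The markers are classified against the raw source text before any tags are inserted, so nesting such as /*...*/ works without the generated "<" and "/" characters confusing later matches. Note that, read literally, the end-marker rule does not allow trailing punctuation after a marker (as in "*bold*."); the sketch follows the rule as written.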
Still to solve (discussion to /TextFormatting, please):
- Aligning images (and their titles) to left or right.
- Bugs in syntax?