Okay, let's expand the discussion of QText emphasis
querki
jducoeur wrote in querki_project
The comments from yesterday's post have been fascinating, although mainly in driving home the point that there is simply no agreement on this topic, so I'm just gonna have to make a decision.

However, I was struck by the comment from dsrtao (and echoed by laurion) that:
*I* use asterisk to indicate bold text, and _underscores_ mean underlining, and /this/ is either a very small regexp or italics.
I have to admit, this fits my instincts as well -- I've gotten used to the Markdown approach of *this* being italic, but I still instinctively expect it to be bold.

So let's widen the question. We are *not* committed to sticking to Markdown's format, so I'm open to the above suggestion; I don't believe it would be especially difficult to implement. (Really, I expect that most of the effort would be fixing scads of unit tests.)

Since somebody's going to ask: no, I am not willing to produce different HTML for these depending on Space or User settings, at least for the time being -- it's fairly nightmarish from a management perspective. *But* I think I can do something nearly as good, possibly better. Currently, what's coming out isn't actually i/b tags -- it's actually em/strong tags, the conceptual versions. It would not be at all difficult for me to replace those with spans whose class names call a spade a spade -- spans of "asterisk", "double-asterisk", "slash", "underscore", etc. At that point, the decision about how to render them becomes simply a CSS question, and you *are* allowed to customize CSS semi-arbitrarily in Querki. So those of you who give a damn and are at least slightly comfortable with CSS can do what you like in your Spaces.
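A minimal Python sketch of the span-plus-CSS idea, for illustration only. The class names come from the paragraph above; the function names and the default style values are hypothetical placeholders, since choosing those defaults is exactly the open question. The renderer only records which sigil was typed, and the stylesheet alone decides how each one looks.

```python
# Class names from the post; everything else here is a hypothetical sketch.
SIGIL_CLASSES = {
    "*": "asterisk",
    "**": "double-asterisk",
    "/": "slash",
    "_": "underscore",
}

# Illustrative default styles only. A Space's own CSS could override any of
# these rules without the parser or the generated HTML changing.
DEFAULT_CSS = {
    "asterisk": "font-style: italic;",
    "double-asterisk": "font-weight: bold;",
    "slash": "font-style: italic;",
    "underscore": "text-decoration: underline;",
}


def render_emphasis(sigil: str, inner_html: str) -> str:
    """Wrap already-rendered inner HTML in a span named after its sigil."""
    return f'<span class="{SIGIL_CLASSES[sigil]}">{inner_html}</span>'


def stylesheet(rules: dict) -> str:
    """Serialize a class-to-declarations mapping as CSS text."""
    return "\n".join(f"span.{cls} {{ {decl} }}" for cls, decl in rules.items())


# Nesting composes naturally: /*bold italics*/ becomes a span inside a span.
print(render_emphasis("/", render_emphasis("*", "bold italics")))
print(stylesheet(DEFAULT_CSS))
```

Because the HTML never commits to "bold" or "italic", a Space that wants *asterisks* to mean bold would change one CSS rule rather than the parser.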

The question then becomes what the default CSS says. This will wind up being the way it works in 99% of Spaces -- even if we eventually provide a visual CSS editor (which is in the very long-term plans), I expect few users to avail themselves of it -- so I'd still like to make the best decision I can.

So: calling for opinions again. Do you prefer the above approach (*bold*, /italic/, _underlined_), the Markdown style (**bold**, *italic*, _italic_), or something else? Note that this will likely be coupled with whitelisting the HTML tags, so you can use those if you prefer, but I expect most people to use the simpler QText...


ETA: I should note that the original point that I was making yesterday -- that underscores were misbehaving -- turned out to be off-base. The QText parser *is* smart about not picking up a closing tag if it is preceded by a space, so "_self blah blah blah _self" won't italicize everything between the underscores, as I'd thought. The problem, though, is that this only applies to whitespace, so if you're talking about code like this -- "foo._self blah blah blah foo._self" -- then it totally will turn everything between the underscores into italics.
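The closing rule described above can be approximated with a single regular expression. This is a hedged Python sketch, not QText's actual parser: it assumes an underscore may open emphasis anywhere, and only checks that the character before a candidate closing underscore isn't whitespace. The function name is hypothetical.

```python
import re

# Approximation of the rule described above, not QText's real parser:
# an underscore may open emphasis anywhere, but a candidate closing
# underscore is skipped when the character just before it is whitespace.
EMPHASIS = re.compile(r"_(.+?)(?<!\s)_")


def italicize(text: str) -> str:
    return EMPHASIS.sub(r"<em>\1</em>", text)


print(italicize("_self blah blah blah _self"))
# _self blah blah blah _self   (the only closer follows a space, so no match)
print(italicize("foo._self blah blah blah foo._self"))
# foo.<em>self blah blah blah foo.</em>self
```

The second case shows the problem: the second underscore follows a period rather than a space, so it qualifies as a closer.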

I don't actually see a clean way to avoid this: it's about as smart as is reasonable to expect already. So I'll just pass on the note that, if you find things being unintentionally italicized because of this, remember that backslash-escaping does exactly what you expect it to -- a backslash before the accidental closing underscore makes the problem go away...
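Backslash-escaping can be modelled in a regex approximation of that closing rule with one extra lookbehind on each delimiter. Again, this is a hypothetical Python sketch rather than QText's implementation, and for simplicity it ignores a doubled backslash in front of an underscore.

```python
import re

# Hypothetical sketch: an underscore preceded by a backslash is never a
# delimiter, and "\_" is turned back into "_" after emphasis is applied.
# Simplification: a doubled backslash before an underscore is not handled.
EMPHASIS = re.compile(r"(?<!\\)_(.+?)(?<![\s\\])_")


def render(text: str) -> str:
    out = EMPHASIS.sub(r"<em>\1</em>", text)
    return out.replace("\\_", "_")


print(render(r"foo._self blah blah blah foo.\_self"))
# foo._self blah blah blah foo._self
print(render("_word_"))
# <em>word</em>
```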
bold is important, italics are important... it's ok if underlining doesn't work, because in most contexts, underlining was a substitute for italics anyway.

/*bold italics*/ needs to work, though. And some form of quote-me-in-monospace, such as =command to type= or (preferably and)

quote
# spacing and tabs remain intact
add sticks to stones yielding cobolfingers
endquote

bold is important, italics are important... it's ok if underlining doesn't work, because in most contexts, underlining was a substitute for italics anyway.

Well, remember that you'll have HTML access to all of this. The question is mostly what the convenient common shortcuts are.

/*bold italics*/ needs to work, though.

It would, but that's just saying that these expressions are nestable. (Which they already are: that expression is currently _**like this**_, and I just tested it. Changing the sigils won't change that.)

And some form of quote-me-in-monospace

At the paragraph level, that's already implemented with triple-backticks:

```
like this
```
At the inline level, I'm not convinced that a shortcut is actually required for a consumer-market tool. I'm not dead-set against it, but you can already do it with a CSS-defined span, {{mono:like this}}, and I'd need to see more common demand before building it more deeply into the language.

I've found the Markdown-esque `inline monospace` syntax to be very handy, but then, I don't know how much your users need to write code...

There's an inline monospace format? (Checks.) Huh -- so there is. Does it already work? (Checks.) Yep, it does, although Bootstrap renders it in red for some reason. Had totally missed that -- I'll add it to the documentation.

I think it's a very minor feature as far as Querki is concerned: code probably isn't going to be a common usage. That said, backtick isn't a character I care much about preserving, and this lines up well with the block-monospace format (which I personally use quite frequently), so I'll probably just leave it in place, for use in documentation. (Which *is* probably going to be a common use case.)
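For the documentation use case, an inline code span is simple to describe. A minimal Python sketch, assuming single-backtick spans that don't cross line breaks; the function name is hypothetical and this is not Querki's renderer.

```python
import html
import re

# Hypothetical sketch: text between single backticks becomes a <code> element
# with its contents HTML-escaped. A real renderer would also need to extract
# code spans before emphasis parsing, so that underscores inside them (as in
# foo._self) are left alone; that ordering is not shown here.
CODE_SPAN = re.compile(r"`([^`\n]+)`")


def render_code_spans(text: str) -> str:
    return CODE_SPAN.sub(lambda m: "<code>" + html.escape(m.group(1)) + "</code>", text)


print(render_code_spans("write `a < b` here"))
# write <code>a &lt; b</code> here
```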

My two main thoughts:
1. Given the preponderance of underscores in QL, I think you will pre-emptively avoid 10,000 annoying errors by removing underscores as a Markdown-significant character. (I run into this all the time when looking at documentation.) This will also make mixed QText/QL clearer.
2. KISS: Just keep producing em/strong tags. Implement the CSS-based solution if anyone ever actually gives a damn enough to press you for it.

Secondary thoughts:
3. If the HTML tags are whitelisted, I don't really care what the Markdown version is.
4. Even if they aren't, I would slightly prefer the Markdown-standard **bold** *italic*, as /italic/ parses as a regex and /*bold italic*/ parses as a comment - the latter being particularly deceptive.
5. I don't particularly care about a shorthand for underlining. <u> is fine.


I think Markdown got a lot of things Right, and their suite of syntax hangs together better as a unit than taken individually. It's also a little easier to explain that way to (some) new users ("this is approximately a subset of Markdown").

Another wrinkle, not related -- I use *this* to mean "surrounded by stars and therefore of heavier weight", not "bold". I get peeved when MSWord autoconverts it to bold...it's just odd. And over on G+ I had to invent /slashes/ as something that it _didn't_ autoconvert. So depending on the users -- specifically, if you're planning on having a significant population of grumpy old Usenet/*nix types -- you might want to think about that.

Or not. We are a bit of an edge case. ;)

Second note: Balsamiq, another tool I use a lot, uses this sort of syntax for formatting. I point this out because I feel like you're running into some of the same design challenges, but perhaps you can head off some of the resulting pain points...e.g. for historical reasons, in Balsamiq _this_ means italic, and &this& means underline! It drives me batty to cope with that difference. Also, the underline-collision thing you allude to comes up a lot, especially when I'm trying to write something like "test _this_is_the_test_name.foo_". I end up having to escape all the inner underscores.

It'd make more sense to require whitespace on the outside of an underscore pair. Maybe that's only true if there's no whitespace between the underscores. That'd also solve the "foo._self blah blah foo._self" problem. Do people really need to italicize only part of a word? I guess sometimes, for effect, but maybe that can be the outlier case with the special syntax?
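The proposed rule is easy to state as a regex. A minimal Python sketch, assuming the strict reading: an opening underscore must follow whitespace or the start of the text, and a closing one must precede whitespace or the end. The name `italicize` is hypothetical. Note that this strict form also rejects a closing underscore followed directly by punctuation, as the last example shows.

```python
import re

# Hypothetical strict rule: the opening underscore must not be preceded by a
# non-space character, and the closing one must not be followed by one.
# (?<!\S) and (?!\S) also succeed at the start and end of the text.
STRICT = re.compile(r"(?<!\S)_(.+?)_(?!\S)")


def italicize(text: str) -> str:
    return STRICT.sub(r"<em>\1</em>", text)


print(italicize("foo._self blah blah foo._self"))
# foo._self blah blah foo._self
print(italicize("test _this_is_the_test_name.foo_ now"))
# test <em>this_is_the_test_name.foo</em> now
print(italicize("_word_, then"))
# _word_, then
```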


It's also a little easier to explain that way to (some) new users ("this is approximately a subset of Markdown").

This is a really good point.

It'd make more sense to require whitespace on the outside of an underscore pair.

FosWiki works like that, and I frequently have to hack around it. Requiring whitespace means that italicized words followed directly by punctuation don't work, and that's not an uncommon case.

Especially given your point re: "this is a subset of Markdown" being easier to explain, I think underscores should simply be dropped as formatting characters. Markdown offers two ways of doing italics; saying "one of these is no longer available" is fairly straightforward. It will undoubtedly throw some people for a loop, but I think that will be true of any path.

I think Markdown got a lot of things Right, and their suite of syntax hangs together better as a unit than taken individually.

Conceptually, I agree with some of it -- the logic of *emphasis* vs. **strong** makes sense, for example. I just happen to have a lot of long-baked habits of expecting emphasis to be bolded, and it's interesting to note that I'm not the only one. Haven't made any decisions there yet, though, and I'm not taking it lightly.

The underscore thing I'm more skeptical about, though, and there seems to be less consensus. As things stand, underscores simply duplicate asterisks in the syntax, and I don't agree with the visual logic as much. So I am mildly leaning towards simply removing them as a shortcut for now, although I haven't made a decision yet. Visually, I think that shortcut should be underline, but I'm not sure that it makes the "every feature must fight for its life" cut. Folks just don't use underlining as often, and typing <u> isn't that hard. And of course, in Querki many of the common use cases for underlining, such as citations, will probably wind up as automatically-formatted data structures rather than being typed by hand -- it's a very different environment than a typical wiki.

I get peeved when MSWord autoconverts it to bold...it's just odd.

Agreed -- I've always found that to be a strange choice, and it always bothers me visually.

for historical reasons, in Balsamiq _this_ means italic, and &this& means underline!

Yuck. Okay, I think that's a loss from every perspective. But this conversation has called into question whether underlining even needs a sigil at all. Now that I'm thinking about it more, I'm not sure that it's at the same level as emphasis, paragraph separation or bullet lists -- the sorts of things that *are* so common that I think the special markup is highly valuable.

It'd make more sense to require whitespace on the outside of an underscore pair.

Possible, although whatever rule we adopt should apply to all sigils -- I don't want to treat underscore as a special case. If we do reconsider this question, I'll probably wind up looking for more subtlety than simply whitespace: for example, I would intuitively expect "*this is emphasized*" to work, but it doesn't currently. (Edited: yes, it does -- I'm misremembering my experiments from yesterday.) But that's a medium-term discussion.

Edited at 2014-04-04 12:34 pm (UTC)

Having *this* be italic confuses me each time, too. But then, having <em> mean "italic" (by default) confuses me too...so since *this* "really" means <em>this</em>, and then the CSS determines what that renders as...yeah.

I've been experimenting with different ways of showing emphasis in the interfaces I'm generating: coloring text, or color plus bold, or italic plus lighter, or even outlined...it turns out to be horridly context-dependent, so it's hard to really apply those lessons to something wiki-esque, where the solution needs to fit a lot of contexts. (Unless folks want to write their own CSS, of course.)

I expect that serious Apps will tend to use a bit of CSS in the medium term, so I'm content to let them fiddle with it.

The real question is what the lowest common denominator should be, and this conversation has been a very entertaining exploration of how messy a problem that is. I may yet wind up leaving things in place (folks have done a good job of advocating Markdown's decisions), but enhancing the parser a little to be more sensitive to the lexical context so that in-word underscores and asterisks don't get counted...
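One way to make the parser "more sensitive to the lexical context" is to ignore a sigil when it is glued to a letter or digit on its outer side. The Python sketch below is hypothetical and only loosely in the spirit of CommonMark's flanking rules (it is simpler and not equivalent). It applies the same test to every sigil, handles `**` before `*` so strong markers aren't split, and lets punctuation follow a closing delimiter.

```python
import re

ALNUM = r"[^\W_]"  # one letter or digit: \w without the underscore


def flanked(sigil: str) -> re.Pattern:
    """Build a pattern for `sigil` pairs that are not glued into a word."""
    s = re.escape(sigil)
    # Opener: no letter/digit just before it, and no space just after it.
    # Closer: no space just before it, and no letter/digit just after it.
    return re.compile(rf"(?<!{ALNUM}){s}(?=\S)(.+?)(?<=\S){s}(?!{ALNUM})")


# Longest sigil first, so "**" is not consumed as two single asterisks.
RULES = [
    (flanked("**"), r"<strong>\1</strong>"),
    (flanked("*"), r"<em>\1</em>"),
    (flanked("_"), r"<em>\1</em>"),
]


def render(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


print(render("snake_case_name and 2*3*4"))  # snake_case_name and 2*3*4
print(render("_word_, then"))               # <em>word</em>, then
print(render("**bold** and *emphasis*"))    # <strong>bold</strong> and <em>emphasis</em>
```

With this rule, "foo._self blah blah foo._self" is also left alone: the second underscore is followed by a letter, so it can never close.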

Offline, Aaron adds an interesting detail that wouldn't have occurred to me: the underline-for-italics convention isn't something that was invented by the computer industry. It's apparently an old editing convention, and is how you would represent italicization in a manuscript. So that's an argument for leaving the Markdown convention in place...

Well, you'd represent an italicized statement by underlining the whole thing. Using _outside underscores_ is a spiritual successor, but not actually a direct inheritance.

(I still remember doing this with a typewriter / dot matrix printer, growing up, when writing bibliographies.)
