Tagging Along
jducoeur wrote in querki_project
[This one's not quite as technical, but it goes into a fair level of detail about the new feature of Tags. Social Media geeks in general may find this one interesting.]

My wedding party was two days ago, so I'm taking the weeks before and after that to do something relatively relaxing and fun for Querki. I'm building the second real Space: the Poker Encyclopedia, to replace the one that I have been maintaining in my household wiki for ten years or so. That old one's actually a ProWiki instance (the prototype for Querki), but ProWiki was enough hassle to use that I never really structured the data. This time, though, I'm going to have fun doing it right, building a small focused Space with all the searching, categorization and collaboration that I've always wanted for this project.

The Poker Encyclopedia is just what I was looking for in a second use case for Querki: it is almost ridiculously easy to build (once I have the tech ready, I expect it to take about 15-20 minutes to actually set the thing up), but requires One New Feature. That's generally the driving force behind which use cases will get done when -- ideally, each use case should require One New Feature that is generally useful for many applications. In this case, the feature is Tagging.


Tags are a good illustration of how Querki thinks. It is basically just another web framework, but one that deliberately plays at a very high level of abstraction. That means that it will eventually have a lot of high-level "types" that represent very common features of online systems. These will cover everything from Dates (with automatic date-pickers) to Ratings (combining individual and aggregate ratings into one general concept). But one of the most useful is likely to be Tags.

I decided early on that Tags were absolutely essential, and it took a little while to tease apart why. I've found that most really good online systems have a concept of tags; moreover, the ones that *don't* have it tend to annoy me. That's because Tags are a solid, flexible way to deal with the problem of Categorization, and almost *all* online systems need to deal with this. Electronic data quickly becomes hard to follow, and rigid systems of categorization (eg, hierarchies) generally turn out to get in the way. Indeed, it is almost always best to think in terms of tag *sets* -- allowing the editors to assign multiple tags to a Thing as needed, since most real-world data sets turn out to be multi-dimensional once you get to know them well.


However, that quickly led me to a bunch of interesting follow-on questions. For instance, what *is* a Tag? What do you *do* with a Tag? The most common use is being able to click on a Tag, and see all the Things with that Tag. But what if I want more than that? For instance, say that I am building a Period Recipe Cookbook. I probably want Source to be a Tag, so I can quickly and easily say where something comes from, without necessarily having to build out a "Cookbook" structure first -- I want to just say "Martino" in the Source Tag, and be able to click on that, and see everything else from Martino.

But what if I then want to be able to give more details about Martino? I probably want to be able to go back later and create a Cookbook Thing for that Source. Then, when I click on the Tag, I get a page that lists all the Recipes that use Martino *and* shows me information about the book itself.

This led to the realization: a Tag is, exactly, a Name. I already sort of had the concept of Names in the system. Each Thing has two primary identifiers: an Object ID, which is permanent, system-assigned and not especially human-readable (technically, it's a 64-bit integer, rendered in base 36); and a Name, which is assigned by the user. OIDs are globally unique; Names are unique within this particular Space, can be changed, and can be any reasonable alphanumeric phrase.
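In base 36, the digits are 0-9 followed by the letters a-z, so any 64-bit number fits in at most 13 characters. Below is a minimal Python sketch of that rendering. It is not Querki's actual code, which is written in Scala. Treating the OID as an unsigned value is an assumption made for illustration.

```python
import string

# Base-36 alphabet: digits 0-9, then lowercase letters a-z.
DIGITS = string.digits + string.ascii_lowercase


def oid_to_base36(oid: int) -> str:
    """Render a 64-bit OID (treated as unsigned here) as a base-36 string."""
    if not 0 <= oid < 2**64:
        raise ValueError("OID must fit in 64 bits")
    if oid == 0:
        return "0"
    digits = []
    while oid:
        oid, rem = divmod(oid, 36)
        digits.append(DIGITS[rem])
    return "".join(reversed(digits))


def base36_to_oid(text: str) -> int:
    """Parse a base-36 OID string back into an integer."""
    return int(text, 36)


print(oid_to_base36(1295))  # zz
```

Python's built-in `int(text, 36)` already handles the reverse direction, so only the rendering step needs hand-written code.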

So far, I've mostly been focusing on Links -- hard pointers from one Thing to another by OID. Tags are essentially the same idea, but linking by Name instead. The big advantage here is that *the Name doesn't have to exist yet*.

The result is quite a bit like references in a typical Wiki -- unsurprising, since Querki is very much the "wikified" version of a database. If a Tag names an existing Thing, clicking on it goes to that Thing. If it doesn't, it brings up a predefined page. (Technically, it displays the value of Space._showUnknownName, which by default lists all Things that refer to this Name.) If you "edit" this pseudo-Thing, Querki creates it, with its properties preset to look exactly like the way it originally displayed, but you can then edit it to look however you want. The default front page for each Space lists all of the Tag Properties and values currently in use, to make it quick and easy to use Tags for navigation. (This will probably eventually turn into a classic Tag Cloud display, but first things first.)
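The lookup rule described above can be sketched roughly as follows. This is a hypothetical Python model, not Querki's API. The `Thing` and `Space` classes are assumptions made for illustration, as is the idea that each Thing stores its tags as a mapping from property name to a list of Names. `resolve` stands in for whatever Querki actually does when someone clicks a Tag.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    name: str
    # Hypothetical layout: tag-set property name -> list of tag Names.
    tags: dict = field(default_factory=dict)


@dataclass
class Space:
    # Names are unique within a Space, so they can key the lookup table.
    things: dict = field(default_factory=dict)

    def add(self, thing: Thing) -> None:
        self.things[thing.name] = thing


def referrers(space: Space, name: str) -> list:
    """Sorted Names of every Thing that uses `name` in any tag-set property."""
    return sorted(
        t.name for t in space.things.values()
        if any(name in tag_list for tag_list in t.tags.values())
    )


def resolve(space: Space, name: str):
    """Follow a Tag the way a wiki follows a link: go to the Thing if it exists,
    otherwise fall back to an 'unknown name' listing of everything tagged with it."""
    if name in space.things:
        return ("thing", space.things[name])
    return ("unknown", referrers(space, name))


space = Space()
space.add(Thing("Tart", {"Source": ["Martino"]}))
space.add(Thing("Pie", {"Source": ["Martino"]}))
print(resolve(space, "Martino"))  # ('unknown', ['Pie', 'Tart'])
space.add(Thing("Martino"))  # "editing" the pseudo-Thing creates it
print(resolve(space, "Martino")[0])  # thing
```

Once a Thing named "Martino" exists, the same Tag lands on that Thing instead of on the generic listing. No existing tag values need to change.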


It'll be interesting to see how this feature gets used in practice. I deliberately built it to be quite flexible, possibly more than it needs. For example, you can define as many Tag Set Properties as you like, and they are treated separately, so you *can* use this to have different dimensions of categorization if that suits your App. OTOH, if you decide that that's overkill, you can just define one Tag Set Property and use the same set everywhere.
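Keeping each Tag Set Property separate amounts to indexing tags per property rather than in one shared pool. Here is a rough Python sketch of the kind of index a front page could list. It assumes a hypothetical layout in which each Thing's tags are stored as a mapping from property name to a list of tag strings. The sample data is purely illustrative.

```python
from collections import defaultdict


def tag_index(things: dict) -> dict:
    """Index tags per tag-set property: {property: {tag: [Thing names]}}.

    `things` maps each Thing's Name to {property name: [tags]} (a hypothetical
    layout). Each property gets its own index, so the same string used under two
    different properties stays in two separate categorization dimensions.
    """
    index = defaultdict(lambda: defaultdict(set))
    for thing_name, props in things.items():
        for prop, tags in props.items():
            for tag in tags:
                index[prop][tag].add(thing_name)
    # Sort everything so a front page can list it deterministically.
    return {
        prop: {tag: sorted(names) for tag, names in sorted(by_tag.items())}
        for prop, by_tag in sorted(index.items())
    }


things = {
    "Baseball": {"Game Type": ["Seven-Card Stud"], "Source": ["Traditional"]},
    "Chicago": {"Game Type": ["Seven-Card Stud"]},
    "Omaha": {"Game Type": ["Holdem"]},
}
for prop, by_tag in tag_index(things).items():
    print(prop, by_tag)
```

If a Space defines only one Tag Set Property, the index simply has one key. That is the "use the same set everywhere" case.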

For purposes of the Poker Encyclopedia, the most important Tag Set is going to be Game Type, and that's a good illustration of the value of Tags as Names. Sometimes, the Game Type is just "Unique", or something like that -- a loose category that doesn't matter much. But often, the Game Type is going to be something like "Seven-Card Stud" or "Holdem" -- that is, the "Game Type" is actually a reference to the base game that this is a variant of, which is itself an entry in the Encyclopedia.

The Poker Encyclopedia also illustrates why the system works the way it does. The default version of _showUnknownName just displays a bullet list of Things with this Tag. But the Poker Space will override this, to instead display a full summary of each game: not just the name of each, but a paragraph with the basic rules, maybe a Rating, and stuff like that. Indeed, I expect we will *mostly* look at the Poker Space through these Tag listings, as the quick way to skim a flavor of game and remind ourselves of what we have available. Each variant will have its own page, with additional details, but I bet we'll use the Tag pages 90% of the time.
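One way to picture this override: the Space optionally supplies its own renderer, and the tag page falls back to the plain bullet list when the Space does not supply one. In Querki, the override is a property on the Space (_showUnknownName) holding QL text, not a Python function. The function names and the dictionary shape of each game below are illustrative assumptions.

```python
from typing import Callable, List, Optional

Game = dict  # hypothetical shape: {"name": ..., optionally "rules": ...}
Renderer = Callable[[str, List[Game]], str]


def default_listing(tag: str, things: List[Game]) -> str:
    """Default behavior: a plain bullet list of the Things carrying this tag."""
    return "\n".join(f"* {t['name']}" for t in things)


def poker_listing(tag: str, things: List[Game]) -> str:
    """A Space-specific override: each game's name plus a one-line rules summary."""
    lines = [f"Games of type {tag}:"]
    for t in things:
        lines.append(f"{t['name']}: {t.get('rules', '(no summary yet)')}")
    return "\n".join(lines)


def show_tag_page(tag: str, things: List[Game],
                  override: Optional[Renderer] = None) -> str:
    """Use the Space's own renderer if it declares one, else the default."""
    return (override or default_listing)(tag, things)


games = [{"name": "Baseball", "rules": "Rules summary goes here."}, {"name": "Chicago"}]
print(show_tag_page("Seven-Card Stud", games))
print(show_tag_page("Seven-Card Stud", games, override=poker_listing))
```

Passing Search matches through the same renderer would be one way to give tag pages and Search results a consistent display.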

(And saying this makes an interesting point: when we implement Search, it probably wants to work the same way, so that the Space can easily declare how to display each of the Things that match a Search. That way, clicking through a Tag and Searching can look and feel very similar, making the whole thing more intuitive -- you can query in a structured or unstructured way, but get similar displays.)

This feature is probably going to evolve quickly. I have the basic notion in place now: you describe the Tags as a comma-separated list, and the system interprets it from there. I will probably shortly add a "prompting" input control -- as soon as you start typing, it will offer you existing tags that match what you are typing, to encourage consistency in your Tags. Before long, I'm sure we will have to add Tag Curation tools, so you can easily clean up your Tags and make them more consistent. (Tag Clouds almost always want curation, or they gradually accumulate a lot of annoying quirks.) And so on: I won't be surprised if over half of all Querki Spaces wind up using Tags, which will push us to spend a bunch of effort on really making them hum.
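Both behaviors are easy to sketch in Python. The following rules are assumptions for illustration, not a description of Querki's current implementation:

- the parsing rules (trimming, collapsing internal whitespace, silently dropping empty entries and repeats);
- the case-insensitive prefix match used for prompting.

```python
from typing import Iterable, List


def parse_tags(raw: str) -> List[str]:
    """Split a comma-separated tag field into clean tags.

    Trims each entry, collapses runs of internal whitespace, and drops empty
    entries and repeats while keeping the order the user typed them in.
    """
    seen = set()
    tags = []
    for part in raw.split(","):
        tag = " ".join(part.split())
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def suggest(typed: str, existing: Iterable[str], limit: int = 5) -> List[str]:
    """Prompting: existing tags that start with what has been typed so far
    (case-insensitive), alphabetized and capped at `limit`."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    matches = sorted(t for t in set(existing) if t.lower().startswith(prefix))
    return matches[:limit]


print(parse_tags(" Holdem,  Seven-Card  Stud ,,Holdem"))  # ['Holdem', 'Seven-Card Stud']
print(suggest("sev", ["Seven-Card Stud", "Holdem", "Sevens Wild"]))  # ['Seven-Card Stud', 'Sevens Wild']
```

Normalizing whitespace at parse time is a cheap way to head off some of the near-duplicate tags that curation would otherwise have to clean up later.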


Thoughts? I'm curious whether folks agree that this feature is likely to be useful. What am I not thinking of here? What best practices have you observed for Tags as a feature?

I seem to be specializing lately in drive-by commentary rather than deep, thinky thoughts. I'm sorry. But in case it helps...

Tag bucket + description is important; cf. Stack Exchange tag wikis.

The option for multiple dimensions for tagging is essential. It frustrates me that I can't do that with something as "simple" as my MP3 collection; whatdayamean only one genre per track? The richer and less-characterized the data is, the more important I expect this will be.

Is the tag space flat, or can hierarchies exist within it? How you use tags depends on hierarchy and containment.

Tag sets will morph over time; the tags I'm using on LJ are not the ones I would create now, for example, but LJ provides almost no support for refactoring tags. What can you put in place now that will make it easier for people to restructure their tags a year down the line?

Sometimes tags are per-collection (like your LJ tags); sometimes they're globally collaborative (like Twitter hashtags). Both have value for different use cases; both get in the way for other use cases. (You don't want your LJ "me" tag to be linked with everybody else's, for instance.) What use cases in that space are you thinking of?

Hope this helps.

Is the tag space flat, or can hierarchies exist within it? How you use tags depends on hierarchy and containment.

Ah -- yes, good point, and one that was a "D'oh" moment for me a couple of months ago. Yes, I decided some time back that to really make tags sing, you want to allow hierarchy -- that seems to be extremely useful in cases where you want just a single free-form Tag Property, and potentially develop structure organically within that.

As a result, I actually wound up refactoring the URL paths last week, so that Names can now have slashes in them, to more or less arbitrary depth, specifically to permit you to define Tag hierarchies. I'm not *doing* anything with that yet (aside from allowing you to have slashes in Tags), but I suspect that we'll eventually wind up adding, eg, support for collapsing and expanding those hierarchies in displays, providing multi-level prompting, and stuff like that.
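A hierarchical Tag is just a Name containing slashes, so a collapse/expand display mostly comes down to splitting on "/" and nesting the pieces. Below is a minimal, hypothetical Python sketch of that idea. It does not describe anything Querki currently does with such Names.

```python
from typing import Iterable, List


def build_tree(tags: Iterable[str]) -> dict:
    """Nest slash-separated Names ("Stud/Seven-Card") into a tree of dicts,
    one level per path segment."""
    tree: dict = {}
    for tag in tags:
        node = tree
        for segment in tag.split("/"):
            node = node.setdefault(segment, {})
    return tree


def outline(tree: dict, depth: int = 0) -> List[str]:
    """Flatten the tree into an indented outline; a collapsible display would
    simply stop descending below a collapsed node."""
    lines = []
    for segment in sorted(tree):
        lines.append("  " * depth + segment)
        lines.extend(outline(tree[segment], depth + 1))
    return lines


print("\n".join(outline(build_tree(["Stud/Seven-Card", "Stud/Five-Card", "Holdem"]))))
```

The same tree would also support multi-level prompting: offer the children of whatever path the user has typed so far.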

What can you put in place now that will make it easier for people to restructure their tags a year down the line?

I'm not sure -- this is all hand-wave at this point -- but my sense is that there is some low-hanging fruit we could implement fairly easily. For example, we should make it relatively easy to rename a Tag: to point to a specific Tag, give a new name, and have all instances of that Tag get changed. (Yay for doing things in-memory: a lot of operations are *much* easier in Querki than in a conventional DB.) Maybe better, we might have some sort of drag-and-drop editable Tag Cloud, where you could drag one tag onto another, confirm the operation, and have the system merge the two names.
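A rename is essentially a find-and-replace over every tag-set value. A drag-and-drop merge is the same operation in the case where the target name is already in use. Below is a minimal Python sketch, assuming a hypothetical layout in which each Thing maps property names to lists of tags. It only rewrites tag values. It does not rename a Thing that already carries the old Name.

```python
def rename_tag(things: dict, old: str, new: str) -> int:
    """Replace tag `old` with `new` in every tag-set property of every Thing.

    `things` maps Thing Names to {property name: [tags]} (hypothetical layout).
    If a Thing already carries `new`, the two collapse into one entry, so the
    same call serves as a merge. Returns the number of Things changed.
    """
    changed = 0
    for props in things.values():
        touched = False
        for prop, tags in list(props.items()):
            if old not in tags:
                continue
            merged = []
            for tag in tags:
                tag = new if tag == old else tag
                if tag not in merged:  # drop the duplicate a merge can create
                    merged.append(tag)
            props[prop] = merged
            touched = True
        if touched:
            changed += 1
    return changed


things = {
    "Omaha": {"Game Type": ["Holdem", "Hold'em"]},
    "Pineapple": {"Game Type": ["Holdem"]},
    "Baseball": {"Game Type": ["Seven-Card Stud"]},
}
print(rename_tag(things, "Holdem", "Hold'em"))  # 2
```

Because the whole Space is in memory, a single pass like this is enough. A conventional database would typically need an update query per affected table.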

We'll see -- I've long been surprised at how weak tag-refactoring support tends to be in most systems, and I suspect we'll wind up pushing the boundaries on this.

Sometimes tags are per-collection (like your LJ tags); sometimes they're globally collaborative (like Twitter hashtags). Both have value for different use cases; both get in the way for other use cases. (You don't want your LJ "me" tag to be linked with everybody else's, for instance.) What use cases in that space are you thinking of?

Well, there is little-to-no concept of "global" in Querki -- that's part of what makes it different from most systems. (As I often say, I'm leading the Small Data Revolution here, since everybody else is focused on Big Data.)

In the near term, tags are simply data in the Space, like any other data: they're a field that some people have permission to edit. In that respect, they are more like the LJ definition, particularly how tags work in LJ communities.

A little further out, I suspect we will add the ability to have "personal tags", as part of a more general concept of the "personal view" of a Space. This personal view will probably cover lots of things -- I'm likely to initially implement it for Ratings, which kind of require it -- but the high concept is that these are my personal annotations to Things in a Space that I can read. It's quite plausible that we might aggregate those across Spaces, but there are some technical challenges there, so I'm not promising that yet.

Maybe a year or so out, there's a vague notion of medium-scale collaboration, both in the form of larger crowdsourced Spaces (which would be hugely useful, but a *major* architectural problem), and/or rolling up instances of an App into a larger community (which is more Querki-like, but I don't quite know what it means yet). Those might wind up with larger-scale use of tagging.

Further down the line, if there is sufficient user demand, we might consider truly global tags a la Twitter. But that violates enough of Querki's core concept that I'm not going to do it casually. It works for Twitter because Twitter is in many ways very homogeneous. Querki is intentionally at precisely the opposite end of the scale: the high concept is providing a framework for folks to manage highly heterogeneous data in a structured way. So global tags may never make enough sense to be worth the effort.

Hope this helps.

Definitely -- thanks...

Things in this post that make me especially happy:

"If you "edit" this pseudo-Thing, Querki creates it, with its properties preset to look exactly like the way it originally displayed, but you can then edit it to look however you want."

"...the Space can easily declare how to display each of the Things that match a Search."

"...the Space can easily declare how to display each of the Things that match a Search."

Of course, keep in mind that I made that up as I was writing this post. That actually illustrates the way I'm tackling design: there is a great deal of me realizing that something is obviously correct, tossing it into the requirements, and then figuring out how to implement it...

Have you looked at Semantic Mediawiki?

That's an extension for Mediawiki that turns Categories into true tags -- but also provides tag-like items that get a value and type, which can also be quite useful.

Not in any depth, but yes, I glanced at it early in the project.

One of the hardest questions I had to wrestle with was projects like that and XWiki, which extend wiki tech with properties. (Very much like ProWiki did.) The question was: do those render Querki redundant?

My eventual conclusion was no, mainly on the basis of ease-of-use. Semantic Mediawiki is *conceptually* similar to Querki, but last I checked, it still felt like the "property" concept was sort of bolted on. Querki is intentionally coming at it from the other direction: under the hood, Querki really believes that it is an (admittedly weird) database engine, which *presents* itself like a wiki.

Querki doesn't even really have a concept of a "page" -- instead, when you look at a Thing, you are actually rendering the value of its Display Text Property. Indeed, a Simple Thing -- the "rootiest" Model you can derive from -- doesn't even have a Display Text, and defaults to just showing its Properties as a definition list.
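The fallback described here might look like this in a simplified Python sketch: when a Thing has no Display Text, its Properties are shown as a definition list. Treating Display Text as an already-rendered string is a simplification, since in Querki it is itself a template. The HTML output format is also an assumption.

```python
import html


def render_thing(props: dict) -> str:
    """Show a Thing: use its Display Text if it has one; otherwise fall back to
    listing its Properties as an HTML definition list (the Simple Thing default).

    Display Text is treated as already-rendered text here; in Querki it is itself
    a template, so this is a simplification.
    """
    if "Display Text" in props:
        return props["Display Text"]
    rows = "".join(
        f"<dt>{html.escape(name)}</dt><dd>{html.escape(str(value))}</dd>"
        for name, value in props.items()
    )
    return f"<dl>{rows}</dl>"


print(render_thing({"Name": "Omaha", "Game Type": "Holdem"}))
```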

My belief is that this is going to result in *much* greater power, and I think I'm also going to wind up with better usability in the long run. But that remains to be seen.

And XWiki is *way* too SQL-ish for the end user. It's not half-bad for Enterprise use, but I think it's pretty terrible for consumers. This is part of why Querki decided to focus mainly on the consumer market: companies like XWiki are already pretty mature in the Enterprise market, but haven't been very serious about appealing to the masses...

Oh, yes, absolutely; I was more thinking in terms of concepts to steal.

Are you actually displaying the Display Text property or rendering a template property? Because that seems like a useful thing [or, I guess, templates could be on the type level, which might also make sense].

Are tags implemented as a first class thing? Or is there a keyword list type that tags are an example of? Because coming from the POV of someone who uses wikis primarily to keep track of ongoing rpgs and design larps, I can easily see a use for multiple "keyword list" types attached to a given entity.

Are you actually displaying the Display Text property or rendering a template property? Because that seems like a useful thing [or, I guess, templates could be on the type level, which might also make sense].

That's kind of a complex question, actually.

The important thing is that nearly all Text Properties are by definition templates. Technically speaking, they are of type QLText, which can contain embedded QL expressions (the programming language), which in turn can contain more QLText -- like all good modern template languages, you can recurse pretty much arbitrarily between the template text and the programming language.

So the rendering is a two-step process. First we go through the template and process all of the QL expressions. The end result is QText -- essentially an extended version of Markdown -- which then gets XML-neutered and wikitext-processed.
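Here is a toy Python pipeline that makes the two steps concrete. Everything about the syntax is an assumption for illustration, not Querki's actual QL or QText:

- QL expressions are modeled only as `[[Property]]` lookups;
- repeated substitution stands in for the recursion between template text and QL;
- *emphasis* stands in for the full QText/wikitext processing.

```python
import html
import re

# Illustrative syntax only: innermost [[...]] is treated as a property lookup.
QL_EXPR = re.compile(r"\[\[([^\[\]]+)\]\]")
EMPHASIS = re.compile(r"\*([^*]+)\*")


def expand_ql(text: str, props: dict, max_depth: int = 10) -> str:
    """Step 1: replace each [[Property]] with that property's value. Values may
    themselves contain [[...]], so repeat until nothing changes (bounded, so a
    self-referencing property cannot loop forever). Unknown names become ""."""
    for _ in range(max_depth):
        new = QL_EXPR.sub(lambda m: str(props.get(m.group(1).strip(), "")), text)
        if new == text:
            return new
        text = new
    return text


def render(text: str, props: dict) -> str:
    """Step 2: neuter any raw HTML/XML, then apply a tiny wikitext subset."""
    qtext = expand_ql(text, props)
    safe = html.escape(qtext)
    return EMPHASIS.sub(r"<em>\1</em>", safe)


props = {"Name": "Omaha", "Summary": "*[[Name]]* is a <Holdem> variant"}
print(render("[[Summary]]", props))  # <em>Omaha</em> is a &lt;Holdem&gt; variant
```

The ordering matters: escaping happens after expansion, so markup smuggled in through a property value is neutralized just like markup typed directly into the template.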

This is basically the only way Querki *could* work, when you think about it. Since the individual Variant objects are just data structures, it's usually up to the Variant Model itself to describe how to display them, and therefore the Model has to describe that display in terms of a template.

So in that sense, yes, everything is templatized from the word go. There *is* a PlainText Type as well, but it is rarely used.

However, at the moment there is only one level of templatization. In the medium-term, we also need a concept of a Master Template at the Space level, so you can build proper navigation frames and the like. That isn't implemented yet -- for example, the footer at the bottom of the page has to be separately put into each Display Text, but it should eventually be simply part of a larger frame described by the Space.

Are tags implemented as a first class thing? Or is there a keyword list type that tags are an example of? Because coming from the POV of someone who uses wikis primarily to keep track of ongoing rpgs and design larps, I can easily see a use for multiple "keyword list" types attached to a given entity.

Essentially the latter. Details:

At the heart of it is the base NameType class, which is a mostly-internal Type that represents the name of a Thing. There are several Querki Types (which are Scala objects, just to be confusing) based on that, including the user-visible NameType, the system-internal UnknownNameType (for when we look up a name and don't find it), and TagSetType, which is basically a NameType that thinks of itself as a list of "tags". (And is always plural, never singular.)

All that said, an actual Model or Thing doesn't have Types -- it has Properties, and each Property has a Type. You, the user, can define whatever Properties you want and assign them to Things and Models. So in practice, while there is only one "Tag Set Type", you can have any number of distinct Properties based on that Type, and the API lets you treat them as distinct.

This is actually very much in evidence in the Poker Space: I have distinct Tag Set Properties for Category, Source (who invented the game), and Derived From (the game(s) that this one was adapted from). So yes, it is very easy to use Tags at whatever granularity is appropriate for your problem. Indeed, I'm finding Tags so nice to use that I may emphasize them more than I originally intended.
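A small class sketch can illustrate the split between Types and Properties. These Python classes only mirror the names used above (NameType, TagSetType). They are not Querki's Scala code, and the comma-splitting `parse` method is a hypothetical stand-in.

```python
from dataclasses import dataclass


class NameType:
    """Values are Names of Things (illustrative Python, not Querki's Scala)."""

    def parse(self, raw: str):
        return raw.strip()


class TagSetType(NameType):
    """A NameType whose value is a set of tags: always plural, never singular."""

    def parse(self, raw: str):
        parse_name = super().parse  # reuse the single-Name parsing per entry
        names = (parse_name(part) for part in raw.split(","))
        return frozenset(n for n in names if n)


@dataclass(frozen=True)
class Property:
    """A user-defined Property: a name bound to a Type."""
    name: str
    ptype: NameType


TAG_SET = TagSetType()  # one Type...
CATEGORY = Property("Category", TAG_SET)  # ...any number of distinct Properties
SOURCE = Property("Source", TAG_SET)
DERIVED_FROM = Property("Derived From", TAG_SET)

game = {
    CATEGORY: CATEGORY.ptype.parse("Stud, Wild Cards"),
    DERIVED_FROM: DERIVED_FROM.ptype.parse("Seven-Card Stud"),
}
```

The three Properties share a single Type object but remain distinct keys. That is what lets a Thing carry Category, Source and Derived From tags side by side.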

None of which is coincidence for the LARP case, BTW. Keep in mind that ProWiki was originally built, ten years ago, *specifically* for LARP design, and I created most of my games in it. I originally was doing my writing in UseModWiki, and built ProWiki (as a fork of UseMod) precisely because I found the process frustrating. Querki treats ProWiki as a prototype, taking everything I learned there and doing it better. So my expectation is that Querki is going to be *great* for LARP design, and I plan to build some sample Spaces to show that off soonish.

Speaking of which (shameless plug time, but entirely apropos): I will be running a presentation on "LARP Design and Writing Using Querki" at NELCO in a couple of months. So you might want to consider coming up, if you're free...
