Design notes for Personal Properties
jducoeur wrote in querki_project

The Problem

Over in email, mindways and I have been chatting about how he might use Querki to manage some of his game-development work, and discussion turned to "ratings" -- people rating a given card/power/whatever with 1-5 stars, with a comment. I replied that I was planning on implementing ratings soon, but that the idea was that it would be the beginning of "personal properties", and only the aggregate sums/averages would be visible to anybody else. He pushed back with a good argument:
Hmm - I don't think that ratings of this sort are a good fit for Personal Properties. I can imagine Ratings which would be - personal opinions that one wished to keep screened from others, like one's rating of a song on iTunes - but the very point of these Ratings (and the comments thereupon) is to convey information to a third party. This isn't an unusual case; all of the sites with ratings I make use of on any regular basis (Amazon, Yelp, NewEgg, boardgamegeek, etc.) include the ability to see what a particular user has rated a thing. It's (a) sometimes critical to the app; (b) when not critical, generally hugely useful; and (c) supports accountability and credibility in a variety of ways (preventing sock-puppet votes, understanding *why* someone rated something how they did, etc).

Some practical examples of why it matters:
* The difference between "ten Ratings of 1 and ten Ratings of 5" and "twenty Ratings of 3" is immense, but both have the same average of 3.
* A "5" Rating on appeal means something different from someone who's given other cards 1s through 4s, vs. someone who's rated everything a 4 or a 5.
* Ratings on Balance - particularly 1s and 5s - from playtesters who've logged many plays / shown deep understanding of the game on the forums deserve a different sort of attention than Balance ratings from people who've just started playing, who in turn deserve a different sort of attention than ratings from people who routinely shoot their mouth off in the forums and betray a lack of understanding of the game.
* Playtesters who've gone through and rated every card deserve a different sort of attention (and permit a different sort of analysis) than 1-off ratings.
* Commentary on ratings is pointless unless people other than the rater can view them.
I started to push back, but gradually realized that he was correct. This post is going to discuss the ramifications, which are fairly deep. It's largely for myself, so I can remember the design decisions, but opinions are welcomed.

Part of the disagreement was that I had been thinking of simple numeric ratings, where you tend to only care about the averages and distribution, but rarely want to see who voted how. Mindways, OTOH, was specifically looking for rating-plus-comment, the sort of thing I think of in terms of, say, the Google App Store or TripAdvisor. The truth is, I hadn't even considered that use case, but it's certainly a common one -- enough so that we almost certainly should be supporting it. The problem is, it *totally* doesn't work in Querki's architecture so far.

Why It's Hard

The thing is, Querki gets its massively experimental power by being entirely in-memory, at least for the moment. (Eventually we'll need to write a proper on-disk database engine for it, but I want to know what we want, in considerable detail, before we go there.) This is why it is specifically focused on "small" Spaces for the time being. Mind, I think of "small" in terms of <50k Things, plenty big enough for most personal and SOHO projects.

The catch is that this *totally* breaks down when things get "multiplicative", and this is a fine example of that. In his use case, he probably only needs a few hundred Things -- but each of those Things might have dozens or even hundreds of Reviews, potentially blowing the Thing quota right out of the water. I don't want to maintain every review of every Thing in memory: that becomes too scary even for my plans.

Therefore, I'd been planning to deal with Property-Value-Per-User as a totally separate mechanism. Each Space would have a separate Personal Properties table; each User/Thing pair would potentially have a row in that. When you begin using a Space, it would sweep your personal values into memory, and overlay those onto the Space while you are working on it.

But I think that's now been blown out of the water. We have a lot of more-complex needs that are surfacing here -- we clearly need to be able to examine all of the reviews for a specific Thing (to really understand how the Thing is being received), or all of the reviews submitted by a given User (to get a sense of this person's reviewing patterns), and so on. The granularity of my previous plan is all wrong for this.

So -- here's a redesign, looking for thoughts.

The UserValues Table

Querki will gain a new table per Space, which will contain "User Values". (Not obvious from the surface: Querki keeps a distinct set of tables for each and every Space. This has a *lot* of advantages, but is damned weird. Part of me is nervously waiting to see whether MySQL can actually cope with having tens or hundreds of thousands of tables per shard.)

This is specifically the place for any Properties that are defined as being distinct per-User, such as Rating or Review. The table contains four significant columns:
  • Thing ID
  • Property ID
  • User ID
  • Value
(There may also be a synthetic ID column, but I'm honestly not sure we need it. And there will likely be standard columns like Created/Modified Time, but I'm not going to worry about those now.)
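To make the shape of that table concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL, and the table-naming scheme, column names, and helper functions are all hypothetical, chosen only for illustration. It assumes a composite key of (Thing, Property, User), which is one way to get by without a synthetic ID column.

```python
import sqlite3

def create_user_values(conn, space_id):
    """Create the per-Space User Values table (hypothetical naming scheme)."""
    # The table name cannot be a bound parameter, so restrict it defensively.
    if not space_id.isalnum():
        raise ValueError("space_id must be alphanumeric")
    table = f"uservalues_{space_id}"
    conn.execute(f"""
        CREATE TABLE IF NOT EXISTS {table} (
            thing_id TEXT NOT NULL,
            prop_id  TEXT NOT NULL,
            user_id  TEXT NOT NULL,
            value    TEXT NOT NULL,
            -- Assumption: one value per User per Property per Thing,
            -- so the triple can serve as the key.
            PRIMARY KEY (thing_id, prop_id, user_id)
        )""")
    return table

def set_user_value(conn, table, thing_id, prop_id, user_id, value):
    """Insert a User's value, or overwrite that User's earlier value."""
    conn.execute(
        f"INSERT OR REPLACE INTO {table} VALUES (?, ?, ?, ?)",
        (thing_id, prop_id, user_id, value),
    )
```

With the composite key, a User who re-rates a Thing replaces their earlier row rather than adding a second one.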

Unlike most Querki Space data, this will generally be kept on-disk. In the medium term we will likely cache some of it in-memory as needed, but we'll start off the primitive way, fetching these values per-request.

QL Support for User Values

I'm honestly unsure at this point how much we'll use distinct methods to surface these in QL. We might have as many as three different methods, but my sense is that having one context-sensitive one might be best:
  • Thing -> Property.userValues() -- returns all of the values of Property on Thing, from any User.
  • User -> Property.userValues() -- returns all of this person's uses of Property, from any Thing in this Space.
  • Thing -> Property.userValues(User) -- returns this specific person's value on this Thing, if any.
That last one likely should be "_userValue" instead, since it returns an Optional value instead of a Set. (The types are polymorphic enough that it wouldn't produce errors, but the usage is very different.) The argument for splitting the first two is that a User is, of course, also a Thing, so the first two examples are ambiguous if Property is defined on Person in this Space.
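The three lookups can be sketched over plain in-memory rows. This is only an illustration of the semantics, not QL and not Querki's code: the row type, the function names, and the sample data are all made up, and the three functions are kept separate here purely to sidestep the ambiguity just described.

```python
from typing import NamedTuple, Optional

class UserValueRow(NamedTuple):
    thing_id: str
    prop_id: str
    user_id: str
    value: int

# Made-up sample data: two Users rating two cards.
ROWS = [
    UserValueRow("card-1", "rating", "alice", 5),
    UserValueRow("card-1", "rating", "bob", 2),
    UserValueRow("card-2", "rating", "alice", 4),
]

def values_on_thing(rows, thing_id, prop_id):
    """All Users' values of a Property on one Thing, keyed by User."""
    return {r.user_id: r.value for r in rows
            if r.thing_id == thing_id and r.prop_id == prop_id}

def values_by_user(rows, user_id, prop_id):
    """One User's values of a Property across the Space, keyed by Thing."""
    return {r.thing_id: r.value for r in rows
            if r.user_id == user_id and r.prop_id == prop_id}

def value_for(rows, thing_id, prop_id, user_id) -> Optional[int]:
    """This User's value on this Thing, or None if they have not set one."""
    return values_on_thing(rows, thing_id, prop_id).get(user_id)
```

The first two return collections and the third returns at most one value, which is the usage difference that argues for a separate name.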

Type System Enhancements

More significantly, we need to beef up the type system to cope. This isn't a surprise -- it's been designed for a while, but hasn't risen to the top of the stack -- but we really don't want the first two to return a simple List or Set. They really should be returning a Map. The first should be pairs of (User -> Value), the second of (Thing -> Value).

So before we can implement Map, we have to implement Tuples. (God bless Scala for providing me with good examples of how to think about these problems.) A Map is basically a list of 2-tuples, where you can look things up by the first value. You can either index into it that way, or iterate over it as a List of Tuples.
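As a rough sketch of that idea (in Python rather than Scala or QL, with a hypothetical class name), a Map can be built directly on a list of 2-tuples, with an index on the first element for lookups. It assumes keys are unique, which is what makes lookup by the first value well-defined.

```python
class PairMap:
    """A Map modelled as a list of 2-tuples, indexed by the first element."""

    def __init__(self, pairs):
        self._pairs = list(pairs)
        self._index = {}
        for key, value in self._pairs:
            # Assumption: each key appears at most once.
            if key in self._index:
                raise ValueError(f"duplicate key: {key!r}")
            self._index[key] = value

    def get(self, key, default=None):
        """Look a value up by the first element of its pair."""
        return self._index.get(key, default)

    def __iter__(self):
        """Iterate over the underlying list of (key, value) tuples."""
        return iter(self._pairs)
```

Both access styles work on the same data: `PairMap([("alice", 5)]).get("alice")` indexes into it, while `list(PairMap([("alice", 5)]))` walks it as a list of tuples.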

A Tuple, in turn, is our first runtime-defined Type. Specifically, a Tuple is a higher-kinded Type, defined as a List of N Types. For instance, if my Property is a Rating of 1-5 stars, the returned Tuples from the first _userValues example above are of type (ExactlyOne[Link[User]], ExactlyOne[Int]), and we are building from that a Map[Link[User], ExactlyOne[Int]]. (This syntax is just illustrative, and might never be user-exposed, but it's sometimes easier to use Scala's type system to describe things.)
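A very loose sketch of a Type built at runtime from a list of N Types follows. It is Python, not Querki's type system: `TupleType` is a hypothetical name, plain `str` and `int` stand in for Link[User] and Int, and collections like ExactlyOne are not modelled at all.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TupleType:
    """A tuple type described by one element type per position."""
    element_types: tuple

    def accepts(self, value) -> bool:
        """True if value is a tuple of the right arity and element types."""
        return (
            isinstance(value, tuple)
            and len(value) == len(self.element_types)
            and all(isinstance(v, t)
                    for v, t in zip(value, self.element_types))
        )

# Built at runtime: stand-ins for (Link[User], Int).
rating_pair = TupleType((str, int))
```

Here `rating_pair.accepts(("alice", 5))` holds, while a pair with a string rating or a 1-tuple is rejected.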

Long-term implications -- Static Pipeline Analysis

This is the camel's nose in the tent of using the underlying database *as* a database. Truth is, I had hoped to put that off longer, for efficiency reasons. Querki uses SQL only in *very* primitive ways at this point, intentionally.

The thing is, QL is, itself, essentially a database language. It's a full-scale functional programming language, but most of its operations are very familiar as database ones, as are the data types. Links are foreign key pointers; dereferencing a Link is essentially a database join. _filter is essentially a WHERE clause, and _sort is, of course, ORDER BY. The idea is to reframe data-processing in a pipeline-oriented way that is hopefully a bit more amenable to a good UI, and more comprehensible.

But as we begin to move into data-on-disk, we're gradually going to have to get smarter about that. For this first baby step we can live with the above model, simply pulling some rows into memory and handling them there. But in the long run, that's stupidly inefficient. What we *should* be doing is statically analyzing the QL, translating as much as possible into SQL, and dealing with it on the database server. That project is potentially huge, and a tad scary -- getting it right is a long-term research project. We'll need to tackle it eventually, to help Querki scale better by storing most/all of the data on-disk. (It's essential before I can even think about approaching the enterprise market.) We'll see when it happens.
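To give a feel for what "translating as much as possible into SQL" might mean, here is a toy sketch. It is nowhere near real static analysis of QL: the stage tuples are a made-up representation of a pipeline, the column names are assumed, and anything the translator does not recognize is simply refused rather than handled in memory.

```python
import sqlite3

# Hypothetical pipeline stages:
#   ("filter", column, op, value)  and  ("sort", column)
ALLOWED_OPS = {"=", "<", ">", "<=", ">="}
COLUMNS = {"thing_id", "prop_id", "user_id", "value"}

def to_sql(table, stages):
    """Translate filter/sort stages into one parameterized SELECT."""
    where, params, order_by = [], [], None
    for stage in stages:
        if stage[0] == "filter":
            _, col, op, val = stage
            if col not in COLUMNS or op not in ALLOWED_OPS:
                raise ValueError(f"cannot translate filter: {stage!r}")
            where.append(f"{col} {op} ?")
            params.append(val)
        elif stage[0] == "sort":
            _, col = stage
            # Keep it simple: a single sort on a known column only.
            if col not in COLUMNS or order_by is not None:
                raise ValueError(f"cannot translate sort: {stage!r}")
            order_by = col
        else:
            raise ValueError(f"unknown stage: {stage!r}")
    sql = f"SELECT thing_id, prop_id, user_id, value FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order_by:
        sql += " ORDER BY " + order_by
    return sql, params
```

For example, `to_sql("uv", [("filter", "thing_id", "=", "card-1"), ("sort", "value")])` yields a query with a WHERE and an ORDER BY clause, so the filtering and sorting happen on the database server instead of after pulling every row into memory.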

For now, expect pages that perform computations on User Property values to run significantly slower than average. Mind, that is compared to Querki's usual lightning-quick response time -- even with the vast amount of computation we're doing per page, the fact that Spaces are mostly in memory means that most render in a matter of milliseconds. User Properties will require disk reads, so they'll introduce quite a bit of extra latency. Hopefully it won't be painful.


So that's the current design, which will probably be implemented in another month or two. Thoughts?

(Deleted comment)
Hmm. It's a new idea, but it's not crazy. I believe the new design supports it. Basically, what I'm describing above is the notion of how to create a Property that, instead of having a single value per Thing, has one value per User per Thing. Basically, it's the generalized notion of how people chime in on Things.

Nothing about that is specific to the 1-5 Ratings, intentionally. I might get caught out by some detail, but it seems like you ought to be able to just define one of these User Properties as a Tag Set, and it should Just Work.

Intriguing notion. Keep it in mind, and we can try it out when the tech is in place...


General approach looks good, but:

Thing -> Property.userValues() -- returns all of the values of Property on Thing, from any User.
User -> Property.userValues() -- returns all of this person's uses of Property, from any Thing in this Space.
Thing -> Property.userValues(User) -- returns this specific person's value on this Thing, if any.

This scheme precludes Users rating (or having personally-specific properties for) other Users, because User -> Property.userValues() will be ambiguous.

Whoops! Re-reading, I see that you already called that out; never mind. :)

Yaas. So it may be that we should have, eg, [[Thing -> Property.userValues]] vs. [[User -> Property.thingValues]], or something like that, just to be crisp and unambiguous...
