jducoeur wrote in querki_project
Today's release is kinda-sorta a point release -- it's a bit more than just bugfixes, but the changes are pretty subtle and technical. But for those who are curious about Power Querki, here's what's up. (The following is mainly for people who have at least *some* programming background.)

On Saturday, I sat down to work on my Comics Space, the growing inventory of my comic book collection. This is Big: almost 3000 Titles. (I'm only now starting to input the Issues.) It's going to be Querki's official stress-test in the medium term: my definition of a Space that is Really Big, but still realistic for personal use. Problem is, my page that displays 50 of them at a time was taking *forever* to display -- almost eight seconds. Everything below is a consequence of me improving that. (For general reference, anything non-pathological that routinely takes more than two seconds on a fast connection is Too Slow, and automatically counts as a bug. We won't always be able to fix it, but we'll take it seriously.)

The first change is the really important one, although it's very much a power feature: Functions now resolve differently in the lexical context. To understand what that means, let's give a simplified example of functions.

A Querki Function is pretty much what you'd expect: it takes some input and produces some output. It's a QL expression, and you define it just like any other Property. So say I have a Show Titles page, which contains this:
[[Title._instances -> _sort -> _take(50) -> Do Display]]
"Do Display" here is a Function, that turns Titles into actual displayed output.

But think about it for a second: where is Do Display defined? Is this a method defined on the Show Titles page, or on the individual Title? Both are totally legal. But that turns out to be a crucial distinction: if it's defined on the page, I expect it to be a method that takes the whole list of Titles, but if it's defined on the individual Title, I expect it to be called separately on each of them.

So we wind up with two changes in how functions are invoked:
  • First, lexical scope takes priority. That is, if you name a Function that is defined *on the object that is invoking it*, that takes precedence over a Function defined *on the received context*. This is very much an edge case, and I'm having trouble coming up with a case where it would ever matter in a well-constructed Space, but we should be precise about these things.

  • Second, a Function invoked through the lexical scope receives the entire received context, but a Function invoked on the context is applied separately to each of the received Things.
All of which is just a fancy way of saying DWIM, as usual. But figuring out WIM continues to be a fun and subtle challenge.
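
For the programmers in the audience, the two rules can be modelled in a few lines of Python. This is only a rough sketch of the behaviour described above, not Querki's actual implementation; `Thing`, `invoke`, and the `functions` dictionary are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Thing:
    """Hypothetical stand-in for a Querki Thing: a name plus named Functions."""
    name: str
    functions: Dict[str, Callable] = field(default_factory=dict)


def invoke(func_name: str, lexical: Thing, context: List[Thing]) -> list:
    """Resolve func_name, preferring the lexical scope over the received context."""
    # Rule 1: a Function defined on the invoking object takes precedence.
    if func_name in lexical.functions:
        # Rule 2a: it receives the entire received context in one call.
        return [lexical.functions[func_name](context)]
    # Rule 2b: otherwise each received Thing's own Function is applied to it.
    return [t.functions[func_name](t) for t in context if func_name in t.functions]


def show_one(title: Thing) -> str:
    return f"<{title.name}>"


titles = [Thing("Sandman", {"Do Display": show_one}),
          Thing("Watchmen", {"Do Display": show_one})]

# A page with no Do Display of its own: the Titles' version runs once per Title.
plain_page = Thing("Show Titles")
per_title = invoke("Do Display", plain_page, titles)

# A page that defines Do Display itself: it gets the whole list at once.
page_with_fn = Thing("Show Titles",
                     {"Do Display": lambda ts: ", ".join(t.name for t in ts)})
whole_list = invoke("Do Display", page_with_fn, titles)
```

In this model, `per_title` holds one result per Title, while `whole_list` holds a single result built from the entire list.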

The other substantive change (although really a bugfix) is that bindings now work correctly with collections. This change came about due to the Next button on my page.

The Show Titles page's content actually looks something like this:
[[Title._instances -> _filter(_not(Sort Name -> _lessThan($start))) -> _sort -> *""
... display the first 50...
[[Next Button]]""]]
That is, "Take all the Titles, filter out the ones before the start URL parameter, sort them, display the first 50, and call the Next Button function with that sorted list". Meanwhile, Next Button looks like this:
Show Titles -> _withParam(""start"", $_context -> _drop(50) -> _first -> Sort Name) -> _linkButton(""Next"")
That is basically creating a link button back to the Show Titles page; the interesting bit is that _withParam() in the middle, which is figuring out the start of the next page. You see where it has $_context in there? That's the context that was passed into the call to Next Button -- the sorted list of Titles. From that, we drop the first 50 on the floor, and use the next one as our starting point.

(BTW, don't get intimidated by all this. Probably sooner rather than later, I'm going to wrap it all up into a standard feature, so you don't have to do it by hand.)

The bug was that $_context, until now, only produced the *first* Thing that was passed in, instead of all of them. This left me scratching my head for a fair while, and took some fancy code rearrangement to fix, but it now works as expected. If $_context (or any other binding -- values that start with $) contains multiple Things, it should do the right thing when you use it.
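
The paging logic is easier to see in a conventional language. The Python below is a loose model under stated assumptions: Titles are reduced to their Sort Name strings, and `page_of_titles` and `next_start` are hypothetical helpers that stand in for the QL pipelines above. The last line models what a binding holding only the first Thing would do to the Next computation in this sketch.

```python
from typing import List, Optional

PAGE_SIZE = 50


def page_of_titles(sort_names: List[str], start: Optional[str]) -> List[str]:
    """Drop names before `start`, then sort: a model of _filter(...) -> _sort."""
    kept = [n for n in sort_names if start is None or not (n < start)]
    return sorted(kept)


def next_start(context: List[str]) -> Optional[str]:
    """Model of $_context -> _drop(50) -> _first: first name of the next page."""
    rest = context[PAGE_SIZE:]
    return rest[0] if rest else None


all_names = [f"Title {i:04d}" for i in range(120)]

page1 = page_of_titles(all_names, None)
start2 = next_start(page1)            # the whole sorted list is the context
page2 = page_of_titles(all_names, start2)

# If the binding carried only the first Thing, there is nothing left after
# dropping 50, so no start value can be computed in this model.
buggy_start = next_start(page1[:1])
```

With the full list as context, the second page starts exactly where the first 50 left off.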

Finally, there was the heart of the problem, which was Computed Name. This feature was added some months back, and it's terribly powerful and useful: you can use Computed Name instead of Display Name on your Things, so that their name can be any arbitrary QL expression.

It's wonderful, and I use it in the Comics Space. Unfortunately, it's also *slow* -- slow as mud compared to using an ordinary Display Name. (Mind, slow means maybe a millisecond, but that matters when you have thousands of Things.) And it was being automatically calculated on *every* object when you used _sort, even if you weren't sorting on it. (It's the fallback for disambiguating when two things otherwise sort the same.)

There's not much that can be done about the slowness for now -- eventually we're going to do some heavy-duty optimization on the QL stack, but I'm hoping to push that off until I have more programmers than just me. But for now, I've made two tweaks:
  • First, _sort computes the Computed Name lazily, only doing so when necessary. This turns out to involve a horrible code hack, which I'll have to improve someday, but it'll do for now.

  • Second and more importantly, we now cache the Computed Names in the Space. This is *not* a panacea: it still takes a relatively long time to calculate, and we have to start again from scratch every time the Space changes in any way. But it means that we're no longer duplicating unnecessary effort, and in practice it means that Spaces that aren't constantly changing will see dramatic improvements.
Mind, none of this matters much unless you have a *lot* of Things (hundreds-to-thousands) that are using Computed Name. But I expect that I won't always be the only person in that particular boat, so it's good to have it working better. And in practice, the new cache will slightly speed up even normal Things (even Display Name isn't free), so it should generally be a win.
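
Both tweaks can be sketched roughly in Python. This is an illustration of the general idea only, not the actual Querki code (which, as noted, involves a hack of its own): `NameCache`, `LazyKey`, and `sort_things` are hypothetical names, and the "computed name" is just an upper-cased id standing in for a slow QL expression.

```python
from typing import Callable, Dict, List, Tuple


class NameCache:
    """Hypothetical per-Space cache of computed names, emptied on any change."""

    def __init__(self, compute: Callable[[str], str]):
        self._compute = compute
        self._names: Dict[str, str] = {}
        self.computations = 0  # how often the slow expression actually ran

    def name_of(self, thing_id: str) -> str:
        if thing_id not in self._names:
            self.computations += 1
            self._names[thing_id] = self._compute(thing_id)
        return self._names[thing_id]

    def space_changed(self) -> None:
        # Any change might affect any name, so start again from scratch.
        self._names.clear()


class LazyKey:
    """Sort key that only asks for the computed name to break a tie."""

    def __init__(self, primary: int, thing_id: str, cache: NameCache):
        self.primary = primary
        self.thing_id = thing_id
        self.cache = cache

    def __lt__(self, other: "LazyKey") -> bool:
        if self.primary != other.primary:
            return self.primary < other.primary
        return self.cache.name_of(self.thing_id) < self.cache.name_of(other.thing_id)


def sort_things(things: List[Tuple[str, int]], cache: NameCache) -> List[str]:
    keyed = sorted(LazyKey(primary, tid, cache) for tid, primary in things)
    return [k.thing_id for k in keyed]


things = [("a", 2), ("b", 1), ("c", 2)]
cache = NameCache(lambda tid: tid.upper())  # stand-in for a slow QL expression

first = sort_things(things, cache)
after_first = cache.computations   # only the tied pair needed names
sort_things(things, cache)
after_second = cache.computations  # served from the cache
cache.space_changed()
sort_things(things, cache)
after_change = cache.computations  # recomputed after the Space changed
```

Here "b" never has its name computed at all, because it never ties with anything, and the second sort costs nothing extra until the cache is cleared.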

Does it matter (I know not functionally speaking) that you are filtering your list of titles and then sorting? What's the trade off versus sorting and then filtering? Is there any way to avoid having to do sorting every time? Er, wait, that's what's happening with the new $_context, isn't it?

Does it matter (I know not functionally speaking) that you are filtering your list of titles and then sorting? What's the trade off versus sorting and then filtering?

Just efficiency / speed. Sorting's a relatively slow operation, which doesn't *usually* matter -- but when you're dealing with several thousand elements (as in my Comics Space) it can become noticeable. So if you can filter out the bits you don't care about first, it goes a bit faster.

(Technically speaking, filter is O(n), whereas sort is O(n log n) with a bigger constant. So it's more efficient overall to do the filter first, so that the sort has fewer elements to work on.)
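
A small Python experiment makes the difference concrete. It is only a sketch: the 3000 shuffled integers stand in for Titles, `THRESHOLD` is an arbitrary cut-off chosen for the example, and `Counted` is a hypothetical wrapper that counts how many comparisons the sort performs.

```python
import random
from typing import Callable, List, Tuple

THRESHOLD = 2900  # arbitrary: keep only the top 100 of 3000 values


class Counted:
    """Wraps a value and counts every '<' comparison made during sorting."""
    comparisons = 0

    def __init__(self, value: int):
        self.value = value

    def __lt__(self, other: "Counted") -> bool:
        Counted.comparisons += 1
        return self.value < other.value


def filter_then_sort(items: List[Counted]) -> List[Counted]:
    return sorted(c for c in items if c.value >= THRESHOLD)


def sort_then_filter(items: List[Counted]) -> List[Counted]:
    return [c for c in sorted(items) if c.value >= THRESHOLD]


def run(pipeline: Callable[[List[Counted]], List[Counted]],
        values: List[int]) -> Tuple[List[int], int]:
    Counted.comparisons = 0
    result = [c.value for c in pipeline([Counted(v) for v in values])]
    return result, Counted.comparisons


values = list(range(3000))
random.Random(42).shuffle(values)

filter_first_result, filter_first_cmps = run(filter_then_sort, values)
sort_first_result, sort_first_cmps = run(sort_then_filter, values)
```

Both pipelines produce the same list, but filtering first means the sort only ever compares the 100 survivors instead of all 3000 values.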

Is there any way to avoid having to do sorting every time? Er, wait, that's what's happening with the new $_context, isn't it?

Yep, that's the motivation for why I'm structuring it this way, so that I only _sort once and can then reuse it in multiple places. Frankly, I consider that kind of a design bug in Querki -- there is no *way* the average user is going to think of this sort of thing.

In the fairly long run, the hope is to do expression melding behind the scenes, with the engine saying, "Hmm. You're doing [[Title._instances -> _sort]] in three different places on the page, but they all return the same value, so let's just do it once and stuff it into all three places." That's pretty deep magic, so I'm not even going to tackle it soon, but the advantage of QL being a rigorously pure-functional language is that it's at least *possible* to do stuff like that automatically. And once we're pre-compiling the QL expressions (also in the plans), it'll become more conceptually straightforward. (That is, once I'm taking a Large Text and turning it into an Abstract Syntax Tree internally, it's much easier to analyse that tree and see where there are duplications.)
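
As a very rough sketch of the simplest form of that idea: because the expressions are pure, the result of one can be reused wherever the same expression text appears during a single page render. The Python below is hypothetical throughout (`PageRender`, `value_of`, and the fake evaluator are made up for illustration) and matches on whitespace-normalised text rather than a real syntax tree.

```python
from typing import Callable, Dict, Tuple


class PageRender:
    """Hypothetical: evaluates each distinct pure expression once per render."""

    def __init__(self, evaluate: Callable[[str], Tuple[str, ...]]):
        self._evaluate = evaluate
        self._results: Dict[str, Tuple[str, ...]] = {}
        self.evaluations = 0

    def value_of(self, expr: str) -> Tuple[str, ...]:
        key = " ".join(expr.split())  # normalise whitespace before matching
        if key not in self._results:
            self.evaluations += 1
            self._results[key] = self._evaluate(key)
        return self._results[key]


def fake_evaluate(expr: str) -> Tuple[str, ...]:
    # Stand-in for the real engine: pretend every expression sorts three Titles.
    return tuple(sorted(["Watchmen", "Sandman", "Maus"]))


render = PageRender(fake_evaluate)
a = render.value_of("Title._instances -> _sort")
b = render.value_of("Title._instances  ->  _sort")
c = render.value_of("Title._instances -> _sort")
```

All three lookups return the same value, and the evaluator runs only once. This is only safe because the language is pure-functional; with side effects, reusing a result like this would change behaviour.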

Also, in principle I'd really, really, *really* like _instances to always pre-sort the Instances by Display Name, since you want them sorted 90% of the time -- but see "sorting can get pretty expensive". I might yet rewrite the internals to *store* them pre-sorted, but that's also not simple in the presence of Computed Name. Computed Name is incredibly powerful and useful, but complicates the bleeding hell out of the internals. It has the unfortunate implication that altering Thing X could change the Computed Name of Thing Y, which makes caching Much Harder. I'd be tempted to simply drop it for now, but I suspect I want it in the long run, so I'm coping with getting the architecture right.

All that said -- thanks for asking the question. Now that I think on it, I might be able to design a compromise that is pre-sorted and very fast for all cases *except* Computed Name, which is a pretty good 90/10 win. Interesting...
