This page shows the source for this entry, with WebCore formatting language tags and attributes highlighted.

Title

Quino: partially-mapped queries

Description

In <a href="{app}/view_article.php?id=2950">Quino: an overview of query-mapping in the data driver</a> we took a look at some of the basics of querying data with Quino while maintaining acceptable performance and memory usage. Now we'll take a look at what happens with partially-mapped queries. Before explaining what those are, we need a more concrete example to work with. Here's the most-optimized query we ended up with in the previous article: <code> var query = Session.GetQuery<person>(); query.Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM"); Assert.That(Session.GetCount(query), Is.GreaterThanEqual(140000)); </code> With so many entries, we'll want to trim down the list a bit more before we actually create objects. Let's choose only people whose last names start with the letter "M". <code> var query = Session.GetQuery<person>(); query <hl>.Where(Person.Fields.LastName, ExpressionOperator.StartsWith<fn>, "M")</hl> .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM"); Assert.That(Session.GetCount(query), Is.Between(100, 120)); </code> This is the kind of stuff that works just fine in other ORMs, like Entity Framework. Where Quino goes just a little farther is in being more forgiving when a query can be only partially mapped to the server. If you've used EF for anything beyond trivial queries, you've surely run into an exception that tells you that some portion of your query could not be mapped.<fn> Instead of throwing an exception, Quino sends what it can to the database and uses LINQ to post-process the data sent back by the database to complete the query. <h>Introducing unmappable expressions</h> Unmappable code can easily sneak in through aspects in the metadata that define filters or sorts using local methods or delegates that do not exist on the server. Instead of building a complex case, we're going to knowingly include an unmappable expression in the query. <code> var query = Session.GetQuery<person>(); query .Where(<hl>new DelegateExpression<fn>(c => c.GetObject<person>().LastName.StartsWith("M")</hl><fn>) .Join(Person.Relations.Company).WhereEqual(Company.Fields.Name, "IBM"); Assert.That(Session.GetCount(query), Is.Between(100, 120)); </code> The new expression performs the same check as the previous example, but in a way that cannot be mapped to SQL.<fn> With our new example, we've provoked a situation where any of the following could happen: <ul> The ORM could throw up its hands and declare the query unmappable, pushing the responsibility for separating mappable from unmappable onto the shoulders of the developers. As noted above, this is what EF does. The ORM could determine that the query is unmappable and <i>evaluate everything locally</i>, retrieving only the initial set of <c>Person</c> objects from the server (all several million of them, if you'll recall from the previous post). The ORM could map part of the query to the database, retrieving the minimal set of objects necessary in order to guarantee the correct result. This is what Quino does. The strategy works in many cases, but is not without its pitfalls. </ul> What happens when we evaluate the query above? With partial mapping, we know that the restriction to "IBM" will be applied on the database. But we still have an additional restriction that must be applied locally. Instead of being able to get the count from the server without creating any objects, we're now forced to create objects in memory so that we can apply the local restrictions and only count the objects that match them all. But as you'll recall from the previous article, the number of matches for "IBM" is 140,000 objects. The garbage collector just gave you a dirty look again. <h>Memory bubbles</h> There is no way to further optimized this query because of the local evaluation, but there is a way to avoid another particularly nasty issue: memory bubbles. What is a memory bubble you might ask? It describes what happens when your application is using <i>n</i>MB and then is suddenly using <i>n</i> + 100MB because you created 140,000 objects <i>all at once</i>. Milliseconds later, the garbage collector is thrashing furiously to clean up all of these objects---and all because you created them only in order to filter and count them. A few milliseconds after that, your application is back at <i>n</i>MB but the garbage collector's fur is ruffled and it's still trembling slightly from the shock. The way to avoid this is to <i>stream</i> the objects through your analyzer one at a time rather than to create them all at once. Quino uses lazily-evaluated <c>IEnumerable<t></c> sequences throughout the data driver specifically to prevent memory bubbles. <h>Streaming with <c>IEnumerable<t></c> sequences</h> Before tackling how the Quino ORM handles the <c>Count()</c>, let's look at how it would return the actual objects from this query. <ul> Map the query to create a <c>SELECT</c> statement At this point, it doesn't matter whether the entire query could be mapped Create an <c>IEnumerable<t></c> sequence that represents the result of the mapped query At this point, nothing has been executed and no objects have been returned Wrap the sequence in another sequence that applies all of the "unhandled" parts of the query Return that sequence as the result of executing the query At this point, we <i>still</i> haven't actually executed anything on the database or created any objects </ul> Right, now we have an <c>IEnumerable<t></c> that represents the result set, but we haven't lit the fuse on it yet. How do we light the fuse? Well, the most common way to do so is to call <c>ToList()</c> on it. What happens then? <ul> The <c>IEnumerator<t></c> requests an element The query is executed against the database and returns an <c>IDataReader</c> The reader requests a row and creates a <c>Person</c> object from that row's data The wrapper that performs the local evaluation applies its filter(s) to this <c>Person</c> and yields it if it matches If it matched the local filters, the <c>Person</c> is added to the list Control returns to the <c>IDataReader</c>, which requests another row Repeat until no more rows are returned from the database </ul> Since the decision to add all objects to a list occurs <i>all the way at the very outer caller</i>, it's the caller that's at fault for the memory bubble not the driver.<fn> We'll see in the section how to avoid creating a list when none is needed. <c>Using cursors to control evaluation</c> If we wanted to process data but perhaps offer the user a chance to abort processing at any time, we could even get an <c>IDataCursor<t></c> from the Quino ORM so control iteration ourselves. <code> <b>using</b> (<b>var</b> cursor = Session.CreateCursor(query)) { <b>foreach</b> (<b>var</b> obj <b>in</b> cursor) { // Do something with obj <b>if</b> (userAbortedOperation) { <b>break</b>; } } } </code> <h>And finally, the count query</h> But back to evaluating the query above. The Quino ORM handles it like this: <ul> Try to map the query to create a <c>COUNT</c> statement Notice that at least one restriction could not be mapped Create a cursor to SELECT all of the objects for the same query (shown above) and count all the objects returned by that instead </ul> So, if a count-query cannot be fully mapped to the database, the most efficient possible alternative is to execute a query that retrieves as few objects as possible (i.e. maps as much to the server as it can) and streams those objects to count them locally. Tune in next time for a look at how to exert more control with <c>limit</c> and <c>offset</c> and how those work together with partial mapping. <hr> <ft>These exceptions are not always obvious, either. Often, the exception complains about not being able to call a static method on a non-static object or class, which occurs when mapping the Expression object to SQL. EF's error messages have in general gotten much better but the mapping failures can be quite confusing.</ft> <ft>If we were worried that the last names in our database might not necessarily be capitalized, we would use the <c>ExpressionOperator.StartsWithCI</c> to perform the check in a case-insensitive manner instead.</ft> <ft>A <c>DelegateExpression</c> simply wraps the lambda given in the constructor in a Quino expression object. The parameter <c>c</c> is an <c>IExpressionContext</c> that provides the target object, which is in this case a <c>Person</c>.</ft> <ft>I'm going to go ahead and assume that my database schema doesn't allow nulls for the <c>LastName</c> field.</ft> <ft>If Quino had a LINQ-to-SQL provider, there's a chance that more of these delegates could be mapped, but we don't have one...and they can't.</ft> <ft>Did we still create 140,000 objects? Yes we did, but not <i>all at once</i>. Now, there are probably situations where it is better to create several objects rather than streaming them individually, but I'm confident that keeping this as the default is the right choice. If you find that your particular situation warrants different behavior, feel free to use <c>Session.CreateCursor()</c> to control evaluation yourself and create the right-sized batches of objects to count. The <c>ChangeAndSave()</c> extension method does exactly that to load objects in batches (size adjustable by an optional parameter) rather than one by one.</ft>