
Title

Quino Data Driver architecture, Part III: The Pipeline

Description

In <a href="{app}view_article.php?id=406">part I</a> of this series, we discussed applications, which provide the model and data provider, and sessions, which encapsulate high-level data context. In <a href="{app}view_article.php?id=402">part II</a>, we covered command types and inputs to the data pipeline. In this article, we're going to take a look at the data pipeline itself.

<ol>
<a href="{app}view_article.php?id=406">Applications & Sessions</a>
<a href="{app}view_article.php?id=402">Command types & inputs</a>
The Data Pipeline
Builders & Commands
Contexts and Connections
Sessions, resources & objects
</ol>

<h>Overview</h>

<img src="{att_link}data_driver_structure_2015_02_02.png" href="{att_link}data_driver_structure_2015_02_02.png" align="right" caption="Major Components of the Data Driver" scale="65%">The primary goal of the data pipeline is, of course, to correctly execute each query that retrieves data and each command that stores, deletes or refreshes data.

The diagram to the right shows that the pipeline consists of several data handlers. Some of these refer to data sources, which can be anything: an SQL database or a remote service.<fn>

The name "pipeline" is only somewhat appropriate: a command can exit anywhere in the pipeline rather than only at the opposite end. A given command is processed by the various data handlers until one of them pronounces the command to be "complete".

<h>Command context: recap</h>

In the previous parts, we learned that the input to the pipeline is an <c>IDataCommandContext</c>. To briefly recap, this object has the following properties:

<ul>
Session: Defines the context within which to execute the command
Handler: Implements an abstraction for reading/writing values and flags to the objects (e.g. <c>SetValue(IMetaProperty)</c>); more detail on this later
Objects: The sequence of objects on which to operate (e.g. for save commands) or to return (e.g. for load commands)
ExecutableQuery: The query to execute when loading or deleting objects
MetaClass: The metadata that describes the root object in this command; more detail on this later as well
</ul>

<h>Handlers</h>

Where the pipeline metaphor holds up is that the command context will always start at the same end. The ordering of data handlers is intended to reduce the amount of work and time invested in processing a given command.

<h level="3">Analyzers</h>

The first stage of processing is to quickly analyze the command to handle cases where there is nothing to do. For example:

<ul>
The command is to save or delete, but the sequence of <c>Objects</c> is empty
The command is to save or reload, but none of the objects in the sequence of <c>Objects</c> has changed
The command is to load data, but the query restricts to a <c>null</c> value in the primary key or a foreign key that references a non-nullable, unique key
</ul>

It is useful to capture these checks in one or more analyzers for the following reasons:

<ol>
All drivers share a common implementation for efficiency checks
Optimizations are applied independent of the data sources used
Driver code focuses on driver-specifics rather than general optimization
</ol>

<h level="3">Caches</h>

If the analyzer hasn't categorically handled the command and the command is to load data, the next step is to check caches. For the purposes of this article, two things affect whether and for how long cached data can be used:

<ol>
If the session is in a transacted state, then only immutable data, data that was loaded before the transaction began or data loaded within that transaction can be used. Data loaded/saved by other sessions (possibly to global caches) is not visible to a session in a transaction with an <c>isolationLevel</c> stricter than <c>RepeatableRead</c>.
The metadata associated with the objects can include configuration settings that control the maximum caching lifetime as well as an access timeout. The default settings are good for general use but can be tweaked for specific object types.
</ol>

Caches currently include the following standard handlers<fn>:

<ul>
The <c>ValueListDataHandler</c> returns immutable data. Since the data is immutable, it can be used independent of the transaction state of the session in which the command is executed.
The <c>SessionCacheDataHandler</c> returns data that's already been loaded or saved in this session, to avoid a call to a possibly high-latency back-end. This data is safe to use in a session with transactions because the cache is rolled back when a transaction is rolled back.
</ul>

<h level="3">Data sources</h>

If the analyzers and caches haven't handled a command, then we're finally at a point where we can no longer avoid a call to a data source. Data sources can be internal or external.

<h level="4">Databases</h>

The most common type is an external database:

<ul>
PostgreSQL 8.x and higher (PostgreSQL 9.x for schema migration)
SQL Server 2008 and higher (with schema migration)
MongoDB (no schema; no migration)
SQLite (not yet released)
</ul>

<h level="4">Remoting</h>

Another standard data source is the Quino remote application server, which provides a classic interface- and method-based service layer as well as mapping nearly the full power of Quino's generalized querying capabilities to an application server. That is, an application can smoothly switch from a direct connection to a database to using the remoting driver to call into a service layer instead.

The remoting driver supports both binary and JSON protocols. Further details are beyond the scope of this article, but this driver has proven quite useful for scaling smaller client-heavy applications with a single database to thin clients talking to an application server.

<h level="4">Custom/Aspect-based</h>

Finally, there is another way to easily include "mini" data drivers in an application. Any metaclass can include an <c>IDataHandlerAspect</c> that defines its own data driver as well as its capabilities.

Most implementations use this technique to bind in immutable lists of data, but it has also been used to load/save data from/to external APIs, like REST services. We can take a look at some examples in more detail in another article.

The mini data driver created for use with an aspect can relatively easily be converted to a full-fledged data handler.

<h level="3">Local evaluation</h>

The last step in a command is what Quino calls "local evaluation". Essentially, if a command cannot be handled entirely within the rest of the data pipeline (either entirely by an analyzer, one or more caches or the data source for that type of object), then the local analyzer completes the command.

What does this mean? Any orderings or restrictions in a query that cannot be mapped to the data source (e.g. a C# lambda is too complex to map to <dfn>SQL</dfn>) are evaluated on the client rather than the server. Therefore, any query that can be formulated in Quino can also be evaluated fully by the data pipeline; the only question is how much of it can be executed on the server, where it would (usually) be more efficient to do so.

Please see the article series that starts with <a href="{app}view_article.php?id=3003">Optimizing data access for high-latency networks</a> for specific examples.

In this article, we've learned a bit about the ways in which Quino retrieves and stores data using the data pipeline. In the next part, we’ll cover the topic “Builders & Commands”.

<hr>

<ft>E.g. Quino uses a ProtoBuf-like protocol to communicate with its standard application server.</ft>
<ft>There is an open issue to <a href="https://secure.encodo.ch/jira/browse/QNO-4767">Introduce a global cache for immutable objects or objects used not in a transaction</a>.</ft>
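As a recap of the stages described in this article, the following is a minimal sketch of a pipeline that passes a command context through an analyzer, a cache, a data source and local evaluation until one of them completes the command. Quino itself is a .NET framework; the sketch uses Python only for brevity, and every name in it (<c>CommandContext</c>, <c>execute</c>, <c>post_filter</c> and so on) is a hypothetical stand-in rather than part of Quino's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CommandContext:
    """Hypothetical stand-in for the command context (not Quino's actual type)."""
    kind: str                                    # "load", "save" or "delete"
    objects: list = field(default_factory=list)  # objects to save/delete, or load results
    key: Optional[int] = None                    # key a load restricts to; None models a null key
    post_filter: Optional[Callable[[dict], bool]] = None  # restriction the source cannot map
    complete: bool = False
    handled_by: str = ""


class Analyzer:
    name = "analyzer"

    def handle(self, ctx):
        # Complete commands for which there is nothing to do.
        if ctx.kind in ("save", "delete") and not ctx.objects:
            ctx.complete = True
        elif ctx.kind == "load" and ctx.key is None:
            ctx.complete = True


class SessionCache:
    name = "session-cache"

    def __init__(self, cached):
        self.cached = cached

    def handle(self, ctx):
        # Only loads that the cache can answer in full are completed here.
        if ctx.kind == "load" and ctx.post_filter is None and ctx.key in self.cached:
            ctx.objects = [self.cached[ctx.key]]
            ctx.complete = True


class DataSource:
    name = "data-source"

    def __init__(self, rows):
        self.rows = rows

    def handle(self, ctx):
        if ctx.kind == "load":
            ctx.objects = [self.rows[ctx.key]] if ctx.key in self.rows else []
            # Leave the command open if a restriction is still pending.
            ctx.complete = ctx.post_filter is None
        elif ctx.kind == "save":
            for obj in ctx.objects:
                self.rows[obj["id"]] = obj
            ctx.complete = True
        elif ctx.kind == "delete":
            for obj in ctx.objects:
                self.rows.pop(obj["id"], None)
            ctx.complete = True


class LocalEvaluation:
    name = "local-evaluation"

    def handle(self, ctx):
        # Apply whatever the data source could not evaluate, then finish.
        if ctx.post_filter is not None:
            ctx.objects = [o for o in ctx.objects if ctx.post_filter(o)]
        ctx.complete = True


def execute(handlers, ctx):
    """Pass the context through the handlers until one of them completes it."""
    for handler in handlers:
        handler.handle(ctx)
        if ctx.complete:
            ctx.handled_by = handler.name
            break
    return ctx


def build_pipeline():
    cache = SessionCache({1: {"id": 1, "name": "cached"}})
    source = DataSource({1: {"id": 1, "name": "cached"}, 2: {"id": 2, "name": "stored"}})
    return [Analyzer(), cache, source, LocalEvaluation()]
```

With this setup, an empty save never leaves the analyzer, a load for key 1 is answered by the session cache, a load for key 2 reaches the data source, and a load that carries a <c>post_filter</c> is finished by local evaluation.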