AsterionDB's CEO Steve Guilford publishes thought leadership article in IDG's TECH(Talk) Online Community
The following is a reprint of an article that first appeared in IDG’s TECH(Talk) online community forum.
“Data to the left of me, Business logic to the right, here I am, stuck in the middle-tier with you” (Apologies to Stealers Wheel)
Yes, sometimes life does imitate art. Here we are, suffering from a bloated middle-tier that has too many critical user resources exposed to cybersecurity threats. Well, I know what I’m going to do, I’m going to figure out how to get those resources out of the middle-tier.
Previously, we discussed the need to migrate user data files out of the middle-tier. Now, we need to look at the other critical component, our business logic. That too needs to move down to the data layer. But, how is that to be accomplished? After all, the middle-tier gives us flexibility and elasticity. Are we going to forgo those aspects if we lift-and-shift to the data layer? Monolithic architectures have been in disfavor for quite some time. Does this mean that we are moving back to vendor dominance and platform lock-in?
The flip-side of all of the flexibility that we have in the middle-tier is a growing level of complexity that prevents software developers from delivering secure, stable applications that answer the needs of their users. This is a recognized trend as demonstrated by this recent article in InfoWorld:
(…being a software developer myself, I take this a bit personally…)
Like a pendulum that has swung from monolithic to distributed systems, a shift to a middle-ground may be required. By this I mean, let’s keep what’s good about the middle-tier; things like flexibility, elasticity and networked services. But, could we flip the orientation around so that instead of the middle-tier being in control of things it takes its direction from the data-layer – where all of our logic and data will live?
Moving Logic to the Data Layer
What would it take to move all of our logic to the data-layer? For many programmers, this is a somewhat foreign concept. After all, the majority of the programming languages we work with are primarily middle-tier and client-layer oriented.
True data-layer languages are actually few and far between. A data-layer language is one that is tightly integrated into the data repository itself, not separated from it by a ‘network-layer’. Data-layer languages have built-in advantages: the logic is co-located with the data, so the data does not have to leave the repository in order to be processed.
Time For a Big Decision
We’re going to have to make an important choice now. We’re going to base much of our apparatus and infrastructure upon the database we choose. We will be relying upon it to provide most of the infrastructure, data processing and security for our structured and unstructured data and our business logic. So, what kind of database should we be looking at?
Obviously, we’ll need a database with a broad range of capabilities. We’ve already discussed how the database will need advanced features to manage BLOB data as well as relational facilities to implement a rich keyword tagging mechanism. NoSQL doesn’t look like it’s going to fill these requirements. This is especially true when one considers that the NoSQL paradigm generally does not provide for the inclusion of business logic within the database engine.
Our old standby, the relational database, has the ability to incorporate business logic. Oracle (PL/SQL), PostgreSQL (PL/pgSQL) and SQL Server (T-SQL) each have their own language embedded within the database that allows programmers to implement business logic. These databases also support BLOB data. Clearly, we will need to evaluate which of these, and other RDBMS candidates, are up to the task.
Rich Logical Environment Needed
Every programmer and development team manager knows how an application’s code-base can quickly become a sprawl rivaling the complexity of the application itself. Moving business logic certainly promises to remove a lot of technologies (e.g. data-layer abstraction libraries) that are no longer needed in order to connect the middle tier and the data-layer together. Still, we need a rich environment to organize our applications which are now going to be expressed as logical elements in the database.
We will be building stored procedures and functions in the database. We are also going to need to compartmentalize business logic segments (e.g. purchasing, big-data analysis) into their own organizational units within the database. Some databases provide basic data-layer programming capabilities while others have invested more effort to deliver a rich logical environment.
We will need a database that can organize business logic elements, your functions and procedures, into logical units or groups. A common nomenclature for these organizational elements is stored packages, procedures and functions. Other elements that come into play are triggers, user-defined data types and so forth.
I Use Database Logic Already, What’s the Difference?
Of course, these logical database elements have been in place, to varying degrees, in many RDBMS products. Programmers are familiar with calling a database procedure from a client application. Most of the time, however, stored logic in the database is concerned with data manipulation (i.e. insert, update, delete) and returning individual data elements. Unfortunately, it’s difficult to return a set of data from a stored function.
Functions by design return a specific instance of a data type, which does not work well when you want to return a set of data that consists of columns and rows. Some languages support functions that return cursors or table sets, but these mechanisms are highly vendor specific, rather uncommon and difficult to use.
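To make the scalar-return limitation concrete, here is a minimal Python sketch. It uses a SQLite user-defined function as a stand-in for a stored function in a server RDBMS, which is an assumption made purely for illustration. The table order_lines and the function order_total are hypothetical names that do not come from any real schema.

```python
import sqlite3


def order_total(quantity, unit_price):
    """Hypothetical business rule: takes scalars, returns exactly one scalar."""
    return quantity * unit_price


# An in-memory SQLite database stands in for the data layer (assumption).
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL statements can call it by name.
conn.create_function("order_total", 2, order_total)

conn.execute(
    "CREATE TABLE order_lines (item TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [("widget", 3, 2.5), ("gadget", 1, 10.0)],
)

# Each call to order_total yields a single value. The rows and columns
# come from the surrounding SELECT, not from the function itself.
totals = conn.execute(
    "SELECT item, order_total(quantity, unit_price) "
    "FROM order_lines ORDER BY item"
).fetchall()
```

The function contributes one value per row. The shape of the result set is still decided by the SELECT statement that the caller writes, which is why the SELECT tends to stay in the application code.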
It’s just easier and quicker for programmers to code their SELECT statements directly into their application logic and (maybe) call the database to do some specific manipulation through functions and procedures. If we’re going to push all of the logic down to the data-layer, that has to include our SELECT statements as well.
Now, not to get too far ahead of myself, this has some very significant security implications; but I digress.
Functions That Return Complex Data-Sets
So, what we need is a data-type that a function can return that will represent our data-set. The data-type needs to be self-describing – that allows us to return one element that contains many sub-elements. In addition, the data-type needs to be common and easily implemented by programmers.
If we had such a data-type, we could then write functions that pack the data fetched by a SELECT statement and return that to the caller. Fortunately, certainly for the benefit of this article series, there is such a data-type; it’s called a character string and all you have to do is encode it as a JSON value.
Really, it is quite beautiful. Express your return set as a JSON value, serialize that into a character string and have the function return a well known data type that is easy to manipulate and work with.
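As a rough sketch of the idea, the Python below wraps a SELECT statement in a function that returns one JSON-encoded character string. In the architecture described here, that function would be a stored function inside the database. Python with an in-memory SQLite database is used only as a runnable stand-in, and the customers table, its columns and the function name get_customers_json are all hypothetical.

```python
import json
import sqlite3


def get_customers_json(conn):
    """Stand-in for a stored function: run the SELECT, return one JSON string."""
    cursor = conn.execute("SELECT id, name, city FROM customers ORDER BY id")
    # cursor.description carries the column names, so the result is self-describing.
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps({"columns": columns, "rows": rows})


# Build a small hypothetical data set in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)

# The caller receives a plain character string, whatever the shape of the data.
payload = get_customers_json(conn)
```

The caller binds to a single well-known return type, a string, and decodes it with whatever JSON library its own language provides.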
Another big benefit of embedding your SELECT statements into functions that return a JSON string is that you can now easily isolate changes between the data-layer and the presentation-layer. It’s a simple process for the database programmer to add a new column to a data-set. It’s also just as easy for the client-side programmer to check for the presence of the value and act upon it accordingly. This allows your two teams to work separately but easily weave their work together when appropriate.
Using JSON as the return type also imparts a degree of resilience to your software interfaces. With JSON, you will only bind to a function that returns a character string. If the database team changes the data-set returned by the function, those changes will be isolated from your logic that places the call to the database. In essence, using JSON as the lingua franca between the client and data layers allows us to move the data representation abstraction out of the network layer and into the respective logic at either end of the network.
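A small client-side sketch may help show this resilience. It assumes a hypothetical JSON payload with a rows array, and a hypothetical email column that the database team adds later. None of these names come from a real interface. The same client code handles both versions of the payload because it checks for the new value instead of requiring it.

```python
import json


def summarize_customer(row):
    """Format one row, using the optional 'email' value only if it is present."""
    line = f"{row['name']} ({row['city']})"
    email = row.get("email")  # None when the column is absent from the payload
    if email is not None:
        line += f" <{email}>"
    return line


def render(payload):
    """Decode the JSON string returned by the database and format every row."""
    return [summarize_customer(row) for row in json.loads(payload)["rows"]]


# Payload before the database team adds a column (hypothetical data).
old_payload = '{"rows": [{"name": "Ada", "city": "London"}]}'

# Payload after the new column appears; the function signature is unchanged.
new_payload = (
    '{"rows": [{"name": "Ada", "city": "London", '
    '"email": "ada@example.com"}]}'
)

old_lines = render(old_payload)
new_lines = render(new_payload)
```

In both cases the client binds to nothing more than a string, so adding the column changes no interface; it only adds a value the client may choose to use.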
But Wait, There’s More…!
Figuring out how to move our SELECT statements into stored functions is a big step. But, is that all that is required? Of course not. Data-layer languages are great for manipulating data, but handling other tasks that require complex or esoteric logic is outside of their design parameters. Here are some examples:
We need to be able to analyze all of that ‘unstructured’ data that we’re going to store in the database. After all, it’s not good enough to just store the data; we have to be able to work with it too. How will database logic understand what a video is? It’s not a regular structured data type like a character string or a number.
How will we control all of those fancy robots and IoT devices that have infiltrated our lives? This type of apparatus is usually integrated through the use of a device driver. Database programming languages are not designed to interface with device drivers.
So, as you can see, we also need some way to extend the logical capability of the database. We need a smarter database that can do more with all of the data and logic it’s now going to be storing. That, my friends, will be the subject of our next installment of ‘The Modern Mainframe Architecture to the Rescue’.