Business applications evolve very fast. The functional requirements are more sophisticated and we need to manage more non-relational data (photos, documents, images, videos …).
This need increases by several orders of magnitude the volume of information to be handled, as well as the complexity at software development and, above all, systems operations.
Data growth
Traditionally, business applications consisted mainly of some type of user interface (with some kind of more or less sophisticated forms technology) that allowed different users to interact with the environment to enter and consult data. In one way or another, data ended up in a traditional relational database (Oracle, SQL Server, DB2, Informix, …). The complexity came from the fact that, depending on the application and the company, some of these tables could have millions of rows and the query of data by very varied criteria resulted in very sensitive criteria for optimizing the queries (the famous ‘query plan’ ). In the most operational aspects, the headache was the ability to recover in critical situations (backup, replication, disaster recovery procedures, …).
That has remained so until relatively recently (a few years), when those applications were becoming more sophisticated and needed to cover other business demands. It was not enough to save all the data of our client in his client record at the application, we also had to save, for example, the contract document signed between the parties and had to be accessible from the application itself; or the mail messages, with all their attachments, that we have exchanged with the client or supplier in a certain business process. This requirement has been solved from the SW development department of the companies, typically by one of three alternative ways:
- We store this data in a file service and associate it in the database through a link. This is reasonably simple but it gives quite a few management problems, and also technical ones. We have two repositories to manage (which synchronize backups, for example). If the data is sensitive, we will need specific security policies in two media (database and operating system) that are not integrated in a particularly simple way; we also complicate the transactional consistency (canceling a complete transaction in database due to a failure is simple, if we also have data in a filesystem, things get more complicated).
- We store this data in a document management system or similar. In many aspects it is quite close to the previous scenario, with the advantage that a document manager system provides more and better management services, but also a greater complexity in operations and administration (we must manage, patch, upgrade two complex systems).
- We store this non-relational data in the database. This is, at the level of software engineering, the simplest; unique interface (SQL), transactional consistency, a data-type (BLOB) that allows to handle objects of any nature. In many cases it is the option chosen by many customers. In particular, where the decision is directed or influenced by the software engineering team.
This last scenario leads us to the fact that these databases no longer only handle relational data, but that they must manage very high volumes of binary content. And, although the most advanced enterprise database technologies are capable of doing this, we quickly discovered that the cost of infrastructure and operation is skyrocketing. These critical environments require infrastructure of the highest quality and speed and that is paid for.
Object store. New storage paradigm
This leads us to look for alternatives and lately, with the explosion of the volume of data handled, the alternative of Object Stores for storing files and binary content has become very popular, where the Amazon AWS S3 service has become a de-facto standard. It seems sensible to move those binary contents to S3 or similar object storage with several clear and immediate benefits:
- Unlimited storage with virtually no management required.
- Much lower costs.
- Possibility of exploiting these contents in alternative scenarios (for example, advanced analytics, machine learning) without having a direct impact on the databases of the transactional systems.
- Simplification or elimination of conventional backup needs.
- Possibility (depending on the technology) to apply information retention policies that facilitate compliance with data retention regulations, so that the repository itself ensures the inalterability and prevents its deletion during the defined period.
The advantages are multiple; but there are also drawbacks. The main one is the fact that these systems have access through an API that, without being very complex, forces us to change the entire data access and persistence layer in our business applications to save and access the information in this repository. And that can be a tedious job, subject to errors and with a certain risk, proportional to the level of complexity and obsolescence of our applications. And almost more important: it involves diverting the resources of engineering and software development in our company (always scarce) to solve an internal problem of IT, which does not provide a direct functional value to the business user.
Databases meet object stores
In this context, at Tecknolab we have proposed to provide a solution that allows to move binary content to the different options of object store repositories, both in public Cloud services (Amazon AWS, Microsoft Azure, Google Cloud Platform) and , for an ‘on-premise’ deployment in a local datacenter, with the main object storage manufacturer alternatives (Hitachi HCP, Dell-EMC ECS, IBM COS, among others). With this service, called DBcloudbin, the configuration at the database is immediate and, what is more important, transparent for the application; with the same software, the application continues to access the data through the database using SQL as before, but in reality the system is responsible for reading the data that has been moved to the object storage and providing it to the application as if were in the database. This gives us all the benefits of having the data centralized in the database (single access, transactional consistency) but with the savings of using a much cheaper infrastructure for those data that do not need the access speed of a relational database . For more details of the solution, visit https://www.dbcloudbin.com/solution
