Engineering modern applications is a daunting task, and designing a robust data model is one of the most difficult, and most important, parts of the job.
Failing to build a sound data architecture can cause your application to fail in many ways, from performance and data integrity problems to data sovereignty, data security, and scalability problems. Poor data architecture can leave your app, and your company, in a bad state.
Building an appropriate data architecture is critical to the long-term success of all modern applications. To help with your application modernization process, here are five rules to follow when engineering, or re-engineering, your application data.
Use the correct type of database
The first and most important decision in designing your data architecture is understanding what type of database you need to store and access your data. Ask yourself:
- Do you need to store highly structured data or simple key-value data?
- Do you need to keep data permanently or only for a short period of time?
- Will you access data randomly or sequentially?
- Will you use a fixed schema, a flexible schema, or a simple flat file?
- Do you need a relational database that supports SQL queries?
You need answers to these questions to determine what type of database you need to use. Depending on these answers, you might choose an SQL database, a simple key-value store, an in-memory cache, a simple object store, or a highly structured data store.
The type of database you select will determine what your database can ultimately do and how well it performs within your application. Aspects as integral to your application as its scalability and availability requirements are greatly influenced by your database choice.
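As a rough illustration, the questions above can be turned into a small decision helper. This is only a sketch: the `DataNeeds` fields, the `suggest_store` function, and the order of the checks are assumptions made for the example, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class DataNeeds:
    """Answers to the questions above (all field names are illustrative)."""
    highly_structured: bool    # structured records vs. simple key-value pairs
    permanent: bool            # must the data outlive a session or restart?
    needs_sql: bool            # are relational SQL queries required?
    large_blobs: bool = False  # files, images, or other opaque objects


def suggest_store(needs: DataNeeds) -> str:
    """Return a rough category of data store; a heuristic, not a rule."""
    if not needs.permanent:
        return "in-memory cache"
    if needs.large_blobs:
        return "object store"
    if needs.needs_sql or needs.highly_structured:
        return "SQL database"
    return "key-value store"


# Example: short-lived session data with no query requirements.
session_data = DataNeeds(highly_structured=False, permanent=False, needs_sql=False)
print(suggest_store(session_data))
```

The value of a helper like this is less in the answer it returns than in forcing the questions to be answered explicitly before a database is chosen.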
Store data in the right place
A deceptively simple but important question is: where should the data be stored? Depending on the data and your application, do you need to store the data in the front end of the application or in the back end? Can you store the data locally to the consumer, or do you need to share the data with many other consumers?
Most data is stored in the back end. But some data must be stored at the edge or in the client. Storing data in the front end is often required to improve performance, availability, reliability, and scalability.
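One common way to keep some data close to the consumer is a small local cache in front of a back-end lookup. The sketch below assumes a hypothetical `fetch` callable standing in for a back-end request and a simple time-to-live policy; neither comes from a specific library.

```python
import time
from typing import Callable, Dict, Tuple


class LocalCache:
    """Keeps recently fetched values on the client for a limited time."""

    def __init__(self, fetch: Callable[[str], object], ttl_seconds: float) -> None:
        self._fetch = fetch      # hypothetical back-end lookup
        self._ttl = ttl_seconds  # how long a local copy is considered fresh
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str) -> object:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # fresh local copy, no back-end call
        value = self._fetch(key)   # missing or stale: go to the back end
        self._entries[key] = (now, value)
        return value
```

A cache like this trades freshness for speed, so it suits data that is local to one consumer or that changes rarely. Data that many consumers must see updated immediately still belongs in the back end.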
Think about scaling from the start
Modern applications must be able to scale to meet the growing needs of business customers. This is true for all businesses and all applications.
The hardest part of building an application that can scale to meet your growing needs is scaling the data store. Whether it is scaling to increase the amount of data you need to store for your growing customer base, or scaling to allow more people to use your application simultaneously without degrading performance, data scaling is difficult unless you plan for it from the start.
However, it seems that most application architectures treat data scaling as a secondary requirement that can be left for later. It is something app developers think about only after they have built the main app architecture.
Retrofitting adequate scaling capability into a data architecture later is very difficult, and it becomes even more difficult as the size of your data set increases. By far, the easiest time to build in scalability is at the beginning, before your application needs to scale. Waiting until later can make scaling more difficult, and possibly impossible, without significant data restructuring.
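One way to plan for data scaling from the start is to route every read and write through a partitioning function, even while there is only one partition. The sketch below uses simple hash-based sharding as an example technique. The names `shard_for` and `ShardedStore` are illustrative, and each in-memory dictionary stands in for a separate database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store; each dict stands in for a separate database."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value: object) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> object:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
store.put("customer-1001", {"name": "Example Customer"})
print(store.get("customer-1001"))
```

With plain modulo sharding, changing the number of shards relocates most keys, which is exactly the kind of restructuring that gets harder as the data set grows. Schemes such as consistent hashing reduce that movement, but either way the routing layer needs to exist before the data does.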
Distribute your data across services
A number of cloud experts suggest that centralizing your application data is the right model for managing a large data set for a large application. They argue that centralizing your data makes it easy to apply machine learning and other advanced analytics to get more useful information out of your data.
But this strategy is wrong. Centralized data is data that cannot be easily scaled. The most effective way to scale your data is to decentralize it and store it within the individual service that owns the data. Your application, if it consists of dozens or hundreds of distributed services, will store your data in dozens or hundreds of distributed locations.
This model allows for easier scaling and supports a full service ownership model. Service ownership enables development teams to work more independently, and encourages stronger SLAs between services. This promotes higher quality services and makes data changes safer and more efficient through locality.
But what if your business needs to run analytics or machine learning on all this data? I still recommend the distributed data model described here. However, to make your data useful for analytics and machine learning, send a copy of the relevant data to a back-end data warehouse system. In this data warehouse system, structure the data in a way appropriate for your analytics purposes, and use this copy for your analytics and machine learning algorithms. This data warehouse copy is separate and distinct from your application's data of record, which is still stored within the individual services.
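The split between service-owned data and a warehouse copy can be sketched as follows. The `Service` and `Warehouse` classes and the `export_for_analytics` method are hypothetical names chosen for this example. A real system would more likely move the copies through an event stream or a batch export job.

```python
from typing import Dict, List


class Service:
    """A service that owns its data of record; nothing else writes to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._records: Dict[str, dict] = {}

    def save(self, record_id: str, record: dict) -> None:
        self._records[record_id] = dict(record)

    def export_for_analytics(self, fields: List[str]) -> List[dict]:
        """Return copies that contain only the fields analytics needs."""
        return [
            {"source": self.name, "id": rid,
             **{f: rec[f] for f in fields if f in rec}}
            for rid, rec in self._records.items()
        ]


class Warehouse:
    """Holds analytics copies only; it is never the data of record."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def load(self, rows: List[dict]) -> None:
        self.rows.extend(rows)


orders = Service("orders")
orders.save("o1", {"total": 10, "card_number": "not-for-analytics"})

warehouse = Warehouse()
warehouse.load(orders.export_for_analytics(["total"]))
print(warehouse.rows)
```

Because the warehouse receives copies restricted to the relevant fields, analytics work cannot change the owning service's data, and each service stays free to change its internal storage.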
Distribute your data geographically
Finally, decide who will use the data and where they will be located. Determining data and user locations is becoming increasingly important as global trade presents increasing opportunities while regional data governance restrictions make global data management more difficult.
Before creating your data architecture, you should answer these key questions:
- Is it important that your data is available globally, or will a regional version of the data be more important to your business? For example, do you want the same or different data available in the US and Germany? Many applications find that a combination of both models is important, and this answer is acceptable, as long as you know which data should be global and which should be regional.
- Do you have regional restrictions on what data you can store and where you can store it? Some regions have restrictions that prevent customer data from leaving the country in which the customer resides. Others have limitations on data that can be transferred across state and regional borders. Some regions have stricter privacy restrictions than others. What data restrictions apply to which parts of your data?
- For data that is shared across regions, how important is it to display exactly the same data in each region? In other words, does the data have to be completely synchronized between regions? Different models place different burdens on your data set. An eventual consistency model has very different performance characteristics than an ACID-compliant transactional integrity model.
The answers to these questions will determine whether you provide global or regional data, where that data can and cannot be used, and when and how data is synced between regions.
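One way to make these answers explicit is to record them as a per-data-set policy that the application checks before storing or replicating data. The sketch below is an assumption-laden illustration: the `DatasetPolicy` fields, the region codes "us" and "de", and both helper functions are invented for the example and do not reflect any particular regulation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetPolicy:
    name: str
    scope: str                                # "global" or "regional"
    allowed_regions: frozenset = frozenset()  # empty means unrestricted
    consistency: str = "eventual"             # or "strong"


def can_store(policy: DatasetPolicy, region: str) -> bool:
    """True if this data set may be stored in the given region."""
    return not policy.allowed_regions or region in policy.allowed_regions


def replication_targets(policy: DatasetPolicy, home_region: str,
                        all_regions: List[str]) -> List[str]:
    """Regions that should receive a synced copy of the data set."""
    if policy.scope == "regional":
        return []  # regional data stays in its home region
    return [r for r in all_regions
            if r != home_region and can_store(policy, r)]


catalog = DatasetPolicy("product_catalog", scope="global")
customers = DatasetPolicy("customer_profiles", scope="regional",
                          allowed_regions=frozenset({"de"}))

print(replication_targets(catalog, "us", ["us", "de"]))
print(can_store(customers, "us"))
```

The `consistency` field is carried along only as a label here. In a real system it would select between very different replication mechanisms, which is why it belongs in the policy from the beginning.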
Data architecture is an important part of designing a modern application that is large-scale, highly available, and globally accessible. Errors in your data architecture can cause scaling, availability, and even legal compliance problems. Changing the data architecture after your application has grown is both difficult and painful. It is much easier to address your key data requirements up front.
By following these five rules early in your data architecture process, you can avoid serious problems in the future.