My first job was in finance. I spent about 80% of my working day in Microsoft Excel, analyzing millions of rows of data and writing or updating reports. Most of the data was stored and shared in a denormalized format by the IT department, and I could run queries against it with read-only access. At the time, I knew nothing about query optimization, so I reduced the volume of output with heavy use of GROUP BY and subqueries.
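As a sketch of the kind of query I mean, here is a minimal example using Python's sqlite3 as a stand-in for the warehouse. The table and column names are hypothetical, not the actual schema from this story:

```python
import sqlite3

# In-memory stand-in for the read-only warehouse; `sales` and its
# columns are illustrative, not the real denormalized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        region  TEXT,
        product TEXT,
        amount  REAL
    );
    INSERT INTO sales VALUES
        ('EU', 'A', 100.0), ('EU', 'B', 50.0),
        ('US', 'A', 200.0), ('US', 'A', 75.0);
""")

# Reducing output volume with GROUP BY: instead of pulling every row
# into Excel, aggregate on the server and fetch only the summary.
rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

print(rows)  # [('EU', 150.0), ('US', 275.0)]
```

The aggregation itself is cheap here; the problem in my case was the number and shape of such queries, not any single one.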
Eventually, the IT department contacted me about my frequent use of inefficient and expensive queries, and I was invited to a meeting to share my usage patterns. During that meeting, for the first time, I saw the stored procedures that produced the denormalized tables I used so often, on the DBA's screen through a tool called Toad. I still remember my first question: "How do you remember all these table relationships?" He answered that it comes naturally if you work in his field long enough. After the meeting, I received part of a PDF containing the ER diagram for those tables and was granted read-only access to some of the normalized tables.
There are more stories that follow from this, but I will cover them in other articles. What I want to discuss here is the efficiency of denormalization. Many articles claim that denormalization lowers query complexity and makes table relationships easier to understand. This is not wrong. In practice, however, it is worth examining these advantages more critically before designing a denormalized schema.
Query complexity: Most denormalized tables are built for a specific purpose. When a new query is written against such a table for a different purpose, the complexity of the query can actually increase. (If I remember correctly, this is why the IT department pinged me.)
Understanding table relationships: Most users of denormalized tables do not have access to the source code that builds them. Without documentation such as an ERD or a usage guide, understanding the table relationships takes a lot of time and money unless an expert helps. (If I had had an ERD at the time, I would at least have requested access to the relevant tables first instead of writing and running inefficient queries.)
Additional costs: Beyond the increased storage cost, denormalization also incurs ongoing costs for index management and maintenance.
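To make the query-complexity point concrete, here is a minimal sketch (hypothetical schema, sqlite3 for illustration): a table denormalized for order-level reporting repeats customer attributes on every row, so reusing it for a customer-level question forces extra de-duplication that a normalized `customers` table would not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `order_report` is a hypothetical table denormalized for order-level
# reporting: customer attributes are repeated on every order row.
conn.executescript("""
    CREATE TABLE order_report (
        order_id      INTEGER,
        customer_id   INTEGER,
        customer_tier TEXT,
        amount        REAL
    );
    INSERT INTO order_report VALUES
        (1, 10, 'gold',   100.0),
        (2, 10, 'gold',    40.0),
        (3, 20, 'silver',  60.0);
""")

# Reused for a different purpose ("how many customers per tier?"),
# the repeated attributes must first be de-duplicated, so the query
# grows a subquery that a normalized customers table would not need.
rows = conn.execute("""
    SELECT customer_tier, COUNT(*) AS customers
    FROM (SELECT DISTINCT customer_id, customer_tier
          FROM order_report)
    GROUP BY customer_tier
    ORDER BY customer_tier
""").fetchall()

print(rows)  # [('gold', 1), ('silver', 1)]
```

Against a normalized schema, the same question would be a plain `GROUP BY` on the customers table, with no inner de-duplication step.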
If someone asked me how to address these problems, I would answer: "Good practice is to continuously monitor and collect use cases showing how users actually query those tables, and optimize them accordingly."
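One lightweight way to act on collected use cases is to check whether a recurring query can use an index. Here is a sketch using sqlite3's EXPLAIN QUERY PLAN; the table and index names are hypothetical, and the exact plan text varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

QUERY = "SELECT SUM(amount) FROM sales WHERE region = 'EU'"

# Without an index on region, the planner scans the whole table.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

# After observing that this filter recurs, add a supporting index.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")

# The planner now switches to an index search for the same query.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

# The last column of each plan row is a human-readable description.
print(plan_before[0][3])  # e.g. a full scan of sales
print(plan_after[0][3])   # e.g. a search using idx_sales_region
```

The same idea scales up in production databases through query logs and plan inspection tools, which is where the collected use cases come in.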
As I write this, big data is a major trend, and there are many use cases for denormalization. There are also many tools, such as NoSQL and columnar databases, for optimizing and managing denormalized data sets. Personally, I think denormalization may become a more common default in data design than normalization in the future.