“Data is a precious thing and will last longer than the systems themselves”, says the inventor of the World Wide Web (Spotfire Blogging Team, as cited by Tim Berners-Lee, 2013).
Speaking of systems, today’s modern society uses them across all acts of life for many purposes. One area in particular where they are widely used, is the globally recognized multidisciplinary field of study called, Modeling & Simulation (M&S). This field spans across a multitude of business verticals in almost every aspect of life. According to Banks and Sokolowski (2012), the field of M&S serves as a means of analyzing, assessing data to provide information for decision-makers, and/or teaching and training. To do this, both scholars and business professionals alike must identify a system in which they seek to model.
What is a system?
A system is an assembly or collection of different elements (i.e. people, hardware facilities, etc.) that together produce results not obtainable by the elements alone (Banks and Sokolowski, 2012). So synergistic, you got to love it! But what does a system have to do with this discussion? In order to analyze the rapidly growing volume of data generated throughout the economies across the world, practitioners often use data models as the foundational construct that enables people to access and analyze critical data associated with their respective domains of knowledge; basically, the systems in which they operate within such paradigms. With data models on our mind, let’s discuss the significance of their existence.
Data Models – What are they?
Once considered a field tucked away deep within the engineering and computer science disciplines, data modeling—in association with M&S—is now a big part of our lives whether we know it or not. To put this in perspective, generally speaking, have you used Google lately? What about online banking via a mobile application? That’s right, all of these examples use some form of a data model to generate the answer to your desired inquires. Without them, we would probably have tons of data/information/knowledge haphazardly floating in the cloud of the World Wide Web without any logical structure to make sense of it all. Even worst, we could still be using manual (i.e. manpower intensive) solutions that are prone to error such as using financial logbooks to track metrics associated with our money operations. On a basic level, I think you get the picture now. However, let’s move past the small talk and dive deeper into some of the major concepts associated with data modeling and why they are useful to our society today.
According to Visual Paradigm (2020), Data modeling is a technique to document a software system using entity relationship diagrams (ER Diagram), which is a representation of the data structures in a table for a company’s database. Such tables represent some form of a system associated with the process of events linked together and/or in isolation to operate a business (i.e. Human Resources, Finance, Inventory, Logistics, etc.). According to Guru99 (2020), there are three main types of data models: 1) Conceptual, 2) Logical, and 3) Physical. These models differ in purpose and often used for different target audiences. Let’s take a closer look at these models.
Conceptual Data Model
A conceptual data model is commonly referred to as a high-level (i.e. summary) model that defines ‘what’ the system contains. According to Guru99 (2020), the purpose is to organize, scope and define business concepts and rules. The information to build the model comes from business requirements aligned operational needs. Below is a diagram that shows an example of what a typical conceptual data model consists of.
The conceptual design models the business objects that should exist in a system and the relationship between them (Visual Paradigm, 2020). This helps paint the bigger picture of the systems and the objects associated. Visual Paradigm (2020) also points out that although the conceptual model defines what entities exist, they DO NOT outline the tables associated. More specifically, there is no cardinality using attributes of a business object table (i.e. one-to-one, one-to-many, many-to-many relationships) to show relations between objects in conceptual data models. Instead, relationships between objects (i.e. cardinality) are defined but table attributes are rather integrated into the design of the logical and physical data models for further clarity. Lastly, this model is considered the simplest of the data models given its conceptual state.
Conceptual Data Model – Key Takeaways:
This model establishes the entities, their attributes, and their relationships using a high-level visual depiction of business objects associated with business needs;
Front-line Business Stakeholders, Data Architects, and Business Analysts often lead the design of conceptual models.
Logical Data Model
According to Guru99 (2020), logical data models define ‘how’ the system should be used regardless of the database management system (DBMS) specifications. The purpose is to design a technical map of rules and data structures without regard to a specific technology. Although it’s good practice to consider the potential technology obstacles beforehand—to ensure the design of the model will be interoperable—it’s not mandatory. Furthermore, a logical data model aids in the development of the conceptual model by defining explicitly the columns in each entity and introducing operational and transactional entities (Visual Paradigm, 2020). Below is an example of a logical data model diagram for your review.
This model defines the columns in each entity and introduces operational and transactional entities as table columns within a business object; Relational links in the form of cardinality between the attributes of a table are not setup in this model. Instead, similar to the conceptual data model, cardinality is only established to show the relationship between business objects, not the relationships between the attributes within the objects (i.e. table columns).
This model is often designed by Data Architects and Business Analysts.
Physical Data Model
The physical data model describes ‘how’ the system will be implemented using a specific DBMS technology (Guru99, 2020). The purpose of this data model is to implement the design within a specific technology or network of technologies to later access and analyze the data. According to scholars at Visual Paradigm, the physical data model represents the actual design blueprint of a relational database. A physical data model elaborates on the logical data model by assigning each column with type, length, nullable, etc. (2020). This model is the most complex as it is designed, implemented, and employed to evaluate both qualitative and quantitative measures for oversight/management of the real-world system it’s modeled after; thus, using real data from processes linked to the system(s) in order to regularly test the hypothesis it was originally designed for. Below is an example of a physical data model.
This model further defines the business objects within the logical data model by assigning each table column with a data type, length, nullable, etc. in effort to set the parameters of the relational database structure. The structure of the physical data model should resemble the actual database management system it is being implemented in.
This model is often designed by Data Architects and Business Analysts.
This model is used as the blueprint of a relational database (Visual Paradigm, 2020); the focused implementation technology could possibly be a number of different options such as a spreadsheet, XML files, NoSQL storage, a relational DBMS, and/or a combination of them all.
For the sake of this discussion, let’s focus on a relational database management system (DBMS) as it relates to data models. Do the keywords ‘entity relationship’ ring any bells now that we laid-out the fundamentals for data models? If so, awesome! If not, check out the brief overview below on entity relationships and how they relate to data models.
Entity Relationship Diagram (ERD)
An entity relationship diagram (ERD) can be described as a visual depiction of relational data structures, which represent a model of a system. An ERD provides users with an outline of tables (i.e. entities) in a database and the relationships between them using universal database symbology. Furthermore, it’s common to hear across the community-of-practice that ERDs are considered a best practice for good database design. Lastly, it’s good to know that ERDs are composed of three main components.
What are the three components of an Entity Relationship Diagram (ERD)?
The three components of an ERD are:
Entities (i.e. objects represented as tables that we want to store information about such as a person, place, thing, or event),
Attributes (i.e. represented as columns in a table that provide facts about the entity), and
Relationships (i.e. shows the link(s) between the entities).
As outlined in the previously discussed physical data model below, we can highlight the three components of the ERD. First, entities can be identified as the objects visually displayed as the ‘Customer’ table, ‘Purchase_Order’ table, ‘Product’ table, ‘Order_Product’ table, and the ‘Supplier’ table. Next, the information presented inside of the tables are known as the attributes (e.g. Name, Address, Gender, Date_of_Birth) associated with the entity (e.g. ‘Customer’ table). To end on a relational-note, the solid and dotted-lines represent the relationship between the entities. In particular, these relationships are traditionally called cardinality. According to Visual Paradigm (2020), cardinality defines the possible number of occurrences in one entity which is associated with the number of occurrences in another.
3 Common Cardinality Relationships: One-to-One, One-to-Many, and Many-to-Many
According to SmartDraw (2020), here’s an example of the three commonly used cardinality relationships.
Source: SmartDraw (2020) | Image: 3 Common Cardinality Relationships
How will Data Modeling & Analysis help?
To support my ‘hypothetical’ working company (i.e. McCormick & Company Inc.), I will use a logical, conceptual, and physical data model that’s aligned to the specific business needs. In particular, I will use the following physical data model (i.e. entity relationship diagram (ERD)) as the foundation for further analysis to identify company challenges/opportunities associated with company business operations. As outlined below, this physical data model provides an overview of my working company’s data structure as it relates to ‘Financials’.
Source: Corey Seamster (2020) | Image: Power BI Desktop – MKC Financials (Physical ERD)
More specifically, entities can be recognized by the objects named, ‘Balance_Sheet’, ‘Cash_Flows’, ‘Income_Statement’, and ‘Stock_Prices’. Within these entities are, attributes. For instance, in the ‘Income_Statement’ entity, you will see attributes in the form of columns such as ‘EPS (Basic)’, ‘Gross Profit %’, ‘Gross Profit $’, ‘Net Income’, etc. The relationships between these entities are displayed using lines between them. As an example, the primary link between all entities in this physical data model is the ‘YearID’ entity and/or ‘YearID/Year’ attributes. The ‘YearID’ entity is in the middle of the ERD.
Looking ahead, while conducting more research and analysis on my working company, I will add more data to this model to better align my data mining efforts to the targeted area of business operations that present the biggest challenge(s)/opportunities. At the moment, the analysis of the financial system presents both challenges and opportunities associated with Inventory. For that reason, my next move will be to add an ‘Inventory’ entity so I can further investigate the data trail associated. Overall, data modeling will benefit my working company because I will be able to present a strategic and tactical-level blueprint for all levels of target audiences (i.e. Executives, Management, Analysts, SMEs, etc.). These blueprints will enable me to educate the client on the process associated with my proposed business intelligence (BI) solution. This will help me visually depict entity relations across their data to support the challenge(s)/opportunities identified in my analysis. Additionally, it will serve as a catalyst to further engage with the working company’s business stakeholders in order to truly understand their business requirements for data model refinement and later implementation efforts.
With all things considered, the data used to develop the model and the “inputs” to the simulation are in effect what validate and verify the construct of the model and the content of the simulation outputs (Banks and Sokolowski, 2012). Additionally, the validation and verification of the model are directly linked to the research and understanding of what is being studied (i.e. the original purpose for creating the model) ((Banks and Sokolowski, 2012).
Data models are being created and implemented on a daily basis in our modern-day society. Not only can they provide a strategic outline of data structures designed to emulate the systems they seek to learn more about, they also bring great value to the tactical level users that seek near real-time access to data via user-interfaces that operate within the construct of businesses that some of us call, databases. For all such reasons, Business Intelligence professionals will definitely explore and be exposed to the concepts presented in this discussion. If data is to be understood, it must be modeled in a way that’s interpretable. When it’s your time to create a data model, have some fun and don’t forget all the cool fundamental facts listed above while on your journey.