In this post, we compare requirements engineering for traditional and big data business intelligence.
1. Introduction
The objective of this document is to present an argument and obtain feedback on the argument, as part of a research project.
The argument is that requirements engineering for business intelligence on traditional data systems and on big data systems shares many similarities, but also differs in many aspects.
2. Abstract
The requirements engineering process used on any project depends on a number of factors like company culture, risks, technology, etc. We will typically use a waterfall requirements engineering approach for high-profile, high-value, high-risk projects.
Traditional data systems try to maintain a one-to-one match between data objects in the data system and actual objects or events in reality. This requires systems that keep data consistent and stable. Traditional business intelligence systems take a snapshot of the data at regular intervals, which is stored in a data warehouse, to allow data analysts to obtain trend data. Because we cannot go back in time to fix the data in the data warehouse, our requirements engineering process needs to be thorough enough to determine what belongs in the data warehouse without overwhelming the available processing and storage constraints.
The continuous drop in the cost to process and store a gigabyte of data allowed businesses to store much more data than their operational needs require, also referred to as big data. The processing and storage requirements for this data are quite different from those of a traditional data system; for instance, the data does not need to be immediately available, consistent or complete. The availability of this new data technology introduced new business models and influenced the lives of ordinary citizens. The detailed data available in big data systems allowed business intelligence technologies to create predictive models of people and other entities. We can take these models, place them into production and monitor the outcome to obtain a competitive advantage. The reduced time between introducing a change in production and measuring its outcome, as well as having a detailed historical view of the data, means that we can take a much more iterative approach to requirements engineering.
3. Requirements Engineering
3.1. What is Engineering?
"Scientists investigate that which is; Engineers create that which has never been." - Albert Einstein
Engineering is the discipline of creating and following a process; the creation can be an artefact like a document, a process, an algorithm, a mechanical device, software, etc. The creation of one engineering process can be used as a component in another engineering process. For example, we can engineer a process to capture requirements for a big data system, and this process can then be utilized in the requirements engineering process for that system [15].
The engineering process consists of many activities like requirements gathering, design, scheduling, execution, testing, maintenance, etc.
3.2. What is Requirements Engineering?
We use the requirements engineering process to identify capabilities, characteristics and quality factors of the system that we want to create so that the delivered system has value and utility for all the involved stakeholders [28]. In simple terms, we need to understand what the final result should achieve to utilize an opportunity and address a number of issues.
Requirements engineering is a form of communication between the various stakeholders to understand and agree to the value and priority of each requirement.
Requirements engineering does not specify the implementation or design. However, new requirements do emerge out of the design, testing, deployment, operational usage and other activities that follow the requirements. Thus the requirements engineering process is continuous, iterative, dynamic, interactive and never complete.
3.3. Why is Requirement Engineering Important?
As much as 53% of all technical investments go into failed projects or projects that overrun; the cited root cause in many of these cases is inadequate or no requirements engineering at all [28]. The requirements engineering process is important because it provides the basis of all the work that follows: design, development, testing, deployment, operation, maintenance, support and disposal of any system that is to be built [28].
The focus many times is only on the implementation (the so-called "real work" or "solution"): customers and project managers feel that if code is generated, there is progress. The problem is that without a requirements engineering process, the development team has no sense of what the business envisions for the product and what the "real requirements" are [28].
Misunderstanding of requirements results in wasted time and effort in development, leading to overcomplicated systems, rework and frustration [28], as what is being built does not align with the business vision and does not solve the actual problem. Wasted time and effort also mean financial loss, as expensive resources have been dedicated to a "solution" that does not solve the "real problem" [28]. Without real requirements, the test team can only test what is presented to them and cannot verify that any of the real requirements have been met.
There is a substantial difference between the "stated" requirements and the "real" requirements. Stated requirements are requirements provided by a customer at the beginning of a project stating their needs; examples include requests for information, proposals, quotes or statements of work. Real requirements are the verified needs of the customer for a particular system or capability [28]. In many cases, it is useful to inspect the system audit logs of actual user activity to determine the difference between what a system owner thought users would use and what is actually being used.
In many developed software systems, as much as 45% of functionality is never used by users. This means that we need to ensure we understand what the true and minimum requirements per iteration are, anything more is too much.
This is in everyone's best interest as it reduces risk, cost, schedule, complexity, etc. [28, p 85]
Identifying real requirements is a dynamic, interactive and iterative process, supported by methods, techniques and tools [28].
3.4. Planning
One of the activities associated with requirements engineering is planning. Most people agree that a little bit of planning goes a long way. A requirements engineering plan is a foreign concept to most people, but it has great power and effect. A requirements plan defines how requirements will be evolved to obtain "real" requirements, as well as how the requirements activities will be addressed [28].
The plan starts by understanding the corporate culture, business models and involvement, stakeholder availability, who owns the intellectual property, and the risks and estimated profits involved, just to mention a few [26] [11]. We may opt for a highly iterative requirements engineering process where we learn as we go along. This may be possible in a social media company with a lenient corporate culture, a sense of adventure and humour, and where stakeholders are readily available to clarify any requirement that is not clear during development. On the other side of the spectrum, we may select a waterfall type of requirements engineering process, with a lot of requirements tests, for a mining company where the cost of a mistake is prohibitively high and stakeholders are difficult to get hold of.
Understanding what the requirements engineering process must deliver in terms of completeness, correctness and the risks involved helps to select the correct and relevant approach, rather than selecting what worked for a recent project or just following the latest trend.
3.5. Types of Requirements
The following list contains various requirement categories, the list is not exhaustive. Many requirements fall within one or more of the requirement categories [28].
- Business: The business requirements are the reason why we are developing the system in the first place. They describe how the business will use the system: to make money, save money, improve business effectiveness, advertise, etc. Many problems arise when these requirements are not completed and communicated to the rest of the stakeholders.
- User: These requirements describe the profile of the users that will use the system as well as their requirements.
- Business rules: These requirements form the core of the functional requirements.
- Functional: What the system must do.
- Design: Separating the requirements engineering process from the design activities is not always possible as a new system may build on top of existing systems or need to interface with them.
- Performance: Describe the required response and scaling requirements.
- Interface: Describing the physical and software interfaces between the various systems.
- User interface: Look and feel, flows, error handling and branding.
- Testability: Describe how the software will be tested, for example, the need for automated tests and simulators. It may also be a requirement to be able to test the system as a whole, or components of it, in production.
- Deployment: Describe required features of the deployment like maximum down time, where and when the deployment can be done, etc.
- Support: Describe the profile of the support personnel as well as service level agreements, etc.
- Training: Describe required training for support and users of the system.
- Development process: What the software development process should look like, for example, the availability of source code, source control, builds, etc.
- Operational: Describe the environment in which the new system should operate, the number of concurrent users, etc.
- Monitoring: Indicate which performance counters should be measured and what the escalation process is.
3.6. The Role of the Requirements Engineer
The requirements engineer needs to work collaboratively with all the stakeholders, like customers, users, system architects and designers, to identify the real requirements for a planned system or software development effort and to define the problem that needs to be solved. He needs to review stated customer needs, study business objectives, collaborate with customers and users to evolve and prioritise business requirements, and involve technical leads and architects [28].
The requirements engineer also needs to work effectively with stakeholders like customers and users to manage new and changed requirements so that the project stays under control, and he creates a mechanism to control changes. We should welcome changes that further clarify a requirement but do not affect cost, schedule or functionality. According to industry experience, a 20% change to the requirements will double project development cost [28].
The requirements engineer needs to promote the reuse of artefacts to achieve repeatability, by using design patterns and extracting templates from previous completed work [28].
The requirements engineer is responsible for working with business owners to envision a growth path, starting with the first release or version of a product, through a set of staged releases, to the ultimate system or product. No matter how much discussion and testing is done, there will always be missing requirements that will only be discovered once the system is in production [28].
The requirements engineer must also advise stakeholders on methods, techniques, technology and automated tools that are available to best support requirements-related project work and activities. Teams should mainly use tools, processes and techniques they are familiar with, to reduce the negative effect of first having to learn new processes and acquire new skills, while still exploring, experimenting and familiarizing themselves with tools and techniques they are not yet accustomed to [28].
The requirements engineer should facilitate the use of metrics to measure, track, and control requirements-related project work activities and results. The things that are measured and tracked and that management pays attention to are the ones that improve. We should not measure for the sake of measurement, but use it to evaluate project work and take corrective action [28].
People skills are an absolute must for the requirements engineer. He must be able to facilitate discussions and mediate conflicts. He must ensure that ideas and approaches are discussed as a team, as better ideas and approaches will be created than he would have created on his own. The requirements engineer does not need to be an expert in everything; his expertise should be to facilitate and mediate [28].
The requirements engineer should study the domain of the area in which the system or software will be used. Without understanding the usage domain, there is no way that the requirements engineer will be able to effectively explain to the software development team what they need to build [28].
3.7. Understanding Business Involvement
The requirements engineer must also understand the business and who should be held accountable for business and technical decisions.
A Harvard study done by Ross and Weill [18] indicated that most key IT business decisions are conveniently handed over to technology executives as they do not realize that these decisions pose a business challenge and not only a technology challenge. Subsequently, they did not take responsibility for the organizational and business process changes required by the technology.
IT executives are the right people to make decisions about architecture, standards, technology, etc., but they should not be left alone to determine how IT impacts the organization's business strategy. Where necessary, the IT department should work with non-technical management and provide them with the options and trade-offs so that senior management can make informed decisions.
3.7.1. How Much Should be Spent on IT?
Many IT projects do not provide any certainty on return on investment. Many executives look at industry trends and standards to determine if they are spending too much or too little on IT. These companies cannot build a strategic platform because it is not adequately funded [18].
Successful companies actually determine the strategic role IT plays within the organization and set an adequate funding level to achieve the strategic business objectives. These companies match their spending level to their strategic objectives, not to industry benchmarks [18].
3.7.2. Which Business Processes Should Receive our IT Dollars?
In many organizations, there are as many ideas as there are people. Naturally, not all of them are equally important. It is essential that senior management prioritise IT projects so that projects with a significant impact on the company's success, and those that provide the biggest benefit, are at the top [18].
If this does not happen, the IT department will try to do everything and complete nothing [18].
3.7.3. Which IT Capabilities Need to be Companywide?
It is appealing to centralise IT capabilities and standards across an organization to ensure cost savings and strategic benefits. The centralised approach can also inhibit the flexibility of individual business units and responsiveness to certain customers, as well as create resistance from certain business units [18].
If this decision is left to IT managers, they will either choose to be very lenient or very strict with the centralization [18].
Senior management should decide which capabilities should be centralised and standardized and which should be placed within a business unit, keeping in mind the crucial trade-offs and strategic business objectives [18].
3.7.4. How Good do our IT Services Really Need to be?
Product characteristics like the number of features, reliability, responsiveness and data accessibility come at a cost. Senior management must decide how much they are willing to spend on various services and features [18].
The decision regarding the appropriate levels of IT service should be made by senior business managers. Left to themselves, IT units will go for the highest service level, creating a Cadillac service when a Buick would be sufficient; this is because IT units are measured on things like how many times a service goes down. It is common that the costs of higher service levels are rolled into the cost of IT systems and never represented separately [18].
IT people should provide a menu of service offerings with prices to assist business to understand what they are paying for. Business managers should consult with IT managers to establish suitable levels of service at a price that the business can afford [18].
3.7.5. What Security and Privacy Risks will we Accept?
Increased security comes at a cost in money, inconvenience and reduced systems interoperability [18].
It is senior management's responsibility to decide whether the need for more security justifies these costs to their way of doing business, or whether attracting new customers and keeping existing customers happy takes priority [18].
Almost all managers and executives will state that security and business flexibility are of equal importance. Yet once an opportunity reveals itself, the same managers will ignore security and privacy concerns to ensure the closing of a deal [1].
3.7.6. Whom Do We Blame if an IT Initiative Fails?
IT systems by themselves have no value, they need to be integrated into the business and be used by users [18].
Senior management needs to assign a business executive to take responsibility for realizing business benefits from an IT project [18].
We can build a technically elegant and architecturally pure system, but if it is not used, there is no value [18].
4. Requirements Engineering for Traditional Data Systems
When we think about a traditional system, we normally think of a three-tier system with a frontend, a backend and a highly structured database [3]. For the past 40 years, this has been the way businesses and organizations have created data systems. These systems are built with a specific business purpose in mind, where the data entities are well defined with specific properties and relationships to each other.
Typical operational requirements for these systems included:
- Inserted data should be immediately available to everyone who has access to the system.
- Data entities can be queried and updated on almost all of their properties.
- Reports can be built from the data entries.
- The data in the system represents the current state correctly.
With well-defined data requirements and known relationships and attributes in the data model, it makes sense most of the time to store the data in a relational database with a detailed data schema.
4.1. Traditional Data Systems - The Classical Approach
Up until the late 90s, the waterfall approach was the dominant approach for systems engineering and implementation. The reasons were as follows: access to computer software and equipment was expensive and scarce, a change to existing data structures required extensive downtime, and distributing any change in the software was prohibitively expensive. This necessitated thorough, complete and correct upfront requirements engineering, followed by a design that first focused on creating a complete and correct data structure and only then moved on to features.
4.2. Traditional Data Systems - A More Modern Approach
From around the late 90s, system design and implementation changed to a more iterative approach, as the cost of changing a data system and of distributing software fell dramatically. This was driven by data systems that allow data schema changes with little or no impact on users of the system in production, as well as the general availability of the Internet to almost everyone.
Many of the newer relational data systems started to support complex data types like blobs, XML, JSON and geospatial objects, allowing for a more dynamic schema.
With this new approach, software developers focused on adding features first and applied changes to the data schema where and when needed. Businesses capitalized on the iterative approach, using lean start-up models to generate income as soon as a product reached its minimum value proposition point [2] [12]: the "dollars today" principle.
4.3. Consistency Model
Another very important requirement for many traditional systems is that of integrity and accuracy, which requires a system that is consistent and stable. For example, when doing an electronic funds transfer from one account to another, all of the operations must succeed for the transaction to be successful; otherwise all the operations must be rolled back. To achieve this, traditional data systems use the ACID (Atomic, Consistent, Isolated, Durable) consistency model, which means that once a transaction is complete, the data in the system is consistent and stable on disk and in memory [4]. The cost of investigating and fixing inconsistent information, and of people not trusting the system, can become prohibitively high.
4.4. Volume
Traditional data systems only process and store the information they need to fulfil their purpose; all other information is ignored. We have had large databases for quite some time (the international conference on Very Large Data Bases has been running since 1975), but this does not make every large database a big data system [25].
It does however make sense to only store data that is actually being used as the maintenance on systems that contain unused data can be very expensive.
4.5. Velocity
Traditional data systems make use of pessimistic data concurrency and do a lot of processing on the data they receive. It is wise to process and store only the data that is required to function in the operational environment.
4.6. What are ACID Transactions?
The ACID acronym stands for the following [4]; a small sketch follows the list:
- Atomic: All the operations that make up a complete transaction need to succeed; if this is not the case, the transaction needs to roll back.
- Consistent: With the completion of every transaction, the data system is in a structurally sound condition.
- Isolated: Transactions do not compete with each other to access data. Access to data is controlled by the data system to ensure that it appears as if transactions run sequentially.
- Durable: After applying a transaction, the result is permanent, even in the event of errors.
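As a rough illustration of the funds-transfer example above, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the insufficient-funds rule are illustrative assumptions, not a prescription for real banking code.

```python
# A minimal ACID sketch: an atomic funds transfer with automatic rollback.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 20)")
conn.commit()

def transfer(conn, source, target, amount):
    """Move funds atomically: either both updates apply, or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, source))
            # Consistency check (illustrative rule): a negative balance aborts.
            (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                      (source,)).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, target))
    except ValueError:
        pass  # rollback already happened; both balances are unchanged

transfer(conn, "alice", "bob", 30)   # succeeds: balances become 70 / 50
transfer(conn, "alice", "bob", 500)  # fails and rolls back: still 70 / 50
print(conn.execute("SELECT id, balance FROM accounts").fetchall())
```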
5. Big Data Systems
There are a diverse number of definitions for the term big data, many referring to some aspect of the technology, lack of data schema, data volume, etc. In this document, we refer to big data not as a technology, standard or implementation, but as an industry term for an approach to data storage and processing [10].
The reduced cost of data storage and processing, and the accessibility of the Internet, brought new business opportunities, new business models and, inevitably, a new set of requirements to store, process and analyse data.
Examples of these businesses include Google and Facebook, which are not traditional types of businesses. In these businesses, the consumer uses services in exchange for being exposed to advertisements. The paying customers are advertisers, who pay when one of the consumers clicks through to their website. The most relevant advertisement for a specific user is the one most likely to ensure the click-through. It is important for these businesses to know and understand each consumer as well as they can, and they do this by collecting as much data on each consumer as possible.
The requirements for these systems are very different from those of traditional data systems, especially since the consumers are not paying for any of the provided services. For example, when a Facebook user posts a message, there is no obligation on Facebook to ensure that the post is immediately available to any or all of that user's friends.
The features provided by many traditional data systems are too restrictive here and do not provide any business value.
5.1. Stateless
Big data systems do not primarily seek to keep a current state of values, but rather to receive all new data and append it to the existing data store with a date and time stamp.
Examples include devices installed in a motor vehicle that report a wide variety of data points that need to be stored.
5.2. Variety
The complexity and cost of building and constantly modifying a traditional data system to accommodate the storage of all the various data schema permutations it can receive are prohibitive. It is also difficult to ensure that a traditional system performs well under very high load when it needs to store and process data it was never intended to receive.
Big data systems are built with the intent to store data with a wide variety of data schema permutations effectively. The data that comes into the big data system is stored in a data lake [6]. The data is mapped and reduced to some general extent that makes sense, to allow the retrieval of data based on the time it was generated at the device and received by the big data system. This makes the big data system better suited for storing data that may be needed or analysed in the future.
This loosely coupled data schema means more responsibility is placed on developers to extract data and process it. This does not mean that there is no data schema at all.
One drawback of the wide variety of schemas is that it is not effective to search for data using general and complex queries.
An example where it would be appropriate to use a big data system is that of a vehicle tracking company with various tracking devices. The capabilities of the tracking devices range from units that report basic location data, like date, time, longitude, latitude and ignition on and off, to devices that provide a rich data set including accelerometer data, engine probe integration and environment data. Storing this data in a traditional system does not make sense; a sketch of the alternative follows.
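To make the schema-on-read idea concrete, here is a minimal Python sketch of an append-only store for the tracking-device example. The payload fields and helper names are illustrative assumptions, and the in-memory list stands in for a data lake.

```python
# A minimal schema-on-read sketch: ingest everything as-is, interpret on query.
import time

LOG = []  # stands in for an append-only file or distributed data lake

def ingest(payload: dict) -> None:
    """Append the raw record with a received timestamp; nothing is rejected."""
    LOG.append({"received_at": time.time(), "data": payload})

# A basic unit and a rich unit report very different shapes of data.
ingest({"device": "basic-01", "lat": -25.74, "lon": 28.19, "ignition": True})
ingest({"device": "rich-07", "lat": -25.75, "lon": 28.21,
        "accel": [0.1, 0.0, 9.8], "engine_temp_c": 92})

def positions_since(t0: float):
    """Schema-on-read: extract only the fields this query understands."""
    for rec in LOG:
        d = rec["data"]
        if rec["received_at"] >= t0 and "lat" in d and "lon" in d:
            yield d["device"], d["lat"], d["lon"]

print(list(positions_since(0)))  # both devices, despite their different schemas
```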
5.3. Volume
In traditional data systems, the system only processes and stores the data it needs to achieve its system goals; all other data is ignored. In a big data system, we want to keep as much relevant information as possible, so having enough storage capacity is important.
5.4. Velocity
Big data systems accept all the data pushed to them, and thus need to store data very quickly.
5.5. Consistency
In big data systems, ACID transactions are far more restrictive and pessimistic than what is required. Scale and resilience are much more important, which favours the BASE (Basically Available, Soft state, Eventually consistent) consistency model.
5.6. Duplication
Data can be duplicated, missing, out of sequence or late. In a traditional data system, there will be a requirement to filter out duplicate sets of data, whereas in a big data system the duplicate data will be stored to assist in investigating issues relating to devices that create duplicate sets of data.
5.7. What are BASE Transactions?
The BASE acronym stands for the following [4]; a toy sketch follows the list:
- Basically available: The data system appears to be working most of the time.
- Soft state: The data does not need to be write-consistent, nor do the replicas of the data system need to be in a mutually consistent state at all times.
- Eventually consistent: The data becomes consistent at some later point.
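The following toy Python sketch illustrates BASE-style behaviour under stated assumptions: writes land on a single replica, reads may be stale, and a periodic anti-entropy merge makes the replicas eventually consistent. The two-replica layout and the last-write-wins rule are simplifications for illustration only.

```python
# A toy BASE sketch: stale reads until replicas are merged.
import itertools

clock = itertools.count()    # stand-in for a version number or timestamp
replicas = [dict(), dict()]  # two independent copies of the data

def write(key, value):
    """Basic availability: accept the write on a single replica only."""
    replicas[0][key] = (next(clock), value)

def read(key, replica):
    """Soft state: a replica may not have seen the latest write yet."""
    entry = replicas[replica].get(key)
    return entry[1] if entry else None

def anti_entropy():
    """Eventual consistency: merge all replicas, newest version wins."""
    merged = {}
    for r in replicas:
        for key, (version, value) in r.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for r in replicas:
        r.update(merged)

write("status", "posted")
print(read("status", 1))   # None -- replica 1 is stale, and that is acceptable
anti_entropy()
print(read("status", 1))   # 'posted' -- the replicas have converged
```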
6. Should Big Data Systems Replace Traditional Systems?
Big data systems should by no means be seen as a replacement for traditional systems. We suggest that big data systems should rather coexist with traditional systems, as each data system fulfils different data storage and access requirements.
Traditional data systems and big data systems fulfil very different business and technology requirements. When applying the SOLID principles [9] to an enterprise system, it makes sense to have separate systems. This means that a traditional system will only contain and process data that it actually uses and there is no need to save interesting data or data that we may need for analysis in the future as it is already stored in the big data system.
Building a single big data system to replace one or more traditional data systems usually becomes a nightmare that is difficult to optimize and manage, resulting in complexity that cannot be maintained.
7. Traditional Business Intelligence
Business intelligence is a term that refers to and includes a diverse number of approaches to analysing and interpreting data. Its activities and disciplines include data processing, querying, reporting, analytics and data mining [14].
Business intelligence is mostly used for strategic decision making, cost cutting and identifying new business opportunities [14].
7.1. Data Warehousing
Traditional operational data systems contain information about the current state of a particular domain; they are built for operational requirements and undergo frequent changes as data is inserted and updated. A very important requirement for almost all reports used for strategic decision making is to understand how the data varies over time; this time-variant data is normally stored separately in a data warehouse [17].
The term "data warehouse" was first used by Bill Inmon in 1990, describing a subject-oriented, integrated, time-variant and non-volatile set of data. The data warehouse collects, summarizes and consolidates data from various data sources at regular intervals to provide generalized snapshot views of the data over time [17].
The requirements for operational data systems and data warehouses differ as follows [17]:
| | Data warehouse | Operational system |
|---|---|---|
| Data | Historical overview | Current state |
| Users | Executives, managers and analysts | Clerks and data administrators |
| Usage | Strategic | Running the business |
| Level of detail | Summarized and consolidated overview | Primitive and detailed data |
| Number of users | Small | As many as possible |
| Queries | Complex | Simple |
| Access | Read only | Read and write |
| Transactions | Not required | Required |
| Recovery | Not required | Required |
| Concurrency controls | Not required | Required |
7.2. Analytics
The data analytics process can range from doing a simple report on how certain values changed over time to more complex processing like statistical analysis, complex multidimensional analysis and data mining (knowledge discovery) [17] [14]. An example of a simple report is that of customer base growth versus profit growth over a specific time period.
Data analytics is the process of discovering new and confirming existing useful and interesting patterns and relationships. This allows business to make knowledgeable decisions [8].
The data warehouse is used for all analytical queries that require historical trending.
An example of such an analytics activity is a stock warehouse, where we need to determine the best stock level per product so that we avoid writing off stock that we cannot sell, or losing customers to our competitors because we cannot supply the products they ask for.
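As a concrete instance of the simple report mentioned earlier, the Python sketch below computes customer-base growth versus profit growth from yearly warehouse snapshots. The snapshot values are invented for illustration.

```python
# A minimal descriptive report: period-on-period growth from warehouse snapshots.
snapshots = [  # (period, customers, profit) as stored in the data warehouse
    ("2014", 10_000, 1.20e6),
    ("2015", 13_000, 1.35e6),
    ("2016", 17_500, 1.45e6),
]

# Compare each snapshot with the previous one to report growth percentages.
for (_, c0, p0), (period, c1, p1) in zip(snapshots, snapshots[1:]):
    customer_growth = (c1 - c0) / c0 * 100
    profit_growth = (p1 - p0) / p0 * 100
    print(f"{period}: customers {customer_growth:+.1f}%, profit {profit_growth:+.1f}%")
```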
7.3. Requirements Engineering
We only have time-variant data from the moment we start loading data into the data warehouse. This means we need a thorough upfront understanding of the business we are in, the particular needs of our organization, and our unique value proposition to our customers and other business interactions. This allows us to anticipate which business aspects we will need to analyse in future, estimate sufficient data storage intervals, and keep the associated storage and other resource constraints in mind.
Most traditional analytical data systems contain summarized and aggregated data, which means that we can do the following types of analytics [25] [5]:
- Descriptive analytics
  - How have we been doing?
  - What was the effect of some of the decisions we have been making?
- Forecasting analytics
  - Where will we be in the foreseeable future if we carry on doing what we have been doing?
This in turn means that we need to make decisions with a large business impact to be able to measure the resulting effect of any applied change [25] [5]. These big decisions of course come with big risks. A sketch of the forecasting case follows.
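Here is a minimal sketch of forecasting analytics, assuming a simple linear trend on aggregated monthly totals is adequate; real forecasting would normally use richer models, and the figures are invented.

```python
# A minimal forecast: fit a straight line to aggregated totals and extrapolate.
import numpy as np

monthly_sales = np.array([102, 110, 108, 121, 125, 131, 138, 135, 147])
months = np.arange(len(monthly_sales))

# Least-squares line through the historical aggregates.
slope, intercept = np.polyfit(months, monthly_sales, deg=1)

# "Where will we be if we carry on doing what we have been doing?"
horizon = np.arange(len(monthly_sales), len(monthly_sales) + 3)
forecast = slope * horizon + intercept
print([round(v, 1) for v in forecast])  # next three months on the current trend
```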
8. Big Data Business Intelligence
With access to a large variety and volume of data in big data systems, it is possible to find general trends for individual types of items, like products, people, devices, etc. With this information, we can create models of the typical behaviour of these items in different settings.
There are many examples of patterns in our daily lives: people are creatures of habit, machines behave in a certain fashion before they cease to work, rain falls during a certain time, etc.
By analyzing these patterns, we can build models that allow us to do the following types of analytics [25] [5]:
- Descriptive: What have the individuals done thus far?
- Predictive: How will the individual act in this particular circumstance?
- Prescriptive: What do we need to do with the individual?
An example of a prediction model can be found in customer retention, where we need to predict which customers might leave. These models can be refined so that we only contact those customers we can persuade to stay, and leave alone those who would cancel their contract precisely because we contacted them [25].
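As a hedged illustration of such a churn model, the sketch below assumes scikit-learn is available; the two features (tenure and recent support calls) and the tiny training set are invented for illustration, not a real data set.

```python
# A minimal churn-prediction sketch with logistic regression.
from sklearn.linear_model import LogisticRegression

# Each row: [tenure_months, support_calls_last_quarter]; label 1 = churned.
X = [[2, 5], [3, 4], [4, 6], [24, 0], [30, 1], [36, 0], [5, 3], [28, 2]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

model = LogisticRegression().fit(X, y)

# Score current customers and contact only those likely to leave -- and,
# per the refinement above, only those we can actually persuade to stay.
candidates = [[3, 4], [26, 1]]
for features, p in zip(candidates, model.predict_proba(candidates)[:, 1]):
    print(features, f"churn risk {p:.0%}")
```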
With big data analytics, the focus is to take all the relevant information we have obtained and link it to as much external data as possible. External data includes map, calendar and weather data. This means that we not only have big data, but also now big permutations of data [26].
8.1. Experimentation
We can apply the models in small iterative steps, building models that are used in operational systems to optimize performance. Once these models are in operation, we are able to measure the effect of the change. Thus, we are able to run small controlled experiments in our live environment and measure the outcome. With this approach, the cumulative effect of many positive changes can make a big impact. For example, we can measure the outcome of an advertisement, a campaign, etc.
We can even run experiments on a specific population of people by measuring the impact we had on them and monitoring their behaviour with respect to a control group on which we did not run the experiment, confirming whether or not what we did caused the effect.
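A minimal sketch of reading out such a controlled experiment is a two-proportion z-test comparing the treated group against the control group; the group sizes and conversion counts below are invented for illustration.

```python
# A minimal experiment read-out: two-proportion z-test, treated vs control.
from math import sqrt

def z_score(conv_a, n_a, conv_b, n_b):
    """How unlikely is the observed gap if the change had no effect?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# 5,000 users saw the change (group A); 5,000 in the control group did not.
z = z_score(conv_a=460, n_a=5000, conv_b=400, n_b=5000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests the change, not chance, caused the lift
```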
Big data analytics cannot in all cases show that one thing causes another, but it can show that there is a link. Using these relational insights in conjunction with other research projects may prove helpful in finding causation.
8.2. Model Accuracy and Consistency
The models built from analytical data do not always need to be 100% correct all of the time. One stock market trader mentioned that, by his own analysis, his models only need to be correct 50% of the time to yield an average 80% profit [7].
A lot of time and computational effort goes into finding models with high accuracy. It is important to understand at what point the model is sufficient to work with, and when we should stop improving it. We stop improving the model when the value obtained by further improvement is minimal, or when we may miss an opportunity to use the model. Another reason to stop improving a predictive model is when it starts memorizing all past events instead of creating a general prediction model. The predictions created by models are, on average, better and more consistent than the predictions made by people [26] [25].
It is not always possible to provide absolute value and truth, because the value and truth many times depend on how you measure them. Many businesses are more interested in consistent models, to ensure consistent measurements and results. This keeps investors and regulatory bodies at ease and businesses running [26].
8.3. Change
We are living in a constantly changing world. The effect of a lobbying process by a consumers' rights group can change the way people shop and sign up for services. For example, in Europe, when it became law that consumers had the right to port their mobile phone numbers from one GSM service provider to another, many of the methods Telenor had previously used to reduce customer churn no longer worked. They had to create new models for the new environment [25].
We need to ensure that our predictive models are regularly updated to be current.
8.4. Big Data Analytics Outflow
It is possible that we build new businesses as spin-offs from the data models we build.
8.5. Data Warehouse
Data from the big data systems is also written into a data warehouse to save time when querying aggregated data. The really great thing about big data systems is that they contain a historical and detailed set of information, meaning that we are able to rebuild the data sets in the data warehouse if they are incorrect.
Because of this, we do not need to keep aggregated data in the data warehouse just in case we need it. This means that we can take a more iterative approach in our requirements engineering for data analytics, and only go through the process when there is a need for it.
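A small sketch of the rebuild idea: because the big data store keeps every raw event, an aggregate table can simply be dropped and recomputed when it turns out to be incorrect. The event fields below are illustrative.

```python
# A minimal rebuild sketch: recompute a warehouse aggregate from raw events.
from collections import defaultdict

raw_events = [  # the big data system keeps full detail, timestamped
    {"day": "2016-04-01", "product": "A", "qty": 3},
    {"day": "2016-04-01", "product": "A", "qty": 1},
    {"day": "2016-04-02", "product": "B", "qty": 7},
]

def rebuild_daily_totals(events):
    """Recreate the (day, product) -> quantity aggregate from scratch."""
    totals = defaultdict(int)
    for e in events:
        totals[(e["day"], e["product"])] += e["qty"]
    return dict(totals)

print(rebuild_daily_totals(raw_events))  # safe to drop and recompute at any time
```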
8.6. Market Research
Traditional market research recorded people's opinions and then tried to determine the correlation between those opinions and what people actually do.
Traditional market research has to a large extent been replaced with experimentation combined with big data business intelligence systems to create a new market research platform. This platform allows researchers to measure the impact of any small change in approach to customers.
This also places researchers in a position where they can effectively see what works and what does not in different environments.
8.7. Engineering Process
Some companies, like MineRP, use big data business intelligence throughout the engineering process. They take business requirements and overlay them on sensor, safety, regulatory, legal and other mining data to extract mining requirements, for example: how do we need to mine to extract maximum profit over 20 years? These requirements are used as input into another business intelligence process that does the design and scheduling, using design models and experience from other mines. The sensor and measurement data from the actual mining process is fed back into the big data system to create a planned-versus-actual model. This experience data is used again on other projects [26].
8.8. Privacy and Ethics
Tracing and understanding people's behaviour has become incredibly easy with publicly available metadata as well as corporately collected data [19].
More emphasis has been placed on companies' moral obligation to secure data and to use it in an ethical fashion [25] [5].
The private data collected from individuals is so valuable to companies that these companies are willing to provide a multitude of services to individuals free of charge. Most individuals do not understand the consequences of exchanging their personal information for free services, and there are almost no legal boundaries that limit the usage and exploitation of this data [13].
8.9. Change in the Workplace
Big data business intelligence changed the way we are doing business and employ people. Uber can be thought of as a digital boss or supervisor where a software layer has been added to an occupation [16].
These days, people are employed or not employed based on data analysis of online social media platforms, taking into account their online activities and associations [16].
8.10. Effects on Our Everyday Lives
Algorithms used by big data business intelligence collect social media data to determine our credit rating, the interest rate when we borrow money, whether we are fit for a job, etc. Unfortunately, these algorithms are not always fair, and it is an uphill battle to correct mistakes [16]. In effect, human judgements are replaced with algorithms [23].
There are many unregulated data brokers that create profiles of us without our consent, or without us being aware of it. We cannot even correct or review the data, affecting our ability to rent, borrow, etc. [16].
8.11. Audit Trails
Big data business intelligence can be used to forensically recreate any event from information like text files, tracing devices and even photos [20].
8.12. Fraudulent Data
Big data companies like Google collect data from various data sources. Some of these sources are crowdsourced, like Google's My Business and Map Maker [22].
One example of such a scam is where individuals and businesses utilize big data platforms for personal gain; these scammers convince Google that their business exists at a number of locations although in reality it does not [22].
When a user searches for a local business, the scammers quote a low price. When the business shows up to do the work, they charge much more. Google does not have a business incentive to fix the issue entirely, as it is not currently losing money to the scammers [22].
Reputation is an important social means that we use to trust each other in our society, and the way we manage reputation these days involves technology. We use the reviews from systems like eBay, Amazon, Uber, Facebook, etc. It has become incredibly easy to game these systems to boost one's reputation and perceived credibility. One way of increasing reputation is to create fake followers, or buy them from companies who create them, and then use these followers to write positive reviews [24].
9. How is Requirements Engineering the Same for Traditional and Big Data Business Intelligence?
The purpose of requirements engineering is to understand the opportunity or problem we are facing. The requirements engineering process depends on a number of factors like company culture, costs, risks, technology, etc., for example: can we deliver in a number of small phases, or do we only have one opportunity to get everything working? Once we understand the opportunity or problem, we will know if it can or should be solved using traditional analytics, big data analytics or a combination of both.
9.1. Business Case
Many projects start with enthusiasm because of a great idea that provides value to the customer. Unfortunately, the customer is not always willing to pay for the product or service, meaning the project fails.
The requirements engineer needs to work with the business to create products and services that serve the clients' needs and for which they are willing to pay.
9.2. Data Analytics Require People
Although many advertisements for data analytics systems may lead one to believe that analyzing the data is sufficient, this is not the case. The data is created and managed by people, and working with it requires specialized skills. These skills include business people with a deep understanding of the organization's business model, product managers with a good understanding of the product being sold, sales and marketing people who understand the customers they are selling to, as well as technical people who know how to translate business requirements into a technology solution.
Only people can make strategic business and technology decisions and apply wisdom; data and algorithms cannot do that by themselves. Computer algorithms can, at best, assist in providing a good answer to a question, but they cannot tell us that we are not asking the right questions.
9.3. Time Consuming
Collecting and preparing data for analysis, as well as confirming the results of the analysis, is almost always very time consuming because of the volume of data that needs to be worked through [27].
It is therefore important to follow the requirements engineering process to understand what the business is doing and what type of analytical results will provide actual business value.
9.4. Data Centric vs Business Centric
With a data-centric approach, we may obtain results that make sense from a data point of view but not from a business point of view. For example, we may tell a mine that the majority of their accidents happen underground: information that is already known and merely interesting, but of no business value [27].
Data by itself is not worth much without a business plan. Companies that collect and keep data readily available expose their customers to the risk of data leaks, and themselves to serious security and accountability risks [21].
We need to collaborate with business to first understand what information will provide business value and makes sense [27].
10. How Does Requirements Engineering for Traditional and Big Data Business Intelligence Differ?
The requirements engineering processes for business intelligence on traditional data systems and on big data systems differ in business models, granularity and iteration length.
10.1. Representation of Reality
Traditional data systems, in essence, model the current reality on a one-to-one basis. Every object and action is modelled in the system with one entry in storage; it is thus a requirement that the data be kept consistent and correct with the current state of reality.
Big data systems collect as much data as they can on objects and actions to provide a historical view. The main objective is to model behaviour, which allows us to predict future behaviour.
10.2. Granularity
With big data systems, we have fine granular detailed data available from which we can build models and predict behaviours for individual types of people or objects. With traditional data systems and warehousing, we only have general trending data, which means that we can only provide a general forecast per type of product or person [25] [5].
10.3. Iterations
10.3.1. Traditional Data Systems
The iteration length from applying a change in a business environment until we are able to measure the effect ranges from many days to weeks or months. Implementing long-term strategic changes therefore requires an experienced and skilled business person with a good gut feel for the changes he or she is about to apply, without any measurement being immediately available.
It takes vision and perseverance to be a successful business owner only utilizing traditional data systems.
10.3.2. Big Data Business Intelligence
Big data technology and business intelligence brought with them a number of new opportunities and challenges in terms of requirements engineering. One of these opportunities is very small iterations of product and software development and deployment, where we can take ideas into production in a matter of hours and measure the effect almost immediately.
These shorter iterations, and the concept of failing fast, have the disadvantage that a great idea may be discarded when actually a small amount of commitment over an extended time period is required to ensure its success.
Many people have the false idea that shorter iterations mean less planning. Designing and implementing the business around the technology in many cases still requires a waterfall type of approach. For example, when we create a fitness service and provide users with a wearable device like a smart watch, ensuring that the devices are available does take some planning.
The shorter iterations do have the advantage that we can release a product as soon as it reaches its minimum value proposition, meaning a shorter time to market and earlier user feedback. We also minimize our risk by focusing the development effort on what users are actually using and are willing to pay for, rather than spending effort on nice-to-have or nice-to-know features.
10.4. Business Models
The business models for business intelligence using traditional data systems and those using big data differ quite dramatically. Traditional business models require us to know the current state of our business, and the information needs to be correct, so we use traditional business intelligence systems: for example, who is currently in arrears on their account, or which of my vehicles should be serviced? Predictive business models require us to make predictions based on past information: for example, which customers will buy our product, who will churn, or which transactions are likely to be fraudulent? In some of these businesses, the data collected is worth more than the service provided [13].
Traditional business intelligence systems normally form part of the primary income-generating products and services; it is normally from these systems that we create invoices. For example, companies like Google will create an invoice for actual click-through events.
Predictive businesses are used to optimize and streamline traditional businesses. In many instances, the data collected is worth so much that many of these businesses are willing to give services and devices to people for free, just to be able to collect and analyse the data. We are truly moving into a data economy [11].
11. Conclusion
Requirements engineering for any system does not need to be a long, tedious and complex process. Done right, we can, within a few short days, determine what is expected to realize business value. Requirements engineering enables all the stakeholders to work together towards developing the same service or product. The result is a consistent product or service that is being marketed, sold, implemented and supported.
The business maturity, business model, corporate culture, technology capability, and the risks and opportunities involved drive the type of requirements engineering process we select [11].
The requirements engineering for traditional and big data business intelligence systems share many commonalities and also differ in many aspects.
12. References
- [1] Balabit. Balabit CSI Report. 2015. URL: https://pages.balabit.com/rs/855-UZV-853/images/Balabit-CSI-Survey.pdf (visited on 04/01/2016).
- [2] Steve Blank. "Why the Lean Start-Up Changes Everything". May 2013. URL: http://host.uniroma3.it/facolta/economia/db/materiali/insegnamenti/611_8959.pdf
- [3] H. M. Chen et al. "Big Data System Development: An Embedded Case Study with a Global Outsourcing Firm". In: Big Data Software Engineering (BIGDSE), 2015 IEEE/ACM 1st International Workshop on. May 2015, pp. 44-50. DOI: 10.1109/BIGDSE.2015.15.
- [4] John Cook. ACID versus BASE for database transactions. 2009. URL: http://www.johndcook.com/blog/2009/07/06/brewer-cap-theorem-base/ (visited on 04/20/2016).
- [5] Thomas Davenport. Big Data at Work: Dispelling the Myths, Uncovering the Opportunities. Boston, Massachusetts: Harvard Business Press, 2014. ISBN: 1422168166.
- [6] Martin Fowler. DataLake. 2015. URL: http://martinfowler.com/bliki/DataLake.html (visited on 04/20/2016).
- [7] D Garnet-Benet. Interview with D Garnet-Benet (Solutions Architect, Consultant). Feb. 17, 2016.
- [8] Lynn Greiner. What is Data Analysis and Data Mining. 2011. URL: http://www.dbta.com/editorial/trends-and-applications/what-is-data-analysis-and-datamining-73503.aspx (visited on 04/20/2016).
- [9] W. Haoyu and Z. Haili. "Basic Design Principles in Software Engineering". In: Computational and Information Sciences (ICCIS), 2012 Fourth International Conference on. Aug. 2012, pp. 1251-1254. DOI: 10.1109/ICCIS.2012.91.
- [10] D Ives. Interview with D Ives (Director, Karabina). Jan. 7, 2016.
- [11] Q de Kok. Interview with Q de Kok (Product Development Manager, Altech Netstar).
- [12] Marko Leppänen and Laura Hokkanen. "Four Patterns for Internal Startups". In: Proceedings of the 20th European Conference on Pattern Languages of Programs. EuroPLoP '15. Kaufbeuren, Germany: ACM, 2015, 5:1-5:10. ISBN: 978-1-4503-3847-9. DOI: 10.1145/2855321.2855327. URL: http://0-doi.acm.org.innopac.up.ac.za/10.1145/2855321.2855327.
- [13] Rebecca Lipman. Online Privacy and the Invisible Market for Our Data. URL: http://poseidon01.ssrn.com/delivery.php?ID=710087064097088098072117012077082122015017&EXT=pdf (visited on 04/20/2016).
- [14] Ryan Mulcahy. Business Intelligence Definition and Solutions. 2007. URL: http://www.cio.com/article/2439504/business-intelligence/business-intelligence-definition-and-solutions.html (visited on 04/20/2016).
- [15] University of New South Wales Department of Engineering. What is Engineering. 2016. URL: http://www.engineering.unsw.edu.au/about-us/what-is-engineering (visited on 04/20/2016).
- [16] Frank Pasquale. "Algorithms are producing profiles of you. What do they say? You probably don't have the right to know." Aug. 2015. URL: https://aeon.co/essays/judge-jury-and-executioner-the-unaccountable-algorithm (visited on 04/20/2016).
- [17] Tutorials Point. Data Warehousing - Overview. URL: http://www.tutorialspoint.com/dwh/dwh_overview.htm (visited on 04/20/2016).
- [18] Jeanne W Ross and Peter Weill. "Six IT Decisions Your IT People Shouldn't Make". Nov. 2002. URL: https://www.researchgate.net/profile/Peter_Weill/publication/11043243_Six_IT_decisions_your_IT_people_shouldn't_make/links/00b49518ae03f7653a000000.pdf
- [19] Bruce Schneier. Schneier on Security (blog). URL: http://www.schneier.com (visited on 04/20/2016).
- [20] Bruce Schneier. Cheating in Marathon Running. Apr. 2016. URL: https://www.schneier.com/blog/archives/2016/04/cheating_in_mar.html (visited on 04/20/2016).
- [21] Bruce Schneier. Data Is a Toxic Asset. Mar. 2016. URL: https://www.schneier.com/blog/archives/2016/03/data_is_a_toxic.html (visited on 04/20/2016).
- [22] Bruce Schneier. Exploiting Google Maps for Fraud. Feb. 2016. URL: https://www.schneier.com/blog/archives/2016/02/exploiting_goog.html (visited on 04/20/2016).
- [23] Bruce Schneier. Replacing Judgment with Algorithms. Jan. 2016. URL: https://www.schneier.com/blog/archives/2016/01/replacing_judgm.html (visited on 04/20/2016).
- [24] Bruce Schneier. Reputation in the Information Age. Nov. 2015. URL: https://www.schneier.com/blog/archives/2015/11/reputation_in_t.html (visited on 04/20/2016).
- [25] Eric Siegel. Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Hoboken, N.J.: Wiley, 2013. ISBN: 1118356853.
- [26] E Strydom. Interview with E Strydom (Vice President Sales, MineRP).
- [27] P Vermaak. Interview with P Vermaak (SAP data analytics consultant). Nov. 15, 2015.
- [28] Ralph Young. The Requirements Engineering Handbook. Boston: Artech House, 2004. ISBN: 978-1580532662.
I am a software development manager with 20 years of experience building enterprise systems. I have a passion for working with people and technology; this ensures that the software created works as expected and is delivered on time.
I love the outdoors, especially with my wife and children.
I have a master's degree in psychology and I am busy with a master's degree in computer science.