Big Data Research Papers 2014
Xiaolong Jin | Benjamin W. Wah | Xueqi Cheng | Yuanzhuo Wang
© 2015 Elsevier Inc. In recent years, the rapid development of the Internet, the Internet of Things, and cloud computing has led to the explosive growth of data in almost every industry and business area. Big data has rapidly developed into a hot topic that attracts extensive attention from academia, industry, and governments around the world. In this position paper, we first briefly introduce the concept of big data, including its definition, features, and value. We then identify, from different perspectives, the significance and opportunities that big data brings to us. Next, we present representative big data initiatives from around the world. We describe the grand challenges (namely, data complexity, computational complexity, and system complexity), as well as possible solutions to address these challenges. Finally, we conclude the paper by presenting several suggestions on carrying out big data projects.
Jae Gil Lee | Minseo Kang
© 2015 Elsevier Inc. Geospatial big data refers to spatial data sets exceeding the capacity of current computing systems. A significant portion of big data is actually geospatial data, and the size of such data is growing rapidly, by at least 20% every year. In this paper, we explore the challenges and opportunities that geospatial big data brings. Several case studies are introduced to show the importance and benefits of the analytics of geospatial big data, including fuel and time savings, revenue increase, urban planning, and health care. Then, we introduce emerging platforms for sharing the collected geospatial big data and for tracking human mobility via mobile devices. Researchers in academia and industry have spent a lot of effort to improve the value of geospatial big data as well as to take advantage of it. Along the same lines, we present our current research activities toward the analytics of geospatial big data, especially on interactive analytics of real-time or dynamic data.
Omar Y. Al-Jarrah | Paul D. Yoo | Sami Muhaidat | George K. Karagiannidis | Kamal Taha
© 2015 Elsevier Inc. With emerging technologies and all their associated devices, it is predicted that a massive amount of data will be created in the next few years. In fact, as much as 90% of current data was created in the last couple of years, a trend that will continue for the foreseeable future. Sustainable computing studies the process by which computer engineers and scientists design computers and associated subsystems efficiently and effectively with minimal impact on the environment. However, current intelligent machine-learning systems are performance driven: the focus is on the predictive/classification accuracy, based on known properties learned from the training samples. For instance, most machine-learning-based nonparametric models are known to require high computational cost in order to find the global optima. With the learning task in a large dataset, the number of hidden nodes within the network will therefore increase significantly, which eventually leads to an exponential rise in computational complexity. This paper thus reviews the theoretical and experimental data-modeling literature, in large-scale data-intensive fields, relating to (1) model efficiency, including computational requirements in learning, and data-intensive areas' structure and design, and introduces (2) new algorithmic approaches with the least memory requirements and processing to minimize computational cost, while maintaining or improving predictive/classification accuracy and stability.
Pekka Pääkkönen | Daniel Pakkala
© 2015 The Authors. Many business cases exploiting big data have been realised in recent years; Twitter, LinkedIn, and Facebook are examples of companies in the social networking domain. Other big data use cases have focused on capturing value from movie streaming (Netflix), monitoring network traffic, or improving processes in the manufacturing industry. Implementation architectures of the use cases have also been published. However, conceptual work integrating the approaches into one coherent reference architecture has been limited. The contribution of this paper is a technology-independent reference architecture for big data systems, which is based on an analysis of published implementation architectures of big data use cases. An additional contribution is a classification of related implementation technologies and products/services, which is based on an analysis of the published use cases and a survey of related work. The reference architecture and associated classification are intended to facilitate architecture design and the selection of technologies or commercial solutions when constructing big data systems.
Panagiotis D. Diamantoulakis | Vasileios M. Kapinas | George K. Karagiannidis
© 2015 Elsevier Inc. The smart electricity grid enables a two-way flow of power and data between suppliers and consumers in order to facilitate power flow optimization in terms of economic efficiency, reliability, and sustainability. This infrastructure permits consumers and micro-energy producers to take a more active role in the electricity market and in dynamic energy management (DEM). The most important challenge in a smart grid (SG) is how to take advantage of the users' participation in order to reduce the cost of power. However, effective DEM depends critically on load and renewable production forecasting. This calls for intelligent methods and solutions for the real-time exploitation of the large volumes of data generated by the vast number of smart meters. Hence, robust data analytics, high performance computing, efficient data network management, and cloud computing techniques are critical to the optimized operation of SGs. This research aims to highlight the big data issues and challenges faced by the DEM employed in SG networks. It also provides a brief description of the most commonly used data processing methods in the literature, and proposes a promising direction for future research in the field.
Tao Huang | Liang Lan | Xuexian Fang | Peng An | Junxia Min | Fudi Wang
© 2015 Elsevier Inc. With the development of smart devices and cloud computing, more and more public health data can be collected from various sources and analyzed in an unprecedented way. The huge social and academic impact of such developments has caused a worldwide buzz around big data. In this review article, we summarize the latest applications of big data in health sciences, including recommendation systems in healthcare, Internet-based epidemic surveillance, sensor-based health condition and food safety monitoring, Genome-Wide Association Studies (GWAS) and expression Quantitative Trait Loci (eQTL), inferring air quality using big data, and metabolomics and ionomics for nutritionists. We also review the latest technologies for big data collection, storage, and transfer, and state-of-the-art analytical methods, such as the Hadoop distributed file system, MapReduce, recommendation systems, deep learning, and network analysis. Finally, we discuss the future perspectives of health sciences in the era of big data.
Shaokun Fan | Raymond Y.K. Lau | J. Leon Zhao
© 2015 Elsevier Inc. Big data analytics have been embraced as a disruptive technology that will reshape business intelligence, which is a domain that relies on data analytics to gain business insights for better decision-making. Rooted in the recent literature, we investigate the landscape of big data analytics through the lens of a marketing mix framework in this paper. We identify the data sources, methods, and applications related to five important marketing perspectives, namely people, product, place, price, and promotion, that lay the foundation for marketing intelligence. We then discuss several challenging research issues and future directions of research in big data analytics and marketing related business intelligence in general.
H. V. Jagadish
© 2015 Elsevier Inc. As Big Data inexorably draws attention from every segment of society, it has also suffered from many characterizations that are incorrect. This article explores a few of the more common myths about Big Data, and exposes the underlying truths.
Hongbo Zou | Yongen Yu | Wei Tang | Hsuan Wei Michelle Chen
© 2014 Elsevier Inc. Increasingly larger scale applications are generating an unprecedented amount of data. However, the increasing gap between computation and I/O capacity on High End Computing (HEC) machines creates a severe bottleneck for data analysis. Instead of moving data from its source to the output storage, in-situ analytics processes output data while simulations are running. However, in-situ data analysis incurs much more contention with simulations for computing resources. Such contention severely damages the performance of simulations on HEC machines. Since different data processing strategies have different impacts on performance and cost, there is a consequent need for flexibility in the location of data analytics. In this paper, we explore and analyze several potential data-analytics placement strategies along the I/O path. To find the best strategy for reducing data movement in a given situation, we propose a flexible data analytics (FlexAnalytics) framework. Based on this framework, a FlexAnalytics prototype system is developed for analytics placement. The FlexAnalytics system enhances the scalability and flexibility of the current I/O stack on HEC platforms and is useful for data pre-processing, runtime data analysis and visualization, as well as for large-scale data transfer. Two use cases, scientific data compression and remote visualization, have been applied in the study to verify the performance of FlexAnalytics. Experimental results demonstrate that the FlexAnalytics framework increases data transition bandwidth and improves the application end-to-end transfer performance.
Chia Wei Lee | Kuang Yu Hsieh | Sun Yuan Hsieh | Hung Chang Hsiao
© 2014 Elsevier Inc. Cloud computing is a type of parallel distributed computing system that has become a frequently used computer application. MapReduce is an effective programming model used in cloud computing and large-scale data-parallel applications. Hadoop is an open-source implementation of the MapReduce model, and is usually used for data-intensive applications such as data mining and web indexing. The current Hadoop implementation assumes that every node in a cluster has the same computing capacity and that the tasks are data-local, which may add overhead and reduce MapReduce performance. This paper proposes a data placement algorithm to resolve the unbalanced node workload problem. The proposed method can dynamically adapt and balance the data stored in each node based on the computing capacity of each node in a heterogeneous Hadoop cluster. The proposed method can reduce data transfer time to achieve improved Hadoop performance. The experimental results show that the dynamic data placement policy can decrease execution time and improve Hadoop performance in a heterogeneous cluster.
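The idea of placing data in proportion to node capacity can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `allocate_blocks`, the capacity scores, and the largest-remainder rounding are all assumptions chosen for illustration.

```python
def allocate_blocks(total_blocks, capacities):
    """Split total_blocks across nodes in proportion to their capacity.

    capacities maps a (hypothetical) node name to a relative computing
    capacity score. Rounding uses the largest-remainder method so that
    the allocation always sums to total_blocks.
    """
    total_capacity = sum(capacities.values())
    if total_capacity <= 0:
        raise ValueError("at least one node must have positive capacity")

    # Ideal (fractional) share of blocks for every node.
    exact = {node: total_blocks * cap / total_capacity
             for node, cap in capacities.items()}

    # Start from the rounded-down shares.
    allocation = {node: int(share) for node, share in exact.items()}

    # Hand out the remaining blocks to the nodes with the largest remainders.
    leftover = total_blocks - sum(allocation.values())
    by_remainder = sorted(capacities,
                          key=lambda node: exact[node] - allocation[node],
                          reverse=True)
    for node in by_remainder[:leftover]:
        allocation[node] += 1
    return allocation


# Hypothetical heterogeneous cluster: "fast" is three times as capable as "slow".
placement = allocate_blocks(10, {"fast": 3.0, "medium": 2.0, "slow": 1.0})
print(placement)  # {'fast': 5, 'medium': 3, 'slow': 2}
```

A dynamic policy in the spirit of the abstract would re-run such an allocation as measured capacities change; how capacity is measured is left open here.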
Yiming Qin | Hari Krishna Yalamanchili | Jing Qin | Bin Yan | Junwen Wang
© 2015 Elsevier Inc. DNA, RNA and protein are three major kinds of biological macromolecules, with up to billions of basic elements in organisms such as human or mouse. They function at molecular, cellular and organismal levels, individually and interactively. Traditional assays on such macromolecules are largely experimentally based, which is usually time consuming and laborious. In the past few years, high-throughput technologies, such as microarray and next-generation sequencing (NGS), were developed. Consequently, large genomic datasets are being generated, and computational tools for analyzing these data are in urgent demand. This paper reviews several state-of-the-art high-throughput methodologies, representative projects, available databases and bioinformatics tools at different molecular levels. Finally, challenges and perspectives in processing genomic big data are discussed.
Kostas Kolomvatsos | Christos Anagnostopoulos | Stathes Hadjiefthymiades
© 2015 Elsevier Inc. Big data analytics is the key research subject for future data-driven decision-making applications. Due to the large amount of data, progressive analytics could provide an efficient way of querying big data clusters. Each cluster contains only a piece of the examined data. Continuous queries over these data sources require intelligent mechanisms to produce the final outcome (query response) in the minimum time with the maximum performance. A Query Controller (QC) is responsible for managing continuous/sequential queries and returning the final outcome to users or applications. In this paper, we propose a mechanism that can be adopted by the QC. The proposed mechanism is capable of managing partial results retrieved by a number of processors, each one responsible for one cluster. Each processor executes a query over a specific cluster of data. Our mechanism adopts two sequential decision-making models for handling the incoming partial results. The first model is based on a finite horizon time-optimized model and the second one is based on an infinite horizon optimally scheduled model. We provide mathematical formulations for solving the discussed problem and present simulation results. Through a large number of experiments, we reveal the advantages of the proposed models and give numerical results comparing them with a deterministic model. These results indicate that the proposed models can efficiently reduce the required time for returning the final outcome to the user/application while keeping the quality of the aggregated result at high levels.
Mohammadhossein Barkhordari | Mahdi Niamanesh
© 2015 Elsevier Inc. Healthcare network information growth follows an exponential pattern, and current database management systems cannot adequately manage this huge amount of data. It is necessary to use a "big data" solution for healthcare problems. One of the most important problems in healthcare is finding Patient Similarity (PaSi). Current methods for finding PaSi are not adaptive and do not support all data sources, nor can they fulfill user requirements for a query tool. In this paper, we propose a scalable and distributable method to solve PaSi problems over a MapReduce architecture. The proposed method, ScaDiPaSi, supports storage and retrieval of all kinds of data sources in a timely manner. The dynamic nature of the proposed method helps users to define conditions on all entered fields. Our evaluation shows that we can use this method with high confidence and low execution time.
Sherif Sakr | Amal Elgammal
© 2016 Elsevier Inc. With the increasing volumes of information gathered via patient monitoring systems, physicians have been put under increasing pressure to make sophisticated analytical decisions that exploit the various types of data being gathered per patient. This phenomenon of continuously growing datasets is arising and gaining momentum in several application domains, leading to what is now recognized in the business community as the Big Data challenge. In this article, we define and discuss some of the major challenges in healthcare systems that can be effectively tackled by recent advances in ICT. In particular, we focus on sensing technologies, cloud computing, the Internet of Things, and big data analytics systems as emerging technologies made possible by remarkable progress in network communication speed, computational capabilities, and data storage capacities, and which can contribute towards improving the efficiency and effectiveness of healthcare services. In addition, we describe the architectural components of our proposed framework, SmartHealth, for big data analytics services and describe its various applications in the healthcare domain.
Pietro Colombo | Elena Ferrari
© 2015 Elsevier Inc. Big Data is an emerging phenomenon that is rapidly changing business models and work styles. Big Data platforms allow the storage and analysis of high volumes of data with heterogeneous formats from different sources. This integrated analysis allows the derivation of properties and correlations among data that can then be used for a variety of purposes, such as making predictions that can profitably affect decision processes. As a matter of fact, Big Data analytics are nowadays generally considered an asset for making business decisions. Big Data platforms have been specifically designed to support advanced forms of analytics satisfying strict performance and scalability requirements. However, no proper consideration has been devoted so far to data protection. Indeed, although the analyzed data often include personal and sensitive information, with relevant threats to privacy implied by the analysis, Big Data platforms so far integrate only quite basic forms of access control, and no support for privacy policies. Although the potential benefits of data analysis are manifold, the lack of proper data protection mechanisms may prevent the adoption of Big Data analytics by several companies. This motivates the fundamental need to integrate privacy and security awareness into Big Data platforms. In this paper, we take a first step towards this ambitious goal, discussing research issues related to the definition of a framework that supports the integration of privacy-aware access control features into existing Big Data platforms.
Quan Zou | Sifa Xie | Ziyu Lin | Meihong Wu | Ying Ju
© 2016 Elsevier Inc. Classification with imbalanced class distributions is a major problem in machine learning. Researchers have given considerable attention to the applications in many real-world scenarios. Although several works have utilized the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classifications, limited studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for an imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure of imbalance in data classification problems. In addition, a novel framework for finding the best classification threshold is proposed. Experiments with SCOP v.1.53 data reveal that, with the default threshold set to 0.5, our proposed framework demonstrated a 20.63% improvement in terms of F-score compared with that of more commonly used methods. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/.
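To see why a fixed 0.5 threshold can be a poor fit for imbalanced data, the following minimal sketch searches candidate thresholds for the one that maximizes the F-score on a labeled validation set. It is a generic illustration, not the framework proposed in the paper; the function names and the toy scores are assumptions.

```python
def f_score(labels, scores, threshold):
    """F1 score when every score >= threshold is predicted positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(labels, scores):
    """Try each distinct score as a threshold and keep the best F-score."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f_score(labels, scores, t))


# Toy imbalanced validation set (made up): 7 negatives, 3 positives.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.1, 0.1, 0.15, 0.2, 0.25, 0.45, 0.3, 0.4, 0.6]

threshold = best_threshold(labels, scores)
print(threshold)                            # 0.3
print(f_score(labels, scores, 0.5))         # 0.5
print(f_score(labels, scores, threshold))   # about 0.857
```

In this toy case the minority-class scores sit mostly below 0.5, so the default threshold misses two of the three positives, while a lower threshold recovers them at the cost of one false positive.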
Lee Sael | Inah Jeon | U. Kang
© 2015 Elsevier Inc. Tensors, or multi-dimensional arrays, are receiving significant attention due to the various types of data that can be modeled by them; examples include call graphs (sender, receiver, time), knowledge bases (subject, verb, object), and 3-dimensional web graphs augmented with anchor texts, to name a few. Scalable tensor mining aims to extract important patterns and anomalies from a large amount of tensor data. In this paper, we provide an overview of scalable tensor mining. We first present the main algorithms for tensor mining and their scalable versions. Next, we describe success stories of using tensors for interesting data mining problems, including higher order web analysis, knowledge base mining, network traffic analysis, citation analysis, and sensor data analysis. Finally, we discuss interesting future research directions for scalable tensor mining.
Xiaoyong Li | Yijie Wang | Xiaoling Li | Xiaowei Wang | Jie Yu
© 2014 Elsevier Inc. The skyline query, as an important aspect of big data management, has received considerable attention from the database community, due to its importance in many applications including multi-criteria decision making, preference answering, and so forth. Moreover, the uncertain data from many applications have become increasingly distributed, which makes the central assembly of data at one location for storage and query infeasible and inefficient. The lack of global knowledge and the computational complexity derived from the introduction of data uncertainty make the skyline query over distributed uncertain data extremely challenging. Although many efforts have addressed the skyline query problem over various distributed scenarios, existing studies still lack approaches to efficiently process the query. In this paper, we extensively study the distributed probabilistic skyline query problem and propose an efficient approach, GDPS, to address the problem with an optimized iterative feedback mechanism based on the grid summary. Furthermore, many strategies for further optimizing the query are also proposed, including optimization strategies for local pruning, tuple selection, and server pruning. Extensive experiments on real and synthetic data sets have been conducted to verify the effectiveness and efficiency of our approach by comparing it with state-of-the-art approaches.
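For readers unfamiliar with the skyline operator itself, here is a minimal centralized sketch over certain (non-probabilistic) data. It does not implement GDPS, distribution, or uncertainty; it only shows the dominance test that skyline queries build on, under the assumption that smaller values are preferred in every dimension. The hotel data are made up.

```python
def dominates(a, b):
    """a dominates b if a is no worse in every dimension and better in one.

    Assumes smaller values are preferred in all dimensions.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to beach in km).
hotels = [(50, 8), (80, 2), (60, 5), (90, 6), (70, 9)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

This quadratic scan compares every pair of points; the distributed probabilistic setting described in the abstract is harder precisely because no single site holds all the points and each tuple exists only with some probability.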
Tian Guo | Thanasis G. Papaioannou | Karl Aberer
© 2014 Elsevier Inc. As the number of sensors that pervade our lives increases (e.g., environmental sensors, phone sensors, etc.), the efficient management of massive amounts of sensor data is becoming increasingly important. The infinite nature of sensor data poses a serious challenge for query processing even in a cloud infrastructure. Traditional raw sensor data management systems based on relational databases lack the scalability to accommodate large-scale sensor data efficiently. Thus, distributed key-value stores in the cloud are becoming a prime tool to manage sensor data. Model-view sensor data management, which stores the sensor data in the form of modeled segments, brings the additional advantages of data compression and value interpolation. However, there are currently no techniques for indexing and/or query optimization of model-view sensor data in the cloud; a full table scan is needed for query processing in the worst case. In this paper, we propose an innovative index for modeled segments in key-value stores, namely the KVI-index. The KVI-index consists of two interval indices, on the time and sensor value dimensions respectively, each of which has an in-memory search tree and a secondary list materialized in the key-value store. Then, we introduce a KVI-index-Scan-MapReduce hybrid approach to perform efficient query processing upon modeled data streams. As demonstrated by a series of experiments on a private cloud infrastructure, our approach outperforms, in query-response time and index-updating efficiency, both Hadoop-based parallel processing of the raw sensor data and multiple alternative indexing approaches for model-view data.
Zhijiang Chen | Guobin Xu | Vivek Mahalingam | Linqiang Ge | James Nguyen | Wei Yu | Chao Lu
© 2015 Elsevier Inc. Critical infrastructure systems perform functions and missions that are essential for our national economy, health, and security. These functions are vital to commerce, government, and society and are closely interrelated with people's lives. To provide highly secured critical infrastructure systems, a scalable, reliable and robust threat monitoring and detection system should be developed to efficiently mitigate cyber threats. In addition, big data from threat monitoring systems pose serious challenges for cyber operations because an ever growing number of devices in the system and the amount of complex monitoring data collected from critical infrastructure systems require scalable methods to capture, store, manage, and process the big data. To address these challenges, in this paper, we propose a cloud computing based network monitoring and threat detection system to make critical infrastructure systems secure. Our proposed system consists of three main components: monitoring agents, cloud infrastructure, and an operation center. To build our proposed system, we use both Hadoop MapReduce and Spark to speed up data processing by separating and processing data streams concurrently. With a real-world data set, we conducted real-world experiments to evaluate the effectiveness of our developed network monitoring and threat detection system in terms of network monitoring, threat detection, and system performance. Our empirical data indicates that the proposed system can efficiently monitor network activities, find abnormal behaviors, and detect network threats to protect critical infrastructure systems.
If you are looking for some of the most influential research papers that revolutionised the way we gather, aggregate, analyze and store increasing volumes of data in a short span of 10 years, you are in the right place! These papers were shortlisted based on recommendations by big data enthusiasts and experts around the globe from various social media channels. In case we’ve missed any important paper, please let us know.
MapReduce: Simplified Data Processing on Large Clusters
This paper presents MapReduce, a programming model and its implementation for large-scale distributed clusters. The main idea is to have a general execution model for codes that need to process a large amount of data over hundreds of machines.
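The programming model can be sketched on a single machine as follows. This is a toy illustration of the map, group-by-key, and reduce steps using word counting; it is not Google's implementation, and the names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are assumptions made for this example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a mapper and reducer over records in a single process.

    A real MapReduce system distributes these phases over many machines
    and handles failures; this sketch only shows the data flow.
    """
    # Map phase: each record yields zero or more (key, value) pairs,
    # which are grouped by key (the "shuffle").
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)

    # Reduce phase: each key's values are combined into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the partial counts for one word."""
    return sum(counts)


counts = map_reduce(["big data", "big clusters"],
                    word_count_mapper, word_count_reducer)
print(counts)  # {'big': 2, 'data': 1, 'clusters': 1}
```

The user only writes the two small functions; the framework owns grouping, scheduling, and fault tolerance, which is the source of the model's generality.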
The Google File System
It presents Google File System, a scalable distributed file system for large distributed data-intensive applications, which provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.
Bigtable: A Distributed Storage System for Structured Data
This paper presents the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and the design and implementation of Bigtable.
Dynamo: Amazon’s Highly Available Key-value Store
This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience.
The Chubby lock service for loosely-coupled distributed systems
Chubby is a distributed lock service; it does a lot of the hard parts of building distributed systems and provides its users with a familiar interface (writing files, taking a lock, file permissions). The paper describes it, focusing on the API rather than the implementation details.
Chukwa: A large-scale monitoring system
This paper describes the design and initial implementation of Chukwa, a data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of Hadoop, an open source distributed filesystem and MapReduce implementation, and inherits Hadoop’s scalability and robustness.
Cassandra – A Decentralized Structured Storage System
Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of failure.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
There are two schools of thought regarding what technology to use for data analysis. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. This paper explores the feasibility of building a hybrid system.
S4: Distributed Stream Computing Platform.
This paper outlines the S4 architecture in detail, describes various applications, including real-life deployments, to show that the S4 design is surprisingly flexible and lends itself to run in large clusters built with commodity hardware.
Dremel: Interactive Analysis of Web-Scale Datasets
This paper describes the architecture and implementation of Dremel, a scalable, interactive ad-hoc query system for analysis of read-only nested data, and explains how it complements MapReduce-based computing.
Large-scale Incremental Processing Using Distributed Transactions and Notifications
This paper describes Percolator, a system for incrementally processing updates to a large data set, which Google deployed to create its web search index. This indexing system based on incremental processing replaced Google’s batch-based indexing system.
Pregel: A System for Large-Scale Graph Processing
This paper presents a computational model suitable for solving many practical computing problems that concern large graphs.
Spanner: Google’s Globally-Distributed Database
This paper describes Spanner, Google’s scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions.
Shark: Fast Data Analysis Using Coarse-grained Distributed Memory
Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data.
The PageRank Citation Ranking: Bringing Order to the Web
This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them.
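A minimal sketch of the idea, using power iteration on a small link graph, might look like the following. The damping factor of 0.85, the fixed iteration count, the treatment of pages without outgoing links, and the three-page graph are assumptions made for illustration rather than details taken from the description above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of rank.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages (one common
    convention, assumed here).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages without outgoing links.
        dangling = sum(rank[p] for p in pages if not links.get(p))

        # Every page receives a base share plus its share of dangling rank.
        new_rank = {page: (1 - damping) / n + damping * dangling / n
                    for page in pages}

        # Each page passes the rest of its rank to the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: b and c both link to a, a links to b.
ranks = pagerank({"a": ["b"], "b": ["a"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['a', 'b', 'c']
```

Page `a` ends up on top because it receives links from both other pages, while `c`, which nobody links to, keeps only the base share.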
A Few Useful Things to Know about Machine Learning
This paper summarizes twelve key lessons that machine learning researchers and practitioners have learned, which include pitfalls to avoid, important issues to focus on, and answers to common questions.
Random Forests
This paper describes a method of building a forest of uncorrelated trees using a CART-like procedure, combined with randomized node optimization and bagging. In addition, it combines several ingredients that form the basis of the modern practice of random forests.
A Relational Model of Data for Large Shared Data Banks
Written by E. F. Codd in 1970, this paper was a breakthrough in relational database systems. Codd was the first to conceive of the relational model for database management.
Map-Reduce for Machine Learning on Multicore
The paper focuses on developing a general and exact technique for parallel programming of a large class of machine learning algorithms for multicore processors. The central idea is to allow a future programmer or user to speed up machine learning applications by “throwing more cores” at the problem rather than search for specialized optimizations.
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
This paper describes Megastore, a storage system developed to blend the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way.
Finding a needle in Haystack: Facebook’s photo storage
This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data.
Spark: Cluster Computing with Working Sets
This paper focuses on applications that reuse a working set of data across multiple parallel operations and proposes a new framework called Spark that supports these applications while retaining the scalability and fault tolerance of MapReduce.
The Unified Logging Infrastructure for Data Analytics at Twitter
This paper presents Twitter’s production logging infrastructure and its evolution from application-specific logging to a unified “client events” log format, where messages are captured in common, well-formatted, flexible Thrift messages.
F1: A Distributed SQL Database That Scales
F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases.
MLbase: A Distributed Machine-learning System
This paper presents MLbase, a novel system harnessing the power of machine learning for both end-users and ML researchers.
Scalable Progressive Analytics on Big Data in the Cloud
This paper presents a new approach that gives more control to data scientists to carefully choose from a huge variety of sampling strategies in a domain-specific manner.
Big data: The next frontier for innovation, competition, and productivity
This paper is one of the most referenced documents in the world of Big Data. It describes current and potential applications of Big Data.
The Promise and Peril of Big Data
This paper summarizes the insights of the Eighteenth Annual Roundtable on Information Technology, which sought to understand the implications of the emergence of “Big Data” and new techniques of inferential analysis.
TDWI Checklist Report: Big Data Analytics
This paper provides six guidelines on implementing Big Data Analytics. It helps you take the first steps toward achieving a lasting competitive edge with analytics.