Download “Data Preparation in the Big Data Era,” a new free report to help you manage the challenges of data cleaning and preparation.
Data is growing at an exponential rate worldwide, bringing huge business opportunities and challenges to every industry. In 2016, global Internet traffic will reach 90 exabytes per month, according to a recent Cisco report. The ability to manage and analyze this unprecedented volume of data will be the key to success.
To realize the benefits of a big data strategy, a company must answer a key question: how can all of that data be translated into useful knowledge? To meet this challenge, a company first needs a clear picture of its strategic knowledge assets, such as its areas of expertise, core competencies, and intellectual property.
A clear picture of the business model and of the company's relationships with distributors, suppliers, and customers is equally important when designing a tactical and strategic decision-making process. The true potential value of big data is realized only when it is placed in a business context, where data analysis drives better decisions; otherwise, it's just data.
In the new O’Reilly report Data Preparation in the Big Data Era, we provide a step-by-step guide to managing the challenges of data cleaning and preparation, which are critical steps before effective data analysis can begin. We explore the common problems of data preparation and the different steps involved, including data cleaning, combination, and transformation. You’ll also learn about new products that deal with the problem of data variety at scale, including Tamr’s solution, which curates data using a combination of machine learning and expert feedback.
This free report begins by discussing the importance of identifying and exploring your business question. We define various terms used in the data preparation phase, such as raw data, technically correct data, consistent data, tidy data, and aggregated or compressed data. The report then explores how to select and correct errors in raw data using normalization or string approximation techniques, and discusses the value of having a data strategy.
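As a rough illustration only (not the report's method or Tamr's product), the sketch below shows one common way to combine normalization with approximate string matching in Python, using the standard-library `unicodedata` and `difflib` modules. The canonical name list, the `normalize` and `correct` helpers, and the 0.8 similarity cutoff are hypothetical choices made for this example.

```python
import difflib
import unicodedata

# Hypothetical canonical supplier names (illustrative data, not from the report).
CANONICAL = ["Acme Corporation", "Globex Inc", "Initech LLC"]


def normalize(value: str) -> str:
    """Strip accents, collapse runs of whitespace, and lowercase a raw string."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split()).lower()


def correct(value: str, canonical=CANONICAL, cutoff: float = 0.8):
    """Map a raw value to the most similar canonical entry.

    Similarity is difflib's SequenceMatcher ratio on normalized strings;
    the 0.8 cutoff is an assumed threshold. Returns None when nothing is
    close enough, so the value can be flagged for manual review.
    """
    lookup = {normalize(c): c for c in canonical}
    matches = difflib.get_close_matches(
        normalize(value), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for raw in ["  ACME  Corporaton ", "Globex, Inc.", "Umbrella Corp"]:
        print(repr(raw), "->", correct(raw))
```

Here a misspelled, oddly spaced entry such as `"  ACME  Corporaton "` maps to `"Acme Corporation"`, while a value with no sufficiently similar canonical entry returns `None`. Raising the cutoff trades fewer false matches for more values left uncorrected.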
This post is a collaboration between O’Reilly and Tamr. See our statement of editorial independence.