September 14, 2017

Data Curation & Data Quality
8:00 AM - 12:00 PM
60 Washington Square South RM 406
New York, NY 00000
Dave Loshin, Data Curation & Data Quality

Bio: David Loshin, president of Knowledge Integrity, Inc., is a recognized thought leader and expert consultant in the areas of analytics, big data, data governance, data quality, master data management, and business intelligence. Along with consulting on numerous data management projects over the past 15 years, David is a prolific author on business intelligence best practices. His numerous books and papers on data management include the recently published “Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph,” the second edition of “Business Intelligence – The Savvy Manager’s Guide,” and other books and articles on data quality, master data management, big data, and data governance. David is a frequent invited speaker at conferences, web seminars, and sponsored web sites and channels, and he shares additional content in his notes and articles.

Abstract: Data Curation and Data Quality

Data scientists report that they spend 50% of their time preparing data for analysis, and that it is the least interesting part of their jobs (Strata 2016 Data Scientist Survey).




Data curation is recognized as a critical activity to improve data analytics efficiency. "Curation" is the process of assembling, organizing, and managing a collection of objects. By extension, data asset curation is the process of assembling, organizing, and managing a collection of data assets.

We should adjust this definition, however, to account for context and purpose within a community of data consumers, since data assets that are not shared (or positioned to be shared) would not be subjected to data curation. Therefore, a better definition (or description) of data asset curation would be the process of assembling, organizing, and managing a collection of data assets for the purpose of expanding data accessibility and sharing among a community of data consumers.

In this talk, we will explore ideas and methods that help achieve the objectives of data asset curation, including:

- Simplifying discoverability of existing data assets.
- Providing details of data asset structure.
- Providing details of data asset semantics.
- Capturing the provenance of any enhancements or modifications applied to a curated data asset.
- Providing a means of sharing information about the curated data assets.
- Establishing standards for transformation and integration of shared data domains.

Finally, we will discuss how the practices of data curation will facilitate improved data quality.