An evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that have the potential to be mined for information.
- Data that would take too much time and cost too much money to load into relational databases for analysis (typically petabytes or exabytes of data).
- Extensive datasets, collections, or linked data characterized primarily by large volume, wide variety, high velocity (of creation and use), and/or variability, which together require a scalable architecture for efficient data storage, manipulation, and analysis. In general, the size is beyond the ability of typical database software tools to capture, store, manage and analyze. It is assumed that as technology advances over time, the size of datasets that qualify as big data will increase. Also, the definition can vary by sector, depending on what kinds of software tools are commonly available and what dataset sizes are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes).
- McKinsey Global Institute – Big data: the next frontier for innovation, competition and productivity as quoted by the TC3+ in their October 2013 consultation document: Capitalizing on Big Data: Towards a Policy Framework for Advancing Digital Scholarship in Canada.