Unstructured data

Unsorted records captured from Nazi Germany at the U.S. National Archives Military Records Center in Alexandria, Virginia, 1956

Unstructured data (or unstructured information) is information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but it may also contain data such as dates, numbers, and facts. This results in irregularities and ambiguities that make it harder for traditional programs to interpret than data stored in fielded form in databases or annotated (semantically tagged) in documents.
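As a rough illustration (a hypothetical example, not drawn from the cited sources), the sketch below contrasts a structured record, whose values sit in named fields, with the same facts embedded in free text. Recovering the facts from the text requires heuristics; here two regular expressions for ISO dates and dollar amounts stand in for them. The field names, sample text, and the `extract_facts` helper are all assumptions made for illustration.

```python
import re

# Structured record: each value sits in a named field, so a program reads it directly.
structured = {"invoice_date": "2012-12-05", "amount_usd": 1250.00}

# The same facts buried in free text (hypothetical sample): there is no schema,
# so a program has to guess where the values are.
unstructured = "Invoice sent on 2012-12-05; the client owes $1,250.00, due in 30 days."

# Heuristic patterns (assumptions for illustration): ISO dates and dollar amounts.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
AMOUNT_RE = re.compile(r"\$(\d{1,3}(?:,\d{3})*(?:\.\d{2})?)")


def extract_facts(text):
    """Return the ISO dates and dollar amounts found in free text."""
    dates = DATE_RE.findall(text)
    amounts = [float(a.replace(",", "")) for a in AMOUNT_RE.findall(text)]
    return {"dates": dates, "amounts_usd": amounts}


facts = extract_facts(unstructured)
print(facts)  # {'dates': ['2012-12-05'], 'amounts_usd': [1250.0]}

# The same facts phrased differently slip past the heuristics entirely.
print(extract_facts("Invoice sent on 5 Dec 2012 for 1250 dollars."))
# {'dates': [], 'amounts_usd': []}
```

The second call shows the ambiguity problem in miniature: nothing in the text marks which tokens are a date or an amount, so any fixed set of patterns will miss equivalent phrasings that a fielded record would never need to handle.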

In 1998, Merrill Lynch said "unstructured data comprises the vast majority of data found in an organization, some estimates run as high as 80%."[1] The source of this figure is unclear, but it has nonetheless been accepted by some.[2] Other sources have reported similar or higher percentages of unstructured data.[3][4][5]

In 2012, IDC and EMC projected that data would grow to 40 zettabytes by 2020, a 50-fold increase from the beginning of 2010.[6] More recently, IDC and Seagate predicted that the global datasphere will grow to 163 zettabytes by 2025 and that the majority of it will be unstructured.[7] Computerworld magazine states that unstructured information might account for more than 70–80% of all data in organizations.[1]

  1. ^ Shilakes, Christopher C.; Tylman, Julie (16 November 1998). "Enterprise Information Portals" (PDF). Merrill Lynch. Archived from the original (PDF) on 24 July 2011.
  2. ^ Grimes, Seth (1 August 2008). "Unstructured Data and the 80 Percent Rule". Breakthrough Analysis - Bridgepoints. Clarabridge.
  3. ^ Gandomi, Amir; Haider, Murtaza (April 2015). "Beyond the hype: Big data concepts, methods, and analytics". International Journal of Information Management. 35 (2): 137–144. doi:10.1016/j.ijinfomgt.2014.10.007. ISSN 0268-4012.
  4. ^ "The biggest data challenges that you might not even know you have - Watson". Watson. 25 May 2016. Retrieved 2 October 2018.
  5. ^ "Structured vs. Unstructured Data". www.datamation.com. Retrieved 2 October 2018.
  6. ^ "EMC News Press Release: New Digital Universe Study Reveals Big Data Gap: Less Than 1% of World's Data is Analyzed; Less Than 20% is Protected". www.emc.com. EMC Corporation. December 2012.
  7. ^ "Trends | Seagate US". Seagate.com. Retrieved 1 October 2018.

Developed by StudentB