Data lake

Figure: Example of a database that can be used by a data lake (in this case, structured data).

A data lake is a system or repository of data stored in its natural/raw format,[1] usually object blobs or files. A data lake is usually a single store of data, including raw copies of source system data, sensor data, social data, etc.,[2] as well as transformed data used for tasks such as reporting, visualization, advanced analytics, and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and binary data (images, audio, video).[3] A data lake can be established "on premises" (within an organization's data centers) or "in the cloud" (using cloud services from vendors such as Amazon Web Services, Microsoft Azure, Google Cloud Platform, or Oracle Cloud).
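The idea of keeping data in its raw format while grouping it by kind can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a reference design: a local directory stands in for an object store, and the extension table, the "raw" zone name, and the functions classify and ingest are all hypothetical names chosen for this example. Structured data exported from relational databases is left out, since it cannot be recognized reliably from a file extension alone.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from file extension to the data categories
# described above. Real lakes usually rely on metadata, not extensions.
CATEGORY_BY_SUFFIX = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".jpg": "binary",
    ".wav": "binary",
    ".mp4": "binary",
}


def classify(path: Path) -> str:
    """Return the assumed category for a file, based only on its extension."""
    return CATEGORY_BY_SUFFIX.get(path.suffix.lower(), "unclassified")


def ingest(source: Path, lake_root: Path) -> Path:
    """Copy a file byte-for-byte into <lake_root>/raw/<category>/.

    The content is not parsed or transformed, so the stored object
    stays in its original format.
    """
    target_dir = lake_root / "raw" / classify(source)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copyfile(source, target)
    return target
```

Calling ingest(Path("events.json"), Path("lake")) would, under these assumptions, place an unchanged copy at lake/raw/semi-structured/events.json. Transformed data for reporting or analytics would typically be written to a separate area, so that the raw copy remains available.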

  1. "The growing importance of big data quality". The Data Roundtable. 21 November 2016. Retrieved 1 June 2020.
  2. "What is a data lake?". aws.amazon.com. Retrieved 12 October 2020.
  3. Campbell, Chris. "Top Five Differences between Data Warehouses and Data Lakes". Blue-Granite.com. Archived from the original on 14 March 2016.

Developed by StudentB