The term "data pipeline" refers to the set of processes that gather raw data and convert it into a format that software applications can use. Pipelines can run in real time or in batches, on premises or in the cloud, and can be built with open-source or commercial tools.

A data pipeline works much like a physical pipe that carries water from a river to your home: it moves data from one layer to the next, such as from a data lake into a warehouse, so the data can feed analytics and insights. In the past, moving data meant manual procedures such as daily file uploads and long waits for results. Data pipelines replace those manual steps, letting companies move data faster and with less risk.
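The extract-transform-load flow described above can be sketched in a few lines. This is a minimal illustration, not any particular product's API; the function names and sample records are invented for the example.

```python
# Minimal batch-pipeline sketch: extract raw records, transform them into an
# analytics-friendly shape, and load them into a target store. All names here
# (extract, transform, load) are illustrative, not a real library API.

def extract():
    # Stand-in for reading raw data from a source system (files, an API, a queue).
    return [{"user": "alice", "amount": "10.50"},
            {"user": "bob", "amount": "3.25"}]

def transform(records):
    # Convert raw string fields into typed values suitable for analysis.
    return [{"user": r["user"], "amount": float(r["amount"])} for r in records]

def load(records, target):
    # Stand-in for writing to a warehouse or data-lake layer.
    target.extend(records)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # typed records, ready for analytics
```

A real pipeline would add scheduling, error handling, and incremental loading, but the extract/transform/load shape stays the same.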

Accelerate development with a virtual data pipeline

A virtual data pipeline can significantly reduce infrastructure costs, including storage in the data center or in remote offices, along with network, hardware, and administration costs for test and other non-production environments. It also saves time by automating data refresh, masking, role-based access control, and database customization and integration.
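Data masking, one of the automated steps mentioned above, replaces sensitive values with realistic stand-ins before data reaches non-production environments. The sketch below shows one common approach, deterministic hashing, so that the same input always masks to the same output and joins across tables still work; the function name and format are assumptions for illustration, not a specific product's behavior.

```python
import hashlib

def mask_email(email: str) -> str:
    # Illustrative deterministic masking: hash the local part irreversibly,
    # keep the domain so the masked value still looks like a real address.
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()[:8]
    return f"user_{digest}@{domain}"

print(mask_email("alice@example.com"))
```

Because the masking is deterministic, the same customer masks to the same value everywhere, which keeps test datasets referentially consistent without exposing real identities.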

IBM InfoSphere Virtual Data Pipeline (VDP) is a multi-cloud copy data management solution that decouples test and development environments from production infrastructure. It uses patented snapshot and changed-block-tracking technology to capture application-consistent copies of databases and other files. Users can mount masked virtual copies of databases in non-production environments and begin testing in minutes, which is particularly useful for speeding up DevOps and agile practices and accelerating time to market.
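The idea behind changed-block tracking can be illustrated with a toy comparison of fixed-size blocks between two snapshots: only blocks whose contents differ need to be copied. Real products, including VDP, do this at the storage layer with far more sophistication; this sketch is conceptual only, and the block size and checksum choice are arbitrary assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # toy block size; real systems use much larger blocks

def checksums(data: bytes):
    # Checksum each fixed-size block of the data.
    return [hashlib.md5(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def changed_blocks(old: bytes, new: bytes):
    # Return indices of blocks that differ (or are new) since the old snapshot.
    old_sums = checksums(old)
    return [i for i, s in enumerate(checksums(new))
            if i >= len(old_sums) or s != old_sums[i]]

base = b"AAAABBBBCCCC"
updated = b"AAAAXXXXCCCC"
print(changed_blocks(base, updated))  # [1]: only the middle block changed
```

Copying only the changed blocks, rather than full database images, is what makes refreshing a virtual test copy fast enough to finish in minutes.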