Platform Migration: Data Centers to Cloud Architectures

Prajwal V. Atreyas, S. Yamuna, Pramod Khadse, and S.B. Prapulla

As organizations increasingly move towards cloud services and hybrid cloud technologies, the objective of this study is to develop a seamless data pipeline that performs data integration as part of a platform migration from data centers to a cloud architecture. The proposed methodology implements the integration jobs as Extract-Transform-Load (ETL) interfaces in Talend Open Studio, a data integration tool. First, the data is extracted from multiple sources, such as databases and flat files. Then, transformations such as filtering, sorting and joining are applied to the data. Finally, the transformed data is loaded into the staging tables of the Enterprise Data Warehouse. This is achieved by migrating the interfaces from the tool currently in use, IBM InfoSphere DataStage, and re-creating their functionality in Talend. A comparison of the features of the two tools, Talend and DataStage, identified the pros and cons of each. It was inferred that Talend is equivalent to DataStage in most cases, and that with enhancements and tweaks in Talend, the execution time of a few interfaces was reduced by half.

Keywords: Talend, DataStage, Cloud migration, Data integration, ETL.
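
The three ETL stages summarized in the abstract can be illustrated with a minimal sketch. It is not the Talend or DataStage implementation used in the study; it only mirrors the same flow in plain Python with the standard csv and sqlite3 modules. The table names (customers, stg_orders), the column names, and the sample records are all hypothetical and chosen only for illustration, and in-memory SQLite databases stand in for the source database and the Enterprise Data Warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source: order records as CSV text (illustrative data).
ORDERS_CSV = """order_id,customer_id,amount
1,10,250.0
2,11,-5.0
3,10,75.5
4,12,120.0
"""


def extract_flat_file(text):
    """Extract: read rows from a CSV flat file into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_database(conn):
    """Extract: read customer rows from a source database table."""
    rows = conn.execute("SELECT customer_id, name FROM customers").fetchall()
    return {customer_id: name for customer_id, name in rows}


def transform(orders, customers):
    """Transform: filter invalid rows, join with customers, sort by amount."""
    result = []
    for row in orders:
        amount = float(row["amount"])
        customer_id = int(row["customer_id"])
        if amount <= 0:                   # filtering: drop non-positive amounts
            continue
        if customer_id not in customers:  # joining: inner join on customer_id
            continue
        result.append((int(row["order_id"]), customers[customer_id], amount))
    # sorting: largest amount first
    return sorted(result, key=lambda r: r[2], reverse=True)


def load(conn, rows):
    """Load: write the transformed rows into a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders "
        "(order_id INTEGER, customer_name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract, transform and load against in-memory stand-in databases."""
    source = sqlite3.connect(":memory:")      # stands in for a source database
    source.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
    source.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(10, "Asha"), (11, "Ravi")]
    )
    warehouse = sqlite3.connect(":memory:")   # stands in for the warehouse

    orders = extract_flat_file(ORDERS_CSV)
    customers = extract_database(source)
    load(warehouse, transform(orders, customers))
    return warehouse.execute(
        "SELECT order_id, customer_name, amount FROM stg_orders ORDER BY rowid"
    ).fetchall()


if __name__ == "__main__":
    for staged_row in run_pipeline():
        print(staged_row)
```

In this sketch, the order with a negative amount is removed by the filter and the order whose customer is missing from the source table is removed by the join, so only two rows reach the staging table. In a graphical tool each of these functions would roughly correspond to one or more components wired together in a job.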


