Amazon made a couple of announcements today at AWS re:Invent in Las Vegas that help move data management toward a future without the need for extract, transform, load, or ETL.
ETL is the bane of every data scientist and data team trying to get data into shape to put it to work. As AWS CEO Adam Selipsky explained, you may have data in a number of different places, such as application usage data in a database and customer reviews in a data lake. Putting them together has been a significant challenge up until now.
AWS introduced Aurora zero-ETL integration with Amazon Redshift to give customers using the Aurora database and the Redshift data warehouse the ability to move data without having to perform ETL on it.
“We’ve been working for a few years now and building integrations between our services to make it easier to do analytics and machine learning without having to deal with ETL,” Selipsky told the re:Invent audience.
“But what if we could do more? What if we could eliminate ETL entirely? That would be a world we would all love. This is our vision, what we’re calling a zero ETL future. And in this future, data integration is no longer a manual effort,” he said. “So today I’m excited to announce the preview of a fully managed new ETL-free integration between Aurora and Redshift.”
While he was at it, he announced a similar integration between Amazon Redshift and Apache Spark, the popular open source big data processing platform. It offers a comparable ability to move data between the two platforms without having to extract, transform and load it first.
The Redshift-Aurora integration is in preview. The Redshift-Apache Spark integration is available now across all regions.
We read at: techcrunch.com