What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?
Here are three main points to consider when choosing between Dataproc and Dataflow:
Provisioning
Dataproc - Manual provisioning of clusters
Dataflow - Serverless. Automatic provisioning of clusters
Hadoop Dependencies
Dataproc should be used if the processing has any dependencies on tools in the Hadoop ecosystem.
Portability
Dataflow/Beam provides a clear separation between processing logic and the underlying execution engine. This makes pipelines portable across the execution engines (runners) that support the Beam model: the same pipeline code can run on Dataflow, Spark, or Flink, typically by changing only the runner configuration rather than the pipeline itself.
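To make that separation concrete, here is a toy sketch in plain Python. It is not Beam's API: `WordCountPipeline`, `SequentialRunner`, `ThreadPoolRunner` and `tokenize` are hypothetical names made up for illustration. The processing logic (a word count) is defined once, and two interchangeable "engines" execute it in different ways. In real Beam, you pick the engine through pipeline options such as `--runner=DataflowRunner`, `--runner=FlinkRunner` or `--runner=SparkRunner` instead of swapping classes by hand.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class WordCountPipeline:
    """Hypothetical pipeline: holds processing logic only, nothing about execution."""
    flat_map: Callable[[str], List[str]]

    def combine(self, parts: Iterable[List[str]]) -> Dict[str, int]:
        # Merge the per-element outputs into word counts.
        counts: Counter = Counter()
        for words in parts:
            counts.update(words)
        return dict(counts)


class SequentialRunner:
    """Hypothetical engine #1: processes elements one by one in this process."""

    def run(self, pipeline: WordCountPipeline, data: Iterable[str]) -> Dict[str, int]:
        return pipeline.combine(pipeline.flat_map(x) for x in data)


class ThreadPoolRunner:
    """Hypothetical engine #2: fans elements out to a pool of worker threads."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, pipeline: WordCountPipeline, data: Iterable[str]) -> Dict[str, int]:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # combine() consumes the mapped results before the pool shuts down.
            return pipeline.combine(pool.map(pipeline.flat_map, data))


def tokenize(line: str) -> List[str]:
    # The actual "business logic": lowercase and split on whitespace.
    return line.lower().split()


if __name__ == "__main__":
    lines = ["Dataflow runs Beam", "Flink runs Beam", "Spark runs Beam"]
    pipeline = WordCountPipeline(flat_map=tokenize)
    for runner in (SequentialRunner(), ThreadPoolRunner(workers=2)):
        # Same pipeline object, different execution engine.
        print(type(runner).__name__, runner.run(pipeline, lines))
```

Both runners produce the same counts because the pipeline never depends on how it is executed; that is the property Beam relies on to move a pipeline between Dataflow, Spark and Flink.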