What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

 Here are three main points to consider when choosing between Dataproc and Dataflow:

  • Provisioning
    Dataproc - Manual provisioning. You create, size, and manage the clusters yourself.
    Dataflow - Serverless. Worker resources are provisioned and scaled automatically for each job.

  • Hadoop Dependencies
    Dataproc should be used if the processing has any dependencies on tools in the Hadoop ecosystem (for example Spark, Hive, or Pig).

  • Portability
    Dataflow/Beam provides a clear separation between processing logic and the underlying execution engine. This helps with portability across the execution engines that have a Beam runner, i.e. the same pipeline code can run on Dataflow, Spark, or Flink, usually with only a change of runner options.
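
 To make the portability point concrete, here is a minimal sketch using the Apache Beam Python SDK. The helper names runner_flags and run are hypothetical and chosen only for this illustration, as are the sample input and any project, region, or bucket values you pass in. The idea it tries to show is that the pipeline body stays the same and only the flags handed to PipelineOptions change per runner.

```python
def runner_flags(runner, project=None, region=None, temp_location=None):
    """Build Beam command-line flags for the chosen runner (hypothetical helper)."""
    flags = [f"--runner={runner}"]
    if runner == "DataflowRunner":
        # Dataflow needs to know which project and region to run in,
        # and a Cloud Storage location for temporary files.
        flags += [
            f"--project={project}",
            f"--region={region}",
            f"--temp_location={temp_location}",
        ]
    return flags


def run(flags):
    """Run a small word count; the pipeline code does not mention the runner."""
    # Imported here so runner_flags() can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    with beam.Pipeline(options=PipelineOptions(flags)) as p:
        (
            p
            | "Create" >> beam.Create(["to be or not to be"])
            | "Split" >> beam.FlatMap(lambda line: line.split())
            | "Count" >> beam.combiners.Count.PerElement()
            | "Print" >> beam.Map(print)
        )
```

 Calling run(runner_flags("DirectRunner")) executes the pipeline locally. Passing "DataflowRunner" together with your own project, region, and temp_location values would submit the same pipeline to Dataflow. Other runners such as FlinkRunner or SparkRunner generally need additional runner-specific flags that this sketch does not cover.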



