
Automate deploying Apache Spark on Oracle Cloud Infrastructure

Johan Louwers


Apache Spark

When working with Apache Spark, you may often require rapid development environments. Solutions akin to Vagrant are excellent for installing it on your local machine within a virtual environment. Alternatively, you might want to swiftly deploy the same environment on Oracle Cloud to be closer to your data platform.

Apache Spark is a powerful open-source data processing framework designed for big data analytics and machine learning. It provides a fast and flexible way to handle large-scale data processing tasks, offering significant speed improvements over traditional data processing frameworks like Hadoop MapReduce. With its ability to process data in-memory and support for various programming languages, Spark enables organizations to efficiently analyze, manipulate, and derive insights from massive datasets, ultimately driving smarter business decisions and innovation.

Though managed Apache Spark solutions exist, there’s often a preference for deploying short-lived environments. Ideally, such a deployment should be highly automated. The following script can assist you in deploying Apache Spark on Oracle Linux 8 on OCI. Note that this script is tailored for machines running on OCI, as it leverages Ksplice and the backend provided as part of OCI.

Ksplice is a technology that allows organizations to apply critical security patches and updates to the Linux kernel without the need for system reboots. This means that businesses can maintain the security and stability of their Linux systems while minimizing downtime and disruptions to operations. By seamlessly integrating patches into the running kernel, Ksplice ensures that systems remain protected against vulnerabilities without interrupting ongoing processes or services. This enables businesses to stay agile and responsive to security threats while ensuring continuous availability of their critical infrastructure.

Bash scripting to the rescue

The ‘quick and dirty’ bash script below can be used as part of the automated deployment for your dev node when working with Apache Spark. It is not perfect; however, it will give you a quick master / worker node combination on an Oracle Linux 8 VM in Oracle Cloud.
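As a rough illustration of the steps such a script typically performs, here is a minimal sketch rather than a definitive implementation. The Spark version, Hadoop build profile, install directory, Java package name and the `DRY_RUN` switch are all assumptions made for this example. The Ksplice step assumes the Ksplice enhanced client is present on the image and is skipped otherwise. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: single-node Spark standalone (master + worker) on Oracle Linux 8.
# Versions, paths and package names are assumptions for illustration only.
set -eu

SPARK_VERSION="${SPARK_VERSION:-3.5.1}"      # assumed Spark release
HADOOP_PROFILE="${HADOOP_PROFILE:-hadoop3}"  # assumed prebuilt profile
INSTALL_DIR="${INSTALL_DIR:-/opt}"           # assumed install location
DRY_RUN="${DRY_RUN:-1}"                      # 1 = print commands, 0 = execute them
MASTER_HOST="${MASTER_HOST:-$(hostname -f 2>/dev/null || echo localhost)}"

# Name of the prebuilt Spark package and its download URL on the Apache archive.
spark_pkg() { echo "spark-${SPARK_VERSION}-bin-${HADOOP_PROFILE}"; }
spark_url() { echo "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/$(spark_pkg).tgz"; }

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

deploy() {
  SPARK_DIR="${INSTALL_DIR}/$(spark_pkg)"

  # Apply rebootless kernel updates if the Ksplice client is available (assumption).
  if command -v ksplice >/dev/null 2>&1; then
    run sudo ksplice -y all upgrade
  fi

  # Spark needs a Java runtime; curl and tar fetch and unpack the release.
  run sudo dnf install -y java-11-openjdk-headless curl tar
  run curl -fsSL -o "/tmp/$(spark_pkg).tgz" "$(spark_url)"
  run sudo tar -xzf "/tmp/$(spark_pkg).tgz" -C "$INSTALL_DIR"

  # Start the standalone master, then a worker that registers with it on port 7077.
  run sudo "${SPARK_DIR}/sbin/start-master.sh"
  run sudo "${SPARK_DIR}/sbin/start-worker.sh" "spark://${MASTER_HOST}:7077"
}

deploy
```

Saved under a hypothetical name such as `deploy-spark.sh`, running it as-is prints the planned commands, and running it with `DRY_RUN=0` executes them. Keeping the dry-run as the default makes it easy to review the steps before handing the script to cloud-init or another automation tool.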
