Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. It manages multi-stage pipelines made up of tasks that run on various schedules, using both cron-like time-based scheduling and event-driven execution.
With Airflow, an organization can design, implement, schedule, monitor, and maintain pipelines for batch processing, data staging, and other custom scheduling strategies.
Before going any further, we assume that you have a basic understanding of how the Linux shell works and how to send commands to it. You also need to know how to run terminal commands safely with root privileges.
In this article, we will show you how to install Apache Airflow on Ubuntu 20.04 (codename Focal Fossa). The installation instructions are applicable to any Linux distro based on Ubuntu or Debian, such as Linux Mint, elementary OS, or Pop!_OS.
The standard package manager for Python is pip. It lets you install and manage packages that are not part of the Python standard library. Because Airflow is distributed as a Python package, you need pip to install it. To install pip, run the following commands one after another in a terminal emulator:
sudo apt-get install software-properties-common
sudo apt-add-repository universe
sudo apt-get update
sudo apt-get install python3-setuptools
sudo apt install python3-pip
Once the installation is complete, verify it by checking the pip version:
pip3 --version
The version number may vary, but it will look something like this:
pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)
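If you want to script this check, the sketch below pulls the Python version out of a version line like the one above and tests that it is 3.x. The helper names pip_python_version and pip_uses_python3 are hypothetical, and the parsing assumes pip keeps printing the interpreter version in a trailing "(python X.Y)" group.

```shell
# Hypothetical helper: extract the Python version from a "pip3 --version" line,
# e.g. "pip 9.0.1 from /usr/lib/python3/dist-packages (python 3.6)" -> "3.6".
pip_python_version() {
  # Keep only the version inside the "(python X.Y)" group; print nothing otherwise.
  printf '%s\n' "$1" | sed -n 's/.*(python \([0-9][0-9.]*\)).*/\1/p'
}

# Hypothetical helper: succeed only if the given pip line refers to Python 3.
pip_uses_python3() {
  case "$(pip_python_version "$1")" in
    3.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires pip3 on PATH):
#   pip_uses_python3 "$(pip3 --version)" && echo "pip3 targets Python 3"
```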
Note that recent Debian and Ubuntu releases patch pip to use the “user scheme” by default: when run without root privileges, pip installs packages under ~/.local instead of system-wide. This may surprise users who are used to the old, global behavior.
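A practical consequence of the user scheme is that command-line tools installed by pip, including the airflow command itself, land in ~/.local/bin, which may not be on your PATH yet (for example, in a shell started before that directory existed). A minimal sketch, assuming a POSIX shell, that prepends the directory only when it is missing; add_user_bin_to_path is a hypothetical name:

```shell
# Hypothetical helper: make sure pip's user-scheme script directory is on PATH.
add_user_bin_to_path() {
  user_bin="$HOME/.local/bin"
  case ":$PATH:" in
    *":$user_bin:"*) ;;                     # already present: leave PATH unchanged
    *) PATH="$user_bin:$PATH"; export PATH ;;
  esac
}

add_user_bin_to_path
```

Calling it a second time leaves PATH unchanged. To make the change permanent, you could add the same lines to ~/.bashrc.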
Install Airflow and its dependencies on Ubuntu
Before installing Apache Airflow, make sure the necessary dependencies are installed. Airflow uses SQLite as its default database, but you can use something more scalable, such as PostgreSQL or MySQL, if you like.
However, if you’re just starting out and only want to learn the basics, you can stick with SQLite to keep things simple. In that case, you can skip this step.
To install the additional development packages (MySQL client, OpenSSL, and Kerberos headers), run the following commands:
sudo apt-get install libmysqlclient-dev
sudo apt-get install libssl-dev
sudo apt-get install libkrb5-dev
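If you do switch to MySQL or PostgreSQL, Airflow has to be told where the database lives through an SQLAlchemy connection URL. The sketch below assumes a MySQL database named airflow on localhost with placeholder credentials, and the mysql+mysqldb driver, which is provided by the mysqlclient package (installed, for example, with pip3 install 'apache-airflow[mysql]', and the reason libmysqlclient-dev is needed). The helper sqlalchemy_url is hypothetical. The environment variable is AIRFLOW__DATABASE__SQL_ALCHEMY_CONN on Airflow 2.3 and later, and AIRFLOW__CORE__SQL_ALCHEMY_CONN on older releases.

```shell
# Hypothetical helper: build an SQLAlchemy URL for Airflow's metadata database.
# Arguments: driver user password host port database
# Note: passwords containing characters such as @ or / must be URL-encoded.
sqlalchemy_url() {
  printf '%s://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Placeholder credentials (assumption): replace them with your own.
# Airflow 2.3+ reads this variable; older releases use AIRFLOW__CORE__SQL_ALCHEMY_CONN.
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="$(sqlalchemy_url mysql+mysqldb airflow_user airflow_pass localhost 3306 airflow)"
export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN
```

Export the variable before initializing the database, so that Airflow creates its tables there rather than in SQLite. For PostgreSQL, the driver part would be postgresql+psycopg2 and the default port 5432.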
Once all of Airflow’s dependencies are on your system, you can install the software itself.
First, Airflow needs a home directory where it stores its configuration file (airflow.cfg), its logs and, with the default setup, its SQLite database. The location is controlled by the AIRFLOW_HOME environment variable and defaults to ~/airflow.
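A minimal sketch of setting the home directory in a POSIX shell: it keeps an existing AIRFLOW_HOME if one is already set, otherwise uses ~/airflow, and creates the directory. The export only lasts for the current shell session, so the commented line shows one way to persist it, assuming you use bash.

```shell
# Use Airflow's default location unless AIRFLOW_HOME is already set.
export AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"

# Create the directory if it does not exist yet.
mkdir -p "$AIRFLOW_HOME"

# Optional: persist the setting for future bash sessions.
# echo 'export AIRFLOW_HOME=~/airflow' >> ~/.bashrc
```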
After that, install Apache Airflow using the command below:
pip3 install apache-airflow
pip3 install typing_extensions
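The plain pip3 install above installs whatever release pip resolves, together with the newest versions of Airflow’s many dependencies, and those can occasionally conflict. The Airflow project publishes constraint files for reproducible installs. The sketch below builds the constraint URL for a given Airflow version and the local Python version. The version 2.0.2 is only an example and airflow_constraint_url is a hypothetical helper; check the Airflow installation documentation for the release and Python versions you need.

```shell
# Hypothetical helper: print the URL of Airflow's published constraints file
# for an Airflow version ($1) and a Python "major.minor" version ($2).
airflow_constraint_url() {
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}

# Example value only (assumption): pick a release that supports your Python.
AIRFLOW_VERSION=2.0.2
# "Python 3.8.10" -> "3.8" (Ubuntu 20.04 ships Python 3.8)
PYTHON_VERSION="$(python3 --version | cut -d ' ' -f 2 | cut -d '.' -f 1-2)"
CONSTRAINT_URL="$(airflow_constraint_url "$AIRFLOW_VERSION" "$PYTHON_VERSION")"

# Install a pinned Airflow against the matching constraints:
#   pip3 install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```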
Once the two commands above complete, Airflow is installed on your system. However, you cannot access its web interface yet because the Airflow services are not running. To start Airflow, initialize its database, then start the web server and the scheduler with the following commands.
# initialize the metadata database (Airflow 1.10.x used "airflow initdb")
airflow db init
# start the web server, default port is 8080
airflow webserver -p 8080
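On Airflow 2.x the web interface requires a login, so you need at least one account. You can create it from another terminal once the database has been initialized. The wrapper below is a hypothetical convenience around the airflow users create command; the first and last names are placeholders. Because no --password flag is passed, the command prompts for a password.

```shell
# Hypothetical wrapper: create an Admin account for the Airflow 2.x web UI.
# Arguments: username email
create_airflow_admin() {
  # Without --password, "airflow users create" prompts for the password interactively.
  airflow users create \
    --username "$1" \
    --email "$2" \
    --firstname Admin \
    --lastname User \
    --role Admin
}

# Usage, after the database has been initialized:
#   create_airflow_admin admin admin@example.com
```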
Now open up another terminal window and run the following command to start the Airflow scheduler, as it must run in a separate process.
# start the scheduler; it must run in a separate terminal window
airflow scheduler
And that’s it: Apache Airflow is now running! To verify, open a web browser and go to http://localhost:8080. Once logged in, you should see the Airflow web interface.
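If you would rather check from the terminal, Airflow 2.x web servers expose a /health endpoint that reports the status of the metadata database and the scheduler. The sketch below, assuming curl is installed, polls a URL until it answers successfully or a number of attempts runs out; wait_for_http is a hypothetical helper.

```shell
# Hypothetical helper: poll an HTTP endpoint until it answers or attempts run out.
# Arguments: url max_attempts
wait_for_http() {
  url=$1
  attempts=$2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failures; -s: silent; --max-time: cap each try
    if curl -fs --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage: wait for the web server, then print its health report
#   wait_for_http http://localhost:8080/health 30 && curl -s http://localhost:8080/health
```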
We hope that the information above is useful to you.