A bit of background
In one of the web apps that I developed with Flask as the back end, we needed to schedule a method. This particular method (1) had to run once a day, (2) was not critical, in the sense that the application would not break if it did not run, (3) invoked a couple of third-party APIs to fetch data, and (4) did a lot of complex calculations using pandas DataFrames.
I explored multiple options. The first was scheduling a Kubernetes cron job, as the solution itself is deployed on a cloud VM within a microk8s cluster. The second was to use Celery and RabbitMQ. The third was to use a third-party package like APScheduler or Flask-APScheduler. The last option was Linux’s plain old cron tab.
The cron tab was the one that I finally chose. If you are interested in how I implemented it and the supporting code, feel free to jump to the implementation section below. If you would like to know why I chose the cron tab and not the rest, please read on.
Pros and Cons of each of the options
Kubernetes Cron Jobs
This was the first option that I explored, because all the other components are deployed in multiple pods. K8s cron jobs were a natural choice as they fit the bill very well, and the configuration was pretty straightforward. The only blocker was that the job had to access a method deployed in another pod. It was not just a single method but a bunch of methods doing a lot of work. After going through multiple SO posts, my understanding was that this design is an anti-pattern. (Disclaimer: I’m not a devops specialist yet :))
Celery and RabbitMQ
This was another natural choice, as Celery is a task queue built to execute long-running jobs. I had used Celery and RabbitMQ in one of my earlier projects that used Django, so I was quite familiar with them. I dropped this option because I had little time to complete the task. Installing Celery and RabbitMQ in a microk8s environment, setting them up, and making them work just for a single job seemed to be overkill.
APScheduler
This one seemed to be quite popular for executing scheduled jobs, and for Flask there is Flask-APScheduler. On drilling down, the problem I expected to run into was that the scheduling has to be done within the Flask app, which means the control is not outside the application, and that is a big no-no. It may also create a problem when the application has to be scaled: since the Flask application is deployed in a pod, every additional replica would start its own scheduler and the job would run once per replica.
Cron Tab
Simple, neat, and very effective. No extra fanfare. Control stays outside of the microk8s environment, and we do not run into any of the problems mentioned above.
My 2 cents while deciding on an approach
Whether the chosen approach is an optimal one is something that I keep as my primary decision point. It should neither be overkill nor too primitive.
The time I have to implement the approach is my secondary decision point.
The learning curve of the approach plays an equally important role. If someone has never used task queues, then picking up Celery for such a scenario may be difficult.
When the application will scale, and to what extent, is another deciding factor. If the customer base will only grow by 2% - 5% over the next 'x' months, or if the application will happily survive in the current deployment mode for the next 'x' months, I take that into account when making a decision.
How configurable the solution is will also help in the long run.
Implementing the solution
Alright, let's get into action. Explaining anything with an example always helps. I'm a movie buff, so let me build a simple Flask application that gives me movie recommendations on a daily basis.
Some fundamentals of the application:
Will use postgresql as the database
Will use TMDB APIs to get popular movies
Retrieved popular movies will be stored in a table in the database
Code for the same will be available in my GitHub repo
Typically, the methods we write in a Flask app are tied to a route. A method run from crontab is not triggered by an HTTP request, so it works outside of the request context, and exposing a public route just for the scheduler is not something we want. Instead, the trick is to create a custom command that can be invoked from a crontab or from a normal command line terminal. How do we do that?
By using the app.cli.command() decorator.
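To make the idea concrete, here is a minimal sketch of a custom command, assuming a bare Flask app. The command body is only a placeholder, and the message it prints is made up for illustration. One detail worth knowing: when no name is passed to the decorator, recent Click versions derive the command name from the function name and replace underscores with dashes, so the sketch names the command explicitly.

```python
import click
from flask import Flask

app = Flask(__name__)


# Passing the name explicitly keeps the underscores. Without it, recent
# Click versions would register this as "get-movie-recommendations".
@app.cli.command("get_movie_recommendations")
def get_movie_recommendations():
    """Placeholder for the daily job; the real body would call TMDB and the DB."""
    click.echo("fetching movie recommendations")


if __name__ == "__main__":
    # Invoke the command in-process, much like running
    # `flask get_movie_recommendations` in a terminal would.
    result = app.test_cli_runner().invoke(args=["get_movie_recommendations"])
    print(result.output)
```

Because the command is registered on app.cli, Flask sets up the application context before the body runs, so anything that depends on the app configuration should still be reachable inside it.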
Navigate to the directory where you want the Flask application to reside. Create and activate a virtual environment (the venv module ships with Python 3, so nothing extra needs to be installed).
python3 -m venv hashnodevenv
source hashnodevenv/bin/activate
pip install Flask
Create a file app.py. Write a simple helloworld method as the root route. This method is just to ensure that things work fine in the first place.
from flask import Flask

app = Flask(__name__)

# Register the method as the root route
@app.route('/')
def hello_world():
    return 'Hello World!'
Test that the development environment is working. Switch to the terminal window. Navigate to the folder where the flask app is present, set the FLASK_APP variable and run the app.
export FLASK_APP=app.py
flask run
If everything is fine so far, you should see the text Hello World! when you navigate to http://127.0.0.1:5000/ in your browser.
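If you prefer to check this without opening a browser, the same route can be exercised with Flask's built-in test client. This is only a sketch that repeats the hello world app so it runs on its own; it does not start a server.

```python
from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello World!'


if __name__ == "__main__":
    # The test client sends a request to the app in-process, no server needed.
    client = app.test_client()
    response = client.get('/')
    print(response.status_code, response.get_data(as_text=True))
```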
Create a database (assuming postgresql is already installed)
CREATE DATABASE hashnode_movie_db;
Create a schema in the database
CREATE SCHEMA movie_schema;
Create required table
CREATE TABLE movie_schema.movierecommendation (
    id bigint NOT NULL,
    original_title text COLLATE pg_catalog."default",
    overview text COLLATE pg_catalog."default",
    vote_average double precision,
    vote_count integer
);
Write methods to fetch movie list using TMDB API. Note: Please check my github repo for complete code.
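# Name the command explicitly so that it can be invoked as `flask get_movie_recommendations`
@app.cli.command('get_movie_recommendations')
def get_movie_recommendations():
    # Get a DB connection
    conn = get_db_connection()
    # Open a cursor to perform database operations
    cur = conn.cursor()
    # Clear the table completely
    cur.execute('DELETE FROM movie_schema.movierecommendation;')
    conn.commit()
    # Fetching the popular movies from TMDB and inserting them follows here (see the GitHub repo)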
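The snippet above stops after clearing the table. The fetch-and-store part could look roughly like the sketch below, which is not the code from my repo. The helper names (to_rows, store_rows, INSERT_SQL) are hypothetical, the response field names are assumed to mirror the table columns under a results key, and the %s placeholders assume a psycopg2-style driver. The sample payload is hand-written, not real TMDB data.

```python
INSERT_SQL = (
    "INSERT INTO movie_schema.movierecommendation "
    "(id, original_title, overview, vote_average, vote_count) "
    "VALUES (%s, %s, %s, %s, %s);"
)


def to_rows(payload):
    """Turn a TMDB-style response dict into tuples matching the table columns."""
    rows = []
    for movie in payload.get("results", []):
        rows.append((
            movie["id"],
            movie.get("original_title"),
            movie.get("overview"),
            movie.get("vote_average"),
            movie.get("vote_count"),
        ))
    return rows


def store_rows(cur, rows):
    """Insert the rows with a parameterized query; `cur` is any DB-API cursor."""
    cur.executemany(INSERT_SQL, rows)


if __name__ == "__main__":
    # Hand-written sample shaped like the assumed response.
    sample = {
        "results": [
            {
                "id": 1,
                "original_title": "Example Movie",
                "overview": "A placeholder.",
                "vote_average": 7.5,
                "vote_count": 120,
            }
        ]
    }
    print(to_rows(sample))
```

Keeping the conversion in a plain function means it can be tried out without a database or network access, and the parameterized query avoids building SQL strings by hand.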
Set up a crontab entry that invokes the above command. On your Linux machine, open the crontab for editing:
crontab -e
Then add the following entry:
10 22 * * * cd <path where your code resides> && <virtual environment path>/bin/flask get_movie_recommendations >> movierecommendation.log 2>&1
That's it! The job now runs once every day at 10:10 PM.
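The five fields of the entry are minute, hour, day of month, month, and day of week, so 10 22 * * * means minute 10 of hour 22 on every day. As a small sanity check of that reading, here is a sketch that computes the next run time for such a daily entry. The function name next_daily_run is made up for this post, and it uses naive datetimes, assuming cron runs in the machine's local time.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour=22, minute=10):
    """Return the next time a daily `minute hour * * *` entry fires after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime(2024, 1, 1, 9, 0)))   # later the same day
    print(next_daily_run(datetime(2024, 1, 1, 23, 0)))  # rolls over to the next day
```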
This is a very simple and elegant way to schedule methods that reside in a flask app. This may not suit all scenarios, but when it does, it does very well.