Have you ever needed to run just one function from a Python file? For example, you might have functions for different tasks like uploading a machine learning model or running tests. If you want to run these functions individually — let's say only the end-to-end test — and are unsure about the best approach, this article might be useful for you 🤗

Let me start directly with an example to clarify what I mean:

Let's say we have the following main.py file:

# main.py
def compile_pipeline():
    """
    Function to compile a data processing pipeline.
    """
    # Your code to compile the pipeline goes here
    print("Compiling the data processing pipeline...")


def start_e2e_test():
    """
    Function to start an end-to-end test.
    """
    # Your code to start end-to-end test goes here
    print("Starting end-to-end test...")


def schedule_ml_pipeline():
    """
    Function to schedule a machine learning pipeline.
    """
    # Your code to schedule the machine learning pipeline goes here
    print("Scheduling the machine learning pipeline...")

We have three different functions that we want to use in GitHub Actions as part of a CI/CD workflow. How can we run each function separately from the workflow.yaml file?

In Python, we usually add a main guard at the bottom of the file to run a specific function from the terminal:

def compile_pipeline():
    """
    Function to compile a data processing pipeline.
    """
    # Your code to compile the pipeline goes here
    print("Compiling the data processing pipeline...")


def start_e2e_test():
    """
    Function to start an end-to-end test.
    """
    # Your code to start end-to-end test goes here
    print("Starting end-to-end test...")


def schedule_ml_pipeline():
    """
    Function to schedule a machine learning pipeline.
    """
    # Your code to schedule the machine learning pipeline goes here
    print("Scheduling the machine learning pipeline...")


if __name__ == "__main__":
    compile_pipeline()

We can then run this function from the terminal with python main.py

Now, what if we want to run the other functions, maybe not from the terminal but from a workflow .yaml file? Do we need to put each function in its own file and run each file separately?

This seems tedious, right?

Well, we could think of the following:

import argparse

def compile_pipeline():
    """
    Function to compile a data processing pipeline.
    """
    # Your code to compile the pipeline goes here
    print("Compiling the data processing pipeline...")


def start_e2e_test():
    """
    Function to start an end-to-end test.
    """
    # Your code to start end-to-end test goes here
    print("Starting end-to-end test...")


def schedule_ml_pipeline():
    """
    Function to schedule a machine learning pipeline.
    """
    # Your code to schedule the machine learning pipeline goes here
    print("Scheduling the machine learning pipeline...")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Run specific functions.')
    parser.add_argument('function', choices=['compile_pipeline', 'start_e2e_test', 'schedule_ml_pipeline'], help='Function to run')
    args = parser.parse_args()

    if args.function == 'compile_pipeline':
        compile_pipeline()
    elif args.function == 'start_e2e_test':
        start_e2e_test()
    elif args.function == 'schedule_ml_pipeline':
        schedule_ml_pipeline()

Now we can run the different functions from the command line by passing the function name as an argument. However, this is still inconvenient: every time we add a new function, we have to add its name to the choices list and add another elif branch, which makes the code hard to maintain.

Solution

Luckily, Python has a cool trick up its sleeve: the built-in globals() function 😍

globals() is a built-in Python function that returns a dictionary representing the global symbol table (or namespace) of the current module. It gives access to every name defined at module level, which includes functions as well as variables.

So how can we use globals() to easily call the different functions inside our file?
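To make this concrete, here is a small sketch of what the dictionary contains. The functions `greet` and `farewell` are made up for illustration and are not part of main.py.

```python
def greet():
    return "hello"


def farewell():
    return "bye"


# globals() maps every module-level name to the object it refers to.
namespace = globals()

# Functions are stored under their own names, so we can look one up
# by its string name and then call the result.
looked_up = namespace["greet"]
print(looked_up is greet)  # True: the dictionary holds the same function object
print(looked_up())         # hello
```

The key point is that a string such as "greet" is enough to get hold of the function object, which is what makes calling functions by name from the command line possible.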

We can replace the entire if/elif chain from before with one simple line:

import sys

def compile_pipeline():
    """
    Function to compile a data processing pipeline.
    """
    # Your code to compile the pipeline goes here
    print("Compiling the data processing pipeline...")

def start_e2e_test():
    """
    Function to start an end-to-end test.
    """
    # Your code to start end-to-end test goes here
    print("Starting end-to-end test...")

def schedule_ml_pipeline():
    """
    Function to schedule a machine learning pipeline.
    """
    # Your code to schedule the machine learning pipeline goes here
    print("Scheduling the machine learning pipeline...")

if __name__ == "__main__":
    function_name = sys.argv[1]
    globals()[function_name]()

Now we can run whatever function we like using the following 😍

python main.py <function_name>, e.g. python main.py start_e2e_test

It is as easy as this!! Yes, I was as surprised as you are. Now you might ask:

What if our function takes arguments?

It took me some time to figure this one out, but here it is. We need to adjust the code as follows:

import sys

def compile_pipeline(arg1):
    """
    Function to compile a data processing pipeline.
    """
    # Your code to compile the pipeline goes here
    print("Compiling the data processing pipeline...")
    print(f"Found parameter {arg1}")

def start_e2e_test():
    """
    Function to start an end-to-end test.
    """
    # Your code to start end-to-end test goes here
    print("Starting end-to-end test...")

def schedule_ml_pipeline():
    """
    Function to schedule a machine learning pipeline.
    """
    # Your code to schedule the machine learning pipeline goes here
    print("Scheduling the machine learning pipeline...")

if __name__ == "__main__":

    # Check that at least one argument (the function name) was provided
    if len(sys.argv) > 1:
        # Similar to before
        function_name = sys.argv[1]
        if function_name in globals() and callable(globals()[function_name]):
            # Call the function with additional arguments if provided
            if len(sys.argv) > 2:
                function_args = sys.argv[2:]
                globals()[function_name](*function_args)
            else:
                globals()[function_name]()
        else:
            print("Function not found or not callable:", function_name)
    else:
        print("No function name provided.")

The extended code first checks whether any arguments were given and treats the first one as the name of the function to call. It then verifies that this name exists in the global namespace and is callable. Any additional arguments are passed on to the function.
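One detail worth keeping in mind: everything in sys.argv is a string, so a function that expects a number has to convert the value itself. Below is a minimal sketch of the same lookup wrapped in a helper, under that assumption. The names `run_from_args` and `retry_upload` are hypothetical and introduced only for illustration.

```python
def retry_upload(attempts):
    # Command-line values arrive as strings, so convert before using them as numbers.
    attempts = int(attempts)
    return f"Retrying upload {attempts} time(s)"


def run_from_args(argv):
    """Look up argv[1] in the module namespace and call it with the remaining items."""
    if len(argv) < 2:
        raise SystemExit("No function name provided.")
    function_name, *function_args = argv[1:]
    candidate = globals().get(function_name)
    if not callable(candidate):
        raise SystemExit(f"Function not found or not callable: {function_name}")
    return candidate(*function_args)


# Simulates: python main.py retry_upload 3
print(run_from_args(["main.py", "retry_upload", "3"]))  # Retrying upload 3 time(s)
```

Taking argv as a parameter instead of reading sys.argv directly is just a design choice that makes the helper easy to try out without a terminal. In a real script you would call run_from_args(sys.argv) inside the main guard.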

We can now run the code using the following:

python main.py compile_pipeline pipeline_parameter

CI/CD Integration

We can easily integrate these steps into our workflow as follows:

# workflow.yaml file inside your .github/workflows folder for triggering GitHub Actions
# step in your CI/CD workflow
        - name: Compile Pipeline
          working-directory: ./pipelines
          run: python main.py compile_pipeline pipeline_parameter

This also works if you are using Poetry as a dependency manager, or from inside a bash script or a Makefile.

That's it!!

I hope you enjoy this Python trick as much as I do and that it helps you shorten the code needed for starting different automation steps ❤️

