
Python backup script -- please critique


Stratus_ss

Overclockix Snake Charming Senior, Alt OS Content
Joined: Jan 24, 2006
Location: South Dakota
So I am trying to get out of the procedural mindset and start working more with objects. I have written a small script which takes a MySQL backup, moves it to a remote mount, and then checks the size of the backup against the last backup on the mount point. When it is done it emails a short blurb with some info about the run.

I would love some feedback from the more experienced programmers out there, as I think peer review is the best way to get better. Before anyone asks, I have not read "Code Complete" yet; it is on my list of books to read in the next little while, so I realize this will be nowhere close to perfect. Nonetheless, constructive criticism is welcome!

Code:
#!/usr/bin/python
# Owner: stratus_ss
# Date Created: March 21, 2014
# Date Updated: Apr 1, 2014
# Primary Function: Backs up a database, then transfers it to a mount point.
# After the transfer is completed, it compares the file sizes to determine whether the upload
# was completed successfully

# Secondary Function: This script also compares against last month's backup
# to determine if the sizes are essentially the same. If so, this is an indication of a problem, as
# the database is expected to grow each month, even if only slightly

import sys
import os
import shutil
import datetime
import time
import smtplib
import socket


class initial_operations:

    #This class will set the variables that are generic to all processes
    def __init__(self):
        #This will parse the command line arguments and echo out the usage
        try:
            self.DATABASE_NAME = sys.argv[1]
            self.CRED_FILE = sys.argv[2]
            self.SEND_TO_ADDRESS = sys.argv[3]
            self.SEND_TO_NAME = self.SEND_TO_ADDRESS.split("@")[1]
            self.COPY_FILE = sys.argv[4]
            if "yes" in self.COPY_FILE:
                self.COPY_LOCATION = sys.argv[5]
        except IndexError:
            print("Missing one or more required arguments")
            print("USAGE: ./backup_mysql_database.py <database name> <credentials file> <email to report to> <copy_file=yes/no> <location>")
            sys.exit(1)
        self.START_TIME = str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d_%H:%M'))
        self.TODAYS_DATE = str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y%m%d'))
        self.BACKUP_DIR = "/DATA/backups/%s" % self.DATABASE_NAME
        os.chdir(self.BACKUP_DIR)
        #For most servers, the local backups should be kept. For the servers in the databases_to_clean
        #list, we only want to keep a single backup between intervals that this script runs
        self.TODAYS_BACKUP_FILE = "%s_%s.sql.gz" % (self.DATABASE_NAME, self.TODAYS_DATE)
        self.HOSTNAME = socket.gethostname()

    def cleanBackups(self, database_name):
        #This function exists to clean the database backup directory if required
        #At the time of creation only a single database required purging
        databases_to_clean = ["EVD_CARGO"]
        if database_name.upper() in databases_to_clean:
            for FILES in os.listdir("."):
                if os.path.isfile(FILES):
                    os.remove(FILES)


class email:

    def __init__(self, backup_instance, message_type):
        #This will setup the email messages for sending
        #It pulls the variables created earlier to fill in the details
        self.message = """From: MYSql Backups <backups@%s>
To: %s <%s>
Subject: Backups completed on %s

The backups are now complete.
""" % (create_variables.HOSTNAME, create_variables.SEND_TO_NAME, create_variables.SEND_TO_ADDRESS, create_variables.HOSTNAME)
        if "failed" in message_type:
            self.message += """However, the backup size from the previous month is within 500 bytes which may
indicate a problem.
Last Month's backup: %s   ---> %s
This Month's backup: %s   ---> %s""" % (backup_instance.LAST_MONTHS_BACKUP,
            backup_instance.LAST_MONTHS_BACKUP_SIZE, create_variables.TODAYS_BACKUP_FILE, backup_instance.TODAYS_BACKUP_SIZE)
        else:
            self.message += """The backup is not the same size as last month.
This Month's backup: %s   ---> %s""" % (create_variables.TODAYS_BACKUP_FILE, backup_instance.TODAYS_BACKUP_SIZE)
        self.send_email()

    def send_email(self):
        sender = create_variables.HOSTNAME
        receivers = [create_variables.SEND_TO_ADDRESS]
        try:
            smtp_mail = smtplib.SMTP('localhost')
            smtp_mail.sendmail(sender, receivers, self.message)
            print("email sent")
        except Exception as error:
            print("email failed to send")
            print(error)


class create_Backups:

    def databaseActivities(self):
        #This function does the mysql dump and gets the file size of the backup for later checking
        print(("Process started at: %s" % create_variables.START_TIME))
        os.popen("mysqldump --defaults-extra-file=%s -n -R %s |gzip -c > %s " % (create_variables.CRED_FILE,
         create_variables.DATABASE_NAME, create_variables.TODAYS_BACKUP_FILE))
        END_TIME = str(datetime.datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d_%H:%M'))
        print(("Process completed at: %s" % END_TIME))
        self.TODAYS_BACKUP_SIZE = os.path.getsize(create_variables.TODAYS_BACKUP_FILE)

    def fileCheck(self):
        #This section will move the file into the specified location
        #It checks the file size versus the previous backup, if the last backup
        #size is within 500 bytes, there is assumed to be a problem as for our usage
        #databases should always be either growing or contracting.
        if hasattr(create_variables, "COPY_LOCATION"):
            shutil.copy(create_variables.TODAYS_BACKUP_FILE, create_variables.COPY_LOCATION)
            #Find the file after it is moved to the remote location and get the file size
            self.MOUNT_FILE = "%s/%s" % (create_variables.COPY_LOCATION, create_variables.TODAYS_BACKUP_FILE)
            self.MOUNT_FILE_SIZE = os.path.getsize(self.MOUNT_FILE)
            if (self.MOUNT_FILE_SIZE - self.TODAYS_BACKUP_SIZE) != 0:
                print((self.MOUNT_FILE_SIZE, self.TODAYS_BACKUP_SIZE))
            else:
                print("The file sizes are the same")
                #Filenames look like this: database_20140122.sql.gz
                os.chdir(create_variables.COPY_LOCATION)
                latest_backup_date = 0
                self.LAST_MONTHS_BACKUP = ''
                for backup in os.listdir("."):
                    #cycle through the backups in the remote directory, exclude today's backup
                    #Track only the last backup for comparison
                    if create_variables.DATABASE_NAME in backup and not create_variables.TODAYS_BACKUP_FILE in backup:
                        backup_date = int(backup.split(create_variables.DATABASE_NAME + "_")[1].split(".")[0])
                        if backup_date > latest_backup_date:
                            latest_backup_date = backup_date
                            self.LAST_MONTHS_BACKUP = backup
                self.LAST_MONTHS_BACKUP_SIZE = os.path.getsize(self.LAST_MONTHS_BACKUP)
                if abs(self.LAST_MONTHS_BACKUP_SIZE - self.TODAYS_BACKUP_SIZE) < 500:
                    print("WARNING!!! This backup is within 500 bytes of last month's backup")
                    email(self, "failed")
                else:
                    print("File sizes are different, emailing 'everything is ok' message")
                    email(self, "passed")


create_variables = initial_operations()
create_variables.cleanBackups(create_variables.DATABASE_NAME)

run_backups = create_Backups()
run_backups.databaseActivities()
run_backups.fileCheck()
 
Instead of using argv[], you might look into something like argparse to make your command line syntax a little more friendly. I'm sure it'll live on cron and probably rarely be invoked manually, but having the ability to put switches and their values rather than ordered arguments is always a nice addition to any program requiring multiple parameters (i.e., program.py -d DATABASENAME -c CRED_NAME, etc.).
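Something like this, for instance (just a rough sketch -- the flag names here are made up, not pulled from your script):

Code:
# Hypothetical argparse version of the argument handling -- flag names are illustrative
import argparse

parser = argparse.ArgumentParser(description="Back up a MySQL database and copy it to a mount point")
parser.add_argument("-d", "--database", required=True, help="database name to back up")
parser.add_argument("-c", "--credentials", required=True, help="path to the MySQL credentials file")
parser.add_argument("-e", "--email", required=True, help="address to send the report to")
parser.add_argument("--copy", choices=["yes", "no"], default="no", help="copy the dump to a remote location")
parser.add_argument("--location", help="remote mount point (required when --copy yes)")
args = parser.parse_args()

if args.copy == "yes" and not args.location:
    parser.error("--location is required when --copy yes")

You also get --help output and basic argument validation for free that way.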
 
Thanks for the feedback. I actually did look at argparse, but for how the script was evolving I wasn't really sure it was the way I wanted to go.

Is the only real advantage when you invoke the script? I could not find a compelling use for argparse in this use case.
 
Yeah, that's probably the only real use case: keeping someone from switching up two arguments. There's no benefit other than making it goof-proof if someone other than you or a scheduler is running it.
 
I'm not seeing any code where you are benefiting from using object-oriented coding here. This is often the case with one-off scripts. You're writing them to do a very specific task that doesn't fit a generic pattern, so much of the code isn't really reusable.

Looking at the initial_operations class I see two functions that have nothing in common. By which I mean, the self object is not even referenced in cleanBackups(), and it's only used as a place to store variables in __init__(). You could move all of the code in __init__() to a module-level function (called parse_args() or something) and have it return a dict, or you could just put that code at the module level in the script, since it's run unconditionally every time the script runs anyway. The latter option is my preference.
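If you went the function route, something along these lines (a sketch only -- it mirrors your variable names, and the dict is just one way to carry the values around):

Code:
import sys

def parse_args(argv):
    #Module-level replacement for initial_operations.__init__(); returns a plain dict
    usage = ("USAGE: ./backup_mysql_database.py <database name> <credentials file> "
             "<email to report to> <copy_file=yes/no> <location>")
    if len(argv) < 5:
        print(usage)
        sys.exit(1)
    config = {
        "database_name": argv[1],
        "cred_file": argv[2],
        "send_to_address": argv[3],
        "copy_file": argv[4],
    }
    config["send_to_name"] = config["send_to_address"].split("@")[1]
    if "yes" in config["copy_file"]:
        if len(argv) < 6:
            print(usage)
            sys.exit(1)
        config["copy_location"] = argv[5]
    return config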

The create_Backups class similarly has two functions that really aren't related. They share one common piece of data, self.TODAYS_BACKUP_SIZE, which is easily discoverable (via os.path.getsize()) from create_variables.TODAYS_BACKUP_FILE, which you already use directly in both functions. More on create_variables later.

The email class could also be a single module-level function or, perhaps more cleanly, two functions: one that formats the body of the message and returns a string, and one that handles sending the email.
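For example (rough sketch, the function and argument names are made up):

Code:
import smtplib

def build_message(hostname, send_to_name, send_to_address, body):
    #Formats the headers plus the body and returns the whole message as a string
    return ("From: MYSql Backups <backups@%s>\n"
            "To: %s <%s>\n"
            "Subject: Backups completed on %s\n\n"
            "%s\n") % (hostname, send_to_name, send_to_address, hostname, body)

def send_email(sender, receiver, message):
    #Hands the already-formatted message to the local mail server
    try:
        smtp_mail = smtplib.SMTP('localhost')
        smtp_mail.sendmail(sender, [receiver], message)
        print("email sent")
    except Exception as error:
        print("email failed to send")
        print(error)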

The email and the create_Backups classes reach out to create_variables, which only exists as a module-level global in the script. Because Python looks up global names at call time, this happens to work when the script is run top to bottom, but the classes silently depend on that global being set up first, which makes them hard to reuse, import, or test on their own.
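One way to make that dependency visible is to hand the settings to the class when you construct it, e.g. (sketch only, keeping your names):

Code:
class create_Backups:

    def __init__(self, settings):
        #settings is the initial_operations instance (or a dict with the same values)
        self.settings = settings

    def databaseActivities(self):
        print("Process started at: %s" % self.settings.START_TIME)
        #...the rest of the method uses self.settings instead of the global create_variables

run_backups = create_Backups(create_variables)  #create_variables comes from the top-level script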

Aside from the overall structure, which is largely a matter of opinion and personal taste, a couple of little things:
- Use datetime.datetime.now() to get the current datetime, rather than datetime.datetime.fromtimestamp(...etc...). I usually have a line like START_TIME = datetime.datetime.now() at the start of my scripts and reference it every time I need to use the "current" time. This ensures that all files I make have the same timestamp.
- Consider using the whole datetime in your backup file name, instead of just the date. Sometimes, especially during troubleshooting, you might run the script multiple times in the same day. This might be more hassle than it's worth because you'd need to tweak the fileCheck() function to match.
- Use os.path.join() to concatenate paths instead of building them with string formatting, i.e. instead of self.MOUNT_FILE = "%s/%s" % ... (a quick sketch of all three points is below).
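Quick sketch of those three points, reusing your variable names (the database name and mount point are placeholders):

Code:
import os
import datetime

START_TIME = datetime.datetime.now()                          #one "current" time for the whole run
TODAYS_DATE = START_TIME.strftime('%Y%m%d_%H%M%S')            #full timestamp, so two runs in one day don't collide
TODAYS_BACKUP_FILE = "%s_%s.sql.gz" % ("some_database", TODAYS_DATE)  #"some_database" stands in for DATABASE_NAME

COPY_LOCATION = "/mnt/backups"                                #placeholder for the real mount point
MOUNT_FILE = os.path.join(COPY_LOCATION, TODAYS_BACKUP_FILE)  #instead of "%s/%s" % (...)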

Overall not bad, especially since you aren't a career programmer (I assume). Since you asked for critique, I critiqued it as I'd critique code from one of my team members, which is fairly harshly. For what you're doing, it will probably work fine, though it's worth making the dependency on create_variables explicit rather than relying on the global. You might also consider taking a timestamp at the top of the script and another once it's finished with the backup, and using those two to report the total execution time. That can turn out to be very useful to have in your logfiles, both to know how long the script normally takes to run and to see how it scales once the amount of data starts to increase.
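E.g. something like this at the top and bottom of the script (a sketch, building on the datetime.now() suggestion above):

Code:
import datetime

START_TIME = datetime.datetime.now()

#... mysqldump, copy, and size checks happen here ...

END_TIME = datetime.datetime.now()
elapsed = END_TIME - START_TIME
print("Backup finished in %s (started %s, ended %s)" % (elapsed, START_TIME, END_TIME))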
 