This utility module is designed to help you perform data augmentation on your TFRecords dataset in the easiest and fastest way.
There are two ways to use the data augmentations.
a) Conventional Data Augmentation
b) Data Augmentation for Every Training Epoch
Which of the two you choose depends on your style / way of coding; more on that later.
The major utilities provided by the quick_ml library for this are described in detail below.
define_augmentations is used to declare all the augmentations you intend to apply. Begin by importing the define_augmentations function from quick_ml.

from quick_ml.augments import define_augmentations

Then define all the augmentations you intend to use with the following function call.

define_augmentations(flip_left_right = False, hue = None, contrast = None, brightness = None, random_crop = None, random_saturation = None, random_zoom = None, flip_up_down = False, random_rotation = None, random_shear = None, random_shift = None)
flip_left_right - randomly flip the image left or right (True or False).
hue - set a random hue value to be added to the image.
contrast - add a random contrast value to the image.
brightness - add a random brightness value to the image.
random_crop - randomly crop the image.
random_saturation - add random saturation to the image.
random_zoom - add a random zoom effect to the image.
flip_up_down - randomly flip the image up or down (True or False).
random_rotation - add a random rotation to the image.
random_shear - add a random shear to the image.
random_shift - add a random shift to the image.
Upon successful execution, returns the defined augmentations along with their assigned values.
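For intuition, the settings above can be thought of as a plain configuration mapping in which only the augmentations you actually switched on are recorded. The sketch below is hypothetical (quick_ml does not document its internal return format here); `collect_augmentations` is an illustrative helper, not part of the library.

```python
# Hypothetical sketch of how a helper like define_augmentations might
# record only the augmentations the caller actually enabled.
def collect_augmentations(**kwargs):
    # Keep parameters that were switched on (True) or given a value (not None).
    return {name: value for name, value in kwargs.items()
            if value is not None and value is not False}

settings = collect_augmentations(
    flip_left_right=True,   # randomly mirror images horizontally
    hue=0.1,                # random hue shift of up to 0.1
    contrast=None,          # disabled, so it is dropped
    random_rotation=15,     # rotate by up to 15 degrees
)
```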
Use this to define callback values during model training.
Note :- This is different from the standalone callbacks utility. For model training with augmentations, you need to use the callbacks feature of the augments module of quick_ml; for training without augmentations, you can use the callbacks feature of the quick_ml package.
To begin, you need to import define_callbacks.

from quick_ml.augments import define_callbacks

define_callbacks supports a learning-rate scheduler as a callback; additions will be made in upcoming versions. The function call is:

define_callbacks(lr_scheduler = None)
lr_scheduler - choose among 'rampup', 'stepped_lrfn', 'step_decay', 'simple_lrfn'. Default None.
Doesn't return anything. Defines the callback values.
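quick_ml's scheduler implementations are internal to the library, but a 'step_decay' schedule conventionally scales the learning rate down every few epochs. The sketch below uses assumed constants purely for illustration and is not quick_ml's exact schedule.

```python
# Illustrative step-decay schedule: the learning rate is multiplied by
# `drop` once every `epochs_per_drop` epochs.
def step_decay(epoch, initial_lr=1e-3, drop=0.5, epochs_per_drop=5):
    return initial_lr * (drop ** (epoch // epochs_per_drop))
```

A function of this shape is what you would wrap in tf.keras.callbacks.LearningRateScheduler if you were building the callback by hand.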
Use this for single-model training with augmentations.
To begin with, use the following line of code.

from quick_ml.augments import augment_and_train

After that, the following function call will begin model training.

augment_and_train(model, GCS_DS_PATH, train_tfrec_path, val_tfrec_path, batch_size, epochs, steps_per_epoch, plot = False)
model - the model object. The preferred way to create a model is through the create_model method of the quick_ml package.
GCS_DS_PATH - the GCS_DS_PATH of the labeled TFRecords dataset.
train_tfrec_path - path of the training tfrecords file(s).
val_tfrec_path - path of the validation tfrecords file(s).
batch_size - set the batch size. Preferred: batch_size * strategy.num_replicas_in_sync.
epochs - the number of epochs for which the model has to be trained.
steps_per_epoch - steps per epoch value. Preferred: NUM_IMAGES // BATCH_SIZE.
plot - True or False. Whether or not to plot the model training curve.
Doesn't return anything. Trains the model with freshly augmented images for every epoch.
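The preferred values mentioned above are simple arithmetic. As an illustration, with an assumed 8-replica TPU and a per-replica batch of 16 (in practice the replica count comes from strategy.num_replicas_in_sync and NUM_IMAGES from your dataset):

```python
# Assumed values for illustration only.
per_replica_batch = 16
num_replicas = 8            # e.g. a TPU v3-8 exposes 8 replicas
NUM_IMAGES = 12800

batch_size = per_replica_batch * num_replicas     # 16 * 8 = 128
steps_per_epoch = NUM_IMAGES // batch_size        # 12800 // 128 = 100
```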
Use this to get the models' training report. To begin, use the following line of code.

from quick_ml.augments import get_models_training_report

This helps you obtain a training report for each model, summarized in a table as described below.
Table Description :-
1) Model Name -> name of the model trained on the dataset
2) Top 1 Accuracy -> the last accuracy score on the training dataset
3) Top 3 Accuracy -> the average of the last 3 accuracy scores on the training dataset
4) Val Top 1 Accuracy -> the last accuracy score on the validation dataset
5) Val Top 3 Accuracy -> the average of the last 3 accuracy scores on the validation dataset
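In terms of a Keras-style history object, these columns are just the last value and the mean of the last three values of the accuracy curves. The numbers below are illustrative:

```python
# Illustrative accuracy curves, as found in history.history['accuracy']
# and history.history['val_accuracy'] after training.
acc = [0.70, 0.80, 0.85, 0.90]
val_acc = [0.65, 0.75, 0.80, 0.82]

top1 = acc[-1]                  # Top 1 Accuracy: last training accuracy
top3 = sum(acc[-3:]) / 3        # Top 3 Accuracy: mean of last 3 values
val_top1 = val_acc[-1]          # Val Top 1 Accuracy
val_top3 = sum(val_acc[-3:]) / 3
```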
To obtain the models' training report, call the get_models_training_report function with the appropriate argument values.

get_models_training_report(models, tpu, n_class, GCS_DS_PATH, train_tfrec_path, steps_per_epoch, epochs, batch_size, val_tfrec_path, classification_model = 'default', freeze = False, input_shape = [512,512,3], activation = 'softmax', weights = 'imagenet', optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy', plot = False)

Arguments Description ->
models - list of models to obtain the training report on. eg. models = ['VGG16', 'EfficientNetB7', 'InceptionV3', 'ResNet50']
tpu - The TPU instance
n_class - number of classes in the Dataset
GCS_DS_PATH - GCS_DS_PATH of the labelled TFRecords Dataset.
train_tfrec_path - the path for the training tfrecords file(s).
steps_per_epoch - the number of steps per epoch for training. Preferred: NUM_TRAINING_IMAGES // BATCH_SIZE.
epochs - the number of training epochs for model training.
val_tfrec_path - the path for the validation tfrecords file(s).
classification_model - the classification model you want to attach as the top of the pretrained model. The 'default' classification model has a GlobalAveragePooling2D layer followed by a Dense layer with as many output nodes as there are classes.
You can also define your own classification_model (a Sequential model) and pass it as the classification_model argument:

class_model = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(n_class, activation = 'softmax')
])

get_models_training_report(models, tpu, n_class, GCS_DS_PATH, train_tfrec_path, steps_per_epoch, epochs, batch_size, val_tfrec_path, classification_model = class_model)
freeze - Whether or not you want to freeze the pretrained model weights. Default, False.
input_shape - the input shape of the images in the dataset. Default, [512,512,3].
activation - The activation function for the final Dense layer of your Classification model. Default, 'softmax'. For binary classification, change to 'sigmoid' with n_class = 1.
weights - The pretrained Model weights to be taken for consideration. Default, 'imagenet'. Support for 'noisy-student' coming soon.
optimizer - the optimizer used while training the model. Default, 'adam'.
loss - the loss function to use while training. Two options supported: 'sparse_categorical_crossentropy' & 'binary_crossentropy'. Default, 'sparse_categorical_crossentropy'.
metrics - the metric to track while training. Two options available: 'accuracy' & 'sparse_categorical_accuracy'. Use 'accuracy' for binary classification, else 'sparse_categorical_accuracy'. Default, 'sparse_categorical_accuracy'.
plot - plot the training curves of all the models for quick visualization. Feature coming soon.
Returns a pandas DataFrame with the table output described above. You can save the function output in a variable and write the DataFrame to disk using the .to_csv() method.
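For example, assuming pandas is installed, saving the report looks like this. The report variable and its column values below are illustrative stand-ins for what get_models_training_report returns.

```python
import pandas as pd

# Illustrative report; get_models_training_report returns a similar DataFrame.
report = pd.DataFrame({
    'Model Name': ['VGG16', 'ResNet50'],
    'Top 1 Accuracy': [0.91, 0.93],
})

# Write the table to disk; index=False drops the row-number column.
report.to_csv('models_training_report.csv', index=False)
```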
Quick reference - the augments module API:

from quick_ml.augments import define_augmentations
define_augmentations(flip_left_right = False, hue = None, contrast = None, brightness = None, random_crop = None, random_saturation = None, random_zoom = None, flip_up_down = False, random_rotation = None, random_shear = None, random_shift = None)

from quick_ml.augments import define_callbacks
define_callbacks(lr_scheduler = None)

from quick_ml.augments import augment_and_train
augment_and_train(model, GCS_DS_PATH, train_tfrec_path, val_tfrec_path, batch_size, epochs, steps_per_epoch, plot = False)

from quick_ml.augments import get_models_training_report
get_models_training_report(models, tpu, n_class, GCS_DS_PATH, train_tfrec_path, steps_per_epoch, epochs, batch_size, val_tfrec_path, classification_model = 'default', freeze = False, input_shape = [512,512,3], activation = 'softmax', weights = 'imagenet', optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics = 'sparse_categorical_accuracy', plot = False)