Over the past few years, and particularly during the widespread fires last year, the term fire season has taken hold. This reality, brought by climate change, is now part of our yearly cycle here in California. We're entering this year's fire season now. I feel apprehensive--will it be as bad as last year? Worse? It very well could be. So, is there anything machine learning could be used for to help us prepare?
What I'll do in this series of posts is walk through the creation of a home wildfire risk evaluation tool. One of the things that we can do to prepare for fire season is to prepare our homes, to reduce the risk to them if a fire comes through. This has broader benefits than just protecting our home--as one of the priorities of firefighters is protecting property, it frees up their time and resources to be spent elsewhere.
Many fire departments offer home inspections to educate people on what they can do to prepare their home and create defensible space. However, their resources are limited and they have not always been able to meet their inspection goals. An educational app to evaluate home wildfire risk, while not able to fully replicate the benefits of an in-person visit, could help fill this gap.
The first step in building this app will be training a program to distinguish safe vs. unsafe states of various home features. For example, it should be able to distinguish:
- decks with flammable debris on them vs. clean decks
- gutters clogged with plant material vs. clean gutters
- houses with combustible materials close to the walls vs. houses with cleared surroundings
It should also be able to detect things like:
- Large piles of combustible materials
- Certain kinds of attic / soffit vents, which should be covered with wire mesh
In this post, we'll use just one of these as an example: distinguishing between decks with debris on them and decks without. This is a simple image classification problem, one which neural nets are well suited for. Here, we'll use a pretrained image recognition neural net and fine-tune it for this specific task. In this way, we'll take advantage of the pretrained model's existing ability to detect things like edges, gradients, textures, etc., and have it learn to apply these to the new task of recognizing debris on decks. This is called transfer learning.
To do this, we'll be using PyTorch and the fastai deep learning library, and following the general approach laid out in the fastai course/book.
```python
import fastbook
from fastbook import *
from fastai.vision.widgets import *
```
Having defined the task, the next step is to identify the dataset that will be used to train the model. In this case, I was not able to find an existing dataset suited to this purpose, so I went about building one. To do so, I simply searched using google and bing image search for pictures of decks with flammable debris like leaves and pine needles on them. This turned out to be time consuming, as these images were sparse and I was unable to find a search term that returned only suitable images, so I had to comb through manually. The following was carried out with 35 data points for debris on decks. On the other hand, collecting images of clean decks was easily done by downloading the first 100 images for the search 'wood deck'.
The image files are sorted in the file structure by house feature type and risk state--in this case, /images/deck/debris/ and /images/deck/reference/. Here we create a list of categories and a Path object for their shared parent directory, for easy reference.
```python
deck_types = 'reference', 'debris'
path = Path('images/deck')
```
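As a rough illustration of how a label can be read from this folder layout, here's a minimal pure-Python sketch mirroring what fastai's parent_label does (the file names below are hypothetical, not from the actual dataset):

```python
from pathlib import Path

def label_from_parent(p):
    """Return the image's category from its parent folder name,
    the same idea as fastai's parent_label."""
    return Path(p).parent.name

# Hypothetical file paths following the layout above
print(label_from_parent('images/deck/debris/leaves_01.jpg'))    # debris
print(label_from_parent('images/deck/reference/deck_01.jpg'))   # reference
```

Because the label lives in the directory structure, no separate annotation file is needed.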
We can access the image file names using the fastai method get_image_files:
```python
fns = get_image_files(path)
fns
```
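In spirit, get_image_files just walks the directory tree and keeps files with image extensions. Here's a simplified stand-in (not the fastai implementation), run against a throwaway directory tree built to mimic the layout above:

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.gif'}

def find_images(root):
    """Recursively collect image files under root -- a simplified
    stand-in for fastai's get_image_files."""
    return sorted(p for p in Path(root).rglob('*')
                  if p.suffix.lower() in IMAGE_EXTS)

# Build a throwaway directory tree mimicking images/deck/
root = Path(tempfile.mkdtemp())
for sub, name in [('debris', 'leaves.jpg'), ('reference', 'deck.png')]:
    (root / sub).mkdir()
    (root / sub / name).touch()
(root / 'readme.txt').touch()   # non-image file, should be skipped

found = [p.name for p in find_images(root)]
print(found)   # ['leaves.jpg', 'deck.png']
```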
Fastai provides an abstraction called a DataBlock to prepare and organize data for training. The DataBlock knows:
- how to access the data
- the type of data and predictions the model should expect
- how to read the features and labels from the data, in other words, x and y
- how to split the data into a training set and validation set
- any data manipulation (transforms) that should be performed in between loading the data and training the model
```python
decks = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=45),
    get_y=parent_label,
    item_tfms=Resize(128))
```
In this case, we're specifying that:
- the data will be images, and the predictions will be categories
- we'll get the data with the get_image_files method shown before
- we'll split the data into train and validation sets randomly
- we'll get y, the label, from the file's parent folder name
- we'll resize all the images to dimensions of 128 x 128
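The train/validation split can be pictured with a plain-Python sketch of what RandomSplitter does: shuffle the item indices with a fixed seed, then hold out valid_pct of them (a simplified sketch, not the fastai implementation; the item count of 135 is just illustrative):

```python
import random

def random_split(n_items, valid_pct=0.2, seed=45):
    """Shuffle item indices and hold out valid_pct of them as the
    validation set -- a sketch of fastai's RandomSplitter."""
    idxs = list(range(n_items))
    random.Random(seed).shuffle(idxs)
    cut = int(n_items * valid_pct)
    return idxs[cut:], idxs[:cut]   # train indices, valid indices

train, valid = random_split(135)   # e.g. 35 debris + 100 reference images
print(len(train), len(valid))      # 108 27
```

Fixing the seed makes the split reproducible, so the validation set stays the same across runs.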
Using the DataBlock, we can then create DataLoaders, an object containing our data in a form we can use to train the model. The dataloaders allow the data to be iterated through, split it into batches, allow for parallelization in training, and apply data transformations by batch. We'll use the DataBlock's dataloaders method, which will create the training and validation dataloaders:
```python
dls = decks.dataloaders(path)
```
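The batching behavior can be sketched in plain Python (a toy version: real dataloaders also shuffle, collate samples into tensors, and run worker processes in parallel; the item list here is hypothetical):

```python
def batches(items, batch_size):
    """Yield successive fixed-size chunks of items, the way a
    dataloader groups samples into batches (simplified sketch)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

items = list(range(10))   # stand-ins for (image, label) pairs
print([len(b) for b in batches(items, 4)])   # [4, 4, 2]
```

Note the final batch is smaller when the dataset size isn't a multiple of the batch size.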
So now we've gotten the data in a form that's ready for training. Before we go further, we'll do a couple of things to improve the model's performance. One is to add a data augmentation method to the DataBlock as a batch transform. Here we're using the fastai method aug_transforms. This takes the images in the batch and applies a variety of manipulations to them, including stretching, warping, cropping, rotation, and color alterations. These increase the variety of the data, hopefully leading to a more robust model, able to recognize a broader variety of images. It's important, when doing this, that the transformations actually produce reasonable data, of the kind you could expect to see in production. We can add additional arguments to the DataBlock using its new method:
```python
decks = decks.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = decks.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)
```
These seem like pretty reasonable images, so we'll use this when training our model.
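To make the augmentation idea concrete, here's a toy version of one such label-preserving transform, a horizontal flip, applied to an image represented as a grid of pixel values (a pure-Python sketch; aug_transforms operates on tensors and bundles many more transforms):

```python
def hflip(img):
    """Mirror an image (a list of pixel rows) left-to-right -- one of
    the label-preserving transforms used in data augmentation."""
    return [row[::-1] for row in img]

img = [[1, 2, 3],
       [4, 5, 6]]
print(hflip(img))   # [[3, 2, 1], [6, 5, 4]]
```

A flipped photo of a debris-covered deck is still a debris-covered deck, so the label carries over unchanged while the model sees a new input.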
The second thing we'll do is use RandomResizedCrop to get all our images to the same dimensions, rather than just resizing them. This will preserve their original aspect ratio (before doing the data augmentation), crop to a random piece of the image with the desired aspect ratio, and resize it to the desired dimensions. This provides some additional data augmentation, making the trained model more robust to variation in size and in how much of the object's features are captured. It also ensures that if there's any systematic variation in image aspect ratio between the different categories, it is not picked up as a feature for the model to detect and use for classification.
```python
decks = decks.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = decks.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)
```
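The crop-then-resize idea can be sketched in plain Python on a small grid of pixel values (a toy version of RandomResizedCrop, using square crops and nearest-neighbor resizing; the real transform also varies the crop's aspect ratio and interpolates properly):

```python
import random

def random_square_crop_resize(img, size, min_scale=0.5, seed=0):
    """Take a random square crop covering at least min_scale of the
    shorter side, then nearest-neighbor resize it to size x size --
    a toy version of what RandomResizedCrop does."""
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    side = rng.randint(int(min(h, w) * min_scale), min(h, w))
    top = rng.randint(0, h - side)
    left = rng.randint(0, w - side)
    crop = [row[left:left + side] for row in img[top:top + side]]
    # nearest-neighbor resize of the square crop to size x size
    return [[crop[r * side // size][c * side // size]
             for c in range(size)] for r in range(size)]

img = [[r * 10 + c for c in range(6)] for r in range(6)]
out = random_square_crop_resize(img, size=4)
print(len(out), len(out[0]))   # 4 4
```

Every output has the same dimensions regardless of the input's shape, which is what lets images of varying sizes be stacked into a single training batch.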