My quest to build the birdsong autolabeler continues.

Here's initial results showing how well a simple convolutional neural network (CNN) classifies birdsong syllables compared to my previous results obtained with a support vector machine (SVM).

[Figure: SVM v. CNN — classification accuracy for each of the four birds, plotted against number of songs used for training]

I trained the CNN on the samples I had used previously to train the SVM. In three out of four birds whose song I tested, the CNN always achieved higher accuracy than the SVM. In the remaining bird, the SVM was able to obtain comparable accuracy, given enough training data. But notice that in all four cases the neural network obtained higher accuracies with less training data! Bottom line: the neural net lets you do less work.

I obtained these results with a script that is a prototype of the generic "train" script for the package I'm developing, hybrid-vocal-classifier (HVC). If you really want to dig into the results, the shelve (database) files that script outputs are here, along with the .csv files of training history and the .hdf5 files with the weights of each trained CNN. You can also test the CNNs trained on song from the bird with ID 'gy6or6' on the audio files of his song in the HVC test directory.
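If you want to poke at those shelve files, here's a minimal sketch of how to open one and pull out a results array. (The file name and key below are hypothetical placeholders, not the actual names the script uses; the snippet creates a toy file first so it runs on its own.)

```python
import os
import shelve
import tempfile

# Toy stand-in: write a small shelve file so this snippet is self-contained.
# In practice you'd open one of the downloaded results files instead.
tmpdir = tempfile.mkdtemp()
fname = os.path.join(tmpdir, 'toy_results')  # hypothetical file name

with shelve.open(fname) as db:
    # hypothetical key; the real files define their own key names
    db['svm_Tach_test_avg_acc'] = [[0.80, 0.90], [0.82, 0.91]]

with shelve.open(fname) as db:
    print(sorted(db.keys()))  # see what the file contains
    acc = db['svm_Tach_test_avg_acc']

print(acc[1])  # [0.82, 0.91]
```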

If you just want to reproduce the figure, here's code to do so by pulling the summary results from my not-Github-pages site:

import json
import urllib.request

# from Anaconda
import numpy as np
import matplotlib.pyplot as plt

DATA_URL = '...'  # URL of the summary-results JSON (see link above)

BIRD_NAMES_DICT = {
    'bird 1': 'gr41rd51',
    'bird 2': 'gy6or6',
    'bird 3': 'or60yw70',
    'bird 4': 'bl26lb16',
}
BIRD_NAME_LIST = ['bird 1', 'bird 2', 'bird 3', 'bird 4']
# numbers of songs used for training, i.e., [3, 6, 9, ... 15, 21, 27, 33, 39]
NUM_SONGS_TO_TEST = list(range(3, 16, 3)) + [21, 27, 33, 39]
REPLICATES = list(range(1, 6))  # five training replicates per condition

response = urllib.request.urlopen(DATA_URL).read()
results_dict = json.loads(response.decode('utf-8'))
# plot SVM-RBF and the flatwindow neural net together, one subplot per bird
fig = plt.figure()
for val,bird_name in enumerate(BIRD_NAME_LIST):
    pos = val+1
    ax = plt.subplot(2, 2, pos)
    results_key = BIRD_NAMES_DICT[bird_name]
    results = results_dict[results_key]

    # SVM-RBF results; one row per replicate, one column per number of songs
    svm_Tach_avg_acc = np.asarray(results['svm_Tach_test_avg_acc'])
    plt.errorbar(NUM_SONGS_TO_TEST,
                 svm_Tach_avg_acc.mean(axis=0),
                 yerr=svm_Tach_avg_acc.std(axis=0),
                 fmt='-k', label='SVM-RBF (Tach.)')
    # scatter individual replicates with a little horizontal jitter
    for ind, x_tick in enumerate(NUM_SONGS_TO_TEST):
        y = svm_Tach_avg_acc[:, ind]
        x = np.random.normal(x_tick, 0.08, size=len(y))
        plt.plot(x, y, 'k.', alpha=0.3, markersize=8)

    # flatwindow neural net
    flatwindow_avg_acc = np.asarray(results['flatwindow_avg_acc'])
    plt.errorbar(NUM_SONGS_TO_TEST,
                 flatwindow_avg_acc.mean(axis=0),
                 yerr=flatwindow_avg_acc.std(axis=0),
                 fmt='-r', label='ANN (flatwindow)')
    for ind, x_tick in enumerate(NUM_SONGS_TO_TEST):
        y = flatwindow_avg_acc[:, ind]
        x = np.random.normal(x_tick, 0.08, size=len(y))
        plt.plot(x, y, 'r.', alpha=0.3, markersize=8)

    plt.title(bird_name)
    plt.xlabel('number of songs used')
    plt.ylabel('Average accuracy across labels\n(mean and std. dev.)')
    plt.legend(loc='lower right')
plt.tight_layout(pad=0.4, w_pad=2.0, h_pad=3.0)
plt.show()
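For reference, the "mean and std. dev." in those error bars come from averaging over training replicates: each accuracy array has one row per replicate and one column per number-of-songs condition, so the summary statistics are taken over axis 0. A toy example with made-up numbers:

```python
import numpy as np

# toy accuracies: 3 replicates (rows) x 3 number-of-songs conditions (columns)
acc = np.asarray([
    [0.78, 0.85, 0.90],
    [0.80, 0.87, 0.91],
    [0.82, 0.83, 0.89],
])

mean_acc = acc.mean(axis=0)  # one mean per condition, averaged over replicates
std_acc = acc.std(axis=0)    # spread across replicates, drawn as error bars

print(mean_acc)  # [0.8  0.85 0.9 ]
```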