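This project runs on the STM32F746. Bird call classification is a common environmental sound classification task and a good fit for exploring embedded AI applications. It also has practical value for ecological research, bird conservation, and biodiversity monitoring. Compressing the classification algorithm and model onto small devices brings these capabilities to more scenarios, for example smart nest-box monitoring systems and drone patrol monitoring systems. There the technique can help assess ecosystem health and track climate change, and it can support monitoring and research on bird distribution, migration routes, and habitat use.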
Dataset
https://xeno-canto.org/ is a website dedicated to sharing bird sounds from around the world.
Raw download
- Size: 51.4 GB (55,284,289,304 bytes)
- Size on disk: 51.5 GB (55,297,847,296 bytes)
- Contains: 6,507 files
The raw dataset can be downloaded with a short Python script.
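A minimal sketch of such a download script, using only the standard library. The endpoint URL and the JSON field names (`recordings`, `numPages`, `id`, `file`) are assumptions based on version 2 of the public xeno-canto API; newer API versions may use a different URL or require an API key, so check the site's API documentation before relying on it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: xeno-canto API v2 endpoint; newer versions may differ.
API_URL = "https://xeno-canto.org/api/2/recordings"


def build_query_url(genus, species, page=1):
    """Build a search URL for one species, e.g. 'Certhia', 'tianquanensis'."""
    query = urllib.parse.urlencode({"query": f"{genus} {species}", "page": page})
    return f"{API_URL}?{query}"


def recording_files(payload):
    """Yield (recording id, audio URL) pairs from one decoded response page."""
    for rec in payload.get("recordings", []):
        url = rec.get("file", "")
        if url.startswith("//"):  # handle protocol-relative URLs
            url = "https:" + url
        if url:  # skip recordings without a downloadable file
            yield rec["id"], url


def download_species(genus, species, out_dir):
    """Fetch every result page for one species and save the audio files.

    Needs network access; out_dir must already exist.
    """
    page, pages = 1, 1
    while page <= pages:
        with urllib.request.urlopen(build_query_url(genus, species, page)) as resp:
            payload = json.load(resp)
        pages = int(payload.get("numPages", 1))
        for rec_id, url in recording_files(payload):
            # Assumes MP3 audio; some recordings may use other formats.
            urllib.request.urlretrieve(url, f"{out_dir}/XC{rec_id}.mp3")
        page += 1
```

Calling `download_species("Certhia", "tianquanensis", "raw")` would then fetch the recordings for one species into an existing `raw` directory.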
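Training set selection
Birds show high variation between species. We selected 8 species found in and around Sichuan Province for training, listed below by genus.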
{
"Locustella": {
"sp": "chengi",
"ssp": "",
"en": "Sichuan Bush Warbler"
},
"Certhia": {
"sp": "tianquanensis",
"ssp": "",
"en": "Sichuan Treecreeper"
},
"Anser": {
"sp": "albifrons",
"ssp": "frontalis",
"en": "Greater White-fronted Goose"
},
"Tragopan": {
"sp": "caboti",
"ssp": "",
"en": "Cabot's Tragopan"
},
"Chrysolophus": {
"sp": "amherstiae",
"ssp": "",
"en": "Lady Amherst's Pheasant"
},
"Tetraogallus": {
"sp": "himalayensis",
"ssp": "koslowi",
"en": "Himalayan Snowcock"
},
"Bambusicola": {
"sp": "thoracicus",
"ssp": "",
"en": "Chinese Bamboo Partridge"
},
"Arborophila": {
"sp": "brunneopectus",
"ssp": "",
"en": "Bar-backed Partridge"
}
}
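As an illustration of how this mapping can be used, the sketch below turns it into scientific names (for searching) and a class label list. The helper names are hypothetical, the `en` field is omitted for brevity, and the extra `noise` label is an assumption taken from the 9 output classes reported later.

```python
import json

# Compact copy of the mapping above ("en" omitted for brevity).
SPECIES_JSON = """
{
  "Locustella":   {"sp": "chengi",         "ssp": ""},
  "Certhia":      {"sp": "tianquanensis",  "ssp": ""},
  "Anser":        {"sp": "albifrons",      "ssp": "frontalis"},
  "Tragopan":     {"sp": "caboti",         "ssp": ""},
  "Chrysolophus": {"sp": "amherstiae",     "ssp": ""},
  "Tetraogallus": {"sp": "himalayensis",   "ssp": "koslowi"},
  "Bambusicola":  {"sp": "thoracicus",     "ssp": ""},
  "Arborophila":  {"sp": "brunneopectus",  "ssp": ""}
}
"""


def scientific_name(genus, entry):
    """Join genus, species, and (if present) subspecies into one name."""
    parts = [genus, entry["sp"], entry["ssp"]]
    return " ".join(part for part in parts if part)


def class_labels(species, extra=("noise",)):
    """Sorted genus names plus any extra non-bird classes."""
    return sorted(species) + list(extra)


species = json.loads(SPECIES_JSON)
labels = class_labels(species)
```

With the 8 genera plus `noise`, `labels` has 9 entries, in the same order as the rows of the confusion matrix.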
Data preprocessing
Each recording is first split into 1000 ms training samples, and features are then extracted with a Mel filterbank.
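The two steps can be sketched with NumPy as follows. This is not the project's actual pipeline: the 16 kHz sample rate, 20 ms frame length, 10 ms stride, 512-point FFT, and Hamming window are assumptions, chosen because they turn a 1000 ms sample into 99 frames of 32 filterbank energies, which is consistent with the 3,168 input features and 32 columns listed for the network.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed sample rate in Hz
FRAME_MS = 20         # assumed frame length
STRIDE_MS = 10        # assumed frame stride
N_MELS = 32           # matches the 32 columns of the reshape layer
N_FFT = 512           # assumed FFT size


def split_windows(signal, sample_rate=SAMPLE_RATE, window_ms=1000):
    """Cut a 1-D signal into non-overlapping windows, dropping the remainder."""
    n = sample_rate * window_ms // 1000
    count = len(signal) // n
    return signal[: count * n].reshape(count, n)


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels=N_MELS, n_fft=N_FFT, sample_rate=SAMPLE_RATE):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):    # rising edge
            bank[i, k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge
            bank[i, k] = (right - k) / (right - center)
    return bank


def log_mel_features(window, sample_rate=SAMPLE_RATE):
    """Log Mel filterbank energies of one window, shape (frames, N_MELS)."""
    frame_len = sample_rate * FRAME_MS // 1000
    stride = sample_rate * STRIDE_MS // 1000
    n_frames = 1 + (len(window) - frame_len) // stride
    idx = np.arange(frame_len)[None, :] + stride * np.arange(n_frames)[:, None]
    frames = window[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=N_FFT, axis=1)) ** 2 / N_FFT
    energies = power @ mel_filterbank(sample_rate=sample_rate).T
    return np.log(energies + 1e-10)  # small offset avoids log(0)
```

Under these assumptions a 1000 ms window yields a (99, 32) array, and flattening it gives 99 × 32 = 3,168 values.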
Neural network training
Neural network architecture
- Input layer (3,168 features)
- Reshape layer (32 columns)
- 1D conv / pool layer (16 neurons, 3 kernel size, 1 layer)
- Dropout (rate 0.3)
- 1D conv / pool layer (32 neurons, 5 kernel size, 1 layer)
- Dropout (rate 0.3)
- Flatten layer
- Dense layer (64 neurons)
- Dropout (rate 0.3)
- Output layer (9 classes)
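To make the layer list concrete, the sketch below traces tensor shapes and trainable parameter counts through it in plain Python. It assumes "same" padding for the convolutions, a pool size of 2 with output length rounded up, and that the reshape produces 99 time steps of 32 channels; none of these details are stated in the list. Dropout layers are skipped because they change neither shapes nor parameter counts.

```python
import math


def trace_model(n_features=3168, columns=32, n_classes=9):
    """Return a list of (layer name, output shape, trainable parameters)."""
    steps, channels = n_features // columns, columns
    layers = [("reshape", (steps, channels), 0)]
    for name, filters, kernel in (("conv1d_1", 16, 3), ("conv1d_2", 32, 5)):
        # Conv1D weights: kernel * input channels * filters, plus one bias per filter.
        params = kernel * channels * filters + filters
        channels = filters
        layers.append((name, (steps, channels), params))
        # Assumed pooling: size 2, length rounded up.
        steps = math.ceil(steps / 2)
        layers.append((name + "_pool", (steps, channels), 0))
    flat = steps * channels
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense", (64,), flat * 64 + 64))
    layers.append(("output", (n_classes,), 64 * n_classes + n_classes))
    return layers


if __name__ == "__main__":
    for name, shape, params in trace_model():
        print(f"{name:14s} {str(shape):10s} {params:6d}")
```

Under these assumptions the flatten layer has 800 values, and most of the parameters sit in the 64-neuron dense layer.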
Training results
Accuracy: 93.3%
Loss: 0.51
Confusion matrix
| | Anser | Arborophila | Bambusicola | Certhia | Chrysolophus | Locustella | Tetraogallus | Tragopan | noise |
|---|---|---|---|---|---|---|---|---|---|
| Anser | 100% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| Arborophila | 2.2% | 92.4% | 0% | 0% | 1.1% | 1.1% | 0% | 1.1% | 2.2% |
| Bambusicola | 4.8% | 0% | 90.5% | 0% | 0% | 0% | 0% | 0% | 4.8% |
| Certhia | 0% | 0% | 0% | 88% | 4% | 8% | 0% | 0% | 0% |
| Chrysolophus | 6.3% | 2.1% | 2.1% | 8.3% | 79.2% | 0% | 0% | 0% | 2.1% |
| Locustella | 0% | 0% | 0% | 2.6% | 5.3% | 89.5% | 0% | 0% | 2.6% |
| Tetraogallus | 0% | 0% | 0% | 0% | 0% | 0% | 95.5% | 0% | 4.5% |
| Tragopan | 0% | 0% | 0% | 3.4% | 0% | 0% | 0% | 96.6% | 0% |
| noise | 0% | 0% | 0% | 0.4% | 1.2% | 1.6% | 0% | 0% | 96.7% |
| F1 score | 0.86 | 0.96 | 0.93 | 0.81 | 0.82 | 0.86 | 0.98 | 0.97 | 0.97 |
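The F1 score combines precision and recall for each class. A minimal sketch of that calculation from a confusion matrix of raw counts follows; it assumes rows are actual classes and columns are predicted classes, and the 3-class counts are made up for illustration, not taken from this project.

```python
def f1_per_class(counts):
    """Per-class F1 from a square matrix of counts (rows actual, columns predicted)."""
    scores = []
    for c in range(len(counts)):
        tp = counts[c][c]                           # correctly predicted as class c
        fn = sum(counts[c]) - tp                    # class c predicted as something else
        fp = sum(row[c] for row in counts) - tp     # other classes predicted as c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical counts for three classes.
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
]
scores = f1_per_class(example)
```

The percentages in the table are normalized per row, so the F1 values cannot be recomputed from them without the number of samples in each class.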
On-device performance
Inferencing time: 25 ms.
Peak RAM usage: 9.5 KB
Flash usage: 85.2 KB