1. 语音类实现 (实现读取wav/pcm,STFT)
创始人
2024-03-01 19:20:12
0

[C++ 基于Eigen库实现CRN前向推理]

第一部分:WavFile.class (实现读取wav/pcm,实现STFT)

  • 前言:(Eigen库使用记录)
  • 第一部分:WavFile.class (实现读取wav/pcm,实现STFT)
  • 第二部分:Conv2d实现
  • 第三部分:TransposedConv2d实现 (mimo,padding,stride,dilation,kernel,outpadding)
  • 第四部分:NonLinearity (Sigmoid,Tanh,ReLU,ELU,Softplus)
  • 第五部分:LSTM
  • GITHUB仓库

1. 读写语音文件(wav/pcm)

1.1 wav文件读写

语音文件有很多后缀格式,这里只实现了两种后缀名,.wav和.pcm。

  • WAV文件遵守资源交换文件格式之规则,在文件的前44(或46)字节放置标头(header),使播放器或编辑器能够简单掌握文件的基本信息,其内容以区块(chunk)为最小单位,每一区块长度为4字节,而区块之上则由子区块包裹,每一子区块长度不拘,但须在前头先宣告标签及长度(字节)。
  • 标头的前3个区块记录文件格式及长度;
  • 接着第一个子区块包含8个区块,记录声道数量、采样率等信息;
  • 接着第二个子区块才是真正的音频资料,长度则视音频长度而定。
  • 内容如下表所示。须注意的是,每个区块的端序不尽相同,而音频内容本身则是采用小端序。

在这里插入图片描述

  • 读Wav文件
    图中所示是一个wav的标准格式,但wav还有其他不同的存储格式,有些有扩展块之类的,需要根据具体的语音文件来确定。比如我用python中soudfile生成的语音,其format_tag=3,format_length=18(通常为16)。由于我没用到其他格式的情况,直接在下方用if else讨论format_tag=1和=3的情况。
    总而言之,读取wav其实就是一个解析头文件的过程,把对应的头信息存下来,再去读取数据就好了。-
    读wav文件的具体代码给出(后面会给整体的h文件和cpp文件):

    void Wav_File::LoadWavFile(const char *file_path) {FILE *fp = fopen(file_path, "rb");if (fp) {fread(this->id_riff, sizeof(char), 4, fp);this->id_riff[4] = '\0';fread(&this->file_size, sizeof(int16_t), 2, fp);fread(this->id_wave, sizeof(char), 4, fp);this->id_wave[4] = '\0';fread(this->id_fmt, sizeof(char), 4, fp);this->id_fmt[4] = '\0';fread(&this->format_length, sizeof(int16_t), 2, fp);fread(&this->format_tag, sizeof(int16_t), 1, fp);fread(&this->channels, sizeof(int16_t), 1, fp);fread(&this->sample_rate, sizeof(int16_t), 2, fp);fread(&this->avg_bytes_sec, sizeof(int16_t), 2, fp);fread(&this->block_align, sizeof(int16_t), 1, fp);fread(&this->bits_per_sample, sizeof(int16_t), 1, fp);if (this->format_tag == 1) {fread(this->id_data, sizeof(char), 4, fp);this->id_data[4] = '\0';fread(&this->data_size, sizeof(int16_t), 2, fp);this->wav_size = this->data_size / sizeof(int16_t);this->ip_raw = (int16_t *) malloc(this->data_size);fread(this->ip_raw, sizeof(int16_t), this->wav_size, fp);this->data = (float_t *) malloc(this->wav_size * sizeof(float_t));for (int i = 0; i < this->wav_size; i++) {this->data[i] = (float_t) (this->ip_raw[i] * 1.0 / 32768);}} else if (this->format_tag == 3) {fread(&this->cb_size, sizeof(int16_t), 1, fp);fread(this->id_data, sizeof(char), 4, fp);this->id_data[4] = '\0';fread(&this->data_size, sizeof(int16_t), 2, fp);this->wav_size = this->data_size / sizeof(float);this->fp_raw = (float *) malloc(this->data_size);fread(this->fp_raw, sizeof(float), this->wav_size, fp);this->data = (float_t *) malloc(this->wav_size * sizeof(float_t));for (int i = 0; i < this->wav_size; i++) {this->data[i] = this->fp_raw[i];}}}fclose(fp);}
    
  • 写出Wav文件
    Wav文件写出和读入是完全对应的,把读到的信息反向写出就可以了。
    如果对数据有所操作,需要去找对应的变量进行修改,比如长度,需要更新总文件的file_size以及data chunk的data_size。
    具体代码给出:

    void Wav_File::WriteWavFile(const char *dest_path) {for (int i = 0; i < wav_size; i++) {if (format_tag == 1)this->ip_raw[i] = (int16_t) (this->data[i] * 32768);elsethis->fp_raw[i] = this->data[i];}FILE *fp = fopen(dest_path, "wb");if (fp) {fwrite(this->id_riff, sizeof(char), 4, fp);fwrite(&this->file_size, sizeof(int16_t), 2, fp);fwrite(this->id_wave, sizeof(char), 4, fp);fwrite(this->id_fmt, sizeof(char), 4, fp);fwrite(&this->format_length, sizeof(int16_t), 2, fp);fwrite(&this->format_tag, sizeof(int16_t), 1, fp);fwrite(&this->channels, sizeof(int16_t), 1, fp);fwrite(&this->sample_rate, sizeof(int16_t), 2, fp);fwrite(&this->avg_bytes_sec, sizeof(int16_t), 2, fp);fwrite(&this->block_align, sizeof(int16_t), 1, fp);fwrite(&this->bits_per_sample, sizeof(int16_t), 1, fp);if (this->format_tag == 3) {fwrite(&this->cb_size, sizeof(int16_t), 1, fp);}fwrite(this->id_data, sizeof(char), 4, fp);fwrite(&this->data_size, sizeof(int16_t), 2, fp);if (this->format_tag == 1) {fwrite(this->ip_raw, sizeof(int16_t), this->wav_size, fp);} else if (this->format_tag == 3) {fwrite(this->fp_raw, sizeof(float), this->wav_size, fp);}}fclose(fp);}
1.2 PCM文件读写

PCM文件没有格式,是直接存的数据,所以需要自己知道采样率,通道数,以及数据类型,然后直接读写就可以了,代码如下:

void Wav_File::LoadPcmFile(const char *file_path) {FILE *fp = fopen(file_path, "rb");// 计算文件长度fseek(fp, 0L, SEEK_END);this->wav_size = ftell(fp) / sizeof(float);fseek(fp, 0L, SEEK_SET);//申请内存空间 这边自己知道存储的是float32,所以申请float的内存空间this->fp_raw = (float *) malloc(this->wav_size * sizeof(float));fread(this->fp_raw, sizeof(float), this->wav_size, fp);fclose(fp);// 这部分是统一转换成float32存在data里,因为wav是用int16存的for (int i = 0; i < this->wav_size; i++) {this->data[i] = this->fp_raw[i];}
}void Wav_File::WritePcmFile(const char *dest_path) {for (int i = 0; i < this->wav_size; i++) {this->fp_raw[i] = this->data[i];}FILE *fp = fopen(dest_path, "wb");fwrite(this->fp_raw, sizeof(float), this->wav_size, fp);fclose(fp);
}

2. STFT实现

2.1 设置STFT参数,窗函数

这块主要设置窗长,窗移,窗函数类型等,并为频谱数据申请内存空间
提供了三种窗实现,hanning,hamming和sqrt窗(我自己叫的)。
可以看到有两份spec和mag,pha内存,spec是自定义复数类型,用于存储由蝶形算法生成的频谱(因为对FFT实现不是很熟悉,根据网上照猫画虎,只能对2^N窗长进行变换)。
而另一份是对应不同基的FFT,没搞懂是用的现成的,具体代码就不在这放了,可以去github上看。

void Wav_File::setSTFT(int64_t frame_size, int64_t frame_shift, const char *win_type) {this->frame_size = frame_size;this->frame_shift = frame_shift;this->fft_size = this->frame_size;this->frame_num = this->wav_size / frame_shift + 1;this->win_type = win_type;this->setWindow(this->win_type);this->spec = (Complex **) malloc((this->fft_size / 2 + 1) * sizeof(*spec));for (int i = 0; i < (this->fft_size / 2 + 1); i++)this->spec[i] = (Complex *) malloc(this->frame_num * sizeof(Complex));this->mag = (float_t **) malloc((this->fft_size / 2 + 1) * sizeof(*this->mag));for (int i = 0; i < (this->fft_size / 2 + 1); i++)this->mag[i] = (float_t *) malloc(frame_num * sizeof(float_t));this->phase = (float_t **) malloc((this->fft_size / 2 + 1) * sizeof(*this->phase));for (int i = 0; i < (this->fft_size / 2 + 1); i++)this->phase[i] = (float_t *) malloc(frame_num * sizeof(float_t));// nfftthis->spec_real = (float_t **) malloc(this->frame_num * sizeof(*spec_real));for (int i = 0; i < this->frame_num; i++)this->spec_real[i] = (float_t *) malloc((this->fft_size / 2 + 1) * sizeof(float_t));this->spec_imag = (float_t **) malloc(this->frame_num * sizeof(*spec_imag));for (int i = 0; i < this->frame_num; i++)this->spec_imag[i] = (float_t *) malloc((this->fft_size / 2 + 1) * sizeof(float_t));this->spec_mag = (float_t **) malloc(this->frame_num * sizeof(*spec_mag));for (int i = 0; i < this->frame_num; i++)this->spec_mag[i] = (float_t *) malloc((this->fft_size / 2 + 1) * sizeof(float_t));this->spec_pha = (float_t **) malloc(this->frame_num * sizeof(*spec_pha));for (int i = 0; i < this->frame_num; i++)this->spec_pha[i] = (float_t *) malloc((this->fft_size / 2 + 1) * sizeof(float_t));}void Wav_File::setWindow(const char *winType) {double_t a0, a1;this->window = (float_t *) malloc(this->frame_size * sizeof(float_t));int frac = this->frame_size - 1;if (strcmp(winType, "hamming") == 0) {for (int i = 0; i < this->frame_size; i++) {this->window[i] = float(0.540000000 - 0.460000000 * cos(PI * 2 * i / frac));}} else if (strcmp(winType, "hanning") == 0) {for (int i = 0; i < this->frame_size; i++) {this->window[i] = float(0.500000000 - 0.500000000 * cos(PI * 2 * i / frac));}} else if (strcmp(winType, "sqrt") == 0) {for (int i = 0; i < this->frame_size; i++) {this->window[i] = (float) sqrtf((0.5f - 0.5f * cosf(i / (float) (this->frame_size - 1) * 2 * PI)));}}
}
2.2 2^N版FFT
void Wav_File::STFT() {std::vector padding_data;for (int i = 0; i < this->frame_shift; i++)padding_data.push_back(this->data[this->frame_shift - i]);for (int i = 0; i < this->wav_size; i++)padding_data.push_back(this->data[i]);for (int i = 0; i < this->frame_shift; i++)padding_data.push_back(this->data[this->wav_size - i - 2]);this->real = (float_t *) malloc(sizeof(float_t) * this->frame_size);this->imag = (float_t *) malloc(sizeof(float_t) * this->frame_size);for (int i = 0; i < this->frame_num; i++) {for (int j = 0; j < this->frame_size; j++) {this->real[j] = padding_data[i * this->frame_shift + j] * this->window[j];this->imag[j] = 0.0;}FFT(this->real, this->imag, this->fft_size);for (int j = 0; j <= this->fft_size / 2; j++) {this->spec[j][i].real = this->real[j];this->spec[j][i].imag = this->imag[j];}}
}void Wav_File::ISTFT() {this->real = (float_t *) malloc(sizeof(float_t) * this->frame_size);this->imag = (float_t *) malloc(sizeof(float_t) * this->frame_size);auto *temp = (float_t *) malloc(sizeof(float_t) * (data_size + 2 * frame_shift));auto *ola = (float_t *) malloc(sizeof(float_t) * (data_size + 2 * frame_shift));for (int i = 0; i < this->frame_num; i++) {//��N_FFT/2+1����N_FFT(����Գ�)for (int j = 0; j < this->fft_size / 2 + 1; j++) {real[j] = spec[j][i].real;imag[j] = spec[j][i].imag;}for (int j = this->fft_size / 2 + 1; j < this->fft_size; j++) {real[j] = spec[this->fft_size - j][i].real;imag[j] = -spec[this->fft_size - j][i].imag;}IFFT(real, imag, this->fft_size);for (int j = 0; j < frame_size; j++) {temp[i * frame_shift + j] += (real[j] * this->window[j]);ola[i * frame_shift + j] += (this->window[j] * this->window[j]);}}for (int i = 0; i < (frame_num + 1) * frame_shift; i++) {if (temp[i] != 0) {temp[i] /= ola[i];}}for (int i = 0; i < (frame_num - 1) * frame_shift; i++)this->data[i] = temp[i + frame_shift];//    超出部分就保持原样,不降噪就好了
//    int sub = this->wav_size - (frame_num - 1) * frame_shift;
//    this->file_size -= sub;
//    this->wav_size -= sub;
//    this->data_size -= sub;
}void Wav_File::FFT(float_t param_real[], float_t param_imag[], int16_t param_n) {float_t treal, timag, ureal, uimag, arg;int32_t m, k, j, t, index1, index2;auto *wreal = (float_t *) malloc(frame_shift * sizeof(float_t));auto *wimag = (float_t *) malloc(frame_shift * sizeof(float_t));this->bitrp(param_real, param_imag, param_n);arg = -2 * PI / param_n;treal = cosf(arg);timag = sinf(arg);wreal[0] = 1.0;wimag[0] = 0.0;for (j = 1; j < param_n / 2; j++) {wreal[j] = wreal[j - 1] * treal - wimag[j - 1] * timag;wimag[j] = wreal[j - 1] * timag + wimag[j - 1] * treal;}for (m = 2; m <= param_n; m *= 2) {for (k = 0; k < param_n; k += m) {for (j = 0; j < m / 2; j++) {index1 = k + j;index2 = index1 + m / 2;t = param_n * j / m;treal = wreal[t] * param_real[index2] - wimag[t] * param_imag[index2];timag = wreal[t] * param_imag[index2] + wimag[t] * param_real[index2];ureal = param_real[index1];uimag = param_imag[index1];param_real[index1] = ureal + treal;param_imag[index1] = uimag + timag;param_real[index2] = ureal - treal;param_imag[index2] = uimag - timag;}}}free(wreal);free(wimag);
}void Wav_File::IFFT(float_t param_real[], float_t param_imag[], int16_t param_n) {float_t treal, timag, ureal, uimag, arg;int m, k, j, t, index1, index2;auto *wreal = (float_t *) malloc(frame_shift * sizeof(float_t));auto *wimag = (float_t *) malloc(frame_shift * sizeof(float_t));bitrp(param_real, imag, param_n);arg = 2 * PI / param_n;treal = cosf(arg);timag = sinf(arg);wreal[0] = 1.0;wimag[0] = 0.0;for (j = 1; j < param_n / 2; j++) {wreal[j] = wreal[j - 1] * treal - wimag[j - 1] * timag;wimag[j] = wreal[j - 1] * timag + wimag[j - 1] * treal;}for (m = 2; m <= param_n; m *= 2) {for (k = 0; k < param_n; k += m) {for (j = 0; j < m / 2; j++) {index1 = k + j;index2 = index1 + m / 2;t = param_n * j / m;treal = wreal[t] * param_real[index2] - wimag[t] * param_imag[index2];timag = wreal[t] * param_imag[index2] + wimag[t] * param_real[index2];ureal = param_real[index1];uimag = param_imag[index1];param_real[index1] = ureal + treal;param_imag[index1] = uimag + timag;param_real[index2] = ureal - treal;param_imag[index2] = uimag - timag;}}}for (j = 0; j < param_n; j++) {real[j] /= param_n;imag[j] /= param_n;}free(wreal);free(wimag);
}void Wav_File::bitrp(float_t param_real[], float_t param_imag[], int16_t param_n) {int i, j, a, b, p;for (i = 1, p = 0; i < param_n; i *= 2) {p++;}for (i = 0; i < param_n; i++) {a = i;b = 0;for (j = 0; j < p; j++) {b = (b << 1) + (a & 1);    // b = b * 2 + a % 2;a >>= 1;        // a = a / 2;}if (b > i) {std::swap(param_real[i], param_real[b]);std::swap(param_imag[i], param_imag[b]);}}
}
2.3 nfft

不放了 看github吧

3. 其他相关操作

3.1 能量归一化
float_t Wav_File::getNorm(float_t scale) {if (scale == 0) {float_t square = 0.0;for (int i = 0; i < this->wav_size; i++) {square += powf(this->data[i], 2);}scale = sqrtf((float_t) this->wav_size / square);for (int i = 0; i < this->wav_size; i++) {this->data[i] *= scale;}} else {for (int i = 0; i < this->wav_size; i++) {this->data[i] /= scale;}}return scale;
}
3.2 求幅度谱
void Wav_File::getMagnitude() {for (int i = 0; i < this->frame_num; i++) {for (int j = 0; j < this->fft_size / 2 + 1; j++) {this->spec_mag[i][j] = sqrtf(powf(spec_real[i][j], 2) + powf(spec_imag[i][j], 2));}}
}
3.3 求相位谱
void Wav_File::getPhase() {for (int i = 0; i < this->frame_num; i++) {for (int j = 0; j < this->fft_size / 2 + 1; j++) {this->spec_pha[i][j] = atan2f(spec_imag[i][j], spec_real[i][j]);}}
}
3.4 幅度谱相位谱还原频谱
void Wav_File::magToSpec() {for (int i = 0; i < this->frame_num; i++) {for (int j = 0; j < this->fft_size / 2 + 1; j++) {spec_real[i][j] = this->spec_mag[i][j] * cosf(this->spec_pha[i][j]);spec_imag[i][j] = this->spec_mag[i][j] * sinf(this->spec_pha[i][j]);}}
}

4. 参考链接

[1] WAV文件格式详解
[2] GITHUB源码

相关内容

热门资讯

常用商务英语口语   商务英语是以适应职场生活的语言要求为目的,内容涉及到商务活动的方方面面。下面是小编收集的常用商务...
六年级上册英语第一单元练习题   一、根据要求写单词。  1.dry(反义词)__________________  2.writ...
复活节英文怎么说 复活节英文怎么说?复活节的英语翻译是什么?复活节:Easter;"Easter,anniversar...
2008年北京奥运会主题曲 2008年北京奥运会(第29届夏季奥林匹克运动会),2008年8月8日到2008年8月24日在中华人...
英语道歉信 英语道歉信15篇  在日常生活中,道歉信的使用频率越来越高,通过道歉信,我们可以更好地解释事情发生的...
六年级英语专题训练(连词成句... 六年级英语专题训练(连词成句30题)  1. have,playhouse,many,I,toy,i...
上班迟到情况说明英语   每个人都或多或少的迟到过那么几次,因为各种原因,可能生病,可能因为交通堵车,可能是因为天气冷,有...
小学英语教学论文 小学英语教学论文范文  引导语:英语教育一直都是每个家长所器重的,那么有关小学英语教学论文要怎么写呢...
英语口语学习必看的方法技巧 英语口语学习必看的方法技巧如何才能说流利的英语? 说外语时,我们主要应做到四件事:理解、回答、提问、...
四级英语作文选:Birth ... 四级英语作文范文选:Birth controlSince the Chinese Governmen...
金融专业英语面试自我介绍 金融专业英语面试自我介绍3篇  金融专业的学生面试时,面试官要求用英语做自我介绍该怎么说。下面是小编...
我的李老师走了四年级英语日记... 我的李老师走了四年级英语日记带翻译  我上了五个学期的小学却换了六任老师,李老师是带我们班最长的语文...
小学三年级英语日记带翻译捡玉... 小学三年级英语日记带翻译捡玉米  今天,我和妈妈去外婆家,外婆家有刚剥的`玉米棒上带有玉米籽,好大的...
七年级英语优秀教学设计 七年级英语优秀教学设计  作为一位兢兢业业的人民教师,常常要写一份优秀的教学设计,教学设计是把教学原...
我的英语老师作文 我的英语老师作文(通用21篇)  在日常生活或是工作学习中,大家都有写作文的经历,对作文很是熟悉吧,...
英语老师教学经验总结 英语老师教学经验总结(通用19篇)  总结是指社会团体、企业单位和个人对某一阶段的学习、工作或其完成...
初一英语暑假作业答案 初一英语暑假作业答案  英语练习一(基础训练)第一题1.D2.H3.E4.F5.I6.A7.J8.C...
大学生的英语演讲稿 大学生的英语演讲稿范文(精选10篇)  使用正确的写作思路书写演讲稿会更加事半功倍。在现实社会中,越...
VOA美国之音英语学习网址 VOA美国之音英语学习推荐网址 美国之音网站已经成为语言学习最重要的资源站点,在互联网上还有若干网站...
商务英语期末试卷 Part I Term Translation (20%)Section A: Translate ...