Example code for implementing the Shazam audio recognition algorithm in Java

https://www.jb51.net/article/147139.htm

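The Shazam algorithm uses the Fourier transform to convert a time-domain signal into a frequency-domain signal, derives an audio fingerprint from it, and finally identifies the audio by how closely fingerprints match. This article presents example Java code for implementing it.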


1. Capturing audio with AudioSystem

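The Nyquist-Shannon sampling theorem tells us that, to capture every frequency humans can hear, the sampling rate must be at least twice the highest frequency in the range of human hearing. Humans hear roughly from 20 Hz to 20,000 Hz, so the sampling rate has to be at least 40,000 Hz, and most recordings use 44,100 Hz. This is also the rate most commonly used with MPEG-1 audio. The value 44,100 originally came from Sony: it allowed audio to be recorded on modified video equipment running at 25 frames per second (PAL) or 30 frames per second (NTSC), while still covering the 20,000 Hz bandwidth of professional recording equipment. So when you choose a recording rate, 44,100 Hz is a safe choice.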
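As a quick check of the arithmetic above, the following sketch computes the Nyquist frequency for a given sampling rate and the raw data rate for a 44,100 Hz, 16-bit, stereo format. The class and method names (`SampleRateCheck`, `nyquistFrequency`, `bytesPerSecond`) are made up for illustration; only `AudioFormat` and its getters come from the standard library.

```java
import javax.sound.sampled.AudioFormat;

class SampleRateCheck {
    // Highest frequency that can be represented at the given sampling rate.
    static double nyquistFrequency(double sampleRate) {
        return sampleRate / 2.0;
    }

    // Bytes of PCM data produced per second: frames per second times bytes per frame.
    static double bytesPerSecond(AudioFormat format) {
        return format.getFrameRate() * format.getFrameSize();
    }

    public static void main(String[] args) {
        // 44,100 Hz, 16 bits per sample, 2 channels, signed, big-endian
        AudioFormat format = new AudioFormat(44100f, 16, 2, true, true);
        System.out.println("Nyquist frequency: " + nyquistFrequency(format.getSampleRate()) + " Hz");
        System.out.println("Data rate: " + bytesPerSecond(format) + " bytes/s");
    }
}
```

At 44,100 Hz the Nyquist frequency is 22,050 Hz, which sits above the 20,000 Hz upper limit of hearing.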

Define the audio format:

import javax.sound.sampled.AudioFormat;

public static float sampleRate = 44100;
public static int sampleSizeInBits = 16;
public static int channels = 2; // stereo
public static boolean signed = true; // whether the data is signed or unsigned
public static boolean bigEndian = true; // whether the data is stored in big-endian or little-endian order

public AudioFormat getFormat() {
    return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
}

Capture audio from the microphone and store it in out:

public static ByteArrayOutputStream out = new ByteArrayOutputStream();

try {
    AudioFormat format = smartAuto.getFormat(); // Fill AudioFormat with the settings
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    startTime = new Date().getTime();
    System.out.println(startTime);
    SmartAuto.line = (TargetDataLine) AudioSystem.getLine(info);
    SmartAuto.line.open(format);
    SmartAuto.line.start();
    new FileAnalysis().getDataToOut("");
    while (smartAuto.running) {
        checkTime(startTime);
    }
    SmartAuto.line.stop();
    SmartAuto.line.close();
} catch (Throwable e) {
    e.printStackTrace();
}
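The article does not show what `getDataToOut` does. One plausible shape for it, offered only as a sketch, is a loop that copies bytes from the line into `out`. The helper below (`CaptureSketch.copyAudio`, a hypothetical name) works on any `InputStream`, so it can be tried without a microphone; for real capture, the open `TargetDataLine` could be wrapped with `new AudioInputStream(line)`. The 4096-byte buffer is an assumption, chosen as a multiple of the 4-byte frame size of 16-bit stereo audio.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class CaptureSketch {
    // Copies up to maxBytes from in to out and returns the number of bytes copied.
    static int copyAudio(InputStream in, ByteArrayOutputStream out, int maxBytes) {
        byte[] buffer = new byte[4096]; // assumed size, a multiple of the frame size
        int total = 0;
        try {
            while (total < maxBytes) {
                int wanted = Math.min(buffer.length, maxBytes - total);
                int read = in.read(buffer, 0, wanted);
                if (read <= 0) {
                    break; // end of stream, or less than one whole frame requested
                }
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

Bounding the copy by a byte count is one way to stop recording; the original code instead appears to rely on a `running` flag and a time check.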

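The data collected in out then has to go through a Fourier transform, which converts it from a time-domain signal to a frequency-domain signal.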
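Before the transform, the bytes in `out` have to be interpreted as samples. With the 16-bit, signed, big-endian format defined earlier, each sample occupies two bytes, so one possible preprocessing step is to combine byte pairs into sample values. The sketch below is an illustration under that assumption, not code from the article; `PcmDecoder` and `decode16BitBigEndian` are hypothetical names. With two channels, the decoded values alternate between left and right.

```java
class PcmDecoder {
    // Decodes signed 16-bit big-endian PCM bytes into sample values.
    // A trailing odd byte, if any, is ignored.
    static short[] decode16BitBigEndian(byte[] audio) {
        short[] samples = new short[audio.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int high = audio[2 * i];            // high byte keeps its sign
            int low = audio[2 * i + 1] & 0xFF;  // low byte treated as unsigned
            samples[i] = (short) ((high << 8) | low);
        }
        return samples;
    }
}
```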

Fourier transform

public Complex[] fft(Complex[] x) {
    int n = x.length;
    // Base case: the DFT of a single sample is the sample itself.
    // Return a copy, because the caller reuses the input array below.
    if (n == 1) {
        return new Complex[] { x[0] };
    }
    // If the number of samples is odd, fall back to the direct DFT
    if (n % 2 != 0) {
        return dft(x);
    }
    // Take the samples at even indices and transform them recursively
    Complex[] even = new Complex[n / 2];
    for (int k = 0; k < n / 2; k++) {
        even[k] = x[2 * k];
    }
    Complex[] evenValue = fft(even);
    // Take the samples at odd indices and transform them recursively,
    // reusing the array to save memory
    Complex[] odd = even;
    for (int k = 0; k < n / 2; k++) {
        odd[k] = x[2 * k + 1];
    }
    Complex[] oddValue = fft(odd);
    // Combine the even and odd halves
    Complex[] result = new Complex[n];
    for (int k = 0; k < n / 2; k++) {
        // Euler's formula: e^(-i*2pi*k/n) = cos(-2pi*k/n) + i*sin(-2pi*k/n)
        double p = -2 * k * Math.PI / n;
        Complex m = new Complex(Math.cos(p), Math.sin(p));
        result[k] = evenValue[k].add(m.multiply(oddValue[k]));
        // e^(-i*2pi*(k+n/2)/n) equals -e^(-i*2pi*k/n), because e^(-i*pi) = -1
        result[k + n / 2] = evenValue[k].subtract(m.multiply(oddValue[k]));
    }
    return result;
}
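The `fft` method falls back to `dft(x)` for odd lengths, but the article does not show that method. As a rough sketch of what a direct DFT looks like, the version below evaluates X[k] = Σ x[t]·e^(-2πi·k·t/n) term by term. To stay self-contained it uses plain `double` arrays for the real and imaginary parts instead of the article's `Complex` class; that representation and the name `DftSketch` are assumptions made for illustration.

```java
class DftSketch {
    // Direct O(n^2) DFT. Input and output are split into real and imaginary parts;
    // the result is { realParts, imaginaryParts }.
    static double[][] dft(double[] re, double[] im) {
        int n = re.length;
        double[] outRe = new double[n];
        double[] outIm = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                double c = Math.cos(angle);
                double s = Math.sin(angle);
                // (re + i*im) * (c + i*s)
                outRe[k] += re[t] * c - im[t] * s;
                outIm[k] += re[t] * s + im[t] * c;
            }
        }
        return new double[][] { outRe, outIm };
    }
}
```

A constant signal should put all of its energy into bin 0, and a single impulse at index 0 should give the same value in every bin, which makes both handy sanity checks.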

Compute the frequency-domain values of out

private void setFFTResult() {
    byte audio[] = SmartAuto.out.toByteArray();
    final int totalSize = audio.length;
    System.out.println("totalSize = " + totalSize);
    int chunkSize = 4;
    int amountPossible = totalSize / chunkSize;
    // When turning into frequency domain we'll need complex numbers:
    SmartAuto.results = new Complex[amountPossible][];
    DftOperate dftOperate = new DftOperate();
    // For all the chunks:
    for (int times = 0; times < amountPossible; times++) {
        Complex[] complex = new Complex[chunkSize];
        for (int i = 0; i < chunkSize; i++) {
            // Put the time domain data into a complex number with imaginary part as 0:
            complex[i] = new Complex(audio[(times * chunkSize) + i], 0);
        }
        // Perform FFT analysis on the chunk:
        SmartAuto.results[times] = dftOperate.fft(complex);
    }
    // Print the chunk count; printing the array itself only shows its type and hash code
    System.out.println("results = " + SmartAuto.results.length + " chunks");
}
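The article stops at the frequency-domain results and does not show how a fingerprint is derived from them. A common first step, sketched here under assumptions rather than taken from the author's code, is to find the strongest frequency bin within a range of each transformed chunk. The names `PeakSketch`, `strongestBin` and `binToHz` are hypothetical, and the real and imaginary parts are passed as plain `double` arrays instead of the article's `Complex` type.

```java
class PeakSketch {
    // Returns the index of the bin with the largest magnitude in [from, to).
    static int strongestBin(double[] re, double[] im, int from, int to) {
        int best = from;
        double bestMagnitude = -1;
        for (int k = from; k < to; k++) {
            double magnitude = Math.hypot(re[k], im[k]);
            if (magnitude > bestMagnitude) {
                bestMagnitude = magnitude;
                best = k;
            }
        }
        return best;
    }

    // Converts a bin index to a frequency in Hz for a chunk of chunkSize samples.
    static double binToHz(int bin, int chunkSize, double sampleRate) {
        return bin * sampleRate / chunkSize;
    }
}
```

The frequency resolution is the sampling rate divided by the chunk size, so a chunk of only 4 values yields very coarse bins; a larger power of two gives finer resolution at the cost of time resolution.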


Author: 执着小钟
