/*------------- Telecommunications & Signal Processing Lab -------------
                          McGill University

Routine:
  CompAudio [options] AFileA [AFileB]

Purpose:
  Compare audio files, printing statistics

Description:
  This program gathers and prints statistics for one or two input audio files.
  The signal-to-noise ratio (SNR) of the second file relative to the first file
  is printed.  For this calculation, the first audio file is used as the
  reference signal.  The "noise" is the difference between sample values in
  the files.

  This program can also be invoked with just one file name.  In that case,
  only the statistics for that file are printed.  Multi-channel audio files
  are treated as if they were single channel files with the effective sampling
  frequency increased by a factor equal to the number of channels.

  For each file, the following statistical quantities are calculated and
  printed.
  Mean:
       Xm = SUM x(i) / N
  Standard deviation:
       sd = sqrt [ (SUM x(i)^2 - N * Xm^2) / (N-1) ]
  Max value:
       Xmax = max (x(i))
  Min value:
       Xmin = min (x(i))

  For data which is restricted to the range [-32768,+32767], two additional
  counts (if nonzero) are reported.
  Number of Overloads:
       Count of samples taking on the values -32768 or +32767, along with the
       number of runs of such values.  For 16-bit data from a saturating A/D
       converter, the presence of such values is an indication of a clipped
       signal.
  Number of Anomalous Transitions:
       Dividing the 16-bit data range into 2 positive regions and 2 negative
       regions, an anomalous transition is a transition from a sample value in
       the most positive region directly to a sample value in the most
       negative region or vice-versa.  A large number of such transitions is
       an indication of wrapped values or byte-swapped data.

  An optional delay range can be specified when comparing files.  The samples
  in file B are delayed relative to those in file A by each of the delay
  values in the delay range.  For each delay, the SNR with optimized gain
  factor (see below) is calculated.
  For the delay corresponding to the largest SNR, the full set of file
  comparison values is reported.

  Conventional SNR:
                          SUM xa(i)^2
       SNR = --------------------------------------------- .
             SUM xa(i)^2 - 2 SUM xa(i)*xb(i) + SUM xb(i)^2
       The corresponding value in dB is printed.

  SNR with optimized gain factor:
       SNR = 1 / (1 - r^2) ,
       where r is the (normalized) correlation coefficient,
                        SUM xa(i)*xb(i)
           r = -------------------------------------- .
               sqrt [ (SUM xa(i)^2) * (SUM xb(i)^2) ]
       The SNR value in dB is printed.  This SNR calculation corresponds to
       using an optimized gain factor Sf for file B,
                SUM xa(i)*xb(i)
           Sf = --------------- .
                  SUM xb(i)^2

  Segmental SNR:
       This is the average of SNR values calculated for segments of data.
       The segment length by default corresponds to 16 ms (128 samples at a
       sampling rate of 8000 Hz).  However, if the sampling rate is such that
       the segment length is less than 64 samples or more than 1024 samples,
       the segment length is set to 256 samples.  For each segment, the SNR
       is calculated as
                                 SUM xa(i)^2
       SS(k) = log10 (1 + --------------------------) .
                          0.01 + SUM [xa(i)-xb(i)]^2
       The term 0.01 in the denominator prevents a divide by zero.  This value
       is appropriate for data with values significantly larger than 0.01.
       The additive unity term discounts segments with SNRs less than unity.
       The final average segmental SNR is calculated as
         SSNR = 10 * log10 ( 10^[SUM SS(k) / N] - 1 ) dB.
       The subtraction of the unity term tends to compensate for the unity
       term in SS(k).

  If any of these SNR values is infinite, only the optimal gain factor is
  printed as part of the message (Sf is the optimized gain factor),
    "File A = Sf * File B".

Options:
  The command line specifies options and file names.
  -d DL:DU, --delay=DL:DU
      Specify a delay range.  Each delay in the delay range represents a delay
      of file B relative to file A.  The default range is 0:0.
  -s SAMP, --segment=SAMP
      Segment length (in samples) to be used for calculating the segmental
      signal-to-noise ratio.
      The default is a length corresponding to 16 ms.
  -P PARMS, --parameters=PARMS
      Parameters to be used for headerless input files.  This option may be
      given more than once.  Each invocation applies to the files that follow
      the option.  See the description of the environment variable
      RAWAUDIOFILE below for the format of the parameter specification.
  -h, --help
      Print a list of options and exit.
  -v, --version
      Print the version number and exit.

Environment variables:
  RAWAUDIOFILE:
  This environment variable defines the data format for headerless or
  non-standard input audio files.  The string consists of a list of parameters
  separated by commas.  The form of the list is
    "Format, Start, Sfreq, Swapb, Nchan, ScaleF"
  Format: File data format
       The lowercase versions of these format specifiers cause a headerless
       file to be accepted only after checking for standard file headers; the
       uppercase versions cause a headerless file to be accepted without
       checking the file header.
       "undefined"                - Headerless files will be rejected
       "mu-law8" or "MU-LAW8"     - 8-bit mu-law data
       "A-law8" or "A-LAW8"       - 8-bit A-law data
       "unsigned8" or "UNSIGNED8" - offset-binary 8-bit integer data
       "integer8" or "INTEGER8"   - two's-complement 8-bit integer data
       "integer16" or "INTEGER16" - two's-complement 16-bit integer data
       "float32" or "FLOAT32"     - 32-bit floating-point data
       "text" or "TEXT"           - text data
  Start: byte offset to the start of data (integer value)
  Sfreq: sampling frequency in Hz (floating point number)
  Swapb: Data byte swap parameter
       "native"        - no byte swapping
       "little-endian" - file data is in little-endian byte order
       "big-endian"    - file data is in big-endian byte order
       "swap"          - swap the data bytes as the data is read
  Nchan: number of channels
       The data consists of interleaved samples from Nchan channels
  ScaleF: Scale factor
       Scale factor applied to the data from the file
  The default values for the audio file parameters correspond to the following
  string.
"undefined, 0, 8000., native, 1, 1.0" AUDIOPATH: This environment variable specifies a list of directories to be searched when opening the input audio files. Directories in the list are separated by colons (semicolons for MS-DOS). Author / version: P. Kabal / v1r11 1996/08/12 Copyright (C) 1996 ----------------------------------------------------------------------*/ static char rcsid[] = "$Id: CompAudio.c 1.43 1996/08/14 AFsp-V2R1 $"; #include /* prototype for exit */ #include #include #include "CompAudio.h" #ifndef EXIT_SUCCESS # define EXIT_SUCCESS 0 /* Normally in stdlib.h */ #endif #define DSEGTIME 16E-3 /* Segment size in seconds */ #define NSSEG_MAX 1024 #define NSSEG_MIN 64 #define NSSEG_MID 256 #define DSFREQ 8000. /* Default sampling frequency */ int main (argc, argv) int argc; const char *argv[]; { const char *NHparms[2]; const char *Fname[2]; char Fn[FILENAME_MAX+1]; int delayL, delayU, delayM; int Nfiles; AFILE *AFpA; AFILE *AFpB; long int NsampA, NsampB, NchanA, NchanB; float Sfreq, SfreqA, SfreqB; struct Stats_F StatsA, StatsB; struct Stats_T StatsT; long int Nsseg; /* Get the input parameters */ CAoptions (argc, argv, &delayL, &delayU, &Nsseg, NHparms, Fname); if (Fname[1] == NULL) Nfiles = 1; else Nfiles = 2; /* Open the input files */ if (NHparms[0] != NULL) AFsetNH (NHparms[0]); else AFsetNH ("$RAWAUDIOFILE"); FLpathList (Fname[0], "$AUDIOPATH", Fn); AFpA = AFopenRead (Fn, &NsampA, &NchanA, &SfreqA, stdout); if (Nfiles == 2) { if (NHparms[1] != NULL) AFsetNH (NHparms[1]); else AFsetNH ("$RAWAUDIOFILE"); FLpathList (Fname[1], "$AUDIOPATH", Fn); AFpB = AFopenRead (Fn, &NsampB, &NchanB, &SfreqB, stdout); if (NchanA != NchanB) UThalt ("%s: Unequal numbers of channels", PROGRAM); } /* Sampling frequency */ if (Nfiles == 2) { if (NsampA != NsampB) UTwarn ("%s - Number of samples differ, %ld : %ld", PROGRAM, NsampA, NsampB); if (SfreqA == 0.0) Sfreq = SfreqB; else if (SfreqB == 0.0) Sfreq = SfreqA; else Sfreq = 0.5 * (SfreqA + SfreqB); if (SfreqA != 
SfreqB && Sfreq > 0.0) UTwarn ("%s - Sampling frequencies differ, using %.0f", PROGRAM, Sfreq); } else Sfreq = SfreqA; if (Sfreq <= 0.0) { Sfreq = DSFREQ; UTwarn ("%s - Sampling frequency assumed to be %.0f Hz", PROGRAM, Sfreq); } /* Choose a block size which is a multiple of 16 ms long */ if (Nfiles == 2) { if (Nsseg == 0) { Nsseg = MSdNint (DSEGTIME * NchanA * Sfreq); if (Nsseg < NSSEG_MIN || Nsseg > NSSEG_MAX) { Nsseg = NSSEG_MID; UTwarn ("%s - Segment size set to %d samples", PROGRAM, Nsseg); } } } if (NsampA <= 0 && (Nfiles == 1 || NsampB <= 0)) exit (EXIT_SUCCESS); /* Individual file statistics */ CAstats (AFpA, &StatsA); if (Nfiles == 2) CAstats (AFpB, &StatsB); /* Find the cross file statistics over the delay range */ if (Nfiles == 2) CAcomp (AFpA, AFpB, Nsseg, delayL, delayU, &delayM, &StatsA, &StatsB, &StatsT); /* Close the files */ AFclose (AFpA); if (Nfiles == 2) AFclose (AFpB); /* File A statistics */ printf ("\n"); if (Nfiles ==2 && StatsT.Ndiff > 0) printf (" File A:\n"); CAprstat (&StatsA); /* File comparisons */ if (Nfiles == 2) { if (StatsT.Ndiff == 0) { /* Identical files */ if (delayL < delayU || delayL != 0) printf ("\n File A = File B (delay = %d)\n", delayM); else printf ("\n File A = File B\n"); } else { /* File B statistics */ printf (" File B:\n"); CAprstat (&StatsB); if (delayL < delayU) printf (" Best match at delay = %d\n", delayM); else if (delayL != 0) printf (" Delay = %d\n", delayM); CAprcorr (&StatsA, &StatsB, &StatsT); } } return EXIT_SUCCESS; }