Upsampling fixed to "power of 2" Sampling Frequency

My Rivo+ has an option to upsample the bit depth and sample rate to a ‘fixed’ output.
To prevent the rounding errors and phase distortion associated with fractional conversions, it’s better to do this natively in “power of 2” multiples.

My DAC can handle PCM up to 768 kHz, but not every conversion is clean because of the “power of 2” issue.
For example, 44.1 kHz material should be output at 705.6 kHz instead of 768 kHz.

Would it be possible (perhaps as a new feature) for the system to check the source sample rate and output the maximum rate that respects the “power of 2” relationship? That is, 44.1 kHz signals would go out at 705.6 kHz instead of 768 kHz, 48 kHz signals at 768 kHz, and so on.

Software-wise it shouldn’t be a big change, but for a high-end dedicated streamer it matters.
Please let me know what the Dev Team thinks about it.

Consider that the hardware may use a 48 MHz oscillator for USB and a 54 MHz one for the SBC. You should forget about 44.1 kHz sampling and work at 48 kHz or its multiples.

44100 Hz and 48000 Hz are both divisible by 300 (their greatest common divisor), so the conversion reduces to a ratio of small integers, 147:160, and you have no problems even with integer arithmetic.
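A quick way to check this arithmetic for any pair of rates is to divide both by their greatest common divisor. A minimal sketch; the function name `reduced_ratio` is just illustrative:

```python
from math import gcd


def reduced_ratio(src_hz: int, dst_hz: int) -> tuple[int, int]:
    """Return the resampling ratio src:dst in lowest terms."""
    g = gcd(src_hz, dst_hz)
    return src_hz // g, dst_hz // g


for dst in (48_000, 96_000, 192_000, 705_600, 768_000):
    print(f"44100 -> {dst}: {reduced_ratio(44_100, dst)}")
```

This prints 147:160, 147:320, 147:640, 1:16 and 147:2560 respectively. A 1:16 result is pure integer upsampling; 147:2560 means a rational resampler conceptually upsamples by 2560 and decimates by 147.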

You can take 44100 Hz, 16-bit audio, convert it to a multiple of 48 kHz at 64-bit precision, and then reduce it to 16 bits, 24 bits, or 32-bit float (IEEE 754 single precision, or double if you prefer).

There are precautions to take, but since software does the work you won’t have to think about them much. Try to choose the best resampler; from SoX onwards you will not easily find audible problems.

Your DAC will do the same job: it increases the sample rate it receives at its input to give you the best possible conversion. When you do hear differences, they are due to the different output filters, which vary with the input sample rate.

Let the DAC do this job, not Volumio; it’s the best choice. For example, the Burr-Brown PCM1716 (today Texas Instruments) uses 8x oversampling and 8 levels to achieve higher precision. The first USB DAC, the PCM2702, requires a 12 MHz oscillator, and when it receives isochronous packets at 44.1 kHz it adds noise not mentioned in the datasheet, essentially because the amount of data transmitted varies from packet to packet. But we are talking about the first USB DAC, the issue may be moot today, and it certainly doesn’t apply to those of us who bought Volumio’s excellent hardware.

Thanks for your response, but in my case I really can hear differences when upsampling to 32/768 kHz.
It’s not night and day, far from it, but there is a difference at the micro level and in instrument separation.
What I suggest is that Volumio upsample by factors of 2 to the maximum rate your DAC can handle.
Say your DAC can handle PCM at 768 kHz: the output should then be set automatically to 705.6 kHz for 44.1 kHz material, not 768 kHz.
For a streamer positioned in the high-end market, this is a big flaw in my view.
Right now you are introducing errors in the digital domain.

Here we have a much bigger mistake in the language domain: I wrote that you should not do the oversampling in Volumio’s software. Of course the differences you describe exist; I hear them too. But the small computer inside our devices has limited computing power and little RAM, and any extra workload has to fit into a narrow time margin because it must run in real time. The simplicity you suggest is necessary, but whatever you make Volumio (the software) do, even normalising audio levels, will have an impact on sound quality.

I imagine you use a USB DAC. It is the best component to perform hardware oversampling, but don’t assume it receives 44100 Hz perfectly: it only does so on average.

Isochronous transfers are used to transfer data in real-time between host and device. When an isochronous endpoint is set up by the host, the host allocates a specific amount of bandwidth to the isochronous endpoint.

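On high-speed USB, isochronous endpoints can be serviced once per 125 µs microframe, i.e. 8,000 transfers per second (full-speed USB, used by early devices such as the PCM2702, has 1 ms frames, i.e. 1,000 per second).

In USB Audio each transfer always carries a whole number of samples. At 44.1 kHz and 32 bits per channel, transfers mostly alternate between 48 and 40 bytes (6 and 5 stereo samples), with an occasional extra 6-sample transfer, so that the average works out to 5.5125 samples per transfer, i.e. 44.1 samples per millisecond.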
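To see why 44.1 kHz never maps to a whole number of samples per transfer while 48 kHz does, here is a small sketch assuming high-speed USB (8,000 transfers per second) and a simple accumulator that sends an extra sample whenever the running total allows. Real devices rely on feedback endpoints and their own scheduling; `samples_per_transfer` is a hypothetical illustration, not the USB Audio class algorithm.

```python
from fractions import Fraction


def samples_per_transfer(rate_hz: int, transfers_per_s: int = 8000,
                         count: int = 80) -> list[int]:
    """Hypothetical scheduler: spread rate_hz stereo frames over `count`
    transfers so the running total never drifts by a whole frame."""
    per_transfer = Fraction(rate_hz, transfers_per_s)  # exact average
    sent = 0
    pattern = []
    for n in range(1, count + 1):
        due = int(per_transfer * n)  # floor of frames owed after n transfers
        pattern.append(due - sent)
        sent = due
    return pattern


p44 = samples_per_transfer(44_100)  # 80 transfers = 10 ms
p48 = samples_per_transfer(48_000)
print(sorted(set(p44)), sum(p44))   # mix of 5 and 6 frames, 441 in total
print(sorted(set(p48)), sum(p48))   # always 6 frames, 480 in total
```

Over 10 ms the 44.1 kHz stream delivers exactly 441 frames, split into 41 transfers of 6 and 39 of 5; at 48 kHz every transfer carries 6 frames (48 bytes at 32-bit stereo).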

That is why I say it is better to switch to 48 kHz, which, as you will have noticed, is also an exact submultiple of the clock (exactly 6 samples per 125 µs transfer). Look at how the hardware works at a low level and you realise this choice has already been made; what’s missing is someone to explain it to a music industry that prefers selling songs by the pound to taking care of quality. You have no idea how many plugins used in production resort to oversampling; in fact, today the real limit is not the hardware.

For example, the TI SRC4392 datasheet (it also confirms your impressions: the DAC sounds better if you feed it 96 or 192 kHz):

Used inside the Musical Fidelity M3x DAC:

With this DAC you will hear big differences when oversampling the output from the Volumio interface. Read the manual and you will find out why: the output filter changes too (see page 8). A device that sounds four different ways does not match my concept of high fidelity.

As you can see, I fully understand that your chain sounds better when you have higher sample rates at the input, and I also understand that the software doesn’t always do a good job.

I’ve tested my devices with Volumio 3 and 4, and in both cases I can resample to exact multiples of 44.1 kHz. If your Rivo Plus doesn’t show those values in the drop-down menu, we’ll have to wait for @Nerd or @Darmur to explain why. The only thing is, if your library were 48 kHz, you wouldn’t have encountered this problem.

But out of curiosity, which DAC did you connect to the Rivo Plus?

Hey @Kingpin,

@Celona, thanks for the ping. Happy to weigh in on this one, because there is a real engineering question buried in it that is worth separating from the marketing wrapper. Fair warning: being a nerd obsessed with physics, math and whatnot, I will go through it in my usual OCD manner.

A bit on terminology first

“Power of 2” is not quite the property you are after. 705.6 kHz is 16 x 44.1, and 16 happens to also be 2^4, so it is both. But what actually matters for the math is “integer multiple of the source rate”, not “power of two”. You could upsample 44.1 by 3 to 132.3, or by 5 to 220.5, and those would be just as clean mathematically as 705.6. The reason the “power of 2” framing comes up so often is historical: old converter chips implemented oversampling as cascaded 2x stages because half-band FIR filters are cheap to build that way, and DSP textbooks lean on power-of-two FFT block sizes. It is an implementation convenience, not a quality property.
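To make the cascaded 2x idea concrete, here is a toy sketch of one 2x stage using the classic 7-tap half-band kernel [-1, 0, 9, 16, 9, 0, -1] / 16 (a cubic midpoint interpolator). Real converter chips use far longer half-band filters with much better stopband rejection, and the index clamping at the edges is an arbitrary simplification. Four stages in a row give the 16x of the 44.1 → 705.6 example.

```python
def upsample_2x_halfband(x: list[float]) -> list[float]:
    """One 2x stage with the 7-tap half-band kernel [-1,0,9,16,9,0,-1]/16.
    Outputs 2n-1 samples for n inputs."""
    n = len(x)

    def at(i: int) -> float:
        # Clamp out-of-range indices to the first/last sample (crude edges).
        return x[min(max(i, 0), n - 1)]

    out: list[float] = []
    for i in range(n):
        out.append(x[i])  # even outputs: only the centre tap hits a real sample
        if i < n - 1:
            # odd outputs: only the four non-zero side taps contribute
            out.append((-at(i - 1) + 9 * at(i) + 9 * at(i + 1) - at(i + 2)) / 16)
    return out


def upsample_by_power_of_two(x: list[float], stages: int) -> list[float]:
    for _ in range(stages):
        x = upsample_2x_halfband(x)
    return x


y = upsample_by_power_of_two([0.0, 1.0, 0.0, -1.0, 0.0], stages=4)
print(len(y))  # 65: each stage maps n samples to 2n - 1
```

The half-band structure is why 2x stages are cheap: every other output is a copy of an input sample, so only half the outputs need any arithmetic. That is a cost argument, not a quality one.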

So the request is really “always upsample to the nearest integer multiple of the source rate” (in practice, the highest such multiple the DAC supports): a 44.1 source goes to 705.6, a 48 source goes to 768, and so on. That is a legitimate and implementable feature, and worth discussing on its own merits.
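A rough sketch of the requested selection logic, assuming the player knows the list of rates the DAC accepts (the list below is a typical one, not taken from any specific device, and `integer_multiple_target` is a hypothetical name, not Volumio code): pick the highest supported rate that is an exact integer multiple of the source rate, and fall back to the source rate if none qualifies.

```python
# Assumed DAC capability list; a real player would query the device.
SUPPORTED_RATES_HZ = [44_100, 48_000, 88_200, 96_000, 176_400, 192_000,
                      352_800, 384_000, 705_600, 768_000]


def integer_multiple_target(src_hz: int,
                            supported: list[int] = SUPPORTED_RATES_HZ) -> int:
    """Highest supported rate that is an exact integer multiple of src_hz."""
    candidates = [r for r in supported if r >= src_hz and r % src_hz == 0]
    return max(candidates, default=src_hz)


for src in (44_100, 48_000, 88_200, 96_000, 192_000):
    print(src, "->", integer_multiple_target(src))
```

With this rate list, “integer multiple” and “power-of-two multiple” give the same answers only because the list itself is built from 2^n multiples of 44.1 and 48 kHz.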

The “introduces errors in the digital domain” claim

This is the bit that needs evidence rather than assertion, and the SRC4392 datasheet that was linked earlier in the thread is itself a useful piece of evidence. It lets us compare integer and non-integer conversion in measurable numbers from one device:

fSIN → fSOUT (ratio): THD+N, dynamic range (DR)
44.1 → 44.1 kHz (1:1, synchronous): THD+N -140 dB, DR 141 dB
44.1 → 48 kHz (147:160, non-integer): THD+N -140 dB, DR 141 dB
44.1 → 96 kHz (147:320, non-integer): THD+N -140 dB, DR 141 dB
44.1 → 192 kHz (147:640, non-integer): THD+N -137 dB, DR 138 dB
96 → 192 kHz (1:2, integer): THD+N -137 dB, DR 138 dB

[TI SBFS029D, pages 4-5]

The non-integer 44.1 → 48 conversion measures identically to the synchronous 44.1 → 44.1 pass-through at -140 dB THD+N and 141 dB dynamic range. The integer 96 → 192 conversion measures slightly worse (-137 dB) than the non-integer 44.1 → 96. So on this hardware ASRC chip the integer vs non-integer distinction does not produce a measurable advantage in either THD+N or dynamic range.

These artefacts sit roughly 140 dB below full scale. For context, that is below the noise floor of even very good DAC analog stages, which typically reach around 120 to 130 dB of dynamic range, and far below the threshold of hearing at normal listening levels. Modern software resamplers (SoX VHQ, r8brain, the better Linux ones) measure even cleaner than this 2007 chip. The “rounding errors and phase distortion associated with fractional conversions” exist mathematically but are not the audible artefacts in any modern chain.
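To put -140 dB in perspective, here is a short sketch converting dBFS to a linear fraction of full scale and comparing it with the textbook SNR of an ideal N-bit quantiser, 6.02 N + 1.76 dB (full-scale sine, no dither). These are theoretical figures, not measurements of any particular DAC.

```python
import math


def dbfs_to_fraction(db: float) -> float:
    """Linear amplitude relative to full scale."""
    return 10 ** (db / 20)


def ideal_quantiser_snr_db(bits: int) -> float:
    """Textbook SNR of an ideal N-bit quantiser for a full-scale sine."""
    return 6.02 * bits + 1.76


print(f"-140 dBFS = {dbfs_to_fraction(-140):.1e} of full scale")
for bits in (16, 24):
    print(f"{bits}-bit ideal SNR: {ideal_quantiser_snr_db(bits):.1f} dB")
```

A -140 dB artefact is one ten-millionth of full scale: about 40 dB below the ideal 16-bit floor (98.1 dB) and within a few dB of the ideal 24-bit one (146.2 dB).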

The Musical Fidelity DAC linked as evidence

Worth flagging this because it cuts against the original argument rather than supporting it. The M3x DAC manual states explicitly on page 7:

“Upsampling is always on for PCM data up to 192kHz on any input.
Incoming sample rates up to 192kHz are resampled to 192kHz.”

And page 10 confirms the SRC converter is the SRC4392.

So whatever you send into this DAC at 44.1 kHz is resampled by a non-integer ratio to 192 kHz internally, using the exact chip whose measurements appear above. The DAC sounding different at 96 vs 192 input is real, but it is the result of the DAC selecting a different input filter profile (see page 8, where the front-panel filter button is documented), not the math of the conversion ratio. If your chain ends in this DAC or anything architecturally similar, the integer-ratio Volumio output is being undone milliseconds later by the DAC itself.

What you might actually be hearing

To be clear, I am not dismissing the impression. Reiss 2016 (JAES 64:6, open access) is a meta-analysis of 18 studies covering 400+ participants and 12,500+ trials, and it found a small but statistically significant ability of trained listeners to discriminate hi-res from CD-quality. So “audible difference exists at all” is on the table, in trained ears, on the right material. Meyer & Moran 2007 (JAES 55:9) did 554 ABX trials on hi-end systems and found chance-level performance, which is also data.

If you are reliably hearing a difference between 705.6 and 768 output from a 44.1 source, the candidate explanations in order of likelihood are:

  1. The DAC selects a different reconstruction filter at different rates. Many DACs do this without exposing it to the user. The SRC4392 itself has selectable steep vs slow roll-off filters with different ringing behaviour.

  2. Level mismatch. Even 0.2 dB is reliably perceived as “better”.

  3. Expectation bias. Sighted listening is famously susceptible.

  4. A genuine resampler implementation difference in the Rivo+ pipeline (worth investigating if measurable, but distinct from the ratio question).

If you want to know which of these is doing the work, the protocol is well established: capture both outputs, level-match to within 0.05 dB, and ABX with foobar2000 or Lacinato, with a minimum of 16 trials. At 16 trials, 12 correct is the lowest score that reaches p < 0.05 (one-sided), and 13/16 leaves more margin. It is the same protocol the published studies used. Below that bar, the difference is plausibly noise.
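The pass mark comes from the one-sided binomial tail: the probability of scoring at least k out of n by pure guessing, with a 50% chance per trial. A minimal sketch using only the standard library (`abx_p_value` is an illustrative name, not a function from foobar2000 or Lacinato):

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """P(at least `correct` right out of `trials`) under pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for k in (11, 12, 13):
    print(f"{k}/16: p = {abx_p_value(k, 16):.4f}")
```

This gives roughly 0.105 for 11/16, 0.038 for 12/16 and 0.011 for 13/16, so 12/16 already clears p < 0.05 and 13/16 clears it with margin.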

On the feature itself

I am open to “upsample to nearest integer multiple of source rate” as a user-selectable preference. It is a sensible option to offer alongside the existing modes, and the implementation is not unreasonable. What I would push back on is framing the current behaviour as a fault or as introducing audible errors, because the measurement evidence (including the very datasheet linked above) does not support that framing. Let us discuss it as a preference rather than a defect, and we can look at where it fits the pipeline.

References for anyone who wants to dig in

  1. TI SRC4392 datasheet, SBFS029D, pages 4-5 (THD+N and dynamic range across all ratios), page 8 (filter characteristics), page 35 (group delay options).

  2. Musical Fidelity M3x DAC manual, pages 7, 8, 10 (always-on upsampling to 192, filter button, SRC4392 specified).

  3. Meyer, E. B. and Moran, D. R. (2007). “Audibility of a CD-Standard A/DA/A Loop Inserted into High-Resolution Audio Playback”. JAES 55:9, 775-779. https://www.aes.org/e-lib/browse.cfm?elib=14195

  4. Reiss, J. D. (2016). “A Meta-Analysis of High Resolution Audio Perceptual Evaluation”. JAES 64:6, 364-379. Open access: https://aes2.org/publications/elibrary-page/?id=18296

  5. Infinite Wave resampler comparisons (sweep, impulse, stopband measurements for SoX, SRC, r8brain and others): https://src.infinitewave.ca/

  6. Smith, J. O., “Digital Audio Resampling Home Page”, CCRMA, Stanford University.


@Darmur, any further thoughts?

Kind Regards,


Thanks for the explanation @Celona / @Nerd / @Darmur.

My DAC is a Gustard X26III.
For PCM it accepts max 32/768kHz.

The Rivo+ is connected through an I2S cable to the DAC.

If you turn on NOS mode and stream at the maximum rate, it bypasses the filtering options in both the custom FPGA DSP and the ESS chip (that’s what I was told).

When I was using HQPlayer, I could see this “upsample to nearest integer multiple of source rate” behaviour on my DAC’s display, which showed an output sample rate corresponding to the input rate.
That’s why I wondered why the Rivo+ doesn’t do the same.
Again, thanks for your explanations!

Sorry it took me so long to reply. I set up a Raspberry Pi Zero 2 W (a quad-core 64-bit ARM Cortex-A53 processor clocked at 1 GHz, with only 512 MB of SDRAM) running Volumio 4, logged in over SSH from Terminal, and generated a test file:

time sox -n -b 16 -c 2 -r 44100 16bit441.wav synth 3 sine 1000 vol -1dB

I didn’t find any particular problems: even on the least powerful hardware Volumio 4 runs on, writing the file to the SD card takes about 1% of the hardware resources.

Using an I2S DAC HAT, I resampled the test file on the microSD card to every possible rate up to 384 kHz, for example:

sox 16bit441.wav -b 32 32bit384.wav rate -v -L 384k

Still not satisfied, I played a web radio stream with the output set to 384 kHz at 32 bits, and meanwhile repeated the same operation from the terminal, without noticing any problems.

On your hardware this conversion should have a limited impact, roughly 10% of the resources. Sonically, I noticed more pronounced sibilance in voices, but no problems.

I left 1 dB of headroom in the test file because resampling reconstructs the waveform between the samples, and those inter-sample peaks can exceed 0 dBFS and clip. You may run into this if the loudness war has pushed peaks in your library above that threshold.
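Why the headroom matters: a signal can peak between samples higher than any stored sample value, and a resampler (like a DAC’s reconstruction filter) reconstructs those in-between values. A minimal sketch of the textbook worst case, a sine at a quarter of the sample rate with a 45° phase offset, normalised so its stored samples hit exactly 0 dBFS:

```python
import math

FS = 44_100
F = FS / 4  # quarter-rate sine: each sample lands 45 degrees away from a crest
raw = [math.sin(2 * math.pi * F * k / FS + math.pi / 4) for k in range(16)]

peak = max(abs(s) for s in raw)       # about 0.7071 (sin 45 degrees)
samples = [s / peak for s in raw]     # normalise: stored samples peak at 0 dBFS

sample_peak = max(abs(s) for s in samples)
true_peak = 1.0 / peak                # amplitude of the underlying continuous sine
print(f"sample peak: {20 * math.log10(sample_peak):+.2f} dBFS")  # +0.00
print(f"true peak:   {20 * math.log10(true_peak):+.2f} dBFS")    # +3.01
```

In this contrived case the continuous waveform overshoots the stored samples by about 3 dB. Lowering the level before resampling (as the `vol -1dB` in the test file does) is what keeps such overshoots from clipping.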

All you have to do is try it out. To use the minimum-phase filter, change -L to -M; of course, the real final test will be with your ears.
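To repeat the test for several rates and both filter phases, a small helper can build the same sox command lines shown above. The helper names and the output file naming are hypothetical; only the options already used in this post appear (`-b 32`, `rate -v -L` / `-M`), and sox is only run via `subprocess` if you pass `execute=True`.

```python
import shlex
import subprocess


def sox_resample_cmd(src: str, dst: str, rate: str, phase: str = "L") -> list[str]:
    """Build: sox SRC -b 32 DST rate -v -L|-M RATE."""
    if phase not in ("L", "M"):
        raise ValueError("phase must be 'L' (linear) or 'M' (minimum)")
    return ["sox", src, "-b", "32", dst, "rate", "-v", f"-{phase}", rate]


def run_all(src: str, rates: list[str], execute: bool = False) -> None:
    for rate in rates:
        for phase in ("L", "M"):
            cmd = sox_resample_cmd(src, f"32bit{rate}{phase}.wav", rate, phase)
            print(shlex.join(cmd))  # show the exact command line
            if execute:
                subprocess.run(cmd, check=True)


run_all("16bit441.wav", ["88.2k", "96k", "176.4k", "192k", "352.8k", "384k"])
```

Printing first and executing only on request makes it easy to check the commands before letting them write files to the SD card.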

I’ll come back later to finish this message; I have to run to the office now.