Introduction: As you are probably aware, over the last 2 months on this blog, I've been collecting data to determine the audibility of decoded MQA versus the same piece of music originating from the "master" high-resolution source. Please refer to the Test Invite article from mid-July 2017.
Over the last couple of years, I have been curious about the MQA technique and have followed the evolution of the "technology" as it has been touted as the next step in digital music playback. In that time, I have tried to develop an understanding of what it's doing beyond the superficial talk of being "revolutionary", "fundamentally changing the way we all enjoy music", nebulous claimed links to "neuroscience", or talk of bringing the "studio sound" to the consumer. If you look back, I have written articles looking at:
Bit by bit, we've dissected and discussed pieces of the technology and claims that have been made. We have an understanding now of the types of upsampling digital filters being used to "render" for example. But one piece - an extremely important piece - so far has not been well explored. This is the "black box" of the MQA encoding / decoding step. Since we do not have access to the MQA encoder, what we can explore is the outcome in the form of a more systematic listening test beyond anecdotal comments. Over the years, stories from all kinds of people in the audiophile press that the technique simply "sounds better" have surfaced. Typically, this is described as an improvement by a large margin. For example, a few choice claims:
"An inconvenient truth: MQA sounds better!" (J. Darko)
"With MQA, I heard far more definition and separation between instruments and lines in the lighter beginning of the track." (J. V. Serinus)
"Whatever the provenance, a consistent factor in my auditioning of the decoded MQA files was a sense of ease to the sound. High frequencies were in no way dulled, but the treble was consistently sweet." (John Atkinson)
"We first listened to one of Peter’s spectacular opera recordings (Tosca) in the recording’s original 88.2kHz/24-bit format. Then, seconds into the MQA version, my jaw dropped—literally. MQA’s dramatic superiority made the original high-resolution file sound like a pale imitation of the performance, a view shared by Peter. Before the music even began the hall sounded larger, with the acoustic better defined. A glare was stripped away from instrumental textures, leaving behind a gorgeously liquid rendering of timbre. The positions of instruments on the stage, and of the musicians within the acoustic, were precisely defined. Even the applause at the end (it was a live recording) was vastly more realistic." (Robert Harley)
"The entire space of the recording opened up, unfolded?, into a more realistic-sounding space; more relaxed, more air, greater ease. Coupled with this improved spatial information, which I'd classify as RFO (Really Fucking Obvious), instruments took up a more solid position within this improved space and they sounded subtly ever that much sweeter." (Michael Lavorgna)
"One of the things we’re told is that a big advantage to MQA files is that they are corrected for time-domain smearing. How this translates to your ears is anybody’s guess, but on Babe I’m Gonna Leave You the A/B difference between the 16/44 and the 24/48 MQA version was not subtle to me. An increased clarity of individual instruments – guitars, and percussion in particular. Voices took on a more human quality in their room presence. Everything in the song seemed to have an increased jump factor to it, a more palpable, tangible musicality to the experience." (Rafe Arnott)
"DSD and high-resolution downloads never sound completely right or real to me. MQA does." (Herb Reichert)

Of course, this is all on top of the advertising claims by Meridian & MQA. As discerning consumers, we can appreciate the tendency of advertisers to take some liberal "artistic license" in their claims.
I fully accept that the reviewers above are speaking about different music and with different gear. However, they all claim that the sound, once processed through the MQA technique, is clearly different and presumably better - an endorsement that this "format" is worthy of the audiophile's attention.
It is on the basis of this background that I decided it was time to put these kinds of claims to the test. And who better to turn to for answers than "real life" audiophiles out there in the wild. You... Ladies and gentlemen!
Procedure: Before we can get to any results, as in any "more academic" endeavor, we need to set the stage for what was done here to understand the intent and procedure.
First, because this is a naturalistic study intended to be a home evaluation distributed over the Internet, I must try my best to maintain the "blind". In an era where anyone can download Audacity freely and have a look at waveforms and FFTs, the only way to do this is to make sure obvious differences don't stick out on superficial analysis. This was why the decision was made to:
1. Compare from the basis of starting at 24/88 or 24/96. To maintain the highest sound quality, it's best to keep the testing procedure within the digital domain rather than doing something like recording the output from a DAC. The only way to do this is to capture the digital data coming out of the "MQA Core" decoder which is at a native 24/88 or 24/96 depending on the base frequency of the MQA file. At this time, TIDAL is really the only practical way to consume a substantial amount of MQA-encoded music. Since most TIDAL subscribers would not have MQA-capable DACs, in real life, this software-decoded 88/96kHz signal would be the audio data being heard. And from the perspective of Hi-Res Audio, I know of no research evidence to say anything higher than 88/96kHz is needed to achieve sonic transparency.
2. Use the same upsampling filter so that we achieve an "apples-to-apples" comparison of the MQA decode. As shown previously, MQA uses short, "leaky" FIR filters when upsampling the 88/96kHz decode to achieve the final 192+kHz output as appropriate. This seems to be what "rendering" is about when MQA is enabled (such as with the Dragonfly Black). The filtering used by MQA creates obvious FFT high frequency aliasing which would be easy to spot if I did not standardize the way upsampling is handled. Yes, I could have just provided 24/96 and 24/88 files for testers to try, but in the interest of modeling the filtering so that everybody hears the output as through MQA's short minimum-phase filter, this provides another level of standardization and opportunity for all to experience what such a filter would sound like.
Remember that most popular songs on TIDAL are simply output at 88/96kHz, so I settled on the "standard" 24/96 filter shown with the Dragonfly Black and Mytek Brooklyn DACs previously and which can be easily modeled with iZotope RX 5:
This impulse response is in fact also the default filter used in the Mytek Brooklyn when playing back anything in 24/96 whereas with the Dragonfly DAC, the default non-MQA filter is a typical steep roll-off variety. As you'll see in the section below on song selection, the second song "Gjeilo: North Country II" in fact originates as a 24/96 master. Therefore what you hear on your DAC of this track upsampled to 24/192 using the above filter is essentially what you're going to hear on an actual MQA DAC.
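As an aside, the character of such a short, "leaky" minimum-phase upsampling filter is easy to model in a few lines. The sketch below is illustrative only - the tap count, window, and cutoff are my assumptions, not MQA's actual coefficients - but it shows why a filter this short leaves only modest stopband attenuation, hence the ultrasonic imaging visible on FFTs:

```python
# Illustrative sketch only -- filter length, window, and cutoff are my
# assumptions, not MQA's actual coefficients.
import numpy as np
from scipy import signal

fs_in = 96_000           # MQA Core decoded rate
fs_out = 2 * fs_in       # 192kHz "rendered" output rate

# Short linear-phase lowpass prototype, then convert to minimum phase.
# (scipy's homomorphic method yields a filter of about half the length
# whose magnitude approximates the square root of the prototype's.)
proto = signal.firwin(31, cutoff=0.5, window="hamming")  # cutoff = fs_in/2
minphase = signal.minimum_phase(proto)                   # 16 taps

# 2x upsample: zero-stuff, lowpass-filter, restore the gain.
x = np.random.randn(4096)            # stand-in for decoded audio samples
up = np.zeros(2 * len(x))
up[::2] = x
y = 2.0 * signal.lfilter(minphase, 1.0, up)

# Such a short filter only mildly attenuates the images above the old
# Nyquist frequency -- hence the aliasing/imaging visible on an FFT.
w, h = signal.freqz(minphase, worN=8192)
stop = 20 * np.log10(np.abs(h[w > 0.75 * np.pi]).max())
print(f"worst-case stopband attenuation: {stop:.1f} dB")
```

Run against real music instead of noise, an FFT of `y` would show the same kind of high-frequency imaging discussed in the earlier filter articles; a long "brick wall" filter would push that residue below the noise floor instead.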
Now, let's talk about the song selection... I selected 3, all from the 2L Test Bench. Why, you might ask?
1. For one, there's the matter of convenience. They are meant to be used as test material so I can at least feel better about distributing them for the intended testing purpose provided by the copyright holder.
2. We know that the provenance is indeed hi-res in origin so there is the potential for MQA's encoding to affect the noise floor. Since we know MQA uses the lowest significant bits to "fold" down into, the potential dynamic range with MQA encoding would be less than a true 24-bits (probably between 18 and 20-bits depending on material).
Some people have complained that I did not use more "popular" music even if not real high-resolution and that maybe these sound better through the MQA process. I do not believe this is reasonable because of what MQA claims to be - an encoding technique that brings high-resolution to streaming. How would we assess that unless there was some potential to hear the benefits of hi-res? Furthermore, the whole basis of the "de-blurring" is based on having access to the initial recording and knowledge of equipment used. Is there any reason to believe whatever timing improvement is made by MQA can affect standard or even low-res audio?
3. These were MQA's demonstration tracks! Remember, at CES 2016, these 2L tracks were presented to audiophiles as examples of the benefits of this "revolutionary" technology. It was supposed to "take off big-time in 2016". 2L started charging a premium for these MQA files, often priced between the original DXD-samplerate downloads and the 96/192kHz downsamples. What better tracks to use than those squarely aimed at being a showcase!
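On point 2 above, the arithmetic is straightforward: each bit of a PCM word contributes roughly 6dB of dynamic range (quantization SNR ≈ 6.02n + 1.76 dB for n bits), so surrendering the bottom 4-6 bits of a 24-bit word to the MQA "fold" caps the baseband at roughly 18-20 bits' worth:

```python
# Rough arithmetic: if MQA occupies the lowest significant bits of a
# 24-bit word with folded ultrasonic data, the effective dynamic range
# of the baseband drops accordingly.
def dynamic_range_db(bits: int) -> float:
    """Ideal quantization SNR for an n-bit PCM channel, in dB."""
    return 6.02 * bits + 1.76

for bits in (24, 20, 18, 16):
    print(f"{bits:2d} bits -> ~{dynamic_range_db(bits):5.1f} dB")
```

So an 18-20 bit effective channel still comfortably exceeds 16-bit CD dynamic range, but it is not "true" 24-bit, which is part of what makes a well-recorded hi-res original a fair comparison target.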
As I stated when I introduced this test, the 3 tracks are:
Arnesen: Magnificat 4. Et miseracordia: (~1:53 sample) beautifully recorded classical vocal piece with orchestral accompaniment. Listen for the vocal placement, instrument soundstage depth and width, tonal quality in the voice, etc...
Gjeilo: North Country II: (~2:00 sample) a subtler piece consisting of primarily piano music with some understated accompaniment. Listen for the purity and sense of three-dimensional "realism" of the instruments. Listen to the temporal characteristics such as attack and quality of the decay of the notes.
Mozart: Violin Concerto in D maj (Allegro) [Original 2006]: (~2:00 sample) a lively and beautifully executed orchestral piece highlighting, of course, the violin. Great tempo, timbre of instruments, attack, and "see" if you can delineate the spatial positioning of instruments in the soundstage. I would have loved to test out the 2016 MQA remix but there appears to be something wrong with the file as the MQA version would not decode properly!

Each track has its own unique sound, instrumentation, and complexity. The hope of course is that with these differences, we might be able to detect audible variation and discern preferences in A/B testing.
As you can see, each sample was trimmed to approximately 2 minutes in duration. I trust this is a long enough sample length to hear any differences, allow for comparisons, and provide some opportunity for listeners to evaluate their emotional engagement.
So, how then were the sample tracks created? Simple...
1. Start with the MQA tracks and the original studio master. The North Country II track starts as 24/96, so I kept it at that. However, the Magnificat and Violin Concerto in D maj tracks began as DXD (24/352.8). As I stated above, since I wanted to compare apples-to-apples, I downsampled these to 24/88.2kHz using iZotope RX with a very sharp "brick wall" filter. Why the very sharp brick-wall filter? Why not? :-) Since audiophiles often claim that brickwall filters are bad, we might as well do this as a "worst-case" scenario to compare MQA and standard PCM. Remember, this is still at 24-bit/88.2kHz sampling rate, so whatever impulse response "ringing" people might fret about would be ultrasonic:
[Note that the frequency axis should be 2x at 88.2kHz, so this is an "ideal", very sharp resampler with its cutoff frequency right at 44.1kHz.]
2. Decode the MQA tracks in software. The Dragonfly Black was not plugged in at the time of the decoding; you see, I'm using a recent-generation Apple MacBook, so what was captured is the "MQA Core" software decode. To "rip" the decoded audio, I used Soundflower and Audacity (very easily done), making sure of course that the samplerate and bit-depth were appropriate.
3. The tracks (88/96kHz) were upsampled to 176.4/192kHz using the MQA-like filter setting as shown above. iZotope RX 5 was used to perform the minimum-phase, gentle roll-off filter.
4. For the final edit, I simply used Adobe Audition on my Windows PC workstation to trim the samples at the same point and to the same duration. 1.5-2 second fades were added at the start and end.
5. The samples were randomized by flips of a coin, renamed, ZIPped, and uploaded to download sites: filehosting.org, Amazon Drive, and Private Bits (thanks again, Ingemar).
6. Some advertising / invitations were sent out on various audiophile forums - AudioAsylum, Steve Hoffman Forum, Squeezebox Forum, Computer Audiophile, Hydrogen Audio, among others where audiophiles linger! Of course, those who visit my blog here would have been reminded in my various posts weekly to check out the blind test and encouraged to submit their results.
7. Arbitrarily, a ~2-month window for submission of results on a survey site was opened (July 15 - September 8). Responses to most of the survey questions were mandatory to ensure completeness. Although not recorded within the data set, cookies were used to prevent re-submission of completed data from the same computer (although for some reason there were 2 duplicate submissions which I manually removed). Within the survey, I asked about:
- Demographics - gender, age by decade, continent of origin, involvement in music and audio reviewing.
- Preference for either sample A/B for each of the tracks.
- Confidence for the preference (1 = no real difference to 4 = clear differences).
- Information about audio system: components (esp. headphones vs. speakers), approximate price.
- Text boxes for other information - including subjective impressions.

As you can see, to get this done properly, the "subjects" needed to have a DAC / computer / server capable of 24/176.4 and 24/192 playback. I suggested taking 30 minutes to run the comparisons while taking notes and 10 minutes to fill out the survey... I see that many of you took more time than suggested. Some used both headphones and speakers, some tried a couple of different DACs, some used the foobar ABX Comparator, etc... Hats off to you dedicated audiophile listeners!
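To make step 1 of the procedure concrete, here is roughly what a steep anti-alias downsample of DXD material looks like in code. This is a sketch only, with scipy's polyphase resampler standing in for iZotope RX (the Kaiser window parameter is my assumption, not RX's actual filter design):

```python
# Sketch of the DXD downsample step (not iZotope RX's exact settings):
# 24/352.8 -> 24/88.2, a clean 4:1 ratio, with a long windowed-FIR
# lowpass acting as a near "brick wall" anti-alias filter at 44.1kHz.
import numpy as np
from scipy import signal

fs_dxd, fs_target = 352_800, 88_200
x = np.random.randn(fs_dxd // 10)     # stand-in for 0.1s of DXD audio

# A high-beta Kaiser window gives very deep stopband attenuation,
# approximating a sharp cutoff right at the new Nyquist frequency.
y = signal.resample_poly(x, up=1, down=4, window=("kaiser", 14.0))
print(len(x), "->", len(y))
```

With a 4:1 integer ratio the polyphase approach is exact and efficient; any "ringing" from the sharp filter sits at 44.1kHz, well into the ultrasonic range, just as noted above.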
To keep myself from potentially expressing any biases, I refrained from looking at the results through the 2 month data collection period. Other than quickly glancing at the text comments on the survey to make sure there were no major issues and that the survey link was working properly, I had no idea how the preferences were leaning.
As usual, thanks to my "beta testers" for providing feedback and "bugs" before the test went live in July :-).
Questions I hope to answer: I trust it's clear that this is not a simple test where I'm just asking for a brief poll of "what sounds best". Beyond the headline statistics, within the data set there is the opportunity to explore nuances. Here are the main questions I wished to explore:
1. Is there evidence that the MQA sample actually sounded "better"? Presumably if this is the case, more people will show a preference towards the MQA sample in this blind test. This should be the case for example if there is some kind of beneficial time domain "deblur" embedded in the encoding/decoding of the data and not simply the selection of the 16 slightly different upsampling filters (which we can experiment with independently). This I think is one of the benefits of subjecting both the MQA decode and Hi-Res audio file to the same upsampling filter. It lets us tease out if there is anything special about the MQA CODEC itself; or conversely the potential that the partially lossy encoding might result in degraded sound quality even when properly decoded compared to an original 24-bit file.
I know some folks have raised the criticism that I did not include an option for "No Preference". There is a reason for this. It creates a forced choice so that even if one consciously doesn't have a clear preference, perhaps subconsciously there is still some pull to pick one over the other. The listener can still express that he/she heard essentially no difference with a low "Confidence" rating.
2. Based on the preferences, is there evidence that the difference is easily discernible? This should show up in the "Confidence" feedback. Did most listeners feel the sonic difference was obvious? Remember, even if the overall impression is that MQA sounds better, it doesn't necessarily mean much for the quality of listening if the magnitude of the difference is minimal!
3. Is there a difference in preference (standard Hi-Res vs. MQA decode) between headphone users and those who used speakers only? Obviously when it comes to "soundstage", the presentation will be quite different!
4. Let's look at the "golden ear" subgroup - those who were able to select the same - either MQA Core decode or direct Hi-Res PCM - for all 3 answers! What are the demographic characteristics (eg. age group, cost of system, headphone vs. speakers, rated confidence) of these "golden ears"?
I can think of a few more questions to answer... But I think this core set will be a great start :-).
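For question 1, the natural analysis is a simple sign test: if the MQA decode and straight hi-res PCM are audibly indistinguishable, each forced-choice preference is effectively a coin flip, so the count of "MQA preferred" responses should follow a binomial distribution. A sketch, using a purely hypothetical count (not the survey's actual results, which come in the next post):

```python
# Exact two-sided sign test: under the null of "no audible difference",
# preferences are Binomial(n, 0.5). The counts below are placeholders,
# NOT the survey's actual results.
from math import comb

def two_sided_binom_p(k: int, n: int) -> float:
    """Two-sided exact p-value for k 'MQA preferred' picks out of n."""
    p_i = lambda i: comb(n, i) * 0.5 ** n
    observed = p_i(k)
    # Sum the probability of every outcome at least as extreme as k.
    return sum(p_i(i) for i in range(n + 1) if p_i(i) <= observed)

# e.g. hypothetically, 50 of 83 listeners prefer the MQA sample:
print(f"p = {two_sided_binom_p(50, 83):.3f}")
```

The same machinery works per track or across all 249 comparisons; the confidence ratings can then be layered on top to see whether any preference that emerges was actually heard as obvious.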
As I mentioned last week, I received a total of 83 completed submissions. With 3 audio tracks each, that means I have data for 249 individual comparisons. Like I said before, I believe this is the largest data set for a "naturalistic" listening test of MQA where the listeners are actually at home (or wherever they prefer to listen), using their own equipment, in their own "sound space", sitting in the best listening spot, done in a blinded fashion. Surely this must provide a clearer picture than individual impressions and opinions, or a group of folks sitting around an unfamiliar sound system at an audio show!
Alas, my friends, I am out in Ottawa for work this week so will not be able to do the analysis until I get back to Vancouver. Anyone recommend a good music or hi-fi store here?
I'm also taking delivery of a new car next week, so I have some other toys to play with :-). In any event, I will endeavor to get the results out soon... No need to rush with the write-up; better to get the job done to satisfaction than a rush job IMO. Stay tuned as we proceed along!
Hope you're all enjoying the music...
NEXT: MQA Core vs. Hi-Res Part II: Core Results