This paper received the Best Peer-Reviewed Paper award. Virtual reality systems with multimodal stimulation and movement in up to six degrees of freedom pose novel challenges to audio quality evaluation. This paper adapts classic multiple-stimulus test methodology to virtual reality and adds behavioral tracking functionality. The method is based on ranking by elimination, performed while exploring an audiovisual virtual reality scene. The proposed evaluation method allows immersion in multimodal virtual scenes while enabling comparative evaluation of multiple binaural renderers. A pilot study is conducted to evaluate the feasibility of the proposed method and to identify challenges in virtual reality audio quality evaluation. Finally, the results are compared to those of a non-immersive off-line evaluation method.
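
As a rough illustration of ranking by elimination, the following minimal sketch derives a rank order from successive eliminations. It is not the paper's implementation: the function name `rank_by_elimination`, the callback `pick_worst`, the renderer labels, and the scores are all hypothetical, and the rule that the listener removes the worst remaining condition in each round is an assumption, since the abstract does not state the exact elimination rule.

```python
from typing import Callable, List, Sequence


def rank_by_elimination(
    conditions: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """Return conditions ordered from best (index 0) to worst.

    Assumption: in each round the listener removes the condition judged
    worst among those remaining. The source abstract does not specify
    the elimination rule, so this is only one plausible reading.
    """
    remaining = list(conditions)
    eliminated: List[str] = []
    while len(remaining) > 1:
        # Pass a copy so the callback cannot mutate our state.
        worst = pick_worst(list(remaining))
        if worst not in remaining:
            raise ValueError(f"unknown condition: {worst!r}")
        remaining.remove(worst)
        eliminated.append(worst)
    # The last surviving condition (if any) ranks best.
    eliminated.extend(remaining)
    return eliminated[::-1]


# Hypothetical stand-in for a listener: a lower score means worse quality.
scores = {"renderer_a": 0.7, "renderer_b": 0.2, "renderer_c": 0.9}
ranking = rank_by_elimination(
    list(scores), lambda rem: min(rem, key=scores.get)
)
print(ranking)
```

In a real test the `pick_worst` callback would be replaced by the participant's choice inside the virtual scene, and each elimination could be logged with a timestamp alongside any behavioral tracking data.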