The way I read the question: once a Spirit is fully physically manifest, it would be sensed just like any other concrete object or event (so by sound as well as sight and touch)...
That is the fully physically present stage, though, and this method seems to require being able to sense the Entity (its ~astral Form) fully and clearly before that. So that seems to be a step in the process:
(being able to make contact, then to see inwardly, then to see/hear/feel “inwardly” more clearly, then that “inward” seeing takes place somewhere (not the room you're in, but a separate Room/Place, like in a dream: two distinct places…), and then those are blended (via Rapture to Xrds… and the Shift happens)).
As I would describe it, each of those “Steps” is distinct in the process, so it would seem the final Perception does not use those “inward” qualities/channels of sensing… but getting to the final state requires…
But the key to the above (as I quickly scribble/type this out) is that "inward sensing" does not use the create-a-picture-in-your-head skills at all. It is more like when an AHA idea arises (it wasn't there, and then it is),
or when you're trying to think, "what was the name of that movie (or whatever)?" Think-think-think, and then whoosh, the image/memory/answer arises, like opening a door and looking into a room: you See what is there just by looking (thus Spontaneity and Clarity, as well as Surprise, as EA writes in QaVs).
I find that suddenly having an image be there is like using a different part of the brain or type of thinking, and thus a distinct way of paying attention, and having that sustained has a weird quality. As I wrote in another forum post, it is like noticing, while you've been watching a TV screen, that a second screen is visible with something on it, one you hadn't noticed before; in this metaphor the two Screens are in the same visual field / brain parts, whatever (I hope that makes sense). There is a sharp, HD quality to it. The jump into that type of sight is easily overwritten if the "normal mind-state" jumps in (à la Castaneda's Internal Dialogue of the usual position). Held for a bit, that state expands, then stabilizes... then it can reveal a different aspect, which can in turn expand and stabilize, etc. (this is how a Revealed Image transitions to Living Imagination, then into Xrds, etc.).
I hope the above helps. But to get to the question of “whether I have to train my senses first”: “senses” can mean many different things, so that could be confusing (not all are equal; right tool for the right job, so to speak).