Abstract: Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges. These challenges stem primarily ...