To approach prolonged context prompts efficiently, designs involve sturdy recall abilities. The 'Needle In the Haystack' (NIAH) evaluation steps a model's power to precisely recall data from the vast corpus of knowledge. We Improved the robustness of the benchmark by utilizing one of 30 random needle/question pairs for every prompt and testing on a