Because the workload involved in changing test procedures like that is massive. Most likely there'll be a separate article for that, possibly covering fewer CPUs - there are 37 CPUs in the game test charts here, after all. 12 games x 37 CPUs x even just 10 minutes per game test = 74 hours of testing. Most likely each game test takes longer than that, and of course there's data collection, processing and analysis on top of this, plus all the time needed to build, tear down and re-build systems for testing, re-image OSes to avoid driver issues when swapping chips, and more. Even if you're able to run several tests in parallel some of the time, re-testing everything for a new test suite is still a massive time expenditure.
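
For anyone who wants to play with the numbers, here's a rough sketch of that back-of-the-envelope estimate. The function name and the `parallel_stations` parameter are made up for illustration, and it only counts raw benchmark run time, not the rebuilds, re-imaging or analysis mentioned above.

```python
def total_test_hours(games: int, cpus: int, minutes_per_test: float,
                     parallel_stations: int = 1) -> float:
    """Estimate benchmark run time in hours.

    parallel_stations is a hypothetical knob: it assumes the work splits
    evenly across identical test benches, which is optimistic.
    """
    total_minutes = games * cpus * minutes_per_test
    return total_minutes / 60 / parallel_stations


# Figures from the post: 12 games, 37 CPUs, 10 minutes per game test.
print(total_test_hours(12, 37, 10))  # 74.0
```

Bumping `minutes_per_test` to something more realistic, or adding per-CPU setup time, only pushes the total higher.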