Typically, you would train the algorithm on one set of data and then test it on ...

Typically, you would train the algorithm on one set of data and then test it on another. So you might train it on data from FY 2010 and then test it out in a simulation of FY 2011.

The fear is that if you train it on FY 2010 and then it does well in a simulation of FY 2010, it might only be because it has stored some representation of a record of FY 2010 which is extremely predictive of FY 2010 but doesn't generalize well to any other year. Testing the algorithm against a simulation of FY 2011 would reveal this flaw.