Modern GPUs can render millions of polygons with ease; it is the combination with complex shading and other computational work that poses the real challenge.
There are always tradeoffs to be made with respect to the data set that eventually gets fed to the target GPU. We have to assume that people working in real-time graphics know enough to judge which models will fit their data set, and are therefore able to decide whether a given model is optimized enough for their needs.
For a stock media developer producing real-time content, the challenge is always to make every polygon count at whatever level of detail the content is being made.
However, where to make the tradeoffs with regard to texel resolution or mesh density depends on the final application of the model. A stock media developer cannot know what that application will eventually be, but he can target the model at a likely one.
For example, an anatomical 3D model of a human body including blood vessels, the nervous system, all internal organs, tendons, muscles, etc. would exceed 2 million polygons very quickly even when optimized, yet it would still pose no problem to display on a recent GPU in real time (assuming, for example, that it uses basic shading and not too much else is in the scene).
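To put rough numbers on that claim (illustrative figures only, not benchmarks), a quick back-of-envelope sketch in Python:

    # Rough estimate: can a ~2M-triangle model run at interactive rates
    # with simple shading? Numbers below are assumptions for illustration.
    tri_count       = 2_000_000       # the anatomical model, optimized
    target_fps      = 60
    tris_per_second = tri_count * target_fps        # 120 million tris/s required
    gpu_tri_budget  = 2_000_000_000                 # assumed raw throughput of a recent GPU
    print(f"Required: {tris_per_second / 1e6:.0f} M tris/s "
          f"(~{100 * tris_per_second / gpu_tri_budget:.0f}% of the assumed budget)")

Even at 60 fps, 2 million triangles works out to around 120 million triangles per second, a small fraction of the raw throughput of a recent GPU; the limiting factor tends to be shading and overall scene complexity rather than the mesh itself.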
Such a model would work perfectly well for a real-time educational visualization, but it would obviously contain far too much detail to be used as a regular game character.
Long story short, we cannot simply set a baseline for poly count.
And as Nixonpang put it, the best thing a content provider can do is report in as much detail as possible about the data he is providing.
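Something along these lines is what I mean by reporting the data (the field names here are made up for illustration, not any marketplace standard):

    # Hypothetical spec sheet a stock content provider could ship with a model.
    model_report = {
        "triangle_count": 2_140_000,
        "vertex_count":   1_090_000,
        "lod_levels":     [2_140_000, 510_000, 120_000],   # triangle count per LOD, if provided
        "texture_maps":   {"albedo": "4096x4096", "normal": "4096x4096"},
        "uv_layout":      "non-overlapping, single UV set",
        "rigging":        "none",
        "intended_use":   "medical / educational real-time visualization",
    }
    print(model_report)

With that kind of information up front, the buyer can judge for himself whether the model fits his target hardware and use case.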