Hi John, Roland, Ethan,
I'm bumping this thread back up. Apologies for leaving it dangling for
some time.
I believe we may see a handful of TDS services of this scale (> 2M
files) in support of the ESG IPCC/CMIP5 activity within the next 12
months or so.
I appreciate your assistance with addressing our large scale TDS needs.
Regards,
-Eric
John Caron wrote:
Hi Eric, Nathan:
On 1/4/2011 1:29 PM, Eric Nienhouse wrote:
Hi Roland, John, Ethan,
I'm sorry for not posting this to the thredds list, which I am happy
to do. However, I thought I would raise this to you first as it
relates to the TDS and performance in our environment here in
CISL/VETS at NCAR.
I like to post to that group so others can follow along and see if it
also applies to them, so I'm cc'ing there.
Thank you, John. I'm curious about other experiences with large scale
TDS installations as well.
We're close to the 2 million file mark in one of our production ESG
TDS servers (which supports www.earthsystemgrid.org). I can get you
the specs on our machine running this service (~2 year old AMD
multi-core CentOS). In our experience, it takes about 5 minutes to
initialize the TDS from the underlying thredds catalogs. There are
many catalog refs, all for local catalog files, which represent about
3200 datasets over ~2800 catalog files. (I can provide more detail
if you would like. The service is at: tds.ucar.edu/thredds)
Could you send me a typical config catalog, so I get a sense of what
you are doing?
An example of a base-level catalog can be found at the NCAR ESG data node:
http://tds.ucar.edu/thredds/esgcet/catalog.xml
I'll send you a typical config catalog for one of the catalogs
referenced in the above by direct email.
This service requires ~30 GB of JVM memory to successfully initialize,
which is a scalability concern for us.
Yes, indeed.
We re-init the TDS often during a new data publication process. We
find after some number of re-inits (likely 50 - 200) the TDS will
re-initialize *very slowly*, often taking hours to re-init. I
speculate this is due to memory resources and perhaps "perm gen"
space with the tomcat / JVM process and/or GC thrashing.
Yes, you need to restart Tomcat when/before that happens. Apparently,
Tomcat 7 may be better, but we haven't tested yet.
This is my understanding as well. We're still running Tomcat 6, and will
likely continue to do so for the foreseeable future. As soon as we have
some experience w/ 7 we'll pass it along.
BTW, in the latest TDS 4.2 reinit is a little flaky, though I expect
it will work for your case. Let me know if you see problems (besides
the permgen problem).
How often do you reinit?
We reinit ~ 10 times a week.
We're anticipating at least double the number of files will be served
at NCAR due to CMIP5 modeling efforts over the next 18 months.
We've considered some possible solutions to the eventual, slow load
such as:
1) Restarting the TDS routinely.
2) "partitioning" TDS instances and thereby the files over multiple
processes or hosts.
We're curious, too, if there may be some tuning we could do w.r.t.
the TDS that may help the situation (so far we've only increased JVM
heap memory). Do you have any initial recommendations?
At the moment we don't have any tuning for this, but I think a quick
fix is to add the ability to not cache the catalogs, but read them
each time, maybe by setting the "expires" attribute or adding a
"cache" attribute. Better would be to use an LRU cache like ehcache,
but that will take longer to implement.
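
To make the LRU idea concrete, here is a minimal sketch in Python. It is not
ehcache and not TDS code; the class name, the `loader` callback, and the
capacity value are all hypothetical and only illustrate how a bounded cache
could keep recently used catalogs in memory and re-read the rest on demand.

```python
from collections import OrderedDict


class LruCatalogCache:
    """Hold at most `capacity` parsed catalogs; evict the least recently used.

    `loader` is a hypothetical callback that reads and parses one catalog
    file given its path. It stands in for whatever the server would do.
    """

    def __init__(self, capacity, loader):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader
        self._entries = OrderedDict()  # path -> parsed catalog, oldest first

    def get(self, path):
        if path in self._entries:
            # Cache hit: mark this catalog as the most recently used.
            self._entries.move_to_end(path)
            return self._entries[path]
        # Cache miss: read the catalog, then evict the oldest if over capacity.
        catalog = self.loader(path)
        self._entries[path] = catalog
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return catalog
```

With a cache like this, memory use is bounded by `capacity` rather than by the
total number of catalogs, at the cost of re-reading a catalog that has been
evicted.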
Thanks for these insights. I'm interested in pursuing these - please
let me know what we can do to help.
This won't help the startup time that much (it will help some); it mostly
helps the memory use.
To improve startup time we need caching of the info in catalogs that
don't change. Do all your catalogs get rewritten, or only the ones that
change (i.e., can we use lastModified on the OS File to detect changes)?
I believe only the changed catalogs are rewritten to the file system.
The "root" catalog is rewritten as well (even if the catalog ref list
content is unchanged).
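
As a rough illustration of the lastModified idea, the Python sketch below
reports only catalogs whose modification time has changed since the last
check. Because the root catalog is rewritten even when its content is the
same, the sketch also compares a content digest before reporting a change.
The function name and the `state` dictionary are hypothetical; this is not
how the TDS works today, only one way such a check might look.

```python
import hashlib
import os


def changed_catalogs(paths, state):
    """Return the paths whose content changed since the previous call.

    `state` maps path -> (mtime_ns, sha256 hex digest) and is updated in
    place. A file with an unchanged mtime is skipped without being read.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime_ns
        previous = state.get(path)
        if previous is not None and previous[0] == mtime:
            continue  # not rewritten since the last check
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        # A rewrite with identical content (e.g. the root catalog) is ignored.
        if previous is None or previous[1] != digest:
            changed.append(path)
        state[path] = (mtime, digest)
    return changed
```

On the first call every catalog is reported, since nothing is recorded yet;
later calls should only need to read the files that were actually rewritten.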
John