Hi Jon:
I will follow bad internet etiquette and not bottom post. I agree
with a lot of what you said, except on the question of whether
scientists want the original files. Let me give an example. Our high
resolution satellite data might have a global file for each day, and
I want a time series of that data for a small region off of
California. You know what, I actually do not want to download the
several thousand files, taking up many gigabytes, just to get at that
data. It would be nice if I could get just that region for my time
period in a single file. You can do this with THREDDS/OPeNDAP, and
that is the use case we see most.
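For concreteness, with the netCDF4-python library that request is
just an array slice against the remote dataset. Everything below
(the URL, the variable names, the box) is invented for illustration,
but the pattern is the one we see in practice:

import numpy as np
from netCDF4 import Dataset

# Hypothetical THREDDS aggregation of the daily global files
ds = Dataset("http://example.org/thredds/dodsC/satellite/sst/aggregate")

lat = ds.variables["lat"][:]
lon = ds.variables["lon"][:]

# Index ranges for a small box off California (roughly 32-38N, 230-238E)
j = np.where((lat >= 32.0) & (lat <= 38.0))[0]
k = np.where((lon >= 230.0) & (lon <= 238.0))[0]

# Only this little box crosses the network, for every time step at
# once, instead of several thousand global files
sst = ds.variables["sst"][:, j[0]:j[-1] + 1, k[0]:k[-1] + 1]
ds.close()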
Or think of the Lagrangian case I mentioned with animal tags, where
the tags only have position and the scientist wants an environmental
variable along that track. Again, they do not want to have to
download all of the files to get that small amount of data.
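From the server's point of view this is just as small a request. A
sketch of nearest-neighbour sampling along the track, again with
invented names and a made-up pair of tag fixes:

from datetime import datetime
import numpy as np
from netCDF4 import Dataset, date2num

ds = Dataset("http://example.org/thredds/dodsC/satellite/sst/aggregate")
lat = ds.variables["lat"][:]
lon = ds.variables["lon"][:]
tvar = ds.variables["time"]
times = tvar[:]

# Hypothetical tag fixes: (time, lat, lon)
track = [(datetime(2008, 9, 1, 12), 35.2, 234.1),
         (datetime(2008, 9, 2, 12), 35.6, 234.5)]

for when, tlat, tlon in track:
    i = int(np.abs(times - date2num(when, tvar.units)).argmin())
    j = int(np.abs(lat - tlat).argmin())
    k = int(np.abs(lon - tlon).argmin())
    # One tiny read per fix; the global files never leave the server
    print(when, float(ds.variables["sst"][i, j, k]))
ds.close()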
The point is the ability to subset the data before the download,
and, as discussed at GO-ESSP, sometimes to perform server-side
functions (e.g. give me a time series of the integrated heat in the
upper 150 m over a region); the result is then the "data" that I will
be using over and over again in the analysis.
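To make that last example concrete, here is roughly the computation
such a server-side function would perform, written client-side for
illustration. The URL, the variable names, the index box and the
constants are all assumptions:

import numpy as np
from netCDF4 import Dataset

RHO = 1025.0  # nominal seawater density, kg/m^3
CP = 3985.0   # nominal specific heat, J/(kg K)

ds = Dataset("http://example.org/thredds/dodsC/ocean/temperature")
depth = ds.variables["depth"][:]
top = depth <= 150.0  # upper 150 m only

# All times, the upper levels, one region (illustrative index ranges)
temp = ds.variables["temp"][:, top, 100:120, 200:230]  # (time, z, y, x)

# Heat content per unit area, relative to 0 degrees if temp is in
# Celsius: rho * cp * integral of T dz, then the regional mean,
# giving one number per time step
heat = RHO * CP * np.trapz(temp, x=depth[top], axis=1)  # J/m^2
series = heat.mean(axis=(1, 2))
ds.close()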
When we start to include these types of use in our use cases, then
we need to rethink our services. There is also one other point, one I
made to Ben privately. The use cases all assume that the user in the
use case will actually be willing to use your service. Our experience
is that if you are not delivering data in the form and in the way
that they think about and use data, they will go elsewhere for the
data. It may not be the "Right Way" as decreed from above, but you
ignore your users at your own peril.
-Roy
On Oct 14, 2008, at 1:05 AM, Jon Blower wrote:
Hi Ben,
I think this is a very nice set of use cases, and I was also very
interested in the discussion with Roy M that ensued. These use cases
give good examples of "what" a user might want to do with FES data and
"why". I think it's just as valuable to look in a bit more detail at
"how" they might want to do things. This is another dimension through
"use case space", if you like.
One could divide methods of use into "get and forget" and "get and
reuse". Let me explain further:
1) A decision-maker responding to an emergency situation needs to
get the right data as quickly as possible to help make the decision.
After having done this the data can be thrown away, or perhaps
archived for auditing purposes. Anyway, the data aren't reused. It
probably doesn't matter too much if the data have been manipulated
in some way to expedite the process.
2) A scientist performing a detailed analysis on a dataset (e.g. a
reanalysis) needs to look at the data from a whole load of
directions and perform lots of analysis tasks. In this case the user
will probably want the original data (probably in the original data
files), and will keep the data over an extended period of time. The
scientist needs to be confident that the data have not been
manipulated by the server.
One could also think of these cases as being "real time use" and
"offline use" respectively. The priorities of each case are
different: in case (1) the emphasis is on getting data quickly
(requiring a "clever" server); in case (2) the emphasis is on being
confident that the data are "correct" (requiring a "dumb" server).
Scientists can also operate in "real time" mode when performing
initial explorations on data, prior to detailed analysis.
I think WCS fits in best with case (1), because case (2) can be
satisfied simply by serving files in a sensible format (i.e.
CF-NetCDF) from some kind of file server. Clearly there are some
broad-brush generalizations here, but do others basically agree with
this?
Cheers, Jon