Inconsistent & Unexpected Hyrax Server-Side Function Behavior Across Different Sub-Domains

I’m doing some research which requires me to programmatically download satellite data for certain days within a specific latitude/longitude bounding box. This should be a relatively simple task using Hyrax server-side functions to determine which data to get and OPeNDAP to get it. However, I’m running into some problems.

I have a few setups to try accessing these resources using different tools (Python urllib.request, C libcurl, and command line curl), but fundamentally all I am executing is an HTTP GET request and streaming that data into an appropriate file/memory.
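For reference, the "GET and stream to a file" step can be sketched in Python roughly as follows. This is a minimal sketch using only the standard library; `download` is a hypothetical helper name, and authentication (where an endpoint requires it) is not handled here.

```python
import shutil
import urllib.request


def download(url, dest_path, chunk_size=1 << 20):
    """Stream the body of a GET response to dest_path in chunks.

    Hypothetical helper: no authentication, retries, or redirect
    handling beyond what urllib does by default.
    """
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj reads chunk_size bytes at a time, so the whole
        # file never has to sit in memory at once.
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```

Calling `download(data_url, "subset.nc4")` with any of the data URLs in this thread would write the response body to `subset.nc4`, assuming the server accepts an unauthenticated request.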

One of the products I’m downloading is AquaMODIS L2 Cloud Data from https://ladsweb.modaps.eosdis.nasa.gov/opendap/hyrax/RemoteResources/laads/61/MYDATML2/YYYY/DDD/FILENAME.hdf. Using the version function, I can see that this endpoint does not support many standard Hyrax functions like geogrid, but does support functions like range, which I could not find any documentation for.

(Just in case anyone has the above issue, my solution was to query https://ladsweb.modaps.eosdis.nasa.gov/api/v1/files/product=MYDATML2&collection=61&areaOfInterest=x-121y32,x-116y35&dateRanges=YYYY-MM-DD, a separate API which happens to point to the same data. The download link this API supplies requires some auth, though, so you can just plug the back half of it into the front half of the opendap/hyrax URL to get that data.)

Another product I’m downloading is AquaMODIS L3 OC Data from https://oceandata.sci.gsfc.nasa.gov/opendap/MODISA/L3SMI/YYYY/MMDD/FILENAME.nc. Using version on this endpoint reveals that geogrid is supported, although when I get an example URL like https://oceandata.sci.gsfc.nasa.gov/opendap/MODISA/L3SMI/2002/0101/AQUA_MODIS.20020101_20021231.L3m.YR.RRS.Rrs_555.4km.nc.dap.nc4?geogrid(Rrs_555,35,-121,32,-116), it simply responds by giving me the entire file. Perhaps I misunderstand how geogrid works, but I expected only to get the data within the bounding coordinates from the Rrs_555 band.

It’s my understanding that this inconsistency in behavior is because these resources are hosted under different sub-domains (ladsweb.modaps.eosdis.nasa.gov versus oceandata.sci.gsfc.nasa.gov). However, they both support some sort of Hyrax-style functions since version responds on each.

So, my main questions are:

  1. Why are these two endpoints different from one another?
    and
  2. Do I misunderstand geogrid or is there an obvious issue with my URL to access MODISA/L3SMI data using geogrid?

My understanding of who manages the Hyrax servers and compliance with new versions is not deep, so any insight is appreciated, thanks!

These are some good questions @dpaavola – thanks for raising them!
I spent some time on #2, and I think I can help explain what is going on. Perhaps @ndp can help with #1?

My best understanding of geogrid() is that it is most appropriate when the dimensions (lat and lon) are both monotonically increasing. In the case of your example, lat is NOT monotonically increasing… Looking into latitude values I get:

>>> Lat[1300], Lat[1400]  # monotonically decreasing
(np.float32(35.8125), np.float32(31.645832))

Perhaps that explains some of the weird behavior of geogrid. Indeed when I try to download any subset, I end up downloading the entire dataset. I also tried

?dap4.function=geogrid(/Rrs_555,35,-121,30,0)

which is the DAP4 notation, and I get the exact same weird behavior.

So my best guess is that, since Lat decreases monotonically, you get unexpected behavior with geogrid(). But I will spend some more time looking into this behavior.
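A quick way to check which way a coordinate runs is a small helper like the one below. It is only a sketch in plain Python; `monotonic_direction` is a hypothetical name, and the sample values are the two latitudes quoted above.

```python
def monotonic_direction(values):
    """Return 'increasing', 'decreasing', or 'neither' for a 1-D sequence.

    Hypothetical helper; strict monotonicity is assumed (no repeated values).
    """
    pairs = list(zip(values, values[1:]))
    if not pairs:
        # Fewer than two values: direction is undefined.
        return "neither"
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "neither"


# The two latitude samples quoted above (indices 1300 and 1400).
print(monotonic_direction([35.8125, 31.645832]))
```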

Thanks again for bringing this up!

Recommendations for subsetting

I highly recommend looking into subsetting via index. You can construct the URL interactively via the DAP Response Form by selecting the variables you want, over the spatial (index space) range you want. After selecting the encoding (e.g. NetCDF-4), you can either download the data via the Get Data button or paste the properly encoded data URL into a browser. The second option is nicer when automating downloads (this is how pydap, etc., can be helpful too).

For example, the following (unencoded) URL downloads data for your area of interest:

http://oceandata.sci.gsfc.nasa.gov/opendap/MODISA/L3SMI/2002/0101/AQUA_MODIS.20020101_20021231.L3m.YR.RRS.Rrs_555.4km.nc.dap.nc4?dap4.ce=/lat[1300:1:1400];/lon[1400:1:1550];/Rrs_555[1300:1:1400][1400:1:1550]

You can trigger the download with this encoded URL.

NOTE. Encoding in this scenario refers to replacing the square brackets ‘[’ and ‘]’ with %5B and %5D, respectively, in the Data URL. You can check the correct encoding from the DAP Response Form, since you can always select Copy Encoded Data URL.
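For scripted downloads, the same URL can be assembled and encoded in a few lines of Python. This is a sketch, not an official client: `dap4_subset_url` and `encode_brackets` are hypothetical helper names, and it assumes the coordinate variables are called lat and lon and that the data variable is indexed as [lat][lon], as in the example URL above.

```python
def dap4_subset_url(data_url, variable, lat_range, lon_range, step=1):
    """Build an (unencoded) DAP4 constraint-expression URL.

    lat_range and lon_range are (start, stop) index pairs, inclusive.
    Assumes coordinates named 'lat' and 'lon' and a [lat][lon] variable.
    """
    lat = f"[{lat_range[0]}:{step}:{lat_range[1]}]"
    lon = f"[{lon_range[0]}:{step}:{lon_range[1]}]"
    ce = f"/lat{lat};/lon{lon};/{variable}{lat}{lon}"
    return f"{data_url}?dap4.ce={ce}"


def encode_brackets(url):
    """Replace '[' and ']' with %5B and %5D, as described in the note above."""
    return url.replace("[", "%5B").replace("]", "%5D")


BASE = (
    "http://oceandata.sci.gsfc.nasa.gov/opendap/MODISA/L3SMI/2002/0101/"
    "AQUA_MODIS.20020101_20021231.L3m.YR.RRS.Rrs_555.4km.nc.dap.nc4"
)

url = dap4_subset_url(BASE, "Rrs_555", (1300, 1400), (1400, 1550))
encoded = encode_brackets(url)
print(encoded)
```

Here `url` reproduces the unencoded example above, and `encoded` is the form to hand to an HTTP client.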

Hi, I’d like to add to Miguel’s comments regarding geogrid(). The syntax ‘...?dap4.function= <function call>’ is correct, but in this case the variable /Rrs_555 is not a Grid. It is an Array and so the geogrid() function cannot find the latitude and longitude arrays it needs.

Really, this is a shortcoming of the function, and we should fix it. For a long time, few people used these functions, but we are getting more and more questions about them. I’ll put this on our to-do list and second Miguel’s comment that, for now, index-based subsetting is the best way forward.

After having a conversation with @ndp, it turns out the difference in behavior between the two Hyrax servers comes down to a difference in version between them. The one at http://oceandata.sci.gsfc.nasa.gov is much newer (from March). OPeNDAP has very little control over how often Hyrax servers get updated.

Got it, that makes a lot of sense. Thank you for looking into this for me! I probably should’ve just found the correct index space for my area of interest from the beginning since, like you say, index subsetting is significantly easier (and more efficient). Thanks again for the recommendation!
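For anyone else working out the index space, a rough sketch of the lookup in plain Python is below. `index_range` is a hypothetical helper, and the coordinate lists are toy values rather than the real lat/lon arrays from the file, which would need to be fetched from the server first. It works whether the coordinate increases or decreases, as long as it is monotonic.

```python
def index_range(coords, low, high):
    """Return (first, last) indices of the values in coords within [low, high].

    Hypothetical helper. For a monotonic coordinate the matching indices
    are contiguous, so (first, last) describes the whole subset.
    """
    hits = [i for i, c in enumerate(coords) if low <= c <= high]
    if not hits:
        raise ValueError("no coordinate values fall inside the requested range")
    return hits[0], hits[-1]


# Toy coordinates (NOT the real grid): lat decreases, lon increases.
lat = [40.0, 38.0, 36.0, 34.0, 32.0, 30.0]
lon = [-124.0, -122.0, -120.0, -118.0, -116.0, -114.0]

lat_idx = index_range(lat, 32, 35)      # rows between 32N and 35N
lon_idx = index_range(lon, -121, -116)  # columns between 121W and 116W
print(lat_idx, lon_idx)
```

The resulting pairs are the start and stop values that go inside the square brackets of an index-based constraint.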

OPeNDAP has very little control over how often Hyrax servers get updated

Yeah, I kind of assumed y’all had less of a hand in that stuff. Do you know if it would be worthwhile for me to reach out to whoever runs an out-of-date server like that, or do they usually know what’s up already and just haven’t gotten around to things yet?

Yeah, there are a lot of different resources online that document DAP and Hyrax function syntax, many of which I assume are out of date, but it’s really quite hard to tell just looking at them. Good to know that the dap4.function syntax is the way to go.

geogrid is definitely a super useful function, but I also understand that its intended behavior can be difficult to fully implement over all possible data. I don’t know how easy this would be, but aside from expanding the implementation of the function, better/more informative errors on bad requests would definitely help users like me troubleshoot (or understand when something is just not possible yet).

That is a good idea @dpaavola - perhaps a glossary of error terms or something. I’ll think about the best way to do this (same for functions/API descriptions). We are definitely working to consolidate and update all documentation (there are tons, generated throughout the years), to provide better information to all OPeNDAP users.

Thanks again!
I will close this as completed. Feel free to start another topic, question, etc related to any other issue you may find. Thanks again for your feedback!

Miguel Jimenez-Urias