Hello Reader, here is this month’s iRODS news and developments!
News
Two months off - oops!
The ‘holiday’ season somewhat derailed me (should you meet me in person, ask me about the three c calamities) - let’s see if I can pick it up for the new year!
Fediverse?
Does anyone know of any iRODS resources on Mastodon / the Fediverse? With Twitter increasingly hard to use and arguably unpleasant, I’ve seen a lot of technical communities move over to Mastodon.
The computer security community seems to have moved comprehensively; I wonder if the iRODS community will/should?
Have opinions? Resources? Let me know!
iRODS Development Update: January 2024
Blog post
As always, a lot to digest here - I recommend reading the introduction even if you don’t dive into the details, particularly if you use their testing infrastructure, the tiering plugin or the S3 plugin.
iRODS Capability Storage Tiering 4.3.1.1 is released
Loving the decoupling of the plugins from the exact version, allowing for point updates like this.
I would like it to have release notes though, pretty please RENCI?
Comparison between 4.3.1.0 and 4.3.1.1
iRODS Internship - Summer 2024
This internship is open to United States citizens who are students at least 18 years of age.
Five projects, in no particular order…
1) Create an iRODS database plugin template repository (C++)
2) Convert existing web applications to our new HTTP API (ReactJS + HTTP)
3) Connect iRODS to many different cloud storage technologies via rclone (Linux)
4) Add an iRODS backend to rclone (Go)
5) Create new client libraries around our new HTTP API (Various Languages)
iRODS HTTP API v0.2.0 is released
The latest release of the iRODS HTTP API (v0.2.0) is now available on GitHub and Docker Hub.
This release closed 57 issues, including OpenID Connect (OIDC) support (including https), improved consistency around return codes, easier configuration, and more straightforward access to getting data objects.
Please find all the closed issues [here](https://github.com/irods/irods_client_http_api/issues?q=milestone%3A0.2.0+is%3Aclosed).
In related news - anyone using this? Should I add this to the list of repos tracked and updated?
Main Repository Activity
Open Issues
Worth reading if you make use of this regularly and are on 4.3.0 (I have no evidence that >=4.2.9 is affected, but I suspect it is; I have not tested it).
A pertinent note taken from a larger comment:
The expected behavior suggested by the historical interpretation of -N would be that none of the replicas are trimmed because there are no good replicas. This is what is documented in the table (even if it is not implemented correctly). For completeness, not providing a -N would mean "trim down to 2 good replicas".
Potential options for the future for logging who did what where if you are writing rules manually and not using the audit plugin.
Worth noting if you depend on error codes for collection creation options. Demonstrated on 4.3.1, but likely present before that.
Let's say we want to try and create a collection in a place where we don't have permission to. One may expect rxCollCreate to return the same permission error in all cases, but that's not what happens.
When RECURSIVE_OPR__KW is passed, you get SYS_INVALID_INPUT_PARAM.
When RECURSIVE_OPR__KW is NOT passed, you get CAT_NO_ACCESS_PERMISSION.
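If your client code branches on these codes, it may be worth normalising them in one place until the behaviour is made consistent. The following is a minimal sketch, not anything shipped with iRODS: the helper name is hypothetical, and the numeric values (-818000 and -130000) are what I believe the error table defines, so check them against your own release.

```python
# Assumed numeric values; verify them against the error table of your iRODS release.
CAT_NO_ACCESS_PERMISSION = -818000
SYS_INVALID_INPUT_PARAM = -130000


def may_be_permission_error(error_code: int, recursive: bool) -> bool:
    """Return True if a failed collection create could be a permission problem.

    Hypothetical helper: with the recursive keyword set, the issue reports
    SYS_INVALID_INPUT_PARAM where CAT_NO_ACCESS_PERMISSION might be expected,
    so that code is treated as "possibly permissions" in the recursive case only.
    """
    if error_code == CAT_NO_ACCESS_PERMISSION:
        return True
    return recursive and error_code == SYS_INVALID_INPUT_PARAM


print(may_be_permission_error(SYS_INVALID_INPUT_PARAM, recursive=True))
print(may_be_permission_error(SYS_INVALID_INPUT_PARAM, recursive=False))
```

Note that the recursive case stays ambiguous: SYS_INVALID_INPUT_PARAM can also mean genuinely bad input, so the helper only says "could be", never "is".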
The current implementation of create_collections does not provide any information for why it fails. If there's a permission error, it should throw an exception. If there's some other reason for why it failed, it should report that too. No error should be reported if the target collection already exists though. For these reasons, the iRODS HTTP API must provide its own implementation.
Pertinent information:
Currently, the only way to be sure you're dealing with a data object or collection is to query the catalog. dst_opr_type is not guaranteed to reflect the correct type of the object because it is set by the client. For example, the filesystem library provided by iRODS doesn't set the type of the objects.
Keep in mind the PEPs expose only the raw inputs. Those inputs are not verified until the API operation runs (which is after the pre PEP). And those inputs are not guaranteed to be updated for PEP handlers that follow the API operation.
We cannot provide the contents of a collection during a rename because that may be costly. GenQuery is the tool for fetching that information if needed.
I think ruleName is a poorly named value here... it's really just the code that is to be run by the delay_server. It's not surprising that storage tiering no longer worked if you changed the code it was trying to run.
Related to the issue below:
The same issue when synching from irods to local. When the irsync command is executed pep_api_data_obj_get_post will be fired and this pep will contain the forceFlag key with an empty string as if the -f option is executed with the iget command.
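One practical consequence: a rule or log parser that tests the value of forceFlag rather than its presence will miss this case. A small sketch of the difference, modelling the PEP's keyword input as a plain Python dict (that is my simplification, the real PEP inputs have their own types) and using hypothetical function names:

```python
def force_requested(cond_input: dict) -> bool:
    """True when the forceFlag keyword is present, even with an empty value."""
    return "forceFlag" in cond_input


def force_requested_naive(cond_input: dict) -> bool:
    """Truthiness check: misses a forceFlag that is present but empty."""
    return bool(cond_input.get("forceFlag"))


kw = {"forceFlag": ""}  # shape reported in the issue: key present, empty string
print(force_requested(kw), force_requested_naive(kw))  # True False
```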
As an aside, it’s great to see the detective work being done by the community writing debug logging rules - a few tickets this month have had excellent examples one could cherry-pick if you wanted to get started debugging or writing your own logging.
Also related to 7450 above.
Now we have a more complex ACL structure in 4.3.X we’re going to see more edge cases like this. Permissions are hard!
Closed Issues
Took another look at this, and I think this is actually the expected behavior. Here's what's happening:
When PREP is configured, any rules defined in core.py or even imported by core.py become available for use in any other rule engine plugin, including the native rule engine plugin. irule when run with no target REP will give every configured REP a chance to execute the rule, in order. What we're seeing is the rule pythonRuleEnginePluginTest being executed by all configured REPs. We only see two because irods_rule_engine_plugin-cpp_default_policy-instance produces no output when it executes the rule.
Yes, I am confirming it is firing together with pep_api_replica_open_post when istream write is executed. Actually I should not have reported this because in another project I already used relevant PEPs for istream operations. So, this can be closed pls.
This section was added for the 4.3.1 documentation to reflect the new way to configure authentication (see #7274)...
https://docs.irods.org/4.3.1/system_overview/configuration/#authentication-configuration
It includes descriptions of the existing options as well as pam_password_length, which has been removed.
and a little later
irodsPamAuthCheck now has more words in the documentation around its behavior and usage. As we have no plans to update the application, that's all we are going to do at this time. Please open another issue if we want to document other things or modify the auth checker.
Python iRODS Client Activity
Open Issues
Closed Issues
Consistency is hard m’kay?
Since iRODS 4.3.0., iCommands require the irods_authentication_scheme to be 'pam_password' instead of 'PAM' in the irods_environment.json if you are using PAM.
However, the Python-irodsclient still seems to require 'PAM', even if the server is 4.3.0. This means that on a 4.3.0., users cannot use the same irods_environment.json for iCommands and the Python client.
This is inconvenient for two reasons:
A lot of users use iCommands next to the python-irodsclient to quickly be able to check the effect of their scripts etc.
Authenticating with iinit is for Linux the easiest way to authenticate for the Python-irodsclient
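For anyone stuck on a client version that still has this mismatch, one workaround is to translate the scheme name when loading the environment rather than keeping two files. A minimal sketch, assuming the only difference is the scheme string described above; the function name and the example host are hypothetical, and since the issue is closed, newer client releases may not need this at all.

```python
import json


def environment_for_python_client(env: dict) -> dict:
    """Return a copy of an iRODS environment with the PAM scheme renamed.

    Maps the 4.3.x iCommands spelling ('pam_password') to the spelling the
    issue says the Python client expected ('PAM'). Other keys are untouched.
    """
    adjusted = dict(env)
    if adjusted.get("irods_authentication_scheme", "").lower() == "pam_password":
        adjusted["irods_authentication_scheme"] = "PAM"
    return adjusted


# Inline sample instead of reading ~/.irods/irods_environment.json; host is made up.
sample = json.loads(
    '{"irods_host": "irods.example.org", '
    '"irods_authentication_scheme": "pam_password"}'
)
print(environment_for_python_client(sample)["irods_authentication_scheme"])
```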
Hurrah - I had forgotten I had gone right down the rabbit hole on this one.
The fixes appear to be:
Seeks are otherwise used to calculate object size.
The dataSize default of 0 is needed in servers before 4.3.1 because otherwise -1 can wrap to (1L<<64)-1.
Also, we now send a positive dataSize parameter in a "put" to hint the eventual size of a data objects so that a resource's minimum free space requirement is heeded.
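The wrap-around mentioned above is ordinary unsigned 64-bit arithmetic. A quick illustration in plain Python (this only shows the arithmetic; it is not a claim about where in the server the conversion happens):

```python
import ctypes

UINT64_MAX = (1 << 64) - 1

# Reinterpreting -1 as an unsigned 64-bit integer yields the largest possible value.
wrapped = ctypes.c_uint64(-1).value

print(wrapped == UINT64_MAX)
print((-1 & UINT64_MAX) == UINT64_MAX)
```

Both comparisons hold, which is why a size of -1 read as unsigned looks like an impossibly large object rather than "unknown".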
icommands Activity
Closed Issues
Closed because of “irsync does not honor ignore symlinks flag --link, when symlink is broken” - 5359, which has been present since at least 4.2.7.
Externals Activity
Open Issues
Presently, all our packages are version 1.0 for EL distros and 1.0~[codename] for deb-based distros. We have no way to produce a package with a different version number. Any update to a package will produce a whole new package with a different package name. This new package can be installed side-by-side with the previous version of the package. While we do have a need for side-by-side installs of different versions of our externals, it would behoove us to provide a means of updating an existing package.
YODA Activity
Open Issues
TIL about FAIR Data Points - I have to say that the Netherlands is really crushing it from a data research perspective - leading the way, lots of other research organisations should follow, in my opinion.
FAIR data point is a standard describing a REST API for creating, storing, and serving FAIR metadata. It is used in many research projects in the Netherlands, and the goal is to create a network of FAIR data points to enable full interoperability across domains. See an example implementation here: https://github.com/FAIRDataTeam/FAIRDataPoint. It would be fantastic if Vault data could be delivered to the outside world through a FDP compatible API, so that each Yoda instance adds to this network. This also requires RDF expression of metadata, which is currently not yet supported by Yoda.
Closed Issues
If you think someone else would appreciate this newsletter, they can sign up at https://theresource.metadata.school/.
No Yaks were shaved in the making of this newsletter. That’s mostly because I ran out of time. The Yaks need shaving, oh yes.