DataONE Tasks: Issues | https://redmine.dataone.org/ | 2018-10-27T16:04:14Z
Infrastructure - Story #8738 (In Progress): HZEventFilter performance decline with increased task... | https://redmine.dataone.org/issues/8738 | 2018-10-27T16:04:14Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>While reindexing, I noticed that creating index tasks was taking about 300 ms (when the index_task table had about 30k records). Later in index task generation, that duration increased to about 500 ms on average (it is now at 600 ms).</p>
<p>There are two calls to the database that search for the pid to check its status, and those filters run against a field (pid) that is not indexed. Ideally, we should index that field.</p>
<p>At the very least, the two queries should be reduced to one. This could be done without changing the ORM model we're using.</p>
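Both proposed fixes can be illustrated with a small sketch. This is illustrative only: the real index_task table lives in Postgres behind an ORM; sqlite3 and the index/pid names below are stand-ins.

```python
import sqlite3

# Stand-in for the Postgres index_task table described below.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE index_task (
                    id      INTEGER PRIMARY KEY,
                    pid     TEXT NOT NULL,
                    status  TEXT)""")

# Fix 1: index pid so status lookups stop scanning the whole table.
conn.execute("CREATE INDEX idx_index_task_pid ON index_task (pid)")

conn.execute("INSERT INTO index_task (id, pid, status) "
             "VALUES (1, 'doi:10.x/A', 'NEW')")

# Fix 2: issue one query for the pid and inspect the returned statuses in
# application code, instead of filtering twice against the database.
statuses = [row[0] for row in
            conn.execute("SELECT status FROM index_task WHERE pid = ?",
                         ("doi:10.x/A",))]
```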
<p>Below is the table description in Postgres:</p>
<pre>d1-index-queue=# \d index_task
                        Table "public.index_task"
       Column        |          Type          | Collation | Nullable | Default
---------------------+------------------------+-----------+----------+---------
 id                  | bigint                 |           | not null |
 datesysmetamodified | bigint                 |           | not null |
 deleted             | boolean                |           | not null |
 formatid            | character varying(255) |           |          |
 nextexecution       | bigint                 |           | not null |
 objectpath          | text                   |           |          |
 pid                 | text                   |           | not null |
 priority            | integer                |           | not null |
 status              | character varying(255) |           |          |
 sysmetadata         | text                   |           |          |
 taskmodifieddate    | bigint                 |           | not null |
 trycount            | integer                |           | not null |
 version             | integer                |           | not null |
Indexes:
    "index_task_pkey" PRIMARY KEY, btree (id)
d1-index-queue=# \q
</pre>
Member Nodes - Story #8727 (In Progress): NCAR Discovery & Assessment | https://redmine.dataone.org/issues/8727 | 2018-10-04T16:48:23Z | Amy Forrester (aforres4@utk.edu)
<p>Establish contact and build relationship with a potential new member node. Determine if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. </p>
<p>This story is complete when a determination is made to either proceed with a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
Member Nodes - Story #8535 (In Progress): Neotoma: Story: Discovery & Planning | https://redmine.dataone.org/issues/8535 | 2018-04-09T21:14:39Z | Amy Forrester (aforres4@utk.edu)
<p>Discovery: establish contact and build relationship with a potential new member node. Determine if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. </p>
<p>Planning: if the repository and DataONE have agreed to proceed with deployment as a member node, decisions will be made as to how to proceed with development, and node operators will receive training. </p>
<p>This story is complete when a determination is made to either proceed with planning a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
<p><strong>Record initial communication here</strong></p>
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync... | https://redmine.dataone.org/issues/8525 | 2018-03-27T22:36:54Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, so a retry or bypass for those timeouts should be implemented. It's not clear whether a new HazelcastClient should be instantiated, or whether the same client remains usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, preliminary searching through the code indicates that HazelcastClientFactory.getProcessingClient() is only used in a few places, and the singleton behavior it relies on can be sidestepped by removing the method and replacing it with a getLock() wrapper method (that seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance. If the client is never exposed, it can be refreshed as needed.</p>
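A retry wrapper along these lines could absorb transient timeouts before they bubble up and disable synchronization. This is a Python sketch of the idea only; the real synchronization code is Java, and the function names here are hypothetical.

```python
import time

def with_retry(op, retries=3, delay=1.0, recoverable=(TimeoutError,)):
    """Run a Hazelcast-style operation, retrying a few times on transient
    timeout exceptions instead of letting them bubble up immediately."""
    for attempt in range(1, retries + 1):
        try:
            return op()
        except recoverable:
            if attempt == retries:
                raise  # out of retries: let the caller decide what to do
            time.sleep(delay)
```

Whether the retry should also swap in a fresh client (per the open question above) would depend on whether the client recovers its session on its own.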
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre>
CN REST - Story #8364 (In Progress): Ensure portal uses correct X509 certificates | https://redmine.dataone.org/issues/8364 | 2018-02-13T20:17:25Z | Chris Jones (cjones@nceas.ucsb.edu)
<p>We've run into issues where, after an upgrade of the <code>dataone-cn-portal</code> package on the CNs, the properties pointing to the public certificate and private key incorrectly point to the old GeoTrust wildcard files rather than the new Let's Encrypt files:<br>
<br>
cn.server.publiccert.filename=/etc/ssl/certs/*.test.dataone.org.crt<br>
cn.server.privatekey.filename=/etc/ssl/private/*.test.dataone.org.key</p>
<p>These should be (in STAGE):</p>
<p>/etc/letsencrypt/live/cn-stage.test.dataone.org/cert.pem<br>
/etc/letsencrypt/live/cn-stage.test.dataone.org/privkey.pem</p>
<p>The issue might be that these are not being set correctly during the <code>postinst</code> script run. Jing pointed out that these values are taken from the debconf database settings that get set when <code>dataone-cn-os-core</code> is installed. So although the <code>postinst</code> script might be setting the correct values, the old cached values might still be present in the debconf database. If so, we'll need to clear those values during installations and upgrades.</p>
<p>Also, knowing where to look for these configuration settings can be challenging. These are referenced from <code>/var/lib/tomcat7/webapps/portal/WEB-INF/portal.properties</code>. These settings should be consolidated into <code>/etc/dataone/portal/portal.properties</code> so they also don't get blown away on war file upgrades in Tomcat.</p>
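A hypothetical post-upgrade sanity check could flag cert/key properties that still point at the old wildcard files instead of the Let's Encrypt paths. The property keys are the ones quoted above; the function name is an assumption.

```python
def stale_cert_entries(properties_text):
    """Parse portal.properties-style key=value text and return the cert/key
    property names that do not point under /etc/letsencrypt/."""
    cert_keys = ("cn.server.publiccert.filename",
                 "cn.server.privatekey.filename")
    stale = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and non-property lines
        key, _, value = line.partition("=")
        if key.strip() in cert_keys and not value.strip().startswith("/etc/letsencrypt/"):
            stale.append(key.strip())
    return stale
```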
Member Nodes - Story #8358 (In Progress): Discovery & Planning (CAFF) | https://redmine.dataone.org/issues/8358 | 2018-02-12T16:25:06Z | Amy Forrester (aforres4@utk.edu)
<p>Discovery is about establishing contact and building a relationship with a potential new member node. In this phase, it is determined if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. Broad discussions of deployment options may be reviewed as well.<br>
This story is complete when a determination is made to either proceed with planning a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
Infrastructure - Story #8227 (In Progress): ExceptionHandler regurgitates long html pages into th... | https://redmine.dataone.org/issues/8227 | 2017-12-13T21:19:23Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>While it is useful to know what was returned in an error response when it was not the correct response, HTML pages can be verbose and include excessive markup that is not useful. Especially when a GMN MN is in debugging mode and a systematic error is being returned (as during an authentication issue), these logged HTML pages can end up being 75% of the log files and cause meaningful log lines to scroll off the end of the log rotation.</p>
<p>An option should be provided to limit the number of characters returned in the ServiceFailure.</p>
<p>Options are to:<br>
1. eliminate the message body altogether<br>
2. truncate the message body<br>
3. only print the visible parts of the HTML (strip the markup elements)<br>
4. combination of 2 & 3</p>
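Option 4 might look like the following sketch. The function name and the character limit are assumptions; the real handler is in the Java client libraries.

```python
import re

MAX_BODY = 1024  # hypothetical cap on characters kept from an error body

def summarize_error_body(body, limit=MAX_BODY, strip_markup=True):
    """Strip HTML markup from an error response body, then truncate it,
    so a verbose HTML error page doesn't swamp the log files."""
    if strip_markup:
        # Drop script/style content entirely, then remove remaining tags.
        body = re.sub(r"<(script|style)\b.*?</\1>", "", body, flags=re.S | re.I)
        body = re.sub(r"<[^>]+>", " ", body)
        body = re.sub(r"\s+", " ", body).strip()
    if len(body) > limit:
        body = body[:limit] + "...[truncated]"
    return body
```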
<p>Since this is a new feature, develop it in trunk.</p>
Member Nodes - Story #8225 (In Progress): Customize Indexing & View for gmd-pangaea | https://redmine.dataone.org/issues/8225 | 2017-12-06T19:41:28Z | Monica Ihli (email@monicaihli.com)
<p>An example metadata record: <a href="http://cn-sandbox.test.dataone.org/cn/v2/object/doi:10.1594/PANGAEA.877809_.201711172109">http://cn-sandbox.test.dataone.org/cn/v2/object/doi:10.1594/PANGAEA.877809_.201711172109</a></p>
<p>This record in the search interface on sandbox: <a href="https://search-sandbox.test.dataone.org/#view/doi:10.1594/PANGAEA.877809_.201711172109">https://search-sandbox.test.dataone.org/#view/doi:10.1594/PANGAEA.877809_.201711172109</a></p>
<p>Currently, the alternate access point link is pulled from:<br>
/ns0:MD_Metadata/ns0:distributionInfo[ 1 ]/ns0:MD_Distribution[ 1 ]/ns0:transferOptions[ 1 ]/ns0:MD_DigitalTransferOptions[ 1 ]/ns0:onLine[ 1 ]/ns0:CI_OnlineResource[ 1 ]/ns0:linkage[ 1 ]/ns0:URL[ 1 ]</p>
<p>However, Pangaea wishes users to be directed towards a landing page where they are able to obtain METADATA in multiple formats, found in:<br>
/ns0:MD_Metadata/ns0:dataSetURI[ 1 ]/ns2:CharacterString[ 1 ]</p>
<p>The landing page for this example: <a href="https://doi.pangaea.de/10.1594/PANGAEA.877809">https://doi.pangaea.de/10.1594/PANGAEA.877809</a></p>
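A sketch of the proposed change, pulling the landing-page URL from gmd:dataSetURI rather than the distribution transferOptions linkage. The namespace URIs below are the standard ISO 19139 ones; the prefixes and the function name are illustrative, and the real customization would live in the indexing/view configuration rather than standalone code.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces (prefixes here are illustrative).
NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

def landing_page_url(record_xml):
    """Return the landing-page URL from gmd:dataSetURI, which is where
    Pangaea wants users directed, instead of the transferOptions linkage."""
    root = ET.fromstring(record_xml)
    node = root.find("gmd:dataSetURI/gco:CharacterString", NS)
    return node.text if node is not None else None
```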
Infrastructure - Story #8172 (In Progress): investigate atomic updates for some solr updates | https://redmine.dataone.org/issues/8172 | 2017-09-01T19:35:25Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>Atomic updates came to solr with v4.0. (We're currently at 5.x)</p>
<p>Atomic updates are supposed to be more efficient, and could help us with the race condition in <a class="issue tracker-5 status-5 priority-4 priority-default closed child" title="Task: Use multiple threads to index objects (Closed)" href="https://redmine.dataone.org/issues/7771">#7771</a><br>
(multiple tasks reading a solr record and then modifying it in divergent ways by overwriting existing values).</p>
<p>Atomic add and remove modifiers allow addition and removal of values in multivalued fields, which is where our race conditions arise.</p>
Infrastructure - Story #8155 (In Progress): Ensure GMN fully supports the Package API | https://redmine.dataone.org/issues/8155 | 2017-08-01T16:25:32Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>The package API </p>
<p><a href="https://releases.dataone.org/online/api-documentation-v2.0/apis/MN_APIs.html#MNPackage.getPackage">https://releases.dataone.org/online/api-documentation-v2.0/apis/MN_APIs.html#MNPackage.getPackage</a></p>
<p>is a convenience method for clients to download a complete data package in a single call. The result is a ZIP file in the BagIt format.</p>
<p>The goal of this story is to fully implement the Package API on GMN.</p>
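A minimal sketch of the kind of BagIt-style ZIP getPackage returns (GMN is Python, so the sketch is too). The data/ payload plus manifest layout follows the BagIt convention, but this is not GMN's implementation; real bags also carry bag-info.txt, tag manifests, and other tag files.

```python
import hashlib
import io
import zipfile

def make_bagit_zip(payload):
    """Build a minimal BagIt-style ZIP from {relative_path: bytes}.
    Payload files go under data/ and are listed in an MD5 manifest."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bagit.txt",
                    "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n")
        manifest_lines = []
        for path, data in payload.items():
            zf.writestr("data/" + path, data)
            manifest_lines.append(hashlib.md5(data).hexdigest()
                                  + "  data/" + path)
        zf.writestr("manifest-md5.txt", "\n".join(manifest_lines) + "\n")
    return buf.getvalue()
```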
Infrastructure - Story #8081 (In Progress): develop federated broker configuration for indexing | https://redmine.dataone.org/issues/8081 | 2017-04-24T22:52:34Z | Rob Nahf (rnahf@epscor.unm.edu)
Infrastructure - Story #8049 (In Progress): Support synchronization of system metadata for unhost... | https://redmine.dataone.org/issues/8049 | 2017-03-21T05:34:38Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>As part of mutable-content MN support, allow the Member Node to keep the system metadata records for all resultant versions of its changeable entities. This allows it to keep accurate system metadata for every version even though it no longer has the object bytes for that version.</p>
<p>Benefits:<br>
1. MN does not orphan any objects<br>
2. MN can administer objects from past versions on their own MN. Adjust the access policy of all versions, for example.<br>
3. no need to call cn.setObsoletedBy or leave that field empty.</p>
<p>Costs:<br>
1. requires new logic for indexing (possibly)<br>
2. requires new logic for registerSystemMetadata (possibly)<br>
3. requires new logic for synchronization </p>
<p>Very similar to how we synchronize DATA objects, but without triggering MN replication.</p>
Infrastructure - Story #8038 (In Progress): connect logging output to a log analysis tool | https://redmine.dataone.org/issues/8038 | 2017-03-07T20:42:15Z | Rob Nahf (rnahf@epscor.unm.edu)
<p>This would be part of a larger monitoring framework effort.</p>
Python GMN - Story #8032 (In Progress): GMN v2 | https://redmine.dataone.org/issues/8032 | 2017-03-02T18:25:18Z | Roger Dahl (dahl@unm.edu)
<p>GMN v2 related tasks that don't require their own story.</p>
OGC-Slender Node - Story #7149 (Testing): Implement mechanism to retrieve a list of objects avail... | https://redmine.dataone.org/issues/7149 | 2015-06-04T20:20:47Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>Using Python, implement a tool that is able to retrieve a list of packages, and the objects that make up each package.</p>
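The grouping step of such a tool might be sketched as follows, assuming the object list comes back as solr-style documents where each document may carry a multivalued resourceMap field (as in the DataONE search index); the function name is hypothetical.

```python
def packages_from_solr_docs(docs):
    """Group object identifiers into packages keyed by the resource map
    (package) pid listed in each document's resourceMap field."""
    packages = {}
    for doc in docs:
        for rmap_pid in doc.get("resourceMap", []):
            packages.setdefault(rmap_pid, []).append(doc["id"])
    return packages
```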