DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-06-19T02:03:44ZDataONE Tasks
Redmine Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ...https://redmine.dataone.org/issues/88232019-06-19T02:03:44ZDave Vieglaisdave.vieglais@gmail.com
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation that results from the Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>Which is intended to disable client certificate authentication for web browsers, but allow it for others. This approach worked fine on older Apache / OpenSSL but the new combination creates a several second wait when the server discovers the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
<p>The latest released version of Apache is 2.4.39 and this is available through a PPA intended for Debian developers. This has been installed so far on dev-2, sandbox, stage, and stage-2 with the process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c which appears to resolve the apparent bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that by default, Apache now offers TLSv1.3, which is great except that it appears to cause problems with at least Python clients failing to connect and getting a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>That TLSv1.3 is the problem was verified with cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
<p>A workaround has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
<p>--This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.-- CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so will not typically apply to a MN.</p>
Infrastructure - Story #8791 (New): Complete deprecation of ONEMercury interfacehttps://redmine.dataone.org/issues/87912019-05-01T14:00:51ZDave Vieglaisdave.vieglais@gmail.com
<p>ONEMercury is still deployed on CNs and there are a number of links on the DataONE web site that link to the ONEMercury UI.</p>
<p>On production systems, the page: </p>
<p><a href="https://cn.dataone.org/onemercury/">https://cn.dataone.org/onemercury/</a> </p>
<p>should redirect to:</p>
<p><a href="https://search.dataone.org/">https://search.dataone.org/</a></p>
<p>It may be useful to have an informational page that continues the redirect after a few seconds to help inform users of the change.</p>
Search UI - Story #8574 (New): PANGAEA Temporary Fix: SID only in Data Citationhttps://redmine.dataone.org/issues/85742018-04-30T13:26:15ZMonica Ihliemail@monicaihli.com
<p>Adjust appearance of data citation in the case of formatid <a href="http://www.isotc211.org/2005/gmd-pangaea">http://www.isotc211.org/2005/gmd-pangaea</a> such that the SID alone appears in the data citation and not the PID. This is intended as a temporary fix to be in place only while a longer term strategy for customizing appearances of data citations is designed and implemented. Once that longer term strategy is in place, whatever temporary code level changes were implemented to accommodate PANGAEA should be removed.</p>
Infrastructure - Story #8470 (New): Make the spring context of the d1_index_processor daemon more...https://redmine.dataone.org/issues/84702018-03-02T20:56:14ZJing Taotao@nceas.ucsb.edu
<p>In the IndexTaskProcessorDaemon class, which is the start point of d1-indexx-processor, the context is initialized from a file in the class path:</p>
<p><code>context = new ClassPathXmlApplicationContext("processor-daemon-context.xml");</code></p>
<p>The file, processor-daemon-context.xml, imports all the index sub-processor application files. This make very hard for us to add a new sub-processor since we have to rebuild a new jar file in order to include a new file. Instead, if we use the <code>FileSystemXmlApplicationContext</code> class and read the file path from a properties file, we can get more flexibility.</p>
Testing MN Management - Story #8463 (New): test: Testing & Developmenthttps://redmine.dataone.org/issues/84632018-03-01T21:04:41ZAmy Forresteraforres4@utk.edu
<p>Install or develop a functional member node to be registered to a non-production environment. </p>
Infrastructure - Story #8363 (New): indexer shutdown generates index taskshttps://redmine.dataone.org/issues/83632018-02-12T21:42:22ZRob Nahfrnahf@epscor.unm.edu
<p>Seen in STAGE, somehow the index processor generated about 15k tasks (after processing 215k tasks over the weekend) during a service stop. It also created about 12.5 failures. Before trying to stop services, this the status of postgres:</p>
<pre>d1-index-queue=# select status, count(*) from index_task group by status;
status | count
------------+-------
NEW | 5
FAILED | 1659
IN PROCESS | 367
(3 rows)
</pre>
<p>Execution of <code>/etc/init.d/d1-index-task-processor stop</code> timed out.<br>
I performed <code>/etc/init.d/d1-index-task-generator stop</code> successfully, getting an <code>[OK]</code><br>
then I performed <code>/etc/init.d/d1-processing stop</code> on UCSB, also getting an '<code>[OK]</code></p>
<p>examination of the indexing log file a couple minuted later showed this:</p>
<pre>[ INFO] 2018-02-12 20:36:08,975 (IndexTaskProcessor:logProcessorLoad:245) new tasks:0, tasks previously failed: 1661
[ INFO] 2018-02-12 20:36:09,361 (IndexTaskProcessor:processFailedIndexTaskQueue:226) IndexTaskProcessor.processFailedIndexTaskQueue with size 0
[ WARN] 2018-02-12 20:36:09,361 (IndexTaskProcessorJob:execute:58) processing job [org.dataone.cn.index.processor.IndexTaskProcessorJob@515de84e] finished execution of index task processor [org.dataone.cn.index.processor.IndexTaskProcessor@2062
1d44]
[ WARN] 2018-02-12 20:36:26,571 (IndexTaskProcessorScheduler:stop:99) stopping index task processor quartz scheduler [org.dataone.cn.index.processor.IndexTaskProcessorScheduler@103bbd22] ...
[ INFO] 2018-02-12 20:36:26,572 (QuartzScheduler:standby:572) Scheduler QuartzScheduler_$_NON_CLUSTERED paused.
[ INFO] 2018-02-12 20:36:26,572 (IndexTaskProcessorScheduler:stop:111) Scheuler.interrupt method can't succeed to interrupt the d1 index job and the static method IndexTaskProcessorJob.interruptCurrent() will be called.
[ WARN] 2018-02-12 20:36:26,572 (IndexTaskProcessorJob:interruptCurrent:92) IndexTaskProcessorJob class [1806183035] interruptCurrent called, shutting down processor [org.dataone.cn.index.processor.IndexTaskProcessor@20621d44]
[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:952) processor [org.dataone.cn.index.processor.IndexTaskProcessor@20621d44] Shutting down the ExecutorService. Will allow active tasks to finish; will cancel submitted tasks
and return them to NEW status, wait for active tasks to finish, then return any remaining task not yet submitted to NEW status....
[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:955) ...1.) closing ExecutorService to new tasks...
[ WARN] 2018-02-12 20:36:26,574 (IndexTaskProcessor:shutdownExecutor:957) ...2.) cancelling cancellable futures...
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:958) ...number of futures: 591344
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:959) ... number of tasks in futures map: 591344
</pre>
<p>15 minutes or so later, the log showed this:</p>
<pre>[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:955) ...1.) closing ExecutorService to new tasks...
[ WARN] 2018-02-12 20:36:26,574 (IndexTaskProcessor:shutdownExecutor:957) ...2.) cancelling cancellable futures...
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:958) ...number of futures: 591344
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:959) ... number of tasks in futures map: 591344
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:988) ...number of (cancellable) runnables/tasks reset to new: 0
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:989) ...number of (cancellable) runnables not mapped to tasks: 0
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:990) ...number of uncancellable runnables: 591344 (completed or in process)
[ WARN] 2018-02-12 20:52:30,812 (IndexTaskProcessor:shutdownExecutor:993) ...3.) waiting (with timeout) for active futures to finish...
[ WARN] 2018-02-12 20:52:30,812 (IndexTaskProcessor:shutdownExecutor:998) ...4.) Reviewing remaining uncancellables to check for completion, returning incomplete ones to NEW status...
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1026) ...5.) Calling shutdownNow on the executor service.
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1028) ... .... number of runnables still waiting: 0
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1030) ...6.) returning preSubmitted tasks to NEW status...
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1031) ... .... number of preSubmitted tasks: 34735
[ INFO] 2018-02-12 20:52:30,835 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id testGetPackage_2017119234441164 since it was tried to many times.
[ERROR] 2018-02-12 20:52:30,891 (IndexTaskProcessor:shutdownExecutor:1038) ....... Exception thrown trying to return task to NEW status for pid: testGetPackage_2017119234441164
org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [org.dataone.cn.index.task.IndexTask] with identifier [13071797]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [org.dataone.cn.index.task.IndexTask#13071797]
...
[ INFO] 2018-02-12 20:54:19,618 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id P3_201622214921901 since it was tried to many times.
[ WARN] 2018-02-12 20:54:19,621 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid P3_201622214921901returned to NEW status.
[ WARN] 2018-02-12 20:54:19,623 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_doi:10.5065/D6VD6WFPreturned to NEW status.
[ INFO] 2018-02-12 20:54:19,623 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id testGetPackage_NotAuthorized_201710605522454 since it was tried to many times.
[ WARN] 2018-02-12 20:54:19,626 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid testGetPackage_NotAuthorized_201710605522454returned to NEW status.
[ WARN] 2018-02-12 20:54:19,628 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_urn:uuid:d3606ccb-2d50-4723-ae45-c0d01b817e48returned to NEW status.
[ WARN] 2018-02-12 20:54:19,631 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_doi:10.18739/A2165Freturned to NEW status.
[ WARN] 2018-02-12 20:54:19,631 (IndexTaskProcessor:shutdownExecutor:1041) ............7.) DONE with shutting down IndexTaskProcessor.
[ INFO] 2018-02-12 20:54:19,631 (IndexTaskProcessorScheduler:stop:113) The scheuler.interrupt method seems not interrupt the d1 index job and the static method IndexTaskProcessorJob.interruptCurrent() was called.
[ WARN] 2018-02-12 20:54:19,632 (IndexTaskProcessorScheduler:stop:128) Job scheduler [org.dataone.cn.index.processor.IndexTaskProcessorScheduler@103bbd22] finished executing all jobs. The d1-index-processor shut down sucessfully.============================================
</pre>
<p>but postgres yielded this:</p>
<pre>d1-index-queue=# select status, count(*) from index_task group by status;
status | count
--------+-------
NEW | 15367
FAILED | 14032
(2 rows)
</pre>
<p>indexer shutdowns are a stubborn problem...</p>
Testing MN Management - Story #8352 (New): Move to Productionhttps://redmine.dataone.org/issues/83522018-02-08T15:28:13ZMonica Ihliemail@monicaihli.comTesting MN Management - Story #8347 (New): Testing & Developmenthttps://redmine.dataone.org/issues/83472018-02-08T15:28:11ZMonica Ihliemail@monicaihli.com
<p>Install or develop a functional member node to be registered to a non-production environment. </p>
Testing MN Management - Story #8343 (New): Planninghttps://redmine.dataone.org/issues/83432018-02-08T15:28:10ZMonica Ihliemail@monicaihli.com
<p>The repository and DataONE have agreed to proceed with deployment as a member node. Decisions will be made as to how to proceed with development. Node operators will receive training.</p>
Testing MN Management - Story #8340 (New): Discoveryhttps://redmine.dataone.org/issues/83402018-02-08T15:28:09ZMonica Ihliemail@monicaihli.com
<p>Discovery is about establishing contact and building a relationship with a potential new member node. In this phase, it is determined if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. Broad discussions of deployment options may be reviewed as well.<br>
This story is complete when a determination is made to either proceed with planning a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
Infrastructure - Story #8173 (New): add checks for retrograde systemMetadata changeshttps://redmine.dataone.org/issues/81732017-09-01T19:42:33ZRob Nahfrnahf@epscor.unm.edu
<p>with the ability to prioritize and the introduction of parallelized index task processing, the effective queue is not guaranteed to be time-ordered. If there are two valid system metadata changes resulting in two tasks and the second change hits the index first, the earlier task should be rejected, as its changes are out of date.</p>
Python Libraries - Story #6796 (New): Migrate to Python v3https://redmine.dataone.org/issues/67962015-02-02T16:16:38ZDave Vieglaisdave.vieglais@gmail.com
<p>Supporting Python 2.7.x will become increasingly difficult and so all the DataONE python code should be migrated to version 3.x support. </p>
<p>For example, Python version 3.x includes many fixes related to secure communications that will not be back ported to Python 2.7.x</p>
DataONE API - Story #6759 (New): ObjectFormat Managementhttps://redmine.dataone.org/issues/67592015-01-13T20:12:14ZRob Nahfrnahf@epscor.unm.edu
<p>There currently are not any API methods for managing the collection of objectFormats registered to a dataone environment. There is a "bootstrap" resource that constitutes a the list in either d1_libclient_java or d1_common_java that can be used in testing environments. There's also a different resource in the cn-os-core project that is used in production.</p>
<p>These 2 resources are difficult to maintain (keep synchronized), and there isn't a defined process for adding formats to production.</p>
<p>We discussed the inclusion of an "addFormat(...) method in V2, but it is not currently in the API. (It would be part of the CNCore API).</p>
<p>It would be good to review the situation with a focused discussion to at least troubleshoot the existing informal management practices and formalize them; and then consider if more infrastructure is needed.</p>
Infrastructure - Story #6069 (New): open ask.dataone.org sign-in to communityhttps://redmine.dataone.org/issues/60692014-08-22T16:45:27ZRob Nahfrnahf@epscor.unm.edu
<p>We are starting to get community involvement in questions asked on DataONE, but community members are not able to sign-in and post responses, I believe.</p>
<p>the solution is to open up the Askbot server to accept new members from the general community.</p>
Infrastructure - Story #4091 (New): ESRI GeoPortal MN stackhttps://redmine.dataone.org/issues/40912013-10-15T13:36:56ZBruce Wilsonbwilso27@utk.edu
<p>The objective is to design, develop, and implement a MN Stack to integrate with the ESRI GeoPortal server (<a href="http://www.esri.com/software/arcgis/geoportal">http://www.esri.com/software/arcgis/geoportal</a>).</p>