DataONE Tasks: Issues
https://redmine.dataone.org/
2019-03-14T17:11:26Z
Infrastructure - Story #8779 (New): ForesiteResourceMap performance issue
https://redmine.dataone.org/issues/8779
2019-03-14T17:11:26Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Profiling reveals that much time is spent in IndexVisibilityDelegate, which is seemingly called twice unnecessarily: first in _init, then again in getAllResourceIDs().</p>
<p>This class is generally not well documented and has some confusing traversal code, so it is difficult to assess exactly what is going on. It also seems to be a misleading encapsulation of data, in that it attempts to filter out resource map members based on current system metadata properties (archived or not), but that is not mentioned at all in the sparse javadocs.</p>
<p>The code needs to be reviewed to make sure no unnecessary calls are made.<br>
If resource map checking (for completeness) is no longer going to be done, this class should probably be deprecated or removed.</p>
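<p>A minimal sketch of the shape of a fix, assuming the duplicate delegate calls can be cached; the class and method names below are illustrative, not the actual ForesiteResourceMap API:</p>
<pre>import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: resolve member visibility once, during construction,
// and reuse the cached result in getAllResourceIDs() instead of invoking
// the (expensive) IndexVisibilityDelegate a second time.
class CachedVisibilityResourceMap {
    private final Map<String, Boolean> visibility = new LinkedHashMap<>();

    CachedVisibilityResourceMap(List<String> memberIds) {
        for (String id : memberIds) {
            visibility.put(id, isVisible(id)); // one delegate call per member
        }
    }

    Set<String> getAllResourceIDs() {
        return visibility.keySet(); // no second traversal needed
    }

    private boolean isVisible(String id) {
        return true; // stand-in for the IndexVisibilityDelegate lookup
    }
}</pre>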
Infrastructure - Bug #8735 (In Progress): NPE in IndexTask causes indexing job to fail
https://redmine.dataone.org/issues/8735
2018-10-18T18:05:44Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>The isArchived() method calls a method that can return null, and doesn't check for null values before using the result.</p>
<p>(IndexTask is in d1_cn_common component)</p>
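<p>A minimal sketch of the guard, assuming the NPE comes from unboxing a nullable Boolean such as SystemMetadata.getArchived():</p>
<pre>import org.dataone.service.types.v1.SystemMetadata;

// Sketch: treat a null archived flag as "not archived" rather than
// unboxing it and throwing a NullPointerException.
class ArchivedCheck {
    static boolean isArchived(SystemMetadata sysmeta) {
        if (sysmeta == null || sysmeta.getArchived() == null) {
            return false; // absent flag: assume not archived
        }
        return sysmeta.getArchived().booleanValue();
    }
}</pre>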
Infrastructure - Bug #8724 (New): index out of bounds error in PortalCertificateManager
https://redmine.dataone.org/issues/8724
2018-10-02T19:11:41Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Noticed in DEV logs:</p>
<pre>[ WARN] 2018-10-02 05:19:01,700 (PortalCertificateManager:getSession:308) 1
java.lang.ArrayIndexOutOfBoundsException: 1
at org.dataone.portal.PortalCertificateManager.getSession(PortalCertificateManager.java:305)
at org.dataone.cn.rest.v1.IdentityController.verifyAccount(IdentityController.java:542)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPut(FrameworkServlet.java:800)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:649)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.PortalCertificateFilter.doFilter(PortalCertificateFilter.java:82)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.ServiceDisableFilter.doFilter(ServiceDisableFilter.java:78)
</pre>
<p>The exception appears to be handled and logged, but the code should anticipate single-word session subjects (for example, "public") and not log an error with a stack trace.</p>
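<p>A sketch of the defensive parse, assuming the root cause is an unchecked index into a split subject string; the helper below is hypothetical, not the actual PortalCertificateManager code:</p>
<pre>// Hypothetical sketch: guard the array access instead of assuming the
// subject always splits into at least two tokens.
class SubjectParsing {
    static String secondTokenOrNull(String subject) {
        if (subject == null) {
            return null;
        }
        String[] parts = subject.split("\\s+");
        // single-word subjects such as "public" simply have no second token
        return parts.length > 1 ? parts[1] : null;
    }
}</pre>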
Infrastructure - Task #8703 (New): test the cleaned up indexer in DEV
https://redmine.dataone.org/issues/8703
2018-09-24T18:05:50Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Test the indexing code in DEV, after the messaging removal and the dependency upgrades.</p>
Infrastructure - Story #8702 (New): Indexing Refactor Strategy
https://redmine.dataone.org/issues/8702
2018-09-21T22:42:48Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Indexing performs poorly and has some consistency problems.</p>
<p>A solution was developed that addresses the main issues; it involves the creation of a separate solr core for relationships (the resource maps). Initially, the solution will create the separate core as a behind-the-scenes reference for the main search index. Relationships (resource_map, documents, isDocumentedBy) will still be copied into the main search record.</p>
<p>Additionally, archived objects will no longer be removed from the index; instead, an archived field will be added to the schema.</p>
<p>The new logic for processing resource maps and archiving objects should remove many of the inefficient checks that cause records to be reindexed.</p>
<p>The main phases for development will be:</p>
<ol>
<li>refactor out the custom solr client in favor of the standard SolrJ client (org.apache.solr.client.solrj); see the sketch after this list.<br></li>
<li>migrate the schema to include the archived field & introduce the relationships core. Refactor the resource map subprocessor to use it, and trigger relationship tasks.</li>
<li>refactor the delete subprocessor (for archived records) & add the search handler.</li>
</ol>
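<p>A minimal SolrJ sketch of the direction phases 1 and 2 describe, assuming a core named "relationships" with illustrative field names; the actual core and schema are still to be designed:</p>
<pre>import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RelationshipsCoreExample {
    public static void main(String[] args) throws Exception {
        // standard SolrJ client in place of the custom solr client
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/relationships").build();
        // which resource maps document a given pid?
        SolrQuery q = new SolrQuery("documents:\"urn:uuid:example-pid\"");
        q.setFields("id", "resourceMap");
        QueryResponse resp = client.query(q);
        resp.getResults().forEach(doc ->
                System.out.println(doc.getFieldValue("resourceMap")));
        client.close();
    }
}</pre>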
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync...
https://redmine.dataone.org/issues/8525
2018-03-27T22:36:54Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, so a retry or bypass for those timeouts should be implemented. It's not clear whether a new HazelcastClient should be instantiated or whether the same client is still usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, preliminary searching through the code indicates that HazelcastClientFactory.getProcessingClient() is only used in a few places, and the singleton behavior it relies on can be sidestepped by removing the method and replacing it with a getLock() wrapper method (that seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance. If the client is never exposed, it can be refreshed as needed.</p>
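<p>A sketch of the retry idea; the backoff policy and the client-refresh hook are assumptions about the eventual refactor, not existing d1_synchronization code:</p>
<pre>import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

// Sketch: retry a Hazelcast operation a few times before letting the
// exception bubble up and disable synchronization.
class HazelcastRetry {
    static <T> T withRetry(Callable<T> op, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Hazelcast wraps the TimeoutException in a RuntimeException
                boolean transientTimeout = e instanceof TimeoutException
                        || e.getCause() instanceof TimeoutException;
                if (!transientTimeout) {
                    throw e; // not a transient timeout: fail fast
                }
                last = e;
                Thread.sleep(1000L * attempt); // simple linear backoff
                // if the client turns out to be session-bound, a
                // client-refresh step would go here
            }
        }
        throw last;
    }
}</pre>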
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent
.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre> Infrastructure - Story #8363 (New): indexer shutdown generates index taskshttps://redmine.dataone.org/issues/83632018-02-12T21:42:22ZRob Nahfrnahf@epscor.unm.edu
<p>Seen in STAGE: somehow the index processor generated about 15k tasks (after processing 215k tasks over the weekend) during a service stop. It also created about 12.5k failures. Before trying to stop services, this was the status in postgres:</p>
<pre>d1-index-queue=# select status, count(*) from index_task group by status;
status | count
------------+-------
NEW | 5
FAILED | 1659
IN PROCESS | 367
(3 rows)
</pre>
<p>Execution of <code>/etc/init.d/d1-index-task-processor stop</code> timed out.<br>
I performed <code>/etc/init.d/d1-index-task-generator stop</code> successfully, getting an <code>[OK]</code>.<br>
Then I performed <code>/etc/init.d/d1-processing stop</code> on UCSB, also getting an <code>[OK]</code>.</p>
<p>Examination of the indexing log file a couple of minutes later showed this:</p>
<pre>[ INFO] 2018-02-12 20:36:08,975 (IndexTaskProcessor:logProcessorLoad:245) new tasks:0, tasks previously failed: 1661
[ INFO] 2018-02-12 20:36:09,361 (IndexTaskProcessor:processFailedIndexTaskQueue:226) IndexTaskProcessor.processFailedIndexTaskQueue with size 0
[ WARN] 2018-02-12 20:36:09,361 (IndexTaskProcessorJob:execute:58) processing job [org.dataone.cn.index.processor.IndexTaskProcessorJob@515de84e] finished execution of index task processor [org.dataone.cn.index.processor.IndexTaskProcessor@2062
1d44]
[ WARN] 2018-02-12 20:36:26,571 (IndexTaskProcessorScheduler:stop:99) stopping index task processor quartz scheduler [org.dataone.cn.index.processor.IndexTaskProcessorScheduler@103bbd22] ...
[ INFO] 2018-02-12 20:36:26,572 (QuartzScheduler:standby:572) Scheduler QuartzScheduler_$_NON_CLUSTERED paused.
[ INFO] 2018-02-12 20:36:26,572 (IndexTaskProcessorScheduler:stop:111) Scheuler.interrupt method can't succeed to interrupt the d1 index job and the static method IndexTaskProcessorJob.interruptCurrent() will be called.
[ WARN] 2018-02-12 20:36:26,572 (IndexTaskProcessorJob:interruptCurrent:92) IndexTaskProcessorJob class [1806183035] interruptCurrent called, shutting down processor [org.dataone.cn.index.processor.IndexTaskProcessor@20621d44]
[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:952) processor [org.dataone.cn.index.processor.IndexTaskProcessor@20621d44] Shutting down the ExecutorService. Will allow active tasks to finish; will cancel submitted tasks
and return them to NEW status, wait for active tasks to finish, then return any remaining task not yet submitted to NEW status....
[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:955) ...1.) closing ExecutorService to new tasks...
[ WARN] 2018-02-12 20:36:26,574 (IndexTaskProcessor:shutdownExecutor:957) ...2.) cancelling cancellable futures...
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:958) ...number of futures: 591344
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:959) ... number of tasks in futures map: 591344
</pre>
<p>15 minutes or so later, the log showed this:</p>
<pre>[ WARN] 2018-02-12 20:36:26,573 (IndexTaskProcessor:shutdownExecutor:955) ...1.) closing ExecutorService to new tasks...
[ WARN] 2018-02-12 20:36:26,574 (IndexTaskProcessor:shutdownExecutor:957) ...2.) cancelling cancellable futures...
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:958) ...number of futures: 591344
[ WARN] 2018-02-12 20:36:26,575 (IndexTaskProcessor:shutdownExecutor:959) ... number of tasks in futures map: 591344
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:988) ...number of (cancellable) runnables/tasks reset to new: 0
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:989) ...number of (cancellable) runnables not mapped to tasks: 0
[ WARN] 2018-02-12 20:52:30,811 (IndexTaskProcessor:shutdownExecutor:990) ...number of uncancellable runnables: 591344 (completed or in process)
[ WARN] 2018-02-12 20:52:30,812 (IndexTaskProcessor:shutdownExecutor:993) ...3.) waiting (with timeout) for active futures to finish...
[ WARN] 2018-02-12 20:52:30,812 (IndexTaskProcessor:shutdownExecutor:998) ...4.) Reviewing remaining uncancellables to check for completion, returning incomplete ones to NEW status...
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1026) ...5.) Calling shutdownNow on the executor service.
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1028) ... .... number of runnables still waiting: 0
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1030) ...6.) returning preSubmitted tasks to NEW status...
[ WARN] 2018-02-12 20:52:30,835 (IndexTaskProcessor:shutdownExecutor:1031) ... .... number of preSubmitted tasks: 34735
[ INFO] 2018-02-12 20:52:30,835 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id testGetPackage_2017119234441164 since it was tried to many times.
[ERROR] 2018-02-12 20:52:30,891 (IndexTaskProcessor:shutdownExecutor:1038) ....... Exception thrown trying to return task to NEW status for pid: testGetPackage_2017119234441164
org.springframework.orm.hibernate3.HibernateOptimisticLockingFailureException: Object of class [org.dataone.cn.index.task.IndexTask] with identifier [13071797]: optimistic locking failed; nested exception is org.hibernate.StaleObjectStateException: Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect): [org.dataone.cn.index.task.IndexTask#13071797]
...
[ INFO] 2018-02-12 20:54:19,618 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id P3_201622214921901 since it was tried to many times.
[ WARN] 2018-02-12 20:54:19,621 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid P3_201622214921901returned to NEW status.
[ WARN] 2018-02-12 20:54:19,623 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_doi:10.5065/D6VD6WFPreturned to NEW status.
[ INFO] 2018-02-12 20:54:19,623 (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id testGetPackage_NotAuthorized_201710605522454 since it was tried to many times.
[ WARN] 2018-02-12 20:54:19,626 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid testGetPackage_NotAuthorized_201710605522454returned to NEW status.
[ WARN] 2018-02-12 20:54:19,628 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_urn:uuid:d3606ccb-2d50-4723-ae45-c0d01b817e48returned to NEW status.
[ WARN] 2018-02-12 20:54:19,631 (IndexTaskProcessor:shutdownExecutor:1036) ... preSubmittedTask for pid resource_map_doi:10.18739/A2165Freturned to NEW status.
[ WARN] 2018-02-12 20:54:19,631 (IndexTaskProcessor:shutdownExecutor:1041) ............7.) DONE with shutting down IndexTaskProcessor.
[ INFO] 2018-02-12 20:54:19,631 (IndexTaskProcessorScheduler:stop:113) The scheuler.interrupt method seems not interrupt the d1 index job and the static method IndexTaskProcessorJob.interruptCurrent() was called.
[ WARN] 2018-02-12 20:54:19,632 (IndexTaskProcessorScheduler:stop:128) Job scheduler [org.dataone.cn.index.processor.IndexTaskProcessorScheduler@103bbd22] finished executing all jobs. The d1-index-processor shut down sucessfully.============================================
</pre>
<p>but postgres yielded this:</p>
<pre>d1-index-queue=# select status, count(*) from index_task group by status;
status | count
--------+-------
NEW | 15367
FAILED | 14032
(2 rows)
</pre>
<p>Indexer shutdowns are a stubborn problem...</p>
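<p>For reference, the cancel-then-drain sequence the log narrates maps onto the standard ExecutorService lifecycle; a minimal sketch, with step numbers matching the log messages above:</p>
<pre>import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Sketch: stop intake, cancel what has not started, wait briefly for
// active work, then force-stop whatever remains.
class ProcessorShutdown {
    static void shutdown(ExecutorService exec, List<Future<?>> futures)
            throws InterruptedException {
        exec.shutdown();                       // 1.) close to new tasks
        for (Future<?> f : futures) {
            f.cancel(false);                   // 2.) cancel unstarted futures
        }
        if (!exec.awaitTermination(15, TimeUnit.MINUTES)) { // 3.) wait
            exec.shutdownNow();                // 5.) interrupt stragglers
        }
        // 4.) and 6.) in the real processor: return incomplete and
        // preSubmitted tasks to NEW status in the index_task table
    }
}</pre>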
Infrastructure - Task #8098 (Closed): Token-based authentication fails with LE CN certs
https://redmine.dataone.org/issues/8098
2017-05-21T17:06:46Z
Chris Jones <cjones@nceas.ucsb.edu>
<p>When trying to call @MN.create()@ on my local Metacat setup (which points to the production CN environment for ORCID authentication), I'm getting an @InvalidToken@ error:<br>
<br>
<?xml version="1.0" encoding="UTF-8"?><br>
Session is required to WRITE to the Node.<br>
</p>
<p>This is odd because I recently logged in via ORCID, so it looked like a token verification issue. In the Metacat log, I see:</p>
<p>metacat 20170521-10:35:07: [WARN]: Could not use public key to verify provided token: eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJodHRwOlwvXC9vcmNpZC5vcmdcLzAwMDAtMDAwMi04MTIxLTIzNDEiLCJmdWxsTmFtZSI6IkNocmlzdG9waGVyIEpvbmVzIiwiaXNzdWVkQXQiOiIyMDE3LTA1LTIxVDE1OjU3OjA2LjEyOCswMDowMCIsImNvbnN1bWVyS2V5IjoidGhlY29uc3VtZXJrZXkiLCJleHAiOjE0OTU0NDcwMjYsInVzZXJJZCI6Imh0dHA6XC9cL29yY2lkLm9yZ1wvMDAwMC0wMDAyLTgxMjEtMjM0MSIsInR0bCI6NjQ4MDAsImlhdCI6MTQ5NTM4MjIyNn0.dynDbRKqIuI1bXzPYlHfW7aFcrl2J7O8ZWqxS_2DHBotx4AqX_hbxuRrlQ_9s-V1mRJupyxkYxW3EWkLcoMUQNTuyMLGpV53GPoGdBjkTEd407GU-yxv_G3cmmSovXSLj6AAjeKJ8KHBt4y6JtgqR2isf5YGoM18CwM-IZV3nJVPBMZpNMPhYSWJeaeD2u02duKCpcy7L-XD_OCLJdzHjtjyFqqbHvqGyZIPqc9Kp_JTuTmlYaAZe9JiLcjHnyaOeHMGCEkmOekiRA_wh6DtnBLKyCczBjNg0kirxMk27abjAxt-ckhKfrCT6dnXbd1lCLNnxVYiJj5wztNOGH492T3nyaSQGROnSQd6cxB3pPAiwW7AOR34MPNJlNv_r-3WbwThDeOOtrMSvfZtYGv6Mn_i0-d1yjccRDzZeXdRS0P91GYfdK2lfog1lhiPuec3gD4V4plNJR3wKSSMhgjikH6igCB5I7C5n9Ye5vSeyWW9ApwLogfbEUc3xKgiCgj1jtED4L7E3WgUvtWxsyqMMtaEAJGvRHlGPPShD3xHPsm6ltCVrU1arLXneuGa0R7M-GgzMk0z5HdRE2bD2agu5WuN-w5-w9W6jwrzgI4wM7v8KiJYxeM332nx4f2BF6ArFJ2K-DxlpgmdK6bkPTtL7H-uj5digXvBoHFYZAJF49c</p>
<p>After grabbing the public certificate from the production CN:</p>
<p>-----BEGIN CERTIFICATE-----<br>
MIIFQzCCBCugAwIBAgISAxPSoq7BM7aFc1VzgyTJkz3wMA0GCSqGSIb3DQEBCwUA<br>
MEoxCzAJBgNVBAYTAlVTMRYwFAYDVQQKEw1MZXQncyBFbmNyeXB0MSMwIQYDVQQD<br>
ExpMZXQncyBFbmNyeXB0IEF1dGhvcml0eSBYMzAeFw0xNzA1MTcxMjI5MDBaFw0x<br>
NzA4MTUxMjI5MDBaMBkxFzAVBgNVBAMTDmNuLmRhdGFvbmUub3JnMIIBIjANBgkq<br>
hkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtp++UWPu0Zm4gIs01F+LE94i4eExI+UX<br>
82DIB3Xn93FW4IgDTsjEfXCB3AHggdx6GnExbDzu/iXn+K3LiW6QaeasG47XOeup<br>
JjpmJqDROAJvLy1GpgrFeNxEe5F6xljPcAxUH/W/NkoHAem7wMatRNA53f6JkMVd<br>
sKXAYPOdKUOqhQ9QRMqEFIPImt+SHfvxUkQyL4g+1taQ5XYDu5zwF5+k77ZRre+o<br>
RVR9gHdbdlvLLQYP9eGJdi+nmFFTrEuXIklB8SQi6yvck0p6nR2sjmxFlnaLTe7Z<br>
iaVWaA1vvwvwgG27Q2iMcnAG+JXQDe7Jd1YIuXUW7vVYyGl4ONbp3QIDAQABo4IC<br>
UjCCAk4wDgYDVR0PAQH/BAQDAgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEF<br>
BQcDAjAMBgNVHRMBAf8EAjAAMB0GA1UdDgQWBBSyQkmQUHHO3EkItWseuA3L6vg8<br>
1DAfBgNVHSMEGDAWgBSoSmpjBH3duubRObemRWXv86jsoTBwBggrBgEFBQcBAQRk<br>
MGIwLwYIKwYBBQUHMAGGI2h0dHA6Ly9vY3NwLmludC14My5sZXRzZW5jcnlwdC5v<br>
cmcvMC8GCCsGAQUFBzAChiNodHRwOi8vY2VydC5pbnQteDMubGV0c2VuY3J5cHQu<br>
b3JnLzBcBgNVHREEVTBTghRjbi1vcmMtMS5kYXRhb25lLm9yZ4IVY24tdWNzYi0x<br>
LmRhdGFvbmUub3JnghRjbi11bm0tMS5kYXRhb25lLm9yZ4IOY24uZGF0YW9uZS5v<br>
cmcwgf4GA1UdIASB9jCB8zAIBgZngQwBAgEwgeYGCysGAQQBgt8TAQEBMIHWMCYG<br>
CCsGAQUFBwIBFhpodHRwOi8vY3BzLmxldHNlbmNyeXB0Lm9yZzCBqwYIKwYBBQUH<br>
AgIwgZ4MgZtUaGlzIENlcnRpZmljYXRlIG1heSBvbmx5IGJlIHJlbGllZCB1cG9u<br>
IGJ5IFJlbHlpbmcgUGFydGllcyBhbmQgb25seSBpbiBhY2NvcmRhbmNlIHdpdGgg<br>
dGhlIENlcnRpZmljYXRlIFBvbGljeSBmb3VuZCBhdCBodHRwczovL2xldHNlbmNy<br>
eXB0Lm9yZy9yZXBvc2l0b3J5LzANBgkqhkiG9w0BAQsFAAOCAQEAJo/aaCo0NweP<br>
prHz+9Ko39xZ/Y6kum0ZOSw6BFM8zgkOOd1R0rbc53j09yKDi3V+MKd5rXfISNsp<br>
LKBVe/R8HH/rglYUhMTBBizGsEdyPE4n5I3ml4RyOVmC1SpDPUzH0CAeSLkzBpBV<br>
WVIfEwl641GtT0hBcwVjMlDYywrvSHv4mifVLd/2ZTSYillrhQzQySKb9g7jbEld<br>
LHY1WoIU0E5XgQJq3b6Vhb5dXVkHsDfwPHNpJA5fVCVYoKazo+xSNBP757ta/ix4<br>
e9CbRsQQ0TgEsuUAOa9lh9+O8uAL5zkZ4kwZCLypxbkZ8/YYOCMGMtGz4632J7VF<br>
Ozukfk41bw==<br>
-----END CERTIFICATE-----</p>
<p>and trying to verify the token with this certificate, it fails. </p>
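<p>For reference, a self-contained, JDK-only sketch of the RS256 verification being performed here:</p>
<pre>import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.Signature;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;
import java.util.Base64;

// Sketch: verify an RS256-signed JWT against a certificate's public key.
class TokenCheck {
    static boolean verify(String jwt, byte[] pemBytes) throws Exception {
        X509Certificate cert = (X509Certificate) CertificateFactory
                .getInstance("X.509")
                .generateCertificate(new ByteArrayInputStream(pemBytes));
        String[] parts = jwt.split("\\.");     // header.payload.signature
        byte[] signingInput = (parts[0] + "." + parts[1])
                .getBytes(StandardCharsets.US_ASCII);
        byte[] sig = Base64.getUrlDecoder().decode(parts[2]);
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(cert.getPublicKey());
        verifier.update(signingInput);
        return verifier.verify(sig);
    }
}</pre>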
<p>However, it verifies correctly with the old CN certificate:<br>
<br>
-----BEGIN CERTIFICATE-----<br>
MIIFrTCCBJWgAwIBAgICbkowDQYJKoZIhvcNAQELBQAwRzELMAkGA1UEBhMCVVMx<br>
FjAUBgNVBAoTDUdlb1RydXN0IEluYy4xIDAeBgNVBAMTF1JhcGlkU1NMIFNIQTI1<br>
NiBDQSAtIEczMB4XDTE0MTEwMzEyNTMyNFoXDTE3MDUyMDIxNDU0OVowgZExEzAR<br>
BgNVBAsTCkdUMzkwMjU2MTcxMTAvBgNVBAsTKFNlZSB3d3cucmFwaWRzc2wuY29t<br>
L3Jlc291cmNlcy9jcHMgKGMpMTIxLzAtBgNVBAsTJkRvbWFpbiBDb250cm9sIFZh<br>
bGlkYXRlZCAtIFJhcGlkU1NMKFIpMRYwFAYDVQQDDA0qLmRhdGFvbmUub3JnMIIC<br>
IjANBgkqhkiG9w0BAQEFAAOCAg8AMIICCgKCAgEAzZaa/tslwA/CJ6Wqfzl72TrF<br>
/8IurHHrfzmme/B2dSUt0+zDfdfXWe7p6pZ4yJp95Kk34cf0EFWgFJ5Nc1gyXJUh<br>
Ht6IVweDDFrExeNPsNbI5DLFdUJ5ZfNhWrqu2C4kdeRfHqxOvI0w6XEfdZ4yI3QC<br>
zfx5EtsoFEXpqK5Xe3r5KEnXVsPq6azerVqvq2UqhPa0EYJA8/CVJiQ0CRQl+w9x<br>
Mh6GBvHUXqCHBPlRPIY7QomI+3Cx8gYgcLCCEcHVgzU05zQQRwdtIqjENq6CubH9<br>
UTMiKS81CFJbAVrKetDRI3bNGIcEEpjV1XC28OOWXNc9fXXAK3fvVFVl2tuzYFn0<br>
ROmRrtiz4+jXC7mp7/fTb5ekTeenKyoVA5UicbIHM1PPQeTwcHUH7CxybJVheGAo<br>
7wwzqrxin3LMMyn56QBXqB81qL+iMJ+ZBHXxiS5V6g4W1ag3VOtDvyRtN1QGB6J2<br>
enOTBOHNwr9bHuJcVPx1dYd6YjZD3LQbyJZyVtYHalnlCXGjLCxs9B2uL4MBllb5<br>
N++ouBiujO5ww6Ht+MgOq/gbahx9WlJCs5xXLy8Hf+FfjUBZXDdkvLwa36FWktZa<br>
ibbqqeBBq9IaW0gUNNmhYs3SB8J7JICVflUIp7e7wy7cXBJHpkATZKAuHVnqJ8ZT<br>
83YekoQFyxpcqB2fmRkCAwEAAaOCAVYwggFSMB8GA1UdIwQYMBaAFMOc8/zTRgg0<br>
u85Gf6B8W/PiCMtZMFcGCCsGAQUFBwEBBEswSTAfBggrBgEFBQcwAYYTaHR0cDov<br>
L2d2LnN5bWNkLmNvbTAmBggrBgEFBQcwAoYaaHR0cDovL2d2LnN5bWNiLmNvbS9n<br>
di5jcnQwDgYDVR0PAQH/BAQDAgWgMB0GA1UdJQQWMBQGCCsGAQUFBwMBBggrBgEF<br>
BQcDAjAlBgNVHREEHjAcgg0qLmRhdGFvbmUub3JnggtkYXRhb25lLm9yZzArBgNV<br>
HR8EJDAiMCCgHqAchhpodHRwOi8vZ3Yuc3ltY2IuY29tL2d2LmNybDAMBgNVHRMB<br>
Af8EAjAAMEUGA1UdIAQ+MDwwOgYKYIZIAYb4RQEHNjAsMCoGCCsGAQUFBwIBFh5o<br>
dHRwczovL3d3dy5yYXBpZHNzbC5jb20vbGVnYWwwDQYJKoZIhvcNAQELBQADggEB<br>
ABcvSyNwX1jHZ7HRX5Lzcua0Q4//wc5KCBvPgPrbr3bGSi3+t+Rc4ZagIUxFWSd1<br>
uZ+guQ4lywhQXGOXh7dH1SPljPOwZ9VPdhJMPW/woaQ0ndakLvW0OBIgyyqIcJ57<br>
8e6DKzZ0jd97xmXYAa7iMhCxL2lpXzDQMH5k8XhENHcjMXfVitkqmIS2Wfi1rEMK<br>
phszml9yRABtx+X0z/4/xmNZ2PrNApqmqVD2DnY1MgJNHga/KmPX/6VZ+NEszudP<br>
rvrD5hQvAjkJA+5kgqX31w98ggfXg4oxQo8AhKrHWnhI52SoWT1BOwSGDRpgRW/n<br>
1AdVxT9TIoHXbhf6+c8fWOU=<br>
-----END CERTIFICATE-----</p>
<p>So, effectively, the @d1_cn_portal@ component is still using the old RapidSSL certificate to sign tokens, but (I think) MNs that have recently been restarted and grab the most recent CN certificate for verification purposes get the new LE certificate, and so can't verify incoming tokens signed by the CN. My guess is that this is going to be problematic for other MNs that go through a reboot and/or restart and rely on the CN signing tokens. Looking at the @portal.properties@ file on the CN, I see that it is indeed still pointing to the old certificate and key:<br>
<br>
cn.server.publiccert.filename=/etc/ssl/certs/_.dataone.org.crt<br>
cn.server.privatekey.filename=/etc/ssl/private/dataone_org.key</p>
<p>So, in the short term, we need to plan to re-configure @portal.properties@ on the production CNs to use the new Let's Encrypt certificates for token signing:<br>
<br>
cn.server.publiccert.filename=/etc/letsencrypt/live/cn.dataone.org/fullchain.pem<br>
cn.server.privatekey.filename=/etc/letsencrypt/live/cn.dataone.org/privkey.pem</p>
<p>However, the @fullchain.pem@ includes the intermediate CA certs as well, and I don't know if @CertificateManager.loadCertificateFromFile()@ handles multiple certificates in a file (i.e. does it use the first found, last found, etc?). We need to determine this before making the properties change, but also before other production MNs get rebooted and begin to fail authentication for clients.</p>
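<p>One way to answer the multiple-certificate question empirically, using only the JDK (CertificateFactory.generateCertificates returns every certificate in the file, in file order):</p>
<pre>import java.io.FileInputStream;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import java.security.cert.X509Certificate;

// Sketch: list every certificate in fullchain.pem, in order, to see which
// one a naive "load first certificate" approach would pick up.
public class FullchainInspector {
    public static void main(String[] args) throws Exception {
        CertificateFactory cf = CertificateFactory.getInstance("X.509");
        try (FileInputStream in = new FileInputStream(
                "/etc/letsencrypt/live/cn.dataone.org/fullchain.pem")) {
            for (Certificate cert : cf.generateCertificates(in)) {
                System.out.println(
                        ((X509Certificate) cert).getSubjectX500Principal());
            }
        }
    }
}</pre>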
<p>Once tested, for the long term, we need to update the portal properties in the buildout to make the changes permanent. We may also need to add some logic for ensuring the @/etc/letsencrypt@ files have the correct permissions as Dave pointed out.</p>
Infrastructure - Feature #8053 (New): add funding award number to index
https://redmine.dataone.org/issues/8053
2017-03-28T01:21:39Z
Matthew Jones <jones@nceas.ucsb.edu>
<p>Many groups want to track data sets based on the funding award numbers and organizations that funded the work. For example, the Arctic Data Center, BCO-DMO, and R2R all need to report on and search for data based on NSF award numbers. We should add two new multi-valued fields, funding_agency and award_number to the SOLR index so that they can be used for search, display, and faceting. There is a proposal in EML to add structured fields for these, so for details see EML issue <a href="https://github.com/NCEAS/eml/issues/266">https://github.com/NCEAS/eml/issues/266</a> </p>
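<p>A minimal SolrJ sketch of what indexing the two proposed multi-valued fields could look like, once added to the schema (the core URL and pid are illustrative):</p>
<pre>import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class FundingFieldsExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/search_core").build();
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doi:10.5063/EXAMPLE");
        // multi-valued fields: call addField once per value
        doc.addField("funding_agency", "National Science Foundation");
        doc.addField("award_number", "NSF-1234567");
        doc.addField("award_number", "NSF-7654321");
        client.add(doc);
        client.commit();
        client.close();
    }
}</pre>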
Infrastructure - Bug #8043 (New): The origin field for EML documents isn't properly extracted whe...
https://redmine.dataone.org/issues/8043
2017-03-10T21:36:25Z
Bryce Mecum <mecum@nceas.ucsb.edu>
<p>We just ran into this with the following EML record: <a href="https://knb.ecoinformatics.org/#view/doi:10.5063/F15B00CC">https://knb.ecoinformatics.org/#view/doi:10.5063/F15B00CC</a></p>
<p>The EML has six creators (Kiesecker, Fargione, Baruch-Mordo, Trainor, Ryan, Patterson) but the origin field in the Solr index has two (Ryan, Patterson). After some digging, we realized this was likely because the indexing component responsible for EML doesn't respect EML references. The XML for the relevant section is:</p>
<p><code><br>
<creator scope="document"><br>
<references>1484778487589</references><br>
</creator><br>
<creator scope="document"><br>
<references>1484778426939</references><br>
</creator><br>
<creator scope="document"><br>
<references>1484778028081</references><br>
</creator><br>
<creator scope="document"><br>
<references>1484778171131</references><br>
</creator><br>
<creator id="1485385283277" scope="document"><br>
<individualName><br>
<salutation>Dr.</salutation><br>
<givenName>Joe</givenName><br>
<surName>Ryan</surName><br>
</individualName><br>
<organizationName>University of Colorado Boulder</organizationName><br>
<positionName>Professor</positionName><br>
<electronicMailAddress>joseph.ryan@colorado.edu</electronicMailAddress><br>
</creator><br>
<creator id="1484777776976" scope="document"><br>
<individualName><br>
<salutation>Dr.</salutation><br>
<givenName>Lauren</givenName><br>
<surName>Patterson</surName><br>
</individualName><br>
<organizationName>Duke University</organizationName><br>
<positionName>Water Policy Associate</positionName><br>
<address scope="document"><br>
<deliveryPoint>Nicholas Institute for Environmental Policy Solutions, Duke University</deliveryPoint><br>
<city>Durham</city><br>
<administrativeArea>NC</administrativeArea><br>
<postalCode>27708</postalCode><br>
<country>USA</country><br>
</address><br>
<electronicMailAddress>lauren.patterson@duke.edu</electronicMailAddress><br>
</creator><br>
</code></p>
<p>It would be really nice if the origin field got populated with all those referenced creators.</p>
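<p>A sketch of the missing reference-resolution step, using the JDK XPath API: when a creator carries a <references> child, look up the element whose id attribute matches before extracting names. The helper is illustrative, not the actual subprocessor code:</p>
<pre>import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: resolve <references>ID</references> to the element declaring
// id="ID", so referenced creators contribute to the origin field too.
class EmlReferences {
    static Element resolveCreator(Document eml, Element creator)
            throws Exception {
        NodeList refs = creator.getElementsByTagName("references");
        if (refs.getLength() == 0) {
            return creator; // inline creator, use as-is
        }
        String id = refs.item(0).getTextContent().trim();
        XPath xp = XPathFactory.newInstance().newXPath();
        return (Element) xp.evaluate(
                "//*[@id='" + id + "']", eml, XPathConstants.NODE);
    }
}</pre>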
Infrastructure - Story #8028 (Rejected): Migrate UNM CN servers to DMZ network
https://redmine.dataone.org/issues/8028
2017-02-28T15:42:47Z
Dave Vieglais <dave.vieglais@gmail.com>
<p>UNM now has a DMZ available which will place servers outside of the campus intrusion prevention infrastructure, and so should significantly reduce latency and increase throughput for network activity.</p>
<p>The goal of this story is to migrate all UNM CNs including test instances to the new network.</p>
<p>Network info:<br>
<br>
IP Range: 64.106.84.2/27 (.2 - .8 currently reserved for DataONE, .5 - .8 currently available for CNs)<br>
Gateway: 64.106.84.1<br>
Netmask: 255.255.255.224<br>
Broadcast: 64.106.84.31</p>
<p>To move a VM to the new network:</p>
<ol>
<li> Select IP Address</li>
<li>Update /etc/network/interfaces</li>
<li>Update /etc/hosts</li>
<li>Reconfigure any services that specify IP address, including but not limited to:</li>
</ol>
<p>a) apache<br>
b) UFW<br>
c) Zookeeper<br>
d) LDAP (?)<br>
e) Hazelcast<br>
f) Metacat replication<br>
g) CILogon ?</p>
<p>Note that the other CNs also specify specific IP addresses for connectivity, so it will be necessary to update configurations on those machines as well.</p>
<ol start="5">
<li>Select the DMZ network in VMWare configuration for the VM</li>
<li>Restart the VM</li>
</ol>
Infrastructure - Task #3726 (Rejected): Design strategy for dealing with data packages containing...
https://redmine.dataone.org/issues/3726
2013-04-24T15:39:26Z
Skye Roseboom <sroseboo@dataone.unm.edu>
<p>Currently the search index models the ORE data package relationships in three columns - resourceMap, documents, documentedBy. These correspond to the 'aggregates', 'describes', 'describedBy' relations defined in the ORE document.</p>
<p>DataONE is currently deploying the search index using solr 3.6. This version does not allow partial updates: the entire record is updated/inserted and re-indexed. Solr also does not provide a way to specify multiple records to update in one call. This means that to enter the relationships defined in an ORE document, n+1 update/inserts are required. Each document in the index needs to be updated to record the resourceMap, documents, and documentedBy relationships, so recording a data set with 100k objects takes 100k updates to the solr index. The processing time to do this may become a performance issue.</p>
<p>Another issue is the use of a solr multivalued field (array) to model the 'documents' and 'documentedBy' relationships. In the case of large data packages, this field records the pid of each document in the relationship - potentially hundreds of thousands of pids in a large data set. Solr will attempt to store any number of items in a multivalued field, but at some point performance issues will arise.</p>
<p>Furthermore, use of the ORE relationships becomes difficult when an object is in more than one data set. Since all the 'documents'/'documentedBy' relationships are stored in one field, users cannot easily determine which relationships in those fields correspond to which data packages.</p>
<p>The design and strategy of how DataONE and the search index presents relationships between objects needs some consideration of the best way to represent these relationships internally and how to present to the user.</p>
<p>Large datasets and how DataONE handles them will affect other clients/ITK tools as well, since they will need to determine how to display/present data package information and relationships for large data sets. For example, the download panel in OneMercury: can it show a 100k-object data package? Should it? Is this useful for users?</p>
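<p>To make the n+1 cost concrete, a sketch in modern SolrJ terms (Solr 3.6 itself predates this client API): relating one resource map to its members forces a full fetch and re-add of every member's record:</p>
<pre>import java.util.Collection;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

// Sketch: without partial updates, recording a package relationship means
// rewriting every member's record - n member updates plus the map itself.
class PackageReindex {
    static void addResourceMap(SolrClient client, String mapPid,
            Collection<String> memberPids) throws Exception {
        for (String pid : memberPids) {
            SolrDocument existing = client.getById(pid); // read whole record
            SolrInputDocument updated = new SolrInputDocument();
            existing.getFieldNames().forEach(f -> {
                if (!"_version_".equals(f)) { // skip internal field
                    updated.addField(f, existing.getFieldValue(f));
                }
            });
            updated.addField("resourceMap", mapPid); // append relationship
            client.add(updated);                     // full re-index
        }
        client.commit();
    }
}</pre>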
Infrastructure - Task #3676 (Closed): design proposal for archive
https://redmine.dataone.org/issues/3676
2013-03-20T23:00:47Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Design an implementable solution that:</p>
<p>1) allows non-current package relationships to be traversable<br>
2) does the right thing regarding discovery and non-current items<br>
3) is compatible with mutability requirements</p>
Infrastructure - Bug #3675 (New): package relationships not available for archived objects
https://redmine.dataone.org/issues/3675
2013-03-20T19:12:28Z
Rob Nahf <rnahf@epscor.unm.edu>
<p>Currently, records for obsoleted items are maintained in the solr index so their resourceMap, documents, and documentedBy relationships remain available, and people can "investigate the past". However, those same relationships are not available for archived items, leading to an incomplete solution for this use case (accessing package relationships of out-of-date content).</p>
<p>Archive is used to limit discoverability, but it also eliminates the ability to navigate the package relationships. </p>
<p>Note: archive is intended to be used when the owner does not want to update the object, but simply remove it. However, nothing prevents the owner from archiving obsoleted content. So, in fact, the ability to navigate the package relationships of out-of-date content cannot be guaranteed, and is subject to the individual data management practices of content owners. </p>
Infrastructure - Feature #3608 (Rejected): Enable OpenSearch interface on CNs
https://redmine.dataone.org/issues/3608
2013-02-25T18:04:10Z
Dave Vieglais <dave.vieglais@gmail.com>
<p>OpenSearch ( <a href="http://www.opensearch.org/">http://www.opensearch.org/</a> ) provides a standard mechanism for simple searches against a search engine. There are three parts to an opensearch implementation:</p>
<ol>
<li>An XML document that provides a programmatic description of the service</li>
<li>A tag that references the XML document from a user-facing HTML page somewhere (e.g. on the ONEMercury landing page)</li>
<li>At least one service that accepts a search term and returns an Atom feed that follows some opensearch guidelines.</li>
</ol>
<p>An example implementation from a SOLR book ("Apache Solr 3 Enterprise Search Server" by Smiley and Pugh) is in svn at:</p>
<p><a href="https://repository.dataone.org/software/cicore/trunk/itk/d1_opensearch">https://repository.dataone.org/software/cicore/trunk/itk/d1_opensearch</a></p>
<p>See also <a class="issue tracker-5 status-5 priority-3 priority-lowest closed" title="Task: Create Wrapper metthod in servlet for opensearch (Closed)" href="https://redmine.dataone.org/issues/449">#449</a></p>