Project

General

Bug #7631

synchronization not processing v1 Member Nodes

Added by Rob Nahf almost 6 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: d1_synchronization
Target version:
Start date: 2016-02-03
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

Following up on the issue with EDACGSTORE: Marco "found that cn.dataone.org synchronization is only happening for KNB and GOA since about 2016-01-28 10:05:00 GMT".

KNB and GOA are v2 Member Nodes; all other nodes are not synchronizing.


Subtasks

Task #7632: remove CLOEBIRD from synchronization (it is down) and restart processing (Closed, Robert Waltz)

Task #7634: refactor libclient to use the custom socket configuration... (Closed, Rob Nahf)


Related issues

Related to Infrastructure - Task #7633: refactor libclient to remove synchronized from the 2 key methods (Closed, 2016-02-05)

History

#1 Updated by Rob Nahf almost 6 years ago

The stack trace shows all threads (more than 20 of them) held up by the lock on RestClient.doRequestNoBody, a synchronized method.
All NodeCommunication objects for the same API version use the same RestClient, so a hung call inside the synchronized doRequestNoBody method can indeed block all communication with v1 nodes.
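The blocking pattern can be reproduced in isolation. The sketch below is not libclient code: `SharedClient` is a hypothetical stand-in for a RestClient shared by all v1 NodeCommunication objects, and a `CountDownLatch` simulates a socket read that never returns. While the first call is parked inside the synchronized method, a second caller on the same object stays in the `BLOCKED` state, which is what the "waiting to lock" entries in the thread dump show.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class SharedClientLockDemo {

    // Hypothetical stand-in for a RestClient shared by every v1 NodeCommunication.
    static class SharedClient {
        private final CountDownLatch hungNodeResponds = new CountDownLatch(1);

        // 'synchronized' holds this object's monitor for the whole call,
        // as the synchronized doRequestNoBody did.
        synchronized String doRequest(String node) throws InterruptedException {
            if (node.equals("hungNode")) {
                // Simulates a read that never returns; await() does not release the monitor.
                hungNodeResponds.await();
            }
            return "response from " + node;
        }

        void releaseHungCall() {
            hungNodeResponds.countDown();
        }
    }

    private static void call(SharedClient client, String node) {
        try {
            client.doRequest(node);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Polls until the thread reaches the given state or the timeout expires.
    private static void awaitState(Thread t, Thread.State target, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (t.getState() != target && System.nanoTime() < deadline) {
            Thread.sleep(5);
        }
    }

    // Returns the state of a second caller observed while the first call is hung.
    static Thread.State stateOfSecondCaller() throws InterruptedException {
        SharedClient client = new SharedClient();
        Thread hung = new Thread(() -> call(client, "hungNode"), "worker-hung");
        Thread other = new Thread(() -> call(client, "healthyNode"), "worker-other");

        hung.start();
        awaitState(hung, Thread.State.WAITING, 2000);   // parked inside the synchronized method
        other.start();
        awaitState(other, Thread.State.BLOCKED, 2000);  // stuck at monitor entry
        Thread.State observed = other.getState();

        client.releaseHungCall();  // let the hung call finish so both threads exit
        hung.join();
        other.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second caller: " + stateOfSecondCaller());
    }
}
```

The healthy call never reaches the network at all: it waits at monitor entry, so one unresponsive node is enough to stall every caller sharing the object.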

We use timeouts, but HttpClient 4.3.3 is apparently susceptible to not timing out; see https://issues.apache.org/jira/browse/HTTPCLIENT-1478.
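In the thread dump, the RUNNABLE worker (Worker-38) sits in `socketRead0` beneath `SSLSocketImpl.performInitialHandshake`, i.e. it is reading during the SSL handshake, where a timeout only helps if it is already set on the socket. The following stdlib-only sketch does not use HttpClient, and its names are illustrative. It connects to a local server that accepts the connection but never writes, and shows that the blocking read ends with `SocketTimeoutException` once `setSoTimeout` has been called on the socket. In HttpClient 4.3 the corresponding setting is presumably the `soTimeout` of a `SocketConfig` installed with `PoolingHttpClientConnectionManager.setDefaultSocketConfig`. Whether this is what the "custom socket configuration" in Task #7634 refers to is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeoutDemo {

    // Returns true if a blocking read on a silent connection ends with a timeout.
    static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        // Loopback server that never calls accept() or writes: the kernel completes
        // the TCP connect, but no bytes ever arrive (like a peer stalling mid-handshake).
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket()) {
            socket.connect(silentServer.getLocalSocketAddress(), 1000);
            // Without this call the read below would block indefinitely.
            socket.setSoTimeout(soTimeoutMillis);
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false;  // only reached if data or EOF arrives
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```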

Now that RestClient passes a RequestConfiguration in with each call, there is less reason to keep the methods synchronized, and removing the synchronization would minimize the effect of a hung call in multithreaded applications.
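A minimal sketch of the unsynchronized alternative, assuming per-call settings travel as a method parameter. `UnsynchronizedClient` and `CallConfig` are hypothetical names, not the libclient API. Because the method keeps no per-call state in shared fields, it needs no lock, and a healthy call completes while another call on the same client object is still hung.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class PerCallConfigDemo {

    // Hypothetical per-call settings, standing in for a configuration passed with each call.
    static final class CallConfig {
        final int timeoutMillis;

        CallConfig(int timeoutMillis) {
            this.timeoutMillis = timeoutMillis;
        }
    }

    // Hypothetical client with no per-request fields, so its method needs no lock.
    static final class UnsynchronizedClient {
        private final CountDownLatch hungNodeResponds = new CountDownLatch(1);

        // Not synchronized: everything call-specific arrives via the parameters.
        String doRequest(String node, CallConfig config) throws InterruptedException {
            if (node.equals("hungNode")) {
                hungNodeResponds.await();  // simulates a call that never returns
            }
            // The timeout is only echoed here; this sketch does not enforce it.
            return "response from " + node + " (timeout " + config.timeoutMillis + " ms)";
        }

        void releaseHungCall() {
            hungNodeResponds.countDown();
        }
    }

    // Runs a hung call and a healthy call concurrently on one shared client.
    static String healthyCallWhileAnotherHangs() throws Exception {
        UnsynchronizedClient client = new UnsynchronizedClient();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<String> hung = pool.submit(() -> client.doRequest("hungNode", new CallConfig(30000)));
            Future<String> healthy = pool.submit(() -> client.doRequest("healthyNode", new CallConfig(5000)));
            // Returns promptly although the first call is still stuck.
            String result = healthy.get(2, TimeUnit.SECONDS);
            client.releaseHungCall();
            hung.get(2, TimeUnit.SECONDS);
            return result;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(healthyCallWhileAnotherHangs());
    }
}
```

The design hinges on the client holding no mutable per-request fields; if it did, dropping `synchronized` would trade a bottleneck for data races.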

Attempts to reproduce the hung HttpClient call (by calling CLO eBird) have failed. Relevant excerpts from the thread dump:

"SynchronizeTask910" daemon prio=10 tid=0x00007ff8c0009800 nid=0x61f1 waiting for monitor entry [0x00007ff964fce000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.dataone.client.rest.RestClient.doRequestMMBody(RestClient.java:245)
- waiting to lock (a org.dataone.client.rest.RestClient)
at org.dataone.client.rest.RestClient.doPostRequest(RestClient.java:192)

"SynchronizationQuartzScheduler_Worker-35" prio=10 tid=0x0000000002df5000 nid=0x2e15 waiting for monitor entry [0x00007ff98dddc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.dataone.client.rest.RestClient.doRequestNoBody(RestClient.java:212)
- waiting to lock (a org.dataone.client.rest.RestClient)
at org.dataone.client.rest.RestClient.doGetRequest(RestClient.java:148)

"SynchronizationQuartzScheduler_Worker-38" prio=10 tid=0x0000000002dfb000 nid=0x2e18 runnable [0x00007ff98dad8000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:554)
at sun.security.ssl.InputRecord.read(InputRecord.java:509)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:946)
- locked (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1344)
- locked (a java.lang.Object)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:275)
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:254)
at org.apache.http.impl.conn.HttpClientConnectionOperator.connect(HttpClientConnectionOperator.java:117)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:314)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:363)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:219)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.dataone.client.rest.RestClient.doRequest(RestClient.java:294)
at org.dataone.client.rest.RestClient.doRequestNoBody(RestClient.java:230) <-- a synchronized method
- locked (a org.dataone.client.rest.RestClient)
at org.dataone.client.rest.RestClient.doGetRequest(RestClient.java:148)
at org.dataone.client.rest.HttpMultipartRestClient.doGetRequest(HttpMultipartRestClient.java:329)
at org.dataone.client.rest.HttpMultipartRestClient.doGetRequest(HttpMultipartRestClient.java:318)
at org.dataone.client.v1.impl.MultipartMNode.listObjects(MultipartMNode.java:214)

#2 Updated by Rob Nahf almost 6 years ago

  • Related to Task #7633: refactor libclient to remove synchronized from the 2 key methods added

#3 Updated by Rob Nahf almost 6 years ago

  • Status changed from In Progress to Testing
  • % Done changed from 30 to 50

The tasks for operations are complete. This turned out to be a bug that could not be reproduced in testing, so the solution in libclient is inferred to be correct based on the overlap of symptoms and stack traces.

#4 Updated by Dave Vieglais almost 6 years ago

  • Status changed from Testing to Closed
  • % Done changed from 50 to 100
