Bug #8085

listObjects fails at 338940

Added by Dave Vieglais over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: High
Assignee:
Category: Metacat
Target version:
Start date: 2017-04-27
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

While iterating through listObjects on production, the CN raises a 500 service failure error:

curl "https://cn.dataone.org/cn/v2/object?count=10&start=338940"

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="/cn/xslt/dataone.types.v2.xsl" ?>


doi:10.5063/AA/VIR.6.1
-//ecoinformatics.org//eml-entity-2.0.0beta6//EN
d61b91b538b19be71399e51d01d77334
2015-01-06T07:39:30.945+00:00
442


doi:10.5063/AA/VIR.7.1
-//ecoinformatics.org//eml-attribute-2.0.0beta6//EN
f7e4dd423207c6853978b37de83ec529
2014-12-14T05:06:15.010+00:00
4452


doi:10.5063/AA/VIR.8.1</<?xml version="1.0" encoding="UTF-8"?>
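
For anyone trying to reproduce this, the paging loop can be sketched roughly as below. This is only an illustration: the endpoint and the count/start parameters come from the curl command above, while page_url, first_failing_start and the injected fetch_status callable are hypothetical names, not part of any DataONE client library.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl command in the report.
BASE_URL = "https://cn.dataone.org/cn/v2/object"


def page_url(start, count=10, base_url=BASE_URL):
    """Build the listObjects URL for one page (hypothetical helper)."""
    return f"{base_url}?{urlencode({'count': count, 'start': start})}"


def first_failing_start(fetch_status, first, last, count=10):
    """Walk pages from offset `first` up to `last` in steps of `count`.

    `fetch_status` is any callable that takes a URL and returns an HTTP
    status code. Returns the first `start` offset whose status is not 200,
    or None if every page in the range succeeds.
    """
    for start in range(first, last + 1, count):
        if fetch_status(page_url(start, count)) != 200:
            return start
    return None
```

Injecting fetch_status keeps the sketch independent of any particular HTTP client; in practice it could wrap whatever client is at hand and return the response status code.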


Related issues

Related to Infrastructure - Bug #4674: Ask Judith, Mike and Virgina Perez.2.1 to obsolete those pids which contain the white spaces. New 2014-03-31

History

#1 Updated by Dave Vieglais over 3 years ago

  • Related to Bug #4674: Ask Judith, Mike and Virgina Perez.2.1 to obsolete those pids which contain the white spaces. added

#2 Updated by Jing Tao over 3 years ago

explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where guid not like '% %' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..339.58 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..593427.69 rows=1750386 width=248)
Filter: (guid !~~ '% %'::text)

explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata order by guid LIMIT 1000 OFFSET 0;

Limit (cost=0.55..337.05 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..589051.29 rows=1750561 width=248)
(2 rows)

(593427- 589051)/589051=0.0074

explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where guid not like '% %' and object_format='eml://ecoinformatics.org/eml-2.0.1' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..5227.98 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..597804.10 rows=114359 width=248)
Filter: ((guid !~~ '% %'::text) AND ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text))
(3 rows)

explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where object_format='eml://ecoinformatics.org/eml-2.0.1' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..5189.21 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..593427.69 rows=114370 width=248)
Filter: ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text)
(3 rows)

(597804-593427)/593427=0.0074

explain select count(*) from systemmetadata where guid not like '% %';
Aggregate (cost=175262.98..175262.99 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..170887.01 rows=1750386 width=0)
Filter: (guid !~~ '% %'::text)
(3 rows)

explain select count(*) from systemmetadata;
Aggregate (cost=170887.01..170887.02 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..166510.61 rows=1750561 width=0)
(2 rows)

(170887-166510)/166510=0.026

explain select count(*) from systemmetadata where guid not like '% %' and object_format='eml://ecoinformatics.org/eml-2.0.1';

Aggregate (cost=175549.31..175549.32 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..175263.42 rows=114359 width=0)
Filter: ((guid !~~ '% %'::text) AND ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text))
(3 rows)

explain select count(*) from systemmetadata where object_format='eml://ecoinformatics.org/eml-2.0.1';
Aggregate (cost=171172.94..171172.95 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..170887.01 rows=114370 width=0)
Filter: ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text)
(3 rows)

(175263 -170887)/170887=0.026

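The cost of adding the filter that excludes pids containing white space is not high: under 1% for the ordered listing queries and about 2.6% for the count queries, based on the planner estimates above. We will take this approach.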
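
The ratios above can be re-derived with a short script. This is just a sketch: the numbers are the upper-bound planner cost estimates copied from the EXPLAIN output in this comment (index scan or sequential scan), and relative_overhead and plan_costs are names made up for illustration. Planner costs are estimates in arbitrary units, not measured run times.

```python
def relative_overhead(cost_with_filter, cost_without_filter):
    """Fractional increase in estimated cost caused by adding the filter."""
    return (cost_with_filter - cost_without_filter) / cost_without_filter


# (cost with "guid not like '% %'", cost without it), from the plans above.
plan_costs = {
    "ordered listing": (593427.69, 589051.29),
    "ordered listing, eml-2.0.1 only": (597804.10, 593427.69),
    "count": (170887.01, 166510.61),
    "count, eml-2.0.1 only": (175263.42, 170887.01),
}

overheads = {
    name: relative_overhead(with_filter, without_filter)
    for name, (with_filter, without_filter) in plan_costs.items()
}

for name, overhead in overheads.items():
    print(f"{name}: {overhead:.2%}")
```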

#3 Updated by Jing Tao over 3 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

#4 Updated by Jing Tao over 3 years ago

  • Category changed from d1_cn_service to Metacat

#5 Updated by Jing Tao over 3 years ago

  • % Done changed from 30 to 100
  • Status changed from In Progress to Closed

Tested on sandbox and it works.
