Bug #8085
listObjects fails at 338940
100%
Description
While iterating through listObjects on production, the CN raises a 500 service failure error:
curl "https://cn.dataone.org/cn/v2/object?count=10&start=338940"
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><?xml-stylesheet type="text/xsl" href="/cn/xslt/dataone.types.v2.xsl" ?>
doi:10.5063/AA/VIR.6.1
-//ecoinformatics.org//eml-entity-2.0.0beta6//EN
d61b91b538b19be71399e51d01d77334
2015-01-06T07:39:30.945+00:00
442
doi:10.5063/AA/VIR.7.1
-//ecoinformatics.org//eml-attribute-2.0.0beta6//EN
f7e4dd423207c6853978b37de83ec529
2014-12-14T05:06:15.010+00:00
4452
doi:10.5063/AA/VIR.8.1</<?xml version="1.0" encoding="UTF-8"?>
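To see which page first triggers the failure, the same request can be stepped through neighbouring offsets. The sketch below only builds the URLs and sends nothing. It assumes just the `count` and `start` parameters shown in the curl call above; `page_urls` and the offset range are hypothetical and chosen for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl call in the description.
BASE_URL = "https://cn.dataone.org/cn/v2/object"


def page_urls(first_start, last_start, count=10):
    """Yield one listObjects URL per page, stepping `start` by `count`."""
    for start in range(first_start, last_start + 1, count):
        yield BASE_URL + "?" + urlencode({"count": count, "start": start})


# Pages just before, at, and just after the failing offset (illustrative range).
for url in page_urls(338930, 338950):
    print(url)
```

Each printed URL can be passed to curl in turn; the first one that returns the 500 error identifies the page holding the offending record.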
Related issues
History
#1 Updated by Dave Vieglais over 7 years ago
- Related to Bug #4674: Ask Judith, Mike and Virgina Perez.2.1 to obsolete those pids which contain the white spaces. added
#2 Updated by Jing Tao over 7 years ago
explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where guid not like '% %' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..339.58 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..593427.69 rows=1750386 width=248)
Filter: (guid !~~ '% %'::text)
explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..337.05 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..589051.29 rows=1750561 width=248)
(2 rows)
(593427 - 589051) / 589051 = 0.0074
explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where guid not like '% %' and object_format='eml://ecoinformatics.org/eml-2.0.1' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..5227.98 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..597804.10 rows=114359 width=248)
Filter: ((guid !~~ '% %'::text) AND ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text))
(3 rows)
explain select guid, date_uploaded, rights_holder, checksum, checksum_algorithm, origin_member_node, authoritive_member_node, date_modified, submitter, object_format, size from systemmetadata where object_format='eml://ecoinformatics.org/eml-2.0.1' order by guid LIMIT 1000 OFFSET 0;
Limit (cost=0.55..5189.21 rows=1000 width=248)
-> Index Scan using systemmetadata_pk on systemmetadata (cost=0.55..593427.69 rows=114370 width=248)
Filter: ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text)
(3 rows)
(597804 - 593427) / 593427 = 0.0074
explain select count(*) from systemmetadata where guid not like '% %';
Aggregate (cost=175262.98..175262.99 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..170887.01 rows=1750386 width=0)
Filter: (guid !~~ '% %'::text)
(3 rows)
explain select count(*) from systemmetadata;
Aggregate (cost=170887.01..170887.02 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..166510.61 rows=1750561 width=0)
(2 rows)
(170887 - 166510) / 166510 = 0.026
explain select count(*) from systemmetadata where guid not like '% %' and object_format='eml://ecoinformatics.org/eml-2.0.1';
Aggregate (cost=175549.31..175549.32 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..175263.42 rows=114359 width=0)
Filter: ((guid !~~ '% %'::text) AND ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text))
(3 rows)
explain select count(*) from systemmetadata where object_format='eml://ecoinformatics.org/eml-2.0.1';
Aggregate (cost=171172.94..171172.95 rows=1 width=0)
-> Seq Scan on systemmetadata (cost=0.00..170887.01 rows=114370 width=0)
Filter: ((object_format)::text = 'eml://ecoinformatics.org/eml-2.0.1'::text)
(3 rows)
(175263 - 170887) / 170887 = 0.026
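Adding the filter that excludes pids containing white space raises the estimated cost by about 0.7% for the paged listing queries and about 2.6% for the count queries, so the overhead is low. We will take this approach.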
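The ratios above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the cost figures are copied from the EXPLAIN output in this note, while the function name and the labels are made up for illustration.

```python
def relative_overhead(filtered_cost, unfiltered_cost):
    """Fractional increase in planner cost caused by adding the filter."""
    return (filtered_cost - unfiltered_cost) / unfiltered_cost


# (cost with the "guid not like '% %'" filter, cost without it),
# using the scan-node upper bounds from the plans above.
plans = {
    "list, all formats": (593427.69, 589051.29),
    "list, eml-2.0.1": (597804.10, 593427.69),
    "count, all formats": (170887.01, 166510.61),
    "count, eml-2.0.1": (175263.42, 170887.01),
}

for label, (with_filter, without_filter) in plans.items():
    print(f"{label}: {relative_overhead(with_filter, without_filter):.4f}")
```

In all four comparisons the absolute difference is roughly the same (about 4376 cost units), so the relative overhead looks larger for the count queries only because their baseline cost is smaller.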
#3 Updated by Jing Tao over 7 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 30
#4 Updated by Jing Tao over 7 years ago
- Category changed from d1_cn_service to Metacat
#5 Updated by Jing Tao over 7 years ago
- % Done changed from 30 to 100
- Status changed from In Progress to Closed
Tested on sandbox and it works.
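For reference, the behaviour of the filter from note #2 can be reproduced locally. The following is a minimal sketch using an in-memory SQLite table rather than the Metacat PostgreSQL database. The table is reduced to a single `guid` column, the sample pids are invented, and `list_guids` is a hypothetical helper, not the Metacat implementation.

```python
import sqlite3


def list_guids(conn, limit, offset):
    """Return one page of guids, skipping any guid that contains a space."""
    rows = conn.execute(
        "SELECT guid FROM systemmetadata "
        "WHERE guid NOT LIKE '% %' ORDER BY guid LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE systemmetadata (guid TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO systemmetadata (guid) VALUES (?)",
    # Invented sample pids: two clean, two containing a space.
    [("sample.1.1",), ("sample 2.1",), ("sample.3.1",), (" sample.4.1",)],
)

print(list_guids(conn, 1000, 0))
```

Note that the pattern `'% %'` matches only the literal space character, so pids containing tabs or other white-space characters would still be returned.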