Story #3875: Create a dashboard (version 1) for DataONE to provide high level overall system status
Report of downloads of DATA objects per month
Report on Number of GETs per month by object format per MN since DataONE production deployment
Report on Number of GETs per month per MN since DataONE production deployment.
Filter out CN activity
Make substitutions in the static report: null or "" becomes CLOAKN, with appropriate substitutions for DAAC and Clearinghouse.
Report of downloads per object format per month (data/metadata)
ignoring a jar file
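The per-MN GET counts in the subtasks above (per month, broken down by object format, with CN activity filtered out and the static-report rule that a null or empty node name becomes CLOAKN) could be computed roughly as follows. This is a minimal sketch, not the reporting tool's code. It assumes log records are dicts with `event`, `dateLogged`, `nodeIdentifier`, `formatId` and `subject` keys, and it assumes CN activity can be recognized by the requesting subject via a caller-supplied `cn_subjects` set. `NODE_SUBSTITUTIONS`, `report_node_name` and `gets_per_month` are hypothetical names, and the DAAC and Clearinghouse substitutions are left empty because they are not specified here.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Mapping

# Hypothetical substitution table for node names in the static report.
# Only the null/"" -> "CLOAKN" rule comes from the task list; the DAAC and
# Clearinghouse replacements are not specified, so the table starts empty.
NODE_SUBSTITUTIONS: dict[str, str] = {}


def report_node_name(node_id: str | None) -> str:
    """Map a node identifier to the name shown in the static report."""
    if not node_id:  # covers both None and ""
        return "CLOAKN"
    return NODE_SUBSTITUTIONS.get(node_id, node_id)


def gets_per_month(records: Iterable[Mapping[str, str]],
                   cn_subjects: set[str]) -> Counter:
    """Count read events per (node, month, formatId), skipping CN activity.

    Assumes each record has 'event', 'dateLogged' (ISO 8601), 'nodeIdentifier',
    'formatId' and 'subject' keys, and that CN activity can be recognized by
    the requesting subject (a hypothetical rule).
    """
    counts: Counter = Counter()
    for rec in records:
        if rec["event"].lower() != "read":
            continue  # only GETs (read events) are counted
        if rec["subject"] in cn_subjects:
            continue  # filter out CN activity
        month = rec["dateLogged"][:7]  # "YYYY-MM"
        node = report_node_name(rec.get("nodeIdentifier"))
        counts[(node, month, rec["formatId"])] += 1
    return counts
```

Dropping `formatId` from the counter key gives the plain per-MN monthly totals.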
#1 Updated by Dave Vieglais over 8 years ago
Following discussion on how to integrate the report data into the dashboard, an alternative approach was identified that would provide the same capability but would be more flexible for feeding the dashboard:
Modify the reporting tool to generate a CSV file, one row per log record, with each row augmented to include size, formatId, and formatType. The CSV should be escaped according to the requirements of the solr.CSVRequestHandler described at http://wiki.apache.org/solr/UpdateCSV. This step could also obfuscate PII as appropriate.
Create a new SOLR index using the log aggregation schema with the additional fields included. This SOLR index need not run on the CNs; it only needs to be at a location accessible to the dashboard client.
Populate the new SOLR index from the CSV file. Word is that the CSV loader is very fast.
Point the dashboard client at the new SOLR index. Summary reports (for Rebecca et al.) could also be generated easily from this source.
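The first step (one CSV row per log record, augmented with size, formatId and formatType, with PII obfuscated) might look like the sketch below. It assumes the augmenting values come from a `sysmeta` mapping keyed by object identifier, that the IP address is the PII field to obfuscate, and that a salted SHA-256 digest is an acceptable obfuscation policy; all three are assumptions, as are the `FIELDS` column list and the `write_solr_csv` and `obfuscate` names. Python's default `csv` dialect separates with commas, encapsulates with double quotes and escapes an embedded quote by doubling it, which matches the default separator and encapsulator behavior of the Solr CSV handler; the header row supplies the field names.

```python
import csv
import hashlib
import io
from typing import Iterable, Mapping, TextIO

# Hypothetical column layout: a few log-aggregation fields plus the three
# fields the plan adds to every row (size, formatId, formatType).
FIELDS = ["entryId", "identifier", "ipAddress", "event", "dateLogged",
          "nodeIdentifier", "size", "formatId", "formatType"]


def obfuscate(value: str, salt: str) -> str:
    """Replace a PII value with a salted SHA-256 hex digest (assumed policy)."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def write_solr_csv(records: Iterable[Mapping[str, str]],
                   sysmeta: Mapping[str, Mapping[str, object]],
                   out: TextIO, salt: str) -> None:
    """Write a header plus one CSV row per log record, augmented from sysmeta.

    Keys outside FIELDS are dropped; missing values are written as "".
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        meta = sysmeta.get(rec["identifier"], {})
        row = dict(rec)
        row["ipAddress"] = obfuscate(rec.get("ipAddress", ""), salt)
        for key in ("size", "formatId", "formatType"):
            row[key] = meta.get(key, "")
        writer.writerow(row)


if __name__ == "__main__":
    # Illustrative input only; the identifier and values are made up.
    buf = io.StringIO()
    write_solr_csv(
        [{"entryId": "1", "identifier": "example-pid-1", "ipAddress": "10.0.0.1",
          "event": "read", "dateLogged": "2013-10-01T00:00:00Z",
          "nodeIdentifier": "urn:node:KNB"}],
        {"example-pid-1": {"size": 1024, "formatId": "text/csv",
                           "formatType": "DATA"}},
        buf, salt="example-salt")
    print(buf.getvalue())
```

Note that a salted hash is pseudonymization rather than full anonymization: the same address always maps to the same digest, which keeps per-client counts possible in the index.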
#2 Updated by Robert Waltz over 8 years ago
- Subject changed from Report of downloads per object format per month (data/metadata) to Report of downloads of DATA objects per month
The design I was given has always been 'READ' count and bytes for data objects only.
On Oct 14, 2013, at 2:25 PM, Waltz, Robert Patrick wrote:
I thought part of the issue was the number of GETs per month for all objects. formatType=DATA will only cover a fraction of the objects that may be interesting to report on. As has been said, EML may contain a dataset. Maybe we just need to combine METADATA + DATA and exclude the RESOURCE types, so that we don't need the breakdown by formatId and can remove the reports array?
Robert Patrick Waltz
Research Associate, Center for Information and Communication Studies
The University of Tennessee
5 James D Hoskins Library
1400 West Cumberland
Knoxville, TN 37996-0341
From: Skye Roseboom email@example.com
Sent: Monday, October 14, 2013 15:45
To: Waltz, Robert Patrick; Christopher Jones
Subject: Re: log aggregation
For dashboard, we just need information about formatType=DATA as an aggregated report. Dashboard is not looking at formatId at all. Preference would be for a single report object per member node for the entire DATA document set.
Combining the time series information to contain both the count and the byte size sounds good to me.
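A minimal sketch of the aggregated report discussed in this thread: one report object per member node, with a single monthly time series whose entries carry both the read count and the byte total. The record keys (`event`, `formatType`, `dateLogged`, `nodeIdentifier`, `size`) and the output layout (`nodeId`, `formatTypes`, `timeSeries`) are hypothetical, not the reporting tool's actual format. `format_types` defaults to DATA only, per the dashboard requirement; passing `{"DATA", "METADATA"}` gives the combined variant suggested in the thread, which still excludes RESOURCE objects.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def read_report(records: Iterable[Mapping[str, str]],
                format_types: frozenset = frozenset({"DATA"})) -> dict:
    """Build one report object per member node with a combined time series.

    Each time-series entry holds the read count and total bytes for one
    month ("YYYY-MM"), sorted chronologically.
    """
    # node -> month -> running totals
    totals: dict = defaultdict(lambda: defaultdict(lambda: {"count": 0, "bytes": 0}))
    for rec in records:
        if rec["event"].lower() != "read":
            continue  # only downloads count
        if rec["formatType"] not in format_types:
            continue  # e.g. RESOURCE maps are always excluded by default
        month = rec["dateLogged"][:7]
        bucket = totals[rec["nodeIdentifier"]][month]
        bucket["count"] += 1
        bucket["bytes"] += int(rec["size"])
    return {
        node: {
            "nodeId": node,
            "formatTypes": sorted(format_types),
            "timeSeries": [{"month": m, **v} for m, v in sorted(months.items())],
        }
        for node, months in totals.items()
    }
```

Keeping count and bytes in the same time-series entry means the dashboard needs one lookup per node and month instead of joining two parallel series.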