Bulkloader

New specimen records may be created from a single flat (non-relational) file, a text file in which all (or most) data for a single cataloged item are in a single row. This file can be created with any convenient client-side application. The file is then loaded into a similarly structured table on the server, and a server-side application (the bulkloader) parses the columns from each row into the relational structure of the database. The process provides an independent layer of data checking before new information is incorporated into the database proper. Original data that are received in electronic format may require minimal manipulation; you can sometimes merely add the necessary columns to build a file in the bulk-loading format.

Bulkloader templates should be created from the Bulkloader Builder in Arctos. Templates built by any other means, including from this documentation, may be out of date and will be rejected.

There is no standard method for moving data into table Bulkloader. You may import data from any file format, type the data into the table, write your own data entry screen, or use any other method you choose. We appreciate documentation, even for specialized datasets – contact us if you wish to contribute.

You may mix accessions, collections, or anything else in a single load.

The specimen Bulkloader alone will not handle every situation that may arise while entering data. (The full suite of available tools should.) Use flags to mark incomplete records for further editing, tie to other bulkloaders with UUIDs, or talk to your friendly local Arctos development team BEFORE you make a mess.

Error messages should include more than enough information to allow you to locate and correct the problem. If that isn’t the case, contact us with the error message and a description of the action that caused the error message.

Arctos is case-sensitive. JOHN DOE is not the same value as John Doe. Leading and trailing spaces and other non-printing characters matter.
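Because matching is exact, it can help to scan values for invisible differences before loading. The following is a minimal Python sketch, not part of Arctos; the function name `whitespace_problems` is hypothetical, and it only reports problems rather than changing any data.

```python
import unicodedata

def whitespace_problems(value):
    """Return reasons a value might fail an exact, case-sensitive match."""
    problems = []
    if value != value.strip():
        problems.append("leading or trailing whitespace")
    # Unicode categories starting with "C" cover control and format
    # characters (tabs, zero-width spaces, and similar).
    if any(unicodedata.category(ch).startswith("C") for ch in value):
        problems.append("non-printing character")
    # A non-breaking space looks like a normal space but is a different character.
    if "\u00a0" in value:
        problems.append("non-breaking space")
    return problems
```

A value such as "John Doe" returns an empty list, while " John Doe" is reported as having leading or trailing whitespace. Case differences (JOHN DOE versus John Doe) are not flagged here, because either form may be the correct one.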

The web-based applications may not work well for very large loads. Contact us if you’re having problems.

Agent names must match a unique namestring, not necessarily the preferred name. If you are loading “John Smith” and there are three John Smiths in Arctos, you might create a new name “John Smith (my project)” for the correct agent and use that namestring in your data. Once loaded, the records will display the agent’s preferred name, and the agent name “John Smith (my project)” may be removed.

Special note primarily for botanists: The bulkloader requires taxonomy.scientific_name, not taxonomy.display_name. That is, “Carex bigelowii subsp. lugens” rather than “Carex bigelowii Torr. subsp. lugens (Holm) T.V.Egorova”.

Any of the following are acceptable taxon name values (current as of 23 Aug 2011). Reference: cttaxa_formula

  • Formula “A”: An exact match to any accepted taxonomy.scientific_name
    • Sorex cinereus
    • Soricidae
  • Formula “A sp.”: Any accepted taxonomy.scientific_name that is also a genus, plus “ sp.”
    • Sorex sp.
  • Formula “A cf.”: Any accepted taxonomy.scientific_name plus “ cf.”
    • Sorex cf.
  • Formula “A ?”: Any accepted taxonomy.scientific_name plus “ ?”
    • Sorex ?
  • Formula “A x B”: Any two accepted taxonomy.scientific_names separated by “ x ”
    • Sorex cinereus x Sorex yukonicus
  • Formula “A or B”: Any two accepted taxonomy.scientific_names separated by “ or ”
    • Sorex cinereus or Sorex yukonicus
  • Formula “A and B”: Any two accepted taxonomy.scientific_names separated by “ and ”
    • Sorex cinereus and Sorex yukonicus
  • Formula “A {string}”: Any valid taxonomy.scientific_name, followed by a space, followed by an opening curly bracket, followed by a verbatim identification, followed by a closing curly bracket.
    • Sorex {Sorex new species “my name”}
    • unidentifiable {granite}

Be sure anything coming from other applications (especially Microsoft products) has not changed field length, precision, or other attributes. Watch dates and non-integer numbers (such as decimal latitude) most closely.

The following table describes selected fields in BULKLOADER. Check the Bulkloader Builder for the latest table structure. Do not attempt to use this table as a template. Let us know if it’s out of date, incomplete, cryptic, or otherwise useless.

Field requirement key: required; conditionally required; not required

Field Name | Data Type/Vocabulary | Description/Example
Collection_Object_Id | any unique number | Temporary record identifier; does NOT carry over to any internal primary keys.
Cat_Num | set by collection | Existing catalog number, or leave blank to assign sequential numbers on upload.
Began_Date | ISO8601 date | Earliest date the specimen could have been collected.
Ended_Date | ISO8601 date | Latest date the specimen could have been collected.
Verbatim_Date | text; any string | Examples: ‘winter 2002’; ‘1 Nov 2002’; ‘Nov 2002’.
VERIFICATIONSTATUS | text; CTVERIFICATIONSTATUS |
SPECIMEN_EVENT_TYPE | text; CTSPECIMEN_EVENT_TYPE | Type of specimen-event relationship.
Event_Assigned_By_Agent | text; agent name | Agent asserting specimen-to-event relationship; often coordinate determiner.
Event_Assigned_Date | date | Date on which the specimen-event relationship is made.
Coll_Event_Remarks | text; any string | Remarks about Collecting Event.
Higher_Geog | text; pre-existing | Higher Geography exactly as it appears in table Geog_Auth_Rec. New values must be added to the database prior to bulk-loading.
Maximum_Elevation | integer > minimum_elevation | Maximum elevation from which the specimen could have come. Used in conjunction with Minimum_Elevation and Orig_Elev_Units.
Minimum_Elevation | integer < maximum_elevation | Minimum elevation from which the specimen could have come. Used in conjunction with Maximum_Elevation and Orig_Elev_Units.
Orig_Elev_Units | text; ctorig_elev_units | Used in conjunction with Maximum_Elevation and Minimum_Elevation. (Code-table controlled.)
Spec_Locality | text; any string | Specific locality from which a specimen originates.
Locality_Remarks | text; any string | Remarks associated with Locality.

Begin coordinate fields. All coordinate data are optional unless Orig_Lat_Long_Units is specified, and leaving Orig_Lat_Long_Units NULL will cause all other coordinate data to be ignored.

Orig_Lat_Long_Units | text; ctlat_long_units | Lat/Long units as given by the determining agent and before any transformations.
Datum | text; ctdatum | Map datum used to determine Lat/Long. Required if coordinates are given.
GEOREFERENCE_SOURCE | text; any string | A code indicating the reference from which a Lat/Long was determined.
GEOREFERENCE_PROTOCOL | text; CTGEOREFMETHOD |
Max_Error_Distance | number | The maximum possible error in distance between the recorded Lat_Long and the actual Lat_Long of the specific locality. Required if Max_Error_Units provided.
Max_Error_Units | text; ctlat_long_error_units | The units in which the Max_Error_Distance is recorded. Required if Max_Error_Distance provided.

Geographic coordinates may be entered in decimal degrees [1], degrees-minutes-seconds [2], or degrees with decimal minutes [3].

Dec_Lat [1] | number | Decimal latitude.
Dec_Long [1] | number | Decimal longitude.
LatDeg [2, 3] | positive number | Degrees Latitude (Integer, 90 or less.)
LatMin [2] | positive number | Minutes Latitude (Integer, less than 60.)
LatSec [2] | positive number | Seconds Latitude (Decimal fraction, less than 60.)
LatDir [2, 3] | text; N or S | Latitude Direction: “N” or “S” (North or South).
LongDeg [2, 3] | positive number | Degrees Longitude (Integer, 180 or less.)
LongMin [2] | positive number | Minutes Longitude (Integer, less than 60.)
LongSec [2] | positive number | Seconds Longitude (Decimal fraction, less than 60.)
LongDir [2, 3] | text; E or W | Longitude Direction: “E” or “W” (East or West).
Dec_Lat_Min [3] | positive number | Decimal Latitude Minutes (Used with LatDeg, decimal fraction, less than 60.)
dec_long_min [3] | positive number | Decimal Longitude Minutes (Used with LongDeg, decimal fraction, less than 60.)

End coordinate fields.

Verbatim_Locality | text; any string | The locality, entered as closely as possible to the original text provided by the collector. (Not necessarily the same as specific locality.)
Collecting_Source | text; ctcollecting_source | Source from which the specimen was received. Example: “wild caught”
Habitat | text; any string | A description of habitat at the time of the collecting event.
Associated_Species | text; any string | A description of other species occurring at the collecting event. Use relationships to other specimens when possible.
Coll_Object_Remarks | text; any string | Remarks about the cataloged item.
Id_Made_By_Agent | text; agent name | Determiner, or agent who identified the specimen.
Identification_Remarks | text; any string | Remarks associated with this identification.
Made_Date | ISO8601 date | Date that the taxonomic determination (or identification) was made.
Nature_of_Id | text; ctnature_of_id | How identification was determined. (Code-table controlled.)
Taxon_Name | text; taxon name | Scientific Name assigned by identifying agent.
Other_Id_Num_x | text; any string | Other identifying numbers (i.e., original field number).
Other_Id_Num_Type_x | text; ctcoll_other_id_type | Used in conjunction with Other_Id_Num. (Code-table controlled.)
Other_Id_References_x | text; ctid_references | Establish relationships to other specimens. (Code-table controlled.)
Collector_Agent_x | text; agent name | Collector or preparator name as it appears in Arctos. At least one collector_agent is required.
Collector_Role_x | ctcollector_role | Collector Role.
Part_Name_x | text; ctspecimen_part_name | At least one part is required.
Part_lot_count_x | number | A part_lot_count is required for all non-null parts.
Part_Condition_x | text; any string | A description of the latest documented condition.
Part_disposition_x | text; ctcoll_obj_disp | A Part_disposition is required for all non-null parts. Example: “in collection”
Part_Barcode_x | text; any barcode | Barcode on the part as it will be read by a barcode scanner.
Part_Container_Label_x | text; any string | Label on the container (i.e., Nunc tube). The human-readable printing on the container. NULL results in no changes to the part container; ignored if Part_Barcode_x is null.
Part_Remark_x | text | Remark about the part.
Accn | text; accn number | Accession Number assigned upon acceptance of specimens. Format is accn number without collection information, but see cross-collection considerations.
EnteredBy | text; agent name | Agent entering the data into this table. Must match agent_name of type login. NULLable if entered_by_agent_id provided.
ENTERED_AGENT_ID | number; key | EnteredBy’s agent_id. Increased performance over EnteredBy.
GUID_Prefix | text; controlled | Unique-within-Arctos identifier of the collection under which the specimen will be cataloged. Replaces Institution_Acronym + Collection_Cde.
collection_id | number; key | Primary key of table Collection. Alternative to guid_prefix.
Loaded | text; any string | This is where errors are stored after Bulkloader processing.
Flags | text; ctflags | Flag indicating the specimen needs further work.
Attribute | text; ctattribute_type | Attribute name. (Code-table controlled.)
Attribute_value | text; various | Value of the attribute. Leaving this NULL will cause the bulkloader to ignore the attribute entry regardless of other values.
Attribute_units | text; L,W, etc. | Units on attribute_value, where appropriate.
attribute_remarks | text; any string | Remarks about the attribute.
attribute_date | ISO8601 date | Date the attribute was determined.
attribute_det_meth | text; any string | How the attribute was determined.
attribute_determiner | text; agent name | Agent who determined the attribute.
locality_id | number; key | A primary key from table locality may be used in place of locality information. A value here will override anything entered into higher_geog, spec_locality, coordinates, etc.
collecting_event_id | number; key | A primary key from table collecting_event may be used in place of collecting_event information. A value here will override anything entered into higher_geog, spec_locality, coordinates, dates, method, etc.
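Several of the conditional requirements described in the table can be checked on a row before it is loaded. The following Python sketch is a rough illustration under stated assumptions, not the Bulkloader's validation: `row_problems` and `max_slots` are hypothetical names, a row is assumed to be a dict of column name to string, and the `_x` suffix is assumed to stand for a number (for example `Part_Name_1`). Confirm actual column names against the Bulkloader Builder.

```python
def row_problems(row, max_slots=12):
    """Return a list of problems found in one row (dict of column -> string)."""
    def filled(col):
        return bool((row.get(col) or "").strip())

    problems = []
    # Coordinates are ignored unless Orig_Lat_Long_Units is given,
    # and Datum is required whenever coordinates are given.
    has_coords = any(filled(c) for c in ("Dec_Lat", "Dec_Long", "LatDeg", "LongDeg"))
    if has_coords and not filled("Orig_Lat_Long_Units"):
        problems.append("coordinates would be ignored: Orig_Lat_Long_Units is empty")
    if has_coords and not filled("Datum"):
        problems.append("Datum is required when coordinates are given")
    # Max_Error_Distance and Max_Error_Units each require the other.
    if filled("Max_Error_Distance") != filled("Max_Error_Units"):
        problems.append("Max_Error_Distance and Max_Error_Units must be given together")
    slots = range(1, max_slots + 1)  # max_slots is an assumed upper bound
    if not any(filled(f"Collector_Agent_{n}") for n in slots):
        problems.append("at least one collector is required")
    if not any(filled(f"Part_Name_{n}") for n in slots):
        problems.append("at least one part is required")
    # Every non-null part needs a lot count and a disposition.
    for n in slots:
        if filled(f"Part_Name_{n}"):
            for col in (f"Part_lot_count_{n}", f"Part_disposition_{n}"):
                if not filled(col):
                    problems.append(f"{col} is required because Part_Name_{n} is given")
    return problems
```

An empty list means only that none of these particular checks failed; values controlled by code tables, agent names, and taxon names still need to match what is in Arctos.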

* All date fields should be formatted as ISO8601, e.g., 2006-12-31.
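Date columns can be checked before loading with Python's standard library. This is a minimal sketch that handles only full dates of the form YYYY-MM-DD; whether Arctos also accepts other ISO8601 forms (such as year-month only) is not covered here. The function names are hypothetical.

```python
from datetime import date

def parse_iso_date(value):
    """Return a datetime.date for an ISO-formatted date string, or None."""
    try:
        return date.fromisoformat(value)
    except ValueError:
        return None

def date_range_ok(began, ended):
    """True when both dates parse and Began_Date is not after Ended_Date."""
    b, e = parse_iso_date(began), parse_iso_date(ended)
    return b is not None and e is not None and b <= e
```

For example, `parse_iso_date("2006-12-31")` returns a date, while `parse_iso_date("12/31/2006")` returns None, which is the kind of value a spreadsheet application may silently produce.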

Primary Key Warning

Some values may be replaced by or require primary keys: locality_id, entered_by_agent_id, collecting_event_id, etc. These are internal database identifiers that exist only for convenience, and may be updated, transferred to another data object, or removed for seemingly arbitrary reasons and without warning. They’ll probably work over short time-periods, but we offer no guarantees.