Add new Solr fields via an API call (investigation) #5989
Description
Adding a custom metadata block currently involves a manual step: adding Solr fields to schema.xml. Solr provides an API (the Schema API) that could make this manual step unnecessary, but using it involves some changes in Dataverse.
Note: this ticket is about investigation, not implementation. First we have to understand every aspect of this change to make sure the existing technologies are reliable and fully support the request.
Solr has a Schema API, which lets you modify the Solr schema (the list of fields and their properties). Solr can handle the schema in two different ways, and the choice is controlled in the solrconfig.xml file. There is a "classic" way, based on the schema.xml file, and a newer way, called managed schema. The managed schema is stored in the "managed-schema" file, which can be edited via the Solr user interface or via the API; editing this file manually is not advised.
In the Dataverse provided solrconfig.xml you have this:
<schemaFactory class="ClassicIndexSchemaFactory"/>
The Schema API doesn't work with the ClassicIndexSchemaFactory. If you try, Solr returns an error message: "schema is not editable". To enable the Schema API, we have to change this setting:
Set ManagedIndexSchemaFactory in solrconfig.xml:
<schemaFactory class="ManagedIndexSchemaFactory"/>
After this you have to restart Solr. The Schema API can then be used like this:
curl -X POST -H 'Content-type:application/json' \
http://localhost:8983/api/cores/collection1/schema --data-binary '{
"add-field":{
"name":"title", "type":"text_en", "multiValued":false,
"stored":true, "indexed":true
},
"add-copy-field":{"source":"title", "dest":"_text_", "maxChars":"3000"}
}'
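For a custom metadata block, the same call could be scripted rather than typed by hand. The following is a minimal sketch, not Dataverse code: the function names `add_field_payload` and `post_schema` and the field name `myCustomField` are hypothetical, and it assumes every field is stored and indexed, as in the example above.

```shell
#!/bin/sh
# Same endpoint as in the example above.
SCHEMA_URL="http://localhost:8983/api/cores/collection1/schema"

# Print an "add-field" command as JSON.
# $1 = field name, $2 = Solr field type, $3 = multiValued (true or false)
# Assumption: the arguments contain no characters that need JSON escaping.
add_field_payload() {
  printf '{"add-field":{"name":"%s","type":"%s","multiValued":%s,"stored":true,"indexed":true}}\n' \
    "$1" "$2" "$3"
}

# Send a JSON command read from stdin to the Schema API.
post_schema() {
  curl -X POST -H 'Content-type:application/json' --data-binary @- "$SCHEMA_URL"
}
```

With a running Solr that uses the managed schema, a field could then be added with `add_field_payload myCustomField text_en true | post_schema`. Keeping the payload builder separate from the HTTP call makes it possible to inspect the JSON before anything is sent.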
The details of the Schema API can be found here:
https://lucene.apache.org/solr/guide/7_3/schema-api.html
The details of change from classic schema:
The problems:
The documentation says: "Once Solr is restarted and it detects that a schema.xml file exists, but the managedSchemaResourceName file (i.e., “managed-schema”) does not exist, the existing schema.xml file will be renamed to schema.xml.bak and the contents are re-written to the managed schema file." When I tried it, schema.xml was neither copied nor renamed. However, the same searches, even fielded searches, have kept working since.
When I use the Schema API to retrieve fields, the response contains only the default Solr fields, not those Dataverse added via schema.xml.
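Both observations can be checked from the command line. This is a rough sketch under assumptions: the function names are hypothetical, the location of the core's conf directory depends on the installation, the check uses the classic `/solr/<core>/schema/fields/<name>` form of the Schema API, and it assumes Solr answers a request for an unknown field with an HTTP error status.

```shell
#!/bin/sh
SOLR_BASE="http://localhost:8983"
CORE="collection1"

# Report which schema files exist in a core's conf directory ($1).
schema_files_state() {
  for f in schema.xml schema.xml.bak managed-schema; do
    if [ -e "$1/$f" ]; then
      echo "$f: present"
    else
      echo "$f: missing"
    fi
  done
}

# Print the Schema API URL for one field ($1 = field name).
field_url() {
  printf '%s/solr/%s/schema/fields/%s\n' "$SOLR_BASE" "$CORE" "$1"
}

# Exit status 0 only if the request succeeds; -f makes curl fail on HTTP errors.
field_in_schema() {
  curl -sf "$(field_url "$1")" > /dev/null
}
```

Running `schema_files_state` on the core's conf directory before and after the restart would show whether the rename described in the documentation took place. Calling `field_in_schema` with a field that Dataverse defines in schema.xml would show whether the Schema API sees it.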
I asked a Solr expert for help.
(I added @4tikhonov as watcher)