Apache Solr 7.4 : Use case feasibility for deeply nested objects

I have the following use case: deeply nested objects, with up to about 4 or 5 levels of nesting.
Example:
{
  "siteId": "1h2112o8320",
  "siteName": "Maricopa",
  "type": "site",
  "_childDocuments_": [
    {
      "areaId": "xjn2d2d2",
      "type": "area",
      "areaName": "Some Area",
      "areaSpecialty": "Awesomeness!",
      "_childDocuments_": [
        {
          "lineId": "idb283d22dcou",
          "type": "line",
          "lineName": "Some Line",
          "lineScore": 100
        }
      ]
    }
  ]
}
I am new to Apache Solr and I am using 7.4. Assuming I have a copyField defined for all the different properties of individual documents in the hierarchy, is it feasible to search at any level and return the entire document?
Say I search for "Maricopa" or "Some Area": should it still find this document and return the entire hierarchy? Is that possible using the current capabilities of Solr?
I tried the following query without specifying a field list (fl):
(type:site OR type:area OR type:line) AND text:"Maricopa"
This query works but returns only the matching document and not the entire hierarchy.
So, to return the entire hierarchy, I added the field list below:
*,[child parentFilter=type:site]
This returns the entire hierarchy (all the children as a flat list, though).
Now, if the above query matches a parent, say a site, I get the hierarchy. If the query matches a child or a grandchild, I get the following error:
java.lang.IllegalStateException: Parent query must not match any docs besides parent filter. Combine them as must (+) and must-not (-) clauses to find a problem doc. docID=6
So, I am trying to understand whether the use case I am trying to implement is possible at all with deeply nested objects, or whether I should just maintain a flat object. With a flat object, if I later have to satisfy queries like "Get all lines for a given site", I would have to do a lot of post-processing to extract the required fields. I don't want to do that post-processing, which is why I was thinking of a deeply nested implementation.
Any thoughts or help would be appreciated.
1 Answer
Flat objects usually make searching easier: instead of one nested structure, you keep several collections, one per use case. Whether that is practical depends on the amount of data you're indexing (if you have a very large data set, indexing it several times might demand more resources than you have available or can afford). It also lets you optimize the structure of each collection to fit its use case.
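As a rough illustration of the flat approach, here is a minimal Python sketch that turns the nested example from the question into one flat document per node, with the ancestor fields copied down. The flatten helper and the choice to copy every ancestor field are assumptions made for illustration; nothing here is Solr API.

```python
def flatten(node, inherited=None):
    """Yield one flat document per node, copying ancestor fields onto it."""
    # Everything except the nested children belongs to this node itself.
    own = {k: v for k, v in node.items() if k != "_childDocuments_"}
    # The node's own fields (including "type") override inherited ones.
    flat = {**(inherited or {}), **own}
    yield flat
    for child in node.get("_childDocuments_", []):
        yield from flatten(child, flat)


site = {
    "siteId": "1h2112o8320",
    "siteName": "Maricopa",
    "type": "site",
    "_childDocuments_": [
        {
            "areaId": "xjn2d2d2",
            "type": "area",
            "areaName": "Some Area",
            "areaSpecialty": "Awesomeness!",
            "_childDocuments_": [
                {
                    "lineId": "idb283d22dcou",
                    "type": "line",
                    "lineName": "Some Line",
                    "lineScore": 100,
                }
            ],
        }
    ],
}

docs = list(flatten(site))

# "Get all lines for a given site" becomes a plain filter on flat fields.
lines_for_site = [
    d for d in docs if d["type"] == "line" and d["siteId"] == "1h2112o8320"
]
```

Once documents like these are indexed, a question such as "Get all lines for a given site" should reduce to an ordinary query along the lines of type:line AND siteId:1h2112o8320, with no post-processing of a hierarchy.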
Instead of rebuilding the objects from Solr, you probably already have a data store that produces these documents. In that case, return an id from your collection and fetch the actual, pre-built object from that store. Most data sets already have a way to fetch the structured object by id; if you're ingesting from flat files, though, that won't work. In that case, you can attach the complete, serialized object to your document, which lets you recreate the object by deserializing the JSON stored in that field.
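Both options can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names id, text and payload_json are hypothetical, the object_store dictionary stands in for whatever data store you actually have, and the "hit" is simulated rather than fetched from Solr.

```python
import json


def collect_text(node):
    """Gather every string value except "type" from a node and its descendants."""
    values = []
    for key, value in node.items():
        if key == "_childDocuments_":
            for child in value:
                values.extend(collect_text(child))
        elif key != "type" and isinstance(value, str):
            values.append(value)
    return values


def to_search_doc(site):
    """Build one flat search document per site (field names are hypothetical)."""
    return {
        "id": site["siteId"],
        "text": collect_text(site),         # searchable catch-all field
        "payload_json": json.dumps(site),   # full object, stored for retrieval only
    }


site = {
    "siteId": "1h2112o8320",
    "siteName": "Maricopa",
    "type": "site",
    "_childDocuments_": [
        {"areaId": "xjn2d2d2", "type": "area", "areaName": "Some Area"}
    ],
}

object_store = {site["siteId"]: site}  # stand-in for the real data store
hit = to_search_doc(site)              # pretend the search returned this document

# Option 1: use the id from the hit to fetch the pre-built object.
from_store = object_store[hit["id"]]

# Option 2: rebuild the object from the serialized field on the hit.
from_payload = json.loads(hit["payload_json"])
```

With the first option the index only needs the id and the searchable fields; the second makes each indexed document larger but avoids depending on a separate store.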
That being said, I've usually preferred how Elasticsearch handles (deeply) nested objects on top of Lucene, but its approach has its own trade-offs.
Thanks for the suggestions. I like the workarounds you suggested, and I believe retrieving the object directly from storage by the id in the response makes more sense than serializing the document and keeping it as a field at each level. Thanks much!
– Partha
Aug 8 at 19:08