Cypher Performance with Multiple Relations




I am tasked with prototyping Neo4j as a replacement for our existing data mart, which stores data in Redshift/Postgres schemas. I have loaded an instance of Neo4j on an m5.xlarge EC2 instance to model a marketing campaign, and I am trying to get simple counts of who saw my spots in a given demographic. The numbers produced are exactly the same as in my existing data mart, but I am surprised to see that performance is much slower: the same query to get a simple count of impressions by television network returns in 48 seconds, compared to 1.5 seconds in Redshift. My question is: am I doing something wrong in my Cypher (e.g. too many joins), or is this expected behavior? Here is a diagram of the model:



[Diagram: Campaign Relationship Model]



Here is the Cypher that returns in 48 seconds:


match (c:Campaign {campaign_id: 98})<-[:PART_OF]-(sa)
, (sa)-[:AIRED_ON]->(n)
, (n)-[:BELONGS_TO]->(ng:NetworkGroup {network_group_id: 2})
, (sa)<-[:EXPOSED_WITH]-(e)
, (e)<-[se:CONTAINS_ENTITY]-(s:Sample {sample_id: 2000005})
, (e)-[:MEMBER_OF]->(a:DemographicAudience {audience_id: 2})
return c.campaign_id as `campaign_id`
, a.audience_id as `audience_id`
, a.audience_name as `audience_name`
, s.sample_id as `sample_id`
, n.network_id as `network_id`
, n.network_name as `network_name`
, n.network_call_sign as `network_call_sign`
, count(distinct sa.spot_airing_id) as `spot_airings`
, sum(se.weight) as `spot_impressions`



In addition, I believe all of the necessary indexes and constraints are in place to optimize lookups:


Indexes
ON :DemographicAudience(audience_id) ONLINE (for uniqueness constraint)
ON :Campaign(campaign_id) ONLINE (for uniqueness constraint)
ON :Entity(entity_id) ONLINE (for uniqueness constraint)
ON :Network(network_id) ONLINE (for uniqueness constraint)
ON :NetworkGroup(network_group_id) ONLINE (for uniqueness constraint)
ON :Sample(sample_id) ONLINE (for uniqueness constraint)
ON :SpotAiring(spot_airing_id) ONLINE (for uniqueness constraint)

Constraints
ON ( audience:DemographicAudience ) ASSERT audience.audience_id IS UNIQUE
ON ( campaign:Campaign ) ASSERT campaign.campaign_id IS UNIQUE
ON ( entity:Entity ) ASSERT entity.entity_id IS UNIQUE
ON ( network:Network ) ASSERT network.network_id IS UNIQUE
ON ( networkgroup:NetworkGroup ) ASSERT networkgroup.network_group_id IS UNIQUE
ON ( sample:Sample ) ASSERT sample.sample_id IS UNIQUE
ON ( spotairing:SpotAiring ) ASSERT spotairing.spot_airing_id IS UNIQUE
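
For reference, each of these was set up as a uniqueness constraint (which also creates the backing index), along the lines of the statement below (Neo4j 3.x syntax, shown for :Campaign only; the others follow the same pattern):

// a uniqueness constraint also creates the index used for lookups on this property
CREATE CONSTRAINT ON (c:Campaign) ASSERT c.campaign_id IS UNIQUE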



Running Neo4J 3.3.1 Community on AWS: https://aws.amazon.com/marketplace/pp/B073S5MDPV/?ref_=_ptnr_intuz_ami_neoj4



I should also mention that quite a bit of data is loaded: 24,154,440 nodes and 33,220,694 relationships. Most of these are relationships to entities.



My understanding was that Neo4j should go toe to toe with any RDBMS, and even outperform one as the data grows. I'm hoping I am just being naive with my rookie Cypher skills. Any help would be appreciated.



Thanks.





Can you PROFILE your query and attach the query plan (with all elements expanded) to your question?
– InverseFalcon
Aug 11 at 1:52
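
(For context, PROFILE is simply a prefix on the statement: it runs the query and returns the executed plan with rows and db hits per operator, while EXPLAIN returns the estimated plan without running it. A minimal sketch against one piece of the pattern from the question:)

// PROFILE executes the statement and reports rows and db hits for each plan operator;
// EXPLAIN would show the estimated plan without executing it
PROFILE
MATCH (c:Campaign {campaign_id: 98})<-[:PART_OF]-(sa)
RETURN count(sa) as spot_airings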




1 Answer



Keep in mind that in Neo4j indexes are used to find starting points in the graph, and once those starting points are found, relationship traversal is used to expand out to find the paths which fit the pattern.
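
For example, in the pattern from the question only the indexed equality predicates can serve as starting points; everything else is reached by traversing relationships from there. A minimal sketch of the two phases, using the labels and relationship types from the question:

// phase 1: the unique constraint on :Campaign(campaign_id) backs an index seek to find the one starting node
MATCH (c:Campaign {campaign_id: 98})
// phase 2: relationship traversal expands out from that node; no index is involved here
MATCH (c)<-[:PART_OF]-(sa)-[:AIRED_ON]->(n:Network)
RETURN n.network_name, count(sa) as spot_airings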



In this case we have several unique nodes, so we can save on some operations by ensuring that we match on all of these nodes first; we should then see Expand(Into) in the query plan instead of Expand(All) followed by Filter. My hunch is that the planner is only using an index lookup for a single node, and the rest aren't using their indexes but are instead being resolved through property access and filtering, which is less efficient.





If the planner doesn't look up all of your unique nodes first, before expansion, we can force it to by introducing a LIMIT 1 after the initial match.



Lastly, it's a good idea to aggregate using the node itself rather than its properties if the properties in question are unique. This will use the underlying graph id for the node for comparison purposes rather than having to do more expensive property access.
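
Applied to this query, that means counting the sa nodes themselves rather than their spot_airing_id property; since spot_airing_id is unique per node, the two counts are equivalent. A minimal before/after sketch on just the campaign-to-spot-airing part of the pattern:

// counting by property: the distinct check has to read spot_airing_id on every row
MATCH (c:Campaign {campaign_id: 98})<-[:PART_OF]-(sa)
RETURN count(distinct sa.spot_airing_id) as spot_airings

// counting the node itself: the distinct check compares internal node ids, with no property access
MATCH (c:Campaign {campaign_id: 98})<-[:PART_OF]-(sa)
RETURN count(distinct sa) as spot_airings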



Give this a try and see how it compares:


MATCH (c:Campaign {campaign_id: 98}), (s:Sample {sample_id: 2000005}), (a:DemographicAudience {audience_id: 2}), (ng:NetworkGroup {network_group_id: 2})
WITH c,s,a,ng
LIMIT 1
MATCH (c)<-[:PART_OF]-(sa)
, (sa)-[:AIRED_ON]->(n)
, (n)-[:BELONGS_TO]->(ng)
, (sa)<-[:EXPOSED_WITH]-(e)
, (e)<-[se:CONTAINS_ENTITY]-(s)
, (e)-[:MEMBER_OF]->(a)
WITH c, a, s, n, count(distinct sa) as `spot_airings`, sum(se.weight) as `spot_impressions`
RETURN c.campaign_id as `campaign_id`
, a.audience_id as `audience_id`
, a.audience_name as `audience_name`
, s.sample_id as `sample_id`
, n.network_id as `network_id`
, n.network_name as `network_name`
, n.network_call_sign as `network_call_sign`
, `spot_airings`
, `spot_impressions`






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard