Leverage the Cypher Aggregation feature to project in-memory graphs using all the flexibility and expressiveness of the Cypher query language
Cypher Aggregation is a powerful feature of the Neo4j Graph Data Science library that allows users to project an in-memory graph using a flexible and expressive approach. While it has been possible for quite some time to use Cypher statements to project an in-memory graph using Cypher Projection, it lacked some features, most notably the ability to project undirected relationships. Therefore, a new approach to projecting an in-memory graph in GDS was added, called Cypher Aggregation. This blog post explores the syntax and common usage of the Cypher Aggregation projection option in the Neo4j Graph Data Science library.
Environment setup
If you want to follow along with the examples, you can open a Graph Data Science project in Neo4j Sandbox. The project has a small dataset containing information about airports, their locations, and flight routes.
We can visualize the graph schema with the following Cypher statement:
CALL db.schema.visualization()
Projecting in-memory graphs with Cypher aggregation
First, let’s quickly revisit how the Neo4j Graph Data Science library operates.
Before we can execute any graph algorithms, we first have to project an in-memory graph. The in-memory graph does not have to be an exact copy of the stored graph in the database. We can select only a subset of the graph or, as you’ll learn later, also project virtual relationships that are not stored in the database. After the in-memory graph is projected, we can execute as many graph algorithms as we want, and then either stream the results directly to the user or write them back to the database.
Projecting an in-memory graph with Cypher Aggregation
The Cypher Aggregation feature is part of the first step in the Graph Data Science workflow, which is projecting an in-memory graph. It offers the full flexibility of the Cypher query language to select, filter, or transform a graph during projection. The syntax of the Cypher Aggregation function is the following:
gds.alpha.graph.project(
  graphName: String,
  sourceNode: Node or Integer,
  targetNode: Node or Integer,
  nodesConfig: Map,
  relationshipConfig: Map,
  configuration: Map
)
Only the first two parameters (graphName and sourceNode) are mandatory; however, you need to specify both the sourceNode and targetNode parameters to define a single relationship. We will walk through most of the options you might need to project graphs with Cypher Aggregation.
We will start with a simple example. Let’s say we want to project all Airport nodes and the HAS_ROUTE relationships between them.
MATCH (source:Airport)-[:HAS_ROUTE]->(target:Airport)
WITH gds.alpha.graph.project('airports', source, target) AS graph
RETURN graph.nodeCount AS nodeCount,
       graph.relationshipCount AS relationshipCount
The Cypher statement begins with a MATCH clause that selects the relevant graph. To define a relationship with Cypher Aggregation, we input both the source and target node.
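Projected graphs stay in the graph catalog until they are dropped, and re-running a projection under a name that is already taken should fail with an error. A minimal housekeeping sketch, assuming the airports graph from the statement above exists and that your client accepts multiple statements separated by semicolons, lists the catalog and then frees the name:

```cypher
// List the in-memory graphs currently held in the graph catalog
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
RETURN graphName, nodeCount, relationshipCount
ORDER BY graphName;

// Drop the graph so the name can be reused;
// the second argument (false) suppresses the error if the graph is missing
CALL gds.graph.drop('airports', false)
YIELD graphName
RETURN graphName;
```

Dropping is optional while you follow the examples, since every projection below uses its own graph name.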
Of course, the Cypher query language offers the flexibility to select any subset of the graph. So, for example, we could project only airports on the Oceania continent and their flight routes.
MATCH (source:Airport)-[:HAS_ROUTE]->(target:Airport)
WHERE EXISTS {(source)-[:ON_CONTINENT]->(:Continent {name:"OC"})}
  AND EXISTS {(target)-[:ON_CONTINENT]->(:Continent {name:"OC"})}
WITH gds.alpha.graph.project('airports-oceania', source, target) AS graph
RETURN graph.nodeCount AS nodeCount,
       graph.relationshipCount AS relationshipCount
The matching Cypher statement became slightly more complicated in this example, but the Cypher Aggregation function stayed the same. The airports-oceania graph contains 272 nodes and 973 relationships. If you are experienced with Cypher, you might notice that the above Cypher statement will not capture any airports in Oceania that don’t have flight routes to other airports in Oceania.
Suppose we want to include isolated airports in the projection as well. In that case, we need to modify the Cypher matching statement slightly.
MATCH (source:Airport)
WHERE EXISTS {(source)-[:ON_CONTINENT]->(:Continent {name:"OC"})}
OPTIONAL MATCH (source)-[:HAS_ROUTE]->(target:Airport)
WHERE EXISTS {(target)-[:ON_CONTINENT]->(:Continent {name:"OC"})}
WITH gds.alpha.graph.project('airports-isolated', source, target) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
The relationship count stays identical, while the node count has increased to 304. Therefore, 32 airports in Oceania don’t have any flight routes to other airports in Oceania.
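The isolated airports can also be counted directly in the database, which gives a quick sanity check on the projection. The following is only a sketch, assuming the Continent nodes carry a name property with the value "OC" for Oceania, as in the statements above:

```cypher
// Oceania airports with no route, in either direction,
// to another airport that is also in Oceania
MATCH (a:Airport)
WHERE EXISTS { (a)-[:ON_CONTINENT]->(:Continent {name: "OC"}) }
  AND NOT EXISTS {
    (a)-[:HAS_ROUTE]-(:Airport)-[:ON_CONTINENT]->(:Continent {name: "OC"})
  }
RETURN count(a) AS isolatedAirports
```

If the dataset matches the one used here, the result should line up with the difference between the two node counts.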
When dealing with multiple node and relationship types in a graph, we might want to retain information about node labels and relationship types during projection. Defining the node labels and relationship types during graph projection allows us to filter them at algorithm execution time.
CALL {
  MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
  RETURN source, target, r
  UNION
  MATCH (source:Airport)-[r:IN_CITY]->(target:City)
  RETURN source, target, r
}
WITH gds.alpha.graph.project('airports-labels', source, target,
  {sourceNodeLabels: labels(source),
   targetNodeLabels: labels(target)},
  {relationshipType: type(r)}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
I prefer using the UNION clause when projecting multiple different graph patterns. However, which Cypher matching statement you use is entirely up to you. Since we are projecting two types of nodes and relationships, it’s probably a good idea to retain the information about their labels and types. Therefore, we are using the sourceNodeLabels, targetNodeLabels, and relationshipType parameters. In this example, we use the existing node labels and relationship types.
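Retaining labels and types pays off at algorithm execution time. As a hedged sketch, assuming the airports-labels graph projected above and a city property on Airport nodes, degree centrality can be restricted to airports and flight routes through the nodeLabels and relationshipTypes configuration options:

```cypher
// Degree centrality restricted to Airport nodes and HAS_ROUTE relationships;
// City nodes and IN_CITY relationships in the same projection are ignored
CALL gds.degree.stream('airports-labels', {
  nodeLabels: ['Airport'],
  relationshipTypes: ['HAS_ROUTE']
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).city AS city, score AS outgoingRoutes
ORDER BY outgoingRoutes DESC
LIMIT 5
```

Without the label and type information in the projection, the same graph could only be processed as a whole.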
However, sometimes we might want to use custom labels or relationship types during projection.
CALL {
  MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
  RETURN source, target, r
  UNION
  MATCH (source:Airport)-[r:IN_CITY]->(target:City)
  RETURN source, target, r
}
WITH gds.alpha.graph.project('airports-labels-custom', source, target,
  {sourceNodeLabels: CASE WHEN source.city = 'Miami'
                     THEN 'Miami' ELSE 'NotMiami' END,
   targetNodeLabels: ['CustomLabel']},
  {relationshipType: CASE WHEN type(r) = 'HAS_ROUTE'
                     THEN 'FLIGHT' ELSE 'NOT_FLIGHT' END}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
As you can see, we can use Cypher to dynamically define the node label or relationship type, or simply hardcode it. The custom node label or relationship type can also be calculated in the Cypher matching statement if the logic is more complicated.
CALL {
  MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
  RETURN source, target, r,
         CASE WHEN source.city = target.city
         THEN 'INTRACITY' ELSE 'INTERCITY' END AS rel_type
  UNION
  MATCH (source:Airport)-[r:IN_CITY]->(target:City)
  RETURN source, target, r, type(r) AS rel_type
}
WITH gds.alpha.graph.project('airports-labels-precalculated', source, target,
  {sourceNodeLabels: labels(source),
   targetNodeLabels: labels(target)},
  {relationshipType: rel_type}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
Sometimes, we also want to project node or relationship properties.
MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
WITH gds.alpha.graph.project('airports-properties', source, target,
  {sourceNodeLabels: labels(source),
   targetNodeLabels: labels(target),
   sourceNodeProperties: {runways: source.runways},
   targetNodeProperties: {runways: target.runways}},
  {relationshipType: type(r), properties: {distance: r.distance}}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
The node or relationship properties are defined as a map object (a dictionary or JSON object for Python or JS developers), where the key represents the name of the projected property and the value represents the projected value. This syntax allows us to project properties that are calculated during projection.
MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
WITH gds.alpha.graph.project('airports-properties-custom', source, target,
  {sourceNodeLabels: labels(source),
   targetNodeLabels: labels(target),
   sourceNodeProperties: {runways10: source.runways * 10},
   targetNodeProperties: {runways10: target.runways * 10}},
  {relationshipType: type(r),
   // 1.0 forces floating-point division even if distance is stored as an integer
   properties: {inverseDistance: 1.0 / r.distance}}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
Again, we can use all the flexibility of Cypher to calculate any node or relationship properties. As with node labels, we can also calculate the custom properties in the Cypher matching statement before passing them to the projection.
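A projected relationship property becomes useful once an algorithm consumes it. As a sketch, assuming the airports-properties-custom graph from above and a city property on Airport nodes, PageRank can take the calculated inverseDistance as its relationship weight:

```cypher
// Weighted PageRank: routes to nearby airports (larger inverseDistance)
// pass on a larger share of the score than long-haul routes
CALL gds.pageRank.stream('airports-properties-custom', {
  relationshipWeightProperty: 'inverseDistance'
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).city AS city, score
ORDER BY score DESC
LIMIT 5
```

Whether inverse distance is a meaningful weight depends on the question being asked; it is used here only because it is the property projected in this example.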
An important thing to note is that, under the current projection behavior, the engine stores the node properties when it first encounters a node. On subsequent encounters of the same node, it ignores the node properties completely. Therefore, you have to be careful to calculate identical node properties for both source and target nodes. Otherwise, there may be discrepancies between what is projected and what you expect.
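One way to catch such discrepancies is to stream the projected values back and compare them with the values recomputed from the database. This is a sketch only: it assumes the airports-properties-custom graph from above, and the procedure name gds.graph.nodeProperty.stream depends on your GDS version (older releases expose the same operation as gds.graph.streamNodeProperty).

```cypher
// Compare each projected runways10 value with the value recomputed
// from the stored node; airports without a runways value are skipped
// because the comparison evaluates to null for them
CALL gds.graph.nodeProperty.stream('airports-properties-custom', 'runways10')
YIELD nodeId, propertyValue
WITH gds.util.asNode(nodeId) AS airport, propertyValue
WHERE propertyValue <> airport.runways * 10
RETURN count(*) AS mismatches
```

A result of zero suggests that source and target properties were defined consistently.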
Some graph algorithms in the Neo4j Graph Data Science library expect undirected relationships. A relationship cannot be stored as undirected in the database, so the undirected orientation must be explicitly defined during graph projection.
Suppose you want to treat all projected relationships as undirected.
MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
WITH gds.alpha.graph.project('airports-undirected', source, target,
  {}, // nodeConfiguration
  {}, // relationshipConfiguration
  {undirectedRelationshipTypes: ['*']}
) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
We can use the undirectedRelationshipTypes parameter to specify which relationship types should be projected as undirected. In practice, you can observe that the relationship count doubled when we projected an undirected graph.
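The doubling can be checked by placing the stored relationship count next to the projected one. A minimal sketch, assuming the airports-undirected graph from above is still in the graph catalog:

```cypher
// Count the stored HAS_ROUTE relationships, then look up the
// relationship count of the undirected projection in the graph catalog
MATCH (:Airport)-[r:HAS_ROUTE]->(:Airport)
WITH count(r) AS storedCount
CALL gds.graph.list('airports-undirected')
YIELD relationshipCount
RETURN storedCount,
       relationshipCount AS projectedCount
```

Each stored relationship is represented in both directions in the undirected projection, which is where the factor of two comes from.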
Sometimes you might want to project a single relationship type as undirected while treating the others as directed.
CALL {
  MATCH (source:Airport)-[r:HAS_ROUTE]->(target:Airport)
  RETURN source, target, r
  UNION ALL
  MATCH (source:Airport)-[r:IN_CITY]->(target:City)
  RETURN source, target, r
}
WITH gds.alpha.graph.project('airports-undirected-specific', source, target,
  {},
  {relationshipType: type(r)},
  {undirectedRelationshipTypes: ['IN_CITY']}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
In this example, the HAS_ROUTE relationship is treated as directed, while the IN_CITY relationship is treated as undirected. When we want specific relationship types to be treated as undirected, we must include the relationshipType parameter in the relationship configuration.
Finally, we can also project virtual relationships. A virtual relationship is a relationship that is not stored in the database.
Suppose you want to examine the cities based on their flight connections. The database doesn’t have flight relationships between cities. Instead of creating the relationships in the database, you can calculate them during graph projection.
MATCH (sourceCity)<-[:IN_CITY]-(:Airport)-[:HAS_ROUTE]->(:Airport)-[:IN_CITY]->(targetCity)
WITH sourceCity, targetCity, count(*) AS countOfRoutes
WITH gds.alpha.graph.project('airports-virtual', sourceCity, targetCity,
  {},
  {relationshipType: 'VIRTUAL_ROUTE',
   // keep the aggregated route count as a relationship property
   properties: {countOfRoutes: countOfRoutes}},
  {}) AS graph
RETURN graph.nodeCount AS nodeCount, graph.relationshipCount AS relationshipCount
As you can observe, projecting virtual relationships is very easy with the Cypher Aggregation projection. We calculated the count of routes between the various cities and added it as a relationship property in the projected graph.
Let’s calculate the most important cities based on the PageRank algorithm to finish off this blog post.
CALL gds.pageRank.stream('airports-virtual')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS city, score
ORDER BY score DESC
LIMIT 5
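Streaming is one of the two result options mentioned at the start of this post; the other is writing the results back to the database. A hedged sketch of the write mode follows, assuming the airports-virtual graph from above. The property name pagerank is only an illustrative choice, and the statement modifies the stored city nodes.

```cypher
// Write the PageRank score of every projected city back to the
// corresponding database node as a property named 'pagerank'
CALL gds.pageRank.write('airports-virtual', {writeProperty: 'pagerank'})
YIELD nodePropertiesWritten, ranIterations
RETURN nodePropertiesWritten, ranIterations
```

Once written, the scores can be queried with plain Cypher, even after the in-memory graph has been dropped.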
Summary
Cypher Aggregation is the newer option to project in-memory graphs in the Neo4j Graph Data Science library using Cypher statements. In particular, it can be used to project undirected relationships, which is impossible with the older Cypher Projection. However, the added flexibility of selecting and transforming graphs during projection comes at a performance cost. Therefore, you should use Native Projection whenever possible for performance reasons. On the other hand, when you have specific use cases that require projecting a particular subset of a graph, calculating custom properties, or projecting virtual relationships, Cypher Aggregation is your friend.