SQL Graph Objects in SQL Server 2017: the Good and the Bad

SQL Server 2017 will be bringing us support for some of the functionality of graph databases. Graph databases are unlike relational databases in that they represent data as nodes, edges and properties within a ‘graph’, which is an abstract data type representing relationships or connections as a set of vertices nodes, points and edges, like a tangled fish-net. They allow us to represent and traverse relationships between entities in easier ways than is possible with the traditional relational database.

Graph objects are built to represent complex relationships. A hierarchy is a special case of a graph and is useful to record such things as relationship between forum posts and their replies, likes in forum posts, and friendship between people. Hierarchies have a root node (forum post and replies, for example), but many graphs don’t have a root node (people friendship, for example).

In this article, we’ll be building a forum example using the new graph model. I will also compare the complexity of the queries between the graph and a relational model and calling your attention for what’s still missing in this first version. Yes, it’s a Beta technology in its early stages.

Demonstration Environment

I’m using SQL Server 2017 CTP 2.1 which you can download here: https://www.microsoft.com/en-us/sql-server/sql-server-2017

I’m also using SSMS v. 17.0 which you can download here: https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms

Creating the models

This is the Entity-relationship model I will use for comparison:

If you would like to follow along and try out some comparisons, you can create this entity-relational model with the script below, or you can go directly to the graph model. However, you need to create the new database, ‘GraphExample’ , using SSMS.

create database GraphExample
  go
  -- Trying an entire graph model
  use GraphExample
  go
  create schema Forum
  go
  create table Forum.ForumMembers
  (MemberId int not null primary key Identity(1,1),
  MemberName varchar(100))
  go
  create table Forum.ForumPosts
  ([PostID] int not null primary key,
  PostTitle varchar(100),
  PostBody  varchar(100),
  OwnerID int,
  ReplyTo   int)
  go
  Create table Forum.Likes
  (MemberId int,
  PostId int)
  go
  create table Forum.LikeMember
  (MemberId int,
   LikedMemberId int)
   go
  INSERT Forum.ForumMembers values('Mike'),('Carl'),('Paul'),('Christy'),('Jennifer'),('Charlie')
  go
   
  INSERT INTO [Forum].[ForumPosts] 
             (
             [PostID]
             ,[PostTitle]
             ,[PostBody],OwnerID, ReplyTo
                   )
       VALUES
           (4,'Geography','Im Christy from USA',4,null),
             (1,'Intro','Hi There This is Carl',2,null)
  INSERT INTO [Forum].[ForumPosts] 
             (
             [PostID]
             ,[PostTitle]
             ,[PostBody],OwnerID, ReplyTo
                   )
       VALUES
          (8,'Intro','nice to see all here!',1,1),
          (7,'Intro','I''m Mike from Argentina',1,1),
           (6,'Re:Geography','I''m Mike from Argentina',1,4),
          (5,'Re:Geography','I''m Jennifer from Brazil',5,4),
                (3,'Re: Intro','Hey Paul This is Christy',4,2),
                   (2,'Intro','Hello I''m Paul',3,1)
  go
  INSERT Forum.Likes VALUES (1,4),
   (2,7),
   (2,8),
   (2,2),
   (4,5),
   (4,6),
   (1,2),
   (3,7),
   (3,8),
       (5,4)
  go
  Insert Forum.LikeMember VALUES (2,1),
   (2,3),
   (4,1),
   (4,5)

The Graph Model

The planning of a graph model is quite different than the relational model. Tables in a graph model can be Edges or Nodes , we need to decide which tables will be edges and which tables will be nodes.

This image illustrates our graph model:

The decision between nodes and edges in our model is simple: All the entities in a logical model will be nodes, while all the relations in the logical model will be edges. In our model, we have ‘Posts’ and ‘Members’ as entities, ‘Reply To’ , ‘Like’ and ‘Written By’ as relations.

Architectural Notes

Nodes and Edges are nothing more than tables with some special fields. There’s no restriction that forbids us to create regular relationships between these tables, turning the model into a mix of relational and graph model.

For example, the relationship ‘Written By’ between ‘Posts’ and ‘Members’ is a simple one-to-many relationship. Instead of using an edge table, we could express it with a regular relation between these tables, creating a mixed model.

When we create a node entity, the entity receives a calculated field named ‘$node_id’ . We could use this field as a primary key, SQL Server allows calculated fields to be the primary key of a table: However this field is a JSON field, which is not a good option for primary key for many reasons. For this reason, our nodes have to maintain two keys: The business key, an integer, and the ‘$node_id’ key, an auto-generated JSON key that also contains an integer.

This will be the script for our node entities:

Use GraphExample
  go
  CREATE TABLE [dbo].[ForumMembers](
         [MemberID] [int] IDENTITY(1,1) NOT NULL,
         [MemberName] [varchar](100) NULL
  )
  AS NODE
  GO
   
  CREATE TABLE [dbo].[ForumPosts](
         [PostID] [int] NULL,
         [PostTitle] [varchar](100) NULL,
         [PostBody] [varchar](1000) NULL
  )
  AS NODE

Architectural Notes

After creating the objects, you can then examine the objects in Object Explorer. You may notice a new folder called ‘ Graph’ inside the ‘ Tables’ folder. All graph objects will be inside this folder.

You may also notice the name of the auto-generated fields. Although we can reference these fields with their short-name, for example, $node_id , the real name of the field includes a GUID. This short name is a pseudo-column and we can use it in our queries.

It is simple to insert the values into the node tables: We just ignore the pseudo-column ( $node_id ) and write simple INSERT statements. These will be our INSERTs:

INSERT ForumMembers values ('Mike'),('Carl'),('Paul'),('Christy'),('Jennifer'),('Charlie')
   
  INSERT INTO [dbo].[ForumPosts] 
             (
             [PostID]
             ,[PostTitle]
             ,[PostBody]
                   )
       VALUES
          (8,'Intro','nice to see all here!'),
          (7,'Intro','I''m Mike from Argentina'),
           (6,'Re:Geography','I''m Mike from Argentina'),
          (5,'Re:Geography','I''m Jennifer from Brazil'),
           (4,'Geography','Im Christy from USA'),
                (3,'Re: Intro','Hey Paul This is Christy'),
             (1,'Intro','Hi There This is Carl')
                   (2,'Intro','Hello I''m Paul')

You will see the result of SELECTing the records from the ForumPosts table in the image below. You may notice the field $node_id , a JSON field including the type of the entity and an integer id that is automatically generated.

Creating the Edge tables

The edge tables are very simple to create. Edge tables can have properties that would be regular fields in the table, however that’s not part of our example. This will be the script to create our edge tables:

Create table dbo.[Written_By]
  as EDGE
   
  CREATE TABLE [dbo].[Likes]
  AS EDGE
   
  CREATE TABLE [dbo].[Reply_To]
  AS EDGE

Each edge table has three pseudo-columns that we need to deal with:

  • $edge_id: The id of the edge record
  • $from_id: One of the nodes in the edge record
  • $to_id: The other node in the edge record

Noticing this definition, you may be asking: “Oh, ‘One of the nodes’ is quite vague, isn’t it?”. Yes, sure! That’s an important point: We need to define, in a logical way, what the $to_id and $from_id fields for each edge will mean. You may also be noticing that the name of the edge tables already defines the $to_id and $from_id sides. This is a logical choice of the sides and a good use of the names to make things easier for us.

These will be our logical definitions:

Written_By:

$from_id will be the post

$to_id will be the member

Likes:

$from_id will be who likes

$to_id will be who/what is liked

Reply_To:

$from_id will be the main post

$to_id will be the reply to the main post

There is no technical restriction about these choices, but we need to keep them when inserting new records, never mixing the meaning of each side of the relation.

Architectural Notes

Besides the three pseudo-columns, all edge tables have some more fields, all of them are hidden fields.

We can see the hidden definition in the field properties and these hidden fields will not appear in query results.

Inserting Edge records

The insert statement for the edge tables needs to fill both sides of the edge, $From_id and $To_id . These fields need to be filled with the $node_id value of the record of each side.

For example, to relate a member with a forum post, the ‘Written_By’ record will have the $node_id value of the post in the $From_id field and the $node_id value of the member in the $To_id field.

These will be the inserts:

Insert into Written_By ($to_id,$from_id) values 
   (
   (select $node_id from dbo.ForumMembers where MemberId= 1 ),
   (select $node_id from dbo.ForumPosts where PostID=8 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=1  ),
   (select $node_id from dbo.ForumPosts where PostID=7 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId= 1 ),
   (select $node_id from dbo.ForumPosts where PostID= 6) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=5  ),
   (select $node_id from dbo.ForumPosts where PostID=5 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=4  ),
   (select $node_id from dbo.ForumPosts where PostID=4 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=3  ),
   (select $node_id from dbo.ForumPosts where PostID=3 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=3  ),
   (select $node_id from dbo.ForumPosts where PostID=1 ) 
   ),
   (
   (select $node_id from dbo.ForumMembers where MemberId=3  ),
   (select $node_id from dbo.ForumPosts where PostID=2 ) 
   )

Architectural Notes

It’s complex to build INSERTs like these, isn’t it? We will want to use an object framework with support for graph objects in the database. Today entity framework hasn’t got this feature, but we can expect to see something new about this in a while, either as a new feature for entity framework or as a new object model to map graph objects.

Let’s insert the replies records:

INSERT Reply_To ($to_id,$from_id) 
   VALUES
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 6)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 1),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 5)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2),

(SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 3))

Finally, a let’s insert a lot of likes:

INSERT Likes ($to_id,$from_id) 
   VALUES
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 1)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 2)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 2)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 2)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 5),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 4)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 6),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 4)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 2),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 1)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 3)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 3)),
   ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 5))

The ‘Likes’ edge illustrates the capabilities of the edge feature rather well. We just inserted several relations between members and posts, but we now decide that in the application a member can also like another member. No problem, we can use the same edge to relate one member to another. In the relational model we need two tables for this, in the graph model, only a single edge.

Let’s insert more likes, now between members of the forum:

INSERT Likes ($to_id,$from_id) 
   VALUES
   ((SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 1),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 2)),
   ((SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 3),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 2)),
   ((SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 1),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 4)),
   ((SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 5),
         (SELECT $node_id FROM dbo.ForumMembers WHERE MemberID = 4))

The Good: Building queries over the Graph Model

T-SQL brings some new syntax elements into use to query graph tables. We have a special statement to relate nodes and edges in a SELECT.

Let’s do a walkthrough to build a query that retrieves all the posts and their replies:

  1. We will retrieve two posts in each record, the post and the reply, so we will need two references to the ‘ForumPosts’ table in the FROM clause of the select. Let’s do it with some meaningful aliases:
    FROM dbo.ForumPosts ReplyPost, dbo.ForumPosts RepliedPost
    

    Although we can, of course, use any alias we choose, it is best to use meaningful aliases when working with graph objects.

  2. We need the relation between the posts, the relation is the table ‘Reply_to’ .
    FROM dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost
    
  3. In the WHERE clause, we need to relate all the tables. We can do this with the new MATCH clause:
    FROM dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost
    WHERE MATCH(ReplyPost-(Reply_to)->RepliedPost)
    

    The syntax is interesting: A single “-” (dash) means a relation with the $From_id field of the edge, and a dash and greather-than (->) means a relation with the $To_id field of the edge

  4. Knowing which alias has the reply and which alias has the replied post, we can build the field list of the query.
    -- Posts and their replies
       select RepliedPost.PostId,RepliedPost.PostTitle,
       ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle
       From dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost
       Where Match(ReplyPost-(Reply_to)->RepliedPost)
    

    The same query in the relational model would be like this:

    select RepliedPost.PostId,RepliedPost.PostTitle,
       ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle
      from Forum.ForumPosts ReplyPost, Forum.ForumPosts RepliedPost
      where ReplyPost.PostId=RepliedPost.ReplyTo
    

    These queries are very similar, although it’s possible to argue that the MATCH syntax is easier to understand

  5. We can already execute this query. The result will be like the image below:

  6. Let’s include the name of the member who wrote the replied post. We need to include the ‘ForumMembers’ node and the ‘Written_By’ edge in the FROM clause. The new FROM clause will be like this:
    FROM dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost, dbo.ForumMembers RepliedMember, Written_By RepliedWritten_By
    
  7. Include the relation inside the MATCH clause:
    WHERE MATCH(ReplyPost-(Reply_to)->RepliedPost-(RepliedWritten_by)->RepliedMember)
    
  8. Include the member name of the replied post in the field list of the SELECT. The final query will be this:
    -- Posts and members and their replies
       SELECT RepliedPost.PostId,RepliedPost.PostTitle,RepliedMember.MemberName,
       ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle
          FROM dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost, 
            dbo.ForumMembers RepliedMember, Written_By RepliedWritten_By
       WHERE MATCH(ReplyPost-(Reply_to)->RepliedPost-(RepliedWritten_by)->RepliedMember)
    

    The same query against the relational model:

    SELECT RepliedPost.PostId,RepliedPost.PostTitle,
       ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle, 
       RepliedMember.MemberName
      FROM Forum.ForumPosts ReplyPost, Forum.ForumPosts RepliedPost, 
       Forum.ForumMembers RepliedMember
      WHERE ReplyPost.PostId=RepliedPost.ReplyTo
            and RepliedPost.OwnerId=RepliedMember.MemberId
    

    The image below shows the result:

  9. We still need to include the member name of the reply post. Again, we need to include the ‘ForumMembers’ and the ‘Written_By’ tables in the FROM clause:
    From dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost,
    dbo.ForumMembers RepliedMember, Written_By RepliedWritten_By,
    dbo.ForumMembers ReplyMember, Written_By ReplyWritten_By
    
  10. The next step is correcting the MATCH clause. ‘ReplyMember’ needs to be related to ‘ReplyPost’ , however, how to do this relation without breaking the others? Let’s do in a different direction:
    WHERE MATCH(ReplyMemberRepliedPost-(RepliedWritten_by)->RepliedMember)
    

    Notice the <- symbol. It’s in opposite direction than -> , however it’s the same meaning: a relation between $to_id of the edge table with the node table.

  11. Finally, let’s include the name of the member who wrote the reply in the field list and execute the query:
    -- Posts and members and their replies and members
       SELECT RepliedPost.PostId, RepliedPost.PostTitle,RepliedMember.MemberName,
       ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle,
       ReplyMember.MemberName [ReplyMemberName]
          FROM dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost, 
            dbo.ForumMembers RepliedMember, Written_By RepliedWritten_By, 
            dbo.ForumMembers ReplyMember, Written_By ReplyWritten_By
          WHERE MATCH(ReplyMemberRepliedPost-(RepliedWritten_by)->RepliedMember)
    

    The result is the following:

    The same query against the relational model:

    SELECT RepliedPost.PostId,RepliedPost.PostTitle, 
             RepliedMember.MemberName, ReplyPost.PostId as ReplyId, 
          ReplyPost.PostTitle as ReplyTitle, ReplyMember.MemberName
      FROM Forum.ForumPosts ReplyPost, Forum.ForumPosts RepliedPost, 
           Forum.ForumMembers RepliedMember, Forum.ForumMembers ReplyMember
      WHERE ReplyPost.PostId=RepliedPost.ReplyTo
            and RepliedPost.OwnerId=RepliedMember.MemberId
            and ReplyPost.OwnerId=ReplyMember.MemberId
    

    At this point you may be noticing that according to the increasing of the number of relations the WHERE clause in the relational model will be much harder to read than the MATCH clause in the graph model.

    Let’s see some more interesting and useful queries against the graph model.

Count of Replies of each Post

-- Count of replies of each post
   SELECT distinct RepliedPost.PostID,RepliedPost.PostTitle,
          RepliedPost.PostBody,
          count(ReplyPost.PostID) over(partition by RepliedPost.PostID) 
              as TotalReplies
   FROM dbo.ForumPosts ReplyPost, Reply_To, dbo.ForumPosts RepliedPost
   WHERE MATCH(ReplyPost-(Reply_To)->RepliedPost)

In this one we have the count of replies in each post, however only in a single level, not in a tree of replies.

List of Root Posts

We can get all the root posts without the MATCH syntax:

-- All the root posts
  SELECT Post1.PostId,Post1.PostTitle
  FROM dbo.ForumPosts Post1
  WHERE $node_id not in (select $from_id from dbo.Reply_To)

The MATCH syntax only allows us to relate three or more entities (two nodes and one relation). When we want to relate only two of them, we can do a regular join or sub-query, as the above.

Including a Level field in the result

It can be useful to include a ‘Level’ field to expose the structure as a tree. We have a syntax for this in T-SQL: The recursive CTE (Common Table Expression).

However, there is a catch: We can’t use the MATCH syntax over a derived table, in this case the CTE. We can, if needed, use the MATCH syntax in the CTE, but we can’t reference the CTE in it. This is a limitation.

We can, however, use the recursive CTE syntax without be MATCH syntax, using regular relations. The result will be this:

-- All replies with tree level
  with root as 
  ( select $node_id as node_id,RootPosts.PostId,
           RootPosts.PostTitle,
           1 as Level, 0 as ReplyTo
     from dbo.ForumPosts RootPosts
     where $node_id not in (select $from_id from dbo.reply_to) 
  union all
     select $node_id,ReplyPost.PostId, ReplyPost.PostTitle,
            Level+1 as [Level], root.PostId as ReplyTo
     from dbo.ForumPosts ReplyPost, reply_to, root
     where ReplyPost.$node_id=reply_to.$from_id 
           and root.node_id=reply_to.$to_id
  )
  select PostId,PostTitle, Level, ReplyTo
  from root

Retrieve all replies of a single post

Using the recursive CTE syntax, we can retrieve all the replies of a single post in a tree style. If we try to retrieve all replies of post 1 using a regular syntax we won’t retrieve the post 3 and we should, because post 3 is a reply to post 2, which is a reply to post 1.

We can only retrieve post 3 when querying the replies of post 1 using the recursive CTE syntax.

This will be the query to retrieve all replies of a single post:

-- All replies of a single post
  with root as 
  ( select $node_id as node_id,RootPosts.PostId,RootPosts.PostTitle,
           1 as Level, 0 as ReplyTo
     from dbo.ForumPosts RootPosts
     where PostId=1  
  union all
     select $node_id,ReplyPost.PostId, ReplyPost.PostTitle,
            Level+1 as [Level],root.PostId as ReplyTo
     from dbo.ForumPosts ReplyPost, reply_to, root
     where ReplyPost.$node_id=reply_to.$from_id 
           and root.node_id=reply_to.$to_id
  )
  select PostId,PostTitle, Level, ReplyTo
  from root

We can also do it in reverse, in order to retrieve all parent posts of a single post in a tree structure. For this we need to do the relations inside the CTE to retrieve the posts and at the outside do some OUTER JOINs to retrieve the parent of each post. It needs to be an OUTER JOIN because the main post has no parent. It also needs to be in the outside of the CTE because the recursive part of the CTE doesn’t support OUTER JOINs.

This brings another point to our attention: There isn’t an OUTER in the MATCH syntax.

-- All the parents of a single post, with reference to their parents
  with root as 
  ( select LeafPost.$node_id as node_id,LeafPost.PostId,
           LeafPost.PostTitle
     from dbo.ForumPosts LeafPost
     where LeafPost.PostId=3  -- Single post
  union all
     select RepliedPost.$node_id as node_id,RepliedPost.PostId, 
            RepliedPost.PostTitle
     from dbo.ForumPosts RepliedPost, Reply_to, root
     where root.node_id=Reply_to.$from_id 
           and Reply_to.$to_id=RepliedPost.$node_id
  )
  select root.PostId,root.PostTitle,
         RepliedPost.PostId ParentPostId
  from root
  left join reply_to 
       on root.node_id=reply_to.$from_id
  left join dbo.ForumPosts RepliedPost 
       on reply_to.$to_id=RepliedPost.$node_id

Recovering the posts of a single user

Recovering the information of a single user, instead of a post, is easier because there isn’t a tree in this situation:

-- All Post replied by Peter
   SELECT distinct RepliedPost.PostID,RepliedPost.PostTitle,
           RepliedPost.PostBody
   FROM dbo.ForumPosts ReplyPost, Reply_To, dbo.ForumPosts RepliedPost,
        dbo.ForumMembers Members,Written_By
   WHERE MATCH(MembersRepliedPost)
   and Members.MemberName='Peter'
  -- All replies made by Peter
   SELECT ReplyPost.PostID,ReplyPost.PostTitle,ReplyPost.PostBody, 
          RepliedPost.PostId ReplyTo
   FROM dbo.ForumPosts ReplyPost, Reply_To, dbo.ForumPosts RepliedPost,
        dbo.ForumMembers Members,Written_By
   WHERE MATCH(MembersRepliedPost)
   and Members.MemberName='Peter'

You may notice the only difference between the queries above is the field list and the use of DISTINCT. The DISTINCT is needed because Peter can reply the same post more than once.

Retrieving the Likes from the model

The queries to retrieve the ‘Likes’ are interesting: The ‘Likes’ edge has relations between members and between members and posts. Each of these selects brings only one of these kinds of relation and ignores the other.

-- All posts that have likes and who liked them
  SELECT Post.PostID,Post.PostTitle,Member.MemberName
  FROM   dbo.ForumPosts Post, Likes, 
         dbo.ForumMembers Member
  WHERE MATCH(Member-(Likes)->Post)
  -- All members that have likes and who liked them
  SELECT Member.MemberId,Member.MemberName LikeMember,
         LikedMember.MemberName LikedMemberName
  FROM dbo.ForumMembers Member, Likes, dbo.ForumMembers LikedMember
  WHERE MATCH(Member-(Likes)->LikedMember)

It is also easy to aggregate the information to get the total of likes of each post or each member.

-- Total likes of each Post
  select Post.PostId,Post.PostTitle,
         count(*) totalLikes
  from dbo.ForumPosts Post,Likes,
       dbo.ForumMembers Members
  where Match(Members-(Likes)->Post)
  group by PostId,PostTitle
  -- Total likes of each Member
  select LikedMembers.MemberId,LikedMembers.MemberName,
         count(*) totalLikes
  from dbo.ForumMembers Members,Likes,
       dbo.ForumMembers LikedMembers
  where Match(Members-(Likes)->LikedMembers)
  group by LikedMembers.MemberId,
           LikedMembers.MemberName

Members who liked and replied to the same post

We can build even some more interesting queries. For example, who are the members who liked and replied to a post?

This is the query:

-- Members who liked and replied on same post
  SELECT Member.MemberName,Member.Memberid,
         LikedPost.PostId,LikedPost.PostTitle,
         ReplyPost.PostTitle ReplyTitle
  FROM dbo.ForumPosts LikedPost, Reply_To, dbo.ForumPosts ReplyPost, 
       Likes, dbo.ForumMembers Member, Written_By
  WHERE MATCH(Member-(Likes)->LikedPostMember)

You may notice the use of the node ‘Member’ two times in the same MATCH expression. This creates a kind of a filter: The member who liked the ‘LikedPost’ needs to be the same member who wrote the ‘ReplyPost’ .

This one, over the relational model, would be like this:

select Likes.MemberId,Members.MemberName
  from Forum.Likes Likes, Forum.ForumPosts Posts, 
       Forum.ForumMembers Members 
  where Likes.MemberId=Posts.OwnerId 
   and Posts.ReplyTo=Likes.PostId
   and Members.MemberId=Likes.MemberId

It seems for me more difficult to build and read over the relational model.

Members who replied to multiple posts

SELECT Members.MemberId, Members.MemberName, 
         Count(distinct RepliedPost.PostId) as Total
  FROM   dbo.ForumPosts ReplyPost, Reply_To, dbo.ForumPosts RepliedPost,
         Written_By,dbo.ForumMembers Members
  WHERE  MATCH(MembersRepliedPost)
  GROUP BY MemberId, Members.MemberName
  Having Count(RepliedPost.PostId) >1

Members who replied multiple times to the same post:

SELECT Members.MemberId, Members.MemberName,
         RepliedPost.PostId RepliedId,count(*) as TotalReplies
  FROM   dbo.ForumPosts ReplyPost, Reply_To, dbo.ForumPosts RepliedPost,
       Written_By,dbo.ForumMembers Members
  WHERE MATCH(MembersRepliedPost)
  GROUP BY MemberId,MemberName,RepliedPost.PostId
  Having count(*) >1

The only differences between the two above are the field list and aggregation.

Missing features for queries

You probably don’t like the requirement to use recursive CTE to retrieve a tree of nodes. That’s because a feature called transitive closure, usually present in graph databases, isn’t available in this first version.

The ability to find any type of node connected to a single node, for example, all replies and likes of a post, is called polymorphism and isn’t available in this version. We can use UNION to solve this for small sets of node types.

Some graph-specific functions, such as the ability to find the ‘shortest path’ between two forum members, isn’t available in this version.

The Bad: Edge records validation

As well as all the missing features, with a lot of room for improvement, I found problems with the validation of the relations inserted in the model. You may have noticed that it’s too easy to create an edge table. That’s because we don’t need to identify the possible relations of that edge.

This creates some validation problems. For example, the ‘Reply_To’ edge: A forum message can be a reply to another single forum message, however, I shouldn’t be able to insert the same forum message as a reply of two others.

The model doesn’t validate this. We shouldn’t be able to execute the insert below because the message ‘3’ is already a reply to message ‘2’. However, the insert will work:

Insert Reply_to Values
  ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 3),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4))

After trying the insert, delete the inserted record:

delete reply_to
  from reply_to, dbo.forumposts replypost,
       dbo.forumposts repliedpost
  where MATCH(replypost-(reply_to)->repliedpost)
        and replypost.postid=3 
        and repliedpost.postid=4

How can we solve this problem?

The graph calculated field doesn’t support constraints. The only solution for this problem is the creation of triggers over our edge tables. In this example, we will need a trigger for INSERT and UPDATE.

The following trigger will solve the problem:

create trigger chkReplies on dbo.Reply_to
  for insert,update
  as
  begin
      if exists( select 1 from inserted 
   where 
   (SELECT count(*)
   FROM dbo.ForumPosts ReplyPost, Reply_To,
                                  dbo.ForumPosts RepliedPost
   WHERE MATCH(ReplyPost-(Reply_To)->RepliedPost)
   and ReplyPost.$node_id=inserted.$from_id)
       >1 )
   begin
       raiserror('One message can only be reply to another single message',13,1)
   rollback transaction
   end
  end

Try the INSERT again and this time you will the error message:

Insert Reply_to Values
  ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 3),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 4))

Deleting Parent records with Children

This is another interesting problem. The relation isn’t enforced, so we can delete parent records even when they have children. You can try this:

Begin Transaction
  -- This delete works, unfortunately.
  Delete dbo.forumposts where postid=1
  -- Posts and their replies
   select RepliedPost.PostId,RepliedPost.PostTitle,
   ReplyPost.PostId as ReplyId, ReplyPost.PostTitle as ReplyTitle
   From dbo.ForumPosts ReplyPost, dbo.Reply_to, dbo.ForumPosts RepliedPost
   Where Match(ReplyPost-(Reply_to)->RepliedPost)

You will notice that the result in the SELECT will have several NULL values because of the deleted record. We can’t even insert the record again, because the relation is done by the $node_id value, which is auto-generated.

Undo the change with a rollback:

Rollback transaction

We can’t solve this with a simple relationship, such as foreign key constraint, first because the graph pseudo-columns don’t support constraints and also because this would be an impossible solution for and edge such as ‘Likes’ , which accepts multiple types of nodes.

Again, the solution is a trigger:

create trigger chkNotOrphanReply on dbo.ForumPosts
  for delete
  as
     if exists(select 1 from deleted
   where
   (SELECT count(*)
   FROM Reply_To
   WHERE $to_id=deleted.$node_id)
       >0 
   )
   begin
   raiserror('This message can''t be deleted because it has replies',13,1)
   rollback transaction
   end

After creating the trigger, execute this block of code again and you will see the error message:

Begin Transaction
  -- This delete works, unfortunatelly.
  Delete dbo.forumposts where postid=1
  -- Posts and their replies
   select RepliedPost.PostId,RepliedPost.PostTitle,
      ReplyPost.PostId as ReplyId, 
      ReplyPost.PostTitle as ReplyTitle
   From dbo.ForumPosts ReplyPost, dbo.Reply_to, 
        dbo.ForumPosts RepliedPost
   Where Match(ReplyPost-(Reply_to)->RepliedPost)

One Forum post shouldn’t like another forum post

The ‘Likes’ edge is much more flexible, however has also its problems. We shouldn’t be able to insert a like between two forum posts, but the model will accept it. The following insert will work:

--This insert should not be accepted, but it is
  INSERT Likes ($to_id,$from_id) VALUES
  ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7))

After trying the insert above, delete the inserted record and any like between posts:

-- Deleting
  delete likes 
  from likes, dbo.forumposts post1,
       dbo.forumposts post2
  where MATCH(Post1-(likes)->Post2)

Again, the solution will be a trigger. However, this time we need to check the type of record inserted in the relation. We can use the function JSON_VALUE to extract this information. The JSON_VALUE function can extract information from any JSON field, including the calculated graph fields.

For example, the following SELECT will extract the type of each record inserted in the $from_id field part of the like:

select json_value($from_id, '$.schema') as [schema],
         json_value($from_id, '$.table') as [table] 
   from likes

We can use this function in a trigger for insert/update. The trigger will be this:

Create trigger chkLikeType on Likes
  for Insert,Update
  as
  if exists (select 1 from inserted
   where json_value($from_id, '$.schema')'dbo' or
     json_value($from_id, '$.table')'ForumMembers')
   begin
   raiserror('Only forum members can like a post or another member',13,1)
   rollback transaction
   end

After creating the trigger, you can try the insert again, it will not be accepted:

--This insert should not be accepted, but it is
  INSERT Likes ($to_id,$from_id) VALUES
  ((SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 8),
         (SELECT $node_id FROM dbo.ForumPosts WHERE PostID = 7))

Missing Validation Features

All these problems should be solved with some kind of validation syntax on the creation of edge nodes. We should be able to constrain the types of entities that are allowed in the edge node and whether we will allow more than one record with the same entity.

This syntax would avoid the need for triggers. The support for a range of constraints would make the system much more robust.

Conclusion

It’s a first version, SQL Graph in SQL Server 2017 is very promising, even though there are a lot of missing features at this stage. There is plenty to do before this becomes usable and robust, but there is enough so far to be hopeful that Microsoft can deliver a fully-functional graph database within SQL Server

  • At the moment, there is little or any information about how soon we can expect to see the tools that a development is likely to need. We need an object model with support for this, Entity Framework doesn’t support these kind of database objects and I’m not sure if the best option really is to join relational and graph models in the same object model.
  • We need a tool to design this model. Besides the needed support for validation in the syntax of edge nodes, a tool to design this, such as database diagram is for relational models, will be very useful.

After reading this article, probably would like to check:

  • The documentation and architecture of this model. This link will be useful
  • More examples. This link will be useful, it’s a bigger sample and the only link I found talking also about the performance of the graph model
  • More references and other links . This summary of several references will be useful link.
Simple Talk稿源:Simple Talk (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 后端存储 » SQL Graph Objects in SQL Server 2017: the Good and the Bad

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录