You are reading content from Scuttlebutt
@mix

Forking / Nested threads

Hey y'all I'm building nested threads a client project. I've chatted with @dominic and @mikey some about how best to do this, but want to check if there are any major objections to this proposed implementation.

This is what I want - the first post (the root) is (A), and then there are a bunch of replies to that (B), (C), (D). These are all in some time order. (E) is a reply to a particular message, (C). A person might to this to make a targeted reply, or to address a side point. You might call this a fork, or a nest.

20171017_215119.jpg


Here's the best practice implementation of the simple part of that - i.e. without forking / nesting (to my knowledge). We need to remember that people posted messages at different times and may not have had the same context.

To track the context in which we're posting, we record two things :

  • root - this is the first message in the whole thread. By always recording this, it makes getting the whole tree of a thread as easy as querying for all messages with the same root.
  • branch - this is a term borrowed from git, it means something like the current branch state of the tree. Turns our you don't need to post all the messages before hand, just the most recent known messages (more on this below)

20171017_214732.jpg

In this thread (A) is the root message, and two different users post messages (B) and (C) , but they post before seeing each others message. So they both declare :

{ 
  root: A, 
  branch: A
}

Some time later, message (D) is posted, and they have seen all the messages before them because they've been gossiping. So this message declares :

{
  root: A,
  branch: B, C
}

Thereby signalling the messages before them / the context they were speaking into.

Good so far?

@mix

Notation for forking / nesting

Ok so someone wants to nest a comment under (C), or reply to just that, or fork the conversation:

20171017_214851.jpg

We post (E) with a new property called fork - this is declaring a new root for this sub-thread.

{ 
  root: A,
  branch: C,
  fork: C
}

Considerations:

  • we can only get fast thread queries if we don't touch the root property (this suggests a new key)
  • fork is an ok name - it's short, it's related to conversation forks, and it's similar enough to branch that it will force you to know the difference between branch state and fork :sweat_smile:
  • if (E) was posted much later, would we have had branch = D ? ssb-sort which can calculate branches might need patching / testing
@mix

cc @cel @matt @ev @regular @dominic @mikey
would love thoughts on this (I'm about to iomplement forks / nested comments)

@keks

We need to remember that people posted messages at different times and may not have had the same context.

You could add a field that specifies an array of message ids with the same branch that has been seen before.
So for the top picture let's say we want to post B but do not know about A yet, B.prev = [] (and vice-versa, so A.prev=[] as well). But let's say the node posting D already received A and B, but not C, they post D.prevs=[A,B]. Finally, the author of C is quite late and already has A, B and D, so they only reference D in C.prev, because A and B are referenced transitively by D.
Does that make sense to you? It's more code, but I like the idea of having the messages' context specified.

I might add more comments later today

@keks

Ah, I see you want branch to behave like I described prev. I thought branch would only be the post you respond to, but then again, you usually respond several posts from the conversation, so it makes sense to just reply to all the posts.

@Cryptical Envelopment
Voted this
@andrestaltz

From the data point-of-view, this makes sense to support no matter how we visualize it (or not). From the visualization/UX point-of-view, this is harder than it sounds. There was once a very cool startup called Branch.com (very hard to find screenshots from them nowadays) until they were acquired by you-know-who-social-media-behemoth.

So far I've seen these alternatives:

  • Linear thread, no forks, no inline citations
  • Linear thread, no forks, allows inline (clickable as links) citations
  • Tree-structured thread (Reddit, hackernews)
  • Tree-structured but only one branch linearly visible at a time (Branch.com)

Visually, I prefer the second option. But anyway, this type of support in the data layer will be useful even in that case.

:+1: @mix

@marco28
Voted this
@jake
Voted this
@ev
Liked this
@Anders
Voted this
@cel

This seems like it could help make thread forks discoverable. So you could still render a thread using links rel:root, but you could see which messages have a fork property and show a UI option to see the thread fork that they are declaring.

if (E) was posted much later, would we have had branch = D ?

In that case I would set branch to either:

  1. [C, D], because D is declaring it is after C and D
  2. C, because D is a direct reply to C
  3. omitted, because D is a direct reply to C and already links to C with the fork property
  4. left to the user to specify (defaulting to 1 or 2 above)

Should a reply to a message with a fork property (e.g. E) use the value of E's fork property as its root property? e.g.:

F: {
  root: C,
  branch: E
}

fork-F.jpg

How would you render the thread rooted at C? E does not link to C with rel root. I think you would have to also query for messages linking to C with rel fork or branch to get the full subthread. Or should E be the new root? e.g.:

G: {
  root: E,
  branch: E
}

fork-G.jpg

Should G link to C? Maybe with the fork property, inheriting it from E?
i.e. Should the fork property be inherited by replies in the forked thread (until there is another thread fork which would set a different value for the fork property)? Or should fork only be set when one wants to declare a new thread fork?

@mix @andre @keks

@mix

@cel I think for all messages (other than the root message), we should set root = A.

If you change the root (as in your two examples above) you are forcing recurssive querying.

Should a reply to a message with a fork property (e.g. E) use the value of E's fork property as its root property?
I think we should do

// a reply with a new declared root
F: {
  root: A,
  fork: C,    // this is the 'newRoot' of this thread
  branch: E
}

I wasn't clear on the scenario of G - at a guess it looks like you're making a fork off of E. I think I would code that as :

// a reply with a new declared root (which happens to have a new root, C)
G: {
  root: A,
  fork: E,    // this is the 'newRoot' of this thread
  branch: E
}

It seems like whichever way we go it would be good to have a module which queries a root (or fork) and builds a tree. Notice that if fork is declared on all messages, then it would be easy to get the subtree below that point.

@andrestaltz it will be interesting to see how general this particular datastructure is for the purposes of visualisation. I think we should have all the visualisations for all the different brains

@jiangplus
Voted this
@mikey
Voted this
@cel

you are forcing recurssive querying

When I fork a thread it is because I want to reply to a post or topic in a thread but consider it no longer relevant to the original thread. Therefore I would think the fork reply and/or its replies would not need to show up by default in the original thread - but having an indication for the thread fork (to click to see the subthread) I would see as useful. Having a fork property for the initial thread fork message would accomplish this. But perhaps I am misunderstanding the use case here.

@andrestaltz

Changing the root, as @cel mentioned, is inevitable in #against-consensus. Because it's possible, it will eventually happen. And it may make sense for the use case cel described: starting a new thread, ignoring the previous thread.

I was thinking about branch vs fork and I have a hard time understanding how they are conceptually different. I know the intent is to indicate a new (forked) thread, but I think that is not the raw data, that's knowledge derived from the raw data.

With raw data, I mean that threads are nodes in a directed acyclic graph (DAG), where edges are the branch relationship. And we can assume the DAG has only one root, for now. Rendering the threads is about finding a linearization of the DAG, with topological sorting:

images.duckduckgo.com.png

That's what ssb-sort does. If the DAG has multiple "heads" (ssb-sort terminology), then there are multiple threads, and we can render one thread per head. In the diagram above, there is only one head. To do the rendering, we need to know which nodes to ignore, and that should happen before ssb-sort.

Here's another case where we need to know which nodes to ignore. If someone changes the root field, we're dealing with a subgraph of the DAG, and this sub-DAG may have multiple roots. And we want to ignore the subgraphs stemming from other roots.

Example below:

Screen Shot 2017-10-18 at 12.56.05.png

Given the two heads J and K, we want to render just the J (left side) subgraph as a thread. How do we know that we should ignore every node on the right? Traversing the graph bottom-up? Then, suppose that F,G,J have root: D. Do we also traverse the graph bottom-up and stop when D is reached? Is there no other way than doing recursive traversal bottom-up? (Although I think one could convert "recursive" to iterative)

Another question: given all this above, what does the new fork actually... represent? A special type of edge on the graph? And what would that edge inform to ssb-sort and to the bottom-up traversal?

@andrestaltz

Oh, another question: when B and C arrived after A, this could have happened for two reasons:

  • B and C were concurrent and due to gossiping, they were not aware of each other when they were authored
  • B and C happened "at very different times", so they were aware of each other's existence, but explicitly decided not to do B=>C nor C=>D, instead they explicitly want to fork the conversation

How do we know that we are dealing with the first case or the second case? Does it make sense to even separate these two cases if someone may come along and anyway just make message D that links branch: [B,C]?

@andrestaltz

The more I think about this, the more questions (not answers :smile:) I get.

What if the split B / C happened due to network partition (e.g. two different LAN parties, both discussing topic A). When these two worlds are brought together, should the conversation be considered an accidental fork (due to unintended network partition) or intentional fork?

@andrestaltz

And by accidental fork, I mean that the SSB client would like to merge those two worlds together. Accidental mini-forks already happen, and we already use mini-merges. For instance, in my blue DAG above, F and G are an accidental minifork and are merge with J. Should we allow macromerges for accidental macroforks? That would be a message L merging J and K.

@mix

Is bottom up the current implementation ?

re: not-aware vs aware, using an explicity property like fork would signal intention to fork.

someone making D: { branch: [B,C] } means that anyone who see D can just post E: { branch: D } and they have reduced the divergence, back into a single 'head' (leading node of the conversation).

re accidental fork you be able to reconstruct that there were different contrexts by following the listed branches. If you follow J back up the tree and find it is not linked to C (and there was no fork mentioned) then there was an accidental fork.

...
damn it you're typing faster than I can respond @andrestaltz !

@mix

L shouldn't merge J and K if D and E were declared deliberate forks ?

@andrestaltz

Shouldn't, but what if it does due to #against-consensus ?

@andrestaltz

Is bottom up the current implementation ?

As far as I understood from ssb-sort, it doesn't traverse the graph much, it just checks some edges of the graph to count some links. Otherwise it maintains all the graph in memory and creates a separate dictionary data structure counts with additional metadata on the graph. Then it iterates over all the nodes. Sorts the nodes, as an array.

@mix

@cel whether this is a github fork or a fork in the conversational track, or a nested thread, I think declaring a fork explicitly would can make it clear that there's a difference, and then it's up to the client how to lay that out - whether they are nest or in totally different spaces.
One question about forking is which message is the fork head - is it the message in the main thread which had some other-topic elements, or is it the one that replies to that and draws out that sub-theme into another space?

I think it should be the one in the main stream that is the 'root of the fork' - because that message probably has progenitor context. If you didn't want to do that you would start a totally new thread and say "I was thinking about this thing in response to %link".

I think the use case is meant to be flexible to either nesting or forking ...

@andrestaltz

I almost feel like there should be some academic paper on topologically sorting threads in a distributed untrusted environment. Maybe if we search properly we'll find something because SSB logs aren't that different to generic "logs".

@mix

yeahhh .. #against-consensus is cool, but in reality I'm talking to y'all because i want #pseudo-consensus ... the way language and patterns work is that if enough people use them, they're useful.

I'm aware that with some of this stuff, if we diverge too much that the clients are going to have to cover painful amounts of ground... unless we have modules for all the different possible ways people are using posts. Already the clients diverge in implementation - patchbay is fucking up some patchwork threads and vice versa. fun!

@Dominic

okay, we are current talking about 3 different ways something can "fork"

@mixmix is talking about a thread with nested replies, a la reddit or hackernews. It's all displayed on the screen together, but the subthreads are nested under the specific comment they reply to. mix wanted to call this "fork" I suggested "subthread". Since the intention is to display everything on one screen, I suggested everything using the same root so all messages can be retrived in a single query.

@cel is talking about creating a new thread where the root is a comment in another thread. This was a behaviour that patchbay@7 was easy to do by accident, but patchfoo has as a specific feature. The intention here is to not display the whole thing on one screen (although, maybe show the other discussion as a backlink)

@andrestaltz is talking about concurrency. you can reply to a thread while offline, and your branch will only point back to the messages you have seen. if you later come back online, receive messages and reply again, branch will be an array because the concurrent parts of the thread are now merged.
A peer could certainly pretend that they never saw a message (maybe they blocked it's author). It would probably still be merged by another message replying to it. branch is used to help figure out the best order to display messages (although there is nothing that actually shows two messages where concurrent in the interface).

branch points backwards to all the messages in the thread, which means you are signing the entire thread, not just the root. This will be essential for out-of-order messages because you'll be able to traverse those messages and securely get everything in that thread. (ps. gatherings isn't doing this correctly at the moment)

@mix

will read yours when back @dominic


just got off the phone with @matt - he points out that changing the root is fine, because the backlinks plugin makes getting the who tree a piece of cake and not costly at all.

to clarify - we don't need a seperate fork key to do fast lookups.

all good ?

@mix

@dominic - mix is talking about nested and cel's 'seperate fork'. I believe both can be handled at the same time and the UI can resolve how to display.

thanks for the note on gatherings, @piet and have been talking about looping back and doing more work on that.

@mix

@matt trying implemented with backlines.
Notice feed.obs.thread provides replies, but these are only one deep (if we're mutating the root to fork).

This means I have to do this for every comment :

const backlinks = api.backlinks.obs.for(msg.key)

const nestedReplies = computed(backlinks, backlinks => { 
  return backlinks.filter(backlinker => {
    const { type, root } = backlinker.value.content
    return type === 'post' && root === msg.key      
  })
)

This feels like an ok first pass solution, but has me wondering whether we should have something in patchcore for this, or whether this is clumsy?

@mmckegg

@mix

In patchwork I handle this in modules/message/html/backlinks.js.

Yeah it could be handled in patchcore somewhere. I think I thought about doing it there but it made things more complicated for some reason I can't remember. We can talk about this tonight.

@mikey

@mixxx and i just discussed the difference between "nested reply" and "forks", based on my feelings without catching up on the latest gossip here.

in the "nested reply", you want to post within the current conversation context in reply to a specific post.

in the "fork", you want to create a new conversation context, based on inspiration from a specific post. an example of this is the "like or not to like" thread.

here's a diagram, where E is a "nested reply" and F is a "fork".

forks-and-nested-replies.jpg

@Dominic

ssb-backlinks is just a single index. You can query all the messages that point to a given message, and it's a single disk seek then returns results. If any message can be a root also, you must do one query, which could return N messages, then you must do N more queries. Even if those messages don't have any backlinks, and the query answers "none" that still has a cost. doing another empty query for every item in a successful query is much slower than just doing one query.

We had this design in patchwork@2 (see legacy api: relatedMessages) it made some big threads really slow to load.

@Dominic

in the image @dinosaur just posted, the fork "F" is a totally different thing.
To reflect patchfoo's behaviour, F.root=C; G.root=C; G.branch=F

@mix

@dominic I find that last part confusing - is that leaning on branch to specify a last message ? My conception of branch was that it's more of an eventually consistent thing that you should not touch manually (as in let ssb-sort figure out the branch heads)

@Dominic

@mixmix yes branch means the last message(s). In the case of concurrent messages (someone replies while offline), there will be more than one branch. This is just about the causal order, and isn't used for rendering, but this will become important from a distributed system perspective, so we need to get it right now.

@mix

talked with @dominic on the phone.
Going forward with implementing @mikey 's suggestions (just the nested reply part first), with the correction that E should be

E = {
  root: A,
  branch: D,   // assuming they've seen A-D
  fork: B
}
@mix

seems like a decent idea! (test of the new nested reply style)

@mix

taly ho!

@cinnamon

Naive question:

What if posts just had a single field called "parent" which pointed only to the immediate parent of that post? Why do we need multiple fields (branch, root, fork)?

This would make a simple tree structure. Not even a DAG since each post can only have one parent. It should be easy to follow the chain of parents up to find the root message when you need to know it. And a "top level" or "root" message is simply on that has parent=null.

I'm concerned that the complexity of three fields (branch, root, fork) will inevitably result in clients using them differently, making it very hard to know how to display them.

@cinnamon

Ah, I guess this would lead to nested replies like hacker news / reddit, though it could be displayed in various ways.

If we want it to feel like a chatroom which has a single thread although people post when out of sync, then a post might want to somehow indicate which other posts were known at the time it was written.

But that can just be done by allowing parents to be an array, so we get a DAG where posts can pull the leaves of the tree back together and "resolve forks". I don't think we need multiple kinds of fields (root, branch, fork).

Forgive me if I'm speaking without enough context :)

@neftaly

This is a WIP for work, so it's not perfect, but here's an example of a nice big parent-pointer tree (as well as an index and a collection of orphaned sub-trees). You could very easily implement multiple parents in a big efficient cyclic graph for free, thanks to persistent data structures.

const entries = [
  {
    id: 'hash of { meta, data } prop',
    meta: {
      parent: 'parentId'
    },
    data: {
      ...
    }
    ...
  },
  ...
];

const tree = R.reduce(
  hydrate,
  immutable.fromJS({
    root: undefined,
    index: {},
    orphans: {}
  }),
  entries
);
import { fromJS, Map, List } from 'immutable';
import * as R from 'ramda';

const getPath = (entryId, parentId, parentPath) => {
  if (parentId === null) return List([ 'root' ]);
  return parentPath
    ? parentPath.concat([ 'tree', entryId ])
    : List([ 'orphans', parentId, entryId ]);
};

const getChildIds = tree => tree.toList(
  undefined
).flatMap(
  child => getChildIds(
    child.get('tree', Map())
  ).push(
    child.get('id')
  )
);

const updateIndexes = R.curry(
  (orphan, parentPath, index) => getChildIds(
    orphan
  ).reduce(
    (index, id) => index.update(
      id,
      path => parentPath.push(
        'tree'
      ).concat(
        path.splice(0, 2)
      )
    ),
    index
  )
);

const hydrate = (state, entry) => {
  const {
    id,
    meta: {
      parent
    }
  } = entry;
  const path = getPath(id, parent, state.getIn([ 'index', parent ]));
  const orphan = state.getIn([ 'orphans', id ], Map());
  return state.withMutations(
    state => state.update(
      'index',
      Map(),
      updateIndexes(orphan, path)
    ).setIn(
      [ 'index', id ], path // Add index
    ).setIn(
      // Add entry
      path, fromJS(entry)
    ).update(
      // Move children
      state => state.setIn(
        path.push('tree'),
        orphan
      ).deleteIn(
        [ 'orphans', id ]
      )
    )
  );
};

export default hydrate;
@mikey
Re: %SQo6QFGd5

@matt %+fBXl12...

@tim
Voted this
@Aljoscha
Re: %I4DuXYTI7

The readme on tangles will be interesting to those who discussed the design of the post-message's branch property: CC @mix @Dominic @cel @keks @andrestaltz @dinosaur


Completely unrelated: @cft, have you by any chance interacted with Uwe Nestmann? I vaguely told him about ssb a year ago, but the group headed into traditional blockchain-country instead. Well, and it is still a theory chair. But it is nice to see a CS prof from the distributed systems/network community here on ssb. It would be a shame if bitcoin and ethereum completely overshadowed all the other interesting stuff happening in the wild right now.

@mikey
Re: %I4DuXYTI7

hi @cft, i finally got around to reading the tangle paper in depth.

My recommendation is to separate the reply information from the tangle-forming aspect: instead of branch there should be the ancestors field which strictly points to some recent tip nodes, and a separate discussion-specific reply field that points to one or more tangle entries which can be quite old at the time of posting.

while not documented well, as far as i understand branch is the same as your ancestors, as in it's only used for tangle-forming. @mix added a fork field (i think for TickTack?) to signal discussion flows, which is similar to your reply field: %+fBXl12.... confusingly, @matt added a reply field for Patchwork which does something different: %5mjm4vF....

For example, about records are used for implementing event-participation where peers declare that they will be attendees: However, based on the available information in the logs, it is not possible to deduce strong happened-before relations among about events i.e. who committed first

yes, while it's probably hidden, there's rough consensus that we should use root and branch (as described in your tangle paper as root and ancestors) for just about every message type: about, vote (like: %zrIKdNx...), etc.

generally, am stoked that you took the time to document this! keen to find a way to incorporate this into the overall Scuttlebutt documentation, as we've independently come to the same conclusions. :smiley_cat:

@mix
Re: %SR5RcMaRl

hey @rabble / @Tom Coates do you know about how scuttlebutt apps currently track the weird temporal nature of this space ?

It's best seen in this description : https://github.com/cn-uofbasel/ssbdrv/blob/master/doc/tangle.md

The short of it is that each post references the "HEAD" of the growing history for a thread. In the event there are 2 heads (because 2 people were offline or posted at exactly the same time, then you received them both), then your new message references both those heads as the latest message. This "merges" the branching, and your message is then a new singular head.

The language in that doc is a little different to scuttlebutt, I think it's partly a proposal to normalise some naming Here's a comparison

Scuttlebutt Tangle doc Concept
thread tangle collection of ordered messages for e.g. a conversation / edit history
root root the first message
branch ancestors what we call the field which points back to the message(s) before

I'm happy to go into this in more detail. There's also a draft spec for how to do nested replies and forks within a discussion thread, you can read about it here (with diagrams)

@punkmonk.phone
Voted this
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%DU7cKioe8mLIt4BtH88o8mHKXGxBTGlyZs20TXWU9n0=.sha256",
  "branch": [
    "%bKaq+8p7IVTu9UnoNokutalz8D9C2WJWHkSm/PNBSIQ=.sha256"
  ]
}
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%QssRDUTXIk3PIjxD4eKai5VcBZvJ8WrQJ+6I0OKdSRM=.sha256",
  "branch": [
    "%DMOOFubIgOdJ7P80n2pYl/ENj4Kqsc4t56XimyPHq7Q=.sha256"
  ]
}
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%NzCJjgBtJq8mY0MlMosuv7zTW5aadsH3RUwM7L1iwSI=.sha256",
  "branch": [
    "%psQxzmT3LDhlDVi3PAGHeEFQYy9HZHXTTgAxDiSq8fY=.sha256"
  ]
}
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%LmcWpHk5M4+3YrzHwHIRNbaBYeX2i1HUI8mr3zDgyAw=.sha256",
  "branch": []
}
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%z/aR1wHG2wtYTHxfXSt0iNT3PbDaQdzXD3OJ/GlA3+A=.sha256",
  "branch": []
}
@rabble
{
  "type": "tag",
  "version": 1,
  "tagged": true,
  "message": "%+fBXl12aV1wpAdD62RMl1WRhwthDMuAuHH4iNWgB7jA=.sha256",
  "root": "%0kolUkvwK5ZE70EMVZCFLVSd1ZrDKss4oENTBxTE4tw=.sha256",
  "branch": []
}
@cryptix
Voted this
@Pinky
Voted this
@mikey
Re: %TOqTRFI+w

@dominic this is great! :smiley_cat:

i was hoping to get onto this when the Sunrise Choir started

  • capture the existing Scuttlebutt status quo
    • what is the problem space we are exploring?
    • what are our orthogonal solutions to cover the problem space?
    • what design hypotheses are our solutions trying to test?
    • what existing prior art exists to solve similar problems?
    • what have we learned so far from our exploration?
    • do we want to change any fundamental design decisions?
    • do we need to adapt any design details?

so i'm happy you got us started with something. now i'm trying to think of what else is missing:

hmm :heart:

Join Scuttlebutt now