From 848c4ad019f067eb3a43f5e3e1ba8fb94f05815c Mon Sep 17 00:00:00 2001 From: Joscha Date: Mon, 23 Oct 2023 19:44:14 +0200 Subject: [PATCH] Document commit ordering --- scripts/graph.ts | 55 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 38 insertions(+), 17 deletions(-) diff --git a/scripts/graph.ts b/scripts/graph.ts index d2756fb..85d0401 100644 --- a/scripts/graph.ts +++ b/scripts/graph.ts @@ -12,30 +12,30 @@ The graph should be fast. This requires a bit of careful thinking around data formats and resource usage. My plan is to get as far as possible without any sort of pagination or range -limits. The plot should always display data for the repo's entire history -(unless zoomed in). This will force me to optimize the entire pipeline. If -the result is not fast enough, I can still add in range limits. +limits. The graph should always display data for the repo's entire history +(unless zoomed in). This will force me to optimize the entire pipeline. If the +result is not fast enough, I can still add in range limits. uPlot is pretty fast at rendering large amounts of data points. It should be -able to handle medium-sized git repos (tens of thousands of commits) -displaying multiple metrics with no issues. The issue now becomes retrieving -the data from the server. +able to handle medium-sized git repos (tens of thousands of commits) displaying +multiple metrics with no issues. The issue now becomes retrieving the data from +the server. -Since the graph should support thousands of metrics, it can't simply fetch -all values for all metrics upfront. Instead, it must fetch metrics as the -user selects them. It follows that when fetching a metric, +Since the graph should support thousands of metrics, it can't simply fetch all +values for all metrics upfront. Instead, it must fetch metrics as the user +selects them. It follows that when fetching a metric, 1. the server should have little work to do, 2. 
the amount of data sent over the network should be small, and 3. the client should have little work to do. -The costs when initially loading the graph may be higher since it happens -less frequently. We can fetch some more data and do some preprocessing to -improve performance while interacting with the graph. +The costs when initially loading the graph may be higher since it happens less +frequently. We can fetch some more data and do some preprocessing to improve +performance while interacting with the graph. -Since we fetch data across multiple requests, we need some way to detect if -all the data we have is consistent (at least in cases where things might -otherwise break). +Since we fetch data across multiple requests, we need some way to detect if all +the data we have is consistent (at least in cases where things might otherwise +break). Implementation ============== @@ -49,7 +49,8 @@ The data for the graph consists of three main parts: Data consistency ---------------- -Each response by the server includes a graph id and a data id. +Responses by the server include a graph id (for 2. and 3.) and a data id (for 1. +and 3.). The graph id is incremented when the commit graph structure changes. Responses to 2. and 3. MUST have the same graph id. When they don't, the client must @@ -67,7 +68,7 @@ Data flow │ /graph/metrics │ │ /graph/measurements │ │ /graph/commits │ └──┬─────────────┘ └──┬──────────────────┘ └──────────┬─────┘ │ Server │ │ -───┼────────────────────┼─────────Requests─────────────────┼────── +───┼────────────────────┼──────────────────────────────────┼────── │ │ │ ┌──▼─────┐ ┌─────────▼────┐ ┌─────────────────┐ ┌─────▼─────┐ │ metric │ │ measurements │ │ permute by-hash ◄──┤commit info│ @@ -90,6 +91,26 @@ Data flow │ selector │ │ checkbox │ └──────┘ └──────────┘ └─────────────────┘ +Commit orders +------------- + +There are two main orders for commits in this implementation. + +The first order ("by hash") is simply sorting by commit hash. 
All data returned +by the server is in this order. It is easy for the server to retrieve data in this +order, and the order is unambiguous. + +The second order ("by graph") is the order used to display commits in the graph. +Points in the graph must be ordered in ascending order along the x axis or uPlot +will produce graphical glitches. For this graph, the x value is either their +exact committer time or, in day-equidistant mode, the committer day and a unique +offset per commit in the same day. + +Multiple commits may share exactly the same committer time. To break such ties +in day-equidistant mode, the commits are ordered in topological order. While +this change is only relevant in day-equidistant mode, we can reuse the same +ordering for both display modes, so this is also relevant to the normal mode. + */ // https://sashamaps.net/docs/resources/20-colors/
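The relationship between the "by hash" and "by graph" orders documented in this patch could be sketched roughly as follows. This is only an illustration, not the code in scripts/graph.ts: the `CommitInfo` shape, the `topoRank` field (assumed to be a precomputed position in some topological order), the use of seconds for committer time, and all function names are assumptions made for the example.

```typescript
// Hypothetical commit shape; the real /graph/commits response may differ.
interface CommitInfo {
  hash: string;
  committerTime: number; // assumed: seconds since the Unix epoch
  topoRank: number; // assumed: precomputed position in a topological order
}

// Returns perm such that perm[i] is the by-hash index of the commit
// displayed at by-graph position i. Commits are ordered by committer
// time; ties are broken by the (assumed) topological rank.
function byGraphPermutation(byHash: CommitInfo[]): number[] {
  const perm = byHash.map((_, i) => i);
  perm.sort(
    (a, b) =>
      byHash[a].committerTime - byHash[b].committerTime ||
      byHash[a].topoRank - byHash[b].topoRank
  );
  return perm;
}

// Reorders any by-hash array (commit info, measurements, ...) into
// by-graph order using a permutation from byGraphPermutation.
function permute<T>(valuesByHash: T[], perm: number[]): T[] {
  return perm.map((i) => valuesByHash[i]);
}

const SECONDS_PER_DAY = 86400;

// Day-equidistant x values: the committer day plus a unique fractional
// offset per commit within that day. The input must already be in
// by-graph order so that the result is strictly ascending.
function dayEquidistantX(byGraph: CommitInfo[]): number[] {
  const days = byGraph.map((c) =>
    Math.floor(c.committerTime / SECONDS_PER_DAY)
  );
  const perDay = new Map<number, number>();
  for (const d of days) perDay.set(d, (perDay.get(d) || 0) + 1);
  const seen = new Map<number, number>();
  return days.map((d) => {
    const k = seen.get(d) || 0;
    seen.set(d, k + 1);
    return d + k / (perDay.get(d) as number);
  });
}
```

Computing the permutation once when the commit info is loaded keeps per-metric work on the client small: each newly fetched measurements array only needs a single `permute` call before it is handed to uPlot.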