* [Tarantool-patches] [PATCH 1/1] rfc: describe an inter-fiber debugger
@ 2021-02-26 17:15 Sergey Ostanevich via Tarantool-patches
From: Sergey Ostanevich via Tarantool-patches @ 2021-02-26 17:15 UTC
To: tarantool-patches
Subject:
An RFC on bringing debugger facility into Tarantool.
Part of #5857
---
doc/rfc/inter-fiber-debugger.md | 204 ++++++++++++++++++++++++++++++++
1 file changed, 204 insertions(+)
create mode 100644 doc/rfc/inter-fiber-debugger.md
diff --git a/doc/rfc/inter-fiber-debugger.md b/doc/rfc/inter-fiber-debugger.md
new file mode 100644
index 000000000..e4b64490c
--- /dev/null
+++ b/doc/rfc/inter-fiber-debugger.md
@@ -0,0 +1,204 @@
+# Inter-fiber Debugger for Tarantool
+* **Status**: In progress
+* **Start date**: 20-01-2021
+* **Authors**: Sergey Ostanevich @sergos sergos@tarantool.org,
+ Igor Munkin @imun imun@tarantool.org
+* **Discussion**: https://github.com/tarantool/tarantool/discussions/5857
+
+[TOC]
+
+### Rationale
+
+To make the Tarantool platform developer-friendly we should provide a set of
+basic developer tools. One such tool is a debugger. There are a number of
+debuggers available for Lua environments, but all of them lack a feature
+critical for the Tarantool platform: the debugger should not cause a full stop
+of the debugged program during the debug session.
+
+In this RFC I propose to overcome the problem with a solution that stops only
+the fiber being debugged. It will allow developers to debug their application
+while Tarantool keeps processing requests, performing replication and so on.
+
+### Approach
+
+To avoid reinventing debugger techniques we may borrow an existing Lua
+debugger and add rules for fiber use, data manipulation tweaks and so on.
+
+Every fiber can be considered either a 'debuggee' or a regular fiber, and it
+can switch from one state to the other. To control the status we can either
+patch the fiber machinery - which seems excessive, as fibers can serve pure C
+tasks - or tweak the breakpoint hook to employ the fiber yield. The fiber will
+then be in a state where it waits for commands from the debugger, and it sets
+the LuaJIT machinery hooks so that they are prepared for the next fiber to be
+scheduled.
+
+### Debug techniques
+
+Regular debuggers interrupt all threads at once, hence they don't distinguish
+in which thread a breakpoint was hit - they just stop execution. In our case
+we have to introduce some specifics so that the debugger aligns with the
+fiber-based nature of the server. Let's consider some techniques we can offer
+to the user.
+
+#### 1) Break first fiber met
+
+The user sets a breakpoint that triggers once, stopping the first fiber the
+break happens in. After the breakpoint is met, the fiber reports its status to
+the debugger server, puts itself in a wait state, clears the breakpoint and
+yields. As soon as the server issues a command, the debuggee will reset the
+breakpoint, handle the command and proceed with execution or yield again.
+
+#### 2) Regular breakpoint
+
+This mode starts the same way as the previous one, but keeps the breakpoint
+before the yield, so that the breakpoint can still trigger in another fiber.
+As the server may run a huge number of fibers during its operation, we have to
+set up a user-configurable limit on the number of fibers that can be debuggees
+at once. As soon as the limit is reached, the debuggee fiber starts to behave
+exactly as in the previous mode, clearing the breakpoint before the yield from
+the debuggee.
+
+#### 3) Run a function under debug session
+
+This is the most straightforward way to debug a function: perform a call
+through the debug interface. A new fiber will be created and a break will
+happen at the function entry. The limit of debuggee fibers should be increased
+and the fiber will behave similarly to the modes above.
+
+#### 4) Attach debugger to a fiber by ID
+
+Every fiber has a numerical ID, so the debugger can provide an interface to
+start debugging a particular fiber. The fiber will be put in a wait state as
+soon as it starts execution after the debugger is attached.
+
+### Basic mechanism
+
+The Tarantool side of the debugger will consist of two parts: a dedicated
+fiber named DebugSRV, which handles requests from the developer and keeps
+track of debuggee fibers and their breakpoints, and a Lua function DebugHook,
+set as a hook through the Lua debug library
+[https://www.lua.org/pil/23.html]. Users should not use this hook while
+debugging is in progress, to avoid interference. The external interface can be
+organized over an arbitrary protocol, be it a socket connection, console or
+even IPROTO (using IPROTO_CALL).
+
+A debuggee fiber will be controlled by a debug hook function named DebugHook.
+It is the responsibility of the DebugHook to set the debuggee fiber status,
+check whether a breakpoint is met, evaluate its condition including the ignore
+count, and update hit_count. As soon as a breakpoint is met, the DebugHook has
+to set the fiber state to 'pending' and wait for a command from the DebugSRV.
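+
+The per-breakpoint check can be sketched as below. This is only an
+illustration: the helper name `breakpoint_hit` is hypothetical, and the
+gdb-like ordering (condition first, then hit_count, then ignore_count) is an
+assumption, since the RFC does not fix the exact semantics. The meaning of the
+'l' type is assumed as well.
+
+```lua
+-- Hypothetical helper: decide whether a breakpoint stops the fiber.
+-- Assumed semantics: the condition is checked first, every satisfied
+-- crossing bumps hit_count, and the first ignore_count of such
+-- crossings do not stop the fiber.
+local function breakpoint_hit(bp)
+    if bp.condition ~= nil and not bp.condition() then
+        return false
+    end
+    bp.hit_count = bp.hit_count + 1
+    if bp.ignore_count > 0 then
+        bp.ignore_count = bp.ignore_count - 1
+        return false
+    end
+    return true
+end
+
+local bp = {
+    type = 'l', value = 42,  -- 'l' is assumed to denote a line breakpoint
+    condition = function() return true end,
+    hit_count = 0, ignore_count = 2,
+}
+local stops = {}
+for i = 1, 3 do
+    stops[i] = breakpoint_hit(bp)
+end
+-- Only the third crossing stops the fiber; bp.hit_count is now 3.
+```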
+
+Communication between DebugSRV and the debuggee fiber can be done via the
+fiber.channel mechanism, which simplifies the wait-for semantics.
+
+#### Data structure
+
+Every debuggee fiber is present in the corresponding table in the DebugSRV
+fiber. The table has the following format:
+
+```
+debuggees = {
+    max_debuggee = number,
+    preserved_hook = {
+        [1] = function,
+        [2] = type,
+        [3] = number
+    },
+    fibers = {
+        [<fiber_id>] = {
+            state = ['pending'|'operation'],
+            current_breakpoint = <breakpoint_id>,
+            channel = fiber.channel,
+            breakpoints = {
+                [<breakpoint_id>] = {
+                    type = ['l'|'c'|'r'|'i'],
+                    value = [number|string],
+                    condition = function,
+                    hit_count = number,
+                    ignore_count = number
+                }
+            }
+        }
+    },
+    global_breakpoints = {
+        [<breakpoint_id>] = {
+            type = ['l'|'c'|'r'|'i'],
+            value = [number|string],
+            condition = function,
+            hit_count = number,
+            ignore_count = number
+        }
+    }
+}
+```
+
+As DebugSRV receives commands, it updates the debuggees structure and forces a
+wakeup of the affected fiber so that it resets its hook state. The state of a
+debuggee is one of the following:
+
+- 'operation': the fiber is already in the debuggees list, but it yielded
+ without meeting any breakpoint
+- 'pending': DebugHook waits for a new command from the channel stored in the
+ debuggees.fibers entry for its own ID
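+
+A minimal sketch of how this structure could be created and populated in plain
+Lua. Field names follow the layout above; the function names `new_debuggees`
+and `add_debuggee` and the default limit of 1 are assumptions made for the
+example only.
+
+```lua
+-- Hypothetical constructor mirroring the layout described in the RFC.
+local function new_debuggees(max_debuggee)
+    return {
+        max_debuggee = max_debuggee or 1,  -- default is an assumption
+        preserved_hook = {},      -- to be filled from debug.gethook()
+        fibers = {},              -- keyed by fiber ID
+        global_breakpoints = {},  -- keyed by breakpoint ID
+    }
+end
+
+-- Register a fiber as a debuggee in the 'operation' state. The channel
+-- is supplied by the caller (a fiber.channel in Tarantool).
+local function add_debuggee(debuggees, fiber_id, channel)
+    local entry = {
+        state = 'operation',
+        current_breakpoint = nil,
+        channel = channel,
+        breakpoints = {},
+    }
+    debuggees.fibers[fiber_id] = entry
+    return entry
+end
+
+local debuggees = new_debuggees(3)
+local entry = add_debuggee(debuggees, 113, nil)
+entry.breakpoints[1] = {
+    type = 'l', value = 10, condition = nil,
+    hit_count = 0, ignore_count = 0,
+}
+```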
+
+
+#### DebugHook behavior
+
+For techniques 3) and 4) the fiber first appears in the debuggees.fibers list
+with its state set to 'operation' and with its list of breakpoints filled in.
+
+For techniques 1) and 2) there is a list of global_breakpoints that should be
+checked by every fiber.
+
+When a fiber receives control from the debug machinery, it should check
+whether it is present in ```debuggees.fibers[ID]```. If it is, it should check
+whether its current position meets any breakpoint from
+```debuggees.fibers[ID].breakpoints``` or ```debuggees.global_breakpoints```.
+If a breakpoint is met, the fiber sets its state to 'pending' and waits for a
+command from ```debuggees.fibers[ID].channel```.
+
+If a fiber is not present in ```debuggees.fibers[ID]```, it should check that
+the number of fiber entries in the debuggees structure is less than
+max_debuggee. If so, it checks whether it has met any of the
+```global_breakpoints``` and, on a match, puts itself into the fibers list,
+updating the array size [https://www.lua.org/pil/19.1.html]. It should also
+open a channel to the DebugSRV and put itself into the 'pending' state.
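+
+The decision flow described in this section can be sketched in plain Lua as
+follows. All function names are hypothetical, a breakpoint is assumed to match
+when its `value` equals the current position, and the channel factory is
+passed in so the sketch does not depend on the fiber module.
+
+```lua
+-- Count entries of a table with arbitrary keys; '#' is not reliable
+-- here because fiber IDs do not form a sequence.
+local function count(t)
+    local n = 0
+    for _ in pairs(t) do n = n + 1 end
+    return n
+end
+
+-- Hypothetical matcher: ID of a breakpoint whose 'value' equals the
+-- current position, or nil if there is none.
+local function find_breakpoint(breakpoints, position)
+    for id, bp in pairs(breakpoints) do
+        if bp.value == position then return id end
+    end
+    return nil
+end
+
+-- Returns true when the fiber has to enter the 'pending' state.
+local function check_fiber(debuggees, fiber_id, position, new_channel)
+    local entry = debuggees.fibers[fiber_id]
+    if entry ~= nil then
+        local id = find_breakpoint(entry.breakpoints, position)
+            or find_breakpoint(debuggees.global_breakpoints, position)
+        if id == nil then return false end
+        entry.state, entry.current_breakpoint = 'pending', id
+        return true
+    end
+    if count(debuggees.fibers) >= debuggees.max_debuggee then
+        return false
+    end
+    local id = find_breakpoint(debuggees.global_breakpoints, position)
+    if id == nil then return false end
+    debuggees.fibers[fiber_id] = {
+        state = 'pending', current_breakpoint = id,
+        channel = new_channel(), breakpoints = {},
+    }
+    return true
+end
+
+local debuggees = {
+    max_debuggee = 1, fibers = {},
+    global_breakpoints = { [1] = { type = 'l', value = 42 } },
+}
+-- Stand-in for fiber.channel, used only to keep the sketch standalone.
+local make_channel = function() return {} end
+local first = check_fiber(debuggees, 101, 42, make_channel)
+local second = check_fiber(debuggees, 102, 42, make_channel)
+-- first is true; second is false because the limit of 1 is reached.
+```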
+
+#### DebugSRV behavior
+
+DebugSRV handles the input from the user and supports the following list of
+commands (as mentioned, it can be used from any interface, so in the general
+case commands are function calls):
+
+- ```break_info([fiber ID])``` - list all breakpoints with counts and
+ conditions; if a fiber ID is given, limit the output to that fiber
+- ```break_cond(<breakpoint id>, <condition>)``` - set a condition for the
+ breakpoint; the condition should be Lua code evaluating to a boolean value
+- ```break_ignore(<breakpoint id>, <count>)``` - ignore the given number of
+ breakpoint hits
+- ```break_delete(<breakpoint id>)``` - remove a breakpoint
+- ```step(<fiber ID>)``` - continue execution, stepping into the call
+- ```step_over(<fiber ID>)``` - continue execution until the next source line,
+ skip calls
+- ```step_out(<fiber ID>)``` - continue execution until return from the current
+ function
+
+The functions above are common to many debuggers, with some tweaks to
+accommodate fibers. The functions below are more specific, so let's get into
+some details:
+
+- ```set_max_debuggee(number)``` - set the number of fibers that can be
+ debugged simultaneously. It modifies ```debuggees.max_debuggee``` so that new
+ fibers will respect the limit on debuggees. For example, if at some point of
+ debugging there are 5 debuggee fibers, the user can set this value to 3 - it
+ will not cause any problem, but a new fiber will not become a debuggee if it
+ meets some global breakpoint.
+- ```debug_eval(<fiber ID>, <code>)``` - evaluate the code in the context of
+ the debuggee fiber if it is in the 'pending' state. For example, the user can
+ issue ```debug_eval(113, function() return fiber.id() end)``` to receive 113
+ as the result
+- ```break(<breakpoint description>, [fiber ID])``` - add a new breakpoint to
+ the fiber's breakpoint list, or to the global list if no fiber ID is provided
+- ```debug_start()``` - start a debug session: create the debuggees structure,
+ preserve the current debug hook in ```debuggees.preserved_hook``` and set
+ DebugHook as the current hook
+- ```debug_stop()``` - quit the debug session: reset the debug hook and clear
+ the debuggees structure
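+
+The hook preservation done by ```debug_start()``` and ```debug_stop()``` can
+be sketched with the standard `debug.gethook` and `debug.sethook` functions.
+This is a sketch under assumptions: the empty DebugHook body, the 'l' (line
+events) mask and the default limit are placeholders, not part of the proposal.
+
+```lua
+local debuggees = nil
+
+-- Placeholder hook; the real one would perform the breakpoint checks.
+local function DebugHook(event, line)
+end
+
+local function debug_start()
+    -- debug.gethook() returns the current hook function, mask and count.
+    local func, mask, count = debug.gethook()
+    debuggees = {
+        max_debuggee = 1,  -- assumed default
+        preserved_hook = { func, mask, count },
+        fibers = {},
+        global_breakpoints = {},
+    }
+    debug.sethook(DebugHook, 'l')  -- the 'l' mask is an assumption
+end
+
+local function debug_stop()
+    local hook = debuggees.preserved_hook
+    if hook[1] ~= nil then
+        debug.sethook(hook[1], hook[2], hook[3])
+    else
+        debug.sethook()  -- no previous hook: just turn hooks off
+    end
+    debuggees = nil
+end
+
+debug_start()
+local active = debug.gethook() == DebugHook
+debug_stop()
+```
+
+Storing the three values returned by `debug.gethook` matches the three slots
+of ```debuggees.preserved_hook``` in the data structure.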
+
+
--
2.24.3 (Apple Git-128)