Tarantool discussions archive
From: Sergey Ostanevich via Tarantool-discussions <tarantool-discussions@dev.tarantool.org>
To: tarantool-discussions@dev.tarantool.org
Subject: [Tarantool-discussions] [RFC] describe an inter-fiber debugger
Date: Sat, 27 Feb 2021 17:57:06 +0300
Message-ID: <DBFFE3C7-F210-4531-B765-671476670B02@tarantool.org> (raw)


Subject: 
An RFC on bringing a debugger facility into Tarantool.

Part of #5857
---
doc/rfc/inter-fiber-debugger.md | 204 ++++++++++++++++++++++++++++++++
1 file changed, 204 insertions(+)
create mode 100644 doc/rfc/inter-fiber-debugger.md

diff --git a/doc/rfc/inter-fiber-debugger.md b/doc/rfc/inter-fiber-debugger.md
new file mode 100644
index 000000000..e4b64490c
--- /dev/null
+++ b/doc/rfc/inter-fiber-debugger.md
@@ -0,0 +1,204 @@
+# Inter-fiber Debugger for Tarantool
+* **Status**: In progress
+* **Start date**: 20-01-2021
+* **Authors**: Sergey Ostanevich @sergos sergos@tarantool.org,
+               Igor Munkin @imun imun@tarantool.org
+* **Discussion**: https://github.com/tarantool/tarantool/discussions/5857
+
+[TOC]
+
+### Rationale
+
+To make the Tarantool platform developer-friendly, we should provide a set of
+basic developer tools. One such tool is a debugger. A number of debuggers are
+available for Lua environments, but all of them lack a feature that is critical
+for the Tarantool platform: the debugger must not cause a full stop of the
+debugged program during a debug session.
+
+In this RFC I propose to overcome the problem with a solution that stops only
+the fiber being debugged. This allows developers to debug their application
+while Tarantool keeps processing requests, performing replication, and so on.
+
+### Approach
+
+To avoid reinventing debugger techniques, we may borrow an existing Lua
+debugger and add rules for fiber use, data manipulation tweaks, and so on.
+
+Every fiber can be considered either a 'debuggee' or a regular fiber, and it can
+switch from one state to the other. To control the state, we can either patch
+the fiber machinery, which seems excessive since fibers can also serve pure C
+tasks, or tweak the breakpoint hook to use fiber yield. The fiber then stays in
+a state where it waits for commands from the debugger, and it sets the LuaJIT
+hooks so that they are prepared for the next fiber to be scheduled.
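+
+The hook side of this approach can be sketched in plain Lua. This is an
+illustration under stated assumptions, not Tarantool code: it installs a line
+hook with `debug.sethook` and looks up a hypothetical `breakpoints` table keyed
+by `source:line`. A hook written in Lua cannot yield a plain Lua coroutine, so
+where a Tarantool fiber would yield and wait for the debugger, the sketch only
+records the stop.
+
+```lua
+-- Plain-Lua illustration (not Tarantool): a line hook that detects
+-- breakpoints. All names here are hypothetical.
+local load_chunk = loadstring or load   -- Lua 5.1 / LuaJIT vs Lua 5.2+
+
+-- Breakpoints keyed by "<source>:<line>".
+local breakpoints = { ["=app:3"] = true }
+local stops = {}
+
+local function debug_hook(event, line)
+  -- Inside a hook, level 2 is the function that triggered it.
+  local source = debug.getinfo(2, "S").source
+  local key = source .. ":" .. line
+  if breakpoints[key] then
+    -- A Tarantool fiber would report its state here and wait for a
+    -- command (e.g. on a fiber.channel); this sketch only records the stop.
+    stops[#stops + 1] = key
+  end
+end
+
+-- The "application" code, compiled under the chunk name "=app".
+local app = load_chunk([[
+local x = 1
+local y = 2
+return x + y
+]], "=app")
+
+debug.sethook(debug_hook, "l")
+local result = app()
+debug.sethook()   -- remove the hook
+```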
+
+### Debug techniques
+
+Regular debuggers interrupt all threads at once, so they don't distinguish
+which thread hit a breakpoint: they just stop execution. In our case we have to
+introduce some specifics so that the debugger aligns with the fiber-based
+nature of the server. Let's consider some techniques we can offer to the user.
+
+#### 1) Break first fiber met
+
+The user sets a breakpoint that triggers once, stopping the first fiber in which
+it is hit. After the breakpoint is hit, the fiber reports its status to the
+debugger server, puts itself into a wait state, clears the breakpoint, and
+yields. As soon as the server issues a command, the debuggee resets the
+breakpoint, handles the command, and either proceeds with execution or yields
+again.
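+
+The bookkeeping for this mode might look like the following sketch. Fibers are
+represented by plain numeric ids, and both `on_location` and the `one_shot`
+flag are hypothetical names; only the decision of which fiber stops is shown,
+not the waiting itself.
+
+```lua
+-- Hypothetical bookkeeping for a one-shot global breakpoint (technique 1).
+-- Fibers are represented by plain numeric ids.
+local global_breakpoints = { ["=app:10"] = { one_shot = true } }
+local stopped = {}
+
+-- Called by the hook when fiber `fid` reaches location `loc`.
+-- Returns true if the fiber has to stop.
+local function on_location(fid, loc)
+  local bp = global_breakpoints[loc]
+  if bp == nil then
+    return false
+  end
+  if bp.one_shot then
+    -- Clear the breakpoint before yielding, so no other fiber stops here.
+    global_breakpoints[loc] = nil
+  end
+  stopped[#stopped + 1] = fid
+  return true
+end
+
+-- Three fibers reach the same line; only the first one stops.
+local r1 = on_location(101, "=app:10")
+local r2 = on_location(102, "=app:10")
+local r3 = on_location(103, "=app:10")
+```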
+
+#### 2) Regular breakpoint
+
+This mode starts the same way as the previous one, but the breakpoint is kept
+before the yield, so that it can still trigger in another fiber. Since the
+server may run a huge number of fibers, we have to set a user-configurable
+limit on the number of fibers that can be debuggees at once. As soon as the
+limit is reached, the debuggee fiber behaves exactly as in the previous mode,
+clearing the breakpoint before it yields.
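+
+A possible sketch of the limit handling, again with fibers reduced to numeric
+ids. The `count` field is an assumed helper that is not part of the debuggees
+structure described in the Data structure section; it tracks how many debuggees
+currently exist.
+
+```lua
+-- Hypothetical bookkeeping for a regular breakpoint with a limit on
+-- debuggees (technique 2). `count` is an assumed helper field.
+local debuggees = { max_debuggee = 2, count = 0, fibers = {} }
+local global_breakpoints = { ["=app:10"] = {} }
+
+local function on_location(fid, loc)
+  if global_breakpoints[loc] == nil then
+    return false
+  end
+  if debuggees.fibers[fid] == nil then
+    -- The limit may have been lowered meanwhile (set_max_debuggee).
+    if debuggees.count >= debuggees.max_debuggee then
+      return false
+    end
+    debuggees.fibers[fid] = { state = "pending" }
+    debuggees.count = debuggees.count + 1
+  end
+  if debuggees.count >= debuggees.max_debuggee then
+    -- Limit reached: behave like technique 1 and clear the breakpoint
+    -- before yielding.
+    global_breakpoints[loc] = nil
+  end
+  return true
+end
+
+local r1 = on_location(1, "=app:10")   -- stops, breakpoint kept
+local r2 = on_location(2, "=app:10")   -- stops, limit reached, cleared
+local r3 = on_location(3, "=app:10")   -- no breakpoint any more
+```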
+
+#### 3) Run a function under debug session
+
+This is the most straightforward way to debug a function: perform a call
+through the debug interface. A new fiber is created, and it breaks at the
+function entrance. The limit on debuggee fibers should be increased
+accordingly, and the fiber behaves similarly to the modes above.
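+
+In plain Lua, running a function under the debugger can be approximated by
+starting it in a fresh coroutine (standing in for the new fiber) with a call
+hook that notices when the function is entered. `debug_call` is a hypothetical
+name, and a Tarantool implementation would create a fiber and park it at this
+point instead of just setting a flag.
+
+```lua
+-- Plain-Lua illustration of "run a function under debug". The coroutine
+-- stands in for the new fiber; debug_call is a hypothetical name.
+local function debug_call(fn, ...)
+  local entered = false
+  local co = coroutine.create(fn)
+  debug.sethook(co, function()
+    -- Inside a call hook, level 2 is the function being called.
+    local info = debug.getinfo(2, "f")
+    if not entered and info ~= nil and info.func == fn then
+      -- A Tarantool fiber would park here and wait for commands.
+      entered = true
+    end
+  end, "c")
+  local results = { coroutine.resume(co, ...) }
+  debug.sethook(co)   -- remove the hook
+  return entered, results
+end
+
+local entered, results = debug_call(function(a, b) return a + b end, 2, 3)
+```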
+
+#### 4) Attach debugger to a fiber by ID
+
+Every fiber has a numeric ID, so the debugger can provide an interface to start
+debugging a particular fiber. The fiber is put into a wait state as soon as it
+starts execution after the debugger is attached.
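+
+Attaching by ID can be illustrated with one hook that is installed on every
+coroutine (the stand-ins for fibers) but acts only for IDs listed in a
+hypothetical `attached` table. Mapping the running coroutine to an ID via
+`coroutine.running()` stands in for `fiber.id()`; the stop itself is only
+recorded.
+
+```lua
+-- Plain-Lua illustration of attaching the debugger to one fiber by ID.
+-- attached, ids, spawn and work are hypothetical names.
+local attached = {}   -- fiber id -> true
+local ids = {}        -- coroutine -> fiber id
+local stops = {}
+
+local function hook()
+  local id = ids[coroutine.running()]
+  if id ~= nil and attached[id] then
+    attached[id] = nil          -- stop once, right after attaching
+    stops[#stops + 1] = id      -- a Tarantool fiber would wait here
+  end
+end
+
+local function spawn(id, fn)
+  local co = coroutine.create(fn)
+  ids[co] = id
+  debug.sethook(co, hook, "l")
+  return co
+end
+
+local function work()
+  local sum = 0
+  for i = 1, 3 do
+    sum = sum + i
+  end
+  return sum
+end
+
+local a, b = spawn(1, work), spawn(2, work)
+attached[2] = true              -- attach the debugger to fiber 2 only
+local _, sum_a = coroutine.resume(a)
+local _, sum_b = coroutine.resume(b)
+debug.sethook(a)                -- remove the hooks
+debug.sethook(b)
+```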
+
+### Basic mechanism
+
+The Tarantool side of the debugger will consist of a dedicated fiber named
+DebugSRV, which handles requests from the developer and keeps track of debuggee
+fibers and their breakpoints, and a Lua function DebugHook, which is set as a
+hook in the Lua debug library [https://www.lua.org/pil/23.html]. Users should
+not set their own hooks during a debug session to avoid interference. The
+external interface can be organized over an arbitrary protocol, be it a socket
+connection, the console, or even IPROTO (using IPROTO_CALL).
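+
+Since the transport is left open, one way to keep DebugSRV independent of it is
+to route every request through a single dispatch function that looks up a
+command handler by name. The sketch below is an assumption, not a settled
+design: `dispatch`, the `commands` table, and the explicit `state` argument are
+hypothetical, and only `break_delete` is filled in.
+
+```lua
+-- Hypothetical sketch: one transport-independent entry point for DebugSRV.
+local commands = {}
+
+-- Example handler; the command name follows the RFC, the body is assumed.
+function commands.break_delete(state, bp_id)
+  if state.global_breakpoints[bp_id] == nil then
+    return nil, "no such breakpoint"
+  end
+  state.global_breakpoints[bp_id] = nil
+  return true
+end
+
+-- Console, socket and IPROTO_CALL front ends would all call this.
+local function dispatch(state, name, ...)
+  local handler = commands[name]
+  if handler == nil then
+    return nil, "unknown command: " .. tostring(name)
+  end
+  local ok, res, err = pcall(handler, state, ...)
+  if not ok then
+    return nil, res   -- the handler raised an error; res is the message
+  end
+  return res, err
+end
+
+local state = { global_breakpoints = { [1] = { type = "l", value = 10 } } }
+```
+
+A console, a socket handler, or a function exposed for IPROTO_CALL could then
+all call `dispatch` with the same arguments.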
+
+A debuggee fiber is controlled by the debug hook function DebugHook. DebugHook
+is responsible for setting the debuggee fiber's state, checking whether a
+breakpoint is hit, evaluating its condition (including the ignore count), and
+updating hit_count. As soon as a breakpoint is hit, DebugHook sets the fiber's
+state to 'pending' and waits for a command from DebugSRV.
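+
+A per-breakpoint check could be written as below. The order of evaluation
+(condition first, then counting the hit, then applying the ignore count) is one
+possible choice that the RFC does not prescribe, and `breakpoint_hit` is a
+hypothetical name; the fields follow the breakpoint entries of the debuggees
+structure.
+
+```lua
+-- Hypothetical sketch of the per-breakpoint check inside DebugHook.
+-- Order used: condition, then hit_count, then ignore_count.
+local function breakpoint_hit(bp)
+  if bp.condition ~= nil and not bp.condition() then
+    return false                  -- condition is false: not a hit
+  end
+  bp.hit_count = bp.hit_count + 1
+  -- The first ignore_count hits are skipped.
+  return bp.hit_count > bp.ignore_count
+end
+
+local bp = { type = "l", value = 10, hit_count = 0, ignore_count = 2 }
+local results = {}
+for i = 1, 4 do
+  results[i] = breakpoint_hit(bp)
+end
+```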
+
+Communication between DebugSRV and a debuggee fiber can be done via the
+fiber.channel mechanism, which simplifies the wait-for semantics.
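+
+The wait-for semantics can be approximated in plain Lua with a coroutine:
+`coroutine.yield()` plays the role of `channel:get()` in the debuggee, and
+`coroutine.resume()` plays the role of DebugSRV putting a command into the
+channel. This is only a stand-in; with a real `fiber.channel` the waiting fiber
+is suspended by the scheduler while other fibers keep running. The command
+names are hypothetical.
+
+```lua
+-- Plain-Lua stand-in for the fiber.channel handshake between DebugSRV
+-- and a debuggee. Command names are hypothetical.
+local function debuggee(report)
+  report("pending")               -- breakpoint hit, waiting for commands
+  while true do
+    local cmd = coroutine.yield() -- stands in for channel:get()
+    if cmd == "continue" then
+      break
+    end
+    report("handled " .. cmd)
+  end
+  report("operation")             -- back to normal execution
+end
+
+local log = {}
+local co = coroutine.create(debuggee)
+coroutine.resume(co, function(s) log[#log + 1] = s end)
+coroutine.resume(co, "step")       -- DebugSRV sends a command
+coroutine.resume(co, "continue")   -- DebugSRV lets the fiber go
+```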
+
+#### Data structure
+
+Every debuggee fiber has an entry in the corresponding table kept by the
+DebugSRV fiber. The table has the following format:
+
+```
+debuggees = {
+    max_debuggee = number,
+    preserved_hook = {
+        [1] = function,
+        [2] = type,
+        [3] = number
+    },
+    fibers = {
+        [<fiber_id>] = {
+            state = ['pending'|'operation'],
+            current_breakpoint = <breakpoint_id>,
+            channel = fiber.channel,
+            breakpoints = {
+                [<breakpoint_id>] = {
+                    type = ['l'|'c'|'r'|'i'],
+                    value = [number|string],
+                    condition = function,
+                    hit_count = number,
+                    ignore_count = number
+                }
+            }
+        }
+    },
+    global_breakpoints = {
+        [<breakpoint_id>] = {
+            type = ['l'|'c'|'r'|'i'],
+            value = [number|string],
+            condition = function,
+            hit_count = number,
+            ignore_count = number
+        }
+    }
+}
+```
+As DebugSRV receives commands, it updates the debuggees structure and forces
+the affected fiber to wake up and reset its hook state. The state of a debuggee
+is one of the following:
+
+- 'operation': the fiber is already in the debuggees list, but it yielded
+  without hitting any breakpoint
+- 'pending': DebugHook waits for a new command from the channel stored in
+  debuggees.fibers under the fiber's own ID
+
+
+#### DebugHook behavior
+
+For techniques 3) and 4), the fiber appears in debuggees.fibers first, with its
+state set to 'operation' and its list of breakpoints set.
+
+For techniques 1) and 2), there is a list of global_breakpoints that every
+fiber should check.
+
+When a fiber receives control from the debug machinery, it should check whether
+it is present in ```debuggees.fibers[ID]```. If it is, it should check whether
+its current position matches any breakpoint from
+```debuggees.fibers[ID].breakpoints``` or ```debuggees.global_breakpoints```. If
+a breakpoint is hit, the fiber sets its state to 'pending' and waits for a
+command from ```debuggees.fibers[ID].channel```.
+
+If a fiber is not present in ```debuggees.fibers[ID]```, it should check that
+the number of fiber entries in the debuggees structure is less than
+max_debuggee. If so, it checks whether it has hit any of the
+```global_breakpoints```; if it has, it puts itself into the fibers list,
+updating the array size [https://www.lua.org/pil/19.1.html]. It should also
+open a channel to DebugSRV and put itself into the 'pending' state.
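+
+The checks above might be combined as in the following sketch, which returns
+the resulting state instead of actually waiting on a channel. `on_hook`,
+`matches`, and the `count` field are hypothetical, and breakpoints are matched
+on `value` holding a `source:line` string purely for illustration.
+
+```lua
+-- Hypothetical sketch of DebugHook's decision logic; it returns the new
+-- state instead of waiting on a channel.
+local function matches(bps, loc)
+  -- Returns the id of a breakpoint whose value equals `loc`, if any.
+  for id, bp in pairs(bps) do
+    if bp.value == loc then
+      return id
+    end
+  end
+  return nil
+end
+
+local function on_hook(debuggees, fid, loc)
+  local f = debuggees.fibers[fid]
+  if f ~= nil then
+    -- Known debuggee: check its own breakpoints, then the global ones.
+    local id = matches(f.breakpoints, loc)
+      or matches(debuggees.global_breakpoints, loc)
+    if id ~= nil then
+      f.state = "pending"
+      f.current_breakpoint = id
+    end
+    return f.state
+  end
+  -- Not a debuggee yet: join only below the limit and on a global hit.
+  if debuggees.count >= debuggees.max_debuggee then
+    return nil
+  end
+  local id = matches(debuggees.global_breakpoints, loc)
+  if id == nil then
+    return nil
+  end
+  -- `#` is unreliable on a table keyed by sparse fiber ids, hence `count`.
+  debuggees.fibers[fid] = {
+    state = "pending", current_breakpoint = id, breakpoints = {},
+  }
+  debuggees.count = debuggees.count + 1
+  return "pending"
+end
+```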
+
+#### DebugSRV behavior
+
+DebugSRV handles input from the user and supports the following commands (as
+mentioned, it can be driven from any interface, so in the general case commands
+are function calls):
+
+- ```break_info([fiber ID])``` - lists all breakpoints with their counts and
+  conditions; if a fiber ID is given, limits the output to that fiber
+- ```break_cond(<breakpoint id>, <condition>)``` - sets a condition for the
+  breakpoint; the condition should be Lua code that evaluates to a boolean value
+- ```break_ignore(<breakpoint id>, <count>)``` - ignores the given number of
+  hits of the breakpoint
+- ```break_delete(<breakpoint id>)``` - removes a breakpoint
+- ```step(<fiber ID>)``` - continues execution, stepping into calls
+- ```step_over(<fiber ID>)``` - continues execution until the next source line,
+  stepping over calls
+- ```step_out(<fiber ID>)``` - continues execution until the current function
+  returns
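+
+As an illustration of the bookkeeping behind `break_cond` and `break_ignore`,
+here is a sketch over a flat, hypothetical `breakpoints` table. The condition
+string is compiled with `load` (`loadstring` on Lua 5.1/LuaJIT) and here only
+sees global variables; giving it access to the debuggee's locals (for example
+via `debug.getlocal`) is not shown.
+
+```lua
+-- Hypothetical sketch of break_cond and break_ignore.
+local load_chunk = loadstring or load   -- Lua 5.1 / LuaJIT vs Lua 5.2+
+local breakpoints = {
+  [1] = { type = "l", value = 10, hit_count = 0, ignore_count = 0 },
+}
+
+local function break_cond(bp_id, code)
+  local bp = breakpoints[bp_id]
+  if bp == nil then
+    return nil, "no such breakpoint"
+  end
+  -- Compile "<code>" as an expression; syntax errors are reported back.
+  local fn, err = load_chunk("return " .. code, "=condition")
+  if fn == nil then
+    return nil, err
+  end
+  bp.condition = function() return fn() == true end
+  return true
+end
+
+local function break_ignore(bp_id, count)
+  local bp = breakpoints[bp_id]
+  if bp == nil then
+    return nil, "no such breakpoint"
+  end
+  bp.ignore_count = count
+  return true
+end
+```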
+
+The functions above are common to many debuggers, with some tweaks to adapt
+them to fibers. The functions below are more specific, so let's go into some
+detail:
+
+- ```set_max_debuggee(number)``` - sets the number of fibers that can be
+  debugged simultaneously. It modifies ```debuggees.max_debuggee```, so that new
+  fibers respect the limit on debuggees. For example, if at some point of
+  debugging there are 5 debuggee fibers, the user can set this value to 3: it
+  causes no problem, it just means a new fiber will not become a debuggee when
+  it hits a global breakpoint.
+- ```debug_eval(<fiber ID>, <code>)``` - evaluates the code in the context of
+  the debuggee fiber if it is in the 'pending' state. For example, the user can
+  issue ```debug_eval(113, function() return fiber.id() end)``` and receive 113
+  as the result.
+- ```break(<breakpoint description>, [fiber ID])``` - adds a new breakpoint to
+  the fiber's breakpoint list, or to the global list if no fiber ID is provided
+- ```debug_start()``` - starts a debug session: creates the debuggees structure,
+  preserves the current debug hook in ```debuggees.preserved_hook```, and sets
+  DebugHook as the current hook
+- ```debug_stop()``` - quits the debug session: resets the debug hook and
+  clears the debuggees structure
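+
+`debug_start()` and `debug_stop()` could preserve and restore the user's hook
+using the values returned by `debug.gethook()`, which line up with the three
+fields of `preserved_hook`. This is a sketch under that assumption; `DebugHook`
+here is an empty placeholder and the `max_debuggee` argument is hypothetical.
+
+```lua
+-- Hypothetical sketch of debug_start()/debug_stop(): the user's hook is
+-- saved in the form returned by debug.gethook() and restored on stop.
+local debuggees = nil
+
+local function DebugHook(event, line)
+  -- Breakpoint checks would go here.
+end
+
+local function debug_start(max_debuggee)
+  local hook, mask, count = debug.gethook()
+  debuggees = {
+    max_debuggee = max_debuggee or 1,
+    preserved_hook = { hook, mask, count },
+    fibers = {},
+    global_breakpoints = {},
+  }
+  debug.sethook(DebugHook, "l")
+end
+
+local function debug_stop()
+  local saved = debuggees.preserved_hook
+  -- debug.gethook() returns nil when no hook is set and a string for hooks
+  -- installed from C; only Lua functions can be restored here.
+  if type(saved[1]) == "function" then
+    debug.sethook(saved[1], saved[2], saved[3] or 0)
+  else
+    debug.sethook()
+  end
+  debuggees = nil
+end
+```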
+
+
--
2.24.3 (Apple Git-128)


Thread overview: 2+ messages
2021-02-27 14:57 Sergey Ostanevich via Tarantool-discussions [this message]
2021-03-25 10:07 ` Igor Munkin via Tarantool-discussions
