>> Instead, I propose a simple one line change. Could you please describe >> the reproduction of the probable problem step by step? And what’s the problem with the current Leonid’s solution after all? > On 11 Dec 2020, at 15:51, Nikita Pettik wrote: > > On 11 Dec 02:08, Leonid Vasiliev wrote: >> Hi! Thank you for review. >> >> On 10.12.2020 19:39, Sergey Ostanevich wrote: >>> Thanks for the patch! >>> >>> I tend to the 1st alternative, although the code using the name generated >>> is hairy. I believe the same resolution as for the first part: if we’re >>> in a rush - LGTM, better solution is desirable otherwise. >> >> As I wrote in the commit message, I think using `O_TMPFILE` or >> `tmpfile()` is the best solution to work with temporary files. >> But we already have the code in which the filename is passed from one >> function to another, and some logic over all this. So, replacing the use of >> a named file with a real temporary file will lead to refactoring, >> changing some logic and increasing the number of differences with >> SQLite. This can lead to degradation and complication of the review. > > Anyway, there's already huge difference between our codebases, so > don't worry about incompatibility between SQLite and Tarantool. > If you don't want fix this problem now, could you please open issue > and describe problem there (copy-paste alternatives you suggested > in the previous letter)? > >> Instead, I propose a simple one line change. Could you please describe >> the reproduction of the probable problem step by step? >> >>> Sergos >>> >>> >>>> On 8 Dec 2020, at 22:59, Leonid Vasiliev wrote: >>>> >>>> The bug was consisted in fail when working with temporary files >>>> created by VDBE to sort large result of a `SELECT` statement with >>>> `ORDER BY`, `GROUP BY` clauses. >>>> >>>> Whats happen (step by step): >>>> - We have two instances on one node (sharded cluster). >>>> - A query is created that executes on both. >>>> - The first instance creates the name of the temporary file and >>>> checks a file with such name on existence. >>>> - The second instance creates the name of the temporary file >>>> (the same as in first instance) and checks a file with such name >>>> on existence. >>>> - The first instance creates a file with the `SQL_OPEN_DELETEONCLOSE` >>>> flag. >>>> - The second instance opens(try to open) the same file. >>>> - The first instance closes (and removes) the temporary file. >>>> - The second instance tries to work with the file and fails. >>>> >>>> Why did it happen: >>>> The temporary file name format has a random part, but the random >>>> generator uses a fixed seed. >>>> >>>> When it was decided to use a fixed seed: >>>> 32cb1ad298b2b55d8536a85bdfb3827c8c8739e1 >>>> >>>> How the patch fixes the problem: >>>> The patch injects the PID in the temporary file name format. >>>> The generated name is unique for a single process (due to a random part) >>>> and unique between processes (due to the PID part). >>>> >>>> Alternatives: >>>> 1) Use `O_TMPFILE` or `tmpfile()` (IMHO the best way to work with >>>> temporary files). In both cases, we need to update a significant >>>> part of the code, and some degradation can be added. It's hard to >>>> review. >>>> 2) Return a random seed for the generator. As far as I understand, >>>> we want to have good reproducible system behavior, in which case >>>> it's good to use a fixed seed. >>>> 3) Add reopening file with the flags `O_CREAT | O_EXCL` until we >>>> win the fight. Now we set such flags when opening a temporary file, >>>> but after that we try to open the file in `READONLY` mode and >>>> if ok - return the descriptor. This is strange logic for me and I >>>> don't want to add any aditional logic here. Also, such solution will >>>> add additional attempts to open the file. >>>> >>>> So, it look like such minimal changes will work fine and are simple >>>> to review. >>>> >>>> Co-authored-by: Mergen Imeev >>>> >>>> Fixes #5537 >>>> --- >>>> src/box/sql/os_unix.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/src/box/sql/os_unix.c b/src/box/sql/os_unix.c >>>> index 557d709..ce415cb 100644 >>>> --- a/src/box/sql/os_unix.c >>>> +++ b/src/box/sql/os_unix.c >>>> @@ -1483,8 +1483,8 @@ unixGetTempname(int nBuf, char *zBuf) >>>> assert(nBuf > 2); >>>> zBuf[nBuf - 2] = 0; >>>> sql_snprintf(nBuf, zBuf, >>>> - "%s/" SQL_TEMP_FILE_PREFIX "%llx%c", zDir, >>>> - r, 0); >>>> + "%s/" SQL_TEMP_FILE_PREFIX "%ld_%llx%c", zDir, >>>> + (long)randomnessPid, r, 0); >>>> if (zBuf[nBuf - 2] != 0 || (iLimit++) > 10) >>>> return -1; >>>> } while (access(zBuf, 0) == 0); >>>> -- >>>> 2.7.4