mingw: reencode environment variables on the fly (UTF-16 <-> UTF-8)
On Windows, the authoritative environment is encoded in UTF-16. In Git
for Windows, we convert that to UTF-8 (because UTF-16 is *such* a
foreign idea to Git that its source code is unprepared for it).
Previously, out of performance concerns, we converted the entire
environment to UTF-8 in one fell swoop at the beginning, and upon
putenv() and run_command() converted it back.
Having a private copy of the environment comes with its own perils: when
a library used by Git's source code tries to modify the environment, it
does not really work (in Git for Windows' case, libcurl, see
https://github.com/git-for-windows/git/compare/bcad1e6d58^...bcad1e6d58^2
for a glimpse of the issues).
Hence, it makes our environment handling substantially more robust if we
switch to on-the-fly-conversion in `getenv()`/`putenv()` calls. Based
on an initial version in the MSVC context by Jeff Hostetler, this patch
makes it so.
Surprisingly, this has a *positive* effect on speed: at the time when
the current code was written, we tested the performance, and there were
*so many* `getenv()` calls that it seemed better to convert everything
in one go. In the meantime, though, Git has obviously been cleaned up a
bit with regards to `getenv()` calls so that the Git processes spawned
by the test suite use an average of only 40 `getenv()`/`putenv()` calls
over the process lifetime.
Speaking of the entire test suite: the total time spent in the
re-encoding in the current code takes about 32.4 seconds (out of 113
minutes runtime), whereas the code introduced in this patch takes only
about 8.2 seconds in total. Not much, but it proves that we need not be
concerned about the performance impact introduced by this patch.
Helped-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff --git a/compat/mingw.h b/compat/mingw.h
index 571019d..be6f3a0 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -257,11 +257,35 @@ char *mingw_mktemp(char *template);
char *mingw_getcwd(char *pointer, int len);
#define getcwd mingw_getcwd
+#ifdef NO_UNSETENV
+#error "NO_UNSETENV is incompatible with the Windows-specific startup code!"
+#endif
+
+/*
+ * We bind *env() routines (even the mingw_ ones) to private mingw_ versions.
+ * These talk to the CRT using UNICODE/wchar_t, but maintain the original
+ * narrow-char API.
+ *
+ * Note that the MSCRT maintains both ANSI (getenv()) and UNICODE (_wgetenv())
+ * routines and stores both versions of each environment variable in parallel
+ * (and secretly updates both when you set one or the other), but it uses CP_ACP
+ * to do the conversion rather than CP_UTF8.
+ *
+ * Since everything in the git code base is UTF8, we define the mingw_ routines
+ * to access the CRT using the UNICODE routines and manually convert them to
+ * UTF8. This also avoids round-trip problems.
+ *
+ * This also helps with our linkage, since "_wenviron" is publicly exported
+ * from the CRT. But to access "_environ" we would have to statically link
+ * to the CRT (/MT).
+ *
+ * We require NO_SETENV (and let gitsetenv() call our mingw_putenv).
+ */
+#define getenv mingw_getenv
+#define putenv mingw_putenv
+#define unsetenv mingw_putenv
char *mingw_getenv(const char *name);
-#define getenv mingw_getenv
-int mingw_putenv(const char *namevalue);
-#define putenv mingw_putenv
-#define unsetenv mingw_putenv
+int mingw_putenv(const char *name);
int mingw_gethostname(char *host, int namelen);
#define gethostname mingw_gethostname