utf8: handle systems that don't write BOM for UTF-16

When serializing UTF-16 (and UTF-32), there are three possible ways to
write the stream. One can write the data with a BOM in either big-endian
or little-endian format, or one can write the data without a BOM in
big-endian format.
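To make the three serializations concrete, here is a minimal C sketch that encodes UTF-16 code units with or without a BOM in either byte order. It assumes BMP-only input (no surrogate pairs), and the helpers `put_unit` and `serialize_utf16` are hypothetical illustrations, not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one 16-bit code unit in the given byte order. */
static size_t put_unit(unsigned char *out, uint16_t u, int big_endian)
{
	unsigned char hi = (unsigned char)(u >> 8);
	unsigned char lo = (unsigned char)(u & 0xFF);

	out[0] = big_endian ? hi : lo;
	out[1] = big_endian ? lo : hi;
	return 2;
}

/*
 * Hypothetical sketch, not Git code: serialize BMP-only code units.
 * If with_bom is set, U+FEFF is written first in the same byte order.
 * out must hold at least 2 * (n + 1) bytes; returns bytes written.
 */
static size_t serialize_utf16(const uint16_t *units, size_t n,
			      int with_bom, int big_endian,
			      unsigned char *out)
{
	size_t len = 0;
	size_t i;

	assert(out != NULL);
	if (with_bom)
		len += put_unit(out + len, 0xFEFF, big_endian);
	for (i = 0; i < n; i++)
		len += put_unit(out + len, units[i], big_endian);
	return len;
}
```

For the single code unit 0x0061 ("a"), the three cases produce FE FF 00 61 (BOM, big-endian), FF FE 61 00 (BOM, little-endian) and 00 61 (no BOM, big-endian).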

Most systems' iconv implementations choose to write it with a BOM in
some endianness, since this is the most foolproof approach and is
resistant to misinterpretation on Windows, where UTF-16 and the
little-endian serialization are very common. For compatibility with
Windows and to avoid accidental misuse there, Git always wants to write
UTF-16 with a BOM, and will refuse to read UTF-16 without it.
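The reading rule above can be sketched as follows, assuming that "UTF-16 without a BOM" means data declared with the generic name "UTF-16" (or "UTF-32") that starts with no BOM in either byte order. The function `missing_required_bom` is a hypothetical name; Git's own logic in utf8.c may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: does buf begin with the given byte sequence? */
static int starts_with_bytes(const unsigned char *buf, size_t len,
			     const unsigned char *prefix, size_t plen)
{
	return len >= plen && !memcmp(buf, prefix, plen);
}

/*
 * Hypothetical sketch, not Git's code: return 1 if enc is the generic
 * "UTF-16"/"UTF-32" name and the data lacks a BOM in either byte order.
 * Explicit names such as "UTF-16LE" carry their own byte order, so no
 * BOM is required for them and 0 is returned.
 */
static int missing_required_bom(const char *enc,
				const unsigned char *data, size_t len)
{
	static const unsigned char be16[] = { 0xFE, 0xFF };
	static const unsigned char le16[] = { 0xFF, 0xFE };
	static const unsigned char be32[] = { 0x00, 0x00, 0xFE, 0xFF };
	static const unsigned char le32[] = { 0xFF, 0xFE, 0x00, 0x00 };

	assert(enc != NULL);
	if (!strcasecmp(enc, "UTF-16"))
		return !starts_with_bytes(data, len, be16, sizeof(be16)) &&
		       !starts_with_bytes(data, len, le16, sizeof(le16));
	if (!strcasecmp(enc, "UTF-32"))
		return !starts_with_bytes(data, len, be32, sizeof(be32)) &&
		       !starts_with_bytes(data, len, le32, sizeof(le32));
	return 0;
}
```

Under this sketch, the bytes 00 61 declared as "UTF-16" would be rejected, while the same bytes declared as "UTF-16BE" would be accepted.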

However, musl's iconv implementation writes UTF-16 without a BOM,
relying on the user to interpret it as big-endian. This causes t0028 and
the related functionality to fail, since Git won't read the file without
a BOM.

Add a Makefile and #define knob, ICONV_OMITS_BOM, that can be set if the
iconv implementation has this behavior. When set, Git will write a BOM
manually for UTF-16 and UTF-32 and then force the data to be written in
UTF-16BE or UTF-32BE. We choose big-endian behavior here because the
tests use the raw "UTF-16" encoding, which will be big-endian when the
implementation requires this knob to be set.
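The write path described above can be sketched like this. For the sake of a testable example, the compile-time ICONV_OMITS_BOM knob is modeled as a runtime flag `omits_bom`; `struct bom_plan` and `plan_output` are hypothetical names, and the actual iconv conversion is left out so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical result of adjusting an output encoding. */
struct bom_plan {
	const char *out_encoding;  /* name that would be passed to iconv */
	const unsigned char *bom;  /* bytes to write before the data */
	size_t bom_len;            /* 0 if no manual BOM is needed */
};

static const unsigned char utf16_be_bom[] = { 0xFE, 0xFF };
static const unsigned char utf32_be_bom[] = { 0x00, 0x00, 0xFE, 0xFF };

/*
 * Sketch of the knob's effect: for the generic names, write the BOM by
 * hand and force big-endian output; otherwise leave the encoding alone
 * so iconv keeps its default (possibly native-endian) behavior.
 */
static struct bom_plan plan_output(const char *out_encoding, int omits_bom)
{
	struct bom_plan p = { out_encoding, NULL, 0 };

	assert(out_encoding != NULL);
	if (!omits_bom)
		return p;
	if (!strcasecmp(out_encoding, "UTF-16")) {
		p.out_encoding = "UTF-16BE";
		p.bom = utf16_be_bom;
		p.bom_len = sizeof(utf16_be_bom);
	} else if (!strcasecmp(out_encoding, "UTF-32")) {
		p.out_encoding = "UTF-32BE";
		p.bom = utf32_be_bom;
		p.bom_len = sizeof(utf32_be_bom);
	}
	return p;
}
```

A caller would write `p.bom_len` bytes from `p.bom`, then append the output of converting to `p.out_encoding`. Because both the manual BOM and the forced encoding are big-endian, the result is a valid BOM-prefixed UTF-16 or UTF-32 stream.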

Update the tests to detect this case and write test data with an added
BOM if necessary. Always write the BOM in the tests in big-endian
format, since all iconv implementations that omit a BOM must use
big-endian serialization according to the Unicode standard.
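The real tests are shell scripts, but the idea behind the test-side change can be sketched in C. The function `ensure_be_bom` is hypothetical. It assumes that BOM-less UTF-16 input is big-endian, as the Unicode standard specifies, so prepending FE FF is always correct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: return a newly allocated copy of a UTF-16 buffer
 * that is guaranteed to start with a BOM. Input that already has a BOM
 * (either byte order) is copied unchanged; otherwise the big-endian BOM
 * FE FF is prepended. *out_len receives the new length; the caller must
 * free() the result. Returns NULL on allocation failure.
 */
static unsigned char *ensure_be_bom(const unsigned char *data, size_t len,
				    size_t *out_len)
{
	int has_bom = len >= 2 &&
		((data[0] == 0xFE && data[1] == 0xFF) ||
		 (data[0] == 0xFF && data[1] == 0xFE));
	size_t extra = has_bom ? 0 : 2;
	unsigned char *buf;

	assert(out_len != NULL);
	buf = malloc(len + extra);
	if (!buf)
		return NULL;
	if (!has_bom) {
		buf[0] = 0xFE;
		buf[1] = 0xFF;
	}
	if (len)
		memcpy(buf + extra, data, len);
	*out_len = len + extra;
	return buf;
}
```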

Preserve the existing behavior for systems which do not have this knob
enabled, since they may use optimized implementations, including
defaulting to the native endianness, which may improve performance.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
3 files changed
tree: a83eddab899e28c0823b5a879d70bc1e0782650a
README.md

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git-related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of “get” may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • “global information tracker”: you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • “goddamn idiotic truckload of sh*t”: when it breaks