From: Bengt Richter
To: quiliro@riseup.net
Cc: 37363@debbugs.gnu.org
Subject: Re: bug#37363: emacs and other programs do not display special characters
Date: Tue, 10 Sep 2019 19:15:03 -0700
Message-ID: <20190911021503.GA1118@PhantoNv4ArchGx.localdomain>
In-Reply-To: <5b7a274019a6659a19b1193a154fd6c4.squirrel@sm.riseup.net>

On +2019-09-09 19:13:08 -0500, quiliro@riseup.net wrote:
> As per nckx's question on IRC, this is the output to locale on both Emacs
> shell and BASh:
>
> quiliro@GSD3 ~/magit/prueba0$ locale
> LANG=es_EC.UTF-8
> LC_CTYPE="es_EC.UTF-8"
> LC_NUMERIC="es_EC.UTF-8"
> LC_TIME="es_EC.UTF-8"
> LC_COLLATE="es_EC.UTF-8"
> LC_MONETARY="es_EC.UTF-8"
> LC_MESSAGES="es_EC.UTF-8"
> LC_PAPER="es_EC.UTF-8"
> LC_NAME="es_EC.UTF-8"
> LC_ADDRESS="es_EC.UTF-8"
> LC_TELEPHONE="es_EC.UTF-8"
> LC_MEASUREMENT="es_EC.UTF-8"
> LC_IDENTIFICATION="es_EC.UTF-8"
> LC_ALL=
>
>

Hi, I have been having locale-related problems too, so maybe we can
bounce enough clues around that we can advance a little.

[ later ...
I'll have to come back to locale per se, but I hope the following is useful
for poking around with fonts and unicode characters and their glyphs ]

[ To the advanced: please don't be insulted by my posting obvious stuff, as if
you didn't know how to use grep and sed and especially guix better than my
examples show -- it is motivated by wanting to exchange helpful methods and
info with others also coming to guix, who might benefit from my recent newbie
experiences trying to find my way into guix city, in the Commonwealth of FOSS :)
Hm, I wonder if we could use postgresql plus postgis to do an openstreetmap
map of guix city stores and pubs -- and potholes and construction blockages ;-) ]

Anyway, I have a little script which may be helpful in generating utf8
characters for display in your various contexts (what this (emacs) context is
I'll show below):

$ uchr 229 10
å
$ which -a uchr
/home/bokr/bin/uchr
$ cat ~/bin/uchr
#!/home/bokr/.guix-profile/bin/bash
# 2019-08-19 22:25:34 ## was: #!/usr/bin/bash
# ~/bin/uchr -- print unicode characters from numeric args
# uchr 65 67 10 | od -a -t x1
# 0000000   A   C  nl
#          41  43  0a
# 0000003
cc="$( printf '\\u%x' "$@" )"
echo -en "$cc"

Those last two lines do all the work ;-)
(printf is a bash built-in -- type -"help printf" at the bash command line.
By -"foo" I mean "foo" minus the quotes :)
printf re-uses its format for each arg it encounters, so it converts
all the integers according to '\\u%x' above in uchr.

$ uchr {192..255} 10
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
$

What do you get if you try that from your login console, not gnome?
To make various fonts available to the console, you should be able to use
setfont from console bash -- see -"man setfont" (remembering my
minus-the-quotes notation :)

The {FIRST_INTEGER..LAST_INTEGER} of course generates individual integer
arguments, including first and last. The 10 is a newline.
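
The uchr idea above can also be wrapped in a shell function, which makes it
easy to pipe the result into od and check the exact bytes that come out.
This is only a sketch, assuming a bash new enough to understand \u escapes
in echo -e; the name uchr_fn is made up for this illustration and is not
part of the original script.

```bash
# Sketch: the same idea as ~/bin/uchr, wrapped in a function.
# uchr_fn is a hypothetical name used only for this illustration.
uchr_fn() {
    local escapes
    # printf re-applies its format to every argument, so "65 67 10"
    # becomes the string \u41\u43\ua (one \u escape per code point).
    escapes=$(printf '\\u%x' "$@")
    # echo -e expands the \u escapes; -n suppresses the extra newline.
    echo -en "$escapes"
}
```

Running -"uchr_fn 65 67 10 | od -a -t x1" should reproduce the A, C, nl
dump shown in the script's comments. In a UTF-8 locale,
-"uchr_fn 229 | od -t x1" should show the two bytes c3 a5 for å; in a
non-UTF-8 locale the result will differ, which is itself a useful clue.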
I'm pretty sure I did -"setfont sun12x22" from the console bash, which gives
you most of the 12x22 font built into the kernel. It's got 256 character cells
for its 12x22 pixel glyphs, each represented by 22 16-bit integers using the
most significant 12 bits, with 1 as foreground, IIRC.

The sun12x22 font is pretty good, with box-drawing characters as well as most
things you need in European languages (I'm familiar with it because I wrote
a little script to display the glyphs on the frame buffer, in the pursuit of
independence from huge blobs of gooey GUI software :)

After having done -"setfont sun12x22" you can do
-"setfont -ou glyph-code-to-unicodepoint.txt"
which will give you a tab-delimited table starting ... ending like:

0x00	U+0000
0x20	U+0020
0x21	U+0021
0x22	U+0022
0x23	U+0023
0x24	U+0024
0x25	U+0025
...
0xdf	U+2580
0xdc	U+2584
0xdb	U+2588
0xdd	U+258c
0xde	U+2590
0xb0	U+2591
0xb1	U+2592
0xb2	U+2593
0x01	U+263a
0x5f	U+f804

(BTW, this would be really easy to snarf and convert to an assoc list
mapping unicode code points to glyph indices)

That glyph 0x01 has a unicode code point we can look up, even though the
console font you get from -"setfont sun12x22" does not have the glyph that
is in the kernel version's glyph table:

$
$ unicode-info "$(uchr 0x263a)"
"☺":
glyph codepoint .....int name...
 _☺_  +U00263a     9786  WHITE SMILING FACE
$

The glyph is in the kernel's 256-glyph bit-map for sun12x22 though, and it
should be visible in a gui browser with good unicode coverage.
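
As a rough illustration of the "snarf and convert" remark about the
setfont -ou table, here is a sketch that loads such a table into a bash
associative array (swapped in here for a Lisp-style assoc list) keyed by
code point. It assumes the file looks like the excerpt above, i.e. a glyph
index followed by one or more U+ fields separated by whitespace; I have not
checked every line setfont can emit. The names glyph_of and
load_glyph_table are hypothetical.

```bash
# Sketch: map unicode code points to glyph indices from a setfont -ou table.
# glyph_of and load_glyph_table are hypothetical names for illustration.
declare -A glyph_of   # key: code point such as "U+263a", value: "0x01"

load_glyph_table() {
    local -a fields
    local cp
    # Each line is assumed to be: glyph-index, then one or more U+xxxx fields.
    while read -r -a fields; do
        # Skip blank lines and anything not starting with a glyph index.
        [[ ${fields[0]:-} == 0x* ]] || continue
        for cp in "${fields[@]:1}"; do
            if [[ $cp == U+* ]]; then
                glyph_of[$cp]=${fields[0]}
            fi
        done
    done < "$1"
    return 0
}
```

After -"load_glyph_table glyph-code-to-unicodepoint.txt", the lookup
-"echo ${glyph_of[U+263a]}" should print the glyph index recorded for the
smiley, which is 0x01 in the table excerpt above.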
You can find the kernel's bitfont defined in kernel sources
.../linux-4.14.3/lib/fonts/font_sun12x22.c
(or change the kernel version -- 4.14.3 is the last one I
grop^H^Hepped around in looking for stuff to "steal" :)

Ok, back to

$ uchr {192..255} 10
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
$

The first six A's above are really not plain As, as another little script
of mine that shows unicode information will show:
(you can find the source archived in a recent post of mine
 https://lists.gnu.org/archive/html/guix-devel/2019-09/msg00115.html
if curious)

$ uchr {192..202} | unicode-info
"ÀÁÂÃÄÅÆÇÈÉÊ":
glyph codepoint .....int name...
 _À_  +U0000c0      192  LATIN CAPITAL LETTER A WITH GRAVE
 _Á_  +U0000c1      193  LATIN CAPITAL LETTER A WITH ACUTE
 _Â_  +U0000c2      194  LATIN CAPITAL LETTER A WITH CIRCUMFLEX
 _Ã_  +U0000c3      195  LATIN CAPITAL LETTER A WITH TILDE
 _Ä_  +U0000c4      196  LATIN CAPITAL LETTER A WITH DIAERESIS
 _Å_  +U0000c5      197  LATIN CAPITAL LETTER A WITH RING ABOVE
 _Æ_  +U0000c6      198  LATIN CAPITAL LETTER AE
 _Ç_  +U0000c7      199  LATIN CAPITAL LETTER C WITH CEDILLA
 _È_  +U0000c8      200  LATIN CAPITAL LETTER E WITH GRAVE
 _É_  +U0000c9      201  LATIN CAPITAL LETTER E WITH ACUTE
 _Ê_  +U0000ca      202  LATIN CAPITAL LETTER E WITH CIRCUMFLEX
$

The above is copy/pasted from a shell window I got by -"M-x shell".
To show the pid genealogy of that shell context, I'll -"C-x o" over there,
output some information about the context, and paste it back here:

$
$ # typing pidgeny (for pid genealogy :) we get:
$ pidgeny
pidgeny         pts/0    23549 S+   /home/bokr/.guix-profile/bin/bash /home/bokr
bash            pts/0    16231 Ss   /home/bokr/.guix-profile/bin/bash --noeditin
.emacs-26.3-rea tty1     16204 Sl+  /gnu/store/ki85c221k56y6hnp7qyx42q2qmra4w4s-
mutt            tty1      1118 S    mutt
bash            tty1       537 Ss   -bash
login           ?          521 Ss   login -- bokr
systemd         ?            1 Ss   /sbin/init \EFI\PhantoNv4ArchGx\vmlinuz-linu
$ which pidgeny|xargs realpath
/home/bokr/bin/pidgeny
$ ## --- pidgeny ---
$ which pidgeny|xargs cat
#!/home/bokr/.guix-profile/bin/bash
# 2019-08-19 07:16:38 -- was: #!/usr/bin/bash
# ~/bin/pidgeny
pid=${1:-$$} #this process if no pid specified as $1
while [ $(($pid)) -gt 0 ]; do
    ps h -p $pid -o comm,tt,pid,stat,args
    pid=$(ps -q $pid -o ppid=)
done
$ ## hm, monkeyed with that hashbang too, need a better idea :)
$
$ ## pidgeny output doesn't show full path on mutt 1118, bash 537,
$ ## or login 521, but we can get them easily:
$ realpath /proc/1118/exe
/gnu/store/bsd34k2v78mi0wxk85rz32xaminls9nb-mutt-1.12.1/bin/mutt
$ ## that was a guix version
$ realpath /proc/537/exe
/usr/bin/bash
$ ## that was the initial shell of the "foreign distro" -- in my case:
$ uname -a
Linux PhantoNv4ArchGx 5.2.9-arch1-1-ARCH #1 SMP PREEMPT Fri Aug 16 11:29:43 UTC 2019 x86_64 GNU/Linux
$ realpath /proc/521/exe
realpath: /proc/521/exe: Permission denied
$ ## need permission
$ su -c 'realpath /proc/521/exe'
/usr/bin/login
$ ## also built by the foreign distro's building tool chain and libraries -- I guess I will feel better
$ ## when I replace the foreigner with linux-libre ;-)
$

Well, HTH you to probe the state of your system vis-a-vis utf8, glyphs,
fonts etc. I'll come back with some locale mystery, which will probably
wind up being something I thought I did but didn't ;-/

But now I need to go do some things IRL ;-)

Regards,
Bengt Richter