Gauche > Archives > 2012/02/26

2012/02/26 00:28:42 UTCsakae
#
I tried installing 0.9.2 from the tarball on FreeBSD 9.0R, and it SEGV'd partway through. http://space.geocities.jp/hamesspam/hes2012/120226.html Hope it gets fixed soon!
2012/02/26 00:30:03 UTCshiro
#
I think that's a known problem with the gc.
#
But compiling HEAD requires 0.9.2 to be installed, so that's a problem. I need to get 0.9.3 out soon.
2012/02/26 00:35:53 UTCsakae
#
I see, as I suspected. Looking forward to 0.9.3.
2012/02/26 01:39:34 UTCshiro
#
I've found the cause of the HEAD ext/tls test hanging on OSX (it's a problem in axTLS's own tests). I still don't know about the FreeBSD one.
2012/02/26 01:50:40 UTCkirill
#
Hi Shiro!
#
Are you around? =)
2012/02/26 01:53:58 UTCshiro
#
yup
2012/02/26 01:54:24 UTCkirill
#
Great
#
So now that I've convinced myself that axTLS builds properly on the Mac, I'm getting a nice and predictable segfault in some axTLS code when running Gauche stuff via the C api
#
It's happening downstream from Scm_TLSConnect, inside add_packet when calling MD5_Update
#
Have you seen similar crashes at all?
#
I'll note that TLS-using http transfers work 100% fine via the repl
2012/02/26 01:56:08 UTCshiro
#
(looking at the code...)
2012/02/26 01:56:20 UTCkirill
#
tls1.c, in function add_packet
#
when I print the value of the ssl pointer, it's quoted as "0x18"... which is probably not good =)
2012/02/26 01:57:25 UTCshiro
#
Mmmm gc hazard again?
2012/02/26 01:57:36 UTCkirill
#
yeah perhaps
#
seems to be supported by the idea that it works fine in the REPL
#
why don't I disable GC and get back to you =)
2012/02/26 01:59:15 UTCshiro
#
ScmTLS has finalizer attached, so you can also put a printf there to see if it is collected prematurely.
2012/02/26 01:59:29 UTCkirill
#
yeah
#
but why would that happen?
#
anyway let me turn off GC first so that we know we're looking at the right idea
#
hmm, xcode crashed. everything is crashing here =P
#
yes, so the SSL pointer is 0
#
... even when I call GC_Disable(). maybe gauche calls enable somewhere?
2012/02/26 02:03:53 UTCshiro
#
No, I don't think gauche turns on gc implicitly.
2012/02/26 02:06:04 UTCkirill
#
right
#
it's strange because as I go down the stack things are okay, but then I hit send_packet and suddenly the ssl pointer is 0
2012/02/26 02:07:01 UTCshiro
#
So somebody's overwriting it maybe?
2012/02/26 02:07:27 UTCkirill
#
but I imagine the TLS code is thread safe...
#
plus it does work in the REPL
2012/02/26 02:11:45 UTCkirill
#
any ideas of what else could be happening?
2012/02/26 02:13:07 UTCshiro
#
Above, you said it crashes in Scm_TLSConnect. So the stack is Scm_TLSConnect -> ssl_client_new -> do_client_connect?
2012/02/26 02:13:24 UTCkirill
#
sec, I'll provide a proper dump
#
add_packet [inlined] at /Users/k/sources/gauche/ext/tls/axTLS/ssl/tls1.c:747
send_packet ()
send_client_hello [inlined] ()
do_client_connect ()
ssl_client_new ()
Scm_TLSConnect ()
rfc__tls_25tls_connect ()
run_loop ()
user_eval_inner ()
Scm_ApplyRec ()
2012/02/26 02:15:09 UTCshiro
#
And at the time it reaches add_packet, ssl pointer gets 0?
2012/02/26 02:15:23 UTCkirill
#
well, I don't think the stack trace is meaningful
#
I just ran it with a slightly different thread count, and suddenly it's crashing with a segfault in some other place
#
I am sure the code in axTLS is correct. The issue must be somewhere else
2012/02/26 02:16:09 UTCshiro
#
ok.
2012/02/26 02:21:01 UTCshiro
#
ext/tls code looks innocent... so some thread handling code in Gauche is doing something silly...
2012/02/26 02:21:11 UTCkirill
#
such as?
2012/02/26 02:22:32 UTCshiro
#
no idea yet. but apparently somebody is stepping on an unrelated memory location.
2012/02/26 02:23:01 UTCkirill
#
hmm
2012/02/26 02:24:29 UTCshiro
#
if there's a bug that overwrites specific offset of ScmTLS (somehow), then one probe is to add dummy fields in ScmTLS definition between SCM_HEADER and other fields and see if crash pattern changes (meaningfully)... of course if the bug overwrites random places it won't give a clue, but...
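[Editor's note] The padding probe shiro describes can be sketched in plain C. This is only an illustration: `ToyHeader`, `ToyTLS`, and `ToyTLSPadded` are hypothetical stand-ins, not Gauche's real `SCM_HEADER` or `ScmTLS` definitions, and the pad size of four pointers is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an object header; NOT Gauche's SCM_HEADER. */
typedef struct { void *tag; } ToyHeader;

/* Original layout: the header is followed directly by the real fields. */
typedef struct {
    ToyHeader hdr;
    void *ctx;
    void *conn;
} ToyTLS;

/* Probe layout: dummy fields sit between the header and the real fields,
   so a stray write at a fixed offset now lands in `pad` instead. */
typedef struct {
    ToyHeader hdr;
    void *pad[4];
    void *ctx;
    void *conn;
} ToyTLSPadded;

/* Number of bytes the `conn` field moved because of the padding. */
size_t conn_shift(void)
{
    assert(offsetof(ToyTLSPadded, conn) > offsetof(ToyTLS, conn));
    return offsetof(ToyTLSPadded, conn) - offsetof(ToyTLS, conn);
}

/* Print both layouts so the offsets can be compared by eye. */
void print_layouts(void)
{
    printf("plain:  ctx@%zu conn@%zu\n",
           offsetof(ToyTLS, ctx), offsetof(ToyTLS, conn));
    printf("padded: ctx@%zu conn@%zu\n",
           offsetof(ToyTLSPadded, ctx), offsetof(ToyTLSPadded, conn));
}
```

If the crash pattern changes after the fields move, that points at a fixed-offset overwrite; if nothing changes, the probe is inconclusive.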
2012/02/26 02:24:49 UTCkirill
#
hmm
#
why does it work in the interpreter though?
#
running it in a single thread only in my program also crashes it
2012/02/26 02:25:42 UTCshiro
#
so it may not even be a thread issue. I'm truly perplexed.
2012/02/26 02:28:43 UTCkirill
#
is axtls built with the same compiler gauche is built with, in the Makefiles?
#
because I set CC and CXX before I run configure, so I hope those get propagated to axTLS?
2012/02/26 02:29:21 UTCshiro
#
good point, let me check...
#
well actually you can look at the make output?
2012/02/26 02:30:12 UTCkirill
#
where can I check this?
#
I mean, without rerunning make =)
2012/02/26 02:31:10 UTCshiro
#
you don't need to. it does use the same CC, for it doesn't use axTLS's makefile
#
so could be that axtls needs some special flag and gauche isn't giving it?
2012/02/26 02:31:54 UTCkirill
#
who knows
#
the config file is identical to the one generated with menuconfig
2012/02/26 02:33:18 UTCshiro
#
axTLS original source does have some conditional stuff to add CFLAGS but I don't see anything special there
2012/02/26 02:34:25 UTCkirill
#
it's got some curious LDFLAGS.. are those relevant to gauche?
2012/02/26 02:35:17 UTCshiro
#
which is... (I don't see OSX specific stuff, do you use Linux one?)
2012/02/26 02:40:11 UTCkirill
#
yeah, essentially the same flags
#
except the "soname" flag, which needs to be "install_name" on the Mac
#
are we linking axtls statically?
#
is it remarkable at all that the function "send_packet" where ssl == 0 on entry (before anything happens) is also the first function in the trace that's inlined?
2012/02/26 02:41:48 UTCshiro
#
well, gauche's makefile compiles individual axtls .c files into .o, then links them together with gauche-specific files to make rfc--tls.so.
#
so soname won't matter in our case, I think
2012/02/26 02:46:01 UTCkirill
#
maybe it's a compiler error?
#
I'm using clang, not gcc
2012/02/26 02:46:21 UTCshiro
#
If we want to make sure the compiler is somehow getting in our way... you can edit ext/Makefile.ext to remove -O2 and -fomit-frame-pointer, and in ext/tls run make clean; make, and see the difference. (But axTLS original makefile does use both flags, though)
2012/02/26 02:48:46 UTCkirill
#
hmm. sounds dubious but I'll try
#
yeah, no difference.
#
maybe I'll write a tiny isolated test program that will do a single TLS request and link against my build..
2012/02/26 02:53:55 UTCshiro
#
I don't see any clue now. More isolation needed, I agree.
2012/02/26 02:53:59 UTCkirill
#
but... with the -O0 build there's no more inlining, and now it turns out that ssl isn't null after all
#
the debugger was getting confused by the inlined function so it thought that ssl was 0... silly gdb
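[Editor's note] Rebuilding at -O0 is what exposed the debugger confusion here. As a narrower alternative (an assumption on my part, not something tried in this chat), GCC and clang both accept a `noinline` attribute that keeps a real call frame for one function even in an optimized build. The names below are hypothetical stand-ins, not axTLS's actual `add_packet` or `SSL` type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a connection object; NOT axTLS's SSL struct. */
typedef struct { int fd; } ToyConn;

/* The attribute asks the compiler not to inline this function, so it keeps
   its own stack frame and the debugger can show `conn` as a normal argument. */
__attribute__((noinline))
int toy_add_packet(const ToyConn *conn, int len)
{
    if (conn == NULL)
        return -1;          /* report a NULL pointer instead of crashing */
    return conn->fd + len;  /* placeholder work */
}

/* Caller that would otherwise be a candidate for inlining the callee. */
int toy_send_packet(const ToyConn *conn, int len)
{
    assert(len >= 0);
    return toy_add_packet(conn, len);
}
```

This only makes the frame easier to inspect; it does not change what the code does.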
2012/02/26 02:54:19 UTCshiro
#
and you still get a crash?
2012/02/26 02:54:41 UTCkirill
#
so it's crashing in MD5_Update
2012/02/26 02:55:18 UTCshiro
#
ok so let's dump the memory overwriting hypothesis :)
2012/02/26 02:56:28 UTCkirill
#
MD5_Update is in crypto.h
#
maybe it's including the wrong header if say, openssl also has crypto/crypto.h? =)
#
or maybe there are two functions with the same name for some reason..
#
OpenSSL does have a function called MD5_Update =P
2012/02/26 02:58:08 UTCshiro
#
There are.
#
I remember I had a similar problem before...
2012/02/26 02:58:44 UTCkirill
#
So maybe it's calling the wrong one and obviously crashing =)
#
the project does link against libcrypto dynamically
2012/02/26 03:01:07 UTCshiro
#
you can add #define MD5_Update MD5_Update_axtls in some header and see the difference maybe
#
Actually Gauche also has MD5_Update in ext/digest. If rfc.md5 is loaded it is loaded, too.
2012/02/26 03:02:34 UTCkirill
#
I'm assuming this is a problem?
#
well let's see... if it was calling the wrong function, would this be visible in the symbol table of the rfc--tls.o file?
#
wouldn't we see external references to MD5_Update ?
2012/02/26 03:04:03 UTCshiro
#
well, i don't know. as long as the signature and the MD5_CTX definition match, it shouldn't be a problem even if it is calling another one.
2012/02/26 03:04:13 UTCkirill
#
(because there aren't any references to MD5 in rfc--tls.o's symbol table)
#
heh, so it's running the wrong code for sure
2012/02/26 03:05:49 UTCshiro
#
rfc--tls.o doesn't call it directly (it is merely a compilation of rfc--tls.c, which is generated by tls.scm). you meant rfc--tls.so?
2012/02/26 03:06:23 UTCkirill
#
yeah
#
I'm looking through the asm dump that's crashing, and it makes references to a function called "md5_block_data_order"
#
... which does not exist in Gauche or axTLS
#
the symbol is present in openssl, though =)
#
so how do I force it to call the right one? this is lunacy =)
2012/02/26 03:08:46 UTCshiro
#
the easier path is to rename *our* MD5_* stuff (by #define magic)
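[Editor's note] A minimal sketch of the "#define magic" being discussed. `MD5_Update_axtls` is the name suggested earlier in the chat; `ToyCtx` and the function body are hypothetical placeholders, not the real `MD5_CTX` or the real hashing code.

```c
#include <assert.h>
#include <string.h>

/* Rename our symbol at preprocessing time. Every later use of MD5_Update
   in this translation unit becomes MD5_Update_axtls. */
#define MD5_Update MD5_Update_axtls

/* Toy context, NOT the real MD5_CTX. */
typedef struct { unsigned long total; } ToyCtx;

/* The source still says MD5_Update, but the object file exports
   MD5_Update_axtls, so it no longer clashes with another library's
   MD5_Update at link or load time. */
void MD5_Update(ToyCtx *ctx, const unsigned char *msg, int len)
{
    (void)msg;                          /* placeholder: no hashing here */
    ctx->total += (unsigned long)len;
}

/* Two-level stringification so the macro is expanded before quoting. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Returns the identifier the compiler actually sees for MD5_Update. */
const char *effective_symbol_name(void)
{
    return STR(MD5_Update);
}

/* Calls through the renamed symbol and returns the byte count recorded. */
unsigned long toy_feed(int len)
{
    ToyCtx ctx = { 0 };
    assert(len >= 0);
    MD5_Update(&ctx, (const unsigned char *)"", len);
    return ctx.total;
}
```

The same rename can be passed on the compiler command line as `-DMD5_Update=MD5_Update_axtls`; the define has to be seen by every file that declares or calls the function, otherwise the names stop matching.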
2012/02/26 03:09:35 UTCkirill
#
hmm... so should I add it to axTLS?
#
could I specify this as part of the build command line?
2012/02/26 03:11:03 UTCshiro
#
Let's see if you can edit ext/Makefile.ext to add -DMD5_Update=MD5_Update_axtls etc into CFLAGS...
#
Just to see if that's the cause.
2012/02/26 03:15:56 UTCkirill
#
hmph
#
Well hold on, let's think about it more
#
The issue is that I have an executable that links against both libcrypto and rfc--tls.so , and both contain MD5_Update. correct?
2012/02/26 03:21:15 UTCkirill
#
I'll put a print message inside md5_update from axTLS and we'll see =)
2012/02/26 03:21:45 UTCshiro
#
yeah, let's make sure what we think is actually happening.
2012/02/26 03:33:05 UTCkirill
#
yes it's not calling the right one
#
the address of the instruction in the debugger where the crash happens is not the address of MD5_Update when the symbol is printed
#
... and my little print hello message didn't get printed, either.
2012/02/26 03:35:16 UTCshiro
#
But what's weird is that sometimes it works and sometimes it crashes, right?
2012/02/26 03:35:24 UTCkirill
#
well
#
I haven't gotten it to work yet =)
#
but changing thread stuff causes different crashes
#
presumably due to stuff getting loaded in a different order
#
DLL hell all over again =)
2012/02/26 03:37:25 UTCshiro
#
ah ok. Is it difficult to try renaming axtls's MD5_* stuff (and maybe SHA1_* stuff as well) with #define magic?
2012/02/26 03:37:53 UTCkirill
#
yes, because the author of axTLS thought it would be funny to mirror the OpenSSL interface directly as a base feature, not an addon
#
so many other functions have names that coincide with other OpenSSL stuff... it's a total disaster.
#
I wonder if a linker flag could fix the problem, or something
2012/02/26 03:39:10 UTCshiro
#
ah so it's not only MD5_* functions but it's all over.
2012/02/26 03:39:16 UTCkirill
#
yes.
#
everything inside axTLS/crypto is potentially dangerous.
#
I'll make sure it's not including the wrong "crypto.h" file
2012/02/26 03:40:27 UTCshiro
#
There should be some flags, but the treatment of symbol clashes among dlopen'ed modules differs among OSes...
2012/02/26 03:40:34 UTCkirill
#
since they have the same signature, the linker won't mind since it's a .o file
#
I don't think this is a runtime issue
#
I think at compile time (or link time) it calls MD5_Update from the wrong library
#
maybe the "-undefined suppress" is not contributing here =)
2012/02/26 03:43:49 UTCshiro
#
Hmm. If this is indeed the culprit, I may be able to preprocess axtls source to replace dangerous symbols (like I'm doing in ssltest.c)
2012/02/26 03:44:05 UTCkirill
#
hmm
#
that would be not bad, although the preprocessing would be heavy and annoying to maintain
2012/02/26 03:44:49 UTCshiro
#
yeah... if one linker flag solves this, that would be much better.
2012/02/26 03:52:53 UTCkirill
#
perhaps if object files from crypto were listed before the object files from ssl, things would work better
#
... considering that it's a one-pass linker
2012/02/26 03:54:21 UTCshiro
#
Could you try? That sounds fragile, but who knows...
2012/02/26 03:54:28 UTCkirill
#
=)
#
ok, rebuilding
2012/02/26 03:59:07 UTCkirill
#
yeah not surprisingly that didn't work =)
2012/02/26 04:00:55 UTCshiro
#
d'oh. I gotta leave for a couple of hours. I might write up some code to rename crypto things later.
2012/02/26 04:00:59 UTCkirill
#
hmm perhaps renaming is the right thing to do... but sounds nasty
#
yeah I gotta' go anyway
#
night!