jesse_m at October 4th, 2005 13:13 — #1
Can anyone point my way to a decent C parser/tokenizer written in C? That may sound like an odd request, I know. Thanks in advance.
zavie at October 4th, 2005 13:54 — #2
I don't know of any parser dedicated to C. But Lex/Yacc is the reference lexer/parser generator pair, and it is written in C. See also Flex/Bison.
bladder at October 4th, 2005 21:24 — #3
Flex and Bison are very hard to find. I think they're archived somewhere on the xengine sf page. Instead of Flex and Bison you can use Spirit from the Boost library, which IMO is way better and easier to use.
As for an actual C parser written in C... I don't know of any.
ed_mack at October 5th, 2005 01:51 — #4
oisyn at October 5th, 2005 07:01 — #5
Or use ANTLR, an LL(k) parser generator that supports EBNF syntax. It can only output C++ code though (among other languages, just not C).
jesse_m at October 5th, 2005 12:23 — #6
GCC is written in C, correct? How difficult would it be to tear out the parser framework and form a library with it? Or does anybody think it would be simpler to just write the whole thing myself?
oisyn at October 5th, 2005 17:22 — #7
If you want your code to be GPL'ed, that's an option, yes. I personally don't like the GPL. At all.
bladder at October 6th, 2005 10:23 — #8
Sorry, I meant flex++ and bison++
(the C++ versions of flex and bison).
samuraicrow at October 14th, 2005 22:52 — #9
Hmm... VBCC is written entirely in C and isn't GPL, but unfortunately it is closed source and I haven't seen the source since version 0.4.
My next-best suggestion is to look into writing a trie map structure and have all of the tokens terminate with a null by checking the character type on the first tier of the trie. Essentially, cascade a partial trie (with all of the ASCII characters) into a full trie for all of the multibyte tokens. It's tricky but can be accomplished in constant time.
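To make the trie idea a bit more concrete, here is a minimal sketch in C of a longest-match token trie. It is only one possible reading of the suggestion above, not samuraicrow's actual design: the names Trie, trie_init, trie_insert and trie_match, the fixed node cap, and the numeric token ids are all made up for illustration. Each node has one slot per 7-bit ASCII character, and matching walks one node per input character, so the cost per token is bounded by the length of the longest token in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRIE_MAX_NODES 64   /* arbitrary cap chosen for this sketch */
#define TRIE_FANOUT    128  /* one slot per 7-bit ASCII character */

/* Node 0 is the root; a child index of 0 means "no edge". */
typedef struct {
    int child[TRIE_FANOUT];
    int token_id;           /* 0 = no token ends at this node */
} TrieNode;

typedef struct {
    TrieNode nodes[TRIE_MAX_NODES];
    int count;              /* number of nodes in use */
} Trie;

static void trie_init(Trie *t)
{
    memset(t, 0, sizeof *t);
    t->count = 1;           /* the root always exists */
}

/* Register the spelling `text` under a nonzero token id. */
static void trie_insert(Trie *t, const char *text, int token_id)
{
    int cur = 0;
    assert(token_id != 0);
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        assert(c < TRIE_FANOUT);
        if (t->nodes[cur].child[c] == 0) {
            assert(t->count < TRIE_MAX_NODES);
            t->nodes[cur].child[c] = t->count++;
        }
        cur = t->nodes[cur].child[c];
    }
    t->nodes[cur].token_id = token_id;
}

/* Longest match at the start of `src`. Returns the token id and stores
   the matched length in *len, or returns 0 with *len == 0 on no match. */
static int trie_match(const Trie *t, const char *src, size_t *len)
{
    int cur = 0;
    int best = 0;
    size_t i = 0;
    *len = 0;
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        int next;
        if (c >= TRIE_FANOUT)
            break;
        next = t->nodes[cur].child[c];
        if (next == 0)
            break;
        cur = next;
        ++i;
        if (t->nodes[cur].token_id != 0) {
            best = t->nodes[cur].token_id;  /* remember the longest hit so far */
            *len = i;
        }
    }
    return best;
}
```

With "<", "<<", "<<=" and "<=" inserted, calling trie_match on the input "<<= 1" picks "<<=" rather than stopping at "<". Identifiers and numbers would still need their own character-class loops; this sketch only covers fixed spellings such as operators and keywords.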
roxtar at October 14th, 2005 23:38 — #10
What do you need the parser for? If you are looking for parser generators and lexical analyzer generators, there are quite a few of them, as mentioned in the previous posts. If you are going to attempt to write your own, I suggest that you do a recursive descent parser for starters, as it is the easiest to understand and implement (IMHO). It has certain disadvantages though, like it cannot directly parse left-recursive grammars. Try getting hold of the dragon book, if you can.
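For anyone who hasn't seen one, here is a minimal sketch of a recursive descent parser in C. It handles a toy arithmetic grammar, not C itself, and the names Parser, parse_expr, parse_term, parse_factor and evaluate are invented for this example. It also shows the usual workaround for the left-recursion limitation: a left-recursive rule such as expr := expr '+' term is rewritten as expr := term (('+' | '-') term)* and implemented with a loop, which keeps the operators left-associative.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Grammar handled by this sketch:
     expr   := term   (('+' | '-') term)*
     term   := factor (('*' | '/') factor)*
     factor := NUMBER | '(' expr ')'                         */
typedef struct {
    const char *pos;  /* next unread character */
    int ok;           /* cleared on the first syntax error */
} Parser;

static long parse_expr(Parser *p);

static void skip_spaces(Parser *p)
{
    while (isspace((unsigned char)*p->pos))
        ++p->pos;
}

static long parse_factor(Parser *p)
{
    long value = 0;
    skip_spaces(p);
    if (*p->pos == '(') {
        ++p->pos;
        value = parse_expr(p);
        skip_spaces(p);
        if (*p->pos == ')')
            ++p->pos;
        else
            p->ok = 0;          /* missing closing parenthesis */
        return value;
    }
    if (!isdigit((unsigned char)*p->pos)) {
        p->ok = 0;              /* expected a number or '(' */
        return 0;
    }
    while (isdigit((unsigned char)*p->pos))
        value = value * 10 + (*p->pos++ - '0');
    return value;
}

static long parse_term(Parser *p)
{
    long value = parse_factor(p);
    for (;;) {
        char op;
        long rhs;
        skip_spaces(p);
        op = *p->pos;
        if (op != '*' && op != '/')
            return value;
        ++p->pos;
        rhs = parse_factor(p);
        if (op == '*')
            value *= rhs;
        else if (rhs != 0)
            value /= rhs;
        else
            p->ok = 0;          /* treat division by zero as an error */
    }
}

static long parse_expr(Parser *p)
{
    long value = parse_term(p);
    for (;;) {
        char op;
        long rhs;
        skip_spaces(p);
        op = *p->pos;
        if (op != '+' && op != '-')
            return value;
        ++p->pos;
        rhs = parse_term(p);
        value = (op == '+') ? value + rhs : value - rhs;
    }
}

/* Returns 1 and stores the value in *result if `text` is a complete,
   well-formed expression; returns 0 otherwise. */
static int evaluate(const char *text, long *result)
{
    Parser p;
    assert(text != NULL && result != NULL);
    p.pos = text;
    p.ok = 1;
    *result = parse_expr(&p);
    skip_spaces(&p);
    if (*p.pos != '\0')
        p.ok = 0;               /* trailing input that the grammar rejects */
    return p.ok;
}
```

Each grammar rule becomes one function, and precedence falls out of which function calls which. A real C parser would build a syntax tree instead of evaluating on the fly and would read tokens from a separate lexer, but the overall shape stays the same.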
eddie at October 28th, 2005 13:16 — #11
Jesse, as others have mentioned, I recommend bison/flex. I'm not sure what license applies to code generated *by* bison/flex (I've only used it in companies that use it for internal tools, so I don't believe we had to worry about licensing restrictions... IANAL tho), so you should probably check that out.
That said, roxtar asks a very intelligent question. What are you attempting to do? There might be an easier way of going about it if we know your situation or goal, as well as your constraints.
Also, there's a site that has the grammar for C in Lex/Yacc: http://www.lysator.liu.se/c/c-faq/c-17.html#17-25 At least that should cut some work out for you.