Can anyone point my way to a decent C parser/tokenizer written in C? That may sound like an odd request, I know. Thanks in advance.
I don't know of any parser dedicated to C. But Lex/Yacc is the reference parser generator, and it is written in C. See also Flex/Bison.
Flex and Bison are very hard to find. I think they're archived somewhere on the xengine sf page. Instead of Flex and Bison you can use Spirit from the Boost library, which IMO is way better and easier to use.
As for the actual C parser written in C... don't know of any.
Or use ANTLR, an LL(k) parser generator that supports EBNF syntax. It can only output C++ code though (among other languages, just not C).
GCC is written in C, correct? How difficult would it be to tear out the parser framework and form a library with it? Does anybody think it would be superfluous to just write the whole project myself?
If you want your code to be GPL'ed, that's an option, yes. I personally don't like the GPL. At all.
Sorry, I meant flex++ and bison++ (the C++ versions of flex and bison).
Hmm... VBCC is written entirely in C and isn't GPL but unfortunately it is closed source and I haven't seen the source since version 0.4.
My next-best suggestion is to look into writing your own tokenizer around a trie map structure. Check the character type on the first tier of the trie, and have all of the tokens terminate with a null. Essentially, cascade a partial trie (covering all of the ASCII characters) into a full trie for all of the multi-character tokens. It's tricky, but each lookup can be accomplished in constant time, since the work is bounded by the length of the longest token.
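To make the trie idea a bit more concrete, here is a rough sketch of how it could look for a handful of C operators. This is only one reading of the suggestion, not a finished tokenizer: the names `trie_node`, `trie_insert`, `trie_match`, and the pool size `TRIE_MAX_NODES` are all made up for illustration, and it assumes plain 7-bit ASCII input.

```c
#include <assert.h>
#include <stddef.h>

#define TRIE_MAX_NODES 64   /* assumed pool size, plenty for a few operators */

/* One trie node: a child index per 7-bit ASCII character (0 = no child)
   and the id of the token that ends at this node (0 = no token ends here). */
struct trie_node {
    unsigned char child[128];
    int token;
};

static struct trie_node pool[TRIE_MAX_NODES];  /* node 0 is the root */
static int pool_used = 1;

/* Add one token spelling (for example "<<=") with a nonzero id. */
void trie_insert(const char *text, int token)
{
    int node = 0;
    for (; *text != '\0'; ++text) {
        unsigned char c = (unsigned char)*text;
        assert(c < 128);                        /* ASCII only in this sketch */
        if (pool[node].child[c] == 0) {
            assert(pool_used < TRIE_MAX_NODES); /* fixed pool, no malloc */
            pool[node].child[c] = (unsigned char)pool_used++;
        }
        node = pool[node].child[c];
    }
    pool[node].token = token;
}

/* Longest match at the start of src. Returns the token id (0 if nothing
   matches) and stores the number of characters consumed in *len. */
int trie_match(const char *src, size_t *len)
{
    int node = 0;
    int best = 0;
    size_t i = 0;

    *len = 0;
    while (src[i] != '\0' && (unsigned char)src[i] < 128) {
        int next = pool[node].child[(unsigned char)src[i]];
        if (next == 0)
            break;                  /* no longer token continues from here */
        node = next;
        ++i;
        if (pool[node].token != 0) {    /* remember the longest hit so far */
            best = pool[node].token;
            *len = i;
        }
    }
    return best;
}
```

After inserting "<", "<<", "<<=" and "<=" with ids 1 to 4, calling `trie_match("<<= 1", &len)` walks three nodes and reports the "<<=" token with a length of 3. Identifiers and numbers would still need their own character-class loop; the trie only covers the fixed spellings.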
What do you need the parser for? If you are looking for parser generators and lexical analyzer generators, there are quite a few of them, as mentioned in the previous posts. If you are going to attempt to write your own, I suggest that you start with a recursive descent parser, as it is the easiest to understand and implement (IMHO). It has certain disadvantages though; for example, it cannot parse left-recursive grammars directly. Try getting hold of the dragon book, if you can.
Jesse, as others have mentioned, I recommend bison/flex. I'm not sure what license applies to code generated *by* bison/flex (I've only used it in companies that use it for internal tools, so I don't believe we had to worry about licensing restrictions... IANAL tho), so you should probably check that out.
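For anyone who hasn't seen one, here is roughly what a recursive descent parser looks like in C. It is only a sketch for a toy arithmetic grammar, nowhere near full C, and the names (`evaluate`, `parse_expr`, `parse_term`, `parse_factor`) are invented for the example. It has one function per grammar rule, and the left-recursive rule `expr := expr '+' term` is rewritten as a loop, which is the usual way around the left recursion problem.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy grammar, with left recursion rewritten as repetition:
     expr   := term   { ('+' | '-') term }
     term   := factor { ('*' | '/') factor }
     factor := NUMBER | '(' expr ')'                          */

static const char *cursor;   /* next unread character of the input */
static int parse_error;      /* set to 1 on the first syntax error */

static long parse_expr(void);

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        ++cursor;
}

static long parse_factor(void)
{
    long value = 0;
    skip_spaces();
    if (*cursor == '(') {
        ++cursor;
        value = parse_expr();
        skip_spaces();
        if (*cursor == ')') ++cursor; else parse_error = 1;
    } else if (isdigit((unsigned char)*cursor)) {
        while (isdigit((unsigned char)*cursor))
            value = value * 10 + (*cursor++ - '0');
    } else {
        parse_error = 1;     /* expected a number or '(' */
    }
    return value;
}

static long parse_term(void)
{
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            ++cursor;
            value *= parse_factor();
        } else if (*cursor == '/') {
            long rhs;
            ++cursor;
            rhs = parse_factor();
            if (rhs == 0) { parse_error = 1; return 0; }  /* no div by 0 */
            value /= rhs;
        } else {
            return value;
        }
    }
}

static long parse_expr(void)
{
    long value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+')      { ++cursor; value += parse_term(); }
        else if (*cursor == '-') { ++cursor; value -= parse_term(); }
        else return value;
    }
}

/* Parse and evaluate text. Returns 1 on success and stores the value
   in *result; returns 0 on a syntax error or trailing junk. */
int evaluate(const char *text, long *result)
{
    assert(text != NULL && result != NULL);
    cursor = text;
    parse_error = 0;
    *result = parse_expr();
    skip_spaces();
    return !parse_error && *cursor == '\0';
}
```

Because the loops in `parse_expr` and `parse_term` fold operands in from left to right, `8 - 3 - 2` still comes out left-associative even though the grammar no longer contains left recursion. A parser for real C would build a syntax tree instead of computing values, but the shape of the functions stays the same.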
That said, roxtar asks a very intelligent question. What are you attempting to do? There might be an easier way of going about it if we know your situation or goal, as well as your constraints.
Also, there's a site here that has the grammar for C in Lex/Yacc: http://www.lysator.liu.se/c/c-faq/c-17.html#17-25. At least that should cut some work out for you.