
C++ Text Parsing.


InThrees · Member · Joined Feb 14, 2003 · Location: Southeast US
I pretty much need to learn how to do this fairly quickly.

I'm not looking for anyone HERE to go into a detailed explanation (well, if you WANT to...) but rather, a link to a good guide on it.

Specifically, I need to parse the webform output that C programs receive via stdin into separate variables.

I've never done any sort of text processing (we didn't cover it in class b/c MS Visual C++'s entire string library sucked, and caused crashes, so the instructor didn't even mess with it.)

I'll be using gcc, though.

I can work with incremental or conditional loops, which I understand is how you do these things, but the actual text/string processing functions I know nothing of.

Anyone got a good link or a good shove in the right direction, to get me rolling?
 
Well, I'm wanting to write cgi scripts in C/C++ mainly because I know a lot more C than Perl (I don't know any perl) and also because C is much more processor efficient than any script language.

Right now, the only way I could possibly think of to do it would be to take a webform output string like

name1=value1&name2=value2&name3=value3

and read characters one at a time into a string variable until the = or & character is reached, discard that character, and then either discard the string if it's a variable NAME, or assign it to the proper variable if it's a variable VALUE.
Like,

while (inputChar != '=' && inputChar != '&')
{
    testString = testString + inputChar;
    (assign next character in inputString to inputChar, which I don't even know how to do.);
}

that's REALLY rough, hahaha, but you get the idea. that would be a function called to parse the input string and then assign values to variables.

You can probably tell by now that I really DON'T have any practical experience (any, really) parsing or processing strings.

I don't even know the includes or functions that are normally used to do things to strings.
 
Do it in Perl and you'll be a lot better off. If you pick up a copy of Learning Perl, I'll bet you can figure out how to parse those forms in one day.

Why are you worried about processor efficiency when dealing with web-forms anyways? Are you planning on getting several thousand per minute or something?
 
No, just being nerdy I guess. The server in question is multi-user, however, and once I learn how to do this, I plan to build on it and do grander things like a posting script for blog-type sites, and a control panel interface for newer-user types.
 
Based on what you want to do, I'm still going to suggest that Perl would be better. I think you could learn what you need to and write the script in less time than it would take to write the program in C++.

While in theory you can write a more efficient C program, working with strings in C is no fun. Text processing is what Perl does best, and you'd have a hard time doing it better with C (let alone spend less time coding it).
 
Perl is also easier to learn. I'll try to answer your question, though. Probably the most basic way to do this is with the string functions in the standard header string.h (google that header for more info). To append strings, you'd use strcat(); its two arguments are both char pointers, and the second string is copied onto the end of the first, which has to have enough room for it. strcpy() is the same, except that the first string is overwritten instead of appended to. strcmp() returns 0 if its two string arguments are exactly equal.
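
To make those three functions concrete, here is a minimal sketch. The helper names build_pair and same_string are made up for illustration and are not part of string.h; only strcpy(), strcat(), strcmp() and strlen() are standard. The length check is an addition, since none of these functions check buffer sizes for you.

```c
#include <string.h>

/* Hypothetical helper: builds "name=value" in out using strcpy/strcat.
   Returns 0 on success, -1 if out (of size outsz) is too small. */
int build_pair(char *out, size_t outsz, const char *name, const char *value)
{
    /* strcpy and strcat do no bounds checking, so check the total first:
       name + '=' + value + terminating '\0' */
    if (strlen(name) + 1 + strlen(value) + 1 > outsz)
        return -1;

    strcpy(out, name);   /* overwrites whatever was in out */
    strcat(out, "=");    /* appends to what is already there */
    strcat(out, value);
    return 0;
}

/* Hypothetical helper: strcmp returns 0 when the two strings are identical,
   so this returns 1 for equal strings and 0 otherwise. */
int same_string(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With a buffer declared as char buf[32], calling build_pair(buf, sizeof buf, "name1", "value1") leaves "name1=value1" in buf, and same_string(buf, "name1=value1") then returns 1.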

Anyways, a very rudimentary (a.k.a. I don't know what I'm talking about :D) way to do this would be this:

// assume str is a char pointer to the (writable) input string; needs <string.h>
char *bg; // beginning of name/value group
char *end; // end of the name/value group
char *div; // divider between the name and value (where the "=" is)
int counter = 0; // which name/value group this is
char prchar; // previous char before an overwrite
char varnames[100][100]; // names of the variables
char varvalues[100][100]; // values of the variables

bg = str; // the first name/value group is at the beginning of the string

for (counter = 0;; ++counter) {
    for (end = bg; *end && *end != '&'; ++end)
        ; // find the end
    prchar = *end; // remember what it stopped on so that the loop can
                   // end if the end of the input string was reached

    *end = '\0'; // terminate the name/value string
    for (div = bg; *div && *div != '='; ++div)
        ; // find the divider

    if (div == end) { // no "=" in this group, so the value is empty
        varvalues[counter][0] = '\0';
    } else {
        *div = '\0'; // divide the name and value
        strcpy(varvalues[counter], div + 1); // copy the value to the proper place
    }
    strcpy(varnames[counter], bg); // copy the var name to the proper place

    if (!prchar) // end if the end of the input string was reached
        break;
    bg = end + 1; // continue to the next name/value group
}

Of course, this code wouldn't tolerate bad input at all: nothing checks the 100-character or 100-variable limits, so it would segfault very easily.
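
If you want something a little harder to crash, here is a rough sketch of the same split that leaves the input string untouched and truncates anything too long instead of overflowing. The names parse_pairs, copy_bounded, MAX_PAIRS and MAX_LEN are made up for this example; the limits just mirror the 100x100 arrays above. It does not decode + or %XX escapes, which real form data would also need.

```c
#include <string.h>

#define MAX_PAIRS 100   /* assumed limits, matching the arrays in the loop above */
#define MAX_LEN   100

/* Copy at most dstsz-1 characters of src[0..n) into dst and terminate it. */
static void copy_bounded(char *dst, size_t dstsz, const char *src, size_t n)
{
    if (n >= dstsz)
        n = dstsz - 1;                      /* truncate instead of overflowing */
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Hypothetical helper: splits "a=1&b=2" into names/values without
   modifying str. Returns the number of pairs stored (at most MAX_PAIRS). */
int parse_pairs(const char *str,
                char names[][MAX_LEN], char values[][MAX_LEN])
{
    int count = 0;
    const char *bg = str;

    while (*bg && count < MAX_PAIRS) {
        const char *end = strchr(bg, '&');  /* end of this group */
        if (end == NULL)
            end = bg + strlen(bg);          /* last group runs to the '\0' */

        if (end > bg) {                     /* skip empty groups like "&&" */
            /* look for '=' only inside this group */
            const char *eq = memchr(bg, '=', (size_t)(end - bg));

            if (eq != NULL) {
                copy_bounded(names[count], MAX_LEN, bg, (size_t)(eq - bg));
                copy_bounded(values[count], MAX_LEN, eq + 1,
                             (size_t)(end - eq - 1));
            } else {
                /* no '=': keep the name and store an empty value */
                copy_bounded(names[count], MAX_LEN, bg, (size_t)(end - bg));
                values[count][0] = '\0';
            }
            ++count;
        }

        if (*end == '\0')
            break;
        bg = end + 1;                       /* step past the '&' */
    }
    return count;
}
```

Called on name1=value1&name2=value2&name3=value3 with two char [MAX_PAIRS][MAX_LEN] arrays, it returns 3 and fills in the names and values in order. strchr() and memchr() do the searching that the hand-written for loops did.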
 