Welcome to the Java Programming Forums

Thread: HTML table to 2D array parser

  1. #1 Neruk (Junior Member, joined Jan 2011)

    HTML table to 2D array parser

    Hello everyone!
    I'm new here, but I hope you can help.
    The problem is to parse an HTML table into a 2D array.
    The parser should handle rowspans, colspans, nested tables and so on; that is, anything the HTML standard allows. The tables I need to parse are generated automatically by another program, so they are rather complex and verbose.
    So far I have found solutions in PHP (JS_Extractor; see "JS_Extractor! And the death of Table Extractor" by Jack Sleight) and in Java (the Java HTML Table parser from Simbiosis), but they don't seem to suit my requirements.
    I also found JWebUnit and HTTPUnit, but I couldn't work out how to use them for this (and I'm not even sure it is possible).
    I'd be grateful for any help; I can't believe this problem hasn't been solved already!
    Thanks in advance!
    P.S. The project should be written in Java.
    Last edited by Neruk; January 28th, 2011 at 07:16 AM.
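One way to approach the extraction step is the HTML parser that ships with the JDK (javax.swing.text.html.parser.ParserDelegator driving an HTMLEditorKit.ParserCallback). This is a swapped-in suggestion, not one of the libraries mentioned in the thread. The sketch keeps a stack of open tables so that the cells of a nested table are collected into their own table instead of leaking into the outer cell, and it records rowspan/colspan without resolving them yet. The class and member names (TableCellExtractor, Cell, Table, nestedTable) are hypothetical. The JDK parser is based on an old HTML 3.2 DTD, so very messy or modern markup may need a more tolerant parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TableCellExtractor {

    /** One td/th cell: its text, span sizes, and the index of a table nested inside it (-1 if none). */
    public static final class Cell {
        public final int rowspan, colspan;
        public int nestedTable = -1;
        private final StringBuilder buf = new StringBuilder();
        Cell(int rowspan, int colspan) { this.rowspan = rowspan; this.colspan = colspan; }
        public String text() { return buf.toString().trim(); }
    }

    /** A table as rows of cells in source order; spans are recorded but not resolved. */
    public static final class Table {
        public final List<List<Cell>> rows = new ArrayList<>();
    }

    /** Parser state for one open table: the table and the cell currently receiving text. */
    private static final class Frame {
        final Table table;
        Cell cell;
        Frame(Table table) { this.table = table; }
    }

    /** Returns every table in document order; nested tables get their own entries. */
    public static List<Table> extract(String html) throws IOException {
        List<Table> tables = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>(); // innermost open table on top

        HTMLEditorKit.ParserCallback cb = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TABLE) {
                    Frame outer = stack.peek();
                    // Link the enclosing cell (if any) to the index of the new nested table.
                    if (outer != null && outer.cell != null) outer.cell.nestedTable = tables.size();
                    Table table = new Table();
                    tables.add(table);
                    stack.push(new Frame(table));
                } else if (stack.isEmpty()) {
                    return; // ignore rows/cells outside any table
                } else if (t == HTML.Tag.TR) {
                    stack.peek().table.rows.add(new ArrayList<>());
                    stack.peek().cell = null;
                } else if (t == HTML.Tag.TD || t == HTML.Tag.TH) {
                    Frame f = stack.peek();
                    if (f.table.rows.isEmpty()) f.table.rows.add(new ArrayList<>()); // tolerate a missing <tr>
                    Cell c = new Cell(span(a, HTML.Attribute.ROWSPAN), span(a, HTML.Attribute.COLSPAN));
                    f.table.rows.get(f.table.rows.size() - 1).add(c);
                    f.cell = c;
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (stack.isEmpty()) return;
                if (t == HTML.Tag.TABLE) stack.pop();
                else if (t == HTML.Tag.TD || t == HTML.Tag.TH) stack.peek().cell = null;
            }

            @Override
            public void handleText(char[] data, int pos) {
                // Text goes only to the innermost open cell, so nested-table text stays separate.
                Frame f = stack.peek();
                if (f != null && f.cell != null) f.cell.buf.append(data).append(' ');
            }
        };
        new ParserDelegator().parse(new StringReader(html), cb, true);
        return tables;
    }

    /** Reads a span attribute; missing or invalid values count as 1 (rowspan="0" is not special-cased). */
    private static int span(MutableAttributeSet a, HTML.Attribute key) {
        Object v = a.getAttribute(key);
        if (v == null) return 1;
        try {
            return Math.max(1, Integer.parseInt(v.toString().trim()));
        } catch (NumberFormatException e) {
            return 1;
        }
    }
}
```

Each nested table gets its own entry in the returned list and the enclosing cell stores its index in nestedTable, so every table can then be turned into a 2D array on its own.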


  2. #2 KevinWorkman (Crazy Cat Lady, joined Oct 2010, Washington, DC)

    Re: HTML table to 2D array parser

    What about them doesn't suit your requirements?

  3. #3 Neruk (Junior Member, joined Jan 2011)

    Re: HTML table to 2D array parser

    Quote (originally posted by KevinWorkman):
    What about them doesn't suit your requirements?
    Thanks for the response. I didn't notice that I had left out some additional requirements:
    1. The program must be written in Java;
    2. If the table contains spans, the value of a spanned cell should be put into the array only once, in the top-left position of the area that cell covers; all other array positions covered by that cell should be null (this may be hard to follow; see the examples below).
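Requirement 2 corresponds to the slot-placement idea of the HTML table model: walk the rows in order, skip slots still covered by a rowspan from an earlier row, put the cell's value in its top-left slot and null in every other slot it covers. A minimal sketch, assuming the cells have already been extracted with their rowspan/colspan values; the SpanGrid class, its Cell type and toGrid method are hypothetical names, and edge cases of the full HTML algorithm (such as rowspan="0") are not handled.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SpanGrid {

    /** Hypothetical input cell: text plus rowspan/colspan as already read from the HTML. */
    public static final class Cell {
        final String text;
        final int rowspan, colspan;
        public Cell(String text, int rowspan, int colspan) {
            this.text = text;
            this.rowspan = Math.max(1, rowspan);
            this.colspan = Math.max(1, colspan);
        }
    }

    /**
     * Lays cells out on a grid: each cell's text goes into its top-left slot and
     * every other slot it covers stays null.
     */
    public static String[][] toGrid(List<List<Cell>> rows) {
        int height = rows.size();
        List<BitSet> taken = new ArrayList<>();       // slots already covered, per row
        List<List<String>> values = new ArrayList<>();
        for (int r = 0; r < height; r++) {
            taken.add(new BitSet());
            values.add(new ArrayList<>());
        }
        int width = 0;
        for (int r = 0; r < height; r++) {
            int c = 0;
            for (Cell cell : rows.get(r)) {
                // Skip slots still covered by rowspans from earlier rows.
                c = taken.get(r).nextClearBit(c);
                // Rowspans that run past the last row are clipped to the table.
                for (int dr = 0; dr < cell.rowspan && r + dr < height; dr++) {
                    for (int dc = 0; dc < cell.colspan; dc++) {
                        taken.get(r + dr).set(c + dc);
                        List<String> row = values.get(r + dr);
                        while (row.size() <= c + dc) row.add(null);
                        // Overlapping cells in malformed tables simply overwrite each other.
                        row.set(c + dc, (dr == 0 && dc == 0) ? cell.text : null);
                    }
                }
                c += cell.colspan;
                width = Math.max(width, c);
            }
        }
        String[][] grid = new String[height][width];
        for (int r = 0; r < height; r++) {
            List<String> row = values.get(r);
            for (int c = 0; c < row.size(); c++) grid[r][c] = row.get(c);
        }
        return grid;
    }
}
```

Fed the cells of the example table in this post, the sketch produces a 4×7 array whose third row (index 2) is 1x, 2, null, 3, 4, 5.8, 6 last: column 2 is still covered by the rowspan of the second "1", so "3" lands in column 3.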
    So JS_Extractor is not an option, as it is written in PHP and doesn't handle nested tables, and the Simbiosis Java HTML Table parser doesn't even handle spans.
    Today I tried HTTPUnit, but the results are disappointing too: simple tables are parsed correctly, but complex ones are not.
    For example, here is the HTML code of a table:
    <html>
    	<body>
    	<table   border="2" width="20%" height="20%">
    		<tr bgcolor="red">
    			<td colspan="2" rowspan="2">
    				<span>1</span>
    			</td>
    			<td>
    				<span>2</span>
    			</td>
    			<td>
    				<span>3</span></td>
    			<td>
    				<span>4.1</span>
    			</td>
    			<td>
    				<span>5.1</span>
    			</td>
    			<td>
    				<span>6 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="green">
    			<td rowspan="2">
    				<span>1</span>
    			</td>
    			<td>
    				<span>2.4x</span>
    			</td>
    			<td>
    				<span>3.3x</span>
    			</td>
    			<td>
    				<span>4</span>
    			</td>
    			<td>
    				<span>5 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="ffcc00">
    			<td>
    				<span>1x</span>
    			</td>
    			<td>
    				<span>2</span>
    			</td>
    			<td>
    				<span>3</span>
    			</td>
    			<td>
    				<span>4</span>
    			</td>
    			<td>
    				<span>5.8</span>
    			</td>
    			<td>
    				<span>6 last</span>
    			</td>
    		</tr>
    		<tr bgcolor="yellow">
    			<td><span>1</span></td>
    			<td><span>2</span></td>
    			<td><span>3</span></td>
    			<td><span>4</span></td>
    			<td><span>5</span></td>
    			<td><span>6</span></td>
    			<td><span>7 last</span></td>
    		</tr>
    	</table>	
    </body>
    </html>
    This is how the table is rendered in Chrome: [screenshot not preserved]

    This is the array I want as the result: [image not preserved]

    This shows what I meant by requirement 2: the "1" from the first spanned area and the "1" from the second are each placed in the top-left cell of their area, while the remaining cells in each area are null.
    The result produced by HTTPUnit: [image not preserved]

    As you can see, even if we ignore requirement 2, the third row is wrong.
    And this is a fairly simple example without nested tables; with nested tables the result is far worse.
    What would you recommend in this case?
    Last edited by Neruk; January 28th, 2011 at 07:18 AM.