Code Newbie
Capturing web page text
by vaclav
Use the Microsoft WebBrowser Control to capture the text of almost any web page.

Does your application need to capture or manipulate the text on a web page? If you have no idea why it would, I can hardly help. But if you know WHY but not HOW, read on. I assume that you are an almost absolute beginner. This is not an advanced topic; all you need are four simple steps:

1. Add a Microsoft WebBrowser control to your project.
2. Navigate the control to the desired page.
3. Wait for the document to finish loading (the DocumentComplete event).
4. Use the control to capture the text.

OK, so what about this WebBrowser thing? Open a new project and look at the Toolbox window ... ehm ... PictureBox, TextBox, Label, but no WebBrowser! No problem. Right-click the Toolbox and select Components from the pop-up menu. You will see a long list. The name of the WebBrowser control may vary (it is Microsoft Internet Controls on my computer), but it will always be a nickname for SHDOCVW.dll. Tick the check box next to the control and close the Components window. A nice small globe should appear in the Toolbox. That is it.

After you add the WebBrowser to a form, it looks almost like an ordinary TextBox. But don’t be mistaken: you have just gained access to much of the power of Microsoft Internet Explorer. And if you happen to know the IE object model, JavaScript and some HTML ... well, that is not a subject for this short N.U.M.

Name the control Web1 and add a command button (Command1) and a text box to the form. Name the TextBox Text1 and make sure that its MultiLine property is set to True. Now add this declaration to the declarations section at the top of your form's code:

Code:
Private Const DesiredURL As String = "http://www.microsoft.com/"
You need to tell the WebBrowser where it should navigate. Therefore, modify the command button's Click event so it looks like this:

Code:
Private Sub Command1_Click()
    ' Send the browser to the address declared at the top of the form
    Web1.Navigate DesiredURL
End Sub
Test it. When you press the button, the WebBrowser will pay a visit to its Microsoft homeland, or to any other address you pass to the Navigate method.

Before we are ready to capture the text from the page, we need to make sure that the navigation is complete. This can get involved, but here we use the simplest and least flexible method. Modify the browser's DocumentComplete event so it looks like this:

Code:
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' Capture only when the page we asked for reports that it has finished loading
    If URL = DesiredURL Then Call Capture
End Sub
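The exact-match test above is simple, but it can miss the event when the server redirects to a different address, and a page built from frames raises DocumentComplete once for every frame. A commonly used alternative, sketched here as an option and not as part of the original recipe, compares pDisp with the control's own browser object. The event raised for the top-level browser is the last one, so the whole page is loaded by then. The sketch assumes the same Web1 and Capture names and would replace the handler above. Note that it captures every page the control loads, not only DesiredURL.

```vb
Private Sub Web1_DocumentComplete(ByVal pDisp As Object, URL As Variant)
    ' pDisp is the browser object that has just finished loading.
    ' Each frame raises its own event; Web1.Object is the top-level browser.
    If pDisp Is Web1.Object Then
        Call Capture
    End If
End Sub
```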
As you have probably guessed, Capture is the sub which will do the hard work. Here it is:

Code:
Private Sub Capture()
    Dim v As Variant
    ' The body element's innerText is the page text without the HTML tags
    Set v = Web1.Document.body
    Text1.Text = v.innerText
End Sub
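Capture as written assumes that a document with a body is already loaded. If it is called too early, Web1.Document can still be Nothing and the call fails with a runtime error. A slightly more defensive variant, only a sketch using the same Web1 and Text1 names, could check before reading:

```vb
Private Sub Capture()
    Dim doc As Object
    Set doc = Web1.Document
    ' Nothing to capture if no document has been loaded yet
    If doc Is Nothing Then Exit Sub
    ' An HTML document may not have its body available yet
    If doc.body Is Nothing Then Exit Sub
    Text1.Text = doc.body.innerText
End Sub
```

When Capture is called only from the DocumentComplete event, the simple version is usually enough. The checks matter mainly if you also call it from elsewhere, for example from another button.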
This extreme simplicity makes WebBrowser my favourite control. But remember, power should be used responsibly. Whereas a stock-quoting app is almost always OK, there are some ethical questions to consider before writing a database-robber or mass-downloader. Wait! Now I see it. I forgot to mention how to interact with web pages, fill in text fields and press submit buttons in code. Well, perhaps next time.




 
Copyright © 2000-2006, Milano Interactive