The problem is in the result variable.
There is more than one link ending with jpg in the page.
What I want is to get every link ending with jpg, each one as a string.
I mean that result should hold one link ending with jpg, then on the next iteration another link ending with jpg, and so on.
It's like:
And I want result to get each time:
Then in the next iteration:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace Testing
{
    public partial class Form1 : Form
    {
        private List<string> links = new List<string>();
        string htmlCode;

        public Form1()
        {
            InitializeComponent();
            GetLinks();
        }

        private void GetLinks()
        {
            using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
            {
                htmlCode = client.DownloadString("https://test.com/my-site");
            }
            int index1 = 0;
            using (StringReader reader = new StringReader(htmlCode))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    int index = line.IndexOf("https://test.com");
                    if (index != -1)
                    {
                        index1 = line.IndexOf("png", index);
                    }
                    if (index != -1 && index1 != -1)
                    {
                        string result = line.Substring(index, index1);
                    }
                }
            }
        }

        private void Form1_Load(object sender, EventArgs e)
        {
        }
    }
}
2 Answers
A better way to extract image URLs from HTML code is to use a regular expression.
Image URL extraction regular expression:
For how to use regular expressions in C#, see:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference
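As a rough illustration of the regular-expression approach, here is a minimal sketch that returns each link ending in .jpg as its own string. The class name JpgLinkExtractor, the method Extract, the sample HTML and the pattern itself are all assumptions made for this example, not the expression from the answer above; the pattern assumes the links start with https://test.com/ and contain no whitespace, quotes or angle brackets.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class JpgLinkExtractor
{
    // Hypothetical pattern: starts with https://test.com/, ends with .jpg.
    // [^\s"'<>]*? lazily matches characters that cannot end an attribute
    // value, so each match stops at the first ".jpg" it reaches.
    private static readonly Regex JpgUrl =
        new Regex(@"https://test\.com/[^\s""'<>]*?\.jpg", RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var links = new List<string>();
        foreach (Match match in JpgUrl.Matches(html))
        {
            string result = match.Value; // exactly one link per iteration
            links.Add(result);
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample markup; in the question this would be htmlCode.
        string html =
            "<img src=\"https://test.com/images/a.jpg\"> " +
            "<img src=\"https://test.com/images/b.jpg\">";

        foreach (string link in Extract(html))
        {
            Console.WriteLine(link);
        }
    }
}
```

Because Regex.Matches scans the whole string, there is no need to read the HTML line by line, and several links on the same line are still returned separately.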
You're passing the web page's HTML to the File.ReadAllLines method as if it were a file name. You already have the HTML content in a string variable. Remove that line, and rename 'content' to 'htmlCode':
A regex to find everything starting with https://test.com/ and ending with .jpg could look like this:
The dot (.) is a special character in a regex, which matches any character (except a newline, by default). The * after the dot means 'zero or more of the preceding pattern'. The next dot, before the jpg extension, has to be escaped with a backslash because it's a special character. Note that when putting the pattern into a C# string literal, the backslashes themselves have to be escaped:
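To make the escaping concrete, here is a small sketch; the patterns and the sample line are assumptions for illustration, not the exact regex from this answer. It shows the same pattern written as a regular string literal (backslashes doubled) and as a verbatim @"..." literal (backslashes written once), and it compares a greedy .* with a lazy .*? on a line that contains two links. The dots in test.com are escaped here as well, which is stricter than the description above requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapingDemo
{
    // Regular literal: every regex backslash has to be doubled.
    public const string Escaped = "https://test\\.com/.*?\\.jpg";

    // Verbatim literal: backslashes are written once.
    public const string Verbatim = @"https://test\.com/.*?\.jpg";

    public static void Main()
    {
        // Both literals produce the same pattern text.
        Console.WriteLine(Escaped == Verbatim); // True

        // Made-up sample line with two links on it.
        string line =
            "<a href=\"https://test.com/1.jpg\">x</a>" +
            "<a href=\"https://test.com/2.jpg\">y</a>";

        // Greedy .* runs to the last ".jpg" on the line: one long match.
        Console.WriteLine(Regex.Matches(line, @"https://test\.com/.*\.jpg").Count); // 1

        // Lazy .*? stops at the first ".jpg": one match per link.
        Console.WriteLine(Regex.Matches(line, Verbatim).Count); // 2
    }
}
```

When a line can hold more than one link, the lazy form is usually the one that gives a separate result per link.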