I'm trying to parse an input file like this:
#*Nonmonotonic logic - context-dependent reasoning.
#@Victor W. Marek,Miroslaw Truszczynski
#t1993
#cArtificial Intelligence
#index3003478
#%3005567
#%3005568
#%3005569
#!abstracst
#*Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
#@Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
#t1993
#cArtificial Intelligence
#index3005557
#%3005567
#!abstracts2
I'm writing a parser for this file, and I'm looking for output like this:
Title: Nonmonotonic logic - context-dependent reasoning.
Author: Victor W. Marek,Miroslaw Truszczynski
Year: 1993
Domain: Artificial Intelligence
Index: 3003478
Citation: 3005567
Citation: 3005568
Citation: 3005569
Abstract: Abstract
Title: Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
Author: Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
Year: 1993
Domain: Artificial Intelligence
Index: 3005557
Citation: 3005567
Abstract: Abstract2
The code I've written so far is below, but it produces completely different output from what I expected, and I can't figure out why the Scanner reads the input this way. It seems to read only the first character of each line as the title, instead of the first line of each record.
I thought the Scanner might not be reading the “#” sign, but I may be wrong about that. To make the problem clear: if I print only the title, for example, the output I get is:
Title:*
Title:@
Title:t
Title:c
Title:i
Title:%
Title:!
Title:
Title:*
Title:@
Title:t
Title:c
Title:i
Title:i
Title:%
Title:!
Title:
Done.
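The one-character titles above are what Scanner produces when the delimiter regex can match an empty string. A minimal sketch, assuming a hypothetical helper tokens() that collects every token of one line for a given delimiter, makes the difference between "#*" and "#\\*" visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterDemo {
    // Hypothetical helper: collects every token the Scanner produces
    // for one line when the given regex is used as the delimiter.
    static List<String> tokens(String line, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "#*" means "zero or more '#'", so it also matches the empty string
        // between characters: the line falls apart into one-character tokens.
        System.out.println(tokens("#*Logic", "#*"));   // prints [*, L, o, g, i, c]
        // "#\\*" is the regex #\* : a literal '#' followed by a literal '*'.
        System.out.println(tokens("#*Logic", "#\\*")); // prints [Logic]
    }
}
```

The first token is "*" because the leading "#" is consumed as a delimiter, which matches the "Title:*" line in the output above.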
And if I try to print both the title and the author, the output I get is:
Title:*
Author:Nonmonotonic logic - context-dependent reasoning.
Title:@
Author:Victor W. Marek,Miroslaw Truszczynski
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3003478
Title:%
Author:
Title:!
Title:
Author:
Title:*
Author:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:@
Author:Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3005557
Title:i
Author:ndex3005557
Title:%
Author:
Title:!
Title:
Author:
Done.
The code is as follows:
import java.sql.*;
import java.util.Scanner;
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Citation {
    public static void main(String[] args) throws SQLException,
            ClassNotFoundException, IOException {
        Citation parser = new Citation("D:/test.txt");
        parser.processLineByLine();
        log("Done.");
    }

    public Citation(String aFileName) {
        fFilePath = Paths.get(aFileName);
    }

    public final void processLineByLine() throws IOException {
        try (Scanner scanner = new Scanner(fFilePath, ENCODING.name())) {
            while (scanner.hasNextLine()) {
                processLine(scanner.nextLine());
            }
        }
    }

    protected void processLine(String aLine) {
        Scanner scanner = new Scanner(aLine);
        scanner.useDelimiter("\n");
        while (scanner.hasNext()) {
            // Scanner scanner = new Scanner(aLine);
            scanner.useDelimiter("#*");
            if (scanner.hasNext()) {
                String title = scanner.next();
                System.out.println("Title:" + title);
            }
            // Scanner scanner3 = new Scanner(aLine);
            scanner.useDelimiter("#@");
            if (scanner.hasNext()) {
                String author = scanner.next();
                // System.out.println(author);
            }
            // Scanner scanner4 = new Scanner(aLine);
            scanner.useDelimiter("#t");
            if (scanner.hasNext()) {
                String year = scanner.next();
                // System.out.println(year);
            }
            // Scanner scanner5 = new Scanner(aLine);
            scanner.useDelimiter("#c");
            if (scanner.hasNext()) {
                String domain = scanner.next();
                // System.out.println(domain);
            }
            // Scanner scanner6 = new Scanner(aLine);
            scanner.useDelimiter("#index");
            if (scanner.hasNext()) {
                String index = scanner.next();
                // System.out.println(index);
            }
            // Scanner scanner7 = new Scanner(aLine);
            scanner.useDelimiter("#%");
            if (scanner.hasNext()) {
                String cite = scanner.next();
                // System.out.println(cite);
            }
            // Scanner scanner8 = new Scanner(aLine);
            scanner.useDelimiter("#!");
            if (scanner.hasNext()) {
                String abstracts = scanner.next();
                // System.out.println(abstracts);
            }
        }
    }

    // PRIVATE
    private final Path fFilePath;
    private final static Charset ENCODING = StandardCharsets.UTF_8;

    private static void log(Object aObject) {
        System.out.println(String.valueOf(aObject));
    }
}
When I changed the "#*" delimiter to "#\\*", the title was read correctly, but every other line was read as a title too; none of my other delimiters is detected. The output I got is:
Title:Nonmonotonic logic - context-dependent reasoning.
Title:#@Victor W. Marek,Miroslaw Truszczynski
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3003478
Title:#%
Title:#!
Title:
Title:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:#@Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3005557
Title:#index3005557
Title:#%
Title:#!
Title:
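This second output follows from how Scanner treats a delimiter that never appears in its input: the whole remaining input becomes a single token. Because processLine() receives one line at a time, every line that does not contain "#*" comes back unchanged as the "title". A minimal sketch, assuming a hypothetical helper firstToken() and using Pattern.quote("#*") as an equivalent way to write a literal "#*" delimiter:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class WholeLineDemo {
    // Hypothetical helper: returns the first token of 'line' using the
    // given delimiter regex, or null when the line has no token at all.
    static String firstToken(String line, String delimiterRegex) {
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter(delimiterRegex);
            return scanner.hasNext() ? scanner.next() : null;
        }
    }

    public static void main(String[] args) {
        // Pattern.quote("#*") matches the literal text "#*", like "#\\*".
        String titleDelimiter = Pattern.quote("#*");
        // The title line starts with the delimiter, so the prefix is skipped.
        System.out.println(firstToken("#*Nonmonotonic logic", titleDelimiter)); // prints Nonmonotonic logic
        // An author line never contains "#*", so nothing is split off:
        // the whole line, prefix included, comes back as one token.
        System.out.println(firstToken("#@Victor W. Marek", titleDelimiter));    // prints #@Victor W. Marek
    }
}
```

Note that once a custom delimiter is set, whitespace is no longer a delimiter, which is why the spaces inside the title survive.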
2 Answers
Assuming the file format isn’t changing soon, modify as below
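The problem is that you are using
scanner.useDelimiter("#*");
This method takes a regular expression, in which the * symbol means "zero or more occurrences of the preceding symbol" (in your case #). Because "#*" can also match the empty string, the Scanner treats the gap between every pair of characters as a delimiter, so each next() call returns a single character. Use
scanner.useDelimiter("#\\*");
instead. In a Java string literal the backslash itself has to be escaped, so the regex #\* is written "#\\*".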
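Escaping the "*" fixes the title, but since each call to processLine() only ever sees one line, switching delimiters cannot tell the fields apart. A different technique (plain prefix matching with String.startsWith instead of Scanner delimiters, not what either answer proposes) produces the requested "Label: value" format directly. This is a sketch under the assumption that every field sits on its own line and starts with one of the seven prefixes shown in the question; the class name CitationParser and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CitationParser {
    // Maps each line prefix to the label printed in front of its value.
    // A LinkedHashMap keeps the lookup order predictable.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("#index", "Index");
        LABELS.put("#*", "Title");
        LABELS.put("#@", "Author");
        LABELS.put("#t", "Year");
        LABELS.put("#c", "Domain");
        LABELS.put("#%", "Citation");
        LABELS.put("#!", "Abstract");
    }

    // Formats one input line as "Label: value", or returns null when the
    // line does not start with any known prefix (e.g. a blank line).
    static String formatLine(String line) {
        for (Map.Entry<String, String> entry : LABELS.entrySet()) {
            String prefix = entry.getKey();
            if (line.startsWith(prefix)) {
                return entry.getValue() + ": " + line.substring(prefix.length()).trim();
            }
        }
        return null;
    }

    // Formats every recognised line and skips the rest.
    static List<String> parse(List<String> lines) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            String formatted = formatLine(line);
            if (formatted != null) {
                output.add(formatted);
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        // Reads the file named on the command line (UTF-8 assumed);
        // without an argument, falls back to a short built-in sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("#*Nonmonotonic logic - context-dependent reasoning.",
                                "#t1993", "#index3003478", "#%3005567");
        for (String out : parse(lines)) {
            System.out.println(out);
        }
    }
}
```

Run with the input file's path as the argument (for example D:/test.txt). Values are copied verbatim after the prefix, so the abstract line prints whatever text follows "#!".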