Parser using Scanner not working for "#" sign - Artificial Intelligence

fuschia
October 25, 2014
2 Answers

I'm trying to parse an input file that looks like this:

#*Nonmonotonic logic - context-dependent reasoning.
#@Victor W. Marek,Miroslaw Truszczynski
#t1993
#cArtificial Intelligence
#index3003478
#%3005567
#%3005568
#%3005569
#!abstracst

#*Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
#@Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
#t1993
#cArtificial Intelligence
#index3005557
#%3005567
#!abstracts2

I'm writing a parser for this file and I'm looking for output like the following:

Title: Nonmonotonic logic - context-dependent reasoning.
Author: Victor W. Marek,Miroslaw Truszczynski
Year: 1993
Domain: Artificial Intelligence
Index: 3003478
Citation: 3005567
Citation: 3005568
Citation: 3005569
Abstract: Abstract

Title: Wissensrepräsentation und Inferenz - eine grundlegende Einführung.
Author: Wolfgang Bibel,Steffen Hölldobler,Torsten Schaub
Year: 1993
Domain: Artificial Intelligence
Index: 3005557
Citation: 3005567
Abstract: Abstract2

The code I have written so far is below, but it produces output that is totally different from what I expected, and I cannot figure out why the scanner reads the file the wrong way. It seems to read only the first character of each line as the title, rather than the first line of every record.
I'm thinking that maybe the scanner cannot read the "#" sign, but I might be wrong about that. To make it clear what's wrong: if I only print the title, for example, the output I get is

Title:*
Title:@
Title:t
Title:c
Title:i
Title:%
Title:!
Title: 
Title:*
Title:@
Title:t
Title:c
Title:i
Title:i
Title:%
Title:!
Title:
Done.

And if I print both the title and the author, the output I get is as follows:

Title:*
Author:Nonmonotonic logic - context-dependent reasoning.
Title:@
Author:Victor W. Marek,Miroslaw Truszczynski
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3003478
Title:%
Author: 
Title:!
Title: 
Author: 
Title:*
Author:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:@
Author:Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:t
Author:1993
Title:c
Author:Artificial Intelligence
Title:i
Author:ndex3005557
Title:i
Author:ndex3005557
Title:%
Author: 
Title:!
Title: 
Author: 
Done.

The code is as follows:

import java.sql.*;
import java.util.Scanner;
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Citation{

    public static void main (String[] args) throws SQLException,
    ClassNotFoundException, IOException{

        Citation parser = new Citation("D:/test.txt");
        parser.processLineByLine();
        log("Done.");

    }

     public Citation(String aFileName){
         fFilePath = Paths.get(aFileName);
     }

     public final void processLineByLine() throws IOException {
         try (Scanner scanner =  new Scanner(fFilePath, ENCODING.name())){
              while (scanner.hasNextLine()){
                  processLine(scanner.nextLine());
              }      
            }
     }

     protected void processLine(String aLine){


                Scanner scanner = new Scanner(aLine);
                scanner.useDelimiter("n");

                while(scanner.hasNext()){

                //   Scanner scanner = new Scanner(aLine);
                     scanner.useDelimiter("#*");
                     if(scanner.hasNext()){
                         String title = scanner.next();
                         System.out.println("Title:" + title);

                     }

                //   Scanner scanner3 = new Scanner(aLine);
                     scanner.useDelimiter("#@");
                     if(scanner.hasNext()){
                         String author = scanner.next();
                //       System.out.println(author);
                     }

                //   Scanner scanner4 = new Scanner(aLine);
                     scanner.useDelimiter("#t");
                     if(scanner.hasNext()){
                         String year = scanner.next();
                //       System.out.println(year);
                     }
                //   Scanner scanner5 = new Scanner(aLine);
                     scanner.useDelimiter("#c");
                     if(scanner.hasNext()){
                         String domain = scanner.next();
                    //   System.out.println(domain);

                     }
                //   Scanner scanner6 = new Scanner(aLine);
                     scanner.useDelimiter("#index");
                     if(scanner.hasNext()){
                         String index = scanner.next();
                        // System.out.println(index);
                     }               
                //   Scanner scanner7 = new Scanner(aLine);
                     scanner.useDelimiter("#%");
                     if(scanner.hasNext()){
                         String cite = scanner.next();
                    //   System.out.println(cite);

                     }
                //   Scanner scanner8 = new Scanner(aLine);
                     scanner.useDelimiter("#!");
                     if(scanner.hasNext()){
                         String abstracts = scanner.next();
                        // System.out.println(abstracts);

                     }



                }





          }

          // PRIVATE 
          private final Path fFilePath;
          private final static Charset ENCODING = StandardCharsets.UTF_8;  

          private static void log(Object aObject){
            System.out.println(String.valueOf(aObject));
          }


        }

When I changed the "#*" delimiter to "#\\*", the title was read correctly, but then every other line was printed as a title as well. My other delimiters are not detected. The output I got is as follows:

Title:Nonmonotonic logic - context-dependent reasoning.
Title:#@Victor W. Marek,Miroslaw Truszczynski
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3003478
Title:#% 
Title:#!
Title:  
Title:Wissensrepr?sentation und Inferenz - eine grundlegende Einf?hrung.
Title:#@Wolfgang Bibel,Steffen H?lldobler,Torsten Schaub
Title:#t1993
Title:#cArtificial Intelligence
Title:#index3005557
Title:#index3005557
Title:#% 
Title:#!
Title:

Tags: java parsing

Answers

Assuming the file format isn't changing soon, you can test each line's prefix directly instead of relying on Scanner delimiters. Modify processLine as below:

protected void processLine(String aLine) {
   if (aLine.trim().equals("")) {
       System.out.println();//executed when an empty line is read
   }
   else if (aLine.startsWith("#*")) {
      System.out.println("Title:" + aLine.substring(2)); //or, you can also do
      //System.out.println("Title:" + aLine.substring("#*".length()));
   } else if (aLine.startsWith("otherCases")) {
      //proceed for other cases in similar fashion.
   }
   // ...
}
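To make the "other cases" concrete, here is a minimal sketch that handles every prefix shown in the question. The class name PrefixLineParser, the method formatLine, and the lookup map are hypothetical names made up for illustration; the prefixes and labels are taken from the sample input and the expected output in the question. Lines with an unknown prefix are returned unchanged, which is an assumption about what you would want.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrefixLineParser {

    // Maps each line prefix from the sample file to its output label.
    // LinkedHashMap keeps insertion order, so the longest prefix ("#index")
    // is checked first.
    private static final Map<String, String> LABELS = new LinkedHashMap<>();
    static {
        LABELS.put("#index", "Index");
        LABELS.put("#*", "Title");
        LABELS.put("#@", "Author");
        LABELS.put("#t", "Year");
        LABELS.put("#c", "Domain");
        LABELS.put("#%", "Citation");
        LABELS.put("#!", "Abstract");
    }

    // Returns "Label: value" for a known prefix, an empty string for a
    // blank line, and the line unchanged for anything else (assumption).
    static String formatLine(String line) {
        if (line.trim().isEmpty()) {
            return "";
        }
        for (Map.Entry<String, String> entry : LABELS.entrySet()) {
            String prefix = entry.getKey();
            if (line.startsWith(prefix)) {
                // Plain string comparison: no regular expressions involved.
                return entry.getValue() + ": " + line.substring(prefix.length());
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String[] sample = {
            "#*Nonmonotonic logic - context-dependent reasoning.",
            "#@Victor W. Marek,Miroslaw Truszczynski",
            "#t1993",
            "#cArtificial Intelligence",
            "#index3003478",
            "#%3005567",
            "#!abstracst",
            ""
        };
        for (String line : sample) {
            System.out.println(formatLine(line));
        }
    }
}
```

In the original class you could call formatLine(aLine) from processLine and print the result. Because startsWith compares plain strings, characters such as * need no escaping here.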

- StasKolodyuk
- October 26, 2014 at 12:27 am
- 0 votes
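The problem is that you are using scanner.useDelimiter("#*");. This method takes a regular expression, in which the * symbol means zero or more occurrences of the preceding symbol (in your case #). Such a pattern also matches the empty string, which would explain the single-character tokens in your output. So escape the asterisk and use scanner.useDelimiter("#\\*"); in your case (the backslash is doubled because it sits inside a Java string literal).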
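A small sketch to see the difference between the delimiters yourself. The class DelimiterDemo and the helper tokens are hypothetical names used only for this illustration; Scanner.useDelimiter and Pattern.quote are the standard library calls. Pattern.quote is shown as an alternative to escaping by hand, since it turns the whole string into a literal pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterDemo {

    // Collects every token a Scanner yields for the given input
    // when the given regular expression is used as the delimiter.
    static List<String> tokens(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "#*Title";

        // Unescaped: "zero or more '#'", so the first token is just "*",
        // as in the output shown in the question.
        System.out.println(tokens(line, "#*"));

        // Escaped asterisk: the delimiter is the literal text "#*".
        System.out.println(tokens(line, "#\\*"));

        // Same effect, letting Pattern.quote do the escaping.
        System.out.println(tokens(line, Pattern.quote("#*")));
    }
}
```

Note that this only fixes the title delimiter. Lines that do not start with "#*" contain no delimiter at all, so each of them still comes back as a single token.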
