Separating ASCII text from binary content in a file

2022-12-20 03:27 Q&A author:

I have a file that contains both ASCII text and binary content. I would like to extract the text without having to parse the binary content, since the binary part is 180 MB. Can I simply extract the text for further manipulation? What would be the best way of going about it?

The ASCII is at the very beginning of the file.

There are 4 libraries to read FITS files in Java here:

Java

nom.tam.fits classes

A Java FITS library has been developed which provides efficient -- at least for Java -- I/O for FITS images and binary tables. The Java libraries support all basic FITS formats and gzip compressed files. Support for access to data subsets is included and the HIERARCH convention may be used.

eap.fits

Includes an applet and application for viewing and editing FITS files. Also includes a general purpose package for reading and writing FITS data. It can read PGP encrypted files if the optional PGP jar file is available.

jfits

The jfits library supports FITS images and ASCII and binary tables. In-line modification of keywords and data is supported.

STIL

A pure Java general-purpose table I/O library which can read and write FITS binary tables, amongst other table formats. It is efficient and can provide fast sequential or random read access to FITS tables much larger than physical memory. There is no support for FITS images.

I am not aware of any Java classes that will read the ASCII characters and ignore the rest, but the easiest thing I can come up with here is to use the strings utility (assuming you are on a Unix-based system).

SYNOPSIS
    strings [ - ] [ -a ] [ -o ] [ -t format ] [ -number ] [ -n number ] [--] [file ...]

DESCRIPTION
    Strings looks for ASCII strings in a binary file or standard input. Strings is useful for identifying random object files and many other things. A string is any sequence of 4 (the default) or more printing characters ending with a newline or a null. Unless the - flag is given, strings looks in all sections of the object files except the (__TEXT,__text) section. If no files are specified standard input is read.

You could then redirect the output to another file and do whatever you want with it.
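If a shell tool is not an option, the same idea is easy to approximate in Java. The following is only a minimal sketch, not a reimplementation of strings: the class name AsciiRuns and the method extract are made up for illustration, and it treats any non-printable byte as the end of a run rather than only a newline or a null.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, loosely modelled on what strings does.
class AsciiRuns {
    /** Returns every run of at least minLength printable ASCII bytes in the stream. */
    static List<String> extract(InputStream in, int minLength) throws IOException {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b >= 0x20 && b <= 0x7E) {
                // Printable ASCII: extend the current run.
                current.append((char) b);
            } else {
                // Anything else ends the run; keep it only if it is long enough.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // A run may extend to the very end of the input.
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to scan.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            for (String run : extract(in, 4)) {
                System.out.println(run);
            }
        }
    }
}
```

Like strings, this scans the whole file, so for a 180 MB input it will also report stray printable runs inside the binary part. Printing each run as it is found, instead of collecting them in a list, would keep memory use flat.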

Edit: with the additional information that all the ASCII comes at the beginning, it would be a little easier to extract the text programmatically. Still, running strings is faster than writing code.

Assuming you can tell where the end of the ASCII content is, just read characters from the file until you find the end of it, and close the file.

Supposing that there is some token which divides the file into the ASCII and binary components (say, "#END#" on a line all by itself), you can do something like the following:

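import java.io.*;
import java.nio.charset.StandardCharsets;

// ...

public static void main(String[] args) {
  // try-with-resources closes the reader (and the underlying file) on exit.
  try (BufferedReader b = new BufferedReader(new InputStreamReader(
      new FileInputStream("object.bin"), StandardCharsets.US_ASCII))) {

    String s;
    // Stop at the delimiter line, or at end of file (readLine() returns null).
    // Strings must be compared with equals(); != only compares references.
    while ((s = b.readLine()) != null && !s.equals("#END#")) {
      // ASCII contents parsed here.
      System.out.println(s);
    }
  } catch (IOException e) {
      System.err.println("kablammo! " + e.getMessage());
  }
}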

Have a method that checks whether a particular character meets your criteria (here, I've covered characters that are found on the keyboard). Once you hit a character for which the method returns false, you know you've hit the binary. Note that valid ASCII characters may also form part of the binary so you may end up with a few extra characters at the end.

static boolean isAsciiCharacter(char c) {
    return (c >= ' ' && c <= '~') ||
            c == '\n' ||
            c == '\r';
}
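To show how that check might be used, here is a rough sketch that reads bytes from the start of the file and stops at the first one the method rejects. The class name LeadingAscii and the method readLeadingAscii are invented for this example, and "object.bin" is just a placeholder file name.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper around the isAsciiCharacter check.
class LeadingAscii {
    // Same test as the method above: printable ASCII plus CR and LF.
    static boolean isAsciiCharacter(char c) {
        return (c >= ' ' && c <= '~') ||
                c == '\n' ||
                c == '\r';
    }

    /** Collects bytes from the start of the stream until one fails the check. */
    static String readLeadingAscii(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        int b;
        // read() returns -1 at end of input, otherwise a value in 0..255.
        while ((b = in.read()) != -1 && isAsciiCharacter((char) b)) {
            text.append((char) b);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file name; only the leading text is ever read.
        try (InputStream in = new BufferedInputStream(new FileInputStream("object.bin"))) {
            System.out.print(readLeadingAscii(in));
        }
    }
}
```

Because reading stops at the first rejected byte, the rest of the 180 MB is never touched. The caveat above still applies: if the binary section happens to begin with printable bytes, they will be appended to the result.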

The first 2880 bytes of a FITS file are ASCII header data, representing 36 80-column "card images". There are no line terminator characters, just a 36x80 ASCII array, padded out with blanks if necessary. There may be additional 2880-byte ASCII headers preceding the binary data; you'd have to parse the first set of headers to know how much ASCII to expect.
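If you do want to pull out just the header by hand, something along these lines should work. It is a sketch under one assumption not spelled out above: that the header follows the standard FITS convention of ending with a card whose keyword is END. The class name FitsHeaderCards and the method readPrimaryHeader are invented for illustration, and only the first (primary) header is read.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the leading ASCII header of a FITS file.
class FitsHeaderCards {
    static final int CARD = 80;     // one 80-column card image
    static final int BLOCK = 2880;  // 36 cards per header block

    /** Reads 2880-byte blocks until a card with keyword END; returns the cards before it. */
    static List<String> readPrimaryHeader(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<String> cards = new ArrayList<>();
        byte[] block = new byte[BLOCK];
        while (true) {
            // readFully throws EOFException if the input ends before a full block.
            data.readFully(block);
            for (int off = 0; off < BLOCK; off += CARD) {
                String card = new String(block, off, CARD, StandardCharsets.US_ASCII);
                // Assumption: the keyword sits in the first 8 columns of each card.
                if (card.substring(0, 8).trim().equals("END")) {
                    return cards;
                }
                cards.add(card);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the FITS file.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            for (String card : readPrimaryHeader(in)) {
                System.out.println(card);
            }
        }
    }
}
```

Since there are no line terminators, the cards are sliced by position rather than read with readLine(). A file with no END card makes this fail with an EOFException instead of returning garbage.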

But I heartily endorse Oscar Reyes' advice to use an existing package to decode FITS files! Two of the packages he mentioned are hosted by NASA's Goddard Space Flight Center, who are also responsible for maintaining the FITS format. That's about as definitive a source as you can get.
