Example 10.1: Read a file byte by byte



More and more makers of electronic devices use standard storage media to record data. Sometimes this is central to the device’s function, as in a camera, so that the data must be easy to recover. Other times, it’s effectively incidental, and the device maker may not provide easy access to the stored data.

For example, I was recently prescribed a continuous positive airway pressure (CPAP) machine for sleep apnea. My machine, made by Philips, records data onto an SD card. The card’s file system is readable, but Philips provides neither software to read the files nor a data dictionary to explain what they contain. (I believe Philips sells software that does read it, for ludicrous prices, to physicians who prescribe their machines. Nice racket.)

If you open the files on the card as ASCII files, you get a bunch of gobbledygook. But the data they contain is mine, in the most fundamental sense! I want to be able to read it. Fortunately, some folks have done a fair amount of work to reverse engineer the files. Through them, I was able to find some guidance for data from a related machine. Now I know what’s in the file, more or less: a header of 25 bytes, then 1200 bytes of data (representing 4 minutes of recording), and a one-byte footer, repeated ad nauseam (ad somnum? somno contingit?).
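
The layout just described implies a fixed block size and sampling rate. Here is a minimal sketch of that arithmetic in R. It takes the 25/1200/1 byte counts and the 4-minute duration from the description above and assumes each data byte is one measurement; the variable names are made up for illustration.

```r
# Byte counts as described in the reverse-engineering guidance.
header_bytes <- 25
data_bytes   <- 1200
footer_bytes <- 1

# One full header-data-footer block.
block_bytes <- header_bytes + data_bytes + footer_bytes

# Assumption: each data byte is one measurement, and a block covers 4 minutes.
minutes_per_block <- 4
per_minute <- data_bytes / minutes_per_block
per_second <- per_minute / 60

cat("bytes per block:", block_bytes, "\n")
cat("measurements per minute:", per_minute, "\n")
cat("measurements per second:", per_second, "\n")
```

The block size and the per-minute rate are the two constants that the SAS and R code later in this entry rely on.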

Today we show how to read bytes stored in a file as signed integers. For this file (download), we also trim out the header and footer and make a simple line plot of the recorded data, which appear to be some function of the variable pressure with which the CPAP machine outputs air. Next time we’ll make a more useful plot.

SAS

In SAS, we can use the infile statement to read in the data (section 1.1.2).


data test;
infile "c:\ken\cpap\0000000007.005" recfm=n;
input byte ib1. @@;
run;

The recfm=n option tells SAS (for Windows; this may differ under other operating systems) to read the file as binary. The ib1. informat tells SAS to read each byte as a one-byte integer in the native format. (We cover reading in various formats in sections 1.1.3 and A.6.4, and in several examples.) The @@ tells SAS to hold this line of input, rather than skipping to a new line, after each value is read (see Example 8.1). SAS will read bytes until there are no more to read.

Now I have a data set with 128,680 observations, each being a signed integer. Some of these are actually nonsense, since the header and footer contain data stored in a variety of formats. To get rid of the header and footer, we’ll use the _n_ implied variable (section 1.4.15), which is effectively the observation number, in conjunction with the mod function. While we’re processing the data set anyhow, we’ll also figure out the total elapsed time, which will be useful for plotting.


data cycles;
  set test;
  if mod(_n_,1226) ge 25 and mod(_n_,1226) lt 1225;
  /* otherwise it's a header or the footer */
  time_min = (4 * int(_n_/1226)) + (mod(_n_, 1226) - 24)/300;
  /* 4 minutes for each header-data-footer block +
     number of measurements in this data block / 300
     (measurements per minute). */
run;

Now it’s easy to plot the data. A simple connected line plot across time makes sense, and can be made using the symbol statement with the i=j (j for join) syntax. The result is shown above.


symbol i = j v = none c = black;
proc gplot data = cycles;
where time_min le 4;
plot byte * time_min;
run;
quit;

R

In R, we’ll use the readBin() function to actually read the file, but we need to do a little prep first. The readBin() function requires that we specify the number of data elements to read. An overestimate is OK, but we can easily find the exact length of the file using the file.info() function; the resulting object has a size component with the number of bytes. We’ll also need a “connection” to the file, which is established in a call to the file() function.


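finfo = file.info("0000000007.005")
toread = file("0000000007.005", "rb")   # open a binary read connection
alldata = readBin(toread, integer(), size = 1, n = finfo$size, endian = "little")
close(toread)   # release the connection once the bytes are read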

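The size option is the length of each element, in bytes, and the endian option gives the byte order used within multi-byte elements. With size=1 each element is a single byte, so the endian setting has no effect on the values read here.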
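
To see what reading a byte as a signed integer means in practice, here is a small self-contained sketch. It writes five known bytes to a temporary file (a stand-in for illustration, not the real CPAP data) and reads them back with readBin(), once with the default signed interpretation and once with signed = FALSE.

```r
# Write five known bytes to a temporary file (illustrative data only).
tmp <- tempfile()
writeBin(as.raw(c(0, 1, 127, 128, 255)), tmp)

# Read them back as one-byte signed integers, as in the readBin() call above.
con <- file(tmp, "rb")
signed_vals <- readBin(con, integer(), size = 1, n = 5)
close(con)

# Read them again as unsigned bytes for comparison.
con <- file(tmp, "rb")
unsigned_vals <- readBin(con, integer(), size = 1, n = 5, signed = FALSE)
close(con)

print(signed_vals)    # values: 0 1 127 -128 -1
print(unsigned_vals)  # values: 0 1 127 128 255

unlink(tmp)
```

Bytes of 128 and above come back negative under the signed reading, so the choice matters whenever the recorded values use the top half of the byte range.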

As in SAS, the alldata vector has 128,680 integers. All that remains is to remove the headers and footers. We’ll do that by making a logical test with the %% operator, saving the result as a vector, and using this logical vector to select the data from among the headers and footers. Then we plot the data, replicating the SAS plot of the first 1200 observations (4 minutes).


keep = 1:finfo$size %% 1226 > 24 & 1:finfo$size %% 1226 < 1225
cycles = alldata[keep]
# cycles gets only the elements of alldata when the corresponding
# element of keep is TRUE

plot(1:1200, cycles[1:1200], type = "l")

The result is shown below. The first four minutes of my night’s sleep were apparently characterized by generally lengthening breaths that became increasingly shallow.
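
The SAS code also computes elapsed time for the horizontal axis, while the R plot uses the observation index. A rough R analogue is sketched below. It assumes the 128,680-byte file length reported above and reuses the same %% test and the same time formula as the SAS data step. The names n_bytes, pos, kept_pos, time_min, and kept_per_block are introduced here for illustration, and no file is read.

```r
# File length in bytes, as reported in the post (no file is read here).
n_bytes <- 128680

# 1-based position of every byte, and the same keep test used above.
pos  <- seq_len(n_bytes)
keep <- pos %% 1226 > 24 & pos %% 1226 < 1225

# Positions of the bytes that survive the header/footer trim.
kept_pos <- pos[keep]

# Same formula as the SAS data step: 4 minutes per completed block,
# plus the offset within the current block at 300 measurements per minute.
time_min <- 4 * (kept_pos %/% 1226) + (kept_pos %% 1226 - 24) / 300

# Sanity check: count how many bytes are kept from each 1226-byte block.
kept_per_block <- tabulate((kept_pos - 1) %/% 1226 + 1)

print(head(time_min, 3))
print(range(kept_per_block))
print(max(time_min) / 60)   # total recording length in hours
```

With the real data in hand, plot(time_min[1:1200], cycles[1:1200], type = "l") should put minutes rather than observation numbers on the horizontal axis.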



An unrelated note about aggregators

We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.

Source: SAS and R
