I propose to add two function to multiarray module in Numeric:
tofile(array, openedfile)
array = fromfile(opendfile, dimension, typecode = "l")
Now I use these functions in my module.
Here is an implementation from module:
###############################################################################3
#include "Python.h"
#define PY_ARRAY_UNIQUE_SYMBOL PyArray_arrayfile
#include "Numeric/arrayobject.h"
#define MAX_DIMS 40
static PyObject *
PyArray_FromFile(PyObject *self, PyObject *args)
{
PyObject *f, *dimobj, *temp;
PyArray_Descr *descr;
PyArrayObject *ret = NULL;
int i, n, dim[40], nt;
char *typecode = "l";
FILE *fp;
int type_num, N = 1;
size_t nread;
if (!PyArg_ParseTuple(args, "O!O!|s#:fromfile", \
&PyFile_Type, &f, &PyTuple_Type, &dimobj, &typecode, &nt))
return NULL;
fp = PyFile_AsFile(f);
if (fp == NULL) {
PyErr_SetString(PyExc_TypeError, "1st argument must be open file");
return NULL;
}
n = PyTuple_Size(dimobj);
if (n > MAX_DIMS) {
PyErr_SetString(PyExc_TypeError, "dimension is too large");
return NULL;
}
if (n > 0) {
for (i = 0; i < n; i++) {
temp = PyTuple_GetItem(dimobj, i);
dim[i] = (int)PyInt_AsLong(temp);
}
descr = PyArray_DescrFromType(*typecode);
type_num = descr -> type_num;
ret = (PyArrayObject *)PyArray_FromDims(n, dim, type_num);
memcpy(ret->data, descr->zero, N*descr->elsize);
N = 1;
for (i = 0; i < n; i++) N *= dim[i];
nread = fread((char *)ret->data, descr->elsize, N, fp);
if (nread < (size_t)N) {
PyErr_SetString(PyExc_EOFError, "not enough items in file");
return NULL;
}
}
return (PyObject *)ret;
}
static char PyArray_FromFile_doc [] =
"fromfile(f, dimension, typecode = \'l\')\n\
\n\
Create array from open file f with given dimension and typecode.\n\
Note, that file must be open in binary mode.";
static PyObject *
PyArray_ToFile(PyObject *self, PyObject *args)
{
PyObject *f;
PyArrayObject *A;
int N;
FILE *fp;
if (!PyArg_ParseTuple(args, "O!O!:tofile", &PyArray_Type, &A, &PyFile_Type, &f))
return NULL;
fp = PyFile_AsFile(f);
if (fp == NULL) {
PyErr_SetString(PyExc_TypeError, "arg must be open file");
return NULL;
}
N = PyArray_SIZE(A);
if (N > 0) {
if (fwrite(A->data, A->descr->elsize, N, fp) != (size_t)N) {
PyErr_SetFromErrno(PyExc_IOError);
clearerr(fp);
return NULL;
}
}
Py_INCREF(Py_None);
return Py_None;
}
static char PyArray_ToFile_doc [] =
"tofile(array, f)\n\
\n\
Write array to file open f.\n\
Note, that file must be open in binary mode.";
static PyMethodDef arrayfile_module_methods[] = {
{"fromfile", PyArray_FromFile, 1, PyArray_FromFile_doc},
{"tofile", PyArray_ToFile, 1, PyArray_ToFile_doc},
{NULL, NULL} /* sentinel */
};
############################################################################
Zaur Shibzoukhov

Hmmm, aparently, I can cast a long array to a double array but not
to a float array.
Check out the following code from arrayobject.c (Numeric 22.0)
[this is a switch on the type being casted from]
case PyArray_LONG:
/*Shouldn't allow Longs to be cast to Ints, not safe on 64-bit
machines!*/
return (totype >= PyArray_INT) && (totype != PyArray_FLOAT);
This makes absolutely no sense to me. Is this a bug?
Mathew
--
M47h3w `/34735

The following code produces an error
---------------------------------------
a=array(([1,2,3,4,5],[10,9,8,7,6])).astype(Float32)
cond=array(([0,0,0,0,0],[1,1,1,1,1])
b=array(([0,0,0,0,0],[0,0,0,0,0]))
choose(cond,(a,b))
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: Array can not be safely cast to required type
------------------------------
I suppose I might need to cast b to Float32. But, I have a function
which is passed a and b
as args. Ideally, I would like a function which casts a and b to the
highest compatible type since
"choose" doesnt do it. How can I do this simply?
Mathew
--
M47h3w `/34735

[I posted this almost a week ago, but apparently an email problem
prevented it from actually getting posted!]
I think there has been sufficient discussion about rank-0 arrays
to make some decisions about how numarray will handle them.
[If you don't want to wade through the rationale, jump to the
end where there is a short summary of what we plan to do and
what we have questions about]
********************************************************************
First I'd like to take a stab at summarizing the case made for
rank-0 arrays in general and adding some of my own comments
regarding these points.
1) rank-0 arrays are a useful mechanism to avoid having binary
operations with scalars cause unintended promotion of other arrays
to larger numeric types (e.g. 2*arange(10, typecode=Int16) results
in an Int32 result).
*** For numarray this is a non-issue because the coercion rules
prevent scalars from increasing the type of an array if the scalar
is the same kind of number (e.g., Int, Float, Complex) as the
array.
2) rank-0 arrays preserve the type information instead of converting
scalars to Python scalars.
*** This seems of limited value. With only a couple possible exceptions
in the future (and none now), Python scalars are effectively the
largest type available so no information is lost. One can convert
to and from Python scalars and not lose any information. The possible
future exceptions are long doubles and UInt32 (supported in Numeric,
but not numarray yet--but frankly, I'm not yet sure how important
UInt32 is at the moment). It is possible that Python scalars may
move up in size so this may or may not become an issue in the future.
By itself, it does not appear to be a compelling reason.
3) rank-0 arrays allow controlling exceptions (e.g. divide by zero)
in a way different from how Python handles them (exception always)
*** This is a valid concern...maybe. I was more impressed by it
initially, but it occurred to me that most expressions that involve
a scalar exception (NaN, divide-by-zero, overflow, etc.) generally
corrupt everything, unlike exceptions with arrays where only a few
values may be tainted. Unless one is only interested in ignoring some
scalar results in a scalar expression as part of a larger computation,
it seems of very limited use to ignore, or warn on scalar exceptions.
In any event, this is really of no relevance to the use of rank-0
for indexed results or reduction operations.
4) Using rank-0 arrays in place of scalars would promote more
generic programming. This was really the last point of real contention
as far as I was concerned. In the end, it really came down to seeing
good examples of how lacking this caused code to be much worse than
it could be with rank-0 arrays. There really are two cases being
discussed: whether indexing a single item ("complete dereferencing"
in Scott Gilbert's description) returns rank-0, and whether reduction
operations return rank-0.
*** indexing returning rank-0. Amazingly enough no one was able to
provide even one real code example of where rank-0 returns from indexing
was a problem (this includes MA as far as I can tell). In the end, this
has been much ado about nothing. Henceforth, numarray will return
Python scalars when arrays are indexed (as it does currently).
*** reduction operations. There are good examples of where reduction
operations returning rank-0 are made simpler. However, the situation
is muddied quite a bit by other issues which I will discuss below.
This is an area that deserves a bit more discussion in general.
But before I tackle that, there is a point about rank-0 arrays that
needs to be made which I think is in some respects is an obvious point,
but somehow got lost in much of the discussions.
Even if it were true that rank-0 arrays made for much simpler, generic
code, they are far less useful than might appear in simplifying
code. Why? Because even if array operations (whether indexing, reduction
or other functions) were entirely consistent about never returning
scalars, it is a general fact that most Numeric/numarray code must
be prepared to handle Python scalars thrown at it in place of
arrays by the user. Since Python scalars can come leaking into
your code at many points, consistency in Numeric/numarray in avoiding
Python scalars really doesn't solve the issue. I would hazard a
guess that the great majority of the conditional code that exist
today would not be eliminated because of this (e.g., this appears
to be the case for MA)
Reduction operations:
There is a good case to be made that reduction operations should
result in rank-0 arrays rather than scalars (after all, they
are reducing dimensions), but not everyone agrees that is what
should be done. But before deciding what is to be done there,
some problems with rank-0 arrays should be discussed. I think
Konrad and Huaiyu have made very powerful arguments about how
certain operations like indexing, attributes and such should or
shouldn't work. In particular, some current Numeric behaviors
should be changed. Indexing by 0 should not work (instead Scott
Gilbert's suggestion of indexing with an empty tuple sounds
right, if a bit syntatically clumsy due to Python not accepting
an empty index). len should not return 1, etc.
So even if we return rank-0 values in reduction operations, this
appears to still cause problems with some of the examples given
by Eric that depend on len(rank-0) = 1. What should be done
about that? One possibility is to use different numarray
functions designed to help write generic code (e.g., an alternate
function to len). But there is one aspect to this that ought to be
pointed out. Some have asked for rank-0 arrays that can be indexed
with 0 and produce len of 1. There is such an object that does this
and it is a rank-1 len-1 array. One alternative is to have reduction
operations have as their endpoint a rank-1 len-1 array rather than a
rank-0 array. The rank-0 endpoint is more justified conceptually,
but apparently less practical. If a reduction on a 1-d arrray always
produced a 1-d array, then one can always be guaranteed that it can
be indexed, and that len works on it. The drawback is that it can
never be used as a scalar directly as rank-0 arrays could be.
I think this is a case where you can't have it both ways. If you
want a scalar-like object, then some operations that work on higher
rank arrays won't work on it (or shouldn't). If you want something
where these operations do work, don't expect to use it where a
scalar is expected unless you index it.
Is there any interest in this alternate approach to reductions? We plan
to have two reduction methods available, one that results in scalars, and
one in arrays. The main question is which one .reduce maps to, and what
the endpoint is for the method that always returns arrays is.
***********************************************************************
SUMMARY
***********************************************************************
1) Indexing returns Python scalars in numarray. No rank-0 arrays
are ever returned from indexing.
2) rank-0 arrays will be supported, but len(), and indexing will not
work as they do in Numeric. In particular, to get a scalar, one
will have to index with an empty tuple (e.g., x[()], x[0] will
raise an exception), len() will return None.
Questions:
1) given 2, is there still a desire for .reduce() to return
rank-0 arrays (if not, we have .areduce() which is intented to return
arrays always).
2) whichever is the "returns arrays always" reduce method, should the
endpoint be rank-0 arrays or rank-1 len-1 arrays?

I send this on behalf of Cameron Laird <claird(a)phaseit.net>.
Please reply to him, not to me.
I have at least a couple of assignments from magazines such
as IBM's developerWorks to report on matters that involve
Numeric. I'd welcome contact from anyone here who wants to
publicize his or her work with Python and Numeric.
I have a particular interest in advantages Python and Numeric
enjoy over such alternatives as Mathematica, IDL, SAS/IML,
MATLAB, and so on, all of which are more narrowly targeted at
the kinds of scientific and engineering problems tackled by
contributors to this mailing list. What does Python do for
you that the commercial products don't?
I suspect that many of you will mention, in one form or
another, Python's aptness for programming "in the large".
Do you have specific examples of how this is clumsy in
MATLAB, Mathematica, and so on?
Have you tried to interface MATLAB and so on to hardware
instrumentation or other external data sources?
How do the scientists and engineers (as opposed to the
"informaticians" or software developers) on your teams
accept Python, compared to IDL and friends? Do scientists
at your site program?
Is there anything Python's missing in its competition with
MATLAB and so on?
Cameron Laird <Cameron(a)Lairds.com> +1 281 996 8546 FAX
http://phaseit.net/claird/misc.writing/publications.html

Some of the choices between rank-0 arrays and new scalar types
might be resolved by enumerating the properties desired of them.
Most properties of rank-0 arrays could be fixed by consistency
requirements alone, using operations that reduce array dimensions.
Let a = ones((2,3,4))
b = sum(a)
c = sum(b)
d = sum(c)
Property 1: the shape of an array is a tuple of integers
a.shape == (2, 3, 4)
b.shape == (3, 4)
c.shape == (4,)
d.shape == ()
Property 2: rank(a) == len(a.shape)
rank(a) == 3 == len(a.shape)
rank(b) == 2 == len(b.shape)
rank(c) == 1 == len(c.shape)
rank(d) == 0 == len(d.shape)
Property 3: len(a) == a.shape[0]
len(a) == 2 == a.shape[0]
len(b) == 3 == b.shape[0]
len(c) == 4 == c.shape[0]
len(d) == Exception == d.shape[0]
# Currently the last is wrong?
Property 4: size(a) == product(a.shape)
size(a) == 24 == product(a.shape)
size(b) == 12 == product(b.shape)
size(c) == 4 == product(c.shape)
size(d) == 1 == product(d.shape)
# Currently the last is wrong
Property 5: rank-0 array behaves as mutable numbers when used as value
array(2) is similar to 2
array(2.0) is similar to 2.0
array(2j) is similar to 2j
# This is a summary of many concrete properties.
Property 6: Indexing reduces rank. Slicing preserves rank.
a[:,:,:].shape = (2, 3, 4)
a[1,:,:].shape = (3, 4)
a[1,1,:].shape = (4,)
a[1,1,1].shape = ()
Property 7: Indexing by tuple of ints gives scalar.
a[1,1,1] == 1
b[1,1] == 2
c[1,] == 6
d[()] == 24
# So rank-0 array indexed by empty tuple should be scalar.
# Currently the last is wrong
Property 8: Indexing by tuple of slices gives array.
a[:,:,:] == ones((2,3,4))
b[:,:] == ones((3,4)) * 2
c[:] == ones((,4)) * 6
d[()] == ones(()) * 24
# So rank-0 array indexed by empty tuple should be rank-0 array.
# Currently the last is wrong
Property 9: Indexing as lvalues
a[1,1,1] = 2
b[1,1] = 2
c[1,] = 2
d[()] = 2
Property 10: Indexing and slicing as lvalues
a[:,:,:] = ones((2, 3, 4))
a[1,:,:] = ones((3, 4))
a[1,1,:] = ones((4,))
a[1,1,1] = ones(())
# But the last is wrong.
Conclusion 1: rank-0 arrays are equivalent to scalars.
See properties 7 and 8.
Conclusion 2: rank-0 arrays are mutable.
See property 9.
Conclusion 3: shape(scalar), size(scalar) are all defined, but len(scalar)
should not be defined.
See conclusion 1 and properties 1, 2, 3, 4.
Conclusion 4: A missing axis is similar to having dimension 1.
See property 4.
Conclusion 5: rank-0 int arrays should be allowed to act as indices.
See property 5.
Conclusion 6: rank-0 arrays should not be hashable except by object id.
See conclusion 2.
Discussions:
- These properties correspond to the current implementation quite well,
except a few rough edges.
- Mutable scalars are useful in their own rights.
- Is there substantial difference in overhead between rank-0 arrays and
scalars?
- How to write literal values? array(1) is too many characters.
- For rank-1 and rank-0 arrays, Python notation distinguishes:
c[1] vs c[1,]
d[] vs d[()]
Should these be used to handle semantic difference between indexing
and slicing? Should d[] be syntactically allowed?
Hope these observations help.
Huaiyu

Can someone refresh my memory as to why some properties of NumArrays that
are conceptually attributes are accessed as atributes (a.shape, a.rank),
while others are accessed through functions (a.type, a.iscontiguous,...).
Thanks,
-tim

Before we implement what we said we would regarding rank-0
arrays in numarray, there became apparent a couple new issues
that didn't really get considered in the first round of
discussion (at least I don't recall that they did).
To restate the issue: there was a question about whether
an index to an array that identified a single element only
(i.e., not a slice, nor an incomplete index, e.g. x[3] where
x is two dimensional) should return a Python scalar or a
rank-0 array. Currently Numeric is inconsistent on this point.
One usually gets scalars, but on some occasions, rank-0 arrays
are returned. Good arguments are to be had for either alternative.
The primary advantage of returning rank-0 arrays is that they
reduce the need for conditional code checking to see if a result
is a scalar or an array. At the end of the discussion it was
decided to have numarray return rank-0 arrays in all instances of
single item indexing. Since then, a couple potential snags have
arisen. I've already discussed some of these with Paul Dubois
and Eric Jones. I'd like a little wider input before making a
final (or at least experimental) decision.
If we return rank-0 arrays, what should repr return for rank-0
arrays. My initial impression is that the following is highly
undesirable for a interactive session, but maybe it is just me:
>>> x = arange(10)
>>> x[2]
array(2)
We, of course, could arrange __repr__ to return "2" instead,
in other words print the simple scalar for all cases of rank-0
arrays. This would yield the expected output in the above
example. Nevertheless, isn't it violating the intent of repr?
Are there other examples where Python uses repr in a similar,
misleading manner? But perhaps most feel that returning array(2)
is perfectly acceptable and won't annoy users. I am curious
about what people think about this.
The second issue is an efficiency one. Currently numarray uses
Python objects for arrays. If we return rank-0 arrays for
single item indexing, then some naive uses of larger arrays
as sequences may lead to an enormous number of array objects
to be created. True, there will be equivalent means of doing
the same operation that won't result in massive object creations
(such as specifically converting an array to a list, which would
be done much faster). Is this a serious problem?
These two issues led us to question whether we should indeed
return rank-0 arrays. We can live with either solution. But
we do want to make the right choice. We also know that both
functionalities must exist, e.g., indexing for scalars and
indexing for rank-0 arrays and we will provide both. The issue
is what indexing syntax returns. One argument is that it is
not a great burden on programmers to use a method (or other
means) to obtain a rank-0 array always if that is important
for the code they are writing and that we should make the
indexing syntax return what most users (especially less expert
ones) intuitively expect (scalars I presume). But others feel
it is just as important for the syntax that a progammer uses
to be as simple as the interactive user expects (instead of
something like x.getindexasarrayalways(2,4,1) [well, with a much
better, and shorter, name])
Do either of these issues change anyone's opinion? If people
still want rank-0 arrays, what should repr do?
Perry Greenfield