README
/books/
This file contains information about the BOOKS available in the
Helsinki Corpus of Swahili.
The texts have been pre-processed to make automatic processing
easier. Each sentence is placed on its own line. At the beginning of
each line there is a tag in angle brackets indicating the book in
question. Therefore, when retrieving information with a sentence as
context, it is possible to keep track of the book the piece of text
comes from. In the book list below, this tag precedes each book
reference.
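Because every line starts with its book tag, simple per-book counts
can be sketched with standard tools. The function below is only an
illustration: the name count_by_tag is hypothetical and not part of
the corpus tools, and it assumes that tags consist of capital letters
and hyphens, as in the book list.

```shell
# Hypothetical helper (not part of the corpus tools): count the
# sentences (= lines) per book tag in the text read from standard input.
# Assumption: each line begins with a tag such as <GAM> or <SH-INS>.
count_by_tag() {
    # Keep only the leading tag of each line, then count identical tags.
    sed -n 's/^\(<[A-Z-]*>\).*/\1/p' | sort | uniq -c
}
```

It reads corpus text on standard input and prints one line per tag,
with the number of sentences in front of the tag.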
The individual texts of the books listed below are concatenated into
a single file, 'books-all', in the order given below. To save space
this file may be in packed form, in which case its name is
books-all.gz. You may still use it without unpacking it by using the
command 'zcat', which temporarily unpacks it for you. For example, to
retrieve the strings containing 'mwalimu' you may write the command
line this way:

tuuri$ zcat books-all.gz | kw-alg ' mwalimu ' > results
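Since the file may or may not be packed, a small wrapper can hide the
difference. This is a minimal sketch, not part of the corpus tools:
the name cat_corpus is hypothetical, and it uses 'gzip -dc', which
does the same job as 'zcat'.

```shell
# Hypothetical helper (not part of the corpus tools): print a corpus
# file to standard output whether or not it is packed.
# Usage: cat_corpus NAME   (NAME without the .gz suffix)
cat_corpus() {
    if [ -f "$1.gz" ]; then
        # Packed form: unpack to standard output, leave the file as is.
        gzip -dc "$1.gz"
    else
        cat "$1"
    fi
}
```

With this helper the example above could be written as
cat_corpus books-all | kw-alg ' mwalimu ' > results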
The contents of the file 'books-all' are also divided into four
separate files named 'books1', 'books2', 'books3' and 'books4'. These
files may be packed as well.
Each book used to be in a separate file, but the books have now been
combined into the files mentioned above. The list of individual book
files is displayed here to give an idea of the size of each book. By
using the tag one can still search particular books only. For example,
if you want to use Kezilahabi's book 'Gamba la nyoka' as the source
text, you can use the following command:

tuuri$ zcat books-all.gz | egrep '<GAM>' | kw-alg ' mwalimu ' > results
If you want to use all of Kezilahabi's books as the source text, you
may use the following command:

tuuri$ zcat books-all.gz | egrep '(<GAM>|<KIC>|<MZI>|<NAG>|<ROS>)' |
       kw-alg ' mwalimu ' > results
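The selection of books by tag can also be wrapped in a small function
that builds the pattern from a list of tags. This is a sketch under
stated assumptions: the name select_books is hypothetical, 'grep -E'
is used in place of 'egrep', and the pattern is anchored to the start
of the line on the assumption that the tag is the very first thing on
each line.

```shell
# Hypothetical helper (not part of the corpus tools): keep only the
# lines that belong to the given books.
# Usage: select_books TAG [TAG...]   (tags without angle brackets)
select_books() {
    # Build "<TAG1>|<TAG2>|" from the arguments ...
    pattern=$(printf '<%s>|' "$@")
    # ... drop the trailing "|" and match the tags at the start of a line.
    grep -E "^(${pattern%|})"
}
```

For example, all of Kezilahabi's books could be selected with
zcat books-all.gz | select_books GAM KIC MZI NAG ROS
and the result piped on to kw-alg as before.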
The file 'books1' contains the files (= books) listed below:

-rw-r----- 1 ahurskai swahil 572738 Apr 23 1993 fIII_FAS.snt
-rw-r----- 1 ahurskai swahil 288071 May 4 1995 kezi_GAM.snt
-rw-r----- 1 ahurskai swahil 247107 Apr 23 1993 lemk_YAR.snt
-rw-r----- 1 ahurskai swahil 358388 Apr 23 1993 LIwe_NYO.snt
-rw-r----- 1 ahurskai swahil 271999 Dec 7 1992 MOha_NYO.snt
-rw-r----- 1 ahurskai swahil 333618 Apr 23 1993 MUng_NJO.snt
-rw-r----- 1 ahurskai swahil 224533 Apr 23 1993 mvun_LWI.snt
The file 'books2' contains the files (= books) listed below:

-rw-r----- 1 ahurskai swahil 345040 Apr 23 1993 nyer_UJA.snt
-rw-r----- 1 ahurskai swahil 235636 Apr 23 1993 seme_NJO.snt
-rw-r----- 1 ahurskai swahil 157312 Apr 23 1993 SHaa_INS.snt
-rw-r----- 1 ahurskai swahil 201195 Apr 23 1993 SHaa_KIE.snt
-rw-r----- 1 ahurskai swahil 95846 Apr 23 1993 SHaa_KUF.snt
-rw-r----- 1 ahurskai swahil 105114 Apr 23 1993 SHaa_KUS.snt
-rw-r----- 1 ahurskai swahil 72041 Apr 23 1993 SHaa_PAM.snt
-rw-r----- 1 ahurskai swahil 67839 Apr 23 1993 SHaa_SAN.snt
-rw-r----- 1 ahurskai swahil 104498 Apr 23 1993 SHaa_WAS.snt
-rw-r----- 1 ahurskai swahil 354278 Jun 13 14:42 moha_TAT.snt
The file 'books3' contains the files (= books) listed below:

-rw-r----- 1 ahurskai swahil 352030 Oct 12 1993 kezi_KIC.snt
-rw-r----- 1 ahurskai swahil 193878 Oct 12 1993 kezi_MZI.snt
-rw-r----- 1 ahurskai swahil 162681 Oct 12 1993 kezi_NAG.snt
-rw-r----- 1 ahurskai swahil 222015 Oct 12 1993 kezi_ROS.snt
-rw-r----- 1 ahurskai swahil 92598 Oct 12 1993 kiba_MAT.snt
-rw-r----- 1 ahurskai swahil 94273 Jun 10 16:21 mach_TWE.snt
-rw-r----- 1 ahurskai swahil 176096 Oct 12 1993 mulo_KUN.snt
-rw-r----- 1 ahurskai swahil 122118 Oct 12 1993 yahy_PEP.snt
The file 'books4' contains the files (= books) listed below:

-rw-r----- 1 ahurskai swahil 85851 Jun 10 16:21 mush_FAR.snt
-rw-r----- 1 ahurskai swahil 86405 Jun 10 16:21 liha_FED.snt
-rw-r----- 1 ahurskai swahil 92869 Mar 24 11:21 shaa-adi.snt
-rw-r----- 1 ahurskai swahil 333550 Jun 12 13:21 mbaa_HIS.snt
-rw-r----- 1 ahurskai swahil 61695 Jun 12 12:55 ngah_HUK.snt
-rw-r----- 1 ahurskai swahil 84379 Jun 12 12:56 huss_KIN.snt
-rw-r----- 1 ahurskai swahil 172953 Jun 12 13:00 kham_LUG.snt
-rw-r----- 1 ahurskai swahil 93179 Jun 10 16:21 huss_MAS.snt
-rw-r----- 1 ahurskai swahil 87615 Jun 12 13:24 abdu_MKE.snt
-rw-r----- 1 ahurskai swahil 26683 Jun 10 16:21 ngom_NGO.snt
-rw-r----- 1 ahurskai swahil 63979 Jun 12 13:26 topa_EPO.snt
-rw-r----- 1 ahurskai swahil 239631 Jun 10 16:21 seme_SEM.snt
-rw-r----- 1 ahurskai swahil 211412 Jun 10 16:21 tipp_TIP.snt
-rw-r----- 1 ahurskai swahil 348639 Jun 12 13:30 senk_USH.snt
All the files listed above are included in the file:

-rw------- 1 ahurskai swahil 7563461 Jul 7 14:36 books-all
Here is a bibliographical list of the books in 'books-all'. They are
arranged so that the contents of books1 are followed by those of
books2, and so on. Each entry begins with its tag.
Contents of the file books1:

<FAS> Makala za Semina ya Kimataifa ya Waandishi wa Kiswahili III.
    Fasihi. Taasisi ya Uchunguzi wa Kiswahili, Chuo Kikuu cha
    Dar-es-Salaam, 1983.
<GAM> Kezilahabi, Euphrase. Gamba la nyoka. Eastern Africa
    Publications, Dar-es-salaam, 1981 (1979).
<YAR> Lemki, Mark. Maskini Yarabi. East African Literature Bureau,
    Dar-es-salaam, 1976.
<LI-NYO> Liwenga, George. Nyota ya Huzuni. Tanzania Publishing House,
    Dar-es-salaam, 1981.
<MO-NYO> Mohamed, Suleiman, Mohamed, 1976 (Reprinted 1983). Nyota ya
    Rehema. Oxford University Press, Nairobi.
<MU-NJO> Mung'ong'o, C.G., 1980. Njozi Iliyopotea. Tanzania Publishing
    House, Dar-es-salaam.
<LWI> Mvungi, Martha, Mlagala. Lwidiko. Tanzania Publishing House,
    Dar-es-Salaam.
Contents of the file books2:

<UJA> Nyerere, Julius K., 1968 (Reprinted 1974). Ujamaa. Oxford
    University Press, Dar-es-Salaam.
<NJO> Seme, William B., 1972 (Reprinted 1975). Njozi za Usiku.
    Longman Tanzania Ltd, Nairobi.
<SH-INS> Robert, Shaaban, 1959 (Reprinted 1971). Insha na Mashairi,
    Diwani ya Shaaban 5. Thomas Nelson and Sons Ltd, Dar-es-salaam.
<SH-KIE> Robert, Shaaban, 1966 (Reprinted 1989). Kielezo Cha Insha.
    Oxford University Press, Nairobi.
<SH-KUF> Robert, Shaaban, 1991. Kufikirika. Mkuki na Nyota Publishers,
    Dar-es-Salaam.
<SH-KUS> Robert, Shaaban, 1990 (First published 1951). Kusadikika
    (Nchi Iliyo Angani), Diwani ya Shaaban. Evans Brothers (Kenya)
    Limited, Nairobi.
<SH-PAM> Robert, Shaaban, 1966. Pambo La Lugha. Oxford University
    Press, Nairobi.
<SH-SAN> Robert, Shaaban, 1972. Sanaa ya Ushairi, Diwani ya Shaaban 7.
    Nelson, Nairobi.
<SH-WAS> Robert, Shaaban, 1991. Wasifu wa Siti binti Saad. Mkuki na
    Nyota Publishers, Dar-es-Salaam.
<TAT> Mohamed, Said A. Tata za Asumini. Longman (Kenya).
Contents of the file books3:

<KIC> Kezilahabi, E.N. 1974. Kichwamaji. Typography Ltd, Nairobi.
<MZI> Kezilahabi, E. 1991. Mzingile. Dar-es-Salaam University Press,
    Dar-es-Salaam.
<NAG> Kezilahabi, E. 1990. Nagona. Dar-es-Salaam University Press,
    Dar-es-Salaam.
<ROS> Kezilahabi, E. 1971. Rosa Mistika. East African Literature
    Bureau, Nairobi, Kampala and Dar-es-Salaam.
<MAT> Kibao, Salim A. 1975. Matatu ya Thamani. Heinemann Educational
    Books, Nairobi.
<TWE> Macha, Freddy, 1984. Twen' Zetu Ulaya ... na hadithi nyingine.
    Grand Arts Promotions.
<KUN> Mulokozi, M.M. na Kahigi, K.K. 1979. Kunga za Ushairi na Diwani
    Yetu. Tanzania Publishing House, Dar-es-Salaam.
<PEP> Yahya, Saad S. 1973. Pepeta. Kenya Litho Ltd, Nairobi.
Contents of the file books4:

<FAR> Mushi, J.S. Baada ya Dhiki, Faraja. Tanzania Publishing House
    Limited, Dar-es-Salaam.
<FED> Lihamba, Amandina. Hawala ya fedha. Tanzania Publishing House,
    Dar-es-Salaam.
<ADI> Adili na Nduguze, kimeandikwa na Shaaban Robert, East African
    Literature Bureau, 1952.
<HIS> Mbaabu, Ireri, 1991. Historia ya Usanifishaji wa Kiswahili.
    Longman, Nairobi.
<HUK> Ngahyoma, Ngalimecha, 1973. Huka. Tanzania Publishing House,
    Dar-es-Salaam.
<KIN> Hussein, Ebrahim N. 1969. Kinjeketile. Oxford University Press,
    Dar-es-Salaam na Nairobi.
<LUG> Khamisi, Abdu, Mtajuka (ed.), 1983. Makala za Semina ya
    Kimataifa ya Waandishi wa Kiswahili I, Lugha ya Kiswahili,
    Kiswahili Ikiwa Ni Lugha Ya Kimataifa. Taasisi ya Uchunguzi wa
    Kiswahili, Chuo Kikuu cha Dar-es-Salaam.
<MAS> Hussein, Ebrahim N. Mashetani. Oxford University Press, Nairobi
    na Dar-es-Salaam.
<MKE> Abdulla, Muhamed, Said, 1975. Mke mmoja waume watatu. East
    African Publishing House, Dar-es-Salaam.
<NGO> Ng'ombe akivunjika mguu ..., Michezo ya Kuigiza 2. Umeandikwa na
    Wasichana wa Matuga Girl's Secondary School. Longman (Kenya).
<EPO> Topan, Farouk, 1977. Aliyeonja Pepo. Tanzania Publishing House,
    Dar-es-Salaam.
<SEM> Mimi si mutribu ...
<TIP> Maisha ya Hamed bin Muhammed el Murjebi yaani Tippu Tip Kwa
    maneno yake mwenyewe. Kimefasiriwa na W. H. Whitely. East African
    Literature Bureau, Kampala, Nairobi, Dar-es-Salaam.
<USH> Senkoro, F.E.M.K. 1988. Ushairi. Dar-es-Salaam University Press,
    Dar-es-Salaam.
Arvi.Hurskainen@helsinki.fi