README /books/

This file contains information about the BOOKS available in the

Helsinki Corpus of Swahili


The texts have been pre-processed to make automatic processing easier.
Each sentence is placed on its own line. At the beginning of each line there
is a tag in angle brackets that identifies the book in question. Therefore,
when retrieving information with a sentence as context, it is possible to
keep track of which book the text comes from. In the book list below, the
corresponding tag precedes each book reference.
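
Because every line starts with a tag, the corpus can be summarised per book
with standard tools. The following is a minimal sketch and not part of the
corpus tools: the function name count_by_tag is hypothetical, and it assumes
that each tag consists of uppercase letters and hyphens, as in the book list
below. The sample sentences are invented for illustration.

```shell
# Count the sentences (lines) per book tag in a corpus stream on stdin.
# Assumption: every line begins with a tag such as <GAM> or <SH-INS>.
count_by_tag() {
    awk '
        # Pick out the leading tag, if any, and count it.
        match($0, /^<[A-Z-]+>/) { n[substr($0, RSTART, RLENGTH)]++ }
        END { for (t in n) print t, n[t] }
    ' | sort
}

# Invented sample lines, not taken from the corpus:
printf '%s\n' \
    '<GAM> Sentensi ya kwanza.' \
    '<GAM> Sentensi ya pili.' \
    '<KIC> Sentensi nyingine.' | count_by_tag
```

Each output line shows a tag followed by the number of lines that carry it.
Lines without a leading tag are ignored.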

The individual texts of the books listed below are concatenated into a
single file 'books-all', in the order given below. To save space, this file
may be in packed form, in which case its name is books-all.gz. You may still
use it without unpacking it by using the command 'zcat', which unpacks it
temporarily for you. For example, to retrieve the strings containing
'mwalimu', you may write the command line this way:

            tuuri$ zcat books-all.gz | kw-alg ' mwalimu ' > results

The contents of file 'books-all' are also divided into four separate files with
file names 'books1', 'books2', 'books3' and 'books4'. These files may be packed
as well.
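
Since any of these files may or may not be packed, a small wrapper can hide
the difference. This is a minimal sketch assuming a POSIX shell. The function
name corpus_cat is hypothetical, and 'gzip -dc' is used here as an equivalent
of 'zcat'.

```shell
# Print a corpus file to stdout, whether or not it is packed.
# $1 is the base name without '.gz', e.g. books-all or books1.
corpus_cat() {
    if [ -f "$1.gz" ]; then
        gzip -dc "$1.gz"        # packed form: unpack to stdout only
    elif [ -f "$1" ]; then
        cat "$1"                # plain form
    else
        echo "corpus_cat: neither $1 nor $1.gz found" >&2
        return 1
    fi
}

# Example use (requires the corpus files to be present):
#   corpus_cat books-all | grep ' mwalimu ' > results
```

If both forms exist, the sketch prefers the packed file. That choice is
arbitrary and can be reversed by swapping the two tests.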

Each book used to be in a separate file, but the books have now been
combined into the files mentioned above. The list of individual book
files is displayed here in order to give an idea of the size of each book.
By using the tag one can still search only particular books. For example, if
you want to use Kezilahabi's book 'Gamba la nyoka' as the source text, you can
use the following command:

            tuuri$ zcat books-all.gz | egrep '<GAM>' | kw-alg ' mwalimu ' > results

If you want to use all of Kezilahabi's books as the source text, you may use the
following command:

            tuuri$ zcat books-all.gz | egrep '(<GAM>|<KIC>|<MZI>|<NAG>|<ROS>)' |
            kw-alg ' mwalimu ' > results
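
The tag selection in the commands above can be wrapped in a small helper that
builds the pattern from a list of tags. This is a minimal sketch: the function
name select_books is hypothetical, and the pattern is anchored to the start of
the line on the assumption that the tag is always the first thing on a line,
as described at the top of this file.

```shell
# Keep only the lines whose leading tag is one of the given tags.
# Tags are given without angle brackets, e.g. GAM KIC SH-INS.
select_books() {
    pattern=$(printf '%s|' "$@")    # e.g. GAM|KIC|MZI|
    pattern="^<(${pattern%|})>"     # e.g. ^<(GAM|KIC|MZI)>
    grep -E "$pattern"
}

# Invented sample lines, not taken from the corpus:
printf '%s\n' \
    '<GAM> Sentensi ya kwanza.' \
    '<UJA> Sentensi ya pili.' \
    '<KIC> Sentensi ya tatu.' | select_books GAM KIC
```

With this helper, the command for all of Kezilahabi's books could be written:

            tuuri$ zcat books-all.gz | select_books GAM KIC MZI NAG ROS |
            kw-alg ' mwalimu ' > results

Like grep itself, the helper returns a non-zero exit status when no line
matches.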



The file 'books1' contains the files (= books) listed below:

-rw-r-----   1 ahurskai swahil     572738 Apr 23  1993 fIII_FAS.snt
-rw-r-----   1 ahurskai swahil     288071 May  4  1995 kezi_GAM.snt
-rw-r-----   1 ahurskai swahil     247107 Apr 23  1993 lemk_YAR.snt
-rw-r-----   1 ahurskai swahil     358388 Apr 23  1993 LIwe_NYO.snt
-rw-r-----   1 ahurskai swahil     271999 Dec  7  1992 MOha_NYO.snt
-rw-r-----   1 ahurskai swahil     333618 Apr 23  1993 MUng_NJO.snt
-rw-r-----   1 ahurskai swahil     224533 Apr 23  1993 mvun_LWI.snt


The file 'books2' contains the files (= books) listed below:

-rw-r-----   1 ahurskai swahil     345040 Apr 23  1993 nyer_UJA.snt
-rw-r-----   1 ahurskai swahil     235636 Apr 23  1993 seme_NJO.snt
-rw-r-----   1 ahurskai swahil     157312 Apr 23  1993 SHaa_INS.snt
-rw-r-----   1 ahurskai swahil     201195 Apr 23  1993 SHaa_KIE.snt
-rw-r-----   1 ahurskai swahil      95846 Apr 23  1993 SHaa_KUF.snt
-rw-r-----   1 ahurskai swahil     105114 Apr 23  1993 SHaa_KUS.snt
-rw-r-----   1 ahurskai swahil      72041 Apr 23  1993 SHaa_PAM.snt
-rw-r-----   1 ahurskai swahil      67839 Apr 23  1993 SHaa_SAN.snt
-rw-r-----   1 ahurskai swahil     104498 Apr 23  1993 SHaa_WAS.snt
-rw-r-----   1 ahurskai swahil     354278 Jun 13 14:42 moha_TAT.snt


The file 'books3' contains the files (= books) listed below:

-rw-r-----   1 ahurskai swahil     352030 Oct 12  1993 kezi_KIC.snt
-rw-r-----   1 ahurskai swahil     193878 Oct 12  1993 kezi_MZI.snt
-rw-r-----   1 ahurskai swahil     162681 Oct 12  1993 kezi_NAG.snt
-rw-r-----   1 ahurskai swahil     222015 Oct 12  1993 kezi_ROS.snt
-rw-r-----   1 ahurskai swahil      92598 Oct 12  1993 kiba_MAT.snt
-rw-r-----   1 ahurskai swahil      94273 Jun 10 16:21 mach_TWE.snt
-rw-r-----   1 ahurskai swahil     176096 Oct 12  1993 mulo_KUN.snt
-rw-r-----   1 ahurskai swahil     122118 Oct 12  1993 yahy_PEP.snt


The file 'books4' contains the files (= books) listed below:

-rw-r-----   1 ahurskai swahil      85851 Jun 10 16:21 mush_FAR.snt
-rw-r-----   1 ahurskai swahil      86405 Jun 10 16:21 liha_FED.snt
-rw-r-----   1 ahurskai swahil      92869 Mar 24 11:21 shaa-adi.snt
-rw-r-----   1 ahurskai swahil     333550 Jun 12 13:21 mbaa_HIS.snt
-rw-r-----   1 ahurskai swahil      61695 Jun 12 12:55 ngah_HUK.snt
-rw-r-----   1 ahurskai swahil      84379 Jun 12 12:56 huss_KIN.snt
-rw-r-----   1 ahurskai swahil     172953 Jun 12 13:00 kham_LUG.snt
-rw-r-----   1 ahurskai swahil      93179 Jun 10 16:21 huss_MAS.snt
-rw-r-----   1 ahurskai swahil      87615 Jun 12 13:24 abdu_MKE.snt
-rw-r-----   1 ahurskai swahil      26683 Jun 10 16:21 ngom_NGO.snt
-rw-r-----   1 ahurskai swahil      63979 Jun 12 13:26 topa_EPO.snt
-rw-r-----   1 ahurskai swahil     239631 Jun 10 16:21 seme_SEM.snt
-rw-r-----   1 ahurskai swahil     211412 Jun 10 16:21 tipp_TIP.snt
-rw-r-----   1 ahurskai swahil     348639 Jun 12 13:30 senk_USH.snt



All the files listed above are included in the file:

-rw-------   1 ahurskai swahil  7563461 Jul  7 14:36 books-all



Here is a bibliographical list of the books in 'books-all'. They appear in
the file in the order listed, so that the contents of books1 are followed by
those of books2, etc. Each entry begins with its tag.


Contents of the file books1:

<FAS> Makala za Semina ya Kimataifa ya Waandishi wa Kiswahili III. Fasihi.
Taasisi ya Uchunguzi wa Kiswahili Chuo Kikuu cha Dar-es-Salaam, 1983.
<GAM> Kezilahabi, Euphrase. Gamba la nyoka. Eastern Africa Publications,
Dar-es-Salaam, 1981 (1979).
<YAR> Lemki, Mark. Maskini Yarabi. East African Literature Bureau,
Dar-es-Salaam, 1976.
<LI-NYO> Liwenga, George. Nyota ya Huzuni. Tanzania Publishing House,
Dar-es-Salaam, 1981.
<MO-NYO> Mohamed, Suleiman, Mohamed, 1976 (Reprinted 1983). Nyota ya Rehema.
Oxford University Press, Nairobi.
<MU-NJO> Mung'ong'o, C.G., 1980. Njozi Iliyopotea. Tanzania Publishing House,
Dar-es-Salaam.
<LWI> Mvungi, Martha, Mlagala. Lwidiko. Tanzania Publishing House,
Dar-es-Salaam.


Contents of the file books2:

<UJA> Nyerere, Julius K., 1968 (Reprinted 1974). Ujamaa. Oxford University
Press, Dar-es-Salaam.
<NJO> Seme, William B., 1972 (Reprinted 1975). Njozi za Usiku. Longman
Tanzania Ltd, Nairobi.
<SH-INS> Robert, Shaaban, 1959 (Reprinted 1971). Insha na Mashairi,
Diwani ya Shaaban 5. Thomas Nelson and Sons Ltd, Dar-es-Salaam.
<SH-KIE> Robert, Shaaban, 1966 (Reprinted 1989). Kielezo Cha Insha.
Oxford University Press, Nairobi.
<SH-KUF> Robert, Shaaban, 1991. Kufikirika. Mkuki na Nyota Publishers,
Dar-es-Salaam.
<SH-KUS> Robert, Shaaban, 1990 (First published 1951). Kusadikika
(Nchi Iliyo Angani), Diwani ya Shaaban. Evans Brothers (Kenya) Limited,
Nairobi.
<SH-PAM> Robert, Shaaban, 1966, Pambo La Lugha. Oxford University Press,
Nairobi.
<SH-SAN> Robert, Shaaban, 1972. Sanaa ya Ushairi, Diwani ya Shaaban 7.
Nelson, Nairobi.
<SH-WAS> Robert, Shaaban, 1991. Wasifu wa Siti binti Saad. Mkuki na Nyota
Publishers, Dar-es-Salaam.
<TAT> Mohamed, Said A. Tata za Asumini. Longman (Kenya).


Contents of the file books3:

<KIC> Kezilahabi, E.N. 1974. Kichwamaji. Typography Ltd, Nairobi.
<MZI> Kezilahabi E. 1991. Mzingile. Dar-es-Salaam University Press,
Dar-es-Salaam.
<NAG> Kezilahabi, E. 1990. Nagona. Dar-es-Salaam University Press,
Dar-es-Salaam.
<ROS> Kezilahabi, E. 1971. Rosa Mistika. East African Literature Bureau,
Nairobi, Kampala and Dar-es-Salaam.
<MAT> Kibao, Salim A. 1975. Matatu ya Thamani. Heinemann Educational Books,
Nairobi.
<TWE> Macha, Freddy, 1984. Twen' Zetu Ulaya ... na hadithi nyingine.
Grand Arts Promotions.
<KUN> Mulokozi M.M. na Kahigi, K.K. 1979. Kunga za Ushairi na Diwani Yetu.
Tanzania Publishing House, Dar-es-Salaam.
<PEP> Yahya, Saad S. 1973. Pepeta. Kenya Litho Ltd, Nairobi.


Contents of the file books4:

<FAR> Mushi J.S. Baada ya Dhiki, Faraja. Tanzania Publishing House Limited,
Dar-es-Salaam.
<FED> Lihamba, Amandina, Hawala ya fedha. Tanzania Publishing House,
Dar-es-Salaam.
<ADI> Adili na Nduguze, kimeandikwa na Shaaban Robert, East African
Literature Bureau, 1952.
<HIS> Mbaabu, Ireri, 1991. Historia ya Usanifishaji wa Kiswahili.
Longman, Nairobi.
<HUK> Ngahyoma, Ngalimecha, 1973. Huka. Tanzania Publishing House,
Dar-es-Salaam.
<KIN> Hussein, Ebrahim N. 1969. Kinjeketile. Oxford University Press,
Dar-es-Salaam na Nairobi.
<LUG> Khamisi, Abdu, Mtajuka (ed.), 1983. Makala za Semina ya Kimataifa
ya Waandishi wa Kiswahili I, Lugha ya Kiswahili, Kiswahili Ikiwa Ni Lugha
Ya Kimataifa. Taasisi ya Uchunguzi wa Kiswahili, Chuo Kikuu cha Dar-es-Salaam.
<MAS> Hussein, Ebrahim N. Mashetani. Oxford University Press, Nairobi na
Dar-es-Salaam.
<MKE> Abdulla, Muhamed, Said, 1975. Mke mmoja waume watatu. East African
Publishing House, Dar-es-Salaam.
<NGO> Ng'ombe akivunjika mguu ...,  Michezo ya Kuigiza 2. Umeandikwa na
Wasichana wa Matuga Girl's Secondary School. Longman (Kenya).
<EPO> Topan, Farouk, 1977. Aliyeonja Pepo. Tanzania Publishing House,
Dar-es-Salaam.
<SEM> Mimi si mutribu ...
<TIP> Maisha ya Hamed bin Muhammed el Murjebi yaani Tippu Tip Kwa maneno
yake mwenyewe. Kimefasiriwa na W. H. Whiteley. East African Literature Bureau,
Kampala, Nairobi, Dar-es-Salaam.
<USH> Senkoro, F.E.M.K. 1988. Ushairi. Dar-es-Salaam University Press,
Dar-es-Salaam.

Arvi.Hurskainen@helsinki.fi